IST 383 – Knowledge Discovery

 

  

Objectives of the course

Everybody seems to be doing “knowledge discovery” these days, but few people have a clear grasp of the categories of techniques available. Even fewer know how and why the techniques can best be applied to their particular problems. In this class, we will look at the entire knowledge discovery process. As part of this process, we will focus on cutting-edge, interesting data mining techniques that can be used in a wide variety of settings (business, science, web). We will discuss algorithmic details, implementation issues, advantages and disadvantages, and look at many examples.

 

Upon successful completion of this class, you will have a thorough understanding of the techniques used in data warehousing and data mining applications, along with their advantages and disadvantages. You will have experience working on data mining projects and applying the techniques. You will be able to do your own data mining projects and be successful in knowledge discovery.

 

Textbook

Recommended book (optional):

Fundamentals of Natural Computing: Basic Concepts, Algorithms, and Applications (Chapman & Hall/CRC Computer and Information Sciences)

by Leandro Nunes de Castro

Publisher: Chapman & Hall/CRC; 1st edition (June 2, 2006)

ISBN-10: 1584886439

ISBN-13: 978-1584886433

 

You will be required to take notes in class. The instructor will not hand out class notes.

 

Links to open source software, interesting projects and other additional materials will be posted on the website.

 

Prerequisites

Students must have basic programming skills. The use of open source and other software is encouraged, as is helping each other solve problems. You should have access to a computer for development and project demonstrations.

 

 

Grading Policy

Each student will work on a real data mining problem from start to finish. Class time will also be devoted to the projects. However, each student will be responsible for his or her project and will be graded individually.

 

Knowledge Discovery Project      50%
    Problem Choice
    Data Preprocessing           10%
    Algorithm Proposal           10%
    Evaluation Proposal          10%
    Project Presentation          5%
    Final Project                15%
Software Review                  10%
Midterm Exam                     20%
Final Exam                       20%

          

Letter grades (out of 100): 90 and above = A, 80 to 89 = B, 70 to 79 = C, below 70 = U
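As an illustration only, the weighting above can be sketched as a short calculation. This is a minimal sketch, not an official grade calculator: the names `WEIGHTS`, `final_score`, and `letter_grade` are hypothetical, every component is assumed to be scored on a 0 to 100 scale, and the letter cutoffs are assumed to be inclusive lower bounds. Problem Choice is listed without a percentage, so it is left out.

```python
# Weights in percentage points, copied from the grading table.
# "Problem Choice" has no listed weight, so it is omitted (assumption).
WEIGHTS = {
    "data_preprocessing": 10,
    "algorithm_proposal": 10,
    "evaluation_proposal": 10,
    "project_presentation": 5,
    "final_project": 15,
    "software_review": 10,
    "midterm_exam": 20,
    "final_exam": 20,
}


def final_score(scores):
    """Return the weighted score (0-100) from per-component scores (0-100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(score):
    """Map a 0-100 score to a letter, assuming inclusive lower cutoffs."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    return "U"


if __name__ == "__main__":
    # Hypothetical example scores, not real data.
    example = {name: 85 for name in WEIGHTS}
    example["final_exam"] = 95
    total = final_score(example)
    print(total, letter_grade(total))
```

The weights are kept as whole percentage points so that they sum to exactly 100 and the arithmetic avoids floating-point rounding in the weights themselves.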

 

Lectures and Academic Integrity

You are required to attend all lectures, including student presentations. It is your responsibility to obtain material from a fellow student if you miss a lecture. Office hours are not meant as individual lectures. Class notes are vital for doing well on exams. It is your responsibility to study both the lecture notes and the chapters in the book.

 

Cheating during exams or any type of dishonesty will result in a failing grade for this class and will be reported to the University.

 

Although the use of open source software is allowed, it is the student’s responsibility to acknowledge this resource (with name and origin) and understand the software before using it.

 

Exams

The examinations will be closed book. Exams cannot be taken at a different time (even if the exam time differs from the one on the syllabus) unless permission to do so was requested and received at least two weeks before the exam. Failure to show up for an exam will result in a zero unless there was a documented emergency (doctor’s note, etc.).

 

Programming assignments and exercises

Detailed assignments will be handed out in class and posted on the class website. Assignments will be accepted late; however, the earned grade will be reduced by 20% for each day the assignment is late. Assignments more than 5 days late will not be accepted. If you cannot attend class, it is still your responsibility to ensure your assignment is submitted by the deadline.
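The late policy can be read as a simple calculation. The sketch below shows one possible reading and is not an official rule: it assumes the 20% reduction is taken from the earned grade and accumulates linearly per whole day late, and that an assignment that is not accepted scores zero. The function name `late_adjusted_grade` is hypothetical.

```python
def late_adjusted_grade(earned, days_late):
    """Return the grade after the late penalty, under the stated assumptions.

    earned:    grade the work would have received if on time (0-100)
    days_late: whole days past the deadline (0 or less means on time)
    """
    if days_late <= 0:
        return float(earned)
    if days_late > 5:
        # More than 5 days late: not accepted (assumed to count as zero).
        return 0.0
    # Assumption: 20% of the earned grade is removed per day, linearly.
    return earned * (100 - 20 * days_late) / 100


if __name__ == "__main__":
    for days in range(0, 7):
        print(days, late_adjusted_grade(90, days))
```

Under this reading, an assignment handed in exactly 5 days late already earns no credit, so submitting on time matters even when late work is still accepted.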

 

 Tentative course outline

 

Each entry lists the date, the topics, and any deadlines and other info (dates are subject to change).

Jan 22
  Topics: Introduction; Class Overview

Jan 29
  Topics: Association Rules & Apriori Algorithm; Project Introduction
  Deadlines & Other Info: Assignment 1: Problem Choice; Wisconsin Advertising Project; Dataset Code Book; Additional Data from Prof. Merolla

Feb 5
  Topics: Classification/Prediction: Decision Trees (Symbolic)

Feb 12
  Topics: Classification/Prediction: Naïve Bayes (Statistical); Project Topic Discussion
  Deadlines & Other Info: Be prepared to discuss your approach

Feb 19
  Topics: Classification/Prediction: Neural Networks (Connectionist)
  Deadlines & Other Info: Assignment 1 Due: Problem Choice; Assignment 2: Software Review

Feb 26
  Topics: Clustering: Classical

Mar 4
  Topics: Software Review: Discussion; Review Midterm Exam
  Deadlines & Other Info: Assignment 2 Due: Software Review; Assignment 3: Data Preprocessing

Mar 11
  MIDTERM EXAM

Mar 18
  SPRING BREAK

Mar 25
  Topics: Clustering: SOM (Connectionist); Project Work Time
  Deadlines & Other Info: Be prepared to discuss your approach

Apr 1
  Topics: Evaluating Results; Project Evaluation Discussion/Application; Project Work Time
  Deadlines & Other Info: Assignment 3 Due: Data Preprocessing; Assignment 4: Algorithm Proposal; Assignment 5: Evaluation Proposal

Apr 8
  Topics: Graph Search (Linear Search); Project Work Time

Apr 15
  Topics: Genetic Algorithm (Parallel Search); Project Work Time
  Deadlines & Other Info: Assignment 4 Due: Algorithm Proposal; Assignment 5 Due: Evaluation Proposal

Apr 22
  Topics: Project Work Time (or TBA)

Apr 29
  Topics: Class Review; Project Work Time
  Deadlines & Other Info: Assignment 6: Project Presentation; Assignment 7: Final Project

May 6
  Topics: Project Presentations
  Deadlines & Other Info: Assignment 6 Due: Project Presentation

May 13
  FINAL EXAM
  Deadlines & Other Info: Assignment 7 Due: Final Project