CS 553
Intelligent Data Analysis

Description: Differences between data and knowledge, assessing knowledge; Data analysis process, methods, tasks and tools; Practical data analysis; Data understanding, attribute understanding, data quality, data visualization, correlation analysis, outlier detection, missing values; Principles of modeling, model classes, fitting criteria and score functions, model fitting, types of errors; Data preparation, feature selection, dimensionality reduction, record selection, improving data quality; Use of machine learning and data mining techniques in intelligent data analysis. Credit units: 3.

Semester: Fall 2012
Classroom: EA-502
Schedule: Monday 10:40 - 12:30; Thursday 9:40 - 10:30
Office Hours: Wednesday 15:40 - 17:40
Office: EA-418
Instructor: H. Altay Güvenir

Main Text Book:
Michael R. Berthold, Christian Borgelt, Frank Höppner, Frank Klawonn. Guide to Intelligent Data Analysis: How to Intelligently Make Sense of Real Data. Springer-Verlag, London, 2010.

Related Journals:
Intelligent Data Analysis: An International Journal

Related Papers:


Weekly Schedule

  1. Introduction, Motivation, Difference between Data and Knowledge, Assessing Knowledge, Data Analysis Process (CRISP-DM), Methods, Tasks and Tools
  2. Practical Data Analysis, Project Understanding
  3. Data Understanding, Attribute Understanding, Data Quality, Data Visualization
  4. Correlation Analysis, Outlier Detection, Missing Values, Data Understanding
  5. Tools: KNIME, R, Weka, RapidMiner, RIMARC
  6. Principles of Modeling, Model Classes
  7. Model Classes, Types of Errors
  8. Feature Selection, Dimensionality Reduction, Record Selection
  9. Improving Data Quality, Missing Values
  10. Workshop
  11. Workshop
  12. Workshop
  13. Workshop
  14. Workshop

Project Proposal (Due: Oct. 11, 2012)

The project proposal should include information about the following:

Workshop (Nov. 22, 2012 - Dec. 27, 2012)

During the workshop, students are expected to present the initial results of their analysis. Each presentation should describe the goal of the work, the relevant parameters, and statistics about these parameters, including their characteristics, missing feature values, and noise in the data. The results of the initial experiments, obtained with public-domain data mining tools, should then be presented. Finally, the further analysis planned should be described. Each presentation should last about 40 minutes, followed by about 10 minutes of in-class discussion. The workshop grade will be based on criteria such as data collection, data cleaning, techniques used, tools used, interpretation of results, and timing.

Final Report (Due: Jan. 10, 2013, 5:00 PM)

The final report summarizes your term project. Although there are no strict rules about the format, the report should have two parts. The first part is the cover page, which includes the title of the project, your name, the name of the course, and the date of submission. The second part is the report itself. The report should be written so that it could be submitted to an academic journal or conference; that is, it should have sections such as an abstract, introduction, conclusion, and references. The main criteria for the evaluation of your report are listed below:

The final report can be emailed in PDF or DOC format.

Grading Policy: Grades

Lecture Slides:

Free Packages: