CS 550: Machine Learning
Fall '20

Course Description

Instructor: Cigdem Gunduz Demir
EA 423 (Engineering Building), x3443
gunduz at cs bilkent edu tr
Lectures: Mon 15:30-17:20 (Zoom), Thu 8:00-9:00 (Zoom), online only
Mon 17:30-19:20 (Zoom), Thu 8:30-9:20 (EB 103), hybrid mode
Office hours: Through Zoom, by appointment
Website: http://www.cs.bilkent.edu.tr/~gunduz/teaching/cs550
References: R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, Wiley-Interscience, 2001.
E. Alpaydin, Introduction to Machine Learning, MIT Press, 2004.
T.M. Mitchell, Machine Learning, McGraw-Hill, 1997.
P.-N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, Addison-Wesley, 2005.

This course has two parts. The first part includes an introduction to the basic machine learning concepts and algorithms, which will also provide the basis for the second part of the course. The second part covers selected recent topics in machine learning. In particular, the course will cover the following main topics:

Part 1:
  • Introduction
    CS550_Introduction.pdf
    CS550_BayesianDecisionTheory.pdf
    CS550_AlgorithmIndependentIssues.pdf
    CS550_DimensionalityReduction.pdf
  • Decision trees
    CS550_DecisionTrees.pdf
  • Artificial neural networks
    CS550_NeuralNetworks.pdf
  • Unsupervised learning and clustering
    CS550_Clustering.pdf
  • Reinforcement learning
    CS550_ReinforcementLearning.pdf
  • Genetic algorithms
    CS550_GeneticAlgorithms.pdf

Part 2:
  • Ensemble learning
    CS550_EnsembleLearning.pdf
  • Cost-sensitive learning
    CS550_CostSensitiveLearning.pdf
  • Active learning
  • Deep learning
    CS550_DeepNeuralNetworks.pdf

    Grading

    Homework (30%)
    Midterm (35%)
    Survey (8%)
    Presentation (12%)
    Project (15%)

    Due to the YOK (Higher Education Council) regulations, I am taking attendance and will report it to the Department at the end of the semester. However, you may attend classes through Zoom.

    Homework assignments and late policy

    Homework assignments will be posted on this web site. Each assignment will have programming and non-programming parts, and you are expected to work on the assignments individually. For late assignments, 10 percent of the grade will be deducted for each day after the due date.
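
    As a rough illustration of the late policy, the sketch below computes a late-adjusted grade. It assumes that the 10 percent is deducted from the grade you earned, once per started late day, and that the grade cannot go below zero; the function name and these details are assumptions for illustration, not an official rule.

```python
def late_adjusted_grade(raw_grade, days_late, penalty_per_day=0.10):
    """Apply a per-day late deduction to a homework grade.

    Assumption (not stated in the policy): the deduction is a fraction of
    the earned grade per late day, and the result is floored at zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    factor = max(0.0, 1.0 - penalty_per_day * days_late)
    return raw_grade * factor


# Hypothetical example: a grade of 80 submitted two days late.
adjusted = late_adjusted_grade(80, 2)
```

    Under these assumptions, an assignment submitted ten or more days late receives no credit.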

  • Homework 1, due by 23:55 on Monday, November 23rd.
       You can download the dataset from here: ann-train.data ann-test.data ann-thyroid.cost
  • Homework 2, due by 23:55 on Monday, December 7th.
       You can download the datasets from here: train1 test1 train2 test2
  • Homework 3, due by 23:55 on Monday, December 21st.
       You can download the sample image from here: sample.jpg
    Midterm

    The midterm will be given between November 7 and 15, on a date centrally scheduled by the University. You may use one A4 cheat sheet, which you must prepare in your own handwriting; photocopies are not allowed.

    Survey

    You will work in groups of two. (If there is an odd number of students, one group will have three members.) Each group will prepare a survey on a topic of its interest by reading at least 15-20 scientific papers and writing a short report (maximum of 3 pages including citations).

    In your report, define the problem/topic, discuss the motivation behind the studies working on it (try to answer the questions "Why have all these studies worked on this problem? Is it really important?"), and then explain the studies. While explaining the studies, do NOT list them and do NOT explain them one by one. Instead, understand the contribution and methodology of each study, group the studies according to their contributions and methodologies, and then discuss the studies as groups (as in a good introduction section of a scientific paper). In your discussion, give the common approach followed by each group along with the variations that exist among the studies of that group, give the advantages and disadvantages of each group's approach, and discuss the similarities and differences between the approaches followed by different groups. The quality of the survey and of the selected papers will affect your grade (select good papers published in prestigious conferences and journals). Additionally, the format, structure, and writing style of your report (including proper citations) will be part of your grade.

    Although there is no restriction on the topic that you select (as long as it is related to the course contents), you must get my approval for your topic selection. Since you will make a presentation at the end and since we want to minimize overlaps between these presentations, I will not allow two groups to select very similar topics; I will approve the selections on a first-come, first-served basis. Examples of the topics include but are not limited to
  • Deep learning for medical image segmentation (or for something else)
  • Deep learning in robotics (or in something else)
  • Machine learning for telecommunication networks (or for something else)
  • Machine learning in finance (or in something else)
  • Machine learning for computer security (or for something else)
  • Active learning for remote sensing (or for something else)
  • Ensemble methods in text retrieval (or in something else)
  • Reinforcement learning for computer games (or for something else)
  • ...

    Presentation

    Each group will present its survey in class. Every group member should take part in the presentation. The presentation should be consistent with your report. You will have approximately 10-12 minutes for your presentation, followed by a discussion period of 5 minutes. I will let you know the exact duration after the add-drop period.

    The presentation content, its format and layout, and the way that you present it will affect your grade. The interest that your presentation attracts from the audience will also affect your grade.

    Prepare your slides neatly and properly. Your presentation should contain at most 12 slides with reasonable content (only present the most important and interesting parts). Do not copy and paste any text/equation/table from a paper (if necessary, type them). If you need to use a figure (or an image) from a paper, you may take it but give credit to this paper (so that we can understand how much effort you put into preparing your presentation).

    We will use online-only lecture hours (in the last part of the semester) for presentations. Thus, you will make your presentations through Zoom. We will have regular lectures in hybrid-mode lecture hours.

    Project

    You will also work in a group of two, with the same group-mate. You will have three options for the term project:

  • Choose one of the papers that your group will select for your presentation. Then implement the algorithm proposed by this paper and also implement one of the comparison algorithms used by this paper. Do not use any code provided by the authors of the paper, even if it is available. Run these two algorithms on the dataset you will select and compare their results, also using statistical tests. Additionally, follow a proper way of selecting the algorithms' parameters and also conduct parameter analysis. I expect you to select a recent paper that explains a not-so-straightforward algorithm.
  • Run a deep learning model for the dataset you will select. Here you may use third-party code, but you CANNOT select any dataset that was used to pretrain any of the deep learning models (e.g., you cannot use the ImageNet dataset to conduct your experiments). In this option, you are expected to get the model trained for your dataset and obtain reasonable test set accuracies. Additionally, explore the effects of different parameters in a deep learning model. Then, select two different models (deep neural networks) that you will have explored and compare their results, also using statistical tests. I expect you to select a not-so-easy dataset.
  • If you have a specific term project that you want to work on, please let me know. We will need to discuss the details.

    Here I expect you to select a paper (for the first option) and a dataset (for the second one) by yourselves. The quality/difficulty of your selection will affect your grade. Of course, if you want to consult me on your selection, I will always give you feedback.
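
    Both of the first two options ask you to compare two algorithms or models using statistical tests. The course does not prescribe a particular test; as one possibility, the sketch below implements an exact paired sign-flip permutation test on per-fold scores, using only the Python standard library. The function name and the assumption that both methods are evaluated on the same folds (so that scores are paired) are illustrative choices, not requirements.

```python
from itertools import product
from statistics import mean


def paired_permutation_test(scores_a, scores_b):
    """Exact two-sided sign-flip permutation test on paired scores.

    scores_a[i] and scores_b[i] are assumed to come from the same fold
    (or the same data split) for two different methods.
    Returns (observed mean difference, p-value).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired (same length)")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = mean(diffs)
    # Null hypothesis: the two methods are interchangeable, so each
    # paired difference is equally likely to have either sign.
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = mean([s * d for s, d in zip(signs, diffs)])
        # Small tolerance guards against floating-point ties.
        if abs(flipped) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / total
```

    The enumeration visits 2^n sign patterns, so it is practical only for a small number of pairs (for example, 10 folds give 1024 patterns); for larger n, a random sample of sign patterns or a different paired test would be needed.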

    At the end, as a group, you will write a report (maximum of 5 pages). Give the details of the methodology you follow and present your experimental results. The content of your report as well as its format, structure, and writing style will affect your grade. As with the presentation, do not copy and paste any text/equation/table from a paper (if necessary, type them). If you need to use a figure (or an image) from a paper, you may take it but give credit to this paper.

    Deadlines for the survey/presentation/project

    You will lose points if you miss these deadlines.

  • Oct 5: Your group preference, if any
  • Oct 19: Topic selection for the survey (as a group)
  • Oct 26: Term project selection (as a group)
  • Nov 7-15: Midterm
  • Nov 19: Final report for the survey
  • Nov 23 - Dec 21: Presentations (only online-only lecture hours)
  • Jan 3: Final report for the project

    Academic integrity

    This course follows the Bilkent University Code of Academic Integrity, as explained in the Student Disciplinary Rules and Regulation. Violations of the rules will not be tolerated.