
GE461: Introduction to Data Science - Spring 2022

Introduction to data science fundamentals, techniques and applications; data collection, preparation, storage and querying; parametric models for data; models and methods for fitting, analysis, evaluation, and validation; dimensionality reduction and visualization; various learning methods, classifiers, clustering, and data and text mining; applications in diverse domains such as business, medicine, social networks, and computer vision; broad knowledge of the topics and hands-on experience through projects and computer assignments. STARS Syllabus

Prerequisites: (CS 101 or CS 114 or CS 115) and (MATH 230 or MATH 255 or MATH 260) and (MATH 225 or MATH 241 or MATH 220)
Credits: 3

Course Management System: Moodle
Course Website:

Instructor Team

  • S. Aksoy, C. Alkan, S. Arashloo, F. Can, E. Çiçek, T. Çukur, S. Dayanık, H. Dibeklioğlu, A. Dündar, İ. Körpeoğlu, C. Tekin, E. Tüzün
  • Course Coordinator (contact point): S. Aksoy (saksoy AT


  • Mohsen Moradi (
  • Osama Zafar (

Classroom and Hours

  • Classroom: B-204
  • Class hours:
    • Tue 10:30-12:20
    • Thu 15:30-17:20

Grading Policy

  • Final: 40 %
  • Projects: 60 %. Multiple computer/programming/exercise assignments of various sizes.
  • There will be 5 projects. Each project is 12 %.


  • Attendance is mandatory. A student who misses more than 9 hours will fail the course automatically.


  • The final exam will be held from 9:00 to 12:00 on May 22, 2022, in EA-Z01 (last names ABDUL to GÖÇMEN) and EA-Z03 (last names GÖZÜBÜYÜK to YÜRÜTEN).


  • A project may be assigned earlier than the date indicated in the weekly plan.
  • Projects may be individual or group-based; the instructors will decide.
  • Projects will be uploaded to Moodle.
  • Programming languages like Python, Java, R or Matlab can be used in the projects.
  • Gaining hands-on experience and experimenting will be important. Real-world data sets can be used (economic/financial, medical/biological, image/video, social network, IT data sets, etc.).


  • Grades will be posted in SAPS.
  • There is no mandatory textbook for the course.

Week 1 (Feb 1, Feb 3)

Introduction; what is data science; data science applications. [Çiçek, Tüzün]
Topic Details: Introductory concepts in data science and applications. Overview of data science process.
Slides and Additional Material: ge_461_-_lecture_1_-_course_information_spring_2022.pdf
Topic Details: Software engineering applications.
Slides and Additional Material: ge461_lecture_2_datascienceinsoftwareengineering.pdf
Project/Exercise-Problem-Set/Homework: None this week.
Events: Classes begin (Jan 31).

Week 2 (Feb 8, Feb 10)

Data science applications; data science pipeline. [Alkan, Dibeklioğlu]
Topic Details: Genomics applications.
Slides and Additional Material:
Topic Details: Computer vision applications.
Slides and Additional Material: ge_461_-_lecture_4_-_computer_vision_applications_-_spring_2022.pdf
Project/Exercise-Problem-Set/Homework: None this week.
References: "Big Data: Astronomical or Genomical?", Stephens et al., 2015

Week 3 (Feb 15, Feb 17)

Data representation; preprocessing; preparation; crowdsourcing. [Arashloo, Çiçek]
Topic Details: Normalization, Noise Removal (Filtering), Anomaly Detection, Data Compression, Noise Removal (ICA).
Slides and Additional Material: preprocessing.pdf
Topic Details: Crowdsourcing applications and usage in data science.
Slides and Additional Material: ge_461_-_lecture_6_-_crowdsourcing.pdf
Project/Exercise-Problem-Set/Homework: None this week.
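As a minimal illustration of the normalization topic listed for Week 3, the sketch below implements two common schemes in plain Python: min-max scaling to [0, 1] and z-score standardization. The function names are hypothetical and not taken from the lecture slides, and the choice of the population standard deviation (rather than the sample one) is an assumption of this sketch.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: map every value to 0.0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    data = [2.0, 4.0, 6.0, 8.0]
    print(min_max_scale(data))
    print(z_score(data))
```

Min-max scaling preserves the shape of the distribution but is sensitive to outliers (a single extreme value compresses all others), while z-scores are centered at 0 and are not bounded to a fixed range.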

Week 4 (Feb 22, Feb 24)

Data collection; storage; querying; SQL, NoSQL; cloud; distributed storage and computing. [Körpeoğlu]
Topic Details: RDBMSs, SQL; SQLite, Pandas; NoSQL; MapReduce and Hadoop; Spark.
Slides and Additional Material: Slides
References: SQLite, Pandas, Apache Spark, Spark
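The SQL part of Week 4 can be tried without a database server: Python's standard-library sqlite3 module runs SQLite entirely in memory. The sketch below creates a hypothetical sales table (the table and column names are invented for illustration), inserts rows with a parameterized query, and aggregates them with GROUP BY. If pandas is installed, pandas.read_sql_query(query, conn) would load the same result into a DataFrame instead.

```python
import sqlite3


def total_sales_by_region(rows):
    """Load (region, amount) rows into an in-memory SQLite table and aggregate them."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        # Parameterized inserts avoid building SQL strings by hand.
        conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", rows)
        query = (
            "SELECT region, SUM(amount) AS total "
            "FROM sales GROUP BY region ORDER BY region"
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    data = [("east", 10.0), ("west", 5.0), ("east", 2.5)]
    print(total_sales_by_region(data))  # [('east', 12.5), ('west', 5.0)]
```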

Week 5 (Mar 1, Mar 3)

Basic models; parametric models; fitting. [S. Dayanık]
Topic Details: Exploratory data analysis, loess smoother, chi-squared test of independence, linear regression and the least squares method, factors and dummy variables, all illustrated on the Dodgers Advertising and Promotion case study using R, RStudio, and SQLite.
Slides and Additional Material:
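The course illustrates linear regression in R on the Dodgers case study; as a language-neutral sketch, the Python function below fits y = a + b·x by ordinary least squares using the closed-form solution b = Sxy / Sxx and a = ȳ − b·x̄. When x is a 0/1 dummy variable and the only predictor (for example, a hypothetical indicator of whether a promotion was run), the fitted intercept is the mean of the x = 0 group and the slope is the difference between the two group means. The data in the usage example are made up.

```python
def least_squares_fit(x, y):
    """Return (intercept, slope) of the ordinary least-squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Centered cross-product and sum of squares.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x has zero variance; slope is undefined")
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


if __name__ == "__main__":
    # Hypothetical data: promo = 1 if a promotion was run; attendance in thousands.
    promo = [0, 0, 1, 1]
    attendance = [10.0, 12.0, 20.0, 22.0]
    print(least_squares_fit(promo, attendance))  # (11.0, 10.0)
```

Here 11.0 is the mean attendance without the promotion and 10.0 is the estimated increase associated with it, which matches how the coefficient of a two-level factor is read in a regression with no other predictors.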


Week 6 (Mar 8, Mar 10)

Spring Break
Topic Details:
Events: Spring Break (March 10-13)

Week 7 (Mar 15, Mar 17)

Application to customer choice problems (conjoint analysis) [S. Dayanık]
Topic Details: Part-worths and part importances, and their estimation from product rankings with multiple regression.
Slides and Additional Material:

Project/Exercise-Problem-Set/Homework: Dodgers Promotion Project, due 19:00 on Saturday, April 9, to be submitted on the Moodle page. Project details are in the dodgers.Rmd/dodgers.html files in the Week 7 course materials.

Week 8 (Mar 22, Mar 24)

Conjoint analysis (continued) and the authorship problem [S. Dayanık]
Topic Details: New product design with market simulation to increase overall market share; who wrote the Federalist Papers (identification of authorship by means of Bayesian classifiers and kNN).
Slides and Additional Material:
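The authorship problem of Week 8 uses, among other methods, k-nearest-neighbour (kNN) classification. A minimal kNN classifier with Euclidean distance and majority vote is sketched below. The two-dimensional feature vectors (imagined as usage rates of two function words) and the author labels "A" and "B" are hypothetical, and the lectures may use different features, distance measures, or tie-breaking rules.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    # On equal vote counts, most_common keeps first-encountered order,
    # so the label whose nearest member is closest to the query wins.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical features, e.g. rates of two function words per document.
    train_x = [(0.10, 0.02), (0.12, 0.03), (0.02, 0.09), (0.03, 0.11)]
    train_y = ["A", "A", "B", "B"]
    print(knn_predict(train_x, train_y, (0.11, 0.025), k=3))  # A
```

An odd k avoids ties in two-class problems; features are usually normalized first so that no single feature dominates the distance.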


Week 9 (Mar 29, Mar 31)

Dimensionality reduction; visualization. [Aksoy]
Topic Details: Feature reduction, feature selection, high-dimensional data visualization.
Slides and Additional Material: Dimensionality slides, t-SNE slides
Project/Exercise-Problem-Set/Homework: Project (data) (due 23:59 on April 21, 2022)
References: Matlab: dimensionality reduction, Scikit-learn: decomposition, Scikit-learn: decomposition examples, Scikit-learn: manifold learning, t-SNE
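Principal component analysis (PCA) is a standard feature-reduction method of the kind covered in Week 9. The sketch below finds the first principal component in plain Python by centering the data, forming the sample covariance matrix, and running power iteration; it is an illustration only, and in practice a library routine such as the scikit-learn decomposition module listed in the references would be used. The fixed starting vector and the iteration count are assumptions of this sketch.

```python
import math


def first_principal_component(points, iterations=200):
    """Return (eigenvalue, unit eigenvector) of the top principal component."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in points]
    # Sample covariance matrix (divides by n - 1).
    cov = [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly multiply by the covariance matrix and renormalize.
    # Assumed start vector; it fails only if orthogonal to the top eigenvector.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v: the variance of the data along v.
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return eigenvalue, v


def project(points, direction):
    """Project centered points onto a unit direction, giving 1-D PCA scores."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    return [sum((p[j] - means[j]) * direction[j] for j in range(d)) for p in points]


if __name__ == "__main__":
    pts = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.0)]
    lam, v = first_principal_component(pts)
    print(lam, v, project(pts, v))
```

For points lying close to the line y = x, the component comes out close to (0.71, 0.71), and projecting onto it reduces each 2-D point to a single score while keeping most of the variance.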

Week 10 (Apr 5, Apr 7)

Unsupervised learning, clustering. [Aksoy]
Topic Details: K-means clustering, mixture models, hierarchical clustering.
Slides and Additional Material: Clustering slides
References: Matlab: cluster analysis, Scikit-learn: clustering
Events: Bilkent Day (April 3)
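K-means clustering, the first Week 10 topic, alternates an assignment step (each point joins its nearest centroid) with an update step (each centroid moves to the mean of its points) until the assignments stop changing. The plain-Python sketch below follows this scheme (Lloyd's algorithm); initializing with the first k points and keeping a centroid unchanged when its cluster becomes empty are simplifying assumptions, whereas library implementations typically use smarter initialization such as k-means++.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; returns (centroids, labels). Initial centroids = first k points."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]  # assumed initialization
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, labels


if __name__ == "__main__":
    pts = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
    print(kmeans(pts, 2))
```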

Week 11 (Apr 12, Apr 14)

Machine learning; supervised learning; classifiers; deep learning. [Dündar]
Topic Details: Bayesian decision theory, linear discriminants, introduction to neural networks, support vector machines, decision trees.
Slides and Additional Material: Part1, Part2
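Bayesian decision theory, the first Week 11 topic, leads to the minimum-error-rate rule: choose the class c that maximizes the posterior P(c | x), which is proportional to p(x | c)·P(c). The sketch below applies this rule with one-dimensional Gaussian class-conditional densities; the class names, priors, means, and standard deviations in the usage example are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def bayes_decide(x, classes):
    """Minimum-error-rate decision: pick the class maximizing p(x | c) * P(c).

    classes maps a label to (prior, mean, std) of a Gaussian class-conditional density.
    Returns (chosen label, dict of posterior probabilities).
    """
    scores = {
        label: prior * gaussian_pdf(x, mean, std)
        for label, (prior, mean, std) in classes.items()
    }
    total = sum(scores.values())  # evidence p(x); normalizes scores to posteriors
    posteriors = {label: s / total for label, s in scores.items()}
    return max(posteriors, key=posteriors.get), posteriors


if __name__ == "__main__":
    equal = {"w1": (0.5, 0.0, 1.0), "w2": (0.5, 4.0, 1.0)}
    skewed = {"w1": (0.9, 0.0, 1.0), "w2": (0.1, 4.0, 1.0)}
    print(bayes_decide(2.2, equal)[0])   # w2
    print(bayes_decide(2.2, skewed)[0])  # w1
```

With equal priors and equal variances the decision boundary lies midway between the means (x = 2 here), so x = 2.2 goes to w2; raising the prior of w1 shifts the boundary toward w2 and the same point is assigned to w1.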

Week 12 (Apr 19, Apr 21)

Machine learning; supervised learning; classifiers; deep learning. [Dibeklioğlu]
Topic Details: Activation functions, convolutional neural networks, recurrent architectures.
Slides and Additional Material: ge461_deep_learning_2022s.pdf
Project/Exercise-Problem-Set/Homework: Project Description | Data (due 23:55 on April 30, 2022)

Week 13 (Apr 26, Apr 28)

Machine learning in healthcare. [Çukur]
Topic Details: Healthcare analytics: diagnostics, medical imaging, in-patient care, hospital management, risk analytics, wearables. Deep learning architectures for medical applications.
Slides and Additional Material: ge461_ml_in_healthcare.pdf
Project/Exercise-Problem-Set/Homework: ge461_pw13_description.pdf (due May 6, 2022)
References: Hastie, Tibshirani and Friedman, The Elements of Statistical Learning, Ch. 11 and 14; Mead, Analog VLSI and Neural Systems, Ch. 4; Bishop, Pattern Recognition and Machine Learning, Ch. 5
Events: Spring Festival (Apr 29-30)

Week 14 (May 3, May 5)

Data mining; online data stream classification; applications. [Can]
Topic Details: Concept drift, ensemble-based classification, text mining.
Slides and Additional Material: DataStreamMining
Events: Feast of Ramadan holiday (May 2-4)


Week 15 (May 10, May 12)

Reinforcement learning; applications. [Tekin]
Topic Details: Applications of reinforcement learning, Markov decision processes, value iteration, Q-learning.
Slides and Additional Material: ge461_reinforcementlearning.pdf
Events: Last day of classes (May 13)
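Value iteration, listed among the Week 15 topics, repeatedly applies the Bellman optimality update V(s) ← max_a Σ_{s'} P(s' | s, a) [R(s, a, s') + γ V(s')] until the values stop changing, and then reads off a greedy policy. The sketch below runs it on a hypothetical four-state corridor MDP (states 0 to 3, actions "left" and "right", reward 1 for entering the terminal state 3, discount γ = 0.9), which is made up for illustration and not taken from the lecture slides.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-10):
    """Compute optimal state values and a greedy policy for a finite MDP.

    transitions[s][a] is a list of (probability, next_state, reward) triples;
    a state with no actions is treated as terminal with value 0.
    """
    values = {s: 0.0 for s in transitions}

    def q(s, a):
        # Expected one-step return of taking action a in state s.
        return sum(p * (r + gamma * values[s2]) for p, s2, r in transitions[s][a])

    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:
                continue  # terminal state keeps value 0
            best = max(q(s, a) for a in actions)
            delta = max(delta, abs(best - values[s]))
            values[s] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            break
    policy = {
        s: max(actions, key=lambda a: q(s, a))
        for s, actions in transitions.items()
        if actions
    }
    return values, policy


def corridor_mdp():
    """Hypothetical 4-state corridor: moving right from state 2 reaches goal state 3."""
    mdp = {}
    for s in range(3):
        left, right = max(s - 1, 0), s + 1
        mdp[s] = {
            "left": [(1.0, left, 0.0)],
            "right": [(1.0, right, 1.0 if right == 3 else 0.0)],
        }
    mdp[3] = {}  # terminal state
    return mdp


if __name__ == "__main__":
    print(value_iteration(corridor_mdp()))
```

The converged values are 1, 0.9, and 0.81 for states 2, 1, and 0: each step away from the goal discounts the single reward by another factor of γ. Q-learning estimates the same optimal action values from sampled transitions, without requiring the transition table.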


Similar / Complementary Courses

Tools, Libraries, Systems, Languages

start.txt · Last modified: 2022/05/18 21:07 by ge461