GE461: Introduction to Data Science - Spring 2023

Introduction to data science fundamentals, techniques and applications; data collection, preparation, storage and querying; parametric models for data; models and methods for fitting, analysis, evaluation, and validation; dimensionality reduction, visualization; various learning methods, classifiers, clustering, data and text mining; applications in diverse domains such as business, medicine, social networks, computer vision; breadth knowledge on topics and hands-on experience through projects and computer assignments. STARS Syllabus

Prerequisites: (CS 101 or CS 114 or CS 115) and (MATH 230 or MATH 255 or MATH 260) and (MATH 225 or MATH 241 or MATH 220)
Credits: 3

Course Management Systems: Moodle
Course Website: http://www.cs.bilkent.edu.tr/~ge461/2023Spring

Instructor Team

  • S. Aksoy, C. Alkan, S. Arashloo, F. Can, T. Çukur, S. Dayanık, H. Dibeklioğlu, A. Dündar, İ. Körpeoğlu, C. Tekin, E. Tüzün
  • Course Coordinator (contact point): S. Aksoy (saksoy AT cs.bilkent.edu.tr)

TAs

  • Hakan Gökçesu (hgokcesu AT ee.bilkent.edu.tr)
  • Sayyed Ahmad Naghavi Nozad (ahmad.naghavi AT bilkent.edu.tr)

Classroom and Hours

  • Classroom: B-Z06
  • Class hours:
    • Mon 08:30-10:20
    • Wed 13:30-15:20

Grading Policy

  • Final: 40 %
  • Projects: 60 %. Multiple computer/programming/exercise assignments of various sizes.
  • There will be 5 projects. Each project is 12 %.
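
As a rough illustration of how these weights combine, the sketch below computes an overall score from one final exam score and five project scores. The function name `course_grade`, the 0-100 score scale, and the sample scores are assumptions made for this example; the policy above only states the weights.

```python
# Weights taken from the grading policy above.
FINAL_WEIGHT = 0.40    # Final: 40 %
PROJECT_WEIGHT = 0.12  # each of the 5 projects: 12 %
NUM_PROJECTS = 5


def course_grade(final_score, project_scores):
    """Return the weighted overall score (assumed 0-100 scale)."""
    if len(project_scores) != NUM_PROJECTS:
        raise ValueError(f"expected {NUM_PROJECTS} project scores")
    return FINAL_WEIGHT * final_score + PROJECT_WEIGHT * sum(project_scores)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    print(round(course_grade(70, [80, 90, 75, 85, 95]), 2))
```

The five project weights sum to 60 %, so together with the final exam they account for the full 100 %.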

Attendance

  • Attendance is mandatory. A student who misses more than 9 hours will fail the course automatically.

Exam

  • The final exam will be held in EB-103 (for last names in the range AKSOY-GÜZEY) and EB-104 (for last names in the range HAMURCU-YILDIZ) from 18:00 to 21:00 on June 10, 2023.

Projects

  • Multiple computer/programming/exercise assignments of various sizes.
  • A project can be assigned earlier than the indicated date on the weekly plan.
  • Projects can be individual or group based. Instructors will decide.
  • Projects will be uploaded to Moodle.
  • Programming languages like Python, Java, R or Matlab can be used in the projects.
  • Gaining hands-on experience and experimenting will be important. Real-world data sets can be used (economic/financial data sets, medical/biological data sets, image/video data sets, social network data sets, IT data sets, etc.).

Other

  • Grades will be posted in SAPS.
  • There is no mandatory textbook for the course.

Week 1 (Jan 30, Feb 1)

Introduction; what is data science; data science applications. [Aksoy, Tüzün]
Topic Details: Introductory concepts in data science and applications. Overview of data science process.
Slides and Additional Material: ge461_lecture1_course_information.pdf
Topic Details: Software engineering applications.
Slides and Additional Material:
Project/Exercise-Problem-Set/Homework: None this week.
References:
Events: Classes begin (Jan 30).

Week 2 (Feb 6, Feb 8)

Data science applications; data science pipeline. [Alkan, Dibeklioğlu]
Topic Details: Genomics applications.
Slides and Additional Material: ge461_lectures_3_genomics_applications-spring2023.pdf
Topic Details: Computer vision applications.
Slides and Additional Material: ge461_applications_vision_2023s.pdf
Project/Exercise-Problem-Set/Homework: None this week.
References: "Big Data: Astronomical or Genomical?", Stephens et al., 2015
Events:

Week 3 (Feb 20, Feb 22)

Crowdsourcing; data representation; preprocessing; preparation. [Arashloo]
Topic Details: Crowdsourcing applications and usage in data science.
Slides and Additional Material: crowdsourcing.pdf
Topic Details: Normalization, Noise Removal (Filtering), Anomaly Detection, Data Compression, Noise Removal (ICA).
Slides and Additional Material: preprocessing.pdf
Project/Exercise-Problem-Set/Homework: None this week.
Events:

Week 4 (Feb 27, Mar 1)

Data collection; storage; querying; SQL, NoSQL; cloud; distributed storage and computing. [Körpeoğlu]
Topic Details: RDBMS, SQL; SQLite, Pandas; NoSQL; MapReduce and Hadoop; Spark.
Slides and Additional Material: data-storage-and-processing.pdf
Project/Exercise-Problem-Set/Homework:
References: SQLite, Pandas, Apache Spark, Spark
Events:

Week 5 (Mar 6, Mar 8)

Spring Break
Topic Details:
Slides:
Project/Exercise-Problem-Set/Homework:
References:
Events: Spring Break (Mar 6-8)

Week 6 (Mar 13, Mar 15)

Basic models; parametric models; fitting. [S. Dayanık]
Topic Details: Exploratory data analysis, loess smoother, chi-squared test of independence.
Slides and Additional Material: s2023_week06.zip
Project/Exercise-Problem-Set/Homework:
References:
Events:

Week 7 (Mar 20, Mar 22)

Linear regression; goodness of fit. [S. Dayanık]
Topic Details: Linear regression and least squares method, factors and dummy variables, analysis of variance.
Slides and Additional Material: s2023_week07.zip
Project/Exercise-Problem-Set/Homework:
References:
Events:

Week 8 (Mar 27, Mar 29)

Diagnostic plots; nested and non-nested model comparisons. [S. Dayanık]
Topic Details: Hypothesis testing, confidence intervals, prediction intervals.
Slides and Additional Material: s2023_week08.zip
Project: Complete Analysis of Dodgers Advertising and Promotion Study, due 19:00 on Sunday, April 23. Details are in dodgers.html in the zip file.
References:
Events:

Week 9 (Apr 3, Apr 5)

Dimensionality reduction; visualization. [Aksoy]
Topic Details: Feature reduction, feature selection, high-dimensional data visualization.
Slides and Additional Material: Dimensionality slides, t-SNE slides
Project/Exercise-Problem-Set/Homework: [Project (data)] (due 23:59 on May 7, 2023)
References: Matlab: dimensionality reduction, Scikit-learn: decomposition, Scikit-learn: decomposition examples, Scikit-learn: manifold learning, Matlab: data visualization, Matplotlib: data visualization, t-SNE
Events: Bilkent Day (April 3)

Week 10 (Apr 10, Apr 12)

Midterm Week

Week 11 (Apr 17, Apr 19)

Unsupervised learning, clustering. [Aksoy]
Topic Details: K-means clustering, mixture models, hierarchical clustering.
Slides and Additional Material: Clustering slides
Project/Exercise-Problem-Set/Homework:
References: Matlab: cluster analysis, Scikit-learn: clustering, Scikit-learn: clustering examples
Events: Feast of Ramadan holiday (Apr 21-23), National Sovereignty and Children's Day holiday (Apr 23)

Week 12 (Apr 24, Apr 26)

Machine learning; supervised learning; classifiers; deep learning. [Dündar]
Topic Details: Bayesian decision theory, linear discriminants, introduction to neural networks, support vector machines, decision trees.
Slides and Additional Material: ge461_supervisedlearning_part1.pdf
Project/Exercise-Problem-Set/Homework:
References:
Events:

Week 13 (May 1, May 3)

Machine learning; supervised learning; classifiers; deep learning. [Dündar]
Topic Details: Bayesian decision theory, linear discriminants, introduction to neural networks, support vector machines, decision trees.
Slides and Additional Material: ge461_supervisedlearning_part2.pdf
Project/Exercise-Problem-Set/Homework:
References:
Events: Labor and Solidarity Day holiday (May 1)

Week 14 (May 8, May 10)

Machine learning; supervised learning; classifiers; deep learning. [Dibeklioğlu]
Topic Details: Activation functions, convolutional neural networks, recurrent architectures.
Slides and Additional Material: ge461_deep_learning_2023s.pdf
Project/Exercise-Problem-Set/Homework: [Project Description | Data] (due 23:55 on May 22, 2023)
References:
Events:

Week 15 (May 15, May 17)

Machine learning in healthcare. [Çukur]
Topic Details: Healthcare analytics: diagnostics, medical imaging, in-patient care, hospital management, risk analytics, wearables. Deep learning architectures for medical applications.
Slides and Additional Material: ge461_ml_in_healthcare.pdf
Project: ge461_pw13_description.pdf ge461_pw13_data.zip
References: Hastie, Tibshirani and Friedman, The Elements of Statistical Learning, Ch. 11 and 14; Mead, Analog VLSI and Neural Systems, Ch. 4; Bishop, Pattern Recognition and Machine Learning, Ch. 5
Events:

Week 16 (May 22, May 24)

Data mining; online data stream classification; applications. [Can]
Topic Details: Concept drift, ensemble-based classification, text mining.
Slides and Additional Material: ge461_datastreamminingspring23.pdf ge461_datastreamhwspringver2_2023.pdf
Project/Exercise-Problem-Set/Homework: ge461_datastreamhwspringver1_2023_2.pdf
References:
Events:

Week 17 (May 29, May 31)

Reinforcement learning; applications. [Tekin]
Topic Details: Applications of reinforcement learning, Markov decision processes, value iteration, Q-learning, multi-armed bandits.
Slides and Additional Material: https://www.dropbox.com/s/65h9melvnvuml2x/ge461_reinforcementlearning.pdf?dl=0
Project/Exercise-Problem-Set/Homework:
References:
Events:


Textbooks

Similar / Complementary Courses

Tools, Libraries, Systems, Languages

start.txt · Last modified: 2023/06/10 07:20 by ge461