
Introduction to data science fundamentals, techniques and applications; data collection, preparation, storage and querying; parametric models for data; models and methods for fitting, analysis, evaluation, and validation; dimensionality reduction, visualization; various learning methods, classifiers, clustering, data and text mining; applications in diverse domains such as business, medicine, social networks, computer vision; breadth of knowledge on topics and hands-on experience through projects and computer assignments.

STARS Syllabus

**Prerequisites**: (CS 101 or CS 114 or CS 115) and (MATH 230 or MATH 255 or MATH 260) and (MATH 225 or MATH 241 or MATH 220)

**Credits**: 3

**Course Management Systems:** Moodle

**Course Website:** http://www.cs.bilkent.edu.tr/~ge461/2023Spring

**Instructor Team**

- S. Aksoy, C. Alkan, S. Arashloo, F. Can, T. Çukur, S. Dayanık, H. Dibeklioğlu, A. Dündar, İ. Körpeoğlu, C. Tekin, E. Tüzün

- Course Coordinator (contact point): S. Aksoy (saksoy AT cs.bilkent.edu.tr)

**TAs**

- Hakan Gökçesu (hgokcesu AT ee.bilkent.edu.tr)
- Sayyed Ahmad Naghavi Nozad (ahmad.naghavi AT bilkent.edu.tr)

**Classroom and Hours**

- Classroom: **B-Z06**
- Class hours:
  - Mon 08:30-10:20
  - Wed 13:30-15:20

**Grading Policy**

- Final: 40 %
- Projects: 60 %. Multiple computer/programming/exercise assignments of various sizes.
- There will be 5 projects. **Each project is 12 %**.

**Attendance**

~~Attendance is mandatory. A student who misses~~ **more than 9 hours** will fail the course automatically.

**Exam**

- The final exam will be held in EB-103 (for last names in the range AKSOY-GÜZEY) and EB-104 (for last names in the range HAMURCU-YILDIZ) from 18:00 to 21:00 on June 10, 2023.

**Projects**

- Multiple computer/programming/exercise assignments of various sizes.
- A project can be assigned earlier than the indicated date on the weekly plan.
- Projects can be individual or group based. Instructors will decide.
- Projects will be uploaded to Moodle.
- Programming languages like Python, Java, R or Matlab can be used in the projects.
- Gaining hands-on experience and experimenting will be important. Real-world data sets can be used (economic/financial data sets, medical/biological data sets, image/video data sets, social network data sets, IT data sets, etc.).

**Other**

- Grades will be posted in SAPS.
- There is **no mandatory textbook** for the course.

**Introduction; what is data science; data science applications.** [Aksoy, Tüzün]

Topic Details: Introductory concepts in data science and applications. Overview of data science process.

Slides and Additional Material: ge461_lecture1_course_information.pdf

Topic Details: Software engineering applications.

Slides and Additional Material:

Project/Exercise-Problem-Set/Homework: None this week.

References:

Events: Classes begin (Jan 30).

**Data science applications; data science pipeline.** [Alkan, Dibeklioğlu]

Topic Details: Genomics applications.

Slides and Additional Material: ge461_lectures_3_genomics_applications-spring2023.pdf

Topic Details: Computer vision applications.

Slides and Additional Material: ge461_applications_vision_2023s.pdf

Project/Exercise-Problem-Set/Homework: None this week.

References: "Big Data: Astronomical or Genomical?", Stephens et al., 2015

Events:

**Crowdsourcing; data representation; preprocessing; preparation.** [Arashloo]

Topic Details: Crowdsourcing applications and usage in data science.

Topic Details: Normalization, Noise Removal (Filtering), Anomaly Detection, Data Compression, Noise Removal (ICA).

Slides and Additional Material: crowdsourcing.pdf

Slides and Additional Material: preprocessing.pdf

Project/Exercise-Problem-Set/Homework: None this week.

Events:

**Data collection; storage; querying; SQL, NoSQL; cloud; distributed storage and computing.** [Körpeoğlu]

Topic Details: RDBMSs, SQL; SQLite, Pandas; NoSQL; MapReduce and Hadoop; Spark.

Slides and Additional Material: data-storage-and-processing.pdf

Project/Exercise-Problem-Set/Homework:

References: SQLite, Pandas, Apache Spark, Spark

Events:

**Spring Break**

Topic Details:

Slides:

Project/Exercise-Problem-Set/Homework:

References:

Events: Spring Break (Mar 6-8)

**Basic models; parametric models; fitting.** [S. Dayanık]

Topic Details: Exploratory data analysis, loess smoother, chi-squared test of independence

Slides and Additional Material: s2023_week06.zip

Project/Exercise-Problem-Set/Homework:

References:

Events:

**Linear regression, goodness of fit.** [S. Dayanık]

Topic Details: linear regression and least squares method, factors and dummy variables, analysis of variance

Slides and Additional Material: s2023_week07.zip

Project/Exercise-Problem-Set/Homework:

References:

Events:

**Diagnostic plots, nested and unnested model comparisons.** [S. Dayanık]

Topic Details: Hypothesis testing, confidence intervals, prediction intervals

Slides and Additional Material: s2023_week08.zip

Project: **Complete Analysis of Dodgers Advertising and Promotion Study**, due 19:00 on Sunday, April 23. Details are in dodgers.html in the zip file.

References:

Events:

**Dimensionality reduction; visualization.** [Aksoy]

Topic Details: Feature reduction, feature selection, high-dimensional data visualization.

Slides and Additional Material: Dimensionality slides, t-SNE slides

Project/Exercise-Problem-Set/Homework: [Project (data)] (due 23:59 on May 7, 2023)

References: Matlab: dimensionality reduction, Scikit-learn: decomposition, Scikit-learn: decomposition examples, Scikit-learn: manifold learning, Matlab: data visualization, Matplotlib: data visualization, t-SNE

Events: Bilkent Day (April 3)

**Midterm Week**

**Unsupervised learning, clustering.** [Aksoy]

Topic Details: K-means clustering, mixture models, hierarchical clustering.

Slides and Additional Material: Clustering slides

Project/Exercise-Problem-Set/Homework:

References: Matlab: cluster analysis, Scikit-learn: clustering, Scikit-learn: clustering examples

Events: Feast of Ramadan holiday (Apr 21-23), National Sovereignty and Children's Day holiday (Apr 23)

**Machine learning; supervised learning; classifiers; deep learning.** [Dündar]

Topic Details: Bayesian decision theory, linear discriminants, introduction to neural networks, support vector machines, decision trees.

Slides and Additional Material: ge461_supervisedlearning_part1.pdf

Project/Exercise-Problem-Set/Homework:

References:

Events:

**Machine learning; supervised learning; classifiers; deep learning.** [Dündar]

Topic Details: Bayesian decision theory, linear discriminants, introduction to neural networks, support vector machines, decision trees.

Slides and Additional Material: ge461_supervisedlearning_part2.pdf

Project/Exercise-Problem-Set/Homework:

References:

Events: Labor and Solidarity Day holiday (May 1)

**Machine learning; supervised learning; classifiers; deep learning.** [Dibeklioğlu]

Topic Details: Activation functions, convolutional neural networks, recurrent architectures.

Slides and Additional Material: ge461_deep_learning_2023s.pdf

Project/Exercise-Problem-Set/Homework: [Project Description | Data] (due 23:55 on May 22, 2023)

References:

Events:

**Machine learning in healthcare.** [Çukur]

Topic Details: Healthcare analytics: diagnostics, medical imaging, in-patient care, hospital management, risk analytics, wearables. Deep learning architectures for medical applications.

Slides and Additional Material: ge461_ml_in_healthcare.pdf

Project: ge461_pw13_description.pdf ge461_pw13_data.zip

References: Hastie, Tibshirani and Friedman, The Elements of Statistical Learning, Ch. 11 and 14; Mead, Analog VLSI and Neural Systems, Ch. 4; Bishop, Pattern Recognition and Machine Learning, Ch. 5

Events:

**Data mining; online data stream classification; applications.** [Can]

Topic Details: Concept drift, ensemble-based classification, text mining.

Slides and Additional Material: ge461_datastreamminingspring23.pdf ge461_datastreamhwspringver2_2023.pdf

Project/Exercise-Problem-Set/Homework: ge461_datastreamhwspringver1_2023_2.pdf

References:

Events:

**Reinforcement learning; applications.** [Tekin]

Topic Details: Applications of reinforcement learning, Markov decision processes, value iteration, Q-learning, multi-armed bandits.

Slides and Additional Material: https://www.dropbox.com/s/65h9melvnvuml2x/ge461_reinforcementlearning.pdf?dl=0

Project/Exercise-Problem-Set/Homework:

References:

Events:

start.txt · Last modified: 2023/06/10 07:20 by ge461