
Introduction to data science fundamentals, techniques and applications; data collection, preparation, storage and querying; parametric models for data; models and methods for fitting, analysis, evaluation, and validation; dimensionality reduction, visualization; various learning methods, classifiers, clustering, data and text mining; applications in diverse domains such as business, medicine, social networks, computer vision; breadth of knowledge across topics and hands-on experience through projects and computer assignments. STARS Syllabus

**Prerequisites**: (CS 101 or CS 114 or CS 115) and (MATH 230 or MATH 255 or MATH 260) and (MATH 225 or MATH 241 or MATH 220)

**Credits**: 3

**Course Management System:** Moodle

**Course Website:** http://www.cs.bilkent.edu.tr/~ge461/2024Spring

**Instructor Team**

- S. Aksoy, C. Alkan, S. Arashloo, F. Can, E. Çiçek, T. Çukur, S. Dayanık, H. Dibeklioğlu, A. Dündar, İ. Körpeoğlu, C. Tekin, E. Tüzün

- Course Coordinator (contact point): S. Aksoy (saksoy AT cs.bilkent.edu.tr)

**TAs**

- Ali Azak (ali.azak AT bilkent.edu.tr)
- Hakan Gökçesu (hgokcesu AT ee.bilkent.edu.tr)

**Classroom and Hours**

- Classroom: **EE-317**
- Class hours:
  - Mon 13:30-15:20
  - Thu 08:30-10:20

**Grading Policy**

- Final: 40%
- Projects: 60%. Multiple computer/programming/exercise assignments of various sizes.
- There will be 5 projects. **Each project is 12%**.

**Attendance**

- Attendance is mandatory. A student who misses **more than 9 hours** will fail the course automatically.

**Exam**

- The final exam will be held in EA-Z01 (for last names in the range AAMIR-KOŞAY) and EA-Z03 (for last names in the range OĞUZTÜZÜN-YÜZLÜ) from 18:00 to 20:00 on May 23, 2024.

**Projects**

- Multiple computer/programming/exercise assignments of various sizes.
- A project can be assigned earlier than the indicated date on the weekly plan.
- Projects can be individual or group based. Instructors will decide.
- Projects will be uploaded to Moodle.
- Programming languages such as Python, Java, R, or Matlab can be used in the projects.
- Gaining hands-on experience and experimenting will be important. Real-world data sets can be used (economic/financial data sets, medical/biological data sets, image/video data sets, social network data sets, IT data sets, etc.).

**Other**

- Grades will be posted in SAPS.
- There is **no mandatory textbook** for the course.

**Introduction; what is data science; data science applications.** [Çiçek, Tüzün]

Topic Details: Introductory concepts in data science and applications. Overview of data science process.

Slides and Additional Material:

Topic Details: Software engineering applications.

Slides and Additional Material: ge_461_-_lecture_1_-_course_information_compressed.pdf

Project/Exercise-Problem-Set/Homework: None this week.

References:

Events:

**Data science applications; data science pipeline.** [Alkan, Dibeklioğlu]

Topic Details: Genomics applications.

Slides and Additional Material:

Topic Details: Computer vision applications.

Slides and Additional Material: ge461_applications_vision_2024s.pdf

Project/Exercise-Problem-Set/Homework: None this week.

References: "Big Data: Astronomical or Genomical?", Stephens et al., 2015

Events:

**Data representation; preprocessing; preparation; crowdsourcing.** [Arashloo, Çiçek]

Topic Details: Normalization, Noise Removal (Filtering), Anomaly Detection, Data Compression, Noise Removal (ICA).

Slides and Additional Material: 2024_pre-processing.pdf

Topic Details: Crowdsourcing applications and usage in data science.

Slides and Additional Material: ge461-crowdsourcing.pdf

Project/Exercise-Problem-Set/Homework: None this week.

Events:

**Data collection; storage; querying; SQL, NoSQL; cloud; distributed storage and computing.** [Körpeoğlu]

Topic Details: RDBMSs, SQL; SQLite, Pandas; NoSQL; MapReduce and Hadoop; Spark.
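As a small illustration of the SQL and SQLite part of this topic, the sketch below builds an in-memory SQLite database with Python's standard `sqlite3` module and runs one aggregate query. The table name `enrollment`, its columns, and the rows are hypothetical and are not taken from the course material.

```python
import sqlite3

# Hypothetical in-memory table; the names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollment (student TEXT, course TEXT, grade REAL)")
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?, ?)",
    [
        ("ayse", "GE 461", 88.0),
        ("ali", "GE 461", 74.0),
        ("ayse", "CS 101", 95.0),
    ],
)


def average_grade_by_course(connection):
    """Return {course: mean grade} computed with a SQL GROUP BY aggregate."""
    rows = connection.execute(
        "SELECT course, AVG(grade) FROM enrollment GROUP BY course ORDER BY course"
    ).fetchall()
    return dict(rows)


averages = average_grade_by_course(conn)
print(averages)
```

The same query could be loaded into a Pandas data frame for further analysis; that step is left out here to keep the sketch dependency-free.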

Slides and Additional Material: Slides

Project/Exercise-Problem-Set/Homework:

References: SQLite, Pandas, MapReduce, Apache Hadoop, Apache Spark

Events:

**Basic models; parametric models; fitting.** [S. Dayanık]

Topic Details: Exploratory data analysis, loess smoother, chi-squared test of independence, linear regression and the least squares method, factors and dummy variables, all illustrated on the *Dodgers Advertising and Promotion* case study with R, RStudio, and SQLite.

Slides and Additional Material: Week05 Dodgers.zip

Project/Exercise-Problem-Set/Homework: **Dodgers Project**

- Include all variables and conduct a full regression analysis of the problem.
- Write a report for Dodgers management. Discuss your findings in plain English and support them with data analysis.
- Submit Rmd and html files in a single zip file to Moodle by **19:00 on Saturday, March 16. Late submissions will not be accepted.**
- Groups of up to three people are fine. Name the zip file with the Bilkent IDs of the group members.

References: Posit (formerly RStudio), R, SQLite, R for Data Science, Modern Data Science with R

- Thomas W. Miller, Modeling Techniques in Predictive Analytics With Python and R: A Guide to Data Science
- Baumer, Daniel, Kaplan and Horton, Modern Data Science with R, Second Edition
- Wickham, Cetinkaya-Rundel, and Grolemund, R for Data Science, Second Edition

Events:

**Application to customer choice problems (conjoint analysis)** [S. Dayanık]

Topic Details: Part worths and part importances, their estimation from product rankings with multiple regression, new product design with market simulation to increase overall market share.

Slides: Conjoint Analysis and Market Simulation

Project/Exercise-Problem-Set/Homework:

References:

- B. K. Orme, Getting Started With Conjoint Analysis: Strategies for Product Design and Pricing Research
- Miller, Marketing Data Science: Modeling Techniques in Predictive Analytics With R and Python

Events: Spring Break (Mar 7-8)

**Authorship problem, text analysis, and topic modeling** [S. Dayanık]

Topic Details: Who wrote the Federalist Papers? Identification of authorship by means of Bayesian classifiers and kNN.

Slides and Additional Material: Federalist Papers Analysis, Latent Dirichlet Allocation Graphical Model

Project/Exercise-Problem-Set/Homework:

References:

- Hamilton, Jay, and Madison, The Federalist Papers
- Mosteller and Wallace, Applied Bayesian and Classical Inference

Events:

**Dimensionality reduction; visualization.** [Aksoy]

Topic Details: Feature reduction, feature selection, high-dimensional data visualization.

Slides and Additional Material: Dimensionality slides, t-SNE slides

Project/Exercise-Problem-Set/Homework: [Project (data)] (due 23:59 on April 7, 2024)

References: Matlab: dimensionality reduction, Scikit-learn: decomposition, Scikit-learn: decomposition examples, Scikit-learn: manifold learning, Matlab: data visualization, Matplotlib: data visualization, t-SNE

Events:

**Unsupervised learning, clustering.** [Aksoy]

Topic Details: K-means clustering, mixture models, hierarchical clustering.
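For readers who want a concrete picture of K-means clustering, the following is a minimal sketch of Lloyd's algorithm in plain Python. It is not taken from the course slides; the toy points and the naive "first k points" initialization are assumptions made for illustration, and library implementations such as Scikit-learn's use better initialization.

```python
def kmeans(points, k, iterations=20):
    """Lloyd's algorithm on a list of (x, y) tuples; returns (centroids, labels)."""
    # Assumption: naive deterministic initialization with the first k points.
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```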

Slides and Additional Material: Clustering slides

Project/Exercise-Problem-Set/Homework:

References: Matlab: cluster analysis, Scikit-learn: clustering, Scikit-learn: clustering examples

Events:

**Machine learning; supervised learning; classifiers; deep learning.** [Dündar]

Topic Details: Bayesian decision theory, linear discriminants, introduction to neural networks, support vector machines, decision trees.

Slides and Additional Material: supervisedlearning_part1, supervisedlearning_part2

Project/Exercise-Problem-Set/Homework:

References:

Events: Bilkent Day (April 3)

**Ramadan Holiday**

**Machine learning; supervised learning; classifiers; deep learning.** [Dibeklioğlu]

Topic Details: Activation functions, convolutional neural networks, recurrent architectures.

Slides and Additional Material: ge461_deep_learning_2024s.pdf

Project/Exercise-Problem-Set/Homework: [Project Description | Data] (due 23:55 on April 27, 2024)

References:

Events:

**Machine learning in healthcare.** [Çukur]

Topic Details: Healthcare analytics: diagnostics, medical imaging, in-patient care, hospital management, risk analytics, wearables. Deep learning architectures for medical applications.

Slides and Additional Material: ge461_ml_in_healthcare.pdf

Project/Exercise-Problem-Set/Homework: ge461_pw13_description.pdf, ge461_pw13_data.zip

References: Hastie, Tibshirani and Friedman, The Elements of Statistical Learning, Ch. 11 and 14; Mead, Analog VLSI and Neural Systems, Ch. 4; Bishop, Pattern Recognition and Machine Learning, Ch. 5

Events: National Sovereignty and Children's Day (Apr 23)

**Data mining; online data stream classification; applications.** [Can]

Topic Details: Concept drift, ensemble-based classification, text mining.

Slides and Additional Material:

Project/Exercise-Problem-Set/Homework:

References:

Events: Labor and Solidarity Day (May 1)

**Reinforcement learning; applications.** [Tekin]

Topic Details: Applications of reinforcement learning, Markov decision processes, value iteration, Q-learning.
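To make value iteration on a Markov decision process concrete, here is a minimal sketch on a hypothetical three-state chain with deterministic transitions. The states, rewards, and discount factor are invented for illustration and do not come from the lecture; the sweep updates values in place, which is one common variant of the algorithm.

```python
GAMMA = 0.9  # assumed discount factor

# Hypothetical MDP: TRANSITIONS[state][action] = (next_state, reward).
# State 2 is terminal and has no actions.
TRANSITIONS = {
    0: {"stay": (0, 0.0), "right": (1, 0.0)},
    1: {"stay": (1, 0.0), "right": (2, 1.0)},
    2: {},
}


def value_iteration(transitions, gamma, tol=1e-9):
    """Apply the Bellman optimality backup until the values stop changing."""
    values = {state: 0.0 for state in transitions}
    while True:
        delta = 0.0
        for state, actions in transitions.items():
            if not actions:  # terminal state keeps value 0
                continue
            best = max(
                reward + gamma * values[nxt] for nxt, reward in actions.values()
            )
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            return values


def greedy_policy(transitions, gamma, values):
    """Pick, in each non-terminal state, the action with the best backup."""
    return {
        state: max(
            actions,
            key=lambda a: actions[a][1] + gamma * values[actions[a][0]],
        )
        for state, actions in transitions.items()
        if actions
    }


values = value_iteration(TRANSITIONS, GAMMA)
policy = greedy_policy(TRANSITIONS, GAMMA, values)
print(values)
print(policy)
```

Q-learning targets the same optimal values but estimates them from sampled transitions instead of a known transition table.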

Slides and Additional Material: ge461_reinforcementlearning.pdf

Project/Exercise-Problem-Set/Homework:

References:

Events:

start.txt · Last modified: 2024/05/21 05:04 by ge461