
Introduction to data science fundamentals, techniques, and applications; data collection, preparation, storage, and querying; parametric models for data; models and methods for fitting, analysis, evaluation, and validation; dimensionality reduction and visualization; various learning methods, classifiers, clustering, and data and text mining; applications in diverse domains such as business, medicine, social networks, and computer vision; broad knowledge of the topics and hands-on experience through projects and computer assignments.

STARS Syllabus

**Prerequisites**: (CS 101 or CS 114 or CS 115) and (MATH 230 or MATH 255 or MATH 260) and (MATH 225 or MATH 241 or MATH 220)

**Credits**: 3

**Course Management Systems:** Moodle

**Course Website:** http://www.cs.bilkent.edu.tr/~ge461/2022Spring

**Instructor Team**

- S. Aksoy, C. Alkan, S. Arashloo, F. Can, E. Çiçek, T. Çukur, S. Dayanık, H. Dibeklioğlu, A. Dündar, İ. Körpeoğlu, C. Tekin, E. Tüzün

- Course Coordinator (contact point): S. Aksoy (saksoy AT cs.bilkent.edu.tr)

**TAs**

- Mohsen Moradi (moradi@ee.bilkent.edu.tr)
- Osama Zafar (osama.zafar@bilkent.edu.tr)

**Classroom and Hours**

- Classroom: **B-204**
- Class hours:
  - Tue 10:30-12:20
  - Thu 15:30-17:20

**Grading Policy**

- Final: 40%
- Projects: 60%. Multiple computer/programming/exercise assignments of various sizes.
- There will be 5 projects. **Each project is 12%**.
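The weights above combine into an overall score as shown below. This sketch assumes each component is scored out of 100. The symbols are illustrative: $F$ is the final exam score and $P_1, \dots, P_5$ are the five project scores. The page does not state how the total maps to letter grades.

$$
\text{Total} = 0.40\,F + 0.12 \sum_{i=1}^{5} P_i, \qquad 5 \times 12\% = 60\% \text{ for projects}
$$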

**Attendance**

- Attendance is mandatory. A student who misses **more than 9 hours** of class will automatically fail the course.

**Exam**

- The final exam will be held in EA-Z01 (for last names in the range ABDUL-GÖÇMEN) and EA-Z03 (for last names in the range GÖZÜBÜYÜK-YÜRÜTEN) from 9:00 to 12:00 on May 22, 2022.

**Projects**

- Multiple computer/programming/exercise assignments of various sizes.
- A project may be assigned earlier than the date indicated in the weekly plan.
- Projects may be individual or group-based, as decided by the instructors.
- Projects will be uploaded to Moodle.
- Programming languages such as Python, Java, R, or MATLAB may be used in the projects.
- Gaining hands-on experience and experimenting will be important. Real-world data sets may be used (economic/financial data sets, medical/biological data sets, image/video data sets, social network data sets, IT data sets, etc.).

**Other**

- Grades will be posted in SAPS.
- There is **no mandatory textbook** for the course.

**Introduction; what is data science; data science applications.** [Çiçek, Tüzün]

Topic Details: Introductory concepts in data science and applications. Overview of data science process.

Slides and Additional Material: ge_461_-_lecture_1_-_course_information_spring_2022.pdf

Topic Details: Software engineering applications.

Slides and Additional Material: ge461_lecture_2_datascienceinsoftwareengineering.pdf

Project/Exercise-Problem-Set/Homework: None this week.

References:

Events: Classes begin (Jan 31).

**Data science applications; data science pipeline.** [Alkan, Dibeklioğlu]

Topic Details: Genomics applications.

Slides and Additional Material:

Topic Details: Computer vision applications.

Slides and Additional Material: ge_461_-_lecture_4_-_computer_vision_applications_-_spring_2022.pdf

Project/Exercise-Problem-Set/Homework: None this week.

References: "Big Data: Astronomical or Genomical?", Stephens et al., 2015

Events:

**Data representation; preprocessing; preparation; crowdsourcing.** [Arashloo, Çiçek]

Topic Details: Normalization, Noise Removal (Filtering), Anomaly Detection, Data Compression, Noise Removal (ICA).

Slides and Additional Material: preprocessing.pdf

Topic Details: Crowdsourcing applications and usage in data science.

Slides and Additional Material: ge_461_-_lecture_6_-_crowdsourcing.pdf

Project/Exercise-Problem-Set/Homework: None this week.

Events:

**Data collection; storage; querying; SQL, NoSQL; cloud; distributed storage and computing.** [Körpeoğlu]

Topic Details: RDBMSs, SQL; SQLite, Pandas; NoSQL; MapReduce and Hadoop; Spark.

Slides and Additional Material: Slides

Project/Exercise-Problem-Set/Homework:

References: SQLite, Pandas, Apache Spark, Spark

Events:

**Basic models; parametric models; fitting.** [S. Dayanık]

Topic Details: Exploratory data analysis, loess smoother, chi-squared test of independence, linear regression and the least squares method, factors and dummy variables, all illustrated on the *Dodgers Advertising and Promotion* case study with R, RStudio, and SQLite.

Slides and Additional Material:

Project/Exercise-Problem-Set/Homework:

References:

Events:

**Spring Break**

Topic Details:

Slides and Additional Material:

Project/Exercise-Problem-Set/Homework:

References:

Events: Spring Break (March 10-13)

**Application to customer choice problems (conjoint analysis)** [S. Dayanık]

Topic Details: Part-worths, part importances, and their estimation from product rankings with multiple regression.

Slides and Additional Material:

Project/Exercise-Problem-Set/Homework: **Dodgers Promotion Project**, due **19:00 on Saturday, April 9**, to be submitted on the Moodle page. Project details are in the dodgers.Rmd/dodgers.html files in the Week 7 course materials.

References:

Events:

**Conjoint analysis continued, and the authorship problem** [S. Dayanık]

Topic Details: New product design with market simulation to increase overall market share; who wrote the Federalist Papers (identification of authorship by means of Bayesian classifiers and kNN).

Slides and Additional Material:

Project/Exercise-Problem-Set/Homework:

References:

Events:

**Dimensionality reduction; visualization.** [Aksoy]

Topic Details: Feature reduction, feature selection, high-dimensional data visualization.

Slides and Additional Material: Dimensionality slides, t-SNE slides

Project/Exercise-Problem-Set/Homework: [Project (data)] (due 23:59 on April 21, 2022)

References: Matlab: dimensionality reduction, Scikit-learn: decomposition, Scikit-learn: decomposition examples, Scikit-learn: manifold learning, t-SNE

Events:

**Unsupervised learning, clustering.** [Aksoy]

Topic Details: K-means clustering, mixture models, hierarchical clustering.

Slides and Additional Material: Clustering slides

Project/Exercise-Problem-Set/Homework:

References: Matlab: cluster analysis, Scikit-learn: clustering

Events: Bilkent Day (April 3)

**Machine learning; supervised learning; classifiers; deep learning.** [Dündar]

Topic Details: Bayesian decision theory, linear discriminants, introduction to neural networks, support vector machines, decision trees.

Slides and Additional Material: Part1, Part2

Project/Exercise-Problem-Set/Homework:

References:

Events:

**Machine learning; supervised learning; classifiers; deep learning.** [Dibeklioğlu]

Topic Details: Activation functions, convolutional neural networks, recurrent architectures.

Slides and Additional Material: ge461_deep_learning_2022s.pdf

Project/Exercise-Problem-Set/Homework: [Project Description | Data] (due 23:55 on April 30, 2022)

References:

Events:

**Machine learning in healthcare.** [Çukur]

Topic Details: Healthcare analytics: diagnostics, medical imaging, in-patient care, hospital management, risk analytics, wearables. Deep learning architectures for medical applications.

Slides and Additional Material: ge461_ml_in_healthcare.pdf

Project/Exercise-Problem-Set/Homework: (due May 6, 2022) ge461_pw13_description.pdf ge461_pw13_data.zip

References: Hastie, Tibshirani and Friedman, The Elements of Statistical Learning, Ch. 11 and 14; Mead, Analog VLSI and Neural Systems, Ch. 4; Bishop, Pattern Recognition and Machine Learning, Ch. 5

Events: Spring Festival (Apr 29-30)

**Data mining; online data stream classification; applications.** [Can]

Topic Details: Concept drift, ensemble-based classification, text mining.

Slides and Additional Material: DataStreamMining

Project/Exercise-Problem-Set/Homework:

References:

Events: Feast of Ramadan holiday (May 2-4)

**Reinforcement learning; applications.** [Tekin]

Topic Details: Applications of reinforcement learning, Markov decision processes, value iteration, Q-learning.

Slides and Additional Material: ge461_reinforcementlearning.pdf

Project/Exercise-Problem-Set/Homework:

References:

Events: Last day of classes (May 13)

Last modified: 2022/06/07 14:44 by ge461