Bilkent University*
COMPUTER ENGINEERING DEPARTMENT

CS533: Information Retrieval Systems
Fall 2019
Tuesday 10:40, 11:40; Friday 8:40 (Spare Hour), 9:40; EB502


INSTRUCTOR : Dr. FAZLI CAN
Office : EA511 (Muhendislik Fakultesi Binasi), e-mail: canf@cs.bilkent.edu.tr
Office Hours (Fall 2019): Tuesday 14:40-15:30, Friday 10:40-11:30, or by appointment.


COURSE OBJECTIVES
The main objective of this course is to learn the important concepts, algorithms, and data/file structures that are necessary to design and implement Information Retrieval (IR) systems.


TENTATIVE COURSE SCHEDULE
IR Systems Overview, System Evaluation, Clustering and Cluster Validation, Automatic Indexing and Term Weighting; Fundamental File Structures: Inverted File, Signature Files, Query Processing, Data Fusion, n-gram-based Files, PAT trees, Data Stream Processing, New Event Detection and Tracking, Maximal Marginal Relevance, Information Filtering, Efficiency and Scalability Issues and other topics based on student projects.


PREREQUISITE
CS353 or consent of the instructor.


TEXTBOOK & OTHER READING MATERIAL
No particular textbook. A reading list will be provided. Some good resources are listed below.
TREC 6 Appendix A: Evaluation
The Art of Doing Science and Engineering: Learning to Learn, by Richard W. Hamming
Algorithms for Clustering Data, by A. K. Jain, R. C. Dubes (~36MB)
Information Retrieval, by K. van Rijsbergen
Information Retrieval: Data Structures & Algorithms, edited by W. B. Frakes, R. Baeza-Yates
Search Engines: Information Retrieval in Practice, by Donald Metzler, Trevor Strohman, and W. Bruce Croft
Modern Information Retrieval, by R. Baeza-Yates, B. Ribeiro-Neto (teaching material)
Introduction to Information Retrieval, by C. D. Manning, P. Raghavan, H. Schütze
Index structures for selective dissemination of information under the Boolean model, by T. W. Yan, H. Garcia-Molina
Signature files: an integrated access method for formatted and unformatted databases, by D. Aktug, F. Can
Comparing inverted files and signature files for searching a large lexicon, B. Carterette, F. Can
Partitioned signature files: design issues and performance evaluation, D. L. Lee, C-W Leng
The anatomy of a large-scale hypertextual Web search engine, S. Brin, L. Page
First large-scale information retrieval experiments on Turkish texts, F. Can, S. Kocberber, E. Balcik, C. Kaynak, H. Cagdas Ocalan, Onur M. Vursavas
Term weighting approaches in automatic text retrieval, G. Salton, C. Buckley
Incremental clustering for dynamic information processing, F. Can
Approximating block accesses in database organizations, S. B. Yao
Data clustering: A review, A. K. Jain, M. N. Murty, P. J. Flynn
Concepts and the effectiveness of the cover coefficient-based clustering methodology, F. Can, E. A. Ozkarahan
An evaluation of retrieval effectiveness for a full-text document-retrieval system, D. C. Blair, M. E. Maron
Another look at automatic text-retrieval systems, G. Salton
Inverted files for text search engines, J. Zobel, A. Moffat (summary, by L. Koc)
Web page classification: features and algorithms, X. Qi, B. D. Davison
Creative Thinking, Claude E. Shannon

For ACM resources, you may need to create a VPN account.


ASSIGNMENTS & OTHER COURSE MATERIAL
No particular textbook. A reading list will be provided.


IMPORTANT DATES (202)      
Last day of classes                               : December 31, Tuesday
Final exams                                          : January 2-January 15


EXAM DATES (2019)
Midterm Exam                    :  November 12, 2019; Tuesday Class Time
Final (comprehensive)         : January 2, Thursday, 2020, 15:30, Location: EB101


GRADING POLICY
Midterm Exam                  : 20%
Final Exam (comprehensive)    : 30%
Project & Assignments         : 40%
Attendance & Participation    : 10%
-------------------------------------
Total                         : 100%

Letter grades will be determined according to the following table (grades will be curved if needed).
90 - 100 %: A
80 - 89 %: B
70 - 79 %: C
60 - 69 %: D
0 - 59 %: F
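
As an illustration of how the weights and the letter-grade table above combine, here is a minimal Python sketch. It is not an official grade calculator: the function and dictionary names are made up for this example, each component is assumed to be scored on a 0-100 scale, fractional totals (e.g. 89.5) are assumed to fall into the lower band, and neither a curve nor any attendance deduction is modeled.

```python
# Weights (in percent) taken from the grading policy table.
# The dictionary keys are hypothetical names chosen for this sketch.
WEIGHTS = {
    "midterm": 20,
    "final": 30,
    "project_assignments": 40,
    "attendance_participation": 10,
}

# Lower bounds of the letter-grade bands, highest first.
LETTER_BANDS = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (0, "F")]


def weighted_total(scores):
    """Combine component scores (each assumed 0-100) into a 0-100 total."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(total):
    """Map a 0-100 total to a letter, before any curve is applied.

    Assumption: a fractional total belongs to the band whose lower
    bound it has reached, so 89.5 maps to B rather than A.
    """
    for lower_bound, letter in LETTER_BANDS:
        if total >= lower_bound:
            return letter
    raise ValueError("total must be between 0 and 100")
```

For example, scores of 75 (midterm), 80 (final), 90 (project and assignments), and 100 (attendance and participation) give 15 + 24 + 36 + 10 = 85, which falls in the B band.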


GENERAL POLICIES

  1. You are expected to do your homework assignments alone. Group work will be considered cheating. You may discuss your ideas and approaches, but do not cross the line. Group projects will be specified explicitly.
  2. Your programs will be graded according to their correctness, algorithm design, readability, and neatness of presentation.
  3. Your assignments must be turned in on the due dates.  No late homework assignment will be accepted.  No make-up/extension can be given for excuses with no proof and no prior notification.
  4. Homework problems may be graded selectively (e.g., 1 or 2 problems out of 5; however, you have to solve all of them). The weights of individual assignments may vary.
  5. If you need to supply written documentation with your assignments, provide a neat presentation using a word processor. This is a rule, and exceptions will be specified explicitly.
  6. If an individual review is needed due to a question about a grade (including exams), it must be requested no later than one week after receiving your assignment or exam. This time limit is for consistency in grading.
  7. Attendance is mandatory.  If you miss a class it is your responsibility to catch up in terms of course material and announcements made in the class.  For each missed class 2% of your grade may be deducted.  You may miss two classes without a penalty.

 


ANNOUNCEMENTS

Date of last update: December 30, 2019, 4:50 pm.

* The announcements section may change every day throughout the semester. Due to honest mistakes, there may be some errors on this page, and I reserve the right to make corrections to it without notice.