Buğra Gedik's Courses
@ Bilkent University

====== Homework III ====== Assume you are building an enrichment operator. For each tuple the operator received, you need to access a database to perform a key-based lookup and enrich the tuple with the information received from the database. The database is big enough that it is not practical to cache it in memory. Also, some items received may not have any matches in the database. You want to accelerate this process as much as possible. You have two ideas: * Use some of the memory for caching the items that exist in the database * Use a Bloom filter to quickly find items that do not exist in the database Assume that: * We have ''M'' bytes of main memory * There are ''N'' items in the database * Caching an item requires ''C'' bytes of memory * ''Y%'' of the tuples received do not exist in the database * Looking an item from the database takes ''d'' time * Looking an item from the cache takes ''c'' time (''c<d'') * Looking an item from the bloom filter takes ''b'' time (''b<c'') * Assume that items that exist in the database appear in the stream following a Zipf distribution with parameter ''z'' (assume an ideal cache that keeps the most frequent items) To minimize the average cost of enrichment: * How much of the space you should reserve for the cache (''L'' bytes) and how much for the Bloom filter (''R'' bytes)? Obviously, ''L+R <= M''. * Write a simple program, in the programming language of your choice, that, given the above configuration, computes ''L'' and ''R''. === Deliverables === Tar/gz or zip the contents of your homework in to a single file. Email me (bgedik@cs.bilkent.edu.tr) the homework. Do not include binaries or jar files. Please include a write-up explaining your formulation.

User Tools

Site Tools

Buğra Gedik's Courses@ Bilkent University

Page Tools

Buğra Gedik's Courses
@ Bilkent University