Assume you are building an enrichment operator. For each tuple the operator received, you need to access a database to perform a key-based lookup and enrich the tuple with the information received from the database. The database is big enough that it is not practical to cache it in memory. Also, some items received may not have any matches in the database. You want to accelerate this process as much as possible. You have two ideas:
Assume that:
M bytes of main memoryN items in the databaseC bytes of memoryY% of the tuples received do not exist in the databased timec time (c<d)b time (b<c)z (assume an ideal cache that keeps the most frequent items)To minimize the average cost of enrichment:
L bytes) and how much for the Bloom filter (R bytes)? Obviously, L+R <= M.L and R.Tar/gz or zip the contents of your homework in to a single file. Email me (bgedik@cs.bilkent.edu.tr) the homework. Do not include binaries or jar files. Please include a write-up explaining your formulation.