Bilkent University
Department of Computer Engineering
CS 590/690 SEMINAR

 

FUSOR: Space-efficient Taxonomic Classification Using Hybrid Hierarchical Interleaved Binary Fuse Filters with a Stash

 

Tuna Okçu
Master Student
(Supervisor:Assoc.Prof.Can Alkan)

Computer Engineering Department
Bilkent University

Abstract: The exponential growth of metagenomic data has transformed taxonomic classification into a primary computational bottleneck, shifting the research focus from simple processing speed to the critical need for space-efficient indexing. Current gold-standard tools often require massive memory resources, which significantly limit their deployment on portable or resource-constrained systems. Here we introduce FUSOR, a reference-based taxonomic classification method that uses our novel hybrid hierarchical interleaved binary fuse filter (HHIBFF) data structure in its index. While binary fuse filters (BFFs) are highly space-efficient, their application in an interleaved architecture has been impractical due to high construction failure rates. FUSOR overcomes this barrier by implementing a stash-based failure recovery mechanism. By providing an auxiliary space to evict failure-inducing insertions, and strictly bounding the eviction budget to a single item per filter, we make the construction of interleaved BFFs viable for the first time. Our results show that the final index we constructed is smaller than state-of-the-art tools such as Kraken2, ganon2, and Taxor, requiring less than a third of the space in certain configurations. Availability: FUSOR is available on https://github.com/BilkentCompGen/FUSOR.

 

DATE: April 06, Monday @ 15:50 Place: EA 502