Bilkent University
Department of Computer Engineering
CS 590/690 SEMINAR
Explanation-Guided Membership Inference Attacks in Label-Only Settings
Omar Hamdache
Master's Student
(Supervisor: Asst. Prof. Sinem Sav)
Computer Engineering Department
Bilkent University
Abstract: Membership Inference Attacks (MIAs) aim to reveal whether a data sample was part of the training data of a machine learning (ML) model. These attacks pose a significant privacy risk to ML models. Many attacks require access to the confidence vectors output by the model, which are unavailable in typical Machine Learning as a Service (MLaaS) settings, where providers return only the predicted labels. Label-Only MIAs exist, but they suffer from unstable results and require a high query volume to determine the membership of a sample. This work bridges this gap by introducing an explanation-guided MIA, in which the attacker has access to attribution-based explanations of the classified samples. Major MLaaS providers (such as AWS, Google, and Azure) offer services that return explanations along with the model outputs. Our method uses the provided explanations to guide a label-only boundary distance attack on tabular data classifiers. We propose a method to extract the top-k most salient features from the feature-attribution map. This yields a feature mask that is applied to the HopSkipJump attack, the core building block of the boundary distance attack, restricting it to the features the explanation tool deems most important. This "focused search" approach reduces noise from irrelevant features, leading to a more stable and efficient estimation of the boundary distance used to infer membership. Our work demonstrates that explanations, even in a strict label-only setting, can be exploited to significantly enhance the efficacy of membership inference.
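The masking step described in the abstract can be illustrated with a minimal sketch. This is not the speaker's implementation; it only assumes an attribution vector per sample and shows how a top-k boolean mask could restrict a candidate perturbation to the salient features before a HopSkipJump-style boundary search:

```python
import numpy as np

def top_k_mask(attributions: np.ndarray, k: int) -> np.ndarray:
    """Boolean mask selecting the k features with the largest |attribution|."""
    idx = np.argsort(np.abs(attributions))[-k:]
    mask = np.zeros(attributions.shape, dtype=bool)
    mask[idx] = True
    return mask

def masked_perturbation(delta: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out the perturbation on non-salient features (the 'focused search')."""
    return np.where(mask, delta, 0.0)

# Hypothetical attribution map for a 6-feature tabular sample.
attr = np.array([0.05, -0.40, 0.10, 0.90, -0.02, 0.30])
mask = top_k_mask(attr, k=3)              # selects features 1, 3, and 5
delta = np.full(6, 0.2)                   # uniform candidate step
step = masked_perturbation(delta, mask)   # step moves only along salient features
```

In a full attack, `step` would replace the unconstrained perturbation inside each HopSkipJump iteration, so the boundary distance is estimated within the salient subspace only.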
DATE: Monday, November 10 @ 16:10
PLACE: EA 502