Date of Award
Master of Science (MS)
Dr. Long Cheng
Dr. Yingjie Lao
Dr. Federico Iuricich
Through artificial intelligence, algorithms can classify arrays of data, such as images or videos, into a predefined set of categories. Given enough labeled data, a classifier learns to analyze an input's components and assign confidence scores to each category. However, machine learning relies heavily on approximation, which attackers can exploit by supplying adversarial
examples. Specifically, an attacker can modify an input so that the victim classifier can no longer label it correctly, while a human observer cannot notice the difference.
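To illustrate the "imperceptible perturbation" idea above, the following sketch bounds a random perturbation by an ℓ∞ budget ε so that no pixel changes by more than ε. The budget 8/255 is a common choice in the adversarial-examples literature, not a value taken from this thesis:

```python
import numpy as np

def perturb_linf(x, epsilon=8 / 255, seed=0):
    """Add a random perturbation whose l-infinity norm is at most epsilon.

    x is assumed to be an image array scaled to [0, 1]; epsilon=8/255 is an
    illustrative budget, not a parameter from the thesis.
    """
    rng = np.random.default_rng(seed)
    delta = rng.uniform(-epsilon, epsilon, size=x.shape)
    # Clipping keeps the result a valid image; it can only shrink the change.
    return np.clip(x + delta, 0.0, 1.0)

x = np.full((32, 32, 3), 0.5)              # dummy CIFAR10-sized "image"
x_adv = perturb_linf(x)
print(np.abs(x_adv - x).max() <= 8 / 255)  # perturbation respects the budget
```

A real attack would choose the perturbation adversarially rather than at random; the point here is only the per-pixel bound that keeps the change visually negligible.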
This thesis proposes Gaslight, a system that uses deep reinforcement learning to generate adversarial examples against a victim classifier. Gaslight is a "black-box" and "hard-label" attacker, meaning it receives no information from the victim except the input shape, the input range, and the top-1 label. Gaslight learns to attack the victim by modifying randomly generated inputs, rewarding the agent for successful misclassifications and for keeping distortion low. Over many iterations, Gaslight improves its ability to generate effective perturbations for any
test input it is given. Once training completes, the agent can attack any input of the correct shape and range. Experiments on the CIFAR10 and ImageNet datasets show that Gaslight can successfully perturb inputs with a single query at a high success rate, improving upon existing methods that can take hundreds or even thousands of queries to produce a misclassification. Compared against other state-of-the-art hard-label attacks, Gaslight achieves similar ℓ2 and ℓ∞ norms with 90% fewer queries. Gaslight's code can be found at https://github.com/RajatSethi2001/Gaslight.
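The reward structure described above, which rewards a misclassification while penalizing distortion, might be sketched as follows. The trade-off weight `alpha`, the helper `classify_top1`, and the toy victim are illustrative assumptions, not the thesis's actual implementation:

```python
import numpy as np

def hard_label_reward(x, x_adv, true_label, classify_top1, alpha=1.0):
    """Hard-label reward: the attacker observes only the victim's top-1 label.

    Returns a bonus for a misclassification, minus an l2 distortion penalty.
    alpha is an assumed trade-off weight between the two objectives.
    """
    predicted = classify_top1(x_adv)        # the only signal available
    success = 1.0 if predicted != true_label else 0.0
    distortion = np.linalg.norm((x_adv - x).ravel())  # l2 perturbation size
    return success - alpha * distortion

# Toy victim: labels an image by its mean intensity (illustrative only).
victim = lambda img: int(img.mean() > 0.5)

x = np.full((4, 4), 0.4)       # victim labels this 0
x_adv = x + 0.2                # mean rises above 0.5, so the label flips to 1
r = hard_label_reward(x, x_adv, true_label=0, classify_top1=victim, alpha=0.1)
print(r)                       # success bonus minus weighted distortion
```

An unmodified input earns zero reward (no misclassification, no distortion), so the agent is driven toward perturbations that flip the label while staying small.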
Sethi, Rajat, "Gaslight: Attacking Hard-Label Black-Box Classifiers via Deep Reinforcement Learning" (2023). All Theses. 4012.