Date of Award

8-2022

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

School of Computing

Committee Chair/Advisor

Dr. Brian Dean

Committee Member

Dr. Christopher McMahan

Committee Member

Dr. Alexander Feltus

Committee Member

Dr. Alexander Alekseyenko

Committee Member

Dr. Nina Hubig

Abstract

Most modern machine learning algorithms take an "average-case" approach, in which every data point has the same influence on the fit of a model. The per-data-point error (or loss) is averaged into an overall loss, which serves as the objective function to be minimized. However, this approach can be insensitive to valuable outliers. Inspired by game theory, this work explores the utility of incorporating an optimally playing adversary into feature selection and regression frameworks. The adversary assigns weights to the data elements so as to degrade the modeler's performance as much as possible, thereby forcing the modeler to construct a more robust solution. A tuning parameter "tempers" the power wielded by the adversary, allowing us to explore the spectrum between average case and worst case. Because our method is formulated as a linear program, it can be solved efficiently and can accommodate sub-population constraints, a feature that related methods cannot easily implement. We feel that the need to generate models while understanding the influence of sub-population constraints should be particularly prominent in the biomedical literature. Although our method was developed in response to the ubiquity of sub-population data and outliers in this realm, it is generic and can be applied to data sets that are not biomedical in nature. We additionally explore the implementation of our method as an adversarial regression problem. Here, instead of providing the user with a single fit of the model parameters, we provide an ensemble of parameter fits that can be tuned based on sensitivity to outliers and various sub-population constraints.
Finally, to help foster a better understanding of various data sets, we discuss potential automated applications of our method that would enable data scientists to explore underlying relationships and sensitivities that may arise from sub-populations and meaningful outliers.
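
The adversary's weighting step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the tempering parameter enters the formulation, so the per-point weight cap used here (`cap`), along with the function names `adversary_weights` and `adversarial_loss`, is an assumption made only for illustration and is not taken from the dissertation. Under that assumption, the adversary solves a small linear program: maximize the weighted loss subject to the weights being non-negative, summing to one, and each not exceeding `cap`. A linear program of this shape has a greedy solution, so no solver is needed. The sketch covers only the adversary's move for fixed losses, not the modeler's optimization or any sub-population constraints.

```python
def adversary_weights(losses, cap):
    """Weights maximizing sum(w_i * loss_i) subject to
    sum(w) == 1 and 0 <= w_i <= cap.

    `cap` is a hypothetical tempering parameter (an assumption of this sketch):
    cap = 1/n forces uniform weights (average case),
    cap = 1 lets all weight fall on the worst point (worst case).
    """
    n = len(losses)
    if n == 0 or cap * n < 1.0 - 1e-12 or cap > 1.0:
        raise ValueError("cap must lie in [1/n, 1]")
    weights = [0.0] * n
    remaining = 1.0
    # Greedy optimum: give the largest losses as much weight as the cap allows.
    for i in sorted(range(n), key=lambda i: losses[i], reverse=True):
        w = min(cap, remaining)
        weights[i] = w
        remaining -= w
        if remaining <= 0.0:
            break
    return weights


def adversarial_loss(losses, cap):
    """Overall loss the modeler faces under the tempered adversary."""
    weights = adversary_weights(losses, cap)
    return sum(w * l for w, l in zip(weights, losses))


if __name__ == "__main__":
    losses = [1.0, 2.0, 3.0, 10.0]
    for cap in (0.25, 0.5, 1.0):
        print(cap, adversarial_loss(losses, cap))
```

In this toy setting, sweeping `cap` from `1/n` to `1` moves the overall loss from the plain mean of the per-point losses to their maximum, which is one way to picture the spectrum between average case and worst case.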

Included in

Data Science Commons
