Date of Award

12-2009

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Legacy Department

Industrial Engineering

Advisor

Kurz, Mary Beth

Committee Member

Brookover, Robert

Committee Member

Cho, Byung Rae

Committee Member

Shappell, Scott

Abstract

Social scientists and other users of large data sets often want a model that predicts the probability that some condition exists, such as the probability that a person has diabetes or that a credit card transaction is fraudulent. In general, this can be done with data mining techniques, which statistically examine many records, each composed of numerous independent variables and one dependent variable, to build a predictive model. One such technique is logistic regression. Logistic regression forms a predictive model from a set of independent variables by assigning coefficients to these variables so as to maximize a non-linear function. The state of the art for creating logistic regression models requires the modeler to select the independent variables and to use an iterative search technique to solve the underlying non-linear optimization problem.
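To make the "non-linear function" concrete, the following minimal sketch assumes it is the Bernoulli log-likelihood that logistic regression commonly maximizes. The function names and the toy data are hypothetical and are not taken from the dissertation.

```python
import math

def predict_probability(coefficients, intercept, x):
    """Logistic model: P(y = 1 | x) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(b * xi for b, xi in zip(coefficients, x))
    # Evaluate the sigmoid in a form that avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def log_likelihood(coefficients, intercept, records, outcomes):
    """Sum of log P(observed outcome) over all records; larger is better."""
    eps = 1e-12  # keep probabilities away from 0 and 1 before taking logs
    total = 0.0
    for x, y in zip(records, outcomes):
        p = predict_probability(coefficients, intercept, x)
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

# Toy data (hypothetical): one independent variable, binary dependent variable.
toy_records = [[-2.0], [-1.0], [1.0], [2.0]]
toy_outcomes = [0, 0, 1, 1]
null_fit = log_likelihood([0.0], 0.0, toy_records, toy_outcomes)
better_fit = log_likelihood([1.0], 0.0, toy_records, toy_outcomes)
```

Here a coefficient of 1.0 yields a higher log-likelihood than a coefficient of 0.0, which is the sense in which an optimizer "assigns coefficients" to improve the model.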
This dissertation investigates the use of genetic algorithms for creating a logistic regression model. This optimization technique helps resolve two critiques of the state of the art: (1) user selection of independent variables allows user bias to enter the logistic regression model; (2) the iterative optimization method can result in sub-optimal models being accepted. Using genetic algorithms in place of the current optimization technique effectively addresses these concerns.
Data of increasing complexity are considered, from one to several independent variables. In response, genetic algorithms that allow for increasing flexibility are developed. Through extensive computational studies, a robust genetic algorithm that allows for selection of independent variables and setting of parameter values is developed.
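The idea of a genetic algorithm that both selects independent variables and sets coefficient values can be sketched as follows. This is an illustration under stated assumptions, not the algorithm developed in the dissertation: the chromosome layout (an inclusion mask plus real-valued coefficients), the tournament selection, uniform crossover, Gaussian mutation, the per-variable penalty, and all parameter defaults are hypothetical choices.

```python
import math
import random

def log_likelihood(mask, coefs, intercept, records, outcomes):
    """Bernoulli log-likelihood using only the variables switched on in mask."""
    total = 0.0
    for x, y in zip(records, outcomes):
        z = intercept + sum(b * xi for m, b, xi in zip(mask, coefs, x) if m)
        # log(sigmoid(z)) computed in a numerically stable form
        log_p = -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))
        log_q = log_p - z  # log(1 - sigmoid(z))
        total += log_p if y == 1 else log_q
    return total

def fitness(ind, records, outcomes, penalty):
    """Assumed fitness: log-likelihood minus a penalty per included variable."""
    mask, coefs, intercept = ind
    return log_likelihood(mask, coefs, intercept, records, outcomes) - penalty * sum(mask)

def random_individual(n_vars, rng):
    mask = [rng.random() < 0.5 for _ in range(n_vars)]
    coefs = [rng.uniform(-1.0, 1.0) for _ in range(n_vars)]
    return (mask, coefs, rng.uniform(-1.0, 1.0))

def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one parent at random."""
    mask = [rng.choice(pair) for pair in zip(a[0], b[0])]
    coefs = [rng.choice(pair) for pair in zip(a[1], b[1])]
    return (mask, coefs, rng.choice((a[2], b[2])))

def mutate(ind, rng, flip_rate=0.1, sigma=0.3):
    """Flip inclusion bits occasionally and add Gaussian noise to coefficients."""
    mask = [(not m) if rng.random() < flip_rate else m for m in ind[0]]
    coefs = [b + rng.gauss(0.0, sigma) for b in ind[1]]
    return (mask, coefs, ind[2] + rng.gauss(0.0, sigma))

def run_ga(records, outcomes, pop_size=40, generations=150, penalty=1.0, seed=0):
    rng = random.Random(seed)
    n_vars = len(records[0])

    def score(ind):
        return fitness(ind, records, outcomes, penalty)

    pop = [random_individual(n_vars, rng) for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(pop, key=score)]  # elitism: carry the best individual over
        while len(children) < pop_size:
            p1 = max(rng.sample(pop, 3), key=score)  # tournament of size 3
            p2 = max(rng.sample(pop, 3), key=score)
            children.append(mutate(crossover(p1, p2, rng), rng))
        pop = children
    return max(pop, key=score)
```

Calling `run_ga(records, outcomes)` returns a tuple `(mask, coefs, intercept)`; the mask shows which independent variables the search kept, so variable selection is not left to the modeler. Because the best individual is carried over unchanged each generation, the best fitness in the population never decreases.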
The power of the developed approach is demonstrated in a case study of general aviation accident data with five hundred cases and thirteen independent variables.
