Date of Award

8-2009

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Legacy Department

Computer Engineering

Advisor

Brooks, Richard R

Committee Member

Griffin, Christopher

Committee Member

Hoover, Adam

Committee Member

Post, Christopher

Abstract

To analyze real-world events, researchers collect observation data from an underlying process and construct models to represent the observed situation. In this work, we consider issues that affect the construction and usage of a specific type of model. Markov models are commonly used because their combination of discrete states and stochastic transitions is suited to applications with both deterministic and stochastic components. Hidden Markov Models (HMMs) are a class of Markov model commonly used in pattern recognition. We first demonstrate how to construct HMMs using only the observation data, and no a priori information, by extending a previously developed approach from J.P. Crutchfield and C.R. Shalizi. We also show how to determine with a level of statistical confidence whether or not the model fully encapsulates the underlying process.
Once models are constructed from observation data, the models are used to identify other types of observations. Traditional approaches consider the maximum likelihood that the model matches the observation, solving a classification problem. We present a new method using confidence intervals and receiver operating characteristic curves. Our method solves a detection problem by determining if observation data matches zero, one, or more than one model.
To detect the occurrence of a behavior in observation data, one must consider the amount of data required. We call behaviors 'serial Markovian' when the behavior can change from one model to another at any time. When analyzing observation data, considering too much data induces high delay and could lead to confusion in the system if multiple behaviors are observed in the data stream. If too little data is used, the system has a high false positive rate and is unable to correctly detect behaviors.
We demonstrate the effectiveness of all methods using illustrative examples and consumer behavior data.
