Date of Award


Document Type


Degree Name

Doctor of Philosophy (PhD)


Department

Electrical and Computer Engineering (Holcomb Dept. of)

Committee Chair

Adam Hoover, Committee Chair

Committee Member

Jon Calhoun

Committee Member

Eric Muth

Committee Member

Yongqiang Wang


This dissertation considers the problem of recognizing wrist motions during eating. The wrist motion is tracked using accelerometer and gyroscope sensors worn in a watch-like device. The goal is to recognize a set of gestures that commonly occur during eating, such as taking a bite of food or consuming a drink of liquid. The wrist motion during a bite may consist of picking up a morsel of food, moving it to the mouth for intake, and returning the hand to a rest position. Hidden Markov models (HMMs) are used to recognize these motions. An HMM is a doubly stochastic process in which one set of stochastic processes generates the observables (in this case, the sensor readings) and is controlled by another set of stochastic processes that is not observable (in this case, the eating gestures). A benefit of an HMM is that it can encode the temporal structure of the signal, in this case the expected sequence of motions comprising a gesture.

The ideas pursued in this dissertation are motivated by methods used to improve the capability of HMMs to recognize speech. For example, it is challenging to build a generic HMM that captures all the varieties of accents and dialects of speakers. People in different regions may speak the same language with variations in pronunciation, vocabulary, and grammar. Building HMMs for each dialect group can improve the robustness of the system to these speech variations. This dissertation attempts a similar analysis of wrist motion during eating. Analogous to dialects and accents in speech, we propose that demographics (gender, age, ethnicity), the utensil being used, or the type of food being eaten may cause variations in wrist motion while eating. Several variations on this concept are explored and compared to baseline recognition accuracies.

In Chapter 2, work is first described to establish a baseline accuracy for a non-HMM method.
The method uses a simple pattern-matching algorithm that detects only one type of gesture (called "bites," but including any food or liquid intake). The method was tested on 276 people eating a meal in a cafeteria and was evaluated on 24,088 bites. It achieved 75% sensitivity and 89% positive predictive value.

Chapter 3 describes a larger vocabulary of eating actions using segment-based labeling. The set of gestures includes taking a bite of food (bite), sipping a drink of liquid (drink), manipulating food in preparation for intake (utensiling), and not moving (rest). All other activities, such as using a napkin or gesturing while talking, are grouped into a non-eating category (other). The lexicography was tested by labeling segments of wrist motion according to the gesture set. A total of 18 human raters labeled the same data described above. Inter-rater reliability was 92.5%, demonstrating reasonable consistency of the gesture definitions.

Chapter 4 describes work that explores the complexity of HMMs and the amount of training data needed to adequately capture the motion variability across the large data set. Results found that HMMs needed a complexity of 13 states and 5 Gaussians to reach a plateau in accuracy, signifying that a minimum of 65 samples per gesture type is needed. Results also found that 500 training samples per gesture type were needed to reach the point of diminishing returns in recognition accuracy. Overall, HMM-S, which models a single gesture as a sequence of sub-gestures, achieved 85.2% accuracy across all gestures, and HMM-1, which uses a history of one previous gesture as context, achieved 89.5% accuracy across all gestures.

Chapter 5 describes work that investigates contextual variables for recognizing gestures using top-down and bottom-up approaches.
Specifically, we consider whether foreknowledge of demographics (gender, age, hand used, ethnicity, BMI), meal-level variables (utensil used for eating, food consumed), language variables (variations of bite, utensiling, and other), or a clustering-based method can improve recognition accuracy. We investigate this hypothesis by building HMMs trained for each of these contextual variables and comparing their accuracy against the simple non-HMM algorithm and HMM-S. Results show that the highest accuracies achieved by the contextual HMMs are 86.4% for all gestures and 91.7% for intake gestures, improvements of 1.2% and 6.7% over HMM-S, respectively. We also investigate the contextual variables combined with a one-gesture history. This achieved an all-gestures accuracy of up to 88.9% and an intake-gestures accuracy of up to 93.0%; compared to HMM-1, the all-gestures accuracy decreased by 0.6% while the intake-gestures accuracy improved by 1.5%.
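To make the recognition approach concrete, the sketch below scores an observation sequence under a per-gesture Gaussian HMM using the forward algorithm and picks the most likely gesture. This is a minimal illustration only: the state count, transition matrix, gesture parameters, and the one-dimensional observations are invented for demonstration, not the dissertation's trained models or its 13-state, 5-Gaussian configuration.

```python
import numpy as np

np.seterr(divide="ignore")  # log(0) = -inf is intentional for impossible transitions


def forward_log_likelihood(obs, pi, A, means, stds):
    """Log-likelihood of a 1-D observation sequence under a Gaussian HMM,
    computed with the forward algorithm in log space for numerical stability."""
    def log_gauss(x, mu, sd):
        return -0.5 * np.log(2 * np.pi * sd ** 2) - (x - mu) ** 2 / (2 * sd ** 2)

    # alpha[i] = log P(observations so far, current state = i)
    alpha = np.log(pi) + log_gauss(obs[0], means, stds)
    for x in obs[1:]:
        # Sum over predecessor states i of alpha[i] + log A[i, j], then emit x.
        alpha = log_gauss(x, means, stds) + \
            np.logaddexp.reduce(alpha[:, None] + np.log(A), axis=0)
    return np.logaddexp.reduce(alpha)


# Two toy 3-state left-to-right models (hypothetical parameters): a "bite"
# trends upward (pick up food -> move to mouth), while "rest" stays near zero.
A = np.array([[0.8, 0.2, 0.0],
              [0.0, 0.8, 0.2],
              [0.0, 0.0, 1.0]])
pi = np.array([1.0, 0.0, 0.0])
models = {
    "bite": (pi, A, np.array([0.0, 1.0, 2.0]), np.array([0.3, 0.3, 0.3])),
    "rest": (pi, A, np.array([0.0, 0.0, 0.0]), np.array([0.3, 0.3, 0.3])),
}

obs = np.array([0.1, 0.9, 1.1, 1.9, 2.1])  # upward trend, bite-like
scores = {g: forward_log_likelihood(obs, *p) for g, p in models.items()}
best = max(scores, key=scores.get)
print(best)  # -> bite
```

The left-to-right transition matrix encodes the temporal structure mentioned above: a gesture's sub-motions must occur in order, which is the property that distinguishes an HMM from a framewise classifier.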