Master of Science (MS)
Electrical and Computer Engineering
Dr. Adam Hoover
Dr. Harlan Russell
Dr. Richard Groff
This thesis considers the problem of identifying noisy labels in the ground truth of eating episodes (meals, snacks) as self-reported by participants collecting data in the wild. Participants wore a smartwatch-like device that tracked their wrist motion all day. They were instructed to press a button on the device at the start and end of each eating episode. The device and instructions were designed to be as simple to use as possible, but post-review of the ground truth provided by participants revealed that a significant portion of the button presses likely contain errors. For example, an error could be caused by a participant forgetting to press the button until halfway through a meal, thus misidentifying the start boundary. This thesis seeks to determine whether these types of errors can be identified with confidence, how often they occurred, and whether they can be fixed.
The correctness of the ground truth is important because it is used to generate labels for the wrist motion data to train a classifier to detect eating. If the button presses have errors, then data will be mislabeled, which will diminish classifier performance. The problem of noisy labels is well-known in other domains, such as image segmentation, where it is expected that a certain number of pixels or images have been mislabeled. In the domain of wearable devices used to monitor human behavior or health, the concept of noisy labels is relatively new and less work has been done. In particular, this is the first work to consider the challenge of identifying errors in the self-reported start and end times of eating episodes.
The data used for this work is the Clemson all-day (CAD) data set. It contains 354 days of wrist motion data from 351 different participants. The total length of the data set is 4,680 hours with 1,133 meals indicated by 2,266 button presses (start and end for each meal). This data was used previously to develop a classifier that outputs a continuous probability of eating P(E) all day for each recording. In this work, we visually compare the P(E) plot against the ground truth button presses reported by participants. This comparison highlights intervals where the classifier disagrees with the ground truth. We developed a schema for quantifying these disagreements, and had three raters independently use it to assess and modify the ground truth for 71 days of data. Two raters achieved an agreement of 79% on the adjustments, while three raters achieved an agreement of 64%.
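As a rough illustration of how agreement figures like these can be computed, the sketch below measures simple percent agreement: the fraction of items on which every rater assigned the same code. The function name, the adjustment codes ("keep", "shift_start", "delete"), and the toy ratings are all hypothetical; the abstract does not specify the schema or the exact agreement statistic used.

```python
def percent_agreement(*ratings):
    """Fraction of items on which all raters assigned the same code.

    Each argument is one rater's list of codes, one code per item.
    This is plain percent agreement, an assumed stand-in for whatever
    agreement measure the thesis actually uses.
    """
    if not ratings:
        raise ValueError("at least one rater is required")
    n_items = len(ratings[0])
    if n_items == 0 or any(len(r) != n_items for r in ratings):
        raise ValueError("all raters must rate the same non-empty item list")
    # An item counts as agreed only if the set of codes has one member.
    agreed = sum(1 for codes in zip(*ratings) if len(set(codes)) == 1)
    return agreed / n_items


# Hypothetical adjustment codes for five button presses.
rater_a = ["keep", "shift_start", "keep", "delete", "keep"]
rater_b = ["keep", "shift_start", "keep", "keep", "keep"]
rater_c = ["keep", "keep", "keep", "keep", "keep"]

two_rater = percent_agreement(rater_a, rater_b)
three_rater = percent_agreement(rater_a, rater_b, rater_c)
print(two_rater, three_rater)
```

In this toy example the three-rater agreement is lower than the two-rater agreement, because each added rater is another chance for a disagreement on any given item. The same effect would be expected in any comparison of two-rater and three-rater figures.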
To further test the viability of identifying and updating the ground truth, all 354 days of data were reviewed and adjusted by a single rater. This updated ground truth was then used to retrain the wrist motion classifier. Its performance was evaluated on the original unadjusted ground truth in order to prevent bias. When trained on the original ground truth, the classifier achieved a per-datum weighted accuracy of 79.1%, an episode true positive rate (TPR) of 87.5%, and a false positive to true positive ratio (FP/TP) of 1.9. When trained on the adjusted ground truth, the classifier achieved a per-datum weighted accuracy of 80.1%, an episode TPR of 85.9%, and an FP/TP of 1.7. These results indicate that adjusting the ground truth yielded an improvement of 1 percentage point in weighted accuracy and fewer false positive episode detections (FP/TP fell from 1.9 to 1.7), but also fewer true positive episode detections (episode TPR fell from 87.5% to 85.9%).
Collectively, these results indicate that it is possible to identify button press errors in the ground truth of data used to train a classifier for detecting eating with a wearable device. However, our methods also need improvement in order to obtain a higher inter-rater reliability, which would potentially yield additional improvement in classifier performance.
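The three reported metrics can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's evaluation code: the weighting scheme (a single `eating_weight` multiplier on the eating class to offset class imbalance), the overlap rule for matching detected episodes to true episodes, and all function names and toy values are hypothetical.

```python
def weighted_accuracy(truth, pred, eating_weight):
    """Per-datum accuracy with the eating class up-weighted.

    truth, pred: sequences of 0 (non-eating) or 1 (eating), one per datum.
    eating_weight: assumed multiplier on eating data, e.g. the ratio of
    non-eating to eating data, so both classes contribute equally.
    """
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    return (eating_weight * tp + tn) / (eating_weight * (tp + fn) + tn + fp)


def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]


def episode_metrics(true_episodes, detected_episodes):
    """Return (episode TPR, FP/TP ratio) under an assumed overlap rule.

    A true episode is a TP if any detection overlaps it, otherwise an FN.
    A detection that overlaps no true episode is an FP.
    """
    tp = sum(1 for t in true_episodes
             if any(overlaps(t, d) for d in detected_episodes))
    fn = len(true_episodes) - tp
    fp = sum(1 for d in detected_episodes
             if not any(overlaps(d, t) for t in true_episodes))
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fp_per_tp = fp / tp if tp else float("inf")
    return tpr, fp_per_tp


# Toy per-datum labels: 2 eating data, 8 non-eating data, so weight = 4.
truth = [0, 0, 1, 1, 0, 0, 0, 0, 0, 0]
pred = [0, 1, 1, 0, 0, 0, 0, 0, 0, 0]
print(weighted_accuracy(truth, pred, eating_weight=4))

# Toy episodes as (start, end) times in minutes.
true_eps = [(10, 20), (50, 60)]
detected_eps = [(12, 18), (30, 35), (70, 80)]
print(episode_metrics(true_eps, detected_eps))
```

Up-weighting the eating class matters because eating occupies only a small fraction of a full day of data, so an unweighted accuracy would reward a classifier that never predicts eating.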
Zhang, Tianyi, "Identifying Noisy Labels in the Ground Truth of Eating Episodes Self-Reported by Button Press on a Wrist-worn Device" (2022). All Theses. 3745.