Date of Award
2021
Master of Science (MS)
The process of image classification using convolutional neural networks (CNNs) often relies on access to large annotated datasets and on cluster- or cloud-based computing resources. However, many classification applications, such as those in healthcare or defense, introduce privacy concerns that prevent the collection of such data and the use of pre-existing large-scale computing systems. Although many approaches to privacy-preserving machine learning have been explored, the computational complexity added by training on encrypted values prevents these systems from executing in real time. One of the most promising techniques for secure machine learning is secure multi-party computation (MPC), which segments data across multiple devices such that the original data cannot be reconstructed without recombining all of the segments.
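The share-splitting idea behind MPC can be illustrated with additive secret sharing, a common MPC building block. The thesis does not specify its exact scheme, so the modulus and party count below are illustrative assumptions, not the author's parameters:

```python
import random

MODULUS = 2**32  # illustrative share modulus (an assumption, not the thesis's choice)

def share(secret: int, n_parties: int) -> list[int]:
    """Split `secret` into additive shares: each party holds one share,
    and any subset smaller than n_parties is uniformly random."""
    shares = [random.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)  # final share makes the sum work out
    return shares

def reconstruct(shares: list[int]) -> int:
    """Recombine all shares to recover the original value."""
    return sum(shares) % MODULUS

shares = share(1234, n_parties=3)
assert reconstruct(shares) == 1234
# The sharing is linear, so parties can add two shared values locally
# without ever seeing either plaintext:
a, b = share(10, 3), share(20, 3)
assert reconstruct([(x + y) % MODULUS for x, y in zip(a, b)]) == 30
```

The linearity shown in the last two lines is what makes additions essentially free in MPC, while multiplications require the extra communication rounds that drive the overhead discussed below.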
This thesis explores the efficacy of training CNNs on encrypted data using MPC techniques and applies several optimizations to reduce the resulting computational and communication overhead. The goal is to create a privacy-preserving CNN framework that achieves testing accuracy comparable to a non-secure model while introducing as little computational overhead as possible. To this end, a multi-party encryption scheme was used to encrypt all floating-point values used in training, and federated learning was incorporated to offset the computational overhead by parallelizing the training of the network.
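The aggregation step in this kind of federated training is commonly federated averaging (FedAvg): workers train local model copies in parallel, then a coordinator averages their weights element-wise. A minimal sketch with hypothetical weight vectors (the worker values are invented for illustration):

```python
def federated_average(worker_weights: list[list[float]]) -> list[float]:
    """FedAvg aggregation: element-wise mean of the weight vectors
    produced by workers that trained local model copies in parallel."""
    n = len(worker_weights)
    return [sum(column) / n for column in zip(*worker_weights)]

# Hypothetical weights from three workers after one local training round
# (values chosen to be exactly representable in binary floating point).
w1, w2, w3 = [0.25, 0.5], [0.5, 1.0], [0.75, 1.5]
assert federated_average([w1, w2, w3]) == [0.5, 1.0]
```

Because each worker's expensive secure-domain training round runs concurrently, the wall-clock cost of a round approaches that of a single worker plus one cheap averaging step.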
The developed secure CNN achieved validation accuracy within 1.1-2.8% of a baseline CNN on the MNIST dataset and within 9.9-19.4% on the CIFAR-10 dataset. The reduced accuracy stems from rounding errors introduced by performing repeated arithmetic operations in the secure domain during training; nevertheless, these results indicate that training can be performed on encrypted values. Training on encrypted values was found to cost between 8x and 21x more computation time than a non-secure baseline implementation, owing to the added computational complexity and communication overhead of operating on secure values. This additional training time, however, can be mitigated through federated averaging by performing training on multiple devices in parallel.
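The rounding behavior described above can be demonstrated with a simple fixed-point encoding, a common way to map floats onto the integers that secure protocols operate on. The scale factor and the values below are illustrative assumptions, not the parameters used in the thesis:

```python
SCALE = 2**16  # illustrative fixed-point scaling factor (an assumption)

def encode(x: float) -> int:
    """Map a float onto the integer domain used by secure arithmetic."""
    return round(x * SCALE)

def decode(v: int) -> float:
    return v / SCALE

def fx_mul(a: int, b: int) -> int:
    # The product of two scaled values carries scale SCALE**2; truncating
    # back to SCALE discards low-order bits, introducing rounding error.
    return (a * b) // SCALE

# Even a single encode/decode round trip is inexact for most values...
assert decode(encode(0.1)) != 0.1
# ...and repeated secure-domain multiplications compound the error,
# which is how long training runs accumulate the accuracy loss above.
y = encode(0.1)
for _ in range(20):
    y = fx_mul(y, encode(1.01))
assert abs(decode(y) - 0.1 * 1.01**20) < 1e-3
```

Each multiplication truncates at most one unit in the last place, so the error per step is tiny, but a training run performs millions of such operations and the errors compound.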
Langbehn, David Karl, "Privacy-Preserving Image Classification Using Convolutional Neural Networks" (2021). All Theses. 3542.