Date of Award

May 2021

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Engineering

Committee Member

Melissa Smith

Committee Member

Adam Hoover

Committee Member

Jerome McClendon

Abstract

The process of image classification using convolutional neural networks (CNNs) often relies on access to large, annotated datasets and the use of cluster or cloud-based computing resources. However, many classification applications such as those in healthcare or defense introduce privacy concerns that prevent the collection of such data and the use of pre-existing large scale computing systems. Although many solutions to privacy preserving machine learning have previously been explored, the added computational complexity incurred with training on encrypted values inhibits these systems from executing in real-time. One of the most promising solutions that facilitates secure machine learning is secure multi-party computation (MPC), which relies on segmenting data across multiple devices such that the original data cannot be reconstructed without recombining each of the data segments.

This thesis explores the efficacy of training CNNs on encrypted data using MPC techniques and utilizes several optimization techniques to lessen the computational and communication overheads incurred from doing so. The goals are to create a privacy-preserving CNN framework that achieves testing accuracy similar to a non-secure model while introducing the least amount of computational overhead. To this end, a multi-party encryption scheme was used to encrypt all floating point values used in training, and federated learning was incorporated to reduce the effects of the computational overhead by parallelizing the training of the network.

The developed secure CNN was able to achieve validation accuracy within 1.1-2.8% of a baseline CNN on the MNIST dataset and 9.9-19.4% on the CIFAR-10 dataset. This decreased accuracy is caused by rounding errors incurred by performing multiple continuous arithmetic computations in the secure domain during training, however the accuracy results of the secure CNN indicate that training can be performed on encrypted values. The cost of performing training on encrypted values was found to range from between 8 - 21x more computation time in comparison to a non-secure baseline implementation due to the added computational complexity and communication overhead required to perform training on secure values. This additional training time, however, was shown to be able to be mitigated through the use of federated averaging by performing training on multiple devices in parallel.

Share

COinS
 
 

To view the content in your browser, please download Adobe Reader or, alternately,
you may Download the file to your hard drive.

NOTE: The latest versions of Adobe Reader do not support viewing PDF files within Firefox on Mac OS and if you are using a modern (Intel) Mac, there is no official plugin for viewing PDF files within the browser window.