Doctor of Philosophy (PhD)
School of Computing
Unsupervised contrastive learning has emerged as an important training strategy for learning representations by pulling positive samples closer together and pushing negative samples apart in a low-dimensional latent space. Usually, positive samples are augmented versions of the same input, and negative samples come from different inputs. Once the low-dimensional representations are learned, further analysis, such as clustering and classification, can be performed on them. Currently, this framework faces two challenges. First, empirical studies reveal that although contrastive learning methods have made great progress in representation learning with large models, they do not work well for small models. Second, the framework has achieved excellent clustering results on small datasets but is limited on datasets with a large number of clusters, such as ImageNet. In this dissertation, our research goal is to develop new unsupervised contrastive representation learning methods and apply them to knowledge distillation and clustering.
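The pull-together, push-apart objective described above is commonly written as an InfoNCE-style loss. The following is a minimal sketch of that general idea, not the loss proposed in this dissertation; the function names, the temperature value, and the toy embeddings are assumptions made for illustration.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss over a batch (illustrative, assumed formulation).

    view_a[i] and view_b[i] are embeddings of two augmentations of input i
    (the positive pair); view_b[j] for j != i serve as negatives for view_a[i].
    """
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [dot(anchor, other) / temperature for other in b]
        # Numerically stable log-sum-exp over the positive and all negatives.
        largest = max(logits)
        log_denominator = largest + math.log(
            sum(math.exp(value - largest) for value in logits)
        )
        # Negative log-probability of picking the positive among all candidates.
        total += log_denominator - logits[i]
    return total / len(a)


# Toy embeddings (assumed): positives that agree give a lower loss than
# positives that point in unrelated directions.
aligned_loss = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched_loss = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing this quantity raises the similarity of each positive pair relative to the negatives in the batch, which is the behavior the abstract describes in words.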
Knowledge distillation transfers knowledge from high-capacity teachers to small student models, thereby improving the students' performance. Representational knowledge distillation methods distill the knowledge contained in the teachers' representations to the students. Current representational knowledge distillation methods undesirably push apart representations of samples from the same class in their correlation objectives, leading to inferior distillation results. Here, we introduce Dual-level Knowledge Distillation (DLKD), which explicitly combines knowledge alignment and knowledge correlation instead of using a single contrastive objective. We show that both knowledge alignment and knowledge correlation are necessary to improve distillation performance. The proposed DLKD is task-agnostic and model-agnostic, and it enables effective knowledge transfer from teachers trained with supervision or self-supervision to students. Experiments demonstrate that DLKD outperforms other state-of-the-art methods in a large number of experimental settings, including different (a) pretraining strategies, (b) network architectures, (c) datasets, and (d) tasks.
Currently, a two-stage framework is widely used in deep learning-based clustering: representations are learned first, and a clustering algorithm such as K-means is then run on those representations to obtain cluster assignments. However, the representations learned in this two-stage framework may not be optimized for clustering. Here, we propose Contrastive Learning-based Clustering (CLC), which uses contrastive learning to directly learn cluster assignments. We decompose the representation into two parts: one encodes the categorical information under an equipartition constraint, and the other captures the instance-wise factors. We theoretically analyze the proposed contrastive loss and show that CLC assigns different weights to the negative samples while learning cluster assignments. The proposed loss is therefore highly expressive, which enables us to learn cluster assignments efficiently. Experimental evaluation shows that CLC achieves overall state-of-the-art or highly competitive clustering performance on multiple benchmark datasets. In particular, we achieve 53.4% accuracy on the full ImageNet dataset and outperform existing methods by a large margin (+10.2%).
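For contrast with CLC, the second stage of the conventional two-stage baseline can be sketched as plain K-means run on already-learned, frozen representations. This is an illustrative sketch of the baseline only, not of CLC; the helper names, the fixed initial centroids, and the toy two-dimensional representations are assumptions.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, iterations=10):
    """Plain K-means on fixed representations (second stage of the baseline)."""
    centroids = [list(c) for c in initial_centroids]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids


# Toy stand-ins (assumed) for representations produced by a frozen encoder.
representations = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
assignments, centroids = kmeans(
    representations, [representations[0], representations[2]]
)
```

Because the representations are never updated here, nothing in the clustering step can reshape them to suit the clusters, which is the limitation that motivates learning cluster assignments directly.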
Ding, Fei, "Unsupervised Contrastive Representation Learning for Knowledge Distillation and Clustering" (2022). All Dissertations. 3120.