Date of Award


Document Type


Degree Name

Doctor of Philosophy (PhD)


Genetics and Biochemistry

Committee Chair/Advisor

Liangjiang Wang

Committee Member

Lukasz Kozubowski

Committee Member

Hong Luo

Committee Member

Trudy Mackay


In the central nervous system, synapses are essential junctions that connect neurons and play important roles in neurotransmission and synaptic plasticity. While there are many challenges in human synapse genomics, machine learning techniques, which are capable of mining and interpreting large amounts of genomic data, may be utilized to facilitate the functional studies of human synapses. In this study, we have developed machine learning models for human synapse genomics to address several biological problems.

RNA localization plays an important role at the synapse, allowing the local protein synthesis required for synaptic plasticity during brain development. Previous studies were conducted in mice and rats to investigate the subcellular localization of RNAs and its impact on synaptic plasticity. However, owing to the experimental difficulties of studying the human synaptic transcriptome, the full population of human synaptic RNAs remains largely uncharacterized. We have developed a new machine learning method, PredSynRNA, to predict the synaptic localization of human RNAs by using developmental brain gene expression data. The PredSynRNA method can be used to predict and prioritize candidate RNAs localized to human synapses, providing valuable targets for experimental investigations in neuronal studies.

Long non-coding RNAs (lncRNAs) are a class of non-coding RNAs (ncRNAs) with little protein-coding potential due to the lack of an open reading frame. LncRNAs are emerging as important regulators in neuronal development, synaptic plasticity, and complex brain disorders. However, only a few synapse-related lncRNAs have been identified and characterized. In this study, we have built a new machine learning method, SynLnc, to predict human synapse-related lncRNAs by mining developmental brain gene expression data using collaborative embedding, a technique commonly used in recommender systems. High-confidence candidate lncRNAs that are co-expressed with known synaptic genes in genomic proximity may be valuable experimental targets for future research.

Liquid-liquid phase separation (LLPS) is a physiological process essential for the formation of membraneless compartments, which are pervasive in cells and synaptic regions. While previous studies attempted to predict phase-separated proteins with conventional feature encoding and laborious feature engineering, natural language processing (NLP) techniques have not been sufficiently applied in this field. In this study, we applied a state-of-the-art deep protein language model to predict proteins with LLPS propensity and synaptic functions. The constructed models achieved good performance in both learning tasks, showing the promise of deep sequence representation learning with advanced NLP techniques. Overall, we expect that these models and results will provide valuable information for studying human synapses.

Available for download on Sunday, December 31, 2023