Scalable Data-Driven Predictive Modeling and Analytics for CHO Process Development Optimization
Date of Award
Doctor of Philosophy (PhD)
In 1982, the FDA approved the first recombinant therapeutic protein, and since then the biopharmaceutical industry has continued to develop innovative and highly effective biological drugs for various illnesses [1]. These drugs are produced using host organisms that are modified to carry the gene encoding the target protein [1]. Of the many host organisms, Chinese hamster ovary (CHO) cells are often used because of their capability to perform posttranslational modification (PTM), which allows human-like synthesis of proteins that are unlikely to provoke an immune response in humans [1,2].
Despite these advantages, CHO cell cultures pose many challenges, such as relatively slow growth rates, low volumetric yields of the target proteins, variable product quality, and genome instability, any of which can lead to failed production runs. These fundamental issues limit biopharmaceutical companies' ability to meet high drug demand and contribute to high manufacturing costs that result in higher drug prices. These limitations are driving the industry to undergo a digital transformation that improves data transparency and increases automated decision-making in CHO process development, with the aim of minimizing production failures.
This dissertation provides insight into the roles of data-driven methods, specifically deep learning, in optimizing process development. The benefits of deploying data science applications and multivariate data analysis (MVDA) techniques were examined. Further work on data management, feature engineering, and model development for complex bioprocessing data established pipelines to store, rapidly access, and transform data through stable storage systems such as MongoDB. MVDA methods were used to uncover hidden patterns within the complex cell culture process, which could be exploited computationally to develop predictive models.
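As a rough illustration of the store-access-transform idea, the sketch below pivots document-style records into per-run time series. It is not the dissertation's pipeline: the field names (`run_id`, `day`, `vcd`, `glucose`), the function `to_time_series`, and all values are hypothetical, and plain Python dictionaries stand in for documents that a store such as MongoDB would return.

```python
from collections import defaultdict

# Hypothetical sample documents with made-up values: one document per
# bioreactor run and sampling day, shaped like records a document store
# such as MongoDB might hold.
samples = [
    {"run_id": "R1", "day": 0, "vcd": 0.5, "glucose": 6.0},
    {"run_id": "R1", "day": 1, "vcd": 1.2, "glucose": 5.1},
    {"run_id": "R2", "day": 0, "vcd": 0.4, "glucose": 6.2},
    {"run_id": "R2", "day": 1, "vcd": 1.0, "glucose": 5.4},
]


def to_time_series(docs, features):
    """Group documents by run and return day-ordered feature rows per run."""
    by_run = defaultdict(list)
    for doc in docs:
        by_run[doc["run_id"]].append(doc)
    return {
        run: [[d[f] for f in features] for d in sorted(rows, key=lambda d: d["day"])]
        for run, rows in by_run.items()
    }


series = to_time_series(samples, ["vcd", "glucose"])
print(series["R1"])  # [[0.5, 6.0], [1.2, 5.1]]
```

Sorting by sampling day inside the transform means the result does not depend on the order in which documents come back from storage.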
The study also focused on developing deep learning models as soft sensors to predict key process indicators, such as protein yield and cell growth. The developed deep learning pipeline covers construction of the training and test datasets, the neural network architecture, and model validation for bioprocessing data. The predictive performance of Feedforward (FF), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) neural networks (NN) was evaluated with four different optimizers: Stochastic Gradient Descent (SGD), Adaptive Moment Estimation (Adam), Root Mean Square Propagation (RMSprop), and Rectified Adam (RAdam). Additionally, five learning rates (LR) were used (1e-02 to 1e-06). Furthermore, the robustness and predictive performance of multi-output LSTM networks were compared across different feature sets and architectures. Finally, it was demonstrated that a neural network developed for one recombinant CHO cell line could be transferred to another CHO cell line because of their shared feature space. Specifically, a pre-trained LSTM neural network developed for the CHO VRCO1 cell line was transferred to predict the behavior of the CHOZN cell line.
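To make the size of this comparison concrete, the following sketch enumerates the combinations of architecture, optimizer, and learning rate named above. It only builds the configuration grid and does not train any model. The decade spacing of the five learning rates is an assumption (the abstract gives only the range), and `build_grid` is a hypothetical helper name.

```python
from itertools import product

# Architecture and optimizer names as listed in the abstract.
ARCHITECTURES = ["FF", "LSTM", "GRU"]
OPTIMIZERS = ["SGD", "Adam", "RMSprop", "RAdam"]
# Assumption: five decade-spaced values spanning the stated 1e-02 to 1e-06 range.
LEARNING_RATES = [1e-2, 1e-3, 1e-4, 1e-5, 1e-6]


def build_grid():
    """Return one config dict per architecture/optimizer/learning-rate combination."""
    return [
        {"architecture": arch, "optimizer": opt, "learning_rate": lr}
        for arch, opt, lr in product(ARCHITECTURES, OPTIMIZERS, LEARNING_RATES)
    ]


grid = build_grid()
print(len(grid))  # 3 * 4 * 5 = 60 combinations
```

Under these assumptions a full sweep would cover 60 training runs, before counting any repeats or different feature sets.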
The motivation behind this research was to expand the knowledge base of data-driven methods in the bioprocessing domain by showing how these tools could be developed for different applications. This work demonstrated the efficacy of the LSTM with the Adam optimizer for predicting viable cell density (VCD) and titer. We also found that transfer learning helps reduce the time spent on feature engineering, which could translate into shorter process development times and lower manufacturing costs for new drugs.
1 Jayapal KP, Wlaschin KF, Hu WS, Yap MGS. Recombinant protein therapeutics from CHO Cells - 20 years and counting. Chem Eng Prog 2007;103:40–7.
2 van Beers MMC, Bardor M. Minimizing immunogenicity of biopharmaceuticals by controlling critical quality attributes of proteins. Biotechnol J 2012;7:1473–84. https://doi.org/10.1002/biot.201200065.
Mbiki, Sarah, "Scalable Data-Driven Predictive Modeling and Analytics for CHO Process Development Optimization" (2022). All Dissertations. 3177.
Available for download on Sunday, December 31, 2023
Computer and Systems Architecture Commons, Data Storage Systems Commons, Other Biomedical Engineering and Bioengineering Commons