Date of Award


Document Type


Degree Name

Doctor of Philosophy (PhD)

Department

Automotive Engineering

Committee Chair/Advisor

Dr. Beshah Ayalew

Committee Member

Dr. Zoran Filipi

Committee Member

Dr. Yunyi Jia

Committee Member

Dr. Robert Prucka


Abstract

The transportation sector offers a significant opportunity to reduce global emissions, both through technological advancements and vehicular control strategies. Model-based control systems are popular methods for increasing the operating efficiency of vehicles. However, these systems often rely on models that require costly calibrations and still fail to capture the complexity of modern powertrain systems and the variations found in real-world driving. The recent availability of operational data through connected vehicle technology and/or edge devices has led to the emergence of data-driven control strategies that can learn optimal control policies through the interactions of the vehicle’s control system with the environment. Specifically, deep reinforcement learning (DRL) uses an engineered reward structure to update the neural networks that make up the control policy. In this dissertation, we demonstrate the potential benefits of applying DRL to powertrain control to reduce fuel consumption while accommodating the driver’s desired acceleration and comfort through the optimization of gearshift decisions and traction torque control. While showcasing a remarkable 12% improvement in fuel economy over baseline model- and table-based approaches, we encountered many challenges to the practical implementation of a DRL controller. We propose and evaluate various techniques to resolve these challenges, working towards a viable application of DRL for vehicle control in real-world scenarios. The first challenge addressed is the learning efficiency of DRL agents on physical systems. When learning a policy with newly initialized neural networks, extensive exploration of the action choices occurs, often resulting in unnecessary and poor action choices until the DRL agent learns an appropriate policy. To reduce this exploration, we investigated two methods that leverage a known source policy. 
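As a minimal illustration of the reward-driven policy updates described above, the following sketch uses a score-function (REINFORCE-style) update on a linear-Gaussian policy standing in for the neural networks. The state, reward function, noise level, and step sizes are all hypothetical assumptions for illustration, not taken from the dissertation.

```python
import numpy as np

# Illustrative sketch only: reward-driven policy updates in the spirit of DRL,
# with a linear-Gaussian policy standing in for the neural networks.
rng = np.random.default_rng(0)
theta = np.zeros(2)   # policy parameters (stand-in for network weights)
sigma = 0.5           # fixed exploration noise
lr = 0.01             # update step size

def reward(state, action):
    # Hypothetical engineered reward: penalize deviation from a target
    # control law (e.g., a fuel-optimal torque request).
    target = 1.5 * state[0] - 0.5 * state[1]
    return -(action - target) ** 2

for _ in range(500):
    grad = np.zeros(2)
    for _ in range(32):                       # small batch per update
        s = rng.normal(size=2)                # observed state
        mean = theta @ s
        a = rng.normal(mean, sigma)           # exploratory action
        # Score-function (REINFORCE-style) gradient estimate of the reward.
        grad += reward(s, a) * (a - mean) / sigma**2 * s
    theta += lr * grad / 32                   # gradient-ascent step

print(np.round(theta, 2))  # drifts toward the target law's coefficients
```

Note how the exploration noise drives unnecessary, poor action choices early on; this is the cost that the source-policy methods below aim to reduce.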
The first method, which we refer to as residual policy learning (RPL), configures the DRL agent to learn only residual actions that adjust a known deterministic source policy for the individual vehicle, allowing it to adapt to variations in the powertrain, the driver, the road conditions, and the surrounding traffic. While exploration was reduced by keeping actions close to the source policy, improving learning efficiency, the asymptotic performance was observed to be biased towards that of the source policy. Therefore, as a second method, an adaptive policy learning (APL) scheme was introduced to improve the exploration, the learning efficiency, and the asymptotic performance. This approach uses an attention network to weigh the actions of the source policy relative to those of the learned policy, eventually applying solely the optimal learned policy. Although the initial learning was improved, we observed instability and high variation in the asymptotic performance of the DRL agent when the vehicle operates over a large distribution of routes, as is common in the practical use of commercial vehicles and under realistic driving conditions. Instead of relying solely on a deterministic policy and the experiences of individual vehicles, we next explored leveraging the knowledge gained by a fleet of vehicles. Two variants of cooperative/shared learning are proposed to aggregate the diverse experiences gathered by multiple vehicles in a fleet that serves a common distribution of routes. The first approach computes a centralized group policy that regresses over the learned individual vehicle policies, and then distributes the group policy for use in the update of each local policy. A significant improvement in learning and overall fleet performance was demonstrated with this approach. 
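The two source-policy methods above can be sketched as action compositions. The stand-in functions below for the source law, the residual network, and the attention network are illustrative assumptions only (none are from the dissertation): RPL adds a bounded learned correction to the source action, while APL blends the two policies with a state-dependent weight.

```python
import numpy as np

def source_policy(state):
    # Hypothetical calibrated (deterministic) gearshift/torque law.
    return 0.8 * state

def rpl_action(state, residual_net):
    # Residual policy learning: the agent learns only a bounded correction
    # applied on top of the known source policy.
    return source_policy(state) + np.clip(residual_net(state), -0.2, 0.2)

def apl_action(state, learned_net, attention_net):
    # Adaptive policy learning: a state-dependent attention weight w in [0, 1]
    # blends the source and learned policies; as training matures, w -> 0
    # and the learned policy acts alone.
    w = attention_net(state)
    return w * source_policy(state) + (1.0 - w) * learned_net(state)

# Illustrative stand-ins for the learned networks:
residual = lambda s: 0.1 * s           # small learned correction
learned = lambda s: 0.9 * s            # independently learned policy
attn = lambda s: 0.25                  # still partly trusts the source

print(rpl_action(1.0, residual))       # 0.8 + 0.1
print(apl_action(1.0, learned, attn))  # 0.25*0.8 + 0.75*0.9
```

The clip bound in `rpl_action` captures why RPL explores less but inherits the source policy's bias, while the attention weight in `apl_action` lets the agent retire the source policy entirely once the learned policy is trusted.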
The drawback is the computational bottleneck of the centralized entity, which limits the applicable fleet size for shared fleet learning to a few vehicles, depending on the diversity of the routes served. The second approach is a fully decentralized cooperative learning framework in which we introduce novel ad hoc teaming concepts for mutual distillation between vehicles in the fleet to share policies. This leads to a highly scalable approach for shared fleet learning. By implementing various versions of the shared learning schemes in fleet simulations, a performance improvement on the order of 14% was observed over a baseline of individually learning vehicle agents. Furthermore, demonstrative results show how both variants of cooperative learning reduce the commonly seen variance in performance within a fleet and enable DRL agents to learn robust policies that adapt effectively to new routes.
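The two cooperative-learning variants above can be caricatured with linear policy parameters standing in for each vehicle's networks. The fleet values, the plain average used in place of the regression, and the distillation step size are illustrative assumptions only, not the dissertation's actual algorithms.

```python
import numpy as np

# Each vehicle i holds policy parameters theta_i learned on its own routes
# (linear parameters standing in for the vehicles' networks).
fleet = [np.array([1.0, 0.2]), np.array([1.2, 0.0]), np.array([0.8, 0.4])]

# Variant 1: centralized shared learning -- a group policy is fit over the
# individual policies (a plain average here, in place of the regression),
# then distributed to pull each local policy toward it.
group = np.mean(fleet, axis=0)
fleet_central = [th + 0.5 * (group - th) for th in fleet]

# Variant 2: decentralized cooperative learning -- ad hoc teams of vehicles
# exchange policies and each mutually distills toward its teammate,
# avoiding any central bottleneck.
def distill(theta_self, theta_peer, alpha=0.3):
    return theta_self + alpha * (theta_peer - theta_self)

a_new = distill(fleet[0], fleet[1])
b_new = distill(fleet[1], fleet[0])

print(group)         # group policy, approx. [1.0, 0.2]
print(a_new, b_new)  # each policy moves partway toward its teammate
```

Both mechanisms contract the spread of policies across the fleet, which is the intuition behind the reduced within-fleet performance variance reported above; variant 2 needs only pairwise exchanges, hence its scalability.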

Author ORCID Identifier


Available for download on Saturday, May 31, 2025