Date of Award


Document Type


Degree Name

Master of Science (MS)


Industrial Engineering

Committee Member

Sandra D Eksioglu, Committee Chair

Committee Member

Michael Carbajales-Dale

Committee Member

Burak Eksioglu


Energy must be used efficiently in today's world. With rapid growth in the industrial sector, demand is increasing, and energy efficiency programs have become vital for reducing energy waste while still meeting that demand. Analysis of several scenarios used by policy makers suggests that, for the global temperature to rise by less than 2 °C by the end of this century, the increase in industrial energy consumption must be reduced by at least half. To stay on track with these scenarios and achieve the desired targets, a dependable forecasting tool is needed that can predict energy consumption from several expected parameters. In this thesis, a survey of energy consumption forecasting algorithms compares the advantages and disadvantages of each and explains the applications for which each is best suited. The thesis also presents a machine learning supported regression model that achieves higher accuracy than conventional regression models. The Industrial Assessment Center database contains data from all assessments conducted on manufacturing facilities, including plant area, production hours, number of employees, annual sales, and the region in which the facility is located. These variables, along with average annual temperature, are the independent variables and represent the various factors affecting energy consumption. The dependent variable is annual energy consumption. The suggested model uses random forest feature selection to identify the most important variables in the dataset. The dataset is first divided into three groups based on the value of the most important variable, production hours. Each of these groups is further divided into three groups based on the value of the second most important variable, plant area. The algorithm then fits linear, polynomial, and support vector regression models to each of these nine groups during training.
During testing, the model selects the regression model for the group into which each test observation falls, based on its values of the two most important variables. This approach gave a 23% lower percentage deviation than conventional regression modeling. Polynomial regression works better on the entire dataset, whereas linear regression performs equally well on the subsets, suggesting that the linearity of the data increases as the dataset is divided into homogeneous subsets. Production hours and plant area have the highest impact on energy consumption, so these two factors must be analyzed when seeking to reduce it. Industrial Assessment Centers can use this model to estimate clients' expected future energy consumption and thereby give more accurate payback periods for their recommendations.
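
The partition-then-fit idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes the two most important variables (production hours and plant area) have already been identified, splits each at tercile cut points (the abstract does not say how the group boundaries were chosen), and fits only a plain linear model per group; the polynomial and support vector models are omitted. All function names, the synthetic data, and the feature scaling (hours in thousands, area in units of 100,000 sq ft) are hypothetical.

```python
import random
from statistics import quantiles


def bin_index(value, cuts):
    """Map a value to group 0, 1, or 2 given two ascending cut points."""
    return sum(value > c for c in cuts)


def fit_linear(rows, targets):
    """Ordinary least squares with intercept, via the normal equations."""
    X = [[1.0, *r] for r in rows]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * t for x, t in zip(X, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def fit_partitioned(rows, targets):
    """Split rows (hours, area) into 3 x 3 groups and fit one model per group."""
    hour_cuts = quantiles([r[0] for r in rows], n=3)  # two cut points
    area_cuts, coefs = {}, {}
    for h in range(3):
        in_h = [i for i, r in enumerate(rows) if bin_index(r[0], hour_cuts) == h]
        # Area cut points are computed separately inside each hours group
        area_cuts[h] = quantiles([rows[i][1] for i in in_h], n=3)
        for a in range(3):
            idx = [i for i in in_h if bin_index(rows[i][1], area_cuts[h]) == a]
            coefs[(h, a)] = fit_linear([rows[i] for i in idx],
                                       [targets[i] for i in idx])
    return {"hour_cuts": hour_cuts, "area_cuts": area_cuts, "coefs": coefs}


def predict(model, row):
    """Route a new observation to its group and apply that group's model."""
    h = bin_index(row[0], model["hour_cuts"])
    a = bin_index(row[1], model["area_cuts"][h])
    b0, b1, b2 = model["coefs"][(h, a)]
    return b0 + b1 * row[0] + b2 * row[1]


# Synthetic, mildly nonlinear data standing in for the real assessments
random.seed(0)
rows = [(random.uniform(2.0, 8.76), random.uniform(0.1, 5.0)) for _ in range(300)]
targets = [1.5 + 0.8 * h + 2.0 * a + 0.3 * h * a + random.gauss(0, 0.1)
           for h, a in rows]
model = fit_partitioned(rows, targets)
print(len(model["coefs"]), "group models fitted")
print("prediction for (6.0, 2.5):", round(predict(model, (6.0, 2.5)), 2))
```

Routing each observation to a local model lets a simple linear fit approximate a nonlinear relationship piecewise, which is consistent with the abstract's observation that linearity increases within homogeneous subsets.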


