Date of Award


Document Type


Degree Name

Doctor of Philosophy (PhD)


Civil Engineering

Committee Chair/Advisor

Dr. Tuyen (Robert) Le

Committee Member

Dr. Kalyan R. Piratla

Committee Member

Dr. Kapil Chalil Madathil

Committee Member

Dr. Da Li


Contract documents are a critical legal component of a construction project; they specify the owner's wishes and expectations for the design, construction, and handover of the project. A single contract package, especially for a design-build (DB) project, comprises hundreds of documents containing thousands of requirements. Precise comprehension and management of these requirements are critical to ensure that all important explicit and implicit requirements in the project scope are captured, managed, and fulfilled. Because requirements are written mainly in natural language, current manual methods impose a significant burden on practitioners, who must process and restructure them into a manageable format at different construction stages. Manual processing is also prone to human error, which can result in costly delays and legal disputes. With advances in natural language processing (NLP), several efforts have been made to automate requirement processing and management. However, the existing automated models are highly domain-specific and application-oriented, and they apply to quantitative requirements only. Because those models were trained on specific datasets, categories, and rules, their applicability is limited to particular applications. To address these gaps, this study proposes a novel requirement digitalization framework that uses NLP techniques to process and restructure the requirements in contracts.
The proposed framework comprises four main models: (1) an NLP-based binary text classification model that leverages rules and machine learning algorithms to extract all requirements from construction contracts, (2) an NLP-based multiclass text classification model that classifies the requirements into categories (such as design, construction, and operation and maintenance), (3) a syntactic rule-based requirement tagging model that uses NLP to extract project-activity information (such as actor, action, and object) from the requirements, and (4) a semantic NLP-based requirement prioritization model that ranks requirements by severity level. The models were evaluated using metrics including accuracy, precision, recall, and F-score. The evaluations were performed on datasets of unseen requirements extracted from the contracts of real DB projects. The effectiveness of the proposed models was further investigated through experimental studies comparing their performance with that of humans. The proposed models achieved performance ranging from 80% to 96%.
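
As a rough illustration of the binary requirement-extraction step and the evaluation metrics named above, the following Python sketch flags requirement sentences with a single keyword rule and then computes accuracy, precision, recall, and F-score. The keyword list, the toy sentences, and the function names are assumptions made for illustration only; the dissertation's model combines rules with machine learning algorithms and is not reproduced here.

```python
import re

# Assumed obligation keywords; a hypothetical rule, not the dissertation's rule set.
OBLIGATION_PATTERN = re.compile(
    r"\b(shall|must|is required to|are required to)\b", re.IGNORECASE
)


def is_requirement(sentence: str) -> bool:
    """Return True if the sentence contains an assumed obligation keyword."""
    return OBLIGATION_PATTERN.search(sentence) is not None


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for boolean labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labeled sentences (invented for this sketch, not taken from real contracts).
labeled = [
    ("The Design-Builder shall submit the drainage report before construction.", True),
    ("The project is located along the existing highway corridor.", False),
    ("All concrete must be cured for at least seven days.", True),
    # An implicit requirement that a keyword rule alone does not catch.
    ("The Owner expects the contractor to restore disturbed areas.", True),
]

y_true = [label for _, label in labeled]
y_pred = [is_requirement(text) for text, _ in labeled]
metrics = binary_metrics(y_true, y_pred)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

The fourth toy sentence shows why a keyword rule on its own can miss implicit requirements, which lowers recall while leaving precision unaffected in this small example.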

Author ORCID Identifier


Available for download on Wednesday, April 12, 2023