Date of Award


Document Type


Degree Name

Doctor of Philosophy (PhD)

Legacy Department

Electrical Engineering


Brooks, Richard R

Committee Member

Brooks, Richard R

Committee Member

Russell, Harlan B

Committee Member

Hoover, Adam

Committee Member

Lund, Robert


Network traffic analysis is widely used to infer information from Internet
traffic. This is possible even if the traffic is encrypted. Previous work uses
traffic characteristics, such as port numbers, packet sizes, and frequency,
without looking for more subtle patterns in the network traffic. In this work,
we use stochastic grammars, namely hidden Markov models (HMMs) and probabilistic
context-free grammars (PCFGs), as pattern recognition tools for traffic
analysis.
HMMs are widely used for pattern recognition and detection. We use an HMM
inference approach. With inferred HMMs, we use confidence intervals (CIs) to
detect whether a data sequence matches the HMM. To compare HMMs, we define a
normalized Markov metric. A statistical test is used to determine model
equivalence. Our metric systematically removes the least likely events from both
HMMs until the remaining models are statistically equivalent. This defines the
distance between models. We extend the use of HMMs to PCFGs, which have more
expressive power. We estimate PCFG production probabilities from data. A
statistical test is used for detection.
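The confidence-interval detection step described above can be illustrated with a minimal sketch. It makes several simplifying assumptions that are not taken from this work: the model is reduced to an observable first-order Markov chain (states equal symbols), transition probabilities are estimated by relative frequency, and the acceptance region is a normal-approximation interval around each model probability. The function names `estimate_transitions` and `matches_model` and the threshold `z` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def estimate_transitions(seq):
    """Relative-frequency estimate of P(next symbol | current symbol)."""
    pair_counts = Counter(zip(seq, seq[1:]))
    state_counts = Counter(seq[:-1])
    probs = defaultdict(dict)
    for (a, b), n in pair_counts.items():
        probs[a][b] = n / state_counts[a]
    return probs


def matches_model(model, seq, z=1.96):
    """Return True if the transition frequencies observed in seq are
    consistent with the model.

    Assumption: for each model transition a -> b with probability p, the
    observed frequency must lie within z * sqrt(p * (1 - p) / n_a) of p,
    where n_a is the number of times state a was left in seq.
    """
    pair_counts = Counter(zip(seq, seq[1:]))
    state_counts = Counter(seq[:-1])

    # Any transition the model cannot produce is an immediate mismatch.
    for (a, b) in pair_counts:
        if b not in model.get(a, {}):
            return False

    # Compare each observed frequency with its model probability.
    for a, n_a in state_counts.items():
        for b, p_model in model[a].items():
            p_hat = pair_counts[(a, b)] / n_a
            half_width = z * math.sqrt(p_model * (1.0 - p_model) / n_a)
            if abs(p_hat - p_model) > half_width:
                return False
    return True


# Hypothetical usage: the symbols stand in for discretized traffic features.
model = estimate_transitions(list("ABAC" * 100))
same_source = matches_model(model, list("ABAC" * 25))
other_source = matches_model(model, list("ABABABAC" * 25))
```

In this toy setting, a sequence with the same transition statistics as the training data is accepted, while one whose branching frequencies differ markedly is rejected. A full HMM treatment would also have to infer hidden states from data, which this sketch does not attempt.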
We present three applications of HMM and PCFG detection to network traffic
analysis. First, we infer the presence of protocol tunneling through the Tor
(The Onion Router) anonymization network. The Markov metric quantifies the similarity
of network traffic HMMs in Tor to identify the protocol. It also measures
communication noise in the Tor network.
Second, we use HMMs to detect centralized botnet traffic. We infer HMMs from botnet
traffic data and use them to detect botnet infections. Experimental results show that HMMs
can accurately detect Zeus botnet traffic.
Third, we use PCFGs to detect peer-to-peer (P2P) botnet traffic. To better hide
their locations, newer botnets use P2P control structures, and hierarchical P2P
botnets contain recursive and hierarchical patterns. Experimentation on real-world traffic data
shows that PCFGs can accurately differentiate between P2P botnet traffic and
normal Internet traffic.
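As an illustration of estimating PCFG production probabilities from data, the following minimal sketch uses relative-frequency counts over observed rule applications. It assumes that derivations (the sequence of productions used) are already available, which may differ from the estimation procedure used in this work. The function names, the toy grammar, and the symbols `S`, `a`, and `b` are hypothetical.

```python
import math
from collections import Counter


def estimate_pcfg(rule_uses):
    """Estimate P(lhs -> rhs) as count(lhs -> rhs) / count(lhs).

    rule_uses is an iterable of (lhs, rhs) pairs, where rhs is a tuple
    of symbols, one pair per production applied in the observed derivations.
    """
    rule_counts = Counter(rule_uses)
    lhs_counts = Counter()
    for (lhs, _rhs), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


def derivation_log_prob(probs, derivation):
    """Log-probability of a derivation: the sum of its rule log-probabilities.

    Returns -inf if the derivation uses a production the grammar lacks.
    """
    total = 0.0
    for rule in derivation:
        p = probs.get(rule, 0.0)
        if p == 0.0:
            return float("-inf")
        total += math.log(p)
    return total


# Toy recursive grammar: S -> a S b (nesting) and S -> a b (base case).
recurse = ("S", ("a", "S", "b"))
base = ("S", ("a", "b"))
probs = estimate_pcfg([recurse, recurse, recurse, base])
score = derivation_log_prob(probs, [recurse, base])
```

The recursive production `S -> a S b` generates nested, matched structure of arbitrary depth. That is the kind of pattern a PCFG can express and a finite-state model cannot. A detector could compare such a score against a threshold, though the statistical test used in this work is not reproduced here.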