Date of Award

12-2012

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Legacy Department

Electrical Engineering

Committee Chair/Advisor

Brooks, Richard R

Committee Member

Brooks, Richard R

Committee Member

Russell, Harlan B

Committee Member

Hoover, Adam

Committee Member

Lund, Robert

Abstract

Network traffic analysis is widely used to infer information from Internet
traffic, even when the traffic is encrypted. Previous work uses traffic
characteristics, such as port numbers, packet sizes, and frequency, without
looking for more subtle patterns in the network traffic. In this work, we use
stochastic grammars, namely hidden Markov models (HMMs) and probabilistic
context-free grammars (PCFGs), as pattern recognition tools for traffic
analysis.
HMMs are widely used for pattern recognition and detection. We use an HMM
inference approach, and with the inferred HMMs we use confidence intervals (CIs)
to detect whether a data sequence matches an HMM. To compare HMMs, we define a
normalized Markov metric that uses a statistical test to determine model
equivalence: it systematically removes the least likely events from both HMMs
until the remaining models are statistically equivalent, and this process
defines the distance between the models. We extend this approach to PCFGs,
which have more expressive power than HMMs. We estimate PCFG production
probabilities from data and use a statistical test for detection.
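As an illustration of confidence-interval detection with an HMM, the sketch below scores a sequence by its per-symbol log-likelihood (scaled forward algorithm) and accepts it if the score falls inside an interval built from training sequences. This is a hypothetical sketch, not the dissertation's procedure: the discrete-HMM representation, the scoring function, the normal-approximation interval (mean plus or minus 1.96 standard deviations), the names `per_symbol_log_likelihood`, `confidence_interval`, and `matches`, and the toy model values are all assumptions.

```python
import math
from statistics import mean, stdev

# Hypothetical discrete HMM as (start, trans, emit):
# start[i] = P(state i at t=0), trans[i][j] = P(j | i), emit[i][k] = P(symbol k | i).


def per_symbol_log_likelihood(seq, model):
    """Scaled forward algorithm: log P(seq | model) divided by len(seq)."""
    start, trans, emit = model
    n = len(start)
    alpha = [start[i] * emit[i][seq[0]] for i in range(n)]
    total = 0.0
    for t in range(len(seq)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][seq[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return -math.inf  # sequence is impossible under this model
        total += math.log(scale)  # log P(seq) is the sum of log scale factors
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return total / len(seq)


def confidence_interval(training_seqs, model, z=1.96):
    """Normal-approximation interval over training scores (an assumption)."""
    scores = [per_symbol_log_likelihood(s, model) for s in training_seqs]
    m, s = mean(scores), stdev(scores)
    return m - z * s, m + z * s


def matches(seq, model, interval):
    """A sequence matches the HMM if its score lies inside the interval."""
    lo, hi = interval
    return lo <= per_symbol_log_likelihood(seq, model) <= hi


# Toy 2-state model over symbols 0 and 1 (e.g. small vs. large packets).
model = ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.1, 0.9]])
training = [[0] * 8, [1] * 8, [0, 0, 0, 0, 1, 1, 1, 1], [1, 1, 1, 1, 0, 0, 0, 0]]
ci = confidence_interval(training, model)
print(matches([0, 1] * 4, model, ci))  # False: rapid switching fits poorly
```

Normalizing by sequence length lets sequences of different lengths share one interval; a real detector would calibrate the interval on much larger training data.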
We present three applications of HMM and PCFG detection to network traffic
analysis. First, we infer the presence of protocol tunneling through the Tor
(The Onion Router) anonymization network. The Markov metric quantifies the
similarity of network traffic HMMs in Tor to identify the tunneled protocol. It
also measures communication noise in the Tor network.
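The idea behind the normalized Markov metric (prune the least likely events until a statistical test accepts equivalence) can be illustrated on a simplified case. The sketch below is hypothetical and is not the metric defined in the dissertation: it compares two event-count tables rather than full HMMs, uses a chi-square test of homogeneity (with the Wilson-Hilferty approximation for the critical value) as the equivalence test, and reports the average probability mass removed as the distance. The function names and counts are illustrative assumptions.

```python
from statistics import NormalDist


def chi2_critical(df, alpha):
    """Wilson-Hilferty approximation of the chi-square upper critical value."""
    z = NormalDist().inv_cdf(1 - alpha)
    h = 2.0 / (9.0 * df)
    return df * (1 - h + z * h ** 0.5) ** 3


def equivalent(counts_a, counts_b, events, alpha):
    """Chi-square homogeneity test on the 2 x k table restricted to `events`."""
    cols = [e for e in events if counts_a.get(e, 0) + counts_b.get(e, 0) > 0]
    if len(cols) <= 1:
        return True  # nothing left to distinguish
    row_a = sum(counts_a.get(e, 0) for e in cols)
    row_b = sum(counts_b.get(e, 0) for e in cols)
    if row_a == 0 or row_b == 0:
        return False  # one model has no mass left on these events
    n = row_a + row_b
    stat = 0.0
    for e in cols:
        col = counts_a.get(e, 0) + counts_b.get(e, 0)
        for obs, row in ((counts_a.get(e, 0), row_a), (counts_b.get(e, 0), row_b)):
            expected = row * col / n
            stat += (obs - expected) ** 2 / expected
    return stat <= chi2_critical(len(cols) - 1, alpha)


def markov_distance(counts_a, counts_b, alpha=0.05):
    """Prune least likely events until equivalent; return mean mass removed."""
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())

    def p_a(e):
        return counts_a.get(e, 0) / total_a

    def p_b(e):
        return counts_b.get(e, 0) / total_b

    remaining = set(counts_a) | set(counts_b)
    removed_a = removed_b = 0.0
    while not equivalent(counts_a, counts_b, remaining, alpha):
        # Least likely event under both models combined; ties broken by name.
        e = min(remaining, key=lambda ev: (p_a(ev) + p_b(ev), ev))
        removed_a += p_a(e)
        removed_b += p_b(e)
        remaining.remove(e)
    return (removed_a + removed_b) / 2  # always in [0, 1]


# Hypothetical event counts (e.g. packet-size symbols) from two traces.
same = markov_distance({"a": 50, "b": 30, "c": 20}, {"a": 50, "b": 30, "c": 20})
diff = markov_distance({"a": 50, "b": 50}, {"a": 50, "b": 5, "c": 45})
print(same)  # 0.0
print(round(diff, 3))  # 0.5
```

Because each model loses at most all of its probability mass, averaging the two removed masses keeps the distance between 0 (equivalent as-is) and 1.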
Second, we use HMMs to detect centralized botnet traffic. We infer HMMs from
botnet traffic data and use them to detect botnet infections. Experimental
results show that HMMs can accurately detect Zeus botnet traffic.
Finally, we turn to newer botnets, which use peer-to-peer (P2P) control
structures to better hide their locations. Hierarchical P2P botnets produce
recursive and hierarchical traffic patterns, so we use PCFGs to detect P2P
botnet traffic. Experiments on real-world traffic data show that PCFGs can
accurately differentiate between P2P botnet traffic and normal Internet traffic.
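Estimating PCFG production probabilities from data is simplest when derivations (parse trees) of the training traffic are available: the maximum-likelihood estimate of P(A -> beta) is count(A -> beta) / count(A). The sketch below assumes such derivations are given and scores a new derivation by its log-probability; when only token strings are observed, the inside-outside algorithm is the usual alternative. The grammar, the names `estimate_pcfg` and `derivation_log_prob`, and the data are hypothetical, and the statistical detection test mentioned in the abstract is not implemented here.

```python
import math
from collections import Counter


def estimate_pcfg(derivations):
    """Relative-frequency (maximum-likelihood) production probabilities.

    Each derivation is a list of rules (lhs, rhs), where rhs is a tuple of symbols.
    """
    rule_counts = Counter(rule for d in derivations for rule in d)
    lhs_counts = Counter()
    for (lhs, _rhs), count in rule_counts.items():
        lhs_counts[lhs] += count
    return {rule: count / lhs_counts[rule[0]] for rule, count in rule_counts.items()}


def derivation_log_prob(derivation, probs):
    """Log-probability of a derivation: sum of log production probabilities."""
    total = 0.0
    for rule in derivation:
        p = probs.get(rule, 0.0)
        if p == 0.0:
            return -math.inf  # uses a production never seen in training
        total += math.log(p)
    return total


# Hypothetical grammar: a session S is one or more peer exchanges P.
d1 = [("S", ("P", "S")), ("P", ("query", "reply")), ("S", ("P",)), ("P", ("query", "reply"))]
d2 = [("S", ("P",)), ("P", ("query", "reply"))]
probs = estimate_pcfg([d1, d2])
print(probs[("S", ("P", "S"))])  # 1/3
print(derivation_log_prob(d2, probs))  # log(2/3), about -0.405
```

The recursive rule S -> P S is what lets the grammar capture nested, repeated exchanges that a finite-state model would have to unroll.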

Recommended Citation

Lu, Chen, "Network Traffic Analysis Using Stochastic Grammars" (2012). All Dissertations. 1059.
https://tigerprints.clemson.edu/all_dissertations/1059

Included in

Electrical and Computer Engineering Commons
