"Random Access in Nondelimited Variable-length Record Collections for P" by Jason Anderson, Christopher Gropp et al.

Publications

Title

Random Access in Nondelimited Variable-length Record Collections for Parallel Reading with Hadoop

Authors

Jason Anderson, Clemson University
Christopher Gropp, Clemson University
Linh Ngo, Clemson University
Amy W. Apon, Clemson University

Document Type

Conference Proceeding

Publication Date

5-2017

Publisher

IEEE, IFIP, IEEE Communications Society

Abstract

The industry-standard Packet CAPture (PCAP) format for storing network packet traces is normally only readable in serial due to its lack of delimiters, indexing, or blocking. This presents a challenge for parallel analysis of large networks, where packet traces can be many gigabytes in size. In this work we present RAPCAP, a novel method for random access into variable-length record collections like PCAP by identifying a record boundary within a small number of bytes of the access point. Unlike related heuristic methods that can limit scalability with a nonzero probability of error, the new method offers a correctness guarantee with a well-formed file and does not rely on prior knowledge of the contents. We include a practical implementation of the algorithm with an extension to the Hadoop framework, and a performance comparison to serial ingestion. Finally, we present a number of similar storage types that could utilize a modified version of RAPCAP for random access.
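
To make the serial-read constraint concrete, the following is a minimal sketch of how record boundaries are normally found in a classic PCAP file: each record's position depends on the length field of the record before it, so a reader has to walk from the start of the file. This is not the RAPCAP algorithm, which this page does not describe. The names `record_offsets` and `build_demo_capture` are hypothetical, and the sketch assumes the classic little-endian, microsecond-resolution layout (24-byte global header, 16-byte record headers).

```python
import struct

PCAP_MAGIC_LE_USEC = 0xA1B2C3D4   # classic PCAP magic number, microsecond timestamps
GLOBAL_HEADER_LEN = 24            # magic, version, thiszone, sigfigs, snaplen, linktype
RECORD_HEADER_LEN = 16            # ts_sec, ts_usec, incl_len, orig_len


def record_offsets(data: bytes) -> list:
    """Return the byte offset of every record header by walking serially.

    Assumption: `data` holds a well-formed little-endian classic PCAP file.
    """
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic != PCAP_MAGIC_LE_USEC:
        raise ValueError("sketch only handles little-endian classic PCAP")

    offsets = []
    pos = GLOBAL_HEADER_LEN
    while pos + RECORD_HEADER_LEN <= len(data):
        _ts_sec, _ts_usec, incl_len, _orig_len = struct.unpack_from("<IIII", data, pos)
        offsets.append(pos)
        # The next boundary is only known after reading this record's length,
        # which is why a reader cannot jump to an arbitrary offset unaided.
        pos += RECORD_HEADER_LEN + incl_len
    return offsets


def build_demo_capture(payloads) -> bytes:
    """Build a tiny in-memory capture for illustration (hypothetical helper)."""
    # version 2.4, thiszone 0, sigfigs 0, snaplen 65535, linktype 1 (Ethernet)
    out = struct.pack("<IHHiIII", PCAP_MAGIC_LE_USEC, 2, 4, 0, 0, 65535, 1)
    for payload in payloads:
        out += struct.pack("<IIII", 0, 0, len(payload), len(payload)) + payload
    return out


demo = build_demo_capture([b"abc", b"hello world", b""])
offsets = record_offsets(demo)
```

With no delimiter between records, a worker handed an arbitrary byte range of such a file has no direct way to tell whether a given position is a record header or packet payload; that is the gap the abstract says RAPCAP addresses.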

Comments

Presented at the 2nd IFIP/IEEE International Workshop on Analytics for Network and Service Management (AnNet 2017), held in conjunction with the IFIP/IEEE International Symposium on Integrated Network Management.
