Document Type

Conference Proceeding

Publication Date

5-2017

Publisher

IEEE, IFIP, IEEE Communications Society

Abstract

The industry-standard Packet CAPture (PCAP) format for storing network packet traces can normally only be read serially, because it lacks delimiters, indexing, or blocking. This is a challenge for parallel analysis of large networks, where packet traces can be many gigabytes in size. In this work we present RAPCAP, a novel method for random access into variable-length record collections such as PCAP, which identifies a record boundary within a small number of bytes of the access point. Unlike related heuristic methods, whose nonzero probability of error can limit scalability, the new method guarantees correctness for a well-formed file and does not rely on prior knowledge of its contents. We include a practical implementation of the algorithm as an extension to the Hadoop framework, along with a performance comparison against serial ingestion. Finally, we present several similar storage formats that could use a modified version of RAPCAP for random access.
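Why PCAP is serial-only, and why heuristic boundary recovery can fail, can be seen in a minimal Python sketch. It assumes the classic little-endian libpcap layout with microsecond timestamps (a 24-byte global header followed by records, each with a 16-byte header whose incl_len field gives the captured length). `serial_offsets` walks the file the only way the format permits; `heuristic_boundary` illustrates a generic plausibility-and-chaining heuristic of the kind the abstract contrasts with. All function names are hypothetical, and this is not RAPCAP, whose algorithm the abstract does not describe.

```python
import struct
from typing import List, Optional

# Assumed: classic libpcap layout, little-endian, microsecond timestamps.
GLOBAL_HDR = struct.Struct("<IHHiIII")  # magic, ver_major, ver_minor, thiszone, sigfigs, snaplen, network
RECORD_HDR = struct.Struct("<IIII")     # ts_sec, ts_usec, incl_len, orig_len
MAGIC_USEC = 0xA1B2C3D4


def build_pcap(payloads: List[bytes], snaplen: int = 65535) -> bytes:
    """Hypothetical helper: assemble a tiny in-memory trace for experiments."""
    out = GLOBAL_HDR.pack(MAGIC_USEC, 2, 4, 0, 0, snaplen, 1)  # linktype 1 = Ethernet
    for i, payload in enumerate(payloads):
        out += RECORD_HDR.pack(1000 + i, 500, len(payload), len(payload)) + payload
    return out


def serial_offsets(buf: bytes) -> List[int]:
    """Find every record header the only way the format allows: walk from the
    start of the file, skipping each record by its incl_len field."""
    if GLOBAL_HDR.unpack_from(buf, 0)[0] != MAGIC_USEC:
        raise ValueError("sketch only handles little-endian microsecond PCAP")
    offsets, pos = [], GLOBAL_HDR.size
    while pos + RECORD_HDR.size <= len(buf):
        offsets.append(pos)
        incl_len = RECORD_HDR.unpack_from(buf, pos)[2]
        pos += RECORD_HDR.size + incl_len
    return offsets


def plausible_header(buf: bytes, pos: int, snaplen: int) -> bool:
    """Heuristic test: do the 16 bytes at `pos` look like a record header?"""
    if pos + RECORD_HDR.size > len(buf):
        return False
    _, ts_usec, incl_len, orig_len = RECORD_HDR.unpack_from(buf, pos)
    return ts_usec < 1_000_000 and incl_len <= snaplen and incl_len <= orig_len


def heuristic_boundary(buf: bytes, start: int, snaplen: int, chain: int = 3) -> Optional[int]:
    """Scan forward from an arbitrary `start` for the first offset at which
    `chain` consecutive plausible headers line up (or the chain ends at EOF).
    The answer can be wrong: payload bytes may mimic a header chain."""
    for pos in range(start, len(buf) - RECORD_HDR.size + 1):
        p, ok = pos, True
        for _ in range(chain):
            if p == len(buf):  # a chain ending exactly at EOF is accepted
                break
            if not plausible_header(buf, p, snaplen):
                ok = False
                break
            p += RECORD_HDR.size + RECORD_HDR.unpack_from(buf, p)[2]
        if ok and p <= len(buf):  # reject chains that overrun the buffer
            return pos
    return None
```

On a three-record trace built with `build_pcap([b"\x00" * 60, b"\x11" * 40, b"\x22" * 80])`, `serial_offsets` returns `[24, 100, 156]`. Starting the heuristic scan at offset 25, one byte into the first record, returns 31: an offset inside the first record where header bytes followed by zero-filled payload happen to form a chain of plausible headers. That false positive is the kind of error the abstract says RAPCAP's correctness guarantee rules out for a well-formed file.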

Comments

Presented at the 2nd IFIP/IEEE International Workshop on Analytics for Network and Service Management (AnNet 2017), held in conjunction with the IFIP/IEEE International Symposium on Integrated Network Management, on May 8, 2017, in Lisbon, Portugal.
