Date of Award
Master of Science (MS)
Electrical and Computer Engineering (Holcomb Dept. of)
Dr. Kuang-Ching Wang, Committee Chair
Dr. Harlan Russell
Dr. Ronald Gimbel
There have been efforts to improve network performance using software-defined networking solutions. One such work is Steroid OpenFlow Service (SOS), which uses multiple parallel TCP connections to enhance network performance transparently to the user. SOS has shown significant improvements in memory-to-memory data transfer throughput; however, its performance for disk-to-disk data transfer has not been studied.
For computing applications involving big data, the data files are stored on non-volatile storage devices separate from the computing servers. Before computing can occur, large volumes of data must be fetched from the "remote" storage devices to the computing server's local storage device. Since hard drives are the most commonly adopted storage devices today, the process is often called "disk-to-disk" data transfer. In production high-performance computing facilities, specialized high-throughput data transfer software is provided so that users can copy the data first to a data transfer node before copying it to the computing server.
The throughput of a disk-to-disk data transfer depends on the network throughput between servers and on the disk access performance between each server and its storage device. Because the data sizes are large, the storage devices are typically parallel file systems spanning multiple disks. A disk-to-disk transfer involves both disk read and disk write operations. First, the data is read from the disks at the source and stored in memory. Second, the data is sent out to the network through the network interface. Finally, the data reaching the destination server is written to disk. Each of these steps adds delay and can limit the throughput of the overall transfer. To date, one commonly adopted data transfer solution is GridFTP, developed by the Argonne National Laboratory. It requires custom application installations and configurations on the hosts.
SOS, on the other hand, is a transparent network application that requires no special user software. In this thesis, disk-to-disk data transfer performance is studied with both GridFTP and SOS. The thesis focuses on two topics: a detailed analysis of the transfer components of each tool, and a systematic experiment to study the two. The experiments and the analysis of the results show that configuring the data nodes and the network with correct parameters yields maximum performance for disk-to-disk data transfer. GridFTP, for example, reaches close to 7 Gbps using four parallel connections with a TCP buffer size of 16 MB. It achieves this maximum by filling the network pipe, which is a 10 Gbps end-to-end link with a round-trip time (RTT) of 53 ms.
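The relationship between the buffer size, the number of parallel connections, and "filling the network pipe" can be illustrated with a bandwidth-delay product (BDP) calculation. The sketch below is not taken from the thesis: it assumes an idealized model in which each TCP connection can have at most one buffer's worth of data in flight per round trip, it reads 16 MB as 16 MiB, and it ignores packet loss, congestion control dynamics, and disk speed. The function and constant names are illustrative only.

```python
# Idealized window-limit arithmetic for the link quoted in the abstract.
# Assumptions (not from the thesis): one buffer of data in flight per RTT
# per connection, no loss, no disk bottleneck, 16 MB read as 16 MiB.

def bdp_bytes(link_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep the link fully utilized."""
    return link_bps * rtt_s / 8


def window_limited_bps(buffer_bytes: float, rtt_s: float, streams: int = 1) -> float:
    """Upper bound on throughput when each stream sends one buffer per RTT."""
    return streams * buffer_bytes * 8 / rtt_s


LINK_BPS = 10e9              # 10 Gbps end-to-end link
RTT_S = 0.053                # 53 ms round-trip time
BUFFER_BYTES = 16 * 2**20    # 16 MiB TCP buffer per connection (assumed unit)
STREAMS = 4                  # four parallel connections

bdp = bdp_bytes(LINK_BPS, RTT_S)
single = window_limited_bps(BUFFER_BYTES, RTT_S)
aggregate = window_limited_bps(BUFFER_BYTES, RTT_S, STREAMS)
ceiling = min(LINK_BPS, aggregate)   # the link itself caps the result

print(f"BDP:                  {bdp / 1e6:.2f} MB")
print(f"One stream, at most:  {single / 1e9:.2f} Gbps")
print(f"Four streams, at most: {aggregate / 1e9:.2f} Gbps")
print(f"Network ceiling:      {ceiling / 1e9:.2f} Gbps")
```

Under these assumptions the pipe holds about 66 MB, a single 16 MiB connection tops out near 2.5 Gbps, and four connections together are just enough to cover the 10 Gbps link. The reported figure of close to 7 Gbps sits below that ceiling, which is consistent with the abstract's point that disk access also limits a disk-to-disk transfer.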
Zulfiqar, Junaid, "Disk-to-Disk Data Transfer using A Software Defined Networking Solution" (2018). All Theses. 2896.