Master of Science in Computer Science Project

I finished all the work for a Master of Science in Computer Science at Boise State University in March 2013. The classwork was done before that, but that is when I defended the culminating project.

The project involved writing two programs, pct and Bpct, for recursive directory copying on generic parallel distributed file systems such as Lustre, Gluster, PanFS, and PVFS. The pct program is written in C and submits a PBS job to perform the work. The master MPI process uses multiple threads to traverse the directory structure and uses MPI to distribute the work to the consumer processes. The Bpct program is written in Java. It does the directory traversal at the command line, also using multiple threads, then divides the work evenly and submits multiple Bash script jobs to PBS to perform it. Both programs can also track the progress of the jobs they submit.

There apparently were no readily available utilities that exploit this parallelism for copying. Since these file systems are often used for very large data sets, copying with a single cp process means a rather long wait.

I used a small cluster at BSU with a Lustre file system and two clusters at Idaho National Laboratory (INL). One INL cluster had 2,048 cores and the other had 12,512 cores; both used a PanFS file system.

Rather than explain everything here, both what I did and why it was needed, I will just include the abstract below and give a link to the permanent location of the entire 77-page write-up. (Some of those 77 pages are mandatory academic filler, but it is still a fairly substantial report.)
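To make the pct idea more concrete, here is a minimal single-machine sketch in Java: several threads traverse the tree, and copying work goes to whichever thread is free. It is a stand-in under stated assumptions, not pct's code. pct uses a multithreaded master plus MPI consumer processes spread across nodes, while this sketch swaps in a java.util.concurrent ForkJoinPool with work stealing inside one JVM. The class name ParallelCopySketch and its THREADS argument are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical sketch: parallel recursive copy on one machine (not pct itself).
public class ParallelCopySketch {

    // Handles one directory: creates it at the destination, then turns every
    // subdirectory and every non-directory entry into its own subtask so that
    // idle pool threads can steal pending work.
    static final class CopyDir extends RecursiveAction {
        private final Path src, dst;
        CopyDir(Path src, Path dst) { this.src = src; this.dst = dst; }

        @Override
        protected void compute() {
            List<RecursiveAction> subtasks = new ArrayList<>();
            try {
                Files.createDirectories(dst); // parent must exist before children copy
                try (DirectoryStream<Path> entries = Files.newDirectoryStream(src)) {
                    for (Path entry : entries) {
                        Path target = dst.resolve(entry.getFileName().toString());
                        // Do not follow symlinks, so a link cannot create a cycle.
                        if (Files.isDirectory(entry, LinkOption.NOFOLLOW_LINKS)) {
                            subtasks.add(new CopyDir(entry, target));
                        } else {
                            subtasks.add(new CopyFile(entry, target));
                        }
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            invokeAll(subtasks); // fork the subtasks and wait for all of them
        }
    }

    // Copies a single non-directory entry (a symlink is copied as a link).
    static final class CopyFile extends RecursiveAction {
        private final Path src, dst;
        CopyFile(Path src, Path dst) { this.src = src; this.dst = dst; }

        @Override
        protected void compute() {
            try {
                Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING,
                        StandardCopyOption.COPY_ATTRIBUTES, LinkOption.NOFOLLOW_LINKS);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Copies the tree rooted at src to dst using a pool of `threads` workers.
    public static void copyTree(Path src, Path dst, int threads) {
        ForkJoinPool pool = new ForkJoinPool(threads);
        try {
            pool.invoke(new CopyDir(src, dst));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println("usage: ParallelCopySketch SRC DST THREADS");
            System.exit(2);
        }
        copyTree(Path.of(args[0]), Path.of(args[1]), Integer.parseInt(args[2]));
    }
}
```

With Java 11 or later it can be run straight from source, for example `java ParallelCopySketch.java SRC DST 16`. Making every file its own subtask keeps the scheduling fully dynamic, which suits a tree with very many files; a real tool would likely batch small files to reduce per-task overhead.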

Here is a link to a PDF of the slides I used when defending the project before my committee: DefenseSlides.pdf. Unfortunately, Google opens it in its own viewer, which does not retain the coloring and some of the details. To get the coloring at nearly full quality, click the link to open it in the Google viewer, then use the print options to save it as a file on your computer, and view that file in Acrobat Reader. The result is almost as good as the original.


Permanent Boise State University ScholarWorks Location

This site has a download of the complete write-up, formatted as required by BSU: http://scholarworks.boisestate.edu/cs_gradproj/4/


Abstract

Title: Parallel Copying Tools for Distributed File Systems

Parallel distributed file systems are increasingly being used on clusters to allow greater throughput of data to the many compute nodes. They are also an effective way to store massive amounts of data. However, using the standard core utility cp does not make good use of the potential parallelism of the file systems. Using multiple cp commands has inherent problems too.

Two utilities were created to help recursively copy directories containing large amounts of data on parallel distributed file systems. One of the test data sets contains very many files, and the other contains large files. One utility is a C program that submits a single job on a user-specified number of nodes. The work of copying the files is dynamically distributed among those nodes using MPI communications. Multiple threads are used to traverse the directories. Speedups of 9.57 and 7.36 were attained for the many files set and the large files set, respectively. A second utility is written in Java. It also uses multiple threads to traverse the directories, but it performs the copying by creating Bash scripts and submitting them to the job scheduler. The work is balanced among those scripts and the number of jobs is specified by the user. It reached speedups of 3.67 and 7.32 for the same two data sets. Both utilities can also be used to track the progress of the jobs they have submitted.
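Bpct, by contrast, balances the work ahead of time, before any job starts. The abstract does not say how, so the following Java sketch assumes one common greedy heuristic: sort the files by size, largest first, and give each one to whichever job currently has the fewest bytes. Bpct's actual rule may well differ. The names BalanceSketch and partition are hypothetical, and writing the Bash scripts from each bucket is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a static, size-balanced split of files into jobs.
public class BalanceSketch {

    // Splits files (given by size in bytes) into `jobs` buckets with roughly
    // equal byte totals. Returns the file indices assigned to each bucket.
    public static List<List<Integer>> partition(long[] sizes, int jobs) {
        if (jobs < 1) {
            throw new IllegalArgumentException("jobs must be >= 1");
        }
        long[] load = new long[jobs];                  // bytes assigned to each bucket
        List<List<Integer>> buckets = new ArrayList<>();
        for (int j = 0; j < jobs; j++) {
            buckets.add(new ArrayList<>());
        }

        // Min-heap of bucket numbers, lightest first; ties go to the lower number.
        PriorityQueue<Integer> lightest = new PriorityQueue<>(
                Comparator.comparingLong((Integer k) -> load[k]).thenComparingInt(k -> k));
        for (int j = 0; j < jobs; j++) {
            lightest.add(j);
        }

        // Visit files from largest to smallest.
        Integer[] order = new Integer[sizes.length];
        for (int i = 0; i < sizes.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Long.compare(sizes[b], sizes[a]));

        for (int i : order) {
            int j = lightest.poll();   // remove the lightest bucket before changing its load
            buckets.get(j).add(i);
            load[j] += sizes[i];
            lightest.add(j);           // put it back with its new load
        }
        return buckets;
    }

    public static void main(String[] args) {
        long[] sizes = {900, 700, 500, 400, 300, 200, 100};
        List<List<Integer>> jobs = partition(sizes, 3);
        for (int j = 0; j < jobs.size(); j++) {
            long total = 0;
            for (int i : jobs.get(j)) {
                total += sizes[i];
            }
            System.out.println("job " + j + ": files " + jobs.get(j) + ", bytes " + total);
        }
    }
}
```

Running it prints `job 0: files [0, 5], bytes 1100`, `job 1: files [1, 4], bytes 1000`, and `job 2: files [2, 3, 6], bytes 1000`. Unlike a dynamic scheme, a static split like this cannot rebalance once the jobs are running if one of them turns out slower than expected.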

Kevin Matthew Nuss, Mar 22, 2013, 12:11 PM