Brock Palen and Jeff Squyres speak with Paul Hargrove of the Berkeley Lab Checkpoint/Restart (BLCR) project, a system for checkpointing, restarting, and migrating HPC applications.
Notes: All BLCR library code is licensed under the LGPL; the kernel module and the (small) user-space utilities are licensed under the GPL. SLURM 2.0 also supports BLCR, although this is not mentioned in the show.
Since 2000, Paul has been a Principal Investigator in the Future Technologies Group (FTG) at Lawrence Berkeley National Laboratory (LBNL). His general area of work can be described as systems software and runtime environments for High Performance Computing (HPC). His current research interests include Checkpoint/Restart, Partitioned Global Address Space (PGAS) languages, and high-performance cluster networks. Current projects include Berkeley Lab Checkpoint/Restart (BLCR) for Linux, Global Address Space Networking (GASNet), and Berkeley Unified Parallel C (UPC). Paul received his Ph.D. from Stanford University in 2003.