A Podcast for HPC Folk

Podcast

RCE 12: BLCR

Brock Palen and Jeff Squyres speak with Paul Hargrove of the Berkeley Lab Checkpoint/Restart (BLCR) project, which supports checkpointing, restarting, and migrating HPC applications.

Notes:  All library code is LGPL; kernel module and the (small) user-space utils are GPL.  There is also support for BLCR in SLURM 2.0 which is not mentioned in the show.

Since 2000, Paul has been a Principal Investigator in the Future Technologies Group (FTG) at Lawrence Berkeley National Laboratory (LBNL).  His general area of work can be described as systems software and runtime environments for High Performance Computing (HPC). His current research interests include Checkpoint/Restart, Partitioned Global Address Space (PGAS) languages, and high-performance cluster networks.  Current projects include Berkeley Lab Checkpoint/Restart (BLCR) for Linux, Global Address Space Networking (GASNet), and Berkeley Unified Parallel C (UPC).  Paul received his Ph.D. from Stanford University in 2003.

RCE 11: UM Atlas

Brock Palen and Jeff Squyres speak with Shawn McKee from the University of Michigan about the ATLAS (atlas.ch) detector, part of the Large Hadron Collider (lhc.web.cern.ch/lhc/) at CERN.

Panda Information: https://twiki.cern.ch/twiki/bin/view/Atlas/Panda
Panda Production Interface: http://panda.cern.ch:25980/server/pandamon/query 

Shawn McKee (Ph.D., UM 1991) is a high-energy astrophysicist and research scientist at the University of Michigan. He is currently a member of the ATLAS experiment at the Large Hadron Collider at CERN. His work has been central to advancing computing technology to address the simulation and data analysis requirements of ATLAS. ATLAS, with its multi-petabyte-per-year data flow, represents a significant challenge even for the computing infrastructure of 2009. Since 2006, Shawn has been the Director of the ATLAS Great Lakes Tier-2 (AGLT2) computing center, located at the University of Michigan and Michigan State University, one of five such centers in the US providing computing resources to support ATLAS simulation and data analysis.  In 2001, he was appointed Network Project Manager for US ATLAS to plan for and develop the necessary network environment to support the US ATLAS computing model. He is co-chair of the High-Energy and Nuclear Physics (HENP) Internet2 Working Group, which is addressing similar problems in the context of all high-energy and nuclear physics experiments.

RCE 10: SLURM

Brock Palen and Jeff Squyres speak with Moe Jette and Danny Auble of LLNL about the SLURM Resource Manager.

Danny Auble joined the SLURM team in 2004 to develop the port to the IBM Blue Gene infrastructure.  Since then he has been the primary developer of many parts of SLURM, including the tree fanout used for communication with the slurmd daemons, the accounting system, and the multifactor priority plugin.  He began working at Lawrence Livermore National Laboratory in 2001, soon after graduating from Brigham Young University.

Morris Jette has been the SLURM project leader since its inception in 2001. He began systems programming at Lawrence Livermore National Laboratory in 1980, helping to develop the Cray Time-Sharing System (CTSS) operating system. He has spent most of the past 20 years working on computing scheduling issues for the HPC environment, including the development of several gang schedulers.

RCE 09: HDF5

Brock Palen and Jeff Squyres speak with Mike Folk and Quincey Koziol of The HDF Group about the HDF5 file API.

Mike Folk is President and Executive Director of The HDF Group. Mike led the NCSA HDF project from 1988 until 2006, when The HDF Group became an independent, non-profit company dedicated to meeting the needs of HDF users and assuring access to their data for the long haul.  Mike’s first programming job was in 1961, as a student at the University of North Carolina.  Later Mike taught high school math in the U.S. and Africa, got a PhD in CS from Syracuse University, then taught computer science at the university level for 18 years. Mike’s modest list of publications includes the book File Structures, a Conceptual Toolkit, by Folk and Zoellick (1987, 1991).

Quincey Koziol has been with The HDF Group (THG) since its founding, having started with the HDF group in 1991, when it was still part of the National Center for Supercomputing Applications.  He serves as the Director of Software Development for THG, overseeing the design and architecture of the HDF5 software, as well as providing software engineering leadership for THG.  Quincey received his Bachelor's degree in Electrical Engineering from the University of Illinois and is also pursuing his Master's degree in Computer Science there.

RCE 08: Torque Resource Manager

Brock Palen and Jeff Squyres speak with Josh Butikofer of Cluster Resources Inc. and Ake Sandgren of HPC2N about the Torque resource manager.

 
Ake has been a sysadmin for the past 20-odd years, working at HPC2N (www.hpc2n.umu.se) since '97 and before that at the Department of Computing Science at Umeå University, Sweden (www.cs.umu.se). He has been running systems of various types since '87, including iPSC/2, Alliant, IBM SP/2, and various Linux clusters.

Josh Butikofer is the Director of Grid Technologies at Cluster Resources, Inc. He is primarily involved with overseeing and participating in development of the TORQUE Resource Manager and Moab family of products. Josh has been active in the HPC software industry for several years and has been involved in improving the scalability and performance of distributed software since 1999. He graduated summa cum laude from Brigham Young University.
 

RCE 07: Cluster Planning

Brock Palen and Jeff Squyres have a short discussion with Douglas Eadline and Jeff Layton about planning the construction of HPC clusters.

Douglas Eadline, Ph.D., has worked with parallel computers since 1988 (anyone remember the Inmos Transputer?). He has a large amount of experience (and opinions) with parallel software tools and application performance. Doug has been building and using Linux clusters since 1995. One of his current interests is in Personal Clusters. Presently, he is Editor of ClusterMonkey.net, Senior HPC Editor at Linux Magazine, and an instructor/consultant.

Jeff Layton is a long-time cluster monkey and all-round cluster enthusiast, having been a customer, admin, developer, and writer; he now works for a cluster vendor, Dell, as the HPC Enterprise Technologist. Previously he worked for Panasas, Linux Networx, Lockheed Martin, Boeing, and NASA, and was a professor of Aeronautical Engineering for a time (this makes him seem much older than he really is). Jeff writes for Cluster Monkey (www.clustermonkey.net), Linux Magazine (www.linux-mag.com), Dell Tech Center (www.delltechcenter.com/page/hpcc), and his own feeble attempt at a blog - ClusterBuffer (clusterbuffer.wetpaint.com).

RCE 06: VisIt

Brock Palen and Jeff Squyres speak with Sean Ahern and Jeremy Meredith on the VisIt (http://www.llnl.gov/visit) visualization project. Be sure to check out VisIt Users (http://visitusers.org/).

 

Sean Ahern is a computer scientist and the Visualization Task Leader for the National Center for Computational Sciences at Oak Ridge National Laboratory. He is the ORNL PI of the DOE SciDAC VACET visualization center.  He was Visualization Project Leader within the Advanced Scientific Computing (ASC) program at Lawrence Livermore National Laboratory. He has extensive experience with distributed visualization and large data processing on computational clusters. He has won two R&D 100 Awards for his work on the VisIt visualization system and the Chromium cluster rendering framework. He holds degrees in Computer Science and Mathematics from Purdue University.

Jeremy Meredith is a computer scientist in the Future Technologies Group at Oak Ridge National Laboratory, where his research interests include scientific visualization and emerging computing architectures. He received his MS in Computer Science from Stanford University and his BS from the University of Illinois at Urbana-Champaign, and he was a founding developer of the VisIt visualization system at Lawrence Livermore National Laboratory. Jeremy is a winner of the 2008 ACM Gordon Bell Prize and a 2005 R&D 100 Award.

RCE 05: Open-MX

Brock Palen and Jeff Squyres speak with Brice Goglin of the Open-MX (http://www.open-mx.org) project, a software implementation of Myrinet Express that provides low latency over stock Ethernet networks.

Brice Goglin is the primary developer of the Open-MX project. He works at the LaBRI laboratory in Bordeaux (France) as a researcher at INRIA (the French institute for research in computer science and control). He has been working on HPC software design for several years, especially on implementing software support for high-speed networks such as Myricom Myrinet and Myri-10G technologies. He earned his PhD at the École normale supérieure de Lyon (France) in 2005. His research topics nowadays include high-speed networking in the context of the convergence between HPC and traditional Ethernet networks, as well as the design of high-performance runtime systems for upcoming NUMA architectures.

RCE 04: Hadoop

Brock Palen and Jeff Squyres speak with Christophe Bisciglia of Cloudera (http://www.cloudera.com) and the Hadoop (http://hadoop.apache.org/) project, a free implementation of MapReduce and the Google File System.

Christophe Bisciglia joins Cloudera from Google, where he created and managed their Academic Cloud Computing Initiative. Starting in 2007, he began working with the University of Washington to teach students about Google's core data management and processing technologies - MapReduce and GFS. This quickly brought Hadoop into the curriculum, and has since resulted in an extensive partnership with the National Science Foundation (NSF) which makes Google-hosted Hadoop clusters available for research and education worldwide. Beyond his work with Hadoop, he holds patents related to search quality and personalization, and spent a year working in Shanghai. Christophe earned his degree, and remains a visiting scientist, at the University of Washington.

 