Resilience Workshops and Conferences
Large Conferences With Resilience Contributions
Most high-performance computing (HPC), high-end computing (HEC), and
supercomputing conferences have calls for papers related to fault tolerance,
reliability, dependability, and resilience. While they are not focused
specifically on this area, these are good venues to attend and to submit papers to.
- Supercomputing (SC) :
SC is generally accepted as the premier parallel and distributed computing
conference and tradeshow. SC is always hosted in the US.
- Dependable Systems and Networks (DSN) : DSN
is a 42+ year-old IEEE/IFIP international conference that generally alternates
yearly between US and international venues. While not focused on HPC, DSN
publishes work in the HPC area alongside work from the general dependability
community. The FTXS workshop (below), associated with DSN, focuses on HPC.
- ACM Symposium on High-Performance Parallel
and Distributed Computing (HPDC) : HPDC is a 20+ year-old conference in
the HPC area. It is almost always held in the US, although some recent
editions have been held internationally.
- International Symposium on Cluster, Cloud and Grid Computing (CCGrid) :
CCGrid has something of a cloud focus and has been running for 12+ years. There
does not appear to be a central CCGrid page, so the link above goes to a
specific instance of CCGrid.
- Euro-Par : Euro-Par is a yearly
conference, always held in Europe, that focuses on parallel and distributed computing.
- The Nuclear and Space Radiation Effects Conference (NSREC) : NSREC departs from the conferences above and
focuses specifically on radiation effects in electronics. Resilience
researchers interested in soft errors, silent data corruption, and transient faults
may find this conference enlightening.
Resilience and Fault-Tolerance Focused Workshops
These smaller workshops specifically focus on resilience and fault-tolerance
in the area of HPC, HEC, and supercomputing.
- Fault-Tolerance for HPC at eXtreme Scale (FTXS) : FTXS is organized by
Nathan DeBardeleben (LANL) and other collaborators. It is held with
the Dependable Systems and Networks (DSN)
conference when DSN is in the USA, and with another conference when DSN is abroad.
  - FTXS 2013 : New York City (with HPDC 2013)
  - FTXS 2012 : Boston, Massachusetts (with DSN 2012)
  - FTXS 2010 : Chicago, Illinois (with DSN 2010)
- Workshop on Resiliency in High-Performance Computing (Resilience)
  - Resilience 2012 : with Euro-Par 2012, Rhodes Island, Greece
  - Resilience 2011 : with Euro-Par 2011, Bordeaux, France
  - Resilience 2010 : with CCGrid 2010, Melbourne, Australia
  - Resilience 2009 : with HPDC 2009, Munich, Germany
  - Resilience 2008 : with CCGrid 2008, Lyon, France
- Silicon Errors in Logic - System Effects (SELSE) : SELSE is a small workshop, running for 8+ years, that is usually hosted at
Stanford or UIUC and generally addresses very low-level hardware issues.
- Fault-Tolerant Spaceborne Computing Employing New Technologies : Generally referred to as "Spacecomputing",
this workshop usually involves radiation experts from the satellite and spacecraft
field. In recent years, it has sought out expertise from the HPC field,
recognizing that both communities face many of the same soft-error problems.
- Los Alamos Computer Science Symposium (LACSS) :
LACSS was put on hold in 2011 and had not resumed at the time of writing. For
several years, however, it was a great venue for research and close collaboration.
- 2009 National HPC Workshop on Resilience : In 2009, LANL, DOD, ORNL, and DARPA hosted
an inter-agency workshop aimed at motivating a national effort to develop resilience
solutions in preparation for petascale computing and beyond.
Talks Given at LANL
Invited talks from collaborators:
Nathan DeBardeleben hosted a resilience seminar series at Los Alamos National
Laboratory in 2009. See the seminar page for specific
dates and topics. Guest speakers are always welcome.