Distributed MultiThreaded CheckPointing

Overview

Distributed MultiThreaded CheckPointing (DMTCP) is a library that adds checkpointing to your code without requiring a code rewrite. DMTCP is designed to work with serial or threaded codes, allowing users to create restart files on the fly. DMTCP will not work with GPU or MPI codes. Make sure you have sufficient storage space for any checkpoint dumps that DMTCP creates.

Usage

DMTCP is provided as a module and can be loaded with the command module load dmtcp. Users should select a specific version of DMTCP and note which version they are using, because different versions of DMTCP may not be compatible with each other. For more information, see the DMTCP documentation.
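As a rough illustration of the usage described above, the following sketch loads the module, checks free space in a checkpoint directory, and then either starts a program under DMTCP or resumes it from existing checkpoint images. The variable names CKPT_DIR, CKPT_INTERVAL, and APP, the directory dmtcp_ckpt, and the executable ./my_program are hypothetical placeholders, not names taken from this page. The dmtcp_launch and dmtcp_restart commands and the --ckptdir and --interval options come from the DMTCP command-line tools, but options can differ between versions, so confirm them with dmtcp_launch --help for the version you load.

```shell
#!/bin/bash
# Minimal sketch of a DMTCP workflow. All names marked "hypothetical"
# are assumptions made for illustration only.

CKPT_DIR="${CKPT_DIR:-$PWD/dmtcp_ckpt}"   # hypothetical checkpoint directory
CKPT_INTERVAL="${CKPT_INTERVAL:-3600}"    # assumed interval: seconds between checkpoints
APP="${APP:-./my_program}"                # hypothetical serial or threaded executable

mkdir -p "$CKPT_DIR"

# Checkpoint dumps can be large, so look at the free space first.
df -h "$CKPT_DIR"

# Load DMTCP if a module system is present. Pinning a version
# (module load dmtcp/<version>, as listed by "module avail dmtcp")
# avoids mixing incompatible DMTCP versions between checkpoint and restart.
if command -v module >/dev/null 2>&1; then
    module load dmtcp
fi

if ! command -v dmtcp_launch >/dev/null 2>&1; then
    echo "dmtcp_launch not found; is the dmtcp module loaded?" >&2
elif ls "$CKPT_DIR"/ckpt_*.dmtcp >/dev/null 2>&1; then
    # Checkpoint images already exist: resume from them.
    dmtcp_restart "$CKPT_DIR"/ckpt_*.dmtcp
else
    # First run: start the program under checkpoint control.
    dmtcp_launch --ckptdir "$CKPT_DIR" --interval "$CKPT_INTERVAL" "$APP"
fi
```

Running the same script again after an interruption takes the restart branch, because checkpoint images are then present in the checkpoint directory. Use the same DMTCP version for the restart as for the original run.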

© The President and Fellows of Harvard College.
Except where otherwise noted, this content is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.