Data Storage Workflow

Identifying an appropriate storage location for your research data is a critical step in the research data lifecycle: it ensures your data remains usable. We recommend reviewing the storage options available at FAS Research Computing and selecting the offering that best fits your group’s intended workflow, keeping in mind how often the data will be used and accessed. The offerings below are designed to store research data rather than administrative data.

Each user is provided with a 100GB Home Directory for individual use. Each PI or lab account also receives a 4TB Lab Directory for use by all members of the PI’s lab group, as well as a 50TB allotment of networked scratch (netscratch). See the matrix below for more details.

| | Home Directory | Lab Directory | netscratch | Active Lab Storage (Tier 0) | Active Lab Storage (Tier 1) | Active Lab Storage (Tier 2) | Cold Storage (Tape) |
|---|---|---|---|---|---|---|---|
| Description | Personal user storage. Not recommended for computational purposes. | General lab storage. Install software here to be referenced from netscratch. | Temporary storage location for high performance data analysis. | Active storage location for analysis data; readily utilized and accessed. | General purpose storage location for raw and project data. | Intended for less active research data and recently completed projects. | Long-term storage of inactive research data after project completion or for data retention purposes. |
| Performance | Moderate | Moderate | High | High/Moderate | Moderate | Low/Moderate | None |
| Size | 100GB (fixed) | 4TB (fixed) | 50TB (fixed) | Available upon request | Available upon request | Available upon request | 20TB increments |
| Mount | /n/homeNN/username | /n/holylabs | /n/netscratch | /net/[server]/LABS/folder | /n/[server]/rc_labs/folder | /n/pi_lab | Transfer data to Tape using Globus |
| Retention* | Daily snapshots for 7 days and weekly snapshots for 4 weeks. Includes disaster recovery. | No snapshots. No disaster recovery. | No snapshots. No disaster recovery. 90-day retention policy. | No snapshots. No disaster recovery. | Daily snapshots for 7 days and weekly snapshots for 4 weeks. Includes disaster recovery. | No snapshots. Includes disaster recovery. | No snapshots. Includes disaster recovery. |
| Cost | None | None | None | $50/yr per TB | $250/yr per TB | $100/yr per TB | $5/yr per TB |
| Security Level | Up to Level 2 | Up to Level 2 | Up to Level 2 | Up to Level 2 (Level 3 with FASSE) | Up to Level 2 (Level 3 with FASSE) | Up to Level 2 (Level 3 with FASSE) | Up to Level 2 |
| Allocation | Folder generated for each user when granted cluster access. Limited to 100GB. | Folder generated for each approved PI and their group. Limited to 4TB. | Accessible to group members. | Request storage allocation | Request storage allocation | Request storage allocation | Request storage allocation |

*Snapshots are copies of a directory taken at a specific moment in time. They offer labs a self-service recovery option for overwritten or deleted files within the snapshot retention window. Disaster recovery is a copy of an entire file system that can be used internally by FASRC in case of a system-wide failure.
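Where snapshots are available, recent copies of a file can often be restored without opening a support ticket. The sketch below assumes snapshots are exposed through a hidden .snapshot directory and uses hypothetical paths and snapshot names; the actual layout varies by filesystem, so check with FASRC for your storage.

```bash
# List available snapshots for a directory
# (hypothetical home path; the .snapshot layout is an assumption)
ls /n/home01/jharvard/.snapshot/

# Restore an overwritten file from a hypothetical daily snapshot
cp /n/home01/jharvard/.snapshot/daily.2024-05-01/project/results.csv \
   /n/home01/jharvard/project/results.csv
```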

Home Directory

  • Description: Individual user folder intended for code, scripts, documentation, and analysis data
  • Moderate performance
  • Size: 100GB (cannot be expanded); see the usage-check sketch after this list
  • Mount: /n/homeNN/username
  • Daily snapshots for 7 days and weekly snapshots for 4 weeks. Includes disaster recovery.
  • Cost: Free
  • Security level: Up to Level 2 (Level 3 with FASSE)
  • Automatically generated when granted cluster access.
  • Visible only to the owner. Not intended for sharing.
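Because the home directory is capped at 100GB, it is worth checking usage periodically with standard tools. A minimal sketch (paths are illustrative):

```bash
# Total size of your home directory
du -sh ~

# Largest items at the top level, to see what is consuming the quota
du -h --max-depth=1 ~ 2>/dev/null | sort -hr | head -n 15
```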

Lab Directory

  • Description: General lab folder intended for data, scripts under version control, and documentation.
  • Moderate performance
  • Size: 4TB (cannot be expanded)
  • Mount: /n/holylabs
  • No snapshots. No disaster recovery.
  • Cost: Free
  • Security level: Up to Level 2
  • Automatically generated for approved PIs with two subfolders:
    • Lab: Subfolders are visible to everyone in the lab. We recommend housing most of the data in this subfolder.
    • Everyone: Subfolders visible to anyone on the cluster, great for collaboration between labs.

netscratch

  • Description: Temporary storage location for high performance data analysis.
  • High performance.
  • Size: 50TB per group, 100 million inodes
  • Mount: /n/netscratch
  • No snapshots. No disaster recovery.
  • Retention: 90-day retention policy; see the staging sketch after this list.
  • Cost: Free
  • Security level: Up to Level 2 (Level 3 with FASSE)
  • Automatically accessible to members of the lab group, with two subfolders:
    • Lab: Subfolders are visible to everyone in the lab. We recommend housing most of the data in this subfolder.
    • Everyone: Subfolders visible to anyone on the cluster, great for collaboration between labs.
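A common pattern is to stage input data onto netscratch for fast I/O, run the analysis there, and copy results back to longer-term storage before the 90-day retention window expires. A minimal sketch, assuming a hypothetical lab name (jharvard_lab) and directory layout; substitute your group’s actual paths:

```bash
# Hypothetical lab name and paths; adjust to your group's directories
LAB=jharvard_lab
SCRATCH=/n/netscratch/${LAB}/Lab/${USER}/myproject

# Stage input data onto netscratch for high-performance analysis
mkdir -p "${SCRATCH}/input"
rsync -av /n/holylabs/${LAB}/Lab/myproject/input/ "${SCRATCH}/input/"

# ... run jobs that read and write under ${SCRATCH} ...

# Copy results back to lab storage before the 90-day purge
rsync -av "${SCRATCH}/results/" /n/holylabs/${LAB}/Lab/myproject/results/
```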

Active Lab Storage for Compute (Tier 0)

  • Description: Storage folder intended for active analysis research data connected to the high-performance compute cluster.
  • High performance.
  • Size: 1-1024TB
  • Mount: /n/server/LABS/folder
  • No snapshots. No disaster recovery.
  • Cost: $50/yr per TB
  • Security level: Up to Level 2
  • Request storage allocation

Active Lab Storage with Snapshots (Tier 1)

  • Description: General purpose storage location for data analysis and project data. Best for irrecoverable data, such as raw datasets, since it includes backups.
  • Moderate performance.
  • Size: 1-1024TB
  • Mount: /rc_labs/folder
  • Daily snapshots for 7 days and weekly snapshots for 4 weeks. Includes disaster recovery.
  • Cost: $250/yr per TB
  • Security level: Up to Level 2
  • Request storage allocation

Active Lab Storage with Disaster Recovery (Tier 2)

  • Description: Intended for intermediate storage of research data for ongoing and recently completed projects.
  • Low/moderate performance.
  • Size: 1-306TB
  • Mount: /n/pi_lab
  • No snapshots. Includes disaster recovery.
  • Cost: $100/yr per TB
  • Security level: Up to Level 2
  • Request storage allocation

Cold Storage (Tape)

  • Description: Long-term storage of inactive research data after project completion or for data retention purposes.
  • Performance: none; data on tape is not directly accessible from the cluster.
  • Size: 20TB increments. Ten thousand files per folder; file sizes between 1GB and 100GB (see the bundling sketch after this list).
  • Access: Tape-based access with Globus or S3
  • No snapshots. Includes disaster recovery.
  • Cost: $5/yr per TB
  • Security level: Up to Level 2
  • Request storage allocation
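Because tape works best with a modest number of large files (roughly 1GB to 100GB each, and no more than ten thousand files per folder), it helps to bundle many small files into archives before transferring them with Globus. A minimal sketch with illustrative paths and names:

```bash
# Bundle a directory of many small files into one archive sized for tape
cd /n/holylabs/jharvard_lab/Lab/completed_project
tar -czf images_2023.tar.gz images/

# Confirm the archive size falls in the 1GB-100GB range and spot-check its contents
ls -lh images_2023.tar.gz
tar -tzf images_2023.tar.gz | head

# The resulting archive can then be transferred to Cold Storage using Globus
```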

FASSE (Secure Enclave) 

  • Description: Secure storage environment for the analysis of sensitive data, such as data obtained under Data Use Agreements (DUAs) or IRB protocols
  • Can be applied to Cluster Storage, Lab Storage, or Tier 2 based on project need.
  • Security level: Up to Level 3

Default Directory Structure

Two subdirectories are created by default within the parent directory for most storage on the cluster. This helps enable Globus transfers and provides initial guidance on how to organize storage. The directories are:

Lab: This directory is intended as the primary working directory. It is also the directory shared out via Globus. By default, folders in this subdirectory are visible to the whole lab. Individual users may adjust permissions on their folders as they like, though we highly recommend keeping access open to all lab members to allow for easier collaboration and data cleanup after they leave the university.
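As an example, a user who wants to check or adjust group access on their folder under Lab can use standard POSIX ownership and permissions. This is a sketch with a hypothetical lab group (jharvard_lab) and path; some filesystems also use ACLs, so confirm your lab’s setup with FASRC before making changes.

```bash
# Hypothetical lab storage path and group name
MYDIR=/n/holylabs/jharvard_lab/Lab/jharvard_analyses

# Inspect current ownership and permissions
ls -ld "$MYDIR"

# Keep the folder readable and traversable by the whole lab group (recommended)
chgrp -R jharvard_lab "$MYDIR"
chmod -R g+rX "$MYDIR"

# Restrict a specific subfolder to yourself only (not recommended for shared data)
chmod -R g-rwx,o-rwx "$MYDIR/private_notes"
```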

Everyone: This directory is visible to anyone on the HPC cluster and is intended for collaboration with other labs on the cluster. Data in this directory is by default owned by the lab that hosts it. Note that this directory is not available on Globus and is intended only for internal sharing.

While this is the default structure, labs may request additional folders be set up. Please email rchelp@rc.fas.harvard.edu if you have questions.

Directory structures on the cluster may differ depending on when they were created. Some older storage folders may have a third subdirectory called Users. We have deprecated this folder due to issues with data access by the lab and PIs, especially after users have left the university. If you are migrating data from a storage system that has a Users subdirectory, we recommend moving that data into the Lab directory and making it available for the lab to view and access.
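A minimal migration sketch, assuming hypothetical old and new paths and the lab group jharvard_lab; rsync preserves timestamps, and the follow-up commands make the copied data group-accessible:

```bash
# Hypothetical source (old Users subdirectory) and destination (Lab subdirectory)
OLD=/n/old_lab_storage/jharvard_lab/Users/jharvard
NEW=/n/holylabs/jharvard_lab/Lab/jharvard

# Copy the data, preserving permissions and timestamps
mkdir -p "$NEW"
rsync -av "$OLD"/ "$NEW"/

# Make the migrated data readable by the whole lab group
chgrp -R jharvard_lab "$NEW"
chmod -R g+rX "$NEW"

# Remove the old copy only after verifying the transfer
# rm -rf "$OLD"
```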

If you have questions regarding the data storage options at FASRC, please email the Research Data Manager at rdm@rc.fas.harvard.edu.

© The President and Fellows of Harvard College
Except where otherwise noted, this content is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.