Data Transfers with rclone

 

Introduction

rclone is a convenient and performant command-line tool for transferring files and synchronizing directories directly between FAS RC file systems and Google Drive (or other supported cloud storage). If you are eligible, and don’t already have a Google Apps for Harvard account, see the Google Apps for Harvard Getting Started page. If you require help or support for your Harvard Google account or for Google Drive itself, please contact HUIT (ithelp@harvard.edu).

Configuring rclone

rclone must be configured before first use. Each cloud service has its own configuration procedure. In the rclone documentation, find the cloud service that you need, follow its “config” link, and work through the rclone config steps.

Google Shared Drives

To configure access to a Google shared drive, see the rclone Google Drive configuration documentation. During configuration, answer yes to the prompt “Configure this as a Shared Drive (Team Drive)?”
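
Once configuration is finished, it can be useful to confirm that the remote exists before starting a transfer. The following is a minimal sketch, assuming the remote was named gdrive (the name used in the rest of this page); the helper name remote_is_configured is hypothetical and not part of rclone.

```shell
# Assumed remote name; substitute the name you chose during `rclone config`.
REMOTE="gdrive"

# Succeed only if rclone is available and the named remote is listed.
# `rclone listremotes` prints one configured remote per line, each ending in ":".
remote_is_configured() {
    command -v rclone >/dev/null 2>&1 || return 1
    rclone listremotes 2>/dev/null | grep -qxF "$1:"
}

if remote_is_configured "$REMOTE"; then
    echo "Remote '$REMOTE' is configured."
else
    echo "Remote '$REMOTE' not found; run 'rclone config' to create it."
fi
```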

Using rclone

rclone supports many subcommands (see the complete list of rclone subcommands). A few commonly used subcommands (assuming a Google Drive remote configured as gdrive):

Listing / moving / deleting objects
rclone command                                                      analogous Unix command
rclone lsf gdrive:fasrc/subfolder                                   ls fasrc/subdir
rclone lsf --format stp --separator ' ' gdrive:fasrc/subfolder      ls -l fasrc/subdir
rclone mkdir gdrive:fasrc/subfolder                                 mkdir fasrc/subdir
rclone move gdrive:fasrc/subfolder1/file1 gdrive:fasrc/subfolder2/  mv fasrc/subdir1/file1 fasrc/subdir2/
rclone rmdir gdrive:fasrc/subfolder                                 rmdir fasrc/subdir
rclone delete gdrive:fasrc/file                                     rm fasrc/file
rclone purge gdrive:fasrc/subfolder                                 rm -r fasrc/subdir
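
Because rclone purge removes a folder and everything in it, it may be worth previewing the operation first. The sketch below relies on rclone's global --dry-run flag, which reports what would be done without doing it; the helper name purge_with_preview and the confirmation prompt are illustrative assumptions, not part of rclone.

```shell
# purge_with_preview REMOTE_PATH (hypothetical helper)
# 1. Show what `rclone purge` would remove, without removing anything.
# 2. Ask for confirmation before running the real purge.
purge_with_preview() {
    target="$1"
    rclone purge --dry-run "$target" || return 1
    printf 'Purge %s for real? [y/N] ' "$target"
    read -r answer
    case "$answer" in
        y|Y) rclone purge "$target" ;;
        *)   echo "Aborted; nothing was deleted." ;;
    esac
}
```

For example, purge_with_preview gdrive:fasrc/subfolder prints the dry-run report and deletes only if the answer is y.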

 

Transferring data

Small data transfers may be done on FAS RC cluster login nodes, while large data transfers should be done within an interactive job so that data transfer is done from a compute node; e.g.:

salloc -p test --mem 1G -t 6:00

Operands with the gdrive: prefix (assuming a Google Drive has been configured as gdrive) access Google Drive storage, while operands without gdrive: refer to a path on the FAS RC file system.

rclone copy gdrive:sourcepath destpath
rclone copy sourcepath gdrive:destpath

If sourcepath is a file, copy it to destpath.
If sourcepath is a directory/folder, recursively copy its contents to destpath. Contents of destpath that are not in sourcepath will be retained.

rclone sync --progress gdrive:sourcefolder destdir
rclone sync --progress sourcedir gdrive:destfolder

Replace contents of destdir/destfolder with the contents of sourcedir/sourcefolder (deleting any files not in the source).
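
Since sync deletes destination files that are absent from the source, a cautious workflow is to preview the changes first and compare both sides afterwards. The following sketch assumes rclone's --dry-run flag and its check subcommand; the function names sync_preview and sync_and_verify are hypothetical.

```shell
# sync_preview SRC DST (hypothetical helper)
# Report what `rclone sync` would copy or delete, without changing anything.
sync_preview() {
    rclone sync --dry-run "$1" "$2"
}

# sync_and_verify SRC DST (hypothetical helper)
# Perform the sync, then compare source and destination with `rclone check`.
sync_and_verify() {
    rclone sync --progress "$1" "$2" && rclone check "$1" "$2"
}
```

A typical sequence would be sync_preview sourcedir gdrive:destfolder, a review of the reported deletions, and then sync_and_verify sourcedir gdrive:destfolder.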

 

Mounting Google Drive on a FAS RC compute node

Alternatively, rclone mount can make a Google Drive (or a subfolder of it) available on a FAS RC compute node as a regular file system, so that common POSIX file commands such as cp, mv, and ls work on it, with some limitations.

The directory on the FAS RC node at which the Google Drive will be made available as a file system (i.e., the mountpoint) must be on a node-local file system (such as /scratch) to avoid permissions issues when unmounting the file system. In particular, the mountpoint must not be within a file system in the /n/ directory, as these are all remote / network file systems.
The following example demonstrates this capability:

$ rclone lsf gdrive:fasrc/
cactus:2019.03.01--py27hdbcaa40_1.sif
ifxpong:1.4.7-ood.sif
jbrowse:1.16.5_2019-06-14.sif
subfolder/
$ mkdir /scratch/$USER
$ mkdir -m 700 /scratch/$USER/gdrive
$ rclone mount --daemon gdrive:fasrc /scratch/$USER/gdrive
$ ls -l /scratch/$USER/gdrive/
total 543900
-rw-r--r-- 1 fasrcuser fasrcgroup 495247360 May  1 16:27 cactus:2019.03.01--py27hdbcaa40_1.sif
-rw-r--r-- 1 fasrcuser fasrcgroup 50700288 Aug 22 16:05 ifxpong:1.4.7-ood.sif
-rw-r--r-- 1 fasrcuser fasrcgroup 11005952 Jun 14 15:16 jbrowse:1.16.5_2019-06-14.sif
drwxr-xr-x 1 fasrcuser fasrcgroup 0 Oct 24 10:21 subfolder
$ ls /scratch/$USER/gdrive/subfolder/
cactus_2019.09.03-623cfc5.sif  JBrowse-on-Cluster.tar.gz  MAKER-cluster-guide-for-review.tar.gz
$ fusermount -uz /scratch/$USER/gdrive/

Comments:

  • The mountpoint (/scratch/$USER/gdrive) is created with appropriate permissions (via mkdir -m 700) to ensure only the owner has access.
  • The rclone mount command is run in the background via the --daemon flag.
  • fusermount -uz explicitly unmounts the Google Drive (causing the rclone mount process to terminate).
    • This performs a “lazy unmount”, which asks the OS to complete the unmount once no process has its current working directory within the directory tree rooted at the mountpoint. To guard against accidentally leaving the directory mounted if a job or interactive session is terminated prematurely, set the working directory of the shell that issued the rclone mount command to the gdrive mountpoint and then immediately issue fusermount -uz; e.g.:
      rclone mount --daemon gdrive:fasrc /scratch/$USER/gdrive
      cd /scratch/$USER/gdrive && fusermount -uz .

      Then /scratch/$USER/gdrive will be unmounted automatically when the shell process terminates or its working directory changes to a directory outside of /scratch/$USER/gdrive:

      cd ..
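
A different way to reduce the risk of a stale mount in a script is a shell EXIT trap. This is a swapped-in technique rather than the method described above, and it is only a sketch: the function name mount_with_cleanup is hypothetical, and an EXIT trap does not run if the shell is killed with SIGKILL.

```shell
# mount_with_cleanup REMOTE_PATH MOUNTPOINT (hypothetical helper)
# Creates a private mountpoint, mounts the remote in the background, and
# registers a lazy unmount to run when the calling shell exits.
mount_with_cleanup() {
    remote_path="$1"
    mountpoint="$2"
    # Create the mountpoint and restrict it to the owner.
    mkdir -p "$mountpoint" && chmod 700 "$mountpoint" || return 1
    # --daemon makes rclone mount return once the mount runs in the background.
    rclone mount --daemon "$remote_path" "$mountpoint" || return 1
    # Lazy-unmount when this shell exits normally or on a trappable signal.
    trap "fusermount -uz '$mountpoint'" EXIT
}
```

For example: mount_with_cleanup gdrive:fasrc "/scratch/$USER/gdrive".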
      

Limitations

At most 2 file transfers to Google Drive can be initiated per second. Consider bundling many small files into a .zip or .tar(.gz) file.
Other Google Drive limitations are listed in the rclone Google Drive documentation.
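
The bundling suggestion can be sketched as follows, assuming a remote named gdrive and a destination folder such as gdrive:fasrc/. The function name bundle_and_upload is hypothetical; the archive is written to the current working directory.

```shell
# bundle_and_upload SRC_DIR REMOTE_DEST (hypothetical helper)
# Packs SRC_DIR into a single .tar.gz so that only one file is uploaded,
# then copies that archive to the remote destination.
bundle_and_upload() {
    src_dir="$1"
    dest="$2"
    name="$(basename "$src_dir")"
    archive="$name.tar.gz"
    # -C switches to the parent directory so archive paths start at SRC_DIR's name.
    tar -czf "$archive" -C "$(dirname "$src_dir")" "$name" || return 1
    rclone copy "$archive" "$dest"
}
```

For example, bundle_and_upload results/many_small_files gdrive:fasrc/ uploads a single many_small_files.tar.gz instead of each file individually.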

© The President and Fellows of Harvard College
Except where otherwise noted, this content is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.