Data Transfers with rclone
Introduction
rclone is a convenient and performant command-line tool for transferring files and synchronizing directories directly between FAS RC file systems and Google Drive (or other supported cloud storage). If you are eligible and don’t already have a Google Apps for Harvard account, see the Google Apps for Harvard Getting Started page. If you require help or support for your Harvard Google account or for Google Drive itself, please contact HUIT (ithelp@harvard.edu).
Configuring rclone
rclone must be configured before first use, and each cloud service has its own configuration. Visit the rclone documentation, find the cloud service that you need, open its “config” page, and follow the rclone config steps.
To configure access to a Google shared drive, see the rclone Google Drive configuration documentation. During configuration, rclone asks “Configure this as a Shared Drive (Team Drive)?”; answer yes to this prompt to configure a shared drive.
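After running rclone config, a script can confirm that the remote it expects is actually configured before attempting a transfer. The sketch below assumes the remote was named gdrive, as elsewhere on this page; the function name has_remote is hypothetical. It relies on rclone listremotes, which prints each configured remote's name followed by a colon, one per line.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if a remote with the given name
# (e.g., "gdrive") appears in rclone's configuration.
has_remote() {
  # rclone listremotes prints one "name:" per line; -x requires a whole-line match
  rclone listremotes | grep -qx -- "${1}:"
}

# Usage sketch: stop early if rclone has not been configured yet.
#   has_remote gdrive || { echo "run 'rclone config' first" >&2; exit 1; }
```

Matching the whole line avoids false positives such as a remote named gdrive2 satisfying a check for gdrive.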
Using rclone
rclone supports many subcommands (see the complete list of rclone subcommands). A few commonly used subcommands follow (assuming a Google Drive configured as gdrive):
Listing / moving / deleting objects
rclone command | analogous Unix command |
rclone lsf gdrive:fasrc/subfolder | ls fasrc/subdir |
rclone lsf --format stp --separator ' ' gdrive:fasrc/subfolder | ls -l fasrc/subdir |
rclone mkdir gdrive:fasrc/subfolder | mkdir fasrc/subdir |
rclone move gdrive:fasrc/subfolder1/file1 gdrive:fasrc/subfolder2/ | mv fasrc/subdir1/file1 fasrc/subdir2/ |
rclone rmdir gdrive:fasrc/subfolder | rmdir fasrc/subdir |
rclone delete gdrive:fasrc/file | rm fasrc/file |
rclone purge gdrive:fasrc/subfolder | rm -r fasrc/subdir |
Transferring data
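Small data transfers may be done on FAS RC cluster login nodes, while large data transfers should be done within an interactive job so that the transfer runs on a compute node; e.g.:
salloc -p test --mem 1G -t 6:00
Slurm reads -t 6:00 as 6 minutes (minutes:seconds); set the time limit (e.g., as hours:minutes:seconds) to cover the expected duration of the transfer.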
Operands with the gdrive: prefix (assuming a Google Drive has been configured as gdrive) access Google Drive storage, while operands without gdrive: refer to a path on the FAS RC file system.
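rclone command | effect |
rclone copy gdrive:sourcepath destpath or rclone copy sourcepath gdrive:destpath | If sourcepath is a file, copy it into the directory destpath. If sourcepath is a directory, recursively copy its contents into destpath; files already in destpath that are not in sourcepath are left in place. |
rclone sync --progress gdrive:sourcefolder destdir or rclone sync --progress sourcedir gdrive:destfolder | Replace the contents of destdir/destfolder with the contents of sourcedir/sourcefolder (deleting any destination files that are not in the source). |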
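Because rclone sync deletes destination files that are absent from the source, it can be worth previewing a sync first with rclone's --dry-run flag, which reports planned changes without making them. The helper below is a minimal sketch; the function name sync_with_preview and the CONFIRM variable are hypothetical conventions, not part of rclone.

```shell
#!/bin/bash
# Hypothetical helper: show what "rclone sync" would change (including
# deletions in the destination) before actually running it.
sync_with_preview() {
  local src="$1" dst="$2"
  # --dry-run reports the planned copies and deletions without making changes
  rclone sync --dry-run "$src" "$dst" || return
  if [ "${CONFIRM:-no}" = "yes" ]; then
    # Only run the real sync when the caller explicitly opts in
    rclone sync --progress "$src" "$dst"
  else
    echo "Dry run only; re-run with CONFIRM=yes to apply these changes." >&2
  fi
}

# Usage sketch:
#   sync_with_preview gdrive:sourcefolder destdir              # preview only
#   CONFIRM=yes sync_with_preview gdrive:sourcefolder destdir  # preview, then sync
```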
Mounting Google Drive on a FAS RC compute node
Alternatively, rclone mount can make a Google Drive folder available on a FAS RC compute node as a regular file system, so that common POSIX file system commands such as cp, mv, and ls work on it, albeit with some limitations.
The directory on the FAS RC node at which the Google Drive will be made available as a file system (i.e., the mountpoint) must be on a node-local file system (such as /scratch) to avoid permission issues when unmounting the file system. In particular, the mountpoint must not be within the /n/ directory, since the file systems there are all remote (network) file systems.
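The mountpoint rule can be checked before mounting. This is a minimal sketch assuming GNU realpath (for its -m option, which normalizes a path that need not exist yet); the function name check_mountpoint is hypothetical, and it only tests for the /n/ prefix described here, not for every possible network file system.

```shell
#!/bin/bash
# Hypothetical helper: refuse mountpoints under /n/, where the remote/network
# file systems live; node-local paths such as /scratch pass.
check_mountpoint() {
  local path
  # realpath -m normalizes the path ("..", ".", existing symlinks)
  # without requiring the final directory to exist yet
  path="$(realpath -m -- "$1")" || return
  case "$path/" in
    /n/*)
      echo "error: $1 is under /n/ (a network file system); use e.g. /scratch instead" >&2
      return 1
      ;;
  esac
}

# Usage sketch:
#   check_mountpoint "/scratch/$USER/gdrive" && rclone mount --daemon gdrive:fasrc "/scratch/$USER/gdrive"
```

Relative paths are normalized against the current directory, so a relative mountpoint given while working inside /n/ is rejected as well.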
The following example demonstrates mounting a Google Drive folder this way:
$ rclone lsf gdrive:fasrc/
cactus:2019.03.01--py27hdbcaa40_1.sif
ifxpong:1.4.7-ood.sif
jbrowse:1.16.5_2019-06-14.sif
subfolder/
$ mkdir /scratch/$USER
$ mkdir -m 700 /scratch/$USER/gdrive
$ rclone mount --daemon gdrive:fasrc /scratch/$USER/gdrive
$ ls -l /scratch/$USER/gdrive/
total 543900
-rw-r--r-- 1 fasrcuser fasrcgroup 495247360 May 1 16:27 cactus:2019.03.01--py27hdbcaa40_1.sif
-rw-r--r-- 1 fasrcuser fasrcgroup 50700288 Aug 22 16:05 ifxpong:1.4.7-ood.sif
-rw-r--r-- 1 fasrcuser fasrcgroup 11005952 Jun 14 15:16 jbrowse:1.16.5_2019-06-14.sif
drwxr-xr-x 1 fasrcuser fasrcgroup 0 Oct 24 10:21 subfolder
$ ls /scratch/$USER/gdrive/subfolder
cactus_2019.09.03-623cfc5.sif JBrowse-on-Cluster.tar.gz MAKER-cluster-guide-for-review.tar.gz
$ fusermount -uz /scratch/$USER/gdrive/
Comments:
- The mountpoint (/scratch/$USER/gdrive) is created with restrictive permissions (via mkdir -m 700) so that only its owner has access.
- The rclone mount command runs asynchronously (“in the background”) because of its --daemon option, so the shell prompt returns while the mount stays active.
- fusermount -uz explicitly unmounts the Google Drive (causing the rclone mount process to terminate).
  This performs a “lazy unmount”, which requests that the OS perform the unmount when there are no processes whose current working directory is within the directory tree rooted at the mountpoint. To guard against accidentally leaving the directory mounted if a job or interactive session is terminated prematurely, the fusermount -uz command can be issued immediately after changing the working directory of the shell that issued the rclone mount command to the gdrive mountpoint; e.g.:
    rclone mount --daemon gdrive:fasrc /scratch/$USER/gdrive
    cd /scratch/$USER/gdrive && fusermount -uz .
  Then /scratch/$USER/gdrive will be unmounted automatically when the shell’s process terminates or its working directory changes to a directory outside of /scratch/$USER/gdrive:
    cd ..
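The mount-and-lazy-unmount steps described above can be wrapped in a shell function. This is a sketch, not an official FAS RC tool: the function name gdrive_mount_here is hypothetical, and its defaults (gdrive:fasrc and /scratch/$USER/gdrive) simply mirror the example on this page. It must run in the interactive shell itself (defined or sourced there, not executed as a separate script), because its cd has to change that shell's working directory.

```shell
#!/bin/bash
# Hypothetical helper: mount a Google Drive folder on a node-local directory,
# cd into it, and immediately request a lazy unmount so the mount is released
# once this shell leaves the directory or exits.
gdrive_mount_here() {
  local remote="${1:-gdrive:fasrc}"
  local mnt="${2:-/scratch/$USER/gdrive}"
  # Create the mountpoint with owner-only permissions, as in the example above
  mkdir -p -m 700 "$mnt" || return
  # --daemon backgrounds the mount process so the shell prompt returns
  rclone mount --daemon "$remote" "$mnt" || return
  # Enter the mountpoint, then request the lazy unmount from inside it
  cd "$mnt" && fusermount -uz .
}

# Usage sketch:
#   gdrive_mount_here gdrive:fasrc "/scratch/$USER/gdrive"
#   ls        # list the mounted Drive folder
#   cd ..     # leaving the directory lets the lazy unmount complete
```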
Limitations
At most 2 file transfers to Google Drive can be initiated per second. Consider bundling many small files into a .zip or .tar(.gz) file.
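As an illustration of bundling, the function below packs a directory into one .tar.gz archive with tar before upload. It is a minimal sketch: the name bundle_dir is hypothetical, and the commented rclone copy line assumes the gdrive remote and fasrc folder used in the examples above.

```shell
#!/bin/bash
# Hypothetical helper: pack a directory of many small files into a single
# .tar.gz, so Google Drive sees one upload instead of thousands.
bundle_dir() {
  local dir="${1%/}" archive="$2"
  # -C switches to the parent directory so the archive stores paths
  # relative to it (e.g., results/a.txt rather than /full/path/results/a.txt)
  tar -czf "$archive" -C "$(dirname "$dir")" "$(basename "$dir")"
}

# Usage sketch (assumes the gdrive remote used throughout this page):
#   bundle_dir results results.tar.gz
#   rclone copy results.tar.gz gdrive:fasrc/
```

Keeping paths relative means tar -xzf recreates the directory wherever the archive is later extracted.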
Other Google Drive limitations are listed in the rclone Google Drive documentation.