rclone – transfer files to/from cloud storage
Introduction
rclone is a convenient and performant command-line tool for transferring files and synchronizing directories directly between FAS RC file systems and Google Drive (or other supported cloud storage). If you are eligible, and don’t already have a Google Apps for Harvard account, see the Google Apps for Harvard Getting Started page. If you require help or support for your Harvard Google account or for Google Drive itself, please contact HUIT (ithelp@harvard.edu).
Configuring rclone
rclone must be configured before first use. rclone can be granted access to a limited scope of Google Drive (e.g., read-only, or only files rclone creates). The following rclone config command grants the rclone application access only to the files and folders it creates in Google Drive; to grant full (read/write) access to your Google Drive, omit the scope drive.file arguments. See the Scopes section of the rclone Google Drive documentation for more information on other scopes.
[fasrcuser@boslogin02 ~]$ module load rclone
[fasrcuser@boslogin02 ~]$ rclone config create gdrive drive config_is_local false scope drive.file
2019/10/17 17:03:49 NOTICE: Config file "/n/home10/fasrcuser/.config/rclone/rclone.conf" not found - using defaults
Remote config
Use auto config?
* Say Y if not sure
* Say N if you are working on a remote or headless machine
Auto confirm is set: answering No, override by setting config parameter config_is_local=true
If your browser doesn't open automatically go to the following link: https://accounts.google.com/o/oauth2/auth?access_type=offline&client_id=123456789012.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&response_type=code&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive&state=abcdef123456789abcdef123456789ab
Log in and authorize rclone for access
Enter verification code>
Next:
- Copy the URL that begins with https://accounts.google.com/ into a web browser window.
- Sign in with your Google Apps for Harvard account credentials.
- Click “Allow” to allow rclone to access your Google Drive.
- Copy the resulting verification code, and paste into the terminal window containing the FASRC cluster SSH session.
The resulting rclone configuration will be stored in your FAS RC home directory at ${HOME}/.config/rclone/rclone.conf. This file includes an OAuth2 access token; rclone access using this token may be revoked from your Google Account "Third-party apps with account access" page.
To configure access to a Shared drive, append the team_drive <ID> option to the rclone config command above, replacing <ID> with the shared drive ID. The ID can be found by navigating to the shared drive in the Google Drive web interface and noting the ID at the end of the URL: https://drive.google.com/drive/folders/<ID>
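As a small illustration of the step above, the shared drive ID can be taken from the folder URL with shell parameter expansion. This is only a sketch: the helper name shared_drive_id is hypothetical, the URL in the usage comment is a placeholder, and the rclone config line simply appends team_drive <ID> to the command shown earlier.

```shell
# Hypothetical helper: print the last path component of a Google Drive
# folder URL, which is the shared drive ID described above.
shared_drive_id() {
    url=${1%%\?*}     # drop any query string
    url=${url%/}      # drop a trailing slash, if present
    printf '%s\n' "${url##*/}"
}

# Usage (placeholder URL; substitute the one shown in your browser):
#   id=$(shared_drive_id "https://drive.google.com/drive/folders/<ID>")
#   rclone config create gdrive drive config_is_local false scope drive.file team_drive "$id"
```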
Using rclone
rclone supports many subcommands (see the complete list of rclone subcommands). A few commonly-used subcommands (assuming a Google Drive configured as gdrive):
Listing / moving / deleting objects | |
rclone command | analogous Unix command |
rclone lsf gdrive:fasrc/subfolder | ls fasrc/subdir |
rclone lsf --format stp --separator ' ' gdrive:fasrc/subfolder | ls -l fasrc/subdir |
rclone mkdir gdrive:fasrc/subfolder | mkdir fasrc/subdir |
rclone move gdrive:fasrc/subfolder1/file1 gdrive:fasrc/subfolder2/ | mv fasrc/subdir1/file1 fasrc/subdir2/ |
rclone rmdir gdrive:fasrc/subfolder | rmdir fasrc/subdir |
rclone delete gdrive:fasrc/file | rm fasrc/file |
rclone purge gdrive:fasrc/subfolder | rm -r fasrc/subdir |
Transferring data
Small data transfers may be done on FAS RC cluster login nodes, while large data transfers should be done within an interactive job so that data transfer is done from a compute node; e.g.:
salloc -p test --mem 1G -t 6:00
Operands with the gdrive: prefix (assuming a Google Drive has been configured as gdrive) access Google Drive storage, while operands without gdrive: refer to a path on the FAS RC file system.
rclone copy gdrive:sourcepath destpath | If sourcepath is a file, copy it to destpath. |
rclone copy sourcepath gdrive:destpath | If sourcepath is a file, copy it to destpath. |
rclone sync --progress gdrive:sourcefolder destdir | Replace contents of destdir with the contents of sourcefolder (deleting any files not in the source). |
rclone sync --progress sourcedir gdrive:destfolder | Replace contents of destfolder with the contents of sourcedir (deleting any files not in the source). |
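Because rclone sync deletes destination files that are absent from the source, it can help to preview a sync before running it. The sketch below uses rclone's --dry-run flag for the preview. The function name preview_then_sync is hypothetical, and the confirmation prompt is an illustrative choice rather than a feature of rclone.

```shell
# Hypothetical wrapper: show what a sync would change, then ask before running it.
preview_then_sync() {
    src=$1
    dest=$2
    # --dry-run reports the planned copies and deletions without performing them.
    rclone sync --dry-run "$src" "$dest" || return 1
    printf 'Proceed with sync of %s to %s? [y/N] ' "$src" "$dest"
    read -r answer
    case $answer in
        [Yy]*) rclone sync --progress "$src" "$dest" ;;
        *)     echo "Sync cancelled." ;;
    esac
}

# Usage: preview_then_sync sourcedir gdrive:destfolder
```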
Mounting Google Drive on a FAS RC compute node
Alternatively, rclone mount can make a Google Drive (or a subfolder of it) available on a FAS RC compute node as a regular file system that supports the common commands used with a POSIX file system (such as cp, mv, and ls), with limitations.
The directory on the FAS RC node at which the Google Drive will be made available as a file system (i.e., the mountpoint) must be on a node-local file system (such as /scratch) to avoid permissions issues when unmounting the file system. In particular, the mountpoint must not be within a file system in the /n/ directory, as these are all remote / network file systems.
The following example demonstrates this capability:
$ module load rclone
$ rclone lsf gdrive:fasrc/
cactus:2019.03.01--py27hdbcaa40_1.sif
ifxpong:1.4.7-ood.sif
jbrowse:1.16.5_2019-06-14.sif
subfolder/
$ mkdir /scratch/$USER
$ mkdir -m 700 /scratch/$USER/gdrive
$ rclone mount gdrive:fasrc /scratch/$USER/gdrive &
[1] 68913
$ ls -l /scratch/$USER/gdrive/
total 543900
-rw-r--r-- 1 fasrcuser fasrcgroup 495247360 May 1 16:27 cactus:2019.03.01--py27hdbcaa40_1.sif
-rw-r--r-- 1 fasrcuser fasrcgroup 50700288 Aug 22 16:05 ifxpong:1.4.7-ood.sif
-rw-r--r-- 1 fasrcuser fasrcgroup 11005952 Jun 14 15:16 jbrowse:1.16.5_2019-06-14.sif
drwxr-xr-x 1 fasrcuser fasrcgroup 0 Oct 24 10:21 subfolder
$ fusermount -uz /scratch/$USER/gdrive/
[1]+ Done rclone mount gdrive:fasrc /scratch/$USER/gdrive
Comments:
- The mountpoint (/scratch/$USER/gdrive) is created with appropriate permissions (via mkdir -m 700) to ensure only the owner has access.
- The rclone mount command is executed asynchronously ("in the background") using the & operator.
- fusermount -uz explicitly unmounts the Google Drive (causing the rclone mount process to terminate).
  - This performs a "lazy unmount", which requests that the OS perform the unmount when there are no processes whose current working directory is within the directory tree rooted at the mountpoint. To guard against accidentally leaving the directory mounted if a job or interactive session is prematurely terminated, set the working directory of the shell that issues the rclone mount command to the gdrive mountpoint, then issue the fusermount -uz command immediately; e.g.:
    rclone mount gdrive:fasrc /scratch/$USER/gdrive &
    cd /scratch/$USER/gdrive && fusermount -uz .
    Then /scratch/$USER/gdrive will be automatically unmounted when the shell's process has terminated or its working directory has changed to a directory outside of /scratch/$USER/gdrive:
    cd ..
    [1]+ Done rclone mount gdrive:fasrc /scratch/$USER/gdrive
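The mount, use, and unmount steps described above could be combined in a script along the following lines. This is a sketch under assumptions, not a supported tool: the function name with_gdrive_mount is hypothetical, the mountpoint utility (from util-linux) is assumed to be available on the node, and the 30-second wait is an arbitrary limit.

```shell
# Hypothetical helper: mount a remote, run a command, then unmount.
# Usage: with_gdrive_mount gdrive:fasrc /scratch/$USER/gdrive ls -l /scratch/$USER/gdrive
with_gdrive_mount() {
    remote=$1
    mnt=$2
    shift 2
    # Create the mountpoint (mode 700 applies only if it is newly created).
    mkdir -p -m 700 "$mnt" || return 1
    rclone mount "$remote" "$mnt" &
    # Wait (up to about 30 s, an arbitrary limit) for the mount to appear.
    tries=0
    until mountpoint -q "$mnt"; do
        tries=$((tries + 1))
        if [ "$tries" -ge 30 ]; then
            echo "mount of $remote did not appear at $mnt" >&2
            kill $! 2>/dev/null   # stop the background rclone mount
            return 1
        fi
        sleep 1
    done
    "$@"                          # run the caller's command against the mount
    status=$?
    fusermount -uz "$mnt"         # lazy unmount; ends the rclone mount process
    wait                          # reap the background rclone process
    return "$status"
}
```

The function returns the exit status of the command it ran, so it can be used in a job script that needs to detect failures. It does not handle the session being killed part-way through; the cd-then-unmount approach above covers that case.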
Limitations
At most 2 file transfers to Google Drive can be initiated per second. Consider bundling many small files into a .zip or .tar(.gz) file.
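For example, a directory of many small files could be bundled into a single archive before upload, so that only one transfer is initiated. The function name bundle_dir and the paths in the usage comment are hypothetical.

```shell
# Hypothetical helper: pack the contents of a directory into one .tar.gz archive.
bundle_dir() {
    dir=$1
    archive=$2
    # -C changes into the directory first, so archive paths are relative to it.
    tar -czf "$archive" -C "$dir" .
}

# Usage (placeholder paths; keep the archive outside the directory being packed):
#   bundle_dir results /scratch/$USER/results.tar.gz
#   rclone copy /scratch/$USER/results.tar.gz gdrive:fasrc/
```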
Other Google Drive limitations are listed in the rclone Google Drive documentation.