ssh to a compute node
When you have a running batch job, you can ssh to the compute node where the job is running. This is useful for:
- monitoring your job's current memory and CPU usage with top or htop
- monitoring your job's current GPU usage with nvtop
Prerequisites
To ssh to a compute node, you must meet all four of the following prerequisites:
- Have a key pair in your $HOME/.ssh directory
- Have the public key added to $HOME/.ssh/authorized_keys
- Be on the cluster, on a login or compute node. You cannot ssh to a compute node from your local machine.
- Have a running job on the compute node
1. Key pair
NOTE: This is a one-time setup. If you already have a key pair, you can skip this step.
By default, you should have a key pair in your $HOME/.ssh directory. The key pair is typically id_rsa and id_rsa.pub, where id_rsa is the private key and id_rsa.pub is the public key. Another common pair is id_ed25519 and id_ed25519.pub. If you chose a custom name when you created your key pair, it may have a non-standard name. This is an example with id_rsa:
```
[jharvard@holylogin07 ~]$ cd .ssh
[jharvard@holylogin07 .ssh]$ ls -l
total 256
-rw-r--r--. 1 jharvard jharvard_lab  403 Mar 20  2020 authorized_keys
-rw-------. 1 jharvard jharvard_lab 1679 Mar 20  2020 id_rsa
-rw-r--r--. 1 jharvard jharvard_lab  403 Mar 20  2020 id_rsa.pub
-rw-r--r--. 1 jharvard jharvard_lab 3330 Jun  5 16:19 known_hosts
```
If you do not see id_rsa and id_rsa.pub in your $HOME/.ssh directory, you can create a key pair with the command:
```
ssh-keygen -t rsa
```
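Before running ssh-keygen, it can help to confirm that no key pair already exists, since an existing pair may already be in use. A minimal bash sketch of that check follows. The function name list_key_pairs is hypothetical, and it only looks for the two standard names mentioned above, so a custom-named pair will not be detected.

```shell
# bash sketch: list the standard key pairs present in an SSH directory.
# Usage: list_key_pairs [ssh_dir]   (defaults to $HOME/.ssh)
# Exit status is 0 if at least one complete pair was found, 1 otherwise.
list_key_pairs() {
  local ssh_dir="${1:-$HOME/.ssh}"
  local found=0 name
  for name in id_rsa id_ed25519; do
    # A usable pair needs both the private key and the matching .pub file.
    if [ -f "$ssh_dir/$name" ] && [ -f "$ssh_dir/$name.pub" ]; then
      echo "$name"
      found=1
    fi
  done
  [ "$found" -eq 1 ]
}
```

For example, `list_key_pairs || ssh-keygen -t rsa` runs ssh-keygen only when neither standard pair is found; ssh-keygen then prompts for the file location and an optional passphrase.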
2. Add public key
NOTE: This is a one-time setup. If you have the public key in $HOME/.ssh/authorized_keys, you can skip this step.
The public key (e.g., id_rsa.pub) must be added to the file $HOME/.ssh/authorized_keys. To check whether it has been added, view the contents of $HOME/.ssh/id_rsa.pub and see whether the same line appears in $HOME/.ssh/authorized_keys. If it does not, you can add it with these commands:
```
[jharvard@holylogin07 ~]$ cd $HOME/.ssh
[jharvard@holylogin07 .ssh]$ cat id_rsa.pub >> authorized_keys
```
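The cat command appends the key every time it is run, and if authorized_keys does not end with a newline the key gets glued onto the previous line. The bash sketch below is a more careful variant, assuming the default id_rsa.pub: it appends the key only when that exact line is not already present and makes sure it starts on its own line. The function name add_pubkey is hypothetical.

```shell
# bash sketch: add a public key to authorized_keys only if it is missing.
# Usage: add_pubkey [ssh_dir] [pubkey_file]
#   defaults: $HOME/.ssh and <ssh_dir>/id_rsa.pub
add_pubkey() {
  local ssh_dir="${1:-$HOME/.ssh}"
  local pub="${2:-$ssh_dir/id_rsa.pub}"
  local auth="$ssh_dir/authorized_keys"

  if [ ! -f "$pub" ]; then
    echo "public key not found: $pub" >&2
    return 1
  fi
  touch "$auth"
  # -q quiet, -x whole-line match, -F fixed strings, -f patterns from file.
  if grep -qxFf "$pub" "$auth"; then
    echo "already present: $pub"
  else
    # If the file is non-empty and lacks a trailing newline, add one first.
    if [ -s "$auth" ] && [ -n "$(tail -c1 "$auth")" ]; then
      echo >> "$auth"
    fi
    # Append the key followed by exactly one newline.
    printf '%s\n' "$(cat "$pub")" >> "$auth"
    echo "added: $pub"
  fi
  # sshd may ignore authorized_keys if group or others can write to it.
  chmod go-w "$auth"
}
```

Run `add_pubkey` for the default key, or pass a different public key, e.g. `add_pubkey "$HOME/.ssh" "$HOME/.ssh/id_ed25519.pub"`. Running it twice leaves a single copy of the key.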
3. Be on the cluster
You must already be on the cluster; that is, you cannot ssh to a compute node directly from your local machine.
You can ssh to a compute node from a login node (see Command-line Access with Terminal) or from another compute node on the cluster.
4. Running job
You can only access a compute node where you have a running job. As soon as the job finishes, is cancelled, or exits due to an error, you will no longer be able to access the compute node.
Ensure that you have a running job and find the compute node where it is running (see Convenient Slurm Commands):
```
[jharvard@holylogin07 ~]$ squeue -u jharvard -t RUNNING
   JOBID PARTITION     NAME     USER ST  TIME  NODES NODELIST(REASON)
47564138      test .fasrcoo jharvard  R  0:28      1 holy8a24402
```
The last column, NODELIST(REASON), shows the name of the compute node where job 47564138 is running, in this case holy8a24402.
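If you have several running jobs, you can ask squeue for just the job ID and node list instead of reading the full table. The bash sketch below uses standard squeue options (-h drops the header, -o selects the output fields %i for job ID and %N for node list) and scontrol show hostnames to expand a compressed multi-node list. The function name job_nodes is hypothetical.

```shell
# bash sketch: print the node(s) where one of your running jobs is running.
# Usage: job_nodes <jobid>
job_nodes() {
  local jobid="$1" nodelist
  # List "JOBID NODELIST" for your running jobs and keep the matching job.
  nodelist=$(squeue -u "${USER:-$(id -un)}" -t RUNNING -h -o "%i %N" |
    awk -v id="$jobid" '$1 == id { print $2 }')
  if [ -z "$nodelist" ]; then
    echo "job $jobid is not running" >&2
    return 1
  fi
  # Multi-node jobs report a compressed list such as holy8a[24401-24402];
  # scontrol expands it to one hostname per line.
  scontrol show hostnames "$nodelist"
}
```

With the job from the example above, `job_nodes 47564138` should print holy8a24402, which you can pass directly to ssh.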
Access the compute node
From a login node, simply type:
```
ssh <compute_node_name>
```
where <compute_node_name> is the node name you obtained in step 4 (Running job).
Example:
```
[jharvard@holylogin07 ~]$ ssh holy8a24402
The authenticity of host 'holy8a24402 (10.31.146.100)' can't be established.
ECDSA key fingerprint is SHA256:yhtWldxhLKRfbGeB1x4OcgFLZI0sWWVSUE83YfJQ4hU.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'holy8a24402,10.31.146.100' (ECDSA) to the list of known hosts.
Last login: Fri Sep 20 12:18:43 2024
[jharvard@holy8a24402 ~]$
```
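Note that in the first prompt (first line of the code block above), jharvard is on holylogin07, a login node. In the second prompt (last line of the code block above), jharvard is on holy8a24402, a compute node. The host authenticity question appears only the first time you connect to a given node; answering yes adds the node to your known_hosts file.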
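For a quick check you do not need an interactive session: ssh can run a single command on the compute node and return. The bash sketch below assumes the nodes use the common procps version of top, and prints one batch-mode snapshot of your processes. The function name node_snapshot is hypothetical.

```shell
# bash sketch: one-off snapshot of your processes on a compute node.
# Usage: node_snapshot <compute_node_name>
node_snapshot() {
  local node="$1"
  if [ -z "$node" ]; then
    echo "usage: node_snapshot <compute_node_name>" >&2
    return 2
  fi
  # top in batch mode (-b), a single iteration (-n 1), only your processes (-u).
  # The user name is expanded locally; it is the same account on the node.
  ssh "$node" top -b -n 1 -u "${USER:-$(id -un)}"
}
```

For example, `node_snapshot holy8a24402`. Full-screen tools such as htop or nvtop need a terminal when run this way, so add -t to force one, e.g. `ssh -t holy8a24402 htop`.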