ssh to a compute node

When you have a running batch job, you can ssh to the compute node where the job is running. This is useful to:

  • monitor your job's current memory and CPU usage with top or htop
  • monitor your job's current GPU usage with nvtop

Prerequisites

To ssh to a compute node, you must meet all four prerequisites:

  1. Have a key pair in the $HOME/.ssh directory
  2. Have the public key added to $HOME/.ssh/authorized_keys
  3. Be on the cluster, on a login or compute node. You cannot ssh to a compute node from your local machine.
  4. Have a running job on the compute node

1. Key pair

NOTE: This is a one-time setup. If you have the key pair, you can skip this step.

By default, you should have a key pair in your $HOME/.ssh directory. The key pair is typically id_rsa and id_rsa.pub, where id_rsa is the private key and id_rsa.pub is the public key. Another possible pair is id_ed25519 and id_ed25519.pub. If you chose a name when creating the key pair, the files may have a non-standard name. This is an example with an id_rsa key pair:

[jharvard@holylogin07 ~]$ cd .ssh
[jharvard@holylogin07 .ssh]$ ls -l
total 256
-rw-r--r--. 1 jharvard jharvard_lab 403 Mar 20 2020 authorized_keys
-rw-------. 1 jharvard jharvard_lab 1679 Mar 20 2020 id_rsa
-rw-r--r--. 1 jharvard jharvard_lab 403 Mar 20 2020 id_rsa.pub
-rw-r--r--. 1 jharvard jharvard_lab 3330 Jun 5 16:19 known_hosts

If you do not see id_rsa and id_rsa.pub in the $HOME/.ssh directory, you can create a pair with the command:

ssh-keygen -t rsa
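The check described above can be sketched as a small shell function. This is only an illustration: find_keypairs is a hypothetical helper name, not a tool provided on the cluster, and it assumes a key pair is a private key file with a matching .pub file next to it.

```shell
# find_keypairs (hypothetical helper): print the base name of every
# complete key pair (private key plus matching .pub) in a directory.
find_keypairs() {
    local dir="${1:-$HOME/.ssh}"
    local pub
    for pub in "$dir"/*.pub; do
        # Skip the unexpanded glob when no .pub file exists.
        [ -e "$pub" ] || continue
        # A pair is complete only if the private key sits next to the public key.
        if [ -f "${pub%.pub}" ]; then
            basename "${pub%.pub}"
        fi
    done
}

# Usage: generate a new RSA pair only when no pair is found.
# if [ -z "$(find_keypairs)" ]; then ssh-keygen -t rsa; fi
```

Checking first avoids answering the ssh-keygen overwrite prompt by mistake when a key pair already exists.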

2. Add public key

NOTE: This is a one-time setup. If you have the public key in $HOME/.ssh/authorized_keys, you can skip this step.

The public key (e.g. id_rsa.pub) must be added to the file $HOME/.ssh/authorized_keys. To check whether it has been added, look at the contents of $HOME/.ssh/id_rsa.pub and see whether the same line appears in $HOME/.ssh/authorized_keys. If it does not, you can add it with the commands:

[jharvard@holylogin07 ~]$ cd $HOME/.ssh
[jharvard@holylogin07 .ssh]$ cat id_rsa.pub >> authorized_keys
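Running the append command twice leaves a duplicate line in authorized_keys. A minimal sketch of a check-then-append variant follows; add_public_key is a hypothetical helper name, and the default paths assume the id_rsa key pair used in this example.

```shell
# add_public_key (hypothetical helper): append a public key to
# authorized_keys only if that exact line is not already there.
add_public_key() {
    local pub="${1:-$HOME/.ssh/id_rsa.pub}"
    local auth="${2:-$HOME/.ssh/authorized_keys}"
    # Create the file if it is missing so grep has something to read.
    touch "$auth"
    # -q: quiet, -x: match whole lines, -F: fixed strings, -f: read patterns from file.
    if grep -qxFf "$pub" "$auth"; then
        echo "already present"
    else
        cat "$pub" >> "$auth"
        echo "added"
    fi
}

# Usage with the default paths:
# add_public_key
```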

3. Be on the cluster

You must already be on the cluster; you cannot ssh to a compute node from your local machine.

You can ssh to a compute node from a login node (see Command-line Access with Terminal) or a compute node on the cluster.

4. Running job

You can only access a compute node where you have a running job. As soon as the job finishes, is cancelled, or exits due to an error, you will no longer be able to access the compute node.

Ensure that you have a running job and find the compute node where it is running (see Convenient Slurm Commands):

[jharvard@holylogin07 ~]$ squeue -u jharvard -t RUNNING
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
47564138 test .fasrcoo jharvard R 0:28 1 holy8a24402

The last column, NODELIST, is the name of the compute node where job 47564138 is running.
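To pick the node name out of that table in a script, one option is to filter the default squeue output with awk. The sketch below assumes the eight-column layout shown above and a job name without spaces; running_nodes is a hypothetical helper name.

```shell
# running_nodes (hypothetical helper): read squeue's default table on
# stdin and print the NODELIST value of each running job.
running_nodes() {
    # NR > 1 skips the header line; $5 is the ST column; $NF is the last column.
    awk 'NR > 1 && $5 == "R" { print $NF }'
}

# Usage on a login node:
# squeue -u "$USER" -t RUNNING | running_nodes
```

squeue can also print the node list directly with the -h (no header) and -o "%N" (node list only) options, which avoids parsing the table.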

Access the compute node

From a login node, simply type:

ssh <compute_node_name>

where <compute_node_name> is the node name you obtained in step 4, Running job.

Example:

[jharvard@holylogin07 ~]$ ssh holy8a24402
The authenticity of host 'holy8a24402 (10.31.146.100)' can't be established.
ECDSA key fingerprint is SHA256:yhtWldxhLKRfbGeB1x4OcgFLZI0sWWVSUE83YfJQ4hU.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'holy8a24402,10.31.146.100' (ECDSA) to the list of known hosts.
Last login: Fri Sep 20 12:18:43 2024
[jharvard@holy8a24402 ~]$

Note that at the first prompt (first line of the code block above), jharvard is on holylogin07, a login node. At the second prompt (last line of the code block above), jharvard is on holy8a24402, a compute node.
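For a quick look at your processes without opening an interactive session, ssh can also run a single command on the compute node. The following is a sketch, assuming the standard Linux top is available on the compute node; job_snapshot is a hypothetical helper name.

```shell
# job_snapshot (hypothetical helper): print one non-interactive top
# snapshot of your own processes on the given compute node.
job_snapshot() {
    local node="$1"
    # top -b: batch mode, -n 1: a single iteration, -u: show one user's processes.
    # $USER is expanded on the login node before the command is sent.
    ssh "$node" "top -b -n 1 -u $USER"
}

# Usage from a login node, with the node name from the squeue output:
# job_snapshot holy8a24402
```

As with an interactive login, this only works while you have a job running on that node.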

© The President and Fellows of Harvard College
Except where otherwise noted, this content is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.