# MATLAB Parallel – PCT and DCS

NOTE: `matlab-default` is no longer needed to run parallel MATLAB applications; plain `matlab` is used again. Please update your workflows to reflect this change.

## Introduction

This page is intended to help you with running parallel MATLAB codes on the Odyssey cluster. The latest software modules supporting parallel computing with MATLAB available on the cluster are:

```
matlab/R2018b-fasrc01
matlab/R2018a-fasrc01
matlab/R2017b-fasrc01
matlab/R2017a-fasrc02
matlab/R2016b-fasrc02
matlab/R2016a-fasrc02
```

Parallel processing with MATLAB is performed with the help of two products, Parallel Computing Toolbox (PCT) and Distributed Computing Server (DCS).

## Parallel Computing Toolbox

Currently, PCT provides up to 32 workers (MATLAB computational engines) to execute applications locally on a multicore machine. This means that with the toolbox one could run parallel MATLAB codes locally on the compute nodes and use up to 32 cores.

### Parallel FOR loops (parfor)

Below is a simple code illustrating the use of PCT to calculate PI via a parallel Monte-Carlo method. This example also illustrates the use of parfor (parallel FOR) loops. In this scheme, suitable FOR loops could be simply replaced by parallel FOR loops without other changes to the code:

```
%============================================================================
% Parallel Monte Carlo calculation of PI
%============================================================================
% Start one worker per core allocated by SLURM
parpool('local', str2num(getenv('SLURM_CPUS_PER_TASK')))
R = 1;
darts = 1e7;
count = 0;
tic
parfor i = 1:darts
    % Compute the X and Y coordinates of where the dart hit the
    % square, using a uniform distribution
    x = R*rand(1);
    y = R*rand(1);
    if x^2 + y^2 <= R^2
        % Increment the count of darts that fell inside the circle
        count = count + 1; % count is a reduction variable
    end
end
% Compute pi
myPI = 4*count/darts;
T = toc;
fprintf('The computed value of pi is %8.7f.\n', myPI);
fprintf('The parallel Monte-Carlo method is executed in %8.2f seconds.\n', T);
delete(gcp);
exit;
```

Important: When using `parpool` in MATLAB, you need to include the statement `parpool('local', str2num(getenv('SLURM_CPUS_PER_TASK')))` in your code. This statement tells MATLAB to start `SLURM_CPUS_PER_TASK` workers on the local machine (the compute node where your job lands). When the parallel computation is done, the MATLAB workers are released with the statement `delete(gcp)`. If the above code is named, e.g., `pfor.m`, it can be sent to the queue with the batch-job submission script below, which starts a MATLAB parallel job with 8 workers:

```
#!/bin/bash
#SBATCH -J pfor
#SBATCH -o pfor.out
#SBATCH -e pfor.err
#SBATCH -N 1
#SBATCH -c 8
#SBATCH -t 0-00:30
#SBATCH -p shared
#SBATCH --mem=32G

srun -c $SLURM_CPUS_PER_TASK matlab -nosplash -nodesktop -r "pfor"
```

The `-N 1` and `-c 8` SBATCH directives ensure that 8 processing cores are available for the calculation and that they all reside on the same compute node.

If the submission script is named `pfor.run`, it is submitted to the queue by typing:

```
$ sbatch pfor.run
Submitted batch job 1885302
```

When the job has completed, the `pfor.out` output file is generated:

```
                            < M A T L A B (R) >
R2018b (9.5.0.944444) 64-bit (glnxa64)
August 28, 2018
To get started, type doc.
For product information, visit www.mathworks.com.
Starting parallel pool (parpool) using the 'local' profile ...
connected to 8 workers.
ans =
Pool with properties:
Connected: true
NumWorkers: 8
Cluster: local
AttachedFiles: {}
IdleTimeout: 30 minutes (30 minutes remaining)
SpmdEnabled: true
The computed value of pi is 3.1410644.
The parallel Monte-Carlo method is executed in     2.14 seconds.
```

Any runtime errors would go to the file `pfor.err`.
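
The submission script above hard-codes the job name and the core count. As a minimal sketch, an equivalent script can be generated from a small shell function so that the name and the `-c` value are set in one place. The function name `make_pct_job` and its two arguments are illustrative and not part of the cluster software; the SBATCH values are copied from the example above.

```bash
#!/bin/bash
# Hypothetical helper (not part of the cluster software): print a SLURM
# submission script for a single-node PCT job to standard output.
#   $1 = name of the MATLAB script without ".m", e.g. pfor
#   $2 = number of cores, which is also the number of MATLAB workers
make_pct_job() {
    local name="$1" cores="$2"
    # \$ keeps SLURM_CPUS_PER_TASK literal in the generated text, so it
    # is expanded when the job script runs, not when it is written.
    cat <<EOF
#!/bin/bash
#SBATCH -J ${name}
#SBATCH -o ${name}.out
#SBATCH -e ${name}.err
#SBATCH -N 1
#SBATCH -c ${cores}
#SBATCH -t 0-00:30
#SBATCH -p shared
#SBATCH --mem=32G

srun -c \$SLURM_CPUS_PER_TASK matlab -nosplash -nodesktop -r "${name}"
EOF
}
```

For example, `make_pct_job pfor 8 > pfor.run` would write a script equivalent to the one shown earlier, and changing the second argument changes the number of workers without touching the MATLAB code.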

### Single Program Multiple Data (SPMD)

In addition, MATLAB provides a single program multiple data (SPMD) parallel programming model, which allows for greater control over the parallelization: tasks can be distributed and assigned to parallel processes (labs or workers in MATLAB's terminology) depending on their ranks. The code below provides a simple illustration by printing the worker rank from each MATLAB lab:

```
%====================================================================
% Illustration of SPMD Parallel Programming model with MATLAB
%====================================================================
% Start one worker per core allocated by SLURM
parpool('local', str2num(getenv('SLURM_CPUS_PER_TASK')))
% Start of parallel region
spmd
    nproc = numlabs;  % get total number of workers
    iproc = labindex; % get lab ID
    if ( iproc == 1 )
        fprintf ( 1, ' Running with  %d labs.\n', nproc );
    end
    for i = 1: nproc
        if iproc == i
            fprintf ( 1, ' Rank %d out of  %d.\n', iproc, nproc );
        end
    end
% End of parallel region
end
delete(gcp);
exit;
```

If the code is named `spmd_test.m`, it can be sent to the queue with this script:

```
#!/bin/bash
#SBATCH -J spmd_test
#SBATCH -o spmd_test.out
#SBATCH -e spmd_test.err
#SBATCH -N 1
#SBATCH -c 8
#SBATCH -t 0-00:30
#SBATCH -p shared
#SBATCH --mem=4000

srun -c $SLURM_CPUS_PER_TASK matlab -nosplash -nodesktop -r "spmd_test"
```

If the batch-job submission script is named `spmd_test.run`, then it is sent to the queue with

```
$ sbatch spmd_test.run
Submitted batch job 1896986
```

The output is printed out to the file `spmd_test.out`:

```
                            < M A T L A B (R) >
R2018b (9.5.0.944444) 64-bit (glnxa64)
August 28, 2018
To get started, type doc.
For product information, visit www.mathworks.com.
Starting parallel pool (parpool) using the 'local' profile ...
connected to 8 workers.
ans =
Pool with properties:
Connected: true
NumWorkers: 8
Cluster: local
AttachedFiles: {}
IdleTimeout: 30 minutes (30 minutes remaining)
SpmdEnabled: true
Lab 1:
Running with  8 labs.
Rank 1 out of  8.
Lab 2:
Rank 2 out of  8.
Lab 3:
Rank 3 out of  8.
Lab 4:
Rank 4 out of  8.
Lab 5:
Rank 5 out of  8.
Lab 6:
Rank 6 out of  8.
Lab 7:
Rank 7 out of  8.
Lab 8:
Rank 8 out of  8.
Parallel pool using the 'local' profile is shutting down.
```
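
Rank-dependent work assignment usually comes down to index arithmetic: each worker derives its own slice of the problem from its rank and the total number of workers. The sketch below shows one common scheme, a contiguous block partition of items 1..N, written in shell purely to make the arithmetic concrete. The function name `block_range` is illustrative; inside an `spmd` block the rank and worker count would come from `labindex` and `numlabs`.

```bash
#!/bin/bash
# Illustrative helper: print the first and last item index (1-based)
# that a given rank handles when n items are split into contiguous
# blocks across nproc workers. Any remainder is spread over the
# lowest ranks, so block sizes differ by at most one.
#   $1 = rank (1..nproc), $2 = nproc, $3 = n
block_range() {
    local rank="$1" nproc="$2" n="$3"
    local base=$(( n / nproc ))   # minimum number of items per rank
    local rem=$(( n % nproc ))    # ranks 1..rem get one extra item
    # Number of extra items already handed to the ranks below this one
    local extra_before=$(( rank - 1 < rem ? rank - 1 : rem ))
    local count=$(( base + (rank <= rem ? 1 : 0) ))
    local first=$(( (rank - 1) * base + extra_before + 1 ))
    local last=$(( first + count - 1 ))
    echo "$first $last"
}

# Print the slice for each of 8 ranks over 100 items
for r in 1 2 3 4 5 6 7 8; do
    echo "Rank $r: $(block_range "$r" 8 100)"
done
```

With 100 items and 8 workers, the first four ranks receive 13 items each and the remaining four receive 12, so every item is covered exactly once.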

## Distributed Computing Server

The DCS allows for a larger number of MATLAB workers to be used on a single node and/or across several compute nodes. The current DCS license we have on the cluster allows for using up to 256 MATLAB workers. DCS is integrated with SLURM and works with MATLAB versions R2017a, R2017b, R2018a and R2018b, available with modules matlab/R2017a-fasrc02, matlab/R2017b-fasrc01, matlab/R2018a-fasrc01 and matlab/R2018b-fasrc01. The example steps below describe how to set up and use DCS on the Research Computing cluster:

(1) Log on to the cluster and start an interactive / test bash shell.

```
$ srun -p test -N 1 -c 4 -t 0-06:00 --pty --mem=16G bash
```

(2) Start MATLAB on the command line and configure DCS to run parallel jobs on Odyssey by calling `configCluster`. This command needs to be run only once for each MATLAB version.

• Load a MATLAB software module and start MATLAB:
```
# Load a MATLAB software module, e.g.,
$ module load matlab/R2018b-fasrc01
# Start MATLAB interactively without a GUI
$ matlab -nosplash -nodesktop -nodisplay
```
• Run `configCluster` in the MATLAB shell:
```
>> configCluster
Must set WallTime and QueueName before submitting jobs to ODYSSEY.  E.g.
>> c = parcluster('odyssey');
>> % 5 hour walltime
>> c.saveProfile
```

(3) Set up job parameters, e.g., wall time, queue / partition, memory per CPU, etc. The example below illustrates how this can be done interactively. Once these parameters are set, their values become the defaults until changed.

```
>> c = parcluster('odyssey');                    % Define a cluster object
>> c.AdditionalProperties.WallTime = '05:00:00'; % Time limit
>> c.AdditionalProperties.QueueName = 'shared';  % Partition
>> c.AdditionalProperties.MemUsage = '4000';     % Memory per CPU in MB
>> c.saveProfile                                 % Save cluster profile. This becomes default until changed
```

(4) Display parallel cluster configuration with `c.AdditionalProperties`.
NOTE: This lists the available cluster options and their current values. These options could be set up as desired.

```
>> c.AdditionalProperties
ans =
AccountName: ''
Constraint: ''
DebugMessagesTurnedOn: 0
GpusPerNode: 0
MemUsage: '4000'
ProcsPerNode: 0
QueueName: 'shared'
WallTime: '05:00:00'
```

(5) Submit parallel DCS jobs. There are two ways to submit parallel DCS jobs – from within MATLAB, and directly through SLURM.
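
Whether DCS is needed at all depends mostly on the number of workers. As a rough sketch under the limits quoted on this page (32 local workers with PCT, 256 workers under the DCS license), a helper like the one below could classify a request before any job is submitted. The function name `choose_product` is hypothetical, and the sketch assumes the limits apply to the pool workers only.

```bash
#!/bin/bash
# Limits as quoted on this page; adjust them if the licenses change.
PCT_MAX_WORKERS=32
DCS_MAX_WORKERS=256

# Hypothetical helper: print the simplest product that can serve a
# request for $1 workers, or "over-limit" if neither can.
choose_product() {
    local workers="$1"
    if [ "$workers" -le "$PCT_MAX_WORKERS" ]; then
        echo "PCT"          # fits on one node with the toolbox alone
    elif [ "$workers" -le "$DCS_MAX_WORKERS" ]; then
        echo "DCS"          # needs workers beyond the local limit
    else
        echo "over-limit"   # exceeds the DCS license quoted above
    fi
}

for n in 8 64 300; do
    echo "$n workers: $(choose_product "$n")"
done
```

A request that fits within the PCT limit still needs a compute node with that many cores, so the result is a starting point rather than a guarantee.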

### Submitting DCS jobs from within MATLAB

We will illustrate submitting DCS jobs from within MATLAB with a specific example. Below is a simple function evaluating the integer sum from 1 through N in parallel:

```
%==========================================================
% Function: parallel_sum( N )
%           Calculates integer sum from 1 to N in parallel
%==========================================================
function s = parallel_sum(N)
    s = 0;
    parfor i = 1:N
        s = s + i; % s is a reduction variable
    end
    fprintf('Sum of numbers from 1 to %d is %d.\n', N, s);
end
```

Use the `batch` command to submit parallel jobs to the cluster. The batch command will return a job object which is used to access the output of the submitted jobs. See the example below and refer to the official MATLAB documentation for more help on batch. This assumes that the MATLAB function is named `parallel_sum.m`. Note that these jobs will always request n+1 CPU cores, since one worker is required to manage the batch job and pool of workers. For example, a job that needs 8 workers will consume 9 CPU cores.

```
% Define a cluster object
>> c = parcluster('odyssey');
% Define a job object using batch
>> j = c.batch(@parallel_sum, 1, {100}, 'pool', 8);
```

Note that this starts a job with one more MATLAB worker than requested (9 instead of 8), because one parallel instance is required to manage the pool of workers. The `mpiexec.hydra -l -n 9` line in the debug log shown later in this section confirms this.
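
The resource arithmetic for a pooled batch job can be made explicit with a short sketch. It assumes that `MemUsage` is applied to every allocated core, as the "Memory per CPU" comment in step (3) suggests; the function names are illustrative.

```bash
#!/bin/bash
# Illustrative helpers for sizing a DCS batch job.

# Cores requested for a pool of $1 workers: the pool itself plus one
# extra instance that manages the job.
dcs_total_cores() {
    echo $(( $1 + 1 ))
}

# Total memory in MB for a pool of $1 workers with $2 MB per core.
# Assumption: MemUsage applies to every allocated core, including
# the managing instance.
dcs_total_mem_mb() {
    echo $(( ($1 + 1) * $2 ))
}

echo "Cores for a pool of 8: $(dcs_total_cores 8)"
echo "Memory at 4000 MB per core: $(dcs_total_mem_mb 8 4000) MB"
```

Under these assumptions, a pool of 8 workers with `MemUsage` set to `'4000'` occupies 9 cores and 36000 MB in total, which is worth keeping in mind when estimating queue wait times.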


You can query the job status with `j.State`:

```
>> j.State
ans =
'finished'
```

Once the job completes, we can retrieve the job results. This is done by calling the function `fetchOutputs`.

```
>> j.fetchOutputs{:}
ans =
5050
```

NOTE: `fetchOutputs` is used to retrieve function output arguments. Data that has been written to files on the cluster needs to be retrieved directly from the filesystem.

If needed, one may also access the job log files, which is particularly useful for debugging. This is done with the `c.getDebugLog(j)` command, e.g.,

```
>> c.getDebugLog(j)
LOG FILE OUTPUT:
Node list: holy7c[03205-03206]
mpiexec.hydra -l -n 9 /n/sw/helmod/apps/centos7/Core/matlab/R2018b-fasrc01/bin/worker -parallel

                             < M A T L A B (R) >
                   Copyright 1984-2018 The MathWorks, Inc.
                    R2018b (9.5.0.944444) 64-bit (glnxa64)
                               August 28, 2018


                             < M A T L A B (R) >
                   Copyright 1984-2018 The MathWorks, Inc.
                    R2018b (9.5.0.944444) 64-bit (glnxa64)
                               August 28, 2018


                             < M A T L A B (R) >
                   Copyright 1984-2018 The MathWorks, Inc.
                    R2018b (9.5.0.944444) 64-bit (glnxa64)
                               August 28, 2018


                             < M A T L A B (R) >
                   Copyright 1984-2018 The MathWorks, Inc.
                    R2018b (9.5.0.944444) 64-bit (glnxa64)
                               August 28, 2018


                             < M A T L A B (R) >
                   Copyright 1984-2018 The MathWorks, Inc.
                    R2018b (9.5.0.944444) 64-bit (glnxa64)
                               August 28, 2018


                             < M A T L A B (R) >
                   Copyright 1984-2018 The MathWorks, Inc.
                    R2018b (9.5.0.944444) 64-bit (glnxa64)
                               August 28, 2018







 To get started, type doc.
 To get started, type doc.
 To get started, type doc.
 To get started, type doc.
 To get started, type doc.
 For product information, visit www.mathworks.com.
 For product information, visit www.mathworks.com.

 For product information, visit www.mathworks.com.

 For product information, visit www.mathworks.com.

 For product information, visit www.mathworks.com.


 To get started, type doc.
 For product information, visit www.mathworks.com.


                             < M A T L A B (R) >
                   Copyright 1984-2018 The MathWorks, Inc.
                    R2018b (9.5.0.944444) 64-bit (glnxa64)
                               August 28, 2018


                             < M A T L A B (R) >
                   Copyright 1984-2018 The MathWorks, Inc.
                    R2018b (9.5.0.944444) 64-bit (glnxa64)
                               August 28, 2018


                             < M A T L A B (R) >
                   Copyright 1984-2018 The MathWorks, Inc.
                    R2018b (9.5.0.944444) 64-bit (glnxa64)
                               August 28, 2018




 To get started, type doc.
 To get started, type doc.
 To get started, type doc.
 For product information, visit www.mathworks.com.
 For product information, visit www.mathworks.com.
 For product information, visit www.mathworks.com.



 Sending a stop signal to all the labs...
 2019-02-26 15:30:18 | About to exit MATLAB normally
 2019-02-26 15:30:19 | About to exit with code: 0
Exiting with code: 0
```

When the results are no longer needed, the job can be deleted.

```
% Delete the job after the results are no longer needed
j.delete
```

### Submitting DCS jobs directly through SLURM

Parallel DCS jobs could be submitted directly from the Unix command line through SLURM. For this, in addition to the MATLAB source, one needs to prepare a MATLAB submission script with the job specifications. An example is shown below:

```
%==========================================================
% MATLAB job submission script: parallel_batch.m
%==========================================================
c = parcluster('odyssey');
j = c.batch(@parallel_sum, 1, {100}, 'pool', 8);
exit;
```

If this script is named, for instance, `parallel_batch.m`, it is submitted to the queue with the help of the following SLURM batch-job submission script:

```
#!/bin/bash
#SBATCH -J parallel_sum_DCS
#SBATCH -o parallel_sum_DCS.out
#SBATCH -e parallel_sum_DCS.err
#SBATCH -p shared
#SBATCH -c 1
#SBATCH -t 0-00:20
#SBATCH --mem=4000

srun -c 1 matlab -nosplash -nodesktop -r "parallel_batch"
```

Assuming the above script is named `parallel_sum_DCS.run`, for instance, the job is submitted as usual with

```
$ sbatch parallel_sum_DCS.run
```

NOTE: This scheme dispatches 2 jobs: a serial job that submits the DCS job, and the parallel DCS job itself.

Once submitted, the DCS parallel job can be monitored and managed directly through SLURM.

```
$ sacct
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
1916487      parallel_+     shared   rc_admin          1  COMPLETED      0:0
1916487.bat+      batch              rc_admin          1  COMPLETED      0:0
1916487.ext+     extern              rc_admin          1  COMPLETED      0:0
1916487.0        matlab              rc_admin          1  COMPLETED      0:0
1916831            Job3     shared   rc_admin          9  COMPLETED      0:0
1916831.bat+      batch              rc_admin          8  COMPLETED      0:0
1916831.ext+     extern              rc_admin          9  COMPLETED      0:0
1916831.0     pmi_proxy              rc_admin          2  COMPLETED      0:0
```

After the job completes, one can fetch the results and delete the job object from within MATLAB. If the program writes directly to disk, fetching is not necessary.

```
>> j.fetchOutputs{:};
>> j.delete;
```