Stata on the FASRC clusters

Description

Stata is a statistical software package widely used by researchers, analysts, and academics across many disciplines. It lets users analyze, manage, and visualize data, and its command syntax supports data manipulation, regression analysis, time-series modeling, and more. With a broad set of features and both command-line and graphical interfaces, Stata serves novice and advanced users alike and remains a cornerstone of statistical analysis and research methodology.

Examples

You can find examples of how to run Stata in both serial and parallel modes in the FASRC User Codes GitHub repository.

Open OnDemand

Stata can be run from Open OnDemand (OOD, formerly known as VDI) by clicking the Stata icon or choosing it from the Interactive Apps menu, and then specifying your resource needs. Click Launch, wait for the session to start, and then click the “Launch Stata” button.

You can also launch Stata from the Remote Desktop app on OOD.

Output file permissions

Stata appears to override filesystem-level permission mechanisms such as default file ACLs. In a test using stata/14.0-fasrc01, the .dta files produced by Stata had permissions consistent with the user’s umask, even though the directory’s default file ACLs should have produced different effective permissions. Stata appears to modify the permissions after writing the file, that is, after the default file ACLs have been applied. Setting the desired umask should resolve this. You can set it in the Slurm submission script or on the command line before submitting the batch job; a ‘umask’ command in the submission script is preferable in most cases.
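A minimal sketch of the umask approach, as it might appear near the top of a Slurm submission script before Stata starts. The value 007 is only an example (owner and group get read/write, others get no access); choose the mask that matches how your lab shares files. The last few lines simply confirm the effect on a newly created file and are not needed in a real job.

```shell
# Set the umask before Stata runs so that files it writes (including .dta
# files) end up with the permissions you want.
# 007 is an example value: new files get mode 660 (rw-rw----).
umask 007

# Optional check: create a throwaway file and print its mode and name.
f=$(mktemp -u)
touch "$f"
stat -c '%a %n' "$f"
rm -f "$f"
```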

Running Stata on FASSE

For instructions on setting proxies in Stata, see our FASSE proxy documentation.

Troubleshooting

Stata I/O error

If you get the error:

I/O error writing .dta file
Usually such I/O errors are caused by the disk or file system being full.

This error occurs because Stata writes temporary files to disk instead of keeping everything in memory. By default, Stata writes these temporary files to /tmp. If /tmp is smaller than your combined datasets, Stata will not have enough space for its temporary files.

Solution: set the environment variable STATATMP to a directory that is large enough.
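To check whether a directory is large enough before pointing STATATMP at it, you can compare its free space with the size of your datasets. The function below, stata_tmp_fits, is a hypothetical helper (not a Stata or FASRC command). It assumes GNU coreutils (for du --apparent-size) and a POSIX df, and it only checks for as much free space as the datasets occupy, so you may want additional headroom.

```shell
# Hypothetical helper: succeed (exit status 0) if DIR has at least as much
# free space as the combined apparent size of the given files.
# Usage: stata_tmp_fits DIR FILE...
stata_tmp_fits() {
  local dir=$1
  shift
  local need_kb free_kb
  # Combined size of the files in KiB (the last line of `du -c` is the total)
  need_kb=$(du -ck --apparent-size "$@" | tail -n 1 | cut -f1)
  # Space available on the filesystem holding DIR, in KiB (POSIX df format)
  free_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  echo "need ${need_kb} KiB, free ${free_kb} KiB in ${dir}"
  [ "$free_kb" -ge "$need_kb" ]
}
```

For example, with hypothetical file names: stata_tmp_fits /tmp mydata1.dta mydata2.dta || echo "/tmp is too small; set STATATMP elsewhere"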

Stata stand-alone app (Open OnDemand)

You can avoid this error by increasing the value of the “Minimum size of STATATMP in GB” option. We recommend setting it to twice the combined size of the datasets you are using.
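As a rough way to pick a value for this option, the hypothetical helper below adds up the apparent sizes of your datasets, doubles the total, and rounds up to whole GB (treated here as GiB, 2^30 bytes). It assumes GNU du, whose -b flag reports apparent size in bytes.

```shell
# Hypothetical helper: print a suggested "Minimum size of STATATMP in GB"
# value, i.e. twice the combined size of the given files, rounded up.
# Usage: suggest_statatmp_gb FILE...
suggest_statatmp_gb() {
  local bytes
  # Combined apparent size in bytes (the last line of `du -c` is the total)
  bytes=$(du -cb "$@" | tail -n 1 | cut -f1)
  # ceil(2 * bytes / 2^30) using integer arithmetic
  echo $(( (2 * bytes + 1073741823) / 1073741824 ))
}
```

For example, with hypothetical file names: suggest_statatmp_gb mydata1.dta mydata2.dta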

If your total size is more than 68 GB, check the /scratch space available on each partition on Cannon and FASSE (see the last column, “/scratch size (GB)”).

On FASSE, if you do not have access to a partition that has enough /scratch space, you can:

  1. use the serial_requeue partition (note that serial_requeue jobs may be preempted), or
  2. run Stata from the Remote Desktop app (explained below), using /n/netscratch/PI_lab instead of /scratch

Command line interface or Stata from Remote Desktop app

Before launching Stata, set STATATMP and create the directory:

export STATATMP=/scratch/$USER/stata_dir
mkdir -p $STATATMP

Alternatively, you can set STATATMP to a directory on a lab share; however, local /scratch offers better performance.

Additionally, when you submit a job, you can use the Slurm option --tmp to request a node whose local disk has at least a given amount of space. For example, this requests at least 150 GB:

#SBATCH --tmp=150G
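Putting the command-line steps together, a batch script might look like the sketch below. The time limit, memory request, module version, and do-file name (analysis.do) are placeholders. The fallback to $TMPDIR or /tmp when /scratch does not exist is an assumption added so the script also runs elsewhere. stata -b do runs a do-file in Stata’s batch mode; the module load and stata lines are left commented out so you can fill in the version installed on the cluster.

```shell
#!/bin/bash
#SBATCH -t 0-04:00     # placeholder time limit
#SBATCH --mem=16G      # placeholder memory request
#SBATCH --tmp=150G     # node must have at least 150 GB of local disk

# Put Stata's temporary files on node-local /scratch.
# Assumption: fall back to $TMPDIR or /tmp only if /scratch does not exist.
scratch_base=/scratch
[ -d "$scratch_base" ] || scratch_base=${TMPDIR:-/tmp}
export STATATMP="$scratch_base/${USER:-$(id -un)}/stata_dir"
mkdir -p "$STATATMP"

# Remove the temporary directory when the job script exits.
trap 'rm -rf "$STATATMP"' EXIT

# Placeholders: replace <version> and analysis.do before use.
# module load stata/<version>
# stata -b do analysis.do
```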

Resources

These are some external resources with many examples of how to use Stata:

© The President and Fellows of Harvard College
Except where otherwise noted, this content is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.