The Trinity software is used to build transcripts by de novo assembly of RNA-Seq reads from NextGen sequence data. The package is a complex integration of three stages: inchworm, chrysalis, and butterfly. To run this efficiently on the FASRC cluster, these three stages should be broken into two parts:
inchworm + chrysalis, which are RAM-bound
butterfly, which is processor-bound
Using a SLURM job dependency and the --gridconf option in Trinity, you can submit your assembly to the cluster so that these two parts run sequentially, each with the resources it actually needs. Please see pages 41 & 42 of our Informatics tutorial on Genome/Transcript Assembly.
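As a rough illustration of the job-dependency idea, the sketch below builds two `sbatch` command lines in which the second job starts only after the first one finishes successfully. It is not taken from the tutorial: the script names `inchworm_chrysalis.sbatch` and `butterfly.sbatch` are hypothetical placeholders for your own batch scripts, and the Trinity options inside them are not shown. The only SLURM features assumed are the standard `--parsable` and `--dependency=afterok:<jobid>` options of `sbatch`.

```python
import subprocess


def build_sbatch_cmd(script, partition=None, dependency_job_id=None):
    """Return the sbatch command (as a list) for one stage of the assembly.

    script            -- path to a batch script (hypothetical name, supply your own)
    partition         -- optional SLURM partition to submit to
    dependency_job_id -- if given, this job waits until that job exits successfully
    """
    # --parsable makes sbatch print just the job ID, which is easy to capture.
    cmd = ["sbatch", "--parsable"]
    if partition:
        cmd.append(f"--partition={partition}")
    if dependency_job_id:
        # afterok: start only if the named job completed with exit code 0.
        cmd.append(f"--dependency=afterok:{dependency_job_id}")
    cmd.append(script)
    return cmd


def parse_job_id(sbatch_output):
    """Extract the job ID from `sbatch --parsable` output ("jobid" or "jobid;cluster")."""
    return sbatch_output.strip().split(";")[0]


def submit(cmd):
    """Run an sbatch command on the cluster and return the new job's ID."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return parse_job_id(result.stdout)


def submit_two_stage(first_script, second_script, first_partition=None):
    """Submit the RAM-bound part, then the processor-bound part dependent on it."""
    first_id = submit(build_sbatch_cmd(first_script, partition=first_partition))
    second_id = submit(build_sbatch_cmd(second_script, dependency_job_id=first_id))
    return first_id, second_id
```

On a login node, calling `submit_two_stage("inchworm_chrysalis.sbatch", "butterfly.sbatch")` would queue both jobs at once; SLURM holds the butterfly job until the first job completes without error. Memory, core, and time requests belong in the batch scripts themselves.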
NOTE: only use the bigmem partition if you require more than 256 GB of RAM for the first two stages. Trinity RAM requirements are discussed here. If you require less than this, please submit to
NOTE: you can also sidestep the large RAM requirements by performing in silico digital normalization of your input FASTQ files, as documented on the Trinity site and in our Informatics recipe.