Summary.htm Details
Summary.htm
Summary.htm is the file to review once your analysis is complete. The key parameters to examine are described below.
Clusters
This column contains the average number of clusters per tile detected in the first cycle images.
Fewer clusters than expected:
- Problem with cluster formation
- Blurred images: Poor focus or dirty flow cell surface
- Lots of clusters visible: Cluster density or size is too great to distinguish individual objects
More clusters than expected:
- Too many clusters on the flow cell: Problem with cluster formation
- Very large clusters: Double counting
Average First Cycle Intensity
Generally, brighter is better, but this result is instrument- and sample-dependent.
Percentage of First Cycle Intensity Remaining After 20 Cycles of Sequencing
Generally, the higher the value, the better. The intensity remaining can be sample-dependent.
Percentage of Clusters Passing Filters
To remove the least reliable data from the analysis, the raw data can be filtered to remove any clusters that have “too much” intensity corresponding to bases other than the called base. By default, the purity of the signal from each cluster is examined over the first 25 cycles, and for each cycle the chastity is calculated as:

chastity = Highest_Intensity / (Highest_Intensity + Next_Highest_Intensity)

The new default filter, implemented at the base-calling stage, allows at most one of these cycles to have a chastity below the chastity threshold; clusters with more such cycles are removed.
The higher the value, the better. This value is very dependent on cluster density, since the major cause of an impure signal in the early cycles is the presence of another cluster within a few micrometers.
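The chastity filter described above can be sketched as follows. This is a minimal illustration, not the pipeline's implementation: the function names, the clipping of negative intensities to zero, the channel order, and the threshold of 0.6 used in the usage lines are all assumptions chosen for illustration.

```python
from typing import Sequence


def chastity(intensities: Sequence[float]) -> float:
    """Chastity of one cycle: highest / (highest + next highest) of the four intensities."""
    # Assumption: negative intensities are clipped to zero before comparison.
    ordered = sorted((max(x, 0.0) for x in intensities), reverse=True)
    highest, next_highest = ordered[0], ordered[1]
    total = highest + next_highest
    # A cycle with no positive signal is treated as maximally impure.
    return highest / total if total > 0 else 0.0


def passes_chastity_filter(
    cycles: Sequence[Sequence[float]],
    threshold: float,
    n_cycles: int = 25,
    max_failures: int = 1,
) -> bool:
    """True if at most `max_failures` of the first `n_cycles` cycles fall below `threshold`."""
    failures = sum(1 for c in cycles[:n_cycles] if chastity(c) < threshold)
    return failures <= max_failures


# Usage: four intensities per cycle (the channel order is an assumption).
clean = [(900.0, 50.0, 30.0, 20.0)] * 25
mixed = (500.0, 450.0, 10.0, 10.0)  # chastity = 500 / 950, about 0.53
print(passes_chastity_filter(clean, threshold=0.6))                       # True
print(passes_chastity_filter([mixed] + clean[1:], threshold=0.6))         # True: one failing cycle
print(passes_chastity_filter([mixed, mixed] + clean[2:], threshold=0.6))  # False: two failing cycles
```

Because only the first 25 cycles are inspected, an impure cycle later in the read does not affect whether the cluster passes; raising `max_failures` or lowering `threshold` loosens the filter.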
Very few clusters passing filter
Possible Cause:
- Poor flow cell, perhaps unblocked DNA
- Faint clusters
- Out of focus
- Poor matrix
- A fluidics or sequencing failure
- Bubbles in individual tiles
- Too many clusters
- Large clusters
- High phasing or prephasing
Suggested Action:
- Some of the causes may affect only a single cycle. If the problem is confined to the early cycles used for filtering, this filtering may throw away very good data.
- Base-calling errors may be limited to the affected cycles, and because early cycles are fairly resistant to minor focus and fluidics problems, even those errors may be few. The filter settings can always be changed manually. Check before assuming that all the data are poor.
Percentage Error Rate of Clusters Passing Filters
This value should be as low as possible, but it is very dependent on read length. A sudden rise in the error rate beyond cycle 32 most likely means that ELANDv2 has effectively filtered out many clusters with more than two errors, which suppresses the apparent error rate up to that cycle. In this case, the percentage of reads aligning will also be low.
Percentage of Phasing and Prephasing
Ideally, these values should be as low as possible.
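To see why low phasing and prephasing values matter, the sketch below tracks the fraction of a cluster's molecules that stay in step with the nominal cycle. It uses a deliberately simplified model, which is an assumption rather than the pipeline's phasing estimator: in every cycle, each molecule independently falls one position behind with probability `phasing` or jumps one position ahead with probability `prephasing`. The Summary reports these values as percentages, so divide them by 100 before passing them in.

```python
from typing import Dict


def in_phase_fraction(phasing: float, prephasing: float, cycles: int) -> float:
    """Fraction of molecules at the nominal position after `cycles` cycles (simplified model)."""
    # offset -> fraction of molecules (negative = behind, positive = ahead)
    dist: Dict[int, float] = {0: 1.0}
    stay = 1.0 - phasing - prephasing
    for _ in range(cycles):
        nxt: Dict[int, float] = {}
        for offset, frac in dist.items():
            nxt[offset] = nxt.get(offset, 0.0) + frac * stay
            nxt[offset - 1] = nxt.get(offset - 1, 0.0) + frac * phasing
            nxt[offset + 1] = nxt.get(offset + 1, 0.0) + frac * prephasing
        dist = nxt
    return dist.get(0, 0.0)


# Usage: compare in-phase signal after 36 cycles for a few illustrative percentages.
for pct in (0.5, 1.0, 2.0):
    print(pct, round(in_phase_fraction(pct / 100, pct / 100, 36), 3))
```

Molecules that lag and later leap ahead can return to offset 0, so this is slightly more than the fraction that never left phase; the point is only that the in-phase signal decays roughly geometrically with cycle number.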
Standard Deviations
Many values have standard deviations associated with them. These give a first indication of the uniformity of the flow cell: high standard deviations indicate tile-to-tile variability within a lane.
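Because most lane statistics are reported as a mean and standard deviation over the tiles used in the lane, a quick way to inspect uniformity is to recompute them from per-tile values. This sketch is illustrative only: the function name and the example cluster counts are hypothetical, and it assumes the sample (n - 1) standard deviation, which the pipeline may not use.

```python
import statistics
from typing import Sequence, Tuple


def tile_mean_sd(tile_values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard deviation of one per-tile metric across the tiles of a lane."""
    mean = statistics.mean(tile_values)
    # Assumption: sample standard deviation (n - 1 denominator).
    sd = statistics.stdev(tile_values) if len(tile_values) > 1 else 0.0
    return mean, sd


# Usage: hypothetical per-tile cluster counts for two lanes.
uniform = [30000.0, 31000.0, 29500.0, 30500.0]
patchy = [30000.0, 12000.0, 31000.0, 29000.0]  # one poor tile, e.g. a bubble
print(tile_mean_sd(uniform))
print(tile_mean_sd(patchy))
```

A single bad tile barely moves the mean but inflates the standard deviation, which is why a high standard deviation is a useful first warning sign.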
Percentage of Clusters Passing Filters that Align Uniquely to the Reference Genome
The optimal value depends on the genome sequenced and the read length; the higher (up to a maximum of 100%), the better. This result is genome-specific and depends on the completeness of the reference. Reads may fail to align because they fall in repeat regions or regions missing from the reference, or because of indels where the sample and reference differ.
Lane Results Summary
This table displays basic data quality metrics for each lane. Apart from Lane Yield, which is the total value for the lane, all the statistics are given as means and standard deviations over the tiles used in the lane.
- Clusters (raw): The number of clusters detected by the image analysis module.
- Clusters (PF): The number of detected clusters that meet the filtering criterion listed in Lane Parameter Summary.
- 1st Cycle Int (PF): The average of the four intensities (one per channel or base type) measured at the first cycle, averaged over filtered clusters.
- % Intensity after 20 cycles (PF): The corresponding intensity statistic at cycle 20 as a percentage of that at the first cycle.
- % PF Clusters: The percentage of clusters passing filtering.
- % Align (PF): The percentage of filtered reads that were uniquely aligned to the reference. For eland_rna, this is based on PF reads aligned to the genome and to splice junctions; reads aligned to abundant sequences and masked by eland_rna are not counted.
- Alignment Score (PF): The average filtered read alignment score (reads with multiple or no alignments effectively contribute scores of 0). For PhiX spike-ins, only a small number of reads align to PhiX, so the reported alignment score (the scores of these few aligned reads divided by the total number of PF reads) is usually small.
- % Error Rate (PF): The percentage of called bases in aligned reads that do not match the reference.

If eland_pair analysis has been specified for one or more lanes, then two Lane Results Summaries are produced, one for each read. All lanes for which analysis has been specified are represented in the Read 1 table, but only those for which eland_pair analysis has been specified contribute statistics to the Read 2 table.
Expanded Lane Summary
This displays more detailed quality metrics for each lane. Apart from the phasing and prephasing information, all values are tile means for the lane.
- Clusters (tile mean) (raw): The number of clusters detected by the image analysis module.
- % Phasing: The estimated (or specified) value used for the percentage of molecules in a cluster for which sequencing falls behind the current position (cycle) within a read.
- % Prephasing: The estimated (specification is not recommended) value used for the percentage of molecules in a cluster for which sequencing jumps ahead of the current position (cycle) within a read.
- % Error Rate (raw): The percentage of called bases in aligned reads from all detected clusters that do not match the reference.
- Equiv Perfect Clusters (raw): The number of clusters in the ideal situation of read base perfectly predicting reference base that would provide the same information content (entropy of reference base given read base and a prior assumption of equiprobable reference bases) as calculated for all actual detected clusters.
- % Retained: The percentage of clusters that passed filtering.
- Cycle 2-4 Av Int (PF): The intensity averaged over cycles 2, 3, and 4 for clusters that passed filtering.
- Cycle 2-10 Av % Loss (PF): The average percentage intensity drop per cycle over cycles 2–10 (derived from a best fit straight line for log intensity versus cycle number).
- Cycle 10-20 Av % Loss (PF): The average percentage intensity drop per cycle over cycles 10–20 (derived from a best fit straight line for log intensity versus cycle number).
- % Align (PF): The percentage of filtered reads that were uniquely aligned to the reference.
- % Error Rate (PF): The percentage of called bases in aligned filtered reads that do not match the reference.
- Equiv Perfect Clusters (PF): The number of clusters in the ideal situation of read base perfectly predicting reference base that would provide the same information content (entropy of reference base given read base and a prior assumption of equiprobable reference bases) as calculated for the actual clusters that passed filtering.
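The Av % Loss columns in the list above come from a straight-line fit of log intensity against cycle number. The sketch below shows one way to compute such a value; converting the fitted slope b to a percentage as (1 - exp(b)) × 100, the use of the natural log, and the function name are assumptions, and intensities must be positive.

```python
import math
from typing import Sequence


def average_percent_loss(intensity_by_cycle: Sequence[float], first: int, last: int) -> float:
    """Average % intensity drop per cycle over cycles `first`..`last` (1-based, inclusive).

    A least-squares line is fitted to ln(intensity) versus cycle number. A slope b means
    each cycle multiplies the intensity by exp(b), i.e. a (1 - exp(b)) * 100 % drop.
    """
    cycles = list(range(first, last + 1))
    logs = [math.log(intensity_by_cycle[c - 1]) for c in cycles]
    x_mean = sum(cycles) / len(cycles)
    y_mean = sum(logs) / len(logs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(cycles, logs))
    sxx = sum((x - x_mean) ** 2 for x in cycles)
    return (1.0 - math.exp(sxy / sxx)) * 100.0


# Usage: a synthetic signal that loses 2% of its intensity every cycle.
intensities = [1000.0 * 0.98 ** (c - 1) for c in range(1, 31)]
print(average_percent_loss(intensities, 2, 10))
print(average_percent_loss(intensities, 10, 20))
```

Fitting in log space makes a constant fractional loss per cycle appear as a straight line, so a single slope summarizes the decay over each window.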
If eland_pair analysis has been specified for one or more lanes, then two Expanded Lane Results Summaries are produced, one for each read. All lanes for which analysis has been specified are represented in the Read 1 table, but only those for which eland_pair analysis has been specified contribute statistics to the Read 2 table.
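The Equiv Perfect Clusters definition can be made concrete with a small information-theory calculation. With equiprobable reference bases, a perfect read base carries log2(4) = 2 bits about the reference, and an actual base carries 2 minus the entropy of the reference base given the read base. The sketch below assumes a single average error rate with errors spread evenly over the three other bases; the pipeline's exact calculation may differ, and the function names are hypothetical.

```python
import math


def conditional_entropy_bits(error_rate: float) -> float:
    """H(reference base | read base) in bits under a symmetric-error model (an assumption)."""
    e = error_rate
    h = 0.0
    if e < 1.0:
        h -= (1.0 - e) * math.log2(1.0 - e)
    if e > 0.0:
        h -= e * math.log2(e / 3.0)
    return h


def equiv_perfect_clusters(n_clusters: float, error_rate: float) -> float:
    """Number of perfect clusters carrying the same information as `n_clusters` real ones."""
    # A perfect base carries 2 bits; a real base carries 2 - H(reference | read) bits.
    return n_clusters * (2.0 - conditional_entropy_bits(error_rate)) / 2.0


# Usage: hypothetical lane with 30,000 clusters per tile and a 1% error rate.
print(equiv_perfect_clusters(30000, 0.01))
```

At an error rate of 0 every cluster counts fully; at 75% (random base calls) the read tells you nothing about the reference and the equivalent count drops to zero.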
Pair Summary
For lanes for which eland_pair analysis was performed, there are two per-tile summary tables (one for each read). These tables are preceded by a set of tables collectively titled the Pair Summary. The Pair Summary tables provide statistics about the alignment outcomes of the two reads, both individually and as a pair; the pair statistics include the relative orientation and separation (insert size) of the partner read alignments.
The following tables are displayed in Pair Summary:
- Relative Orientation Statistics
- Insert Size Statistics
- Insert Statistics (% of individually uniquely alignable pairs)
- Relative Orientation Statistics: The relative orientation of a pair is the orientation of read 2 relative to the orientation of read 1, based on the definition that the read 1 orientation is forward. The relative orientation is defined as positive if the read 2 position is greater than the read 1 position. These statistics are given only for those pairs in which both reads were individually uniquely aligned, since these are the reads used to determine the predominant relative orientation. Other orientations are considered anomalous and are filtered out. The symbols used in the column headings are intended as a visual reminder of the definitions of the four possible relative orientations. In the example below, the nominal orientation is correctly computed as the two reads “pointing to” each other, as expected for the standard Illumina short-insert paired-read sample prep. Whereas these short-insert pairs show a predominance of opposite, inwardly facing read pairs (R+: > R1 R2 <), large-insert mate pair libraries are expected to produce a predominance of opposite, outwardly facing read pairs (R-: < R2 R1 >). High frequencies of paired reads having the same orientation (F-: > R2 R1 > or F+: > R1 R2 >) may indicate a sample preparation problem, or an adapter read-through problem, which occurs when the read lengths are long relative to the library insert size.
- Insert Size Statistics: Statistics are derived from the insert sizes of those pairs in which both reads were individually uniquely aligned and have the predominant relative orientation. First, the median is determined. Then, a standard deviation is determined independently for the values below the median and for those above it. The lower and upper thresholds for acceptable insert sizes are then defined as three of the relevant standard deviations below and above the median, respectively.
- Insert Statistics (% of individually uniquely alignable pairs): This table shows how many of the inserts used to calculate the insert size statistics are considered acceptable in size, and how many fall outside the thresholds displayed in the Insert Size Statistics table. The percentages are relative to the original number of pairs in which both reads were individually uniquely aligned.
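The orientation and insert-size rules in the list above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's code: both reads are assumed to align to the same chromosome with comparable positions and '+'/'-' strands, each one-sided standard deviation is assumed to be the root-mean-square distance from the median, and the function names and result labels are hypothetical.

```python
import math
import statistics
from typing import Dict, Sequence, Tuple


def relative_orientation(pos1: int, strand1: str, pos2: int, strand2: str) -> str:
    """Classify a uniquely aligned pair as 'F+', 'F-', 'R+' or 'R-'.

    Read 1 is defined as forward, so if it aligned to '-' the frame is flipped,
    which also reverses the order of the two positions.
    """
    letter = "F" if strand1 == strand2 else "R"
    diff = pos2 - pos1 if strand1 == "+" else pos1 - pos2
    return letter + ("+" if diff > 0 else "-")


def insert_size_thresholds(sizes: Sequence[float], n_sd: float = 3.0) -> Tuple[float, float, float]:
    """Median plus lower/upper thresholds at n_sd one-sided deviations from the median."""
    med = statistics.median(sizes)
    below = [s for s in sizes if s < med]
    above = [s for s in sizes if s > med]

    # Assumption: each one-sided deviation is the RMS distance from the median.
    def one_sided_sd(values: Sequence[float]) -> float:
        return math.sqrt(sum((s - med) ** 2 for s in values) / len(values)) if values else 0.0

    return med, med - n_sd * one_sided_sd(below), med + n_sd * one_sided_sd(above)


def insert_statistics(sizes: Sequence[float], n_unique_pairs: int) -> Dict[str, float]:
    """Percentages of acceptable and out-of-range inserts, relative to all uniquely aligned pairs."""
    _, low, high = insert_size_thresholds(sizes)
    counts = {
        "acceptable": sum(1 for s in sizes if low <= s <= high),
        "too small": sum(1 for s in sizes if s < low),
        "too large": sum(1 for s in sizes if s > high),
    }
    return {k: 100.0 * v / n_unique_pairs for k, v in counts.items()}


# Usage: an inward-facing short-insert pair, and three predominant-orientation
# inserts out of four uniquely aligned pairs (one pair had an anomalous orientation).
print(relative_orientation(1000, "+", 1300, "-"))  # R+
print(insert_statistics([190, 200, 210], 4))
```

Because the percentages are taken over all uniquely aligned pairs, pairs excluded for anomalous orientation lower every percentage even though they never enter the threshold calculation.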