NGSC - FAQs |
Next-Generation Sequencing Core Perelman School of Medicine | University of Pennsylvania |
The NGSC has begun providing counts of the reads generated when demultiplexing FASTQ files. These count files can be used in conjunction with the AAA-StudyInfo files to determine the amount of data produced.
The files AAA-DemuxStatistics (.tsv and .xls) are located in each investigation's folder. The files are generally updated four times a day at 7AM, 1PM, 7PM, and 1AM.
These statistics were recorded fairly completely from run FGC0518 and for essentially all runs after FGC1240. A pair of files will be made even if we do not have any data to report.
The counts do not include reads that are put in the undetermined file. They may also not reflect non-barcoded samples correctly.
The columns in the files are described below.
The data is available in either a plain text, tab-delimited .tsv file or in an Excel .xls file.
Each line in the file corresponds to a different sample.
Each file has the following metadata columns to describe the samples.
- Sample: the NGSC-assigned sample id
- Sample_name: the user-assigned sample name
- Barcode: the barcode used in demultiplexing
- Assay: the assay from the experimental design
- Condition: the condition from the experimental design
- Study: the study name (a subpart of the investigation)
- Investigation: the investigation name

The read count columns have headers that look like FGC1429/3/count_reads#2385. The first two parts are the run and lane. The third part is more technical and generally will not matter. However, should you encounter two columns with the same run and lane, the trailing number can be provided to the NGSC IT staff to help identify the column to use.
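If you process the .tsv file with a script, a read count header can be split into its parts as sketched below. This is only an illustration based on the single example header shown above: the names `ReadCountColumn` and `parse_read_count_header` are made up for this sketch, and it assumes the lane and the trailing number are always integers.

```python
import re
from typing import NamedTuple, Optional


class ReadCountColumn(NamedTuple):
    """Parts of a header such as FGC1429/3/count_reads#2385 (hypothetical type)."""
    run: str        # e.g. "FGC1429"
    lane: int       # e.g. 3
    metric: str     # the "more technical" third part, e.g. "count_reads"
    column_id: int  # trailing number that can be given to NGSC IT staff


# Assumed layout: <run>/<lane>/<metric>#<number>, inferred from one example.
_HEADER_RE = re.compile(
    r"^(?P<run>[^/]+)/(?P<lane>\d+)/(?P<metric>[^#]+)#(?P<column_id>\d+)$"
)


def parse_read_count_header(header: str) -> Optional[ReadCountColumn]:
    """Return the parsed parts, or None for metadata columns and SUM."""
    match = _HEADER_RE.match(header.strip())
    if match is None:
        return None
    return ReadCountColumn(
        run=match["run"],
        lane=int(match["lane"]),
        metric=match["metric"],
        column_id=int(match["column_id"]),
    )
```

Grouping the parsed headers by `(run, lane)` is one way to spot two columns that share the same run and lane, in which case the trailing number distinguishes them.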
- SUM: the last column is the sum of all the reads for each sample.

Since the data is presented as a big table of samples versus runs/lanes, a large study may have many empty cells. These correspond to runs and lanes for a pool that did not contain the specific sample.
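When reading the .tsv file programmatically, the empty cells need to be skipped rather than parsed as numbers. The following is a minimal sketch using only the Python standard library. It assumes the metadata column names are spelled exactly as listed above, that every non-empty count cell is a plain integer, and that every column other than the metadata columns and SUM holds read counts. The function name `total_reads_per_sample` is invented for this example.

```python
import csv

# Metadata column names as listed in this FAQ (assumed to match the file exactly).
METADATA_COLUMNS = {
    "Sample", "Sample_name", "Barcode", "Assay",
    "Condition", "Study", "Investigation",
}


def total_reads_per_sample(tsv_file):
    """Map each NGSC sample id to the sum of its run/lane read counts.

    `tsv_file` is an open text file (or any iterable of lines).
    Empty cells are treated as "sample not in that pool" and skipped.
    """
    totals = {}
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        total = 0
        for column, value in row.items():
            if column is None or column in METADATA_COLUMNS or column == "SUM":
                continue  # not a read count column
            if value is None or value.strip() == "":
                continue  # empty cell: no reads for this run/lane
            total += int(value)
        totals[row["Sample"]] = total
    return totals


# Example use (file name shown without any folder path):
# with open("AAA-DemuxStatistics.tsv", newline="") as handle:
#     totals = total_reads_per_sample(handle)
```

The totals computed this way should normally agree with the SUM column, so comparing the two is a quick sanity check that the file was parsed as intended.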