Parse fastq samples

Parsing Sample IDs from FASTQ File Names

When sample IDs are parsed from the FASTQ file names, any of the following patterns can be used to assign the sample name ("SampleName" in the examples below) to a FASTQ file.

Note that each Read 1 file must have a matching Read 2 file whose name is identical except that the read number 1 is replaced by 2 (for example, SampleName_R1.fastq.gz and SampleName_R2.fastq.gz).

Pattern 1:

SampleName.R1.fastq.gz
┃         ┃┃  ┃    ┗━ Extension .gz is optional
┃         ┃┃  ┗━━━━━━ Extension can be '.fastq' <or> '.fq'
┃         ┃┗━━━━━━━━━ Read pair: 'R1' <or> '1' allowed (with matching 'R2' <or> '2')
┃         ┗━━━━━━━━━━ Separator can be '.' <or> '_'
┗━━━━━━━━━━━━━━━━━━━━ Sample identifier ('SampleName' in this case)
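
As an illustration only, Pattern 1 could be approximated with a regular expression like the one below. This is a minimal sketch, not the parser the tool actually uses: the names PATTERN_1 and parse_pattern_1 are hypothetical, and it assumes that only the separator directly after the sample identifier may be '.' or '_', as the diagram suggests.

```python
import re

# Hypothetical regex approximating Pattern 1 (an assumption, not the tool's own code).
PATTERN_1 = re.compile(
    r"^(?P<sample>.+)"      # sample identifier
    r"[._]"                 # separator: '.' or '_'
    r"(?P<read>R?[12])"     # read pair: 'R1'/'R2' or '1'/'2'
    r"\.(?:fastq|fq)"       # extension: '.fastq' or '.fq'
    r"(?:\.gz)?$"           # optional '.gz'
)


def parse_pattern_1(filename):
    """Return (sample_name, read_number), or None if the name does not fit Pattern 1."""
    match = PATTERN_1.match(filename)
    if match is None:
        return None
    # The read number is the last character of 'R1', 'R2', '1' or '2'.
    return match.group("sample"), int(match.group("read")[-1])


print(parse_pattern_1("SampleName.R1.fastq.gz"))
print(parse_pattern_1("SampleName_2.fq"))
```

Because the sample group is greedy, a name such as My.Sample_R2.fastq would yield the identifier My.Sample under this sketch; whether the real parser treats embedded separators the same way is not stated here.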

Pattern 2:

SampleName_S1_L001_R1_001.fastq.gz
┃           ┃    ┃  ┃┗━ Extension must be '_001.fastq.gz'
┃           ┃    ┃  ┗━━ Read pair: 'R1' (with matching 'R2')
┃           ┃    ┗━━━━━ Lane on Illumina sequencer
┃           ┗━━━━━━━━━━ Sample index number
┗━━━━━━━━━━━━━━━━━━━━━━ Sample identifier ('SampleName' in this case)

Pattern 3:

SampleName_S1_R1_001.fastq.gz
┃           ┃  ┃┗━━━━━━ Extension must be '_001.fastq.gz'
┃           ┃  ┗━━━━━━━ Read pair: 'R1' (with matching 'R2')
┃           ┗━━━━━━━━━━ Sample index number
┗━━━━━━━━━━━━━━━━━━━━━━ Sample identifier ('SampleName' in this case)
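
Patterns 2 and 3 differ only in whether the lane field is present, so one expression with an optional lane group can sketch both. The code below is a hedged illustration rather than the tool's implementation: ILLUMINA_PATTERN, parse_illumina and mate_name are hypothetical names, and the sketch assumes the sample index and lane are plain digit runs. mate_name shows the Read 1 to Read 2 rule by swapping only the read number.

```python
import re

# Hypothetical regex covering Pattern 2 (with lane) and Pattern 3 (without lane).
ILLUMINA_PATTERN = re.compile(
    r"^(?P<sample>.+)"          # sample identifier
    r"_S(?P<index>\d+)"         # sample index number, e.g. 'S1'
    r"(?:_L(?P<lane>\d+))?"     # lane, e.g. 'L001'; absent in Pattern 3
    r"_R(?P<read>[12])"         # read pair: 'R1' or 'R2'
    r"_001\.fastq\.gz$"         # fixed extension '_001.fastq.gz'
)


def parse_illumina(filename):
    """Return (sample_name, read_number), or None if neither pattern fits."""
    match = ILLUMINA_PATTERN.match(filename)
    if match is None:
        return None
    return match.group("sample"), int(match.group("read"))


def mate_name(filename):
    """Return the expected Read 2 file name for a Read 1 file, else None."""
    match = ILLUMINA_PATTERN.match(filename)
    if match is None or match.group("read") != "1":
        return None
    pos = match.start("read")  # index of the '1' in 'R1'
    return filename[:pos] + "2" + filename[pos + 1:]


print(parse_illumina("SampleName_S1_L001_R1_001.fastq.gz"))
print(parse_illumina("SampleName_S1_R2_001.fastq.gz"))
print(mate_name("SampleName_S1_L001_R1_001.fastq.gz"))
```

A pairing check could then confirm that the name returned by mate_name exists alongside each Read 1 file before samples are assigned.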