This tutorial introduces the tools, data types and workflow of variant detection in human genomic DNA, using a small set of sequencing reads from chromosome 20. In this session we will:
- Evaluate the quality of the short-read data. If the quality is poor, adjustments can be made, e.g. trimming the reads or adjusting your expectations of the final outcome.
- Map each read in the sample FASTQ read sets to a reference genome so that we can identify sequence changes relative to that reference. Some variant callers need extra information about the source of the reads to select the correct error profiles for their statistical model, so we add this metadata (read groups) during alignment, so that the generated BAM file contains what the variant caller expects.
- Call variants using the GATK Unified Genotyper, a Bayesian variant caller and genotyper from the Broad Institute. Many users consider GATK to be best practice for human variant calling.
- Try an alternative variant caller: SAMtools mpileup.
- Evaluate the calls against known variation. We know a lot about human variation from many empirical studies, including the 1000 Genomes Project, so we have some expectations of what we should see when we detect variants in a new sample.
- Annotate the detected variants against the Ensembl database and interpret the annotation output.

You will need a Wi-Fi-enabled laptop with a modern web browser; Google Chrome, Firefox and Safari work best.
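The quality-evaluation step listed above rests on Phred quality scores: a score Q corresponds to an error probability of 10^(-Q/10), so Q20 means a 1% chance that the base call is wrong. The Python sketch below decodes FASTQ quality strings and trims low-quality bases from the 3' end. It assumes Phred+33 encoding, and the function names and the Q20 trimming threshold are illustrative choices, not those of any particular QC or trimming tool.

```python
def phred_scores(quality_string, offset=33):
    """Decode an ASCII quality string into Phred scores (assumes Phred+33 encoding)."""
    return [ord(ch) - offset for ch in quality_string]

def error_probability(q):
    """Convert a Phred score Q into the probability that the base call is wrong."""
    return 10 ** (-q / 10)

def read_fastq(lines):
    """Yield (name, sequence, quality) from 4-line FASTQ records, skipping blank lines."""
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header[1:], seq, qual  # drop the leading '@'

def trim_3prime(seq, qual, min_q=20, offset=33):
    """Remove bases from the 3' end until a base with quality >= min_q is reached."""
    scores = phred_scores(qual, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]

# Hypothetical record: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
records = ["@read1", "ACGTACGTAC", "+", "IIIIIIII##"]
for name, seq, qual in read_fastq(records):
    scores = phred_scores(qual)
    mean_q = sum(scores) / len(scores)
    trimmed_seq, _ = trim_3prime(seq, qual)
    print(name, round(mean_q, 1), trimmed_seq)
```

Trimming from the 3' end reflects the common observation that Illumina base quality tends to drop towards the end of a read; real trimmers typically use more robust rules, such as sliding windows.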
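The metadata added during the mapping step is carried in an @RG (read group) header line of the SAM/BAM file. The line consists of tab-separated TAG:VALUE fields such as ID (read group identifier), SM (sample name), PL (sequencing platform) and LB (library). The sketch below builds such a line. The helper names and example values are hypothetical, and the exact fields your variant caller requires should be checked in its documentation. Aligners such as BWA-MEM accept a read-group line through their -R option, with the tabs typed as a backslash followed by t.

```python
def read_group_line(rg_id, sample, platform="ILLUMINA", library=None):
    """Build a SAM @RG header line: '@RG' followed by tab-separated TAG:VALUE fields."""
    fields = ["@RG", f"ID:{rg_id}", f"SM:{sample}", f"PL:{platform}"]
    if library is not None:
        fields.append(f"LB:{library}")
    return "\t".join(fields)

def escaped_for_cli(rg_line):
    """Replace each literal tab with a backslash and 't', the form usually typed on a command line."""
    return rg_line.replace("\t", "\\t")

# Hypothetical identifiers for a single sequencing lane of one sample.
rg = read_group_line("run1_lane1", "sample1", library="lib1")
print(escaped_for_cli(rg))
```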
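To illustrate what "Bayesian variant caller and genotyper" means in the GATK step above, the sketch below applies a simple textbook model to a single site. Each read base is assumed to come from either allele of a diploid genotype with probability 1/2. A base matches its allele with probability 1 - e and is any of the three other bases with probability e/3, where e is derived from the base quality. Bayes' rule then combines these likelihoods with a prior over genotypes. This is a deliberately simplified model, much simpler than the one the Unified Genotyper actually uses, and the prior values are hypothetical.

```python
import math

def base_likelihood(base, allele, error):
    """P(observed base | true allele): 1 - error on a match, else error spread over 3 other bases."""
    return 1.0 - error if base == allele else error / 3.0

def genotype_log_likelihood(pileup, genotype):
    """log P(pileup | diploid genotype); each read comes from either allele with probability 1/2."""
    a1, a2 = genotype
    return sum(
        math.log(0.5 * base_likelihood(base, a1, err) + 0.5 * base_likelihood(base, a2, err))
        for base, err in pileup
    )

def genotype_posteriors(pileup, ref, alt, priors):
    """Bayes' rule over hom-ref, het and hom-alt genotypes; priors holds 3 probabilities."""
    genotypes = [(ref, ref), (ref, alt), (alt, alt)]
    log_post = [genotype_log_likelihood(pileup, g) + math.log(p) for g, p in zip(genotypes, priors)]
    top = max(log_post)  # subtract the maximum before exponentiating to avoid underflow
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return {g: w / total for g, w in zip(genotypes, weights)}

# Illustrative site: 6 reads show A, 5 show G, all at Q20 (error probability 0.01).
pileup = [("A", 0.01)] * 6 + [("G", 0.01)] * 5
priors = [0.9985, 0.001, 0.0005]  # hypothetical priors for hom-ref, het, hom-alt
post = genotype_posteriors(pileup, ref="A", alt="G", priors=priors)
for genotype, p in post.items():
    print("/".join(genotype), round(p, 4))
```

Even with a small prior on heterozygosity, five high-quality G reads are far more likely under the A/G genotype than under A/A, so the heterozygous call dominates the posterior.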
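Two common checks for the known-variation step are the fraction of SNV calls already present in a catalogue of known variants, and the transition/transversion (Ti/Tv) ratio. Transitions are A<->G and C<->T changes. Human whole-genome SNV sets are commonly reported with Ti/Tv of roughly 2.0 to 2.1, whereas random errors give about 0.5 because 8 of the 12 possible substitutions are transversions. The sketch below computes both statistics from (chrom, pos, ref, alt) tuples. Loading these from VCF files is not shown, the example calls are made up, and exact tuple matching is a simplification: real comparisons must also normalise indels and multi-allelic sites.

```python
TRANSITIONS = {frozenset("AG"), frozenset("CT")}

def is_transition(ref, alt):
    """True for A<->G or C<->T substitutions."""
    return frozenset((ref, alt)) in TRANSITIONS

def summarize_snvs(calls, known):
    """calls: iterable of (chrom, pos, ref, alt); known: set of the same tuples."""
    snvs = [c for c in calls if len(c[2]) == 1 and len(c[3]) == 1]  # keep single-base substitutions
    ti = sum(1 for _, _, ref, alt in snvs if is_transition(ref, alt))
    tv = len(snvs) - ti
    in_known = sum(1 for c in snvs if c in known)
    return {
        "snvs": len(snvs),
        "ti_tv": ti / tv if tv else float("inf"),
        "fraction_known": in_known / len(snvs) if snvs else 0.0,
    }

# Made-up calls on chromosome 20; the last one is an indel and is ignored.
calls = [("20", 100, "A", "G"), ("20", 200, "C", "T"), ("20", 300, "G", "T"), ("20", 400, "AT", "A")]
known = {("20", 100, "A", "G")}
print(summarize_snvs(calls, known))
```

A Ti/Tv well below the expected value, or a low fraction of known sites, usually suggests that the call set contains many false positives.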