Visit the main conference website: http://galaxyproject.org/gcc2016
Wednesday, June 29 • 3:15pm - 4:30pm
P12: A Galaxy Workflow for the Generation of Synthetic FASTQ Samples


Poster: doi:10.7490/f1000research.1112743.1

Authors

Michael Ta, Philip D. Cotter, Mathew W. Moore, Bioinformatics Department, ResearchDx, Irvine CA, USA 

Abstract 
Developing and validating a bioinformatics pipeline for a clinical assay is frequently a costly process; in some cases it requires acquiring scarce patient samples with extremely rare genotypes. To aid in the validation process, we have developed a Galaxy workflow that generates synthetic FASTQ files containing known mutations, both those provided by the user and those sourced from dbSNP. Researchers can use these FASTQ files to mimic various mutation types, including single nucleotide variants, insertions, deletions, translocations, and copy number variations. They can optimize and evaluate the expected efficiency of their bioinformatics pipeline using synthetic samples simulated at various sequencing depths to mimic both germline and somatic events. In addition, changes to existing pipelines can easily be re-verified using static synthetic datasets as the gold standard reference. The workflow uses several open-source tools to retrieve the genomic location of variants given in HGVS notation (TransVar) and to simulate reads from common sequencing platforms (ART). A custom sequencing profile can be provided to ART to simulate reads with a base quality and call rate similar to those of the specific sequencing machines used in the lab. The workflow described here provides a solution to the regulatory requirement for validation and re-validation of clinical bioinformatics pipelines.
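
The general idea behind the abstract (spike a known variant into a reference sequence, then sample reads at a chosen depth and allele fraction) can be illustrated with a minimal Python sketch. This is not the authors' workflow and does not call ART or TransVar; the function names `apply_snv` and `simulate_reads` are hypothetical, reads are error-free, and every base is given a constant Phred+33 quality of 40. A low allele fraction stands in for a somatic event, and a fraction near 0.5 for a heterozygous germline variant.

```python
import random


def apply_snv(reference, position, alt):
    """Return a copy of `reference` with the base at 0-based `position` replaced by `alt`."""
    if not 0 <= position < len(reference):
        raise ValueError("position outside reference")
    return reference[:position] + alt + reference[position + 1:]


def simulate_reads(reference, mutated, read_length, depth, allele_fraction, seed=0):
    """Sample error-free single-end reads as FASTQ records (illustrative only).

    Each read is drawn from `mutated` with probability `allele_fraction`,
    otherwise from `reference`. Assumes both sequences have equal length,
    which holds for a substitution but not for an indel.
    """
    if read_length > len(reference):
        raise ValueError("read_length exceeds reference length")
    rng = random.Random(seed)
    # Number of reads needed for roughly `depth`-fold average coverage.
    n_reads = depth * len(reference) // read_length
    records = []
    for i in range(n_reads):
        template = mutated if rng.random() < allele_fraction else reference
        start = rng.randrange(len(template) - read_length + 1)
        seq = template[start:start + read_length]
        qual = "I" * read_length  # 'I' encodes Phred 40 in Phred+33
        records.append(f"@synthetic_read_{i}\n{seq}\n+\n{qual}\n")
    return records


if __name__ == "__main__":
    ref = "ACGT" * 25                    # toy 100 bp reference
    mut = apply_snv(ref, 10, "T")        # toy substitution at 0-based position 10
    fastq = simulate_reads(ref, mut, read_length=20, depth=10, allele_fraction=0.1)
    print("".join(fastq[:2]), end="")
```

A real simulator such as ART additionally models platform-specific error and quality profiles, which this sketch deliberately omits.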

Wednesday June 29, 2016 3:15pm - 4:30pm EDT
IMU Solarium