Poster doi:10.7490/f1000research.1112743.1
Authors: Michael Ta, Philip D. Cotter, Mathew W. Moore, Bioinformatics Department,
ResearchDx, Irvine CA, USA
Abstract
Developing and validating a bioinformatics pipeline for a clinical assay is frequently costly; in some cases it requires acquiring scarce patient samples with extremely rare genotypes. To aid validation, we have developed a Galaxy workflow that generates synthetic FASTQ files containing known mutations, either supplied by the user or sourced from dbSNP. Researchers can use these FASTQ files to mimic various mutation types, including single nucleotide variants, insertions, deletions, translocations, and copy number variations. By simulating synthetic samples at various sequencing depths to mimic both germline and somatic events, researchers can optimize their bioinformatics pipeline and evaluate its expected efficiency. In addition, changes to existing pipelines can easily be re-verified using static synthetic datasets as the gold-standard reference. The workflow uses several open-source tools: TransVar, to retrieve the genomic location of variants given in HGVS notation, and ART, to simulate reads from common sequencing platforms. A custom sequencing profile can be provided to ART so that simulated reads have base quality and call rates similar to those of the specific sequencing machines used in the lab. The workflow described here addresses the regulatory requirement for validation and re-validation of clinical bioinformatics pipelines.
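The core idea, spiking a known variant into a reference and sampling reads at a chosen depth and allele fraction, can be illustrated with a short sketch. This is not the authors' Galaxy workflow and does not call TransVar or ART; it is a minimal stand-alone illustration under simplifying assumptions: a single SNV only, error-free single-end reads drawn uniformly along the sequence, and one fixed base quality encoded as Phred+33. The function names (`apply_snv`, `simulate_reads`, `to_fastq`, `observed_vaf`) are hypothetical. The `vaf` parameter stands in for the germline vs. somatic distinction (about 0.5 for a heterozygous germline variant, lower for a subclonal somatic one).

```python
import random


def apply_snv(ref: str, pos: int, alt: str) -> str:
    """Return a copy of `ref` with the base at 0-based `pos` replaced by `alt`."""
    if ref[pos] == alt:
        raise ValueError("alt allele must differ from the reference base")
    return ref[:pos] + alt + ref[pos + 1:]


def simulate_reads(ref: str, mutant: str, read_len: int, depth: float,
                   vaf: float, seed: int = 0) -> list[tuple[int, str]]:
    """Draw error-free single-end reads as (start, sequence) pairs.

    The read count is chosen so mean coverage is roughly `depth`; each read
    comes from the mutant haplotype with probability `vaf`.
    """
    rng = random.Random(seed)
    n_reads = round(depth * len(ref) / read_len)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(0, len(ref) - read_len + 1)
        source = mutant if rng.random() < vaf else ref
        reads.append((start, source[start:start + read_len]))
    return reads


def to_fastq(reads: list[tuple[int, str]], base_qual: int = 30) -> str:
    """Serialize reads as 4-line FASTQ records with a constant Phred+33 quality."""
    qual_char = chr(base_qual + 33)  # Q30 -> '?'
    lines = []
    for i, (start, seq) in enumerate(reads):
        lines += [f"@synthetic_read_{i} start={start}", seq, "+", qual_char * len(seq)]
    return "\n".join(lines) + "\n"


def observed_vaf(reads: list[tuple[int, str]], pos: int, alt: str) -> float:
    """Fraction of reads covering `pos` that carry `alt` (a crude truth check)."""
    covering = [seq[pos - start] for start, seq in reads
                if start <= pos < start + len(seq)]
    return sum(base == alt for base in covering) / len(covering)


# Usage: a random 1 kb "reference", one heterozygous-like SNV, ~100x coverage.
rng = random.Random(42)
reference = "".join(rng.choice("ACGT") for _ in range(1000))
pos = 500
alt = "A" if reference[pos] != "A" else "C"
mutant = apply_snv(reference, pos, alt)
reads = simulate_reads(reference, mutant, read_len=100, depth=100, vaf=0.5, seed=1)
fastq = to_fastq(reads)
print(len(reads), round(observed_vaf(reads, pos, alt), 2))
```

Because the truth (position, allele, intended allele fraction) is known by construction, a pipeline's calls on such a file can be compared against it directly. A realistic setup would replace the error-free sampler with a platform-aware simulator such as ART, as the abstract describes.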