→ Slides doi:10.7490/f1000research.1112709.1
→ Video
Authors: Pratik Jagtap, University of Minnesota
Getiria Onsongo, University of Minnesota
Candace Guerrero, University of Minnesota
James Johnson, University of Minnesota
Thomas McGowan, University of Minnesota
Matthew Andrews, University of Minnesota-Duluth
Timothy Griffin, University of Minnesota
Abstract: Proteogenomics has emerged as an effective approach for identifying novel proteoforms and improving genome annotation. For example, matching mass spectrometry proteomic data to customized, sample-specific RNASeq-derived databases facilitates identification of previously unidentified peptides. Proteogenomic identification of such peptides, however, requires greater scrutiny to qualify them as bona fide novel proteoform candidates.
To address these challenges, we have developed a blueprint of modular Galaxy workflows (doi: 10.1021/pr500812t). These include a) database generation from RNASeq (doi: 10.1186/1471-2164-15-703) or cDNA datasets; b) database search strategies that improve the sensitivity of peptide spectral matches (doi: 10.1002/pmic.201200352); c) filtering tools for quality control; and d) modules for visualization and interpretation of results.
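As a rough illustration of the kind of scrutiny a filtering step might apply, the sketch below keeps only those identified peptides that do not occur in any reference protein sequence. It is not the Galaxy-P tooling: the function names, the toy sequences, and the choice to treat isoleucine and leucine as equivalent (they have the same mass) are assumptions made for this example only.

```python
def normalize(seq: str) -> str:
    """Treat isoleucine (I) and leucine (L) as identical, since they share the same mass."""
    return seq.upper().replace("I", "L")


def novel_peptides(identified, reference_proteins):
    """Return identified peptides that are absent from every reference protein sequence."""
    references = [normalize(protein) for protein in reference_proteins]
    novel = []
    for peptide in identified:
        key = normalize(peptide)
        # A peptide found as a substring of any reference protein is not novel.
        if not any(key in protein for protein in references):
            novel.append(peptide)
    return novel


# Toy sequences, invented purely for illustration (hypothetical data).
reference = ["MKTAYIAKQRQISFVK", "MSLNDPRGEEK"]
peptides = ["AYIAK", "AYLAK", "GPPQQGGR"]
candidates = novel_peptides(peptides, reference)
```

Here only the third peptide survives the check, because the first two match the first reference sequence once I and L are treated as equivalent. A real workflow would add further checks, such as spectral quality and genomic context, before calling a peptide novel.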
These Galaxy workflows have been used in several studies to provide biological insights. In a fractionated human salivary dataset, we identified multiple novel peptides that mapped to the basic proline-rich proteins (PRB1 and PRB2) located on chromosome 12. In quantitative studies of heart muscle (doi: 10.1021/acs.jproteome.5b00575) and skeletal muscle (doi: 10.1021/acs.jproteome.5b01138) protein expression during hibernation in the 13-lined ground squirrel, researchers were able to identify peptides corresponding to previously uncharacterized proteins. Identification of these peptides allowed for improved genomic annotation of this non-model organism and provided insights into muscle physiology during hibernation.
We will present recent improvements by the Galaxy-P team to the blueprint workflows described above. These include development of the Multi-Omics Visualization Platform (MVP) Galaxy plugin, which facilitates viewing novel peptide sequences in the context of reference genome sequences and RNASeq data, enabling interpretation and the generation of testable hypotheses about biological significance.