→ Tutorial
→ Video
The tutorial introduces the tools, datatypes, and workflow of variation detection in human genomic DNA, using a small set of sequencing reads from chromosome 20. In this session we will:
Prerequisites:
A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).
→ Tutorial
→ Video
This workshop will cover standard, advanced, and alternative RNAseq analysis pipelines, all using workflows and highlighting their advanced features. Three general pipelines will be addressed:
A standard RNAseq analysis pipeline using the Tuxedo suite (TopHat → Cuffdiff) for standard transcript quantification with a reference transcriptome.
An advanced analysis pipeline using the Tuxedo suite with StringTie to create de novo transcript structures and merge these with reference transcripts to create a transcriptome database, followed by transcript quantification.
These three pipelines will be used as examples to highlight the use of workflows and their advanced features.
Prerequisites:
A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).
This workshop will cover the basics of de novo genome assembly using a small genome example. This includes project planning steps, selecting fragment sizes, initial assembly of reads into fully covered contigs, and then assembling those contigs into larger scaffolds that may include gaps. The end result will be a set of contigs and scaffolds with sufficient average length for further analysis, including genome annotation. This workshop will use tools and methods targeted at small genomes. The basics of assembly and scaffolding presented here will be useful for building larger genomes, but the specific tools and much of the project planning will be different.
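The description above refers to contigs and scaffolds of "sufficient average length" without naming a metric. As a minimal sketch, assuming the sequence lengths are already available as a list of integers, the following Python computes the mean length alongside N50, a commonly reported assembly statistic. The function names and the example lengths are hypothetical, and the workshop text does not say which measure it actually uses.

```python
from statistics import mean


def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L together cover at least half of the total assembly."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


def summarize(lengths):
    """Collect a few basic length statistics for an assembly."""
    return {
        "count": len(lengths),
        "total": sum(lengths),
        "mean": mean(lengths),
        "n50": n50(lengths),
    }


# Hypothetical contig lengths in base pairs, for illustration only.
contig_lengths = [100, 200, 300, 400, 500]
stats = summarize(contig_lengths)
print(stats)
```

Running the same summary on contig lengths and then on scaffold lengths gives a quick way to see how much scaffolding improved contiguity.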
Prerequisites:
→ Video
This workshop continues where the Introduction to Galaxy session leaves off. Additional features of Galaxy will be introduced and several topics introduced in that first session will be explored in more detail. Topics covered will include
Genome assembly produces the raw genomic sequence of an organism. Genome annotation adds meaning to sequence by associating structural and functional annotation with specific regions (loci) on the genome. This workshop will introduce genome annotation in the context of small genomes. We'll begin with genome annotation concepts, and then introduce resources and tools for automatically annotating small genomes. The workshop will finish with a review of options for further automatic and manual tuning of the annotation, and for maintaining it as new assemblies or information become available.
Prerequisites:
→ Slides, doi: 10.7490/f1000research.1112908.1
→ Video
This hands-on workshop will take participants through the essential steps for using Galaxy to analyze mass spectrometry (MS)-based proteomics data, focusing on protein identification from large-scale datasets and on more advanced applications that integrate genomic data with proteomic data. Introductory material will be presented on the basics of MS-based proteomics informatics and on emerging applications integrating genomic and proteomic data (an area called proteogenomics).
The workshop will be constructed to follow the steps of proteomic and proteogenomic workflows. Analysis modules corresponding to each of these steps will be described and demonstrated, following the structure below:
Database generation and raw data processing
Attendees will be guided through the use of tools for selecting and generating databases, either standard databases or customized databases for proteogenomics derived from genomic data (e.g. RNA-seq data). Tools for converting raw data to processed peak lists for further analysis will also be described.
Sequence database searching
Attendees will learn about available software in Galaxy for sequence database searching, which identifies proteins via matching of MS data to sequence databases. Use of these tools and optimization of parameters will be demonstrated and discussed.
Results visualization and interpretation
Attendees will be exposed to a variety of tools for visualizing and filtering results in Galaxy. Emphasis will be on tools useful for filtering identified proteins from proteogenomic analyses, where quality control of results is essential to generate high confidence results.
At the end of the workshop, attendees will have working knowledge of MS-based proteomics tools in the Tool Shed, experience in setting up basic workflows for protein identification, as well as more advanced applications in proteogenomics. Attendees will also have a better comprehension of the pitfalls encountered when interpreting data from these applications, and tools in Galaxy to help ensure confidence in results.
Participants will be given temporary accounts to a cloud-based Galaxy instance to participate in hands-on workshop activities.
Prerequisites:
→ Slides, doi: 10.7490/f1000research.1112912.1
→ Video
RADseq [1] data allow scientists to gather genome wide information with a low-cost approach compared to complete genome sequencing. In this training session, we will show how to analyze RADseq data to
Stacks works with restriction-enzyme based data, including GBS, CRoPS, and single and double digest RAD. Stacks identifies loci in a set of individuals, either de novo or aligned to a reference genome (including gapped alignments), and then genotypes each locus. See the Stacks Manual for full details.
Stacks has been integrated into Galaxy and is available via the GUGGO Tool Shed.
Prerequisites:
1. Miller MR, Dunham JP, Amores A, Cresko WA, Johnson EA. (2007) Rapid and cost-effective polymorphism identification and genotyping using restriction site associated DNA (RAD) markers. Genome Research. 17(2):240-248.
2. Amores A, Catchen J, Ferrara A, Fontenot Q, Postlethwait JH. (2011) Genome Evolution and Meiotic Maps by Massively Parallel DNA Sequencing: Spotted Gar, an Outgroup for the Teleost Genome Duplication. Genetics 188(4):799-808.
3. Davey JW, Blaxter ML. (2011) RADSeq: next-generation population genetics. Briefings in Functional Genomics. 10(2):108
4. Peterson BK, Weber JN, Kay EH, Fisher HS, Hoekstra HE. (2012) Double Digest RADseq: An Inexpensive Method for De Novo SNP Discovery and Genotyping in Model and Non-Model Species. PLoS ONE 7(5): e37135.
5. Catchen JM, Amores A, Hohenlohe P, Cresko W, Postlethwait JH. (2011) Stacks: Building and Genotyping Loci De Novo From Short-Read Sequences. G3 1(3):171-182.
→ Slides, doi: 10.7490/f1000research.1112913.1
→ Video
This workshop will cover visualization in Galaxy for both primary high-throughput sequencing/next-generation sequencing (NGS) analyses (alignments, variants, expression levels, and annotations) and downstream and aggregated datasets, using histograms, heat maps, and other numerical plots. First, using datasets from a combined exome and transcriptome (RNA-seq) experiment, participants will visualize data using Galaxy’s genome browser and Circos plot. Participants will learn how to create a genome visualization, add data, configure data, move between a genome browser view and Circos view, and share complex genome visualizations with more than 12 NGS datasets. Second, using an integrated dataset of genomics and other -omics information, participants will create several numerical plots (e.g., scatter plot, histogram) to gain an overview of the data. Based on insight gained from these visualizations, participants will create a heat map to identify patterns and potential causal factors. All visualizations will be created, saved, and shared using only Galaxy and a Web browser; no data or software downloads will be necessary.
Prerequisites: