Galaxy Community Conference 2016 (GCC16): Full Schedule

Visit the main conference website: http://galaxyproject.org/gcc2016

9:00am EDT

Introduction to Galaxy

→ Video

New to Galaxy? This session introduces you to the Galaxy Project and the Galaxy Community, and walks you through a simple use case demonstrating what Galaxy can do. It is recommended for anyone who has not used Galaxy or uses it only rarely.

Prerequisites:

Little or no knowledge of Galaxy.
A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

Instructors

Anton Nekrutenko

Penn State University

Sunday June 26, 2016 9:00am - 11:30am EDT
IMU Persimmon Room

D.1 Training - Using D.x Training - All

9:00am EDT

Metagenomics with Galaxy

→ Video

Tools and workflows for the analysis and visualisation of metagenomics data sets.

Software & Downloads:

Ability to unzip files: can unzip on command-line; or use a GUI, like 7Zip (http://www.7-zip.org/) or your OS built-in
Ability to upload files via FTP: can use ftp command-line, or can use GUI like Cyberduck (https://cyberduck.io/) or another (https://en.wikipedia.org/wiki/Comparison_of_FTP_client_software)
We will be using a Galaxy Instance available at: http://gcc2016.dblankenberg.org/
It would also be helpful to download this file in advance: http://gcc2016.dblankenberg.org/static/example/MiSeqSOPData.zip (alternate link: http://www.mothur.org/w/images/d/d6/MiSeqSOPData.zip)
Slides are available here: http://dblankenberg.org/gcc2016/training/metagenomics/blankenberg_gcc_2016_training_metagenomics.pdf

Prerequisites:

A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).

A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

Instructors

Daniel Blankenberg

Assistant Professor, Genomic Medicine Institute, Cleveland Clinic Lerner Research Institute

Sunday June 26, 2016 9:00am - 11:30am EDT
IMU Maple Room

D.1 Training - Using D.x Training - All

12:30pm EDT

Human Variant Calling with Galaxy

→ Tutorial
→ Video

The tutorial is designed to introduce the tools, datatypes and workflow of variation detection in human genomic DNA, using a small set of sequencing reads from chromosome 20. In this session we will:

Evaluate the quality of the short read data. If the quality is poor, then adjustments can be made, e.g. trimming the short reads, or adjusting your expectations of the final outcome.
Map each of the individual reads in the sample FASTQ readsets to a reference genome, so that we can then identify the sequence changes with respect to the reference genome. Some of the variant callers need extra information regarding the source of reads in order to identify the correct error profiles to use in their statistical variant detection model, so we add more information into the alignment step so that the generated BAM file contains the metadata the variant caller expects.
Call variants using the GATK Unified Genotyper, a Bayesian variant caller and genotyper from the Broad Institute. Many users consider the GATK to be best practice in human variant calling.
Try an alternative caller: Mpileup.
Evaluate known variations. We know a lot about variation in humans from many empirical studies, including the 1000 Genomes Project, so we have some expectations of what we should see when we detect variants in a new sample.
Annotate the detected variants against the Ensembl database and interpret the annotation output.
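
As a rough illustration of the first step (evaluating read quality), the sketch below computes the mean Phred score of each read in a FASTQ file. It is not the tool used in the tutorial, which the description does not name. The function names, the two example reads, and the Phred+33 encoding are all assumptions made for illustration.

```python
import io


def mean_read_qualities(fastq_handle, phred_offset=33):
    """Yield (read_id, mean Phred score) for each 4-line FASTQ record.

    Assumes plain 4-line records and Phred+33 quality encoding.
    """
    while True:
        header = fastq_handle.readline().rstrip("\n")
        if not header:
            break  # end of file (or a blank line)
        fastq_handle.readline()  # sequence line, not needed here
        fastq_handle.readline()  # '+' separator line
        quality = fastq_handle.readline().rstrip("\n")
        scores = [ord(ch) - phred_offset for ch in quality]
        mean = sum(scores) / len(scores) if scores else 0.0
        yield header[1:], mean  # drop the leading '@'


def low_quality_reads(fastq_text, threshold=20.0):
    """Return the ids of reads whose mean quality is below the threshold."""
    handle = io.StringIO(fastq_text)
    return [rid for rid, mean in mean_read_qualities(handle) if mean < threshold]


# Two made-up reads, for illustration only.
EXAMPLE_FASTQ = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!II\n"

if __name__ == "__main__":
    for read_id, mean in mean_read_qualities(io.StringIO(EXAMPLE_FASTQ)):
        print(read_id, mean)
```

Reads flagged this way are candidates for trimming or removal before mapping; the threshold is a judgment call, not a value taken from the tutorial.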

Prerequisites:

A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).

A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

Instructors

Annette McGrath

CSIRO

Jessica Chung

VLSCI

Pip Griffin

University of Melbourne

Simon Gladman

University of Melbourne

Torsten Seemann

University of Melbourne

Sunday June 26, 2016 12:30pm - 3:00pm EDT
IMU Persimmon Room

D.1 Training - Using D.x Training - All

12:30pm EDT

Beyond the Intro: Further adventures in using Galaxy

→ Slides, Video
This workshop continues where the Introduction to Galaxy session leaves off. Additional features of Galaxy will be introduced and several topics introduced in that first session will be explored in more detail. Topics covered will include:

Uploading data via FTP
History management
Defining and using custom reference genomes
Using Tagging and Annotation to manage your Galaxy objects
More on workflow editing and management
More on sharing and publishing
Using Galaxy to help debug your analyses

Prerequisites:

A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).
A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

Instructors

Daniel Blankenberg

Assistant Professor, Genomic Medicine Institute, Cleveland Clinic Lerner Research Institute

blankenberg gcc 2016 training advanced galaxy pdf

Sunday June 26, 2016 12:30pm - 3:00pm EDT
IMU Sassafras Room

D.1 Training - Using D.x Training - All

3:30pm EDT

ChIPseq analysis using deepTools and MACS

→ Slides, doi: 10.7490/f1000research.1112903.1
→ Video

Did my IP work? Where is my signal? How well do my replicates correlate? What might my peaks even look like? Where are my peaks (or signal) in relationship to transcription start sites (or other features)? These are common questions that biologists first pose when dealing with ChIPseq data. We will use deepTools and MACS within Galaxy to demonstrate effective methods of (A) performing ChIPseq-specific quality control, (B) calling peaks and (C) visualizing signal and peak enrichment around genes or other features.

Prerequisites:

A basic familiarity with using Galaxy (how to import datasets and run tools).

Ideally, participants will already be familiar with generic NGS quality control and read mapping, since those won't be covered.

Instructors

Devon Ryan

Max Planck Institute of Immunobiology and Epigenetics (MPI-IE)

DeepToolsMacs GCC2016 pdf

Sunday June 26, 2016 3:30pm - 6:00pm EDT
IMU Walnut Room

D.1 Training - Using D.x Training - All

3:30pm EDT

RNA-seq analysis with Galaxy, using advanced workflows

→ Tutorial
→ Video

This workshop will cover standard, advanced, and alternative RNAseq analysis pipelines, all using workflows and highlighting their advanced features. Three general pipelines will be addressed:

A standard RNAseq analysis pipeline using the Tuxedo suite (Tophat → Cuffdiff) for standard transcript quantification with a reference transcriptome.
An advanced analysis pipeline using the Tuxedo suite with StringTie to create de novo transcript structures, merge these with reference transcripts to create a transcriptome database, followed by transcript quantification.
An alternative RNAseq analysis pipeline using count-based quantification methods (DESeq2, edgeR, or limma) to generate abundance measurements.

These three pipelines will be used as examples to highlight usage of workflows and their advanced features.

Prerequisites:

A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).

A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

Instructors

Annette McGrath

CSIRO

Jessica Chung

VLSCI

Pip Griffin

University of Melbourne

Simon Gladman

University of Melbourne

Torsten Seemann

University of Melbourne

Sunday June 26, 2016 3:30pm - 6:00pm EDT
IMU Oak Room

D.1 Training - Using D.x Training - All

9:00am EDT

Introduction to Galaxy

Prerequisites:

Little or no knowledge of Galaxy.
A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

Instructors

Anton Nekrutenko

Penn State University

Monday June 27, 2016 9:00am - 11:30am EDT
IMU Sassafras Room

D.1 Training - Using D.x Training - All

9:00am EDT

Small Genomes de novo Assembly and Scaffolding

→ Tutorial
→ Video

This workshop will cover the basics of de novo genome assembly using a small genome example. This includes project planning steps, selecting fragment sizes, initial assembly of reads into fully covered contigs, and then assembling those contigs into larger scaffolds that may include gaps. The end result will be a set of contigs and scaffolds with sufficient average length to perform further analysis on, including genome annotation. This workshop will use tools and methods targeted at small genomes. The basics of assembly and scaffolding presented here will be useful for building larger genomes, but the specific tools and much of the project planning will be different.
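
To make "sufficient average length" concrete, here is a minimal sketch that summarizes a set of contig or scaffold lengths. It reports the mean length and N50, a widely used contiguity measure that the workshop description itself does not mention. The function name and the example lengths are hypothetical.

```python
def assembly_stats(lengths):
    """Summarize contig/scaffold lengths: count, total, mean and N50.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = 0
    for length in ordered:
        running += length
        if running * 2 >= total:  # reached half of the total bases
            n50 = length
            break
    return {
        "count": len(ordered),
        "total": total,
        "mean": total / len(ordered),
        "n50": n50,
    }


if __name__ == "__main__":
    # Made-up contig lengths in base pairs, for illustration only.
    print(assembly_stats([100, 200, 300, 400, 500]))
```

Comparing these numbers before and after scaffolding gives a quick check on whether the scaffolding step actually joined contigs into longer sequences.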

Prerequisites:

A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).

A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

Instructors

Annette McGrath

CSIRO

Jessica Chung

VLSCI

Pip Griffin

University of Melbourne

Simon Gladman

University of Melbourne

Torsten Seemann

University of Melbourne

Monday June 27, 2016 9:00am - 11:30am EDT
IMU Persimmon Room

D.1 Training - Using D.x Training - All

12:30pm EDT

Beyond the Intro: Further adventures in using Galaxy

→ Video

This workshop continues where the Introduction to Galaxy session leaves off. Additional features of Galaxy will be introduced and several topics introduced in that first session will be explored in more detail. Topics covered will include:

Uploading data via FTP
History management
Defining and using custom reference genomes
Using Tagging and Annotation to manage your Galaxy objects
More on workflow editing and management
More on sharing and publishing
Using Galaxy to help debug your analyses

Prerequisites:

A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).
A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

Instructors

Daniel Blankenberg

Assistant Professor, Genomic Medicine Institute, Cleveland Clinic Lerner Research Institute

Monday June 27, 2016 12:30pm - 3:00pm EDT
IMU Persimmon Room

D.1 Training - Using D.x Training - All

12:30pm EDT

Small Genome Annotation

→ Tutorial
→ Video

Genome assembly produces the raw genomic sequence of an organism. Genome annotation adds meaning to sequence by associating structural and functional annotation with specific regions (loci) on the genome. This workshop will introduce genome annotation in the context of small genomes. We'll begin with genome annotation concepts, and then introduce resources and tools for automatically annotating small genomes. The workshop will finish with a review of options for further automatic and manual tuning of the annotation, and for maintaining it as new assemblies or information becomes available.

Prerequisites:

Instructors

Annette McGrath

CSIRO

Jessica Chung

VLSCI

Pip Griffin

University of Melbourne

Simon Gladman

University of Melbourne

Torsten Seemann

University of Melbourne

Monday June 27, 2016 12:30pm - 3:00pm EDT
IMU Maple Room

D.1 Training - Using D.x Training - All

12:30pm EDT

Using Galaxy for proteomic and integrative multi-omic data analysis

→ Slides, doi: 10.7490/f1000research.1112908.1

→ Video

This hands-on workshop will take participants through the essential steps for using Galaxy for the analysis of mass spectrometry (MS)-based proteomics data, focusing on protein identification from large-scale datasets, and more advanced applications integrating genomic data with proteomic data. Introductory material will be presented on the basics of MS-based proteomics informatics and also emerging applications integrating genomic and proteomic data (an area called proteogenomics).

The workshop will be constructed to follow the steps of proteomic and proteogenomic workflows. Analysis modules corresponding to each of these steps will be described and demonstrated, following the structure below:

Database generation and raw data processing

Attendees will be guided through the use of tools for selecting and generating databases: either standard databases or customized databases for proteogenomics derived from genomic data (e.g. RNA-seq data). Tools for converting raw data to processed peak lists for further analysis will also be described.

Sequence database searching

Attendees will learn about available software in Galaxy for sequence database searching, which identifies proteins via matching of MS data to sequence databases. Use of these tools and optimization of parameters will be demonstrated and discussed.

Results visualization and interpretation

Attendees will be exposed to a variety of tools for visualizing and filtering results in Galaxy. Emphasis will be on tools useful for filtering identified proteins from proteogenomic analyses, where quality control of results is essential to generate high confidence results.

At the end of the workshop, attendees will have working knowledge of MS-based proteomics tools in the Tool Shed, experience in setting up basic workflows for protein identification, as well as more advanced applications in proteogenomics. Attendees will also have a better comprehension of the pitfalls encountered when interpreting data from these applications, and tools in Galaxy to help ensure confidence in results.

Participants will be given temporary accounts to a cloud-based Galaxy instance to participate in hands-on workshop activities.

Prerequisites:

Instructors

James Johnson

Senior Software Developer, Minnesota Supercomputing Institute, University of Minnesota

Galaxy for genomics and proteomics

Pratik Jagtap

Research Assistant Professor, University of Minnesota

Metaproteomics . DIA . Proteogenomics

Timothy J. Griffin

Professor, University of Minnesota

GCC2016 workshop GalaxyP pdf

Monday June 27, 2016 12:30pm - 3:00pm EDT
IMU Walnut Room

D.1 Training - Using D.x Training - All

3:30pm EDT

RADseq Data Analysis Through STACKS on Galaxy

→ Slides, doi: 10.7490/f1000research.1112912.1
→ Video

RADseq [1] data allow scientists to gather genome-wide information at low cost compared to complete genome sequencing. In this training session, we will show how to analyze RADseq data to:

build genetic maps [2],
calculate population genomics statistics [3,4], and
assemble paired-end loci with or without a reference genome using Stacks [5] on Galaxy.

Stacks works with restriction-enzyme based data, including GBS, CRoPS, and single and double digest RAD. Stacks identifies loci in a set of individuals, either de novo or aligned to a reference genome (including gapped alignments), and then genotypes each locus. See the Stacks Manual for full details.

Stacks has been integrated into Galaxy and is available via the GUGGO Tool Shed.

Prerequisites:

A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).
A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

1. Miller MR, Dunham JP, Amores A, Cresko WA, Johnson EA. (2007) Rapid and cost-effective polymorphism identification and genotyping using restriction site associated DNA (RAD) markers. Genome Research. 17(2):240-248.

2. Amores A, Catchen J, Ferrara A, Fontenot Q, Postlethwait JH. (2011) Genome Evolution and Meiotic Maps by Massively Parallel DNA Sequencing: Spotted Gar, an Outgroup for the Teleost Genome Duplication. Genetics 188(4):799-808.

3. Davey JW and Blaxter ML (2011) RADSeq: next-generation population genetics. Briefings in Functional Genomics. 10 (2): 108

4. Peterson BK, Weber JN, Kay EH, Fisher HS, Hoekstra HE. (2012) Double Digest RADseq: An Inexpensive Method for De Novo SNP Discovery and Genotyping in Model and Non-Model Species. PLoS ONE 7(5): e37135.

5. Catchen JM, Amores A, Hohenlohe P, Cresko W, Postlethwait JH. (2011) Stacks: Building and Genotyping Loci De Novo From Short-Read Sequences. G3 1(3):171-182

Instructors

Anthony Bretaudeau

BIPAA/GenOuest

Gildas Le Corguillé

CNRS-SU - Station Biologique de Roscoff - ABiMS

Yvan Le Bras

Research engineer, French National Museum of Natural History

GCC2016 Training RADSeq pdf

Monday June 27, 2016 3:30pm - 6:00pm EDT
IMU Sassafras Room

D.1 Training - Using D.x Training - All

3:30pm EDT

RNA-seq analysis with Galaxy, using advanced workflows

→ Tutorial
→ Video

This workshop will cover standard, advanced, and alternative RNAseq analysis pipelines, all using workflows and highlighting their advanced features. Three general pipelines will be addressed:

A standard RNAseq analysis pipeline using the Tuxedo suite (Tophat → Cuffdiff) for standard transcript quantification with a reference transcriptome.
An advanced analysis pipeline using the Tuxedo suite with StringTie to create de novo transcript structures, merge these with reference transcripts to create a transcriptome database, followed by transcript quantification.
An alternative RNAseq analysis pipeline using count-based quantification methods (DESeq2, edgeR, or limma) to generate abundance measurements.

These three pipelines will be used as examples to highlight usage of workflows and their advanced features.

Prerequisites:

A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).

A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

Instructors

Annette McGrath

CSIRO

Jessica Chung

VLSCI

Pip Griffin

University of Melbourne

Simon Gladman

University of Melbourne

Torsten Seemann

University of Melbourne

Monday June 27, 2016 3:30pm - 6:00pm EDT
IMU Persimmon Room

D.1 Training - Using D.x Training - All

3:30pm EDT

Visualization of Omics Datasets in Galaxy

→ Slides, doi: 10.7490/f1000research.1112913.1
→ Video

This workshop will cover visualization in Galaxy for both primary high-throughput/next-generation sequencing (NGS) analyses (alignments, variants, expression levels, and annotations) and downstream and aggregated datasets, using histograms, heat maps, and other numerical plots. First, using datasets from a combined exome and transcriptome (RNA-seq) experiment, participants will visualize data using Galaxy's genome browser and Circos plot. Participants will learn how to create a genome visualization, add data, configure data, move between a genome browser view and Circos view, and share complex genome visualizations with more than 12 NGS datasets. Second, using an integrated dataset of genomics and other -omics information, participants will create several numerical plots (e.g., scatter plot, histogram) to gain an overview of the data. Based on insight gained from these visualizations, participants will create a heatmap to identify patterns and potential causal factors. All visualizations will be created, saved, and shared using only Galaxy and a Web browser; no data or software downloads will be necessary.

Prerequisites:

A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).
A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

Instructors

Aysam Guerler

Galaxy Project, Johns Hopkins University

Jeremy Goecks

Galaxy Project, Oregon Health and Science University

2016 gcc viz workshop pdf

Monday June 27, 2016 3:30pm - 6:00pm EDT
IMU Maple Room

D.1 Training - Using D.x Training - All