→ Slides, doi: 10.7490/f1000research.1113243.1
→ Video
Want to know the big picture about what is going on inside Galaxy? This workshop will give participants a practical introduction to the Galaxy code base with a focus on changing those parts of Galaxy most often modified by local deployers and new contributors.
The workshop will include the following specific content:
Prerequisites:
→ Tutorial
→ Video
The tutorial is designed to introduce the tools, datatypes, and workflow of variation detection in human genomic DNA, using a small set of sequencing reads from chromosome 20. In this session we will:
Prerequisites:
A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).
This session will walk developers and bioinformaticians through the process of taking a working script or application and turning it into a Galaxy tool. It will also cover the basics of using Planemo, a command-line utility to assist in building and publishing Galaxy tools. We will investigate wrapping, common parameters, tool linting, best practices, loading tools into Galaxy, citations, and publishing tools to GitHub and the Galaxy Tool Shed. Common tips and tricks will be discussed, as well as insights from experienced tool developers.
Prerequisites:
→ Tutorial
→ Video
This workshop will cover standard, advanced, and alternative RNAseq analysis pipelines, all using workflows and highlighting their advanced features. Three general pipelines will be addressed:
A standard RNAseq analysis pipeline using the Tuxedo suite (TopHat → Cuffdiff) for standard transcript quantification with a reference transcriptome.
An advanced analysis pipeline using the Tuxedo suite with StringTie to create de novo transcript structures, merge these with reference transcripts to create a transcriptome database, and then quantify transcripts.
These three pipelines will be used as examples to highlight usage of workflows and their advanced features.
Prerequisites:
A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).
→ Tutorial, Source files
→ Video
The Galaxy project has developed a significant number of Ansible roles that enable anyone to build a production-level Galaxy server on any infrastructure without much manual effort. In this workshop, we will cover the purpose of the available roles and how they relate to each other. To showcase their use, we will build a complete Galaxy server with your personal choice of tools using only a handful of commands.
Prerequisites:
→ Slides, doi: 10.7490/f1000research.1112906.1
→ Video
In this session you will get an introduction to Interactive Environments (IEs) as an easy and powerful way to integrate arbitrary interactive web services into Galaxy. We will demonstrate the IPython Galaxy project and the general concept of IEs.
Prerequisites:
You are on your own for dinner this evening. See the bottom of the conference location page for links to nearby options. Or, if you just want to wander, see the online map for restaurant-enriched neighborhoods. Fourth Street from Indiana Avenue to Walnut St. and Fifth Street (Kirkwood Avenue) from Indiana Avenue to Rogers St. both have an array of amazing options. The square downtown is a great find as well. 3rd St. from Wilkie to just past Jordan Ave. also has restaurant options. Lennie's on 10th St. near the Herman B Wells Library is also very good.
This workshop will cover the basics of de novo genome assembly using a small genome example. This includes project planning steps, selecting fragment sizes, initial assembly of reads into fully covered contigs, and then assembling those contigs into larger scaffolds that may include gaps. The end result will be a set of contigs and scaffolds with sufficient average length for further analysis, including genome annotation. This workshop will use tools and methods targeted at small genomes. The basics of assembly and scaffolding presented here will be useful for building larger genomes, but the specific tools and much of the project planning will be different.
Prerequisites:
→ Tutorial, doi: 10.7490/f1000research.1112907.1
→ Video
This will be a technical workshop covering the process of creating a Galaxy on the Cloud platform for a range of clouds. We will look at how to use Galaxy Ansible automation playbooks to build all the components required to run Galaxy on a cloud using CloudMan. At the end of this workshop, you will know how to create a custom version of Galaxy on any supported cloud (AWS, OpenStack) and allow others to then easily and independently launch instances of it. Specifically, it will cover the process of creating the machine image and the Galaxy and indices file systems, as well as installing Cloud Launch.
Prerequisites:
There is no better place than a Galaxy Community Conference to meet and learn from others doing data-intensive biology. GCC2016 will continue this tradition by again including Birds of a Feather (BoF) meetups. Birds of a Feather meetups are informal gatherings where participants group together based on common interests.
BoF meetups scheduled during this time are:
BoF meetups are encouraged throughout GCC2016.
If you are interested in proposing a BoF, please submit your idea here and we'll add it to the schedule.
Application portability conundrum of Galactic proportions.
Using application containers for portability, especially overcoming Tool Shed build issues.
A developer school is planned for January 2017 in Strasbourg, organized by ELIXIR (the European bioinformatics hub) and the French Institute of Bioinformatics (the French national ELIXIR node). This BoF is a discussion to settle on the training modules that will be offered. The first discussion of this event was organized around these slides, at the ELIXIR all-hands meeting in Barcelona in 2016.
→ Video
This workshop continues where the Introduction to Galaxy session leaves off. Additional features of Galaxy will be introduced and several topics introduced in that first session will be explored in more detail. Topics covered will include:
Genome assembly produces the raw genomic sequence of an organism. Genome annotation adds meaning to sequence by associating structural and functional annotation with specific regions (loci) on the genome. This workshop will introduce genome annotation in the context of small genomes. We'll begin with genome annotation concepts, and then introduce resources and tools for automatically annotating small genomes. The workshop will finish with a review of options for further automatic and manual tuning of the annotation, and for maintaining it as new assemblies or information become available.
Prerequisites:
→ Slides, doi: 10.7490/f1000research.1112908.1
→ Video
This hands-on workshop will take participants through the essential steps for using Galaxy for the analysis of mass spectrometry (MS)-based proteomics data, focusing on protein identification from large-scale datasets and on more advanced applications integrating genomic data with proteomic data. Introductory material will be presented on the basics of MS-based proteomics informatics and also on emerging applications integrating genomic and proteomic data (an area called proteogenomics).
The workshop will be constructed to follow the steps of proteomic and proteogenomic workflows. Analysis modules corresponding to each of these steps will be described and demonstrated, following the structure below:
Database generation and raw data processing
Attendees will be guided through the use of tools for selecting and generating databases, either standard databases or customized databases for proteogenomics derived from genomic data (e.g. RNA-seq data). Tools for converting raw data to processed peak lists for further analysis will also be described.
Sequence database searching
Attendees will learn about available software in Galaxy for sequence database searching, which identifies proteins via matching of MS data to sequence databases. Use of these tools and optimization of parameters will be demonstrated and discussed.
Results visualization and interpretation
Attendees will be exposed to a variety of tools for visualizing and filtering results in Galaxy. Emphasis will be on tools useful for filtering identified proteins from proteogenomic analyses, where quality control of results is essential to generate high-confidence results.
At the end of the workshop, attendees will have working knowledge of MS-based proteomics tools in the Tool Shed, experience in setting up basic workflows for protein identification, as well as more advanced applications in proteogenomics. Attendees will also have a better comprehension of the pitfalls encountered when interpreting data from these applications, and tools in Galaxy to help ensure confidence in results.
Participants will be given temporary accounts to a cloud-based Galaxy instance to participate in hands-on workshop activities.
Prerequisites:
→ Tutorial
This workshop is aimed at people with some experience developing tools and will cover more advanced topics in tool development, more complex tools, and recent enhancements to the Galaxy tool development process including:
Prerequisites:
→ Slides, doi: 10.7490/f1000research.1112912.1
→ Video
RADseq [1] data allow scientists to gather genome-wide information with a low-cost approach compared to complete genome sequencing. In this training session, we will show how to analyze RADseq data to
Stacks works with restriction enzyme-based data, including GBS, CRoPS, and single and double digest RAD. Stacks identifies loci in a set of individuals, either de novo or aligned to a reference genome (including gapped alignments), and then genotypes each locus. See the Stacks Manual for full details.
Stacks has been integrated into Galaxy and is available via the GUGGO Tool Shed.
Prerequisites:
1. Miller MR, Dunham JP, Amores A, Cresko WA, Johnson EA. (2007) Rapid and cost-effective polymorphism identification and genotyping using restriction site associated DNA (RAD) markers. Genome Research. 17(2):240-248.
2. Amores A, Catchen J, Ferrara A, Fontenot Q, Postlethwait JH. (2011) Genome Evolution and Meiotic Maps by Massively Parallel DNA Sequencing: Spotted Gar, an Outgroup for the Teleost Genome Duplication. Genetics 188(4):799-808.
3. Davey JW, Blaxter ML. (2011) RADSeq: next-generation population genetics. Briefings in Functional Genomics. 10(2):108.
4. Peterson BK, Weber JN, Kay EH, Fisher HS, Hoekstra HE. (2012) Double Digest RADseq: An Inexpensive Method for De Novo SNP Discovery and Genotyping in Model and Non-Model Species. PLoS ONE 7(5): e37135.
5. Catchen JM, Amores A, Hohenlohe P, Cresko W, Postlethwait JH. (2011) Stacks: Building and Genotyping Loci De Novo From Short-Read Sequences. G3 1(3):171-182.
→ Slides, doi: 10.7490/f1000research.1112913.1
→ Video
This workshop will cover visualization in Galaxy for both primary high-throughput sequencing/next-generation sequencing (NGS) analyses (alignments, variants, expression levels, and annotations) and downstream and aggregated datasets, using histograms, heat maps, and other numerical plots. First, using datasets from a combined exome and transcriptome (RNA-seq) experiment, participants will visualize data using Galaxy’s genome browser and Circos plot. Participants will learn how to create a genome visualization, add data, configure data, move between a genome browser view and Circos view, and share complex genome visualizations with more than 12 NGS datasets. Second, using an integrated dataset of genomics and other -omics information, participants will create several numerical plots (e.g., scatter plot, histogram) to gain an overview of the data. Based on insight gained from these visualizations, participants will create a heatmap to identify patterns and potential causal factors. All visualizations will be created, saved, and shared using only Galaxy and a Web browser; no data or software downloads will be necessary.
Prerequisites:
→ Slides, doi: 10.7490/f1000research.1112914.1
In this session you will get an in-depth introduction to Interactive Environments (IEs). You will learn how to set up and secure IEs in a production Galaxy instance. Moreover, we will create an IE on the fly to get you started in creating your own Interactive Environments.
Prerequisites:
→ Slides, doi: 10.7490/f1000research.1112915.1
→ Video
Galaxy has an always-growing API that allows for external programs to upload and download data, manage histories and datasets, run tools and workflows, and even perform admin tasks. This session will cover a variety of approaches to making use of the API.
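As a minimal sketch of one such approach, the Python below builds a plain HTTP request against the API using only the standard library. It assumes the `/api/histories` endpoint and an API key passed as the `key` query parameter; the server URL and key shown are placeholders, and the helper names are hypothetical. Client libraries such as BioBlend wrap calls like this at a higher level.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen


def build_api_request(base_url, endpoint, api_key, **params):
    """Build (but do not send) a GET request for a Galaxy API endpoint."""
    query = urlencode({"key": api_key, **params})
    url = urljoin(base_url.rstrip("/") + "/", "api/" + endpoint.lstrip("/"))
    return Request(url + "?" + query, method="GET")


def list_histories(base_url, api_key):
    """Fetch the user's histories as parsed JSON (requires a live server)."""
    with urlopen(build_api_request(base_url, "histories", api_key)) as response:
        return json.load(response)


# Placeholder server URL and API key, for illustration only.
request = build_api_request("https://galaxy.example.org", "histories", "MY_API_KEY")
print(request.full_url)
```

Building the request separately from sending it keeps the URL construction easy to inspect before pointing the script at a real server.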
Prerequisites:
Following the first Galaxy Data Hackathon at GCC2015, we founded this group to represent the scientific community among Galaxy users: to have a structured voice concerning issues, feedback, and needs, and to work together to improve how Galaxy-based research is conducted. We would like to revive this group, so if you liked what we were doing during the Datathon, or if you were not there and would like to contribute to the Galaxy scientific community, please join us.
If you are interested in participating in this BoF, create a Sched login (if you don't already have one), and add this BoF to your personal schedule.
BoF meetups during this slot are:
Discussion
GalaxyAdmins is a group of people who are responsible for administering Galaxy instances. We meet online every other month and at events like GCC2016, where a lot of us happen to be.
GCC2016 will be the fourth in-person GalaxyAdmins meetup. Previous GalaxyAdmins BoFs were very well attended and have resulted in several action items, many of which have since been implemented.
This meetup will discuss plans for the coming year, GalaxyAdmins leadership, and whatever else participants want to talk about.
We are interested in a general discussion of Genome Annotation problems and solutions.
You are on your own for dinner this evening (Sunday). See the bottom of the conference location page for links to nearby options. Or, if you just want to wander, see the online map for restaurant-enriched neighborhoods. Fourth Street from Indiana Avenue to Walnut St. and Fifth Street (Kirkwood Avenue) from Indiana Avenue to Rogers St. both have an array of amazing options. The square downtown is a great find as well.
Find someone you don't know, share a meal, and learn what others are up to.
BoF meetups are encouraged throughout GCC2016. This session will likely be split into several distinct blocks, enabling participants to attend more BoFs.
Bioinformatics skills have become essential to modern biologists, yet many schools do not have bioinformatics programs or even a designated bioinformatics course at the undergraduate level. What qualifications do you wish to see in your incoming graduate students? How can we prepare them to meet your expectations? If you have any thoughts on bioinformatics curriculum development and/or faculty training, please join me. (If the meeting time does not work for you, please contact me at zxu@bgsu.edu at any time. Thank you!)
That's why we chose Galaxy. ChemFlow is being implemented with our own functions. It now includes most of the processing tools: importing and converting our data, and running chemometrics methods such as calibrations and classifications.
We are very satisfied with the performance of ChemFlow running on a server. Some issues have been fixed, while others are still pending:
In summary, Galaxy is being used in a new domain, chemometrics, is aimed at a new user community, and will be the central platform for a new e-learning module delivered as a MOOC.
→ Slides doi: 10.7490/f1000research.1112750.1
→ Video
Author
Abdulrahman Azab
Björn Grüning
Abstract
This talk is relevant mainly for advanced developers and sysadmins who wish to support Docker on their systems but are concerned that Docker is insecure. It is also relevant for running Galaxy in production on top of an HPC system.
It shows how to configure the system to run Docker containers as the local user in a simple and quick way, without having to worry about, for example, connecting to LDAP from containers.
→ Slides doi: 10.7490/f1000research.1112751.1
→ Video
As use of Galaxy increases and computational resources are continuously busy, it becomes important to optimize resource usage. To address this issue, we have developed Dynamic Tool Destination (DTD), a dynamic job destination that works with all tools and destinations. In DTD, an administrator sets up rules for each tool in a YAML file; these rules define which destination a tool's job should go to when particular parameters are present, when input data is large or small, etc. DTD is open source under the Apache License and is available on GitHub at https://github.com/phac-nml/dynamic-tool-destination
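The rule-matching idea can be sketched in a few lines of Python. This is an illustration only: the rule layout, key names, tool ID, and destination names below are hypothetical and are not DTD's actual YAML schema, and a plain dictionary stands in for the parsed YAML file.

```python
# Hypothetical rule table standing in for a parsed YAML file.
# Key names and destinations are illustrative, not DTD's real schema.
RULES = {
    "bwa": {
        "default_destination": "cluster_small",
        "rules": [
            {"type": "file_size", "min_bytes": 10 * 1024**3,
             "destination": "cluster_large"},
            {"type": "parameter", "name": "threads", "value": "16",
             "destination": "cluster_multicore"},
        ],
    },
}
FALLBACK_DESTINATION = "local"  # used for tools with no rules at all


def choose_destination(tool_id, input_bytes, params):
    """Return the first matching rule's destination, else the tool default."""
    tool = RULES.get(tool_id)
    if tool is None:
        return FALLBACK_DESTINATION
    for rule in tool["rules"]:
        if rule["type"] == "file_size" and input_bytes >= rule["min_bytes"]:
            return rule["destination"]
        if rule["type"] == "parameter" and params.get(rule["name"]) == rule["value"]:
            return rule["destination"]
    return tool["default_destination"]


print(choose_destination("bwa", 20 * 1024**3, {}))
print(choose_destination("bwa", 100, {"threads": "16"}))
print(choose_destination("bwa", 100, {}))
```

Rules are checked in order, so in this sketch a large input takes precedence over a parameter match; a real rule set would need an explicit policy for such conflicts.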
→ Slides doi: 10.7490/f1000research.1112752.1
→ Video
Classic bioinformatics curricula are limited by relatively rigid course compartmentalization, reliance on expensive proprietary IT/bioinformatics tools, and a limited grading system as the outcome of completing the course. Here we present a curriculum infused with real-life, research-based projects such as whole genome analysis, gene expression arrays, and molecular dynamics, applied to aging, cancer, and pharmacogenomics. These projects serve as pivotal points for integrating biomedical science, computer science, and statistics into one coherent interdisciplinary subject known as bioinformatics. Each project has scientific objectives that serve as the underlying platform for educational goals. Students join the projects after completing a basic course that familiarizes them with the technical and scientific aspects of the projects. The curriculum is based on 100% open source, cutting-edge, evolving technology. This allows students to be taught the most current technology at a fraction of the price of proprietary software. The use of real-life projects brings the excitement of involvement in pertinent discoveries and facilitates learning and open sharing of ideas. As the outcome of completing the projects, students will develop the skills, knowledge, and hands-on experience that will make them competitive in today's intensive and rapidly changing field of computational biology.
→ Slides doi:10.7490/f1000research.1112753.1
→ Video
Metaproteomics characterizes proteins expressed by microorganism communities (microbiome) present in environmental samples or a host organism. Mass spectrometry (MS)-based metaproteomics has catalyzed new discoveries into the functional dynamics of microbiomes (Wilmes et al 2015, doi: 10.1002/pmic.201500183). Metaproteomic informatics is distinctly challenging due to the large databases and complex processing steps involved. This challenge limits widespread use of metaproteomics. Through modular workflows, we demonstrate the use of the Galaxy bioinformatics framework as a metaproteomic informatics solution (Jagtap et al 2015; doi: 10.1002/pmic.201500074). The workflow output results are compatible with tools for taxonomic and functional characterization (Unipept and MEGAN5). MEGAN5 was used to generate functional characterization of the metaproteome using Inter2Pro pathway analysis. These workflows enable new discoveries from diverse communities such as dental plaques (Rudney et al 2015, doi: 10.1186/s40168-015-0136-z), bronchoalveolar lavage fluid (BALF), lung tissue, and cervical-vaginal fluid (CVF). Our results demonstrate the power of discovery metaproteomics to add functional understanding to microbiomes, beyond what is possible using traditional metagenomic approaches.
→ Slides doi: 10.7490/f1000research.1112754.1
→ Video
GenAP is a Canadian platform that provides Galaxy instances across different Canadian HPC centers. With more than 7 TB of reference genomes, replicating this data in every HPC center is expensive, and the copies are hard to keep in sync. The CernVM File System (CVMFS) allows us to centralize provisioning, replicate the data, and distribute genome references on demand. With CVMFS, the local machine imports only the genomes needed for the job being run, which keeps storage use at the HPC centers to a minimum.
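The on-demand idea can be illustrated with a toy Python cache that copies a reference file from a central directory only the first time a job asks for it. This is a conceptual sketch under simplifying assumptions, not how CVMFS is implemented; the class name, directory layout, and genome file names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


class OnDemandCache:
    """Toy stand-in for on-demand distribution: fetch a file on first use."""

    def __init__(self, repository, cache_dir):
        self.repository = Path(repository)  # central copy of all genomes
        self.cache_dir = Path(cache_dir)    # small local store on one node
        self.fetched = []                   # record of actual transfers

    def path_for(self, name):
        local = self.cache_dir / name
        if not local.exists():
            local.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy(self.repository / name, local)
            self.fetched.append(name)
        return local


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp, "central")
    cache = Path(tmp, "node_cache")
    repo.mkdir()
    for genome in ("hg19.fa", "mm10.fa", "dm6.fa"):
        (repo / genome).write_text(">chr1\nACGT\n")

    node = OnDemandCache(repo, cache)
    node.path_for("hg19.fa")  # first request copies the file
    node.path_for("hg19.fa")  # second request is served locally
    cached_files = sorted(p.name for p in cache.iterdir())

print(cached_files)
print(node.fetched)
```

Only the genome a job actually requested ends up in the local store, while the other references stay in the central repository.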
→ Slides doi: 10.7490/f1000research.1112755.1
→ Video
Circos is a favourite tool among biologists for production-quality plots; however, building the initial plots requires a large activation energy due to Circos' steep learning curve. We have worked on developing a generic and easily configurable Galaxy tool that generates Circos plots and also provides the generated configuration files, to allow further tweaking and customization after the fact. We have made the tool publicly available during development and have already received contributions during the GCC2016 Hackathon.
→ Slides doi:10.7490/f1000research.1112756.1
→ Video
Integrating workflow support for GenomeSpace into Galaxy.
The GenomeSpace importer/exporter itself has been rewritten as a standalone pip-installable tool, available here: https://github.com/gvlproject/python-genomespaceclient. We hope to transfer that code back into GenomeSpace or Galaxy as a set of Python bindings plus a command-line client for GenomeSpace.
There's a 3 minute video of how things work here:
https://www.youtube.com/watch?v=5QPtWS_ab0I
See https://github.com/galaxyproject/galaxy/pull/1814
→ Slides doi: 10.7490/f1000research.1112757.1
→ Video
Monarch (https://monarchinitiative.org) integrates a variety of genomic, phenotypic, and disease data by leveraging ontologies to create relationships across multiple organisms.
We have (quickly, using Planemo!) created a Galaxy tool to wrap the web services exposed by Monarch, including the phenopacket implementation.
Please let us know how we can improve on this first cut. We look forward to getting some feedback from you.
→ Slides doi: 10.7490/f1000research.1112758.1
→ Video
A report on the GCC2016 Datathon.