Visit the main conference website:  http://galaxyproject.org/gcc2016

Friday, June 24
 

6:00pm

Conference desk open
The conference desk will be open on Friday night in the IMU East Lounge, if you want to check in before the Hackathons start on Saturday morning.

 
Saturday, June 25
 

9:00am

Hackathon Breakfast
Continental breakfast will be provided.  Feel free to stop by throughout the day for sustenance and networking opportunities.

9:00am

Conference desk open
Check in the day before! Check-in will be quite hectic on Sunday morning, so if you are around, you are encouraged to check in on Saturday.

7:00pm

Training Office Hours
Need some help getting your laptop set up for tomorrow's training sessions? Experts will be available to help you install any virtual machines and other software on your laptop, so you can hit the ground running when the first session starts.

Note: this has been moved from the IMU to the library.

 
Sunday, June 26
 

9:00am

Hackathon Breakfast
Continental breakfast will be provided.  Feel free to stop by throughout the day for sustenance and networking opportunities.

9:00am

Introduction to Galaxy
New to Galaxy? This session will introduce you to the Galaxy Project and the Galaxy Community, and walk you through a simple use case demonstrating what Galaxy can do. It is recommended for anyone who has never used Galaxy or uses it only rarely.

Prerequisites:
  • Little or no knowledge of Galaxy.
  • A wi-fi enabled laptop with a modern web browser.  Google Chrome, Firefox and Safari will work best. 

Instructors

Anton Nekrutenko

Galaxy Project, Penn State University


9:00am

Metagenomics with Galaxy
Tools and workflows for the analysis and visualisation of metagenomics data sets.

Prerequisites:
  • A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).
  • A wi-fi enabled laptop with a modern web browser.  Google Chrome, Firefox and Safari will work best. 
 

Instructors

Daniel Blankenberg

Galaxy Project, Penn State University


9:00am

Galaxy Architecture

➔ Slides, doi: 10.7490/f1000research.1113243.1

Want to know the big picture about what is going on inside Galaxy? This workshop will give participants a practical introduction to the Galaxy code base with a focus on changing those parts of Galaxy most often modified by local deployers and new contributors. 

The workshop will include the following specific content:

  • A description of the various files and top-level directories in the Galaxy code base.
  • An overview of important Python modules - including models, tools, jobs, workflows, visualizations, and API controllers.
  • An overview of important Python objects and concepts in the Galaxy codebase - including the Galaxy transaction object ("trans"), the application object ("app"), and the configuration object ("config").
  • An overview of various plugin extension points.
  • An overview of important JavaScript modules that power the front-end.
  • An overview of important JavaScript concepts used by Galaxy - in particular RequireJS, Backbone MVC, and Grunt.
  • An overview of the client build system used to generate compressed JavaScript, cascading stylesheets, and other static web assets.
  • A demonstration of a complete start-to-finish modification of Galaxy - including forking the project on GitHub, modifying files, running the tests, checking style guidelines, committing the change, pushing it to your GitHub fork, and opening a pull request.
  • A brief description of other projects in the Galaxy ecosystem (CloudMan, the Tool Shed, bioblend, docker-galaxy-stable, Pulsar, and Planemo).

Prerequisites:

  • A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).
  • Comfort with scripting languages (Python and JavaScript will be the most helpful).

 


Instructors

John Chilton

Galaxy Project, Penn State University

Nate Coraor

Galaxy Project, Penn State University



9:00am

Setting up a Galaxy instance as a service
Slides, doi: 10.7490/f1000research.1113080.1

This session will use a virtual machine image. Please download the image from here or here or here before GCC2016 starts. The image is 2.4 GB and will take some time to download. You will also want to install the latest version of VirtualBox.

The talk guidelines (including a link to the slides) can be found here.

In this workshop, you will learn what is important when you set up a Galaxy server from scratch, which pitfalls you might run into, how to interact with the potential users of the service you are going to offer, and how to make sure that the Galaxy instance you have set up is actually used in the end. After a general introduction, several Galaxy installations will be presented. The session will include some demonstrations and hands-on exercises. We will finish with a panel discussion of questions from the workshop participants.

Prerequisites:
  • Familiarity with the bioinformatics problems (and their solutions) that wet-lab scientists run into.
  • Knowledge and comfort with the Unix/Linux command line interface and a text editor. If you don't know what cd, mv, rm, mkdir, chmod, grep and so on can do then you will struggle in this workshop.

Instructors

Hans-Rudolf Hotz

Friedrich Miescher Institute for Biomedical Research

Jochen Bick

ETH Zürich

Nikolay Aleksandrov Vazov

Senior Engineer, University of Oslo
Galaxy in the context of: Galaxy-cluster communication; authentication and authorization; resource allocation management (accounting of resources); reporting and user/project management in Galaxy; multiple Galaxy server installations with common and customized features (services)



9:00am

The Galaxy Docker Project
➔ Slides

In this session you will learn the internals of the Docker Galaxy Image. We will show you tips and tricks on how to run the Galaxy Docker Image successfully in production, how to manage updates and how to bind the container to a cluster scheduler. Moreover, you will learn how to create your own Galaxy flavour mixing a variety of different tools and visualisations.

Prerequisites:
  • Basic understanding of Galaxy from a developer point of view.
  • General knowledge about Docker
  • Knowledge and comfort with the Unix/Linux command line interface and a text editor. If you don't know what cd, mv, rm, mkdir, chmod, grep and so on can do then you will struggle in this workshop.
  • A wi-fi enabled laptop with a modern web browser.  Google Chrome, Firefox and Safari will work best. 

Instructors

Björn Grüning

University of Freiburg

Enis Afgan

Galaxy Project, Johns Hopkins University
Everything 'Galaxy on the Cloud' related!

Marius van den Beek

IBPS / Université Pierre et Marie Curie



12:30pm

Beyond the Intro: Further adventures in using Galaxy

This workshop continues where the Introduction to Galaxy session leaves off. Additional features of Galaxy will be introduced, and several topics from that first session will be explored in more detail. Topics covered will include:

  • Uploading data via FTP
  • History management
  • Defining and using custom reference genomes
  • Using Tagging and Annotation to manage your Galaxy objects
  • More on workflow editing and management
  • More on sharing and publishing
  • Using Galaxy to help debug your analyses
Prerequisites:
  • A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).
  • A wi-fi enabled laptop with a modern web browser.  Google Chrome, Firefox and Safari will work best. 
 

Instructors

Daniel Blankenberg

Galaxy Project, Penn State University


12:30pm

Human Variant Calling with Galaxy

→ Tutorial

The tutorial is designed to introduce the tools, datatypes and workflow of variant detection in human genomic DNA, using a small set of sequencing reads from chromosome 20. In this session we will:

  • Evaluate the quality of the short-read data. If the quality is poor, then adjustments can be made – e.g. trimming the short reads, or adjusting your expectations of the final outcome.
  • Map each of the individual reads in the sample FASTQ readsets to a reference genome, so that we can then identify the sequence changes with respect to the reference genome. Some of the variant callers need extra information regarding the source of reads in order to identify the correct error profiles to use in their statistical variant detection model, so we add more information in the alignment step so that the generated BAM file contains the metadata the variant caller expects.
  • Call variants using the GATK Unified Genotyper. The GATK Unified Genotyper is a Bayesian variant caller and genotyper from the Broad Institute. Many users consider the GATK to be best practice in human variant calling.
  • Try an alternative caller: Mpileup
  • Evaluate known variations. We know a lot about variation in humans from many empirical studies, including the 1000 Genomes Project, so we have some expectations of what we should see when we detect variants in a new sample.
  • Annotate the detected variants against the Ensembl database and interpret the annotation output.
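For readers new to the quality-evaluation and trimming step above, here is a minimal Python sketch of 3' quality trimming. It assumes Phred+33 (Sanger / Illumina 1.8+) quality encoding and an arbitrary cutoff of Q20; the function names are hypothetical, and the Galaxy trimming tools used in the tutorial implement more elaborate strategies.

```python
from __future__ import annotations


def phred_scores(quality_string: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(sequence: str, quality_string: str, min_quality: int = 20) -> tuple[str, str]:
    """Remove bases from the 3' end of a read while their Phred score is below min_quality."""
    if len(sequence) != len(quality_string):
        raise ValueError("sequence and quality string must have the same length")
    scores = phred_scores(quality_string)
    end = len(scores)
    # Walk backwards from the 3' end until a base meets the quality cutoff.
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]


# Hypothetical read: five Q40 bases ('I') followed by three low-quality bases ('#' = Q2, '!' = Q0).
print(trim_3prime("ACGTACGT", "IIIII##!"))  # ('ACGTA', 'IIIII')
```

Trimming only from the 3' end reflects the common observation that Illumina quality tends to drop towards the end of a read; real trimmers also offer sliding-window and adapter-aware modes.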

Prerequisites:

  • A wi-fi enabled laptop with a modern web browser.  Google Chrome, Firefox and Safari will work best. 

Instructors

Pip Griffin

University of Melbourne

Simon Gladman

Bioinformatician, VLSCI / University of Melbourne

Torsten Seemann

University of Melbourne


12:30pm

Get your own Galaxy within minutes
➔ Tutorial

Dozens of public Galaxy servers are a great community resource. However, if you need more capacity (compute or storage) or a custom toolset, it is necessary to install your own Galaxy server. In this workshop, you will learn how to create your own Galaxy within a few minutes using just a web browser. We will use cloud resources to create a full-featured instance that is ready for use without the limitations of public servers, while requiring no technical expertise. You will then learn how to easily install additional tools for your lab’s specific needs. The workshop will also cover how to get free access to some academic clouds from around the world.

Prerequisites:
  • A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).
  • A wi-fi enabled laptop with a modern web browser.  Google Chrome, Firefox and Safari will work best. 
 

Instructors

Enis Afgan

Galaxy Project, Johns Hopkins University
Everything 'Galaxy on the Cloud' related!

Nitesh Turaga

Software Engineer, Galaxy Project, Johns Hopkins University

Nuwan Goonasekera

VLSCI / University of Melbourne



12:30pm

The Galaxy Database Schema
When running a production Galaxy server, you sometimes end up in a situation where you need to interact with the database manually, e.g. to change the state of a job to 'error'. This is always a very risky undertaking. A not-at-all risky situation, by contrast, is extracting usage information that cannot be gathered with the provided report tools. In both cases, you need a good understanding of the Galaxy database schema. Learn some of the design concepts of the database, which parts of the schema are stable, and which will change in the foreseeable future. Advice will also be given on how to migrate a production Galaxy server from MySQL to PostgreSQL.
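To make the two situations above concrete, here is a minimal Python sketch using an in-memory SQLite database. The `job` table is a hypothetical, heavily simplified stand-in (only `id`, `tool_id` and `state` columns, with made-up rows and tool ids); a production Galaxy schema has many more tables and columns and normally runs on PostgreSQL or MySQL, so inspect your own schema before running anything similar.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    # Hypothetical, simplified stand-in for Galaxy's job table (made-up rows).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, tool_id TEXT, state TEXT)")
    conn.executemany(
        "INSERT INTO job (id, tool_id, state) VALUES (?, ?, ?)",
        [(1, "tool_a", "ok"), (2, "tool_a", "running"), (3, "tool_b", "ok")],
    )
    conn.commit()
    return conn


def mark_job_error(conn: sqlite3.Connection, job_id: int) -> int:
    """The risky case: set one job's state to 'error'; returns the number of rows changed."""
    # Using the connection as a context manager commits on success and rolls back on error.
    with conn:
        cur = conn.execute("UPDATE job SET state = 'error' WHERE id = ?", (job_id,))
    return cur.rowcount


def jobs_per_tool(conn: sqlite3.Connection) -> dict:
    """The read-only case: a usage statistic (jobs run per tool)."""
    rows = conn.execute("SELECT tool_id, COUNT(*) FROM job GROUP BY tool_id ORDER BY tool_id")
    return dict(rows.fetchall())


conn = build_demo_db()
print(mark_job_error(conn, 2))  # 1
print(jobs_per_tool(conn))      # {'tool_a': 2, 'tool_b': 1}
```

Parameterized queries (the `?` placeholders) keep the job id out of the SQL string itself, which matters once such snippets are reused with values typed in by an administrator.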

Session Document

Prerequisites:
  • Experience maintaining a production Galaxy server.
  • Basic knowledge of relational databases and SQL statements.
 

Instructors

Dave Clements

Training and Outreach Coordinator, Galaxy Project, Johns Hopkins University

Hans-Rudolf Hotz

Friedrich Miescher Institute for Biomedical Research


12:30pm

Writing & Publishing Galaxy Tools

Tutorial

This session will walk developers and bioinformaticians through the process of taking a working script or application and turning it into a Galaxy tool. It will also cover the basics of using Planemo: a command-line utility to assist in building and publishing Galaxy tools. We will investigate wrapping, common parameters, tool linting, best practices, loading tools into Galaxy, citations, and publishing tools to GitHub and the Galaxy Tool Shed. Common tips and tricks will be discussed as well as insights from experienced tool developers.
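As a rough picture of what turning a script into a Galaxy tool produces, the sketch below uses Python's standard `xml.etree.ElementTree` to build a bare-bones tool definition with one data input and one data output. The tool id, name, version and `wc -l` command are hypothetical examples; real tools usually also declare requirements, tests, help text and citations, which Planemo's `lint` command can warn about when missing.

```python
import xml.etree.ElementTree as ET


def minimal_tool_xml(tool_id: str, name: str, version: str, command: str) -> str:
    """Build a bare-bones Galaxy tool definition with one data input and one data output."""
    tool = ET.Element("tool", id=tool_id, name=name, version=version)
    # The command line Galaxy runs; $input and $output refer to the param/data names below.
    ET.SubElement(tool, "command").text = command
    inputs = ET.SubElement(tool, "inputs")
    ET.SubElement(inputs, "param", name="input", type="data", format="txt", label="Input file")
    outputs = ET.SubElement(tool, "outputs")
    ET.SubElement(outputs, "data", name="output", format="txt")
    return ET.tostring(tool, encoding="unicode")


# Hypothetical line-counting tool wrapping a standard Unix command.
print(minimal_tool_xml("count_lines", "Count lines", "0.1.0", "wc -l < '$input' > '$output'"))
```

In practice the XML is written by hand (or generated with Planemo's tool-init helpers) rather than from Python; building it programmatically here just makes the required elements explicit.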

Prerequisites:

  • A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).
  • Knowledge and comfort with the Unix/Linux command line interface and a text editor. If you don't know what cd, mv, rm, mkdir, chmod, grep and so on can do then you will struggle in this workshop.
  • A wi-fi enabled laptop with a modern web browser.  Google Chrome, Firefox and Safari will work best. 

 


Instructors

Björn Grüning

University of Freiburg

Dave Bouvier

Galaxy Project, Penn State University

John Chilton

Galaxy Project, Penn State University

Marius van den Beek

IBPS / Université Pierre et Marie Curie

Nicola Soranzo

The Genome Analysis Centre (TGAC)


3:00pm

Training Break
Network and meet your fellow GCC2016 participants!

Drinks and snacks will be provided.

3:30pm

ChIPseq analysis using deepTools and MACS
➔ Slides, doi: 10.7490/f1000research.1112903.1

Did my IP work? Where is my signal? How well do my replicates correlate? What might my peaks even look like? Where are my peaks (or signal) in relationship to transcription start sites (or other features)? These are common questions that biologists first pose when dealing with ChIPseq data. We will use deepTools and MACS within Galaxy to demonstrate effective methods of (A) performing ChIPseq-specific quality control, (B) calling peaks and (C) visualizing signal and peak enrichment around genes or other features.
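The question "How well do my replicates correlate?" usually comes down to comparing read counts over the same genomic bins in each replicate; within Galaxy, deepTools (e.g. multiBamSummary followed by plotCorrelation) handles this. As a hedged illustration of the underlying calculation only, here is a plain-Python Pearson correlation over two hypothetical lists of per-bin counts.

```python
from __future__ import annotations

import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equally long lists of per-bin counts."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length (at least 2 bins)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one replicate has constant counts")
    return covariance / (spread_x * spread_y)


# Hypothetical read counts in the same five genomic bins for two replicates.
replicate_1 = [12, 40, 7, 95, 33]
replicate_2 = [10, 44, 9, 88, 30]
print(round(pearson(replicate_1, replicate_2), 3))
```

Because a few very high-signal bins can dominate a Pearson coefficient, rank-based (Spearman) correlation is often reported alongside it for ChIPseq replicates.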

Prerequisites:
  • A basic familiarity with using Galaxy (how to import datasets and run tools).
  • Ideally participants will already be familiar with generic NGS quality control and read mapping, since those won't be covered

Instructors

Devon Ryan

Max Planck Institute of Immunobiology and Epigenetics (MPI-IE)



3:30pm

RNA-seq analysis with Galaxy, using advanced workflows

→ Tutorial

This workshop will cover standard, advanced, and alternative RNAseq analysis pipelines, all using workflows and highlighting their advanced features. Three general pipelines will be addressed:

  • A standard RNAseq analysis pipeline using the Tuxedo suite (TopHat → Cuffdiff) for standard transcript quantification with a reference transcriptome.

  • An advanced analysis pipeline using the Tuxedo suite with StringTie to create de novo transcript structures and merge these with reference transcripts to create a transcriptome database, followed by transcript quantification.

  • An alternative RNAseq analysis pipeline using count-based quantification methods (DESeq2, edgeR, or limma) to generate abundance measurements.

These three pipelines will be used as examples to highlight the use of workflows and their advanced features.
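The count-based pipeline starts from per-gene read counts. DESeq2, edgeR and limma each apply their own, more sophisticated normalization (for example DESeq2's median-of-ratios and edgeR's TMM); the simplest library-size scaling, counts per million (CPM), is sketched below in Python only to show the basic idea. The gene names and counts are made up.

```python
from __future__ import annotations


def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw per-gene read counts by library size so samples of different depth are comparable."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no counted reads")
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


# Hypothetical counts for a tiny two-gene library of 1,000 reads.
sample = {"geneA": 250, "geneB": 750}
print(counts_per_million(sample))  # {'geneA': 250000.0, 'geneB': 750000.0}
```

CPM corrects only for sequencing depth; it does not account for composition effects (a few highly expressed genes soaking up reads), which is what the methods named above are designed to handle.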

Prerequisites: 

  • A wi-fi enabled laptop with a modern web browser.  Google Chrome, Firefox and Safari will work best. 

Instructors

Pip Griffin

University of Melbourne

Simon Gladman

Bioinformatician, VLSCI / University of Melbourne

Torsten Seemann

University of Melbourne


3:30pm

BioinforMagic: Marrying Galaxy and Bioconductor
Tutorial, doi: 10.7490/f1000research.1112904.1

R/Bioconductor tools are popular among bioinformaticians for conducting analyses on high-throughput genomics data. Many researchers have written and used their own R/Bioconductor tools but have limited options for sharing their methods with collaborators and ensuring the reproducibility of their work. While some R/Bioconductor tools are currently integrated into Galaxy, scientific communities would benefit greatly if a larger repertoire of these tools were available. In this workshop, we will describe the key components of Galaxy tools and outline approaches for integrating R/Bioconductor tools into the Galaxy platform. Workshop attendees will write their own R/Bioconductor tool and integrate it into Galaxy. Additional topics covered in this workshop include wrapping R/Bioconductor tools using Planemo, dealing with R/Bioconductor dependencies and, if time permits, interacting with RData objects from within Galaxy using the RStudio Galaxy Interactive Environment (GIE). Participants are encouraged to bring an R script they want to integrate with Galaxy for the hands-on session.

Prerequisites:
  • Basic understanding of Galaxy from a tool developer point of view or attendance at the Writing and Publishing Galaxy Tools workshop.
  • Basic experience with R/Bioconductor.
  • A wi-fi enabled laptop with a modern web browser.  Google Chrome, Firefox and Safari will work best. 

Instructors

Mallory Freeberg

Johns Hopkins University

Nitesh Turaga

Software Engineer, Galaxy Project, Johns Hopkins University



3:30pm

How to use Galaxy Ansible Playbooks

Tutorial, Source files

The Galaxy project has developed a significant number of Ansible roles that enable anyone to build a production-level Galaxy server on any infrastructure without much manual effort. In this workshop, we will cover the purpose of the available roles and how they relate to each other. To showcase their use, we will build a complete Galaxy server with a personal choice of tools using only a handful of commands.

Prerequisites:

  • Knowledge and comfort with the Unix/Linux command line interface and a text editor. If you don't know what cd, mv, rm, mkdir, chmod, grep and so on can do then you will struggle in this workshop.
  • Some knowledge of running a Galaxy server.

Instructors

Enis Afgan

Galaxy Project, Johns Hopkins University
Everything 'Galaxy on the Cloud' related!

Nate Coraor

Galaxy Project, Penn State University



3:30pm

Introduction to Galaxy Interactive Environments

→ Slides, doi: 10.7490/f1000research.1112906.1

In this session you will get an introduction to Interactive Environments (IEs) as an easy and powerful way to integrate arbitrary interactive web services into Galaxy. We will demonstrate the IPython Galaxy project and the general concept of IEs.

Prerequisites:

  • Basic understanding of Galaxy from a developer point of view.
  • A wi-fi enabled laptop with a modern web browser.  Google Chrome, Firefox and Safari will work best. 

 


Instructors

Björn Grüning

University of Freiburg

John Chilton

Galaxy Project, Penn State University

Marius van den Beek

IBPS / Université Pierre et Marie Curie



6:00pm

Dinner (on your own)

You are on your own for dinner this evening. See the bottom of the conference location page for links to nearby options. Or, if you just want to wander, see the online map for restaurant-enriched neighborhoods. Fourth Street from Indiana Avenue to Walnut St. and Fifth Street (Kirkwood Avenue) from Indiana Avenue to Rogers St. both have an array of amazing options. The square downtown is a great find as well. 3rd St. from Wilkie to just past Jordan Ave. also has restaurant options. Lennie's on 10th St. near the Herman B Wells Library is also very good.


Sunday June 26, 2016 6:00pm - 7:30pm
TBA

7:30pm

Galactically Speaking: Best Practices and Resources for Galaxy Training
Session notes

This workshop will review best practices and resources for teaching Galaxy, and bioinformatics with Galaxy. We’ll cover best practices for teaching as well as recommended compute infrastructures and resources for Galaxy trainers (all compiled by Galaxy Training Network members). If you use Galaxy for training, and want to learn from others and share your best practices, then this workshop is for you. Unlike most other workshops, this is not a hands-on session. Rather, this session will be a series of discussions on topics useful to Galaxy trainers.

Participants will be polled before the workshop to identify areas/issues from their experience and the results will be incorporated into the presentations.

Prerequisites:
  • A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).
  • An interest and/or experience in teaching bioinformatics and Galaxy.

 

Instructors

Galaxy Training Network Members

Galaxy Training Network


8:30pm

Training Office Hours
Need some help getting your laptop set up for tomorrow's training sessions? Experts will be available to help you install any virtual machines and other software on your laptop, so you can hit the ground running when the first session starts.

 
Monday, June 27
 

9:00am

Introduction to Galaxy
New to Galaxy? This session will introduce you to the Galaxy Project and the Galaxy Community, and walk you through a simple use case demonstrating what Galaxy can do. It is recommended for anyone who has never used Galaxy or uses it only rarely.

Prerequisites:
  • Little or no knowledge of Galaxy.
  • A wi-fi enabled laptop with a modern web browser.  Google Chrome, Firefox and Safari will work best. 

Instructors

Anton Nekrutenko

Galaxy Project, Penn State University


9:00am

Small Genomes de novo Assembly and Scaffolding

→ Tutorial

The workshop will cover the basics of de novo genome assembly using a small genome example. This includes project planning steps, selecting fragment sizes, initial assembly of reads into fully covered contigs, and then assembling those contigs into larger scaffolds that may include gaps. The end result will be a set of contigs and scaffolds of sufficient average length to perform further analysis on, including genome annotation (covered in the Small Genome Annotation workshop). This workshop will use tools and methods targeted at small genomes. The basics of assembly and scaffolding presented here will be useful for building larger genomes, but the specific tools and much of the project planning will be different.
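The description above speaks of contigs and scaffolds of sufficient average length; in practice, assemblies are commonly summarized with N50, the length L such that contigs of length L or longer together contain at least half of the assembled bases. A minimal Python sketch of that calculation, using hypothetical contig lengths:

```python
from __future__ import annotations


def n50(lengths: list[int]) -> int:
    """Return the N50 of a set of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("no contigs given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig where the cumulative length reaches half the assembly.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for a non-empty list of non-negative lengths")


contigs = [100, 200, 300, 400]     # hypothetical contig lengths in bp
print(n50(contigs))                # 300
print(sum(contigs) / len(contigs)) # 250.0, the plain mean for comparison
```

N50 is preferred over the plain mean because many tiny contigs drag the mean down even when most of the genome sits in a few long contigs.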

Prerequisites:

  • A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).
  • A wi-fi enabled laptop with a modern web browser.  Google Chrome, Firefox and Safari will work best. 

 

 


Instructors

Pip Griffin

University of Melbourne

Simon Gladman

Bioinformatician, VLSCI / University of Melbourne

Torsten Seemann

University of Melbourne


9:00am

Dynamic prototyping of tools with Galaxy ProTo
Download the virtual machine image for this session.

In addition to being a feature-rich framework for biomedical research, Galaxy can also be thought of as a simple way to provide web access to locally developed functionality. For instance, master's students can use Galaxy to showcase the functionality they have developed to their supervisors and examiners, and researchers can use it to easily provide access to their ad hoc scripts. For such use, however, Galaxy poses some limitations. For one, the developer needs to learn the XML format used by Galaxy, with all the twists and turns inherent in the format. Also, the format itself has limited support for dynamic behavior in the parameter option boxes, e.g. providing the user with direct feedback based on dynamic calculations within the interface itself.

The Galaxy Prototyping Tool API (Galaxy ProTo) is a new tool-building methodology introduced by the Genomic HyperBrowser project. Galaxy ProTo is an unofficial alternative for defining Galaxy tools. Instead of XML files, Galaxy ProTo supports defining the user interface of a tool as a Python class. There are no limitations on what kind of code can be executed to generate the interface. For instance, one could read the beginning of an input file and provide dynamic options based on the file contents. When developing a ProTo tool, the results of code changes can be seen on the fly in a web browser; there is no need to reload the tool or restart the Galaxy server. When development is finished, a ProTo tool can easily be installed into the Galaxy tool menu alongside the standard Galaxy tools. Galaxy ProTo thus empowers developers without Galaxy experience to easily develop Galaxy tools, both for prototyping and for building fully functional, interactive tools.

This session will cover a number of examples of creating Galaxy ProTo-based tools using different options and various levels of dynamicity. Participants will try developing their own ProTo tools on the fly, on VMs.

Session document

Prerequisites:
  • Python programming experience
  • A wi-fi enabled laptop with a modern web browser.  Google Chrome, Firefox and Safari will work best. 
  • VirtualBox as a VM manager

Instructors

Abdulrahman Azab

Head Engineer, University of Oslo

Boris Simovski

University of Oslo



9:00am

Galaxy on the Cloud: build it

Tutorial, doi: 10.7490/f1000research.1112907.1

This will be a technical workshop covering the process of creating a Galaxy on the Cloud platform for a range of clouds. We will look at how to use Galaxy Ansible automation playbooks to build all the components required to run Galaxy on a cloud using CloudMan. At the end of this workshop, you will know how to create a custom version of Galaxy on any supported cloud (AWS, OpenStack) and allow others to then easily and independently launch it. Specifically, it will cover the process of creating the machine image and the Galaxy and indices file systems, as well as installing Cloud Launch.


Prerequisites:

  • Knowledge and comfort with the Unix/Linux command line interface and a text editor. If you don't know what cd, mv, rm, mkdir, chmod, grep and so on can do then you will struggle in this workshop.
  • Familiarity with the Galaxy on the Cloud concepts.

Instructors

Enis Afgan

Galaxy Project, Johns Hopkins University
Everything 'Galaxy on the Cloud' related!

Nuwan Goonasekera

VLSCI / University of Melbourne



9:00am

The Galaxy Docker Project
➔ Slides

In this session you will learn the internals of the Docker Galaxy Image. We will show you tips and tricks on how to run the Galaxy Docker Image successfully in production, how to manage updates and how to bind the container to a cluster scheduler. Moreover, you will learn how to create your own Galaxy flavour mixing a variety of different tools and visualisations.

Prerequisites:
  • Basic understanding of Galaxy from a developer point of view.
  • General knowledge about Docker
  • Knowledge and comfort with the Unix/Linux command line interface and a text editor. If you don't know what cd, mv, rm, mkdir, chmod, grep and so on can do then you will struggle in this workshop.
  • A wi-fi enabled laptop with a modern web browser.  Google Chrome, Firefox and Safari will work best. 

Instructors

Björn Grüning

University of Freiburg

Marius van den Beek

IBPS / Université Pierre et Marie Curie



11:30am

Birds-of-a-Feather Flocking

There is no better place than a Galaxy Community Conference to meet and learn from others doing data-intensive biology.  GCC2016 will continue this tradition by again including Birds of a Feather (BoF) meetups.  Birds of a Feather meetups are informal gatherings where participants group together based on common interests.

BoF meetups scheduled during this time are listed below.


BoF meetups are encouraged throughout GCC2016.

If you are interested in proposing a BoF, please submit your idea here and we'll add it to the schedule.

 


Monday June 27, 2016 11:30am - 12:30pm
IMU: Indiana Memorial Union 900 E 7th St, Bloomington, IN

11:30am

Application containers for the win! Birds-of-a-feather

Application portability conundrum of Galactic proportions.

Using application containers for portability, especially for overcoming Tool Shed build issues.

The wonderful but checkered history of Tool Shed distribution of Galaxy tools has had its share of delights and frustrations, the latter often attributed to Linux distribution issues, but nevertheless big time sinks. While a lot of work in the Galaxy ecosystem is going into Dockerizing everything, some sites have no opportunity to run Docker. Let's talk about other possibilities focused on application portability to make Tool Shed tool distribution as painless as possible.

If you are interested in participating in this BoF, create a Sched login (if you don't already have one), and add this BoF to your personal schedule.


And, if you are interested in proposing a BoF, please submit your idea here and we'll add it to the schedule.



11:30am

European Galaxy developer school Birds-of-a-feather

A developer school is planned for January 2017 in Strasbourg, organized by Elixir (the European bioinformatics hub) and the French Institute of Bioinformatics (the Elixir French national node). This BoF is a discussion to decide on the training modules that will be offered. The first discussion about this event was held, using these slides, during the Elixir all-hands meeting in Barcelona in 2016.

If you are interested in participating in this BoF, create a Sched login (if you don't already have one), and add this BoF to your personal schedule.


And, if you are interested in proposing a BoF, please submit your idea here and we'll add it to the schedule.


Moderators

Frederik Coppens

Project Leader, VIB

Jean-François Dufayard

Researcher, CIRAD


12:30pm

Beyond the Intro: Further adventures in using Galaxy

This workshop continues where the Introduction to Galaxy session leaves off. Additional features of Galaxy will be introduced, and several topics from that first session will be explored in more detail. Topics covered will include:

  • Uploading data via FTP
  • History management
  • Defining and using custom reference genomes
  • Using Tagging and Annotation to manage your Galaxy objects
  • More on workflow editing and management
  • More on sharing and publishing
  • Using Galaxy to help debug your analyses
Prerequisites:
  • A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).
  • A wi-fi enabled laptop with a modern web browser.  Google Chrome, Firefox and Safari will work best. 
 

Instructors

Daniel Blankenberg

Galaxy Project, Penn State University


12:30pm

Small Genome Annotation

Tutorial

Genome assembly produces the raw genomic sequence of an organism. Genome annotation adds meaning to the sequence by associating structural and functional annotation with specific regions (loci) on the genome. This workshop will introduce genome annotation in the context of small genomes. We'll begin with genome annotation concepts and then introduce resources and tools for automatically annotating small genomes. The workshop will finish with a review of options for further automatic and manual tuning of the annotation, and for maintaining it as new assemblies or information become available.

Prerequisites:

  • A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).
  • A wi-fi enabled laptop with a modern web browser.  Google Chrome, Firefox and Safari will work best. 

 

 


Instructors

Pip Griffin

University of Melbourne

Simon Gladman

Bioinformatician, VLSCI / University of Melbourne

Torsten Seemann

University of Melbourne


12:30pm

Using Galaxy for proteomic and integrative multi-omic data analysis

Slides, doi: 10.7490/f1000research.1112908.1

This hands-on workshop will take participants through the essential steps for using Galaxy to analyze mass spectrometry (MS)-based proteomics data, focusing on protein identification from large-scale datasets, and through more advanced applications that integrate genomic data with proteomic data. Introductory material will cover the basics of MS-based proteomics informatics and emerging applications integrating genomic and proteomic data (an area called proteogenomics).

The workshop will be constructed to follow the steps of proteomic and proteogenomic workflows. Analysis modules corresponding to each of these steps will be described and demonstrated, following the structure below:

  1. Database generation and raw data processing

    Attendees will be guided through the use of tools for selecting and generating databases: either standard databases or customized databases for proteogenomics derived from genomic data (e.g., RNA-seq data). Tools for converting raw data into processed peak lists for further analysis will also be described.

  2. Sequence database searching

    Attendees will learn about available software in Galaxy for sequence database searching, which identifies proteins via matching of MS data to sequence databases. Use of these tools and optimization of parameters will be demonstrated and discussed.

  3. Results visualization and interpretation

    Attendees will be exposed to a variety of tools for visualizing and filtering results in Galaxy. Emphasis will be on tools useful for filtering proteins identified in proteogenomic analyses, where quality control is essential to generate high-confidence results.

At the end of the workshop, attendees will have working knowledge of MS-based proteomics tools in the Tool Shed, experience in setting up basic workflows for protein identification, as well as more advanced applications in proteogenomics. Attendees will also have a better comprehension of the pitfalls encountered when interpreting data from these applications, and tools in Galaxy to help ensure confidence in results. 

Participants will be given temporary accounts on a cloud-based Galaxy instance for the hands-on workshop activities.

Prerequisites:

  • A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).
  • A wi-fi enabled laptop with a modern web browser.  Google Chrome, Firefox and Safari will work best. 

 


Instructors

James (JJ) Johnson

Minnesota Supercomputing Institute, University of Minnesota

Pratik Jagtap

Center for Mass Spectrometry and Proteomics, University of Minnesota

Timothy Griffin

Center for Mass Spectrometry and Proteomics, University of Minnesota



12:30pm

Advanced Topics in Galaxy Tool Development

→ Tutorial

This workshop is aimed at people with some experience developing tools and will cover more advanced topics in tool development, more complex tools, and recent enhancements to the Galaxy tool development process including: 

  • Driving tool development using testing (test driven development or TDD).
  • Designing tools for use with dataset collections.
  • Maintaining suites of Galaxy tools - subtopics include Tool Shed concerns & macros.
  • Publishing tools with complex dependencies to the Tool Shed.

Prerequisites:

  • Basic knowledge of Galaxy tools, or attendance at the Writing and Publishing Galaxy Tools session.
  • A wi-fi enabled laptop with a modern web browser.  Google Chrome, Firefox and Safari will work best. 

 

 


Instructors

Björn Grüning

University of Freiburg

Dave Bouvier

Galaxy Project, Penn State University

John Chilton

Galaxy Project, Penn State University

Marius van den Beek

IBPS / Université Pierre et Marie Curie

Nicola Soranzo

The Genome Analysis Centre (TGAC)


12:30pm

Contributing Code to the Galaxy Project
→ Slides, doi: 10.7490/f1000research.1112911.1

Have you extended Galaxy in some way, or identified and fixed a bug in your installation? This session will cover how to contribute your code to the Galaxy Project trunk. We'll cover how submissions (i.e., pull requests) are handled by the project team, and best practices for preparing submissions.
Prerequisites:
  • Basic understanding of Galaxy from a developer point of view.
  • Knowledge and comfort with the Unix/Linux command line interface and a text editor. If you don't know what cd, mv, rm, mkdir, chmod, grep, and so on can do, then you will struggle in this workshop.
  • A wi-fi enabled laptop with a modern web browser.  Google Chrome, Firefox and Safari will work best. 

Instructors

Dannon Baker

Galaxy Project, Johns Hopkins University



3:00pm

Training Break
Network and meet your fellow GCC2016 participants!

Drinks and snacks will be provided.

3:30pm

RADseq Data Analysis Through STACKS on Galaxy

Slides, doi: 10.7490/f1000research.1112912.1

RADseq[1] data allow scientists to gather genome-wide information with a low-cost approach compared to complete genome sequencing. In this training session, we will show how to analyze RADseq data to:

  1. build genetic maps[2],
  2. calculate population genomics statistics[3,4], and
  3. assemble paired-end loci with or without a reference genome, using Stacks[5] on Galaxy.

Stacks works with restriction-enzyme-based data, including GBS, CRoPS, and single- and double-digest RAD. Stacks identifies loci in a set of individuals, either de novo or aligned to a reference genome (including gapped alignments), and then genotypes each locus. See the Stacks Manual for full details.

Stacks has been integrated into Galaxy and is available via the GUGGO Tool Shed.

Prerequisites:

  • A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).
  • A wi-fi enabled laptop with a modern web browser.  Google Chrome, Firefox and Safari will work best. 

 

1. Miller MR, Dunham JP, Amores A, Cresko WA, Johnson EA. (2007) Rapid and cost-effective polymorphism identification and genotyping using restriction site associated DNA (RAD) markers. Genome Research 17(2):240-248.

2. Amores A, Catchen J, Ferrara A, Fontenot Q, Postlethwait JH. (2011) Genome Evolution and Meiotic Maps by Massively Parallel DNA Sequencing: Spotted Gar, an Outgroup for the Teleost Genome Duplication. Genetics 188(4):799-808.

3. Davey JW, Blaxter ML. (2011) RADSeq: next-generation population genetics. Briefings in Functional Genomics 10(2):108.

4. Peterson BK, Weber JN, Kay EH, Fisher HS, Hoekstra HE. (2012) Double Digest RADseq: An Inexpensive Method for De Novo SNP Discovery and Genotyping in Model and Non-Model Species. PLoS ONE 7(5):e37135.

5. Catchen JM, Amores A, Hohenlohe P, Cresko W, Postlethwait JH. (2011) Stacks: Building and Genotyping Loci De Novo From Short-Read Sequences. G3 1(3):171-182.


Instructors

Anthony Bretaudeau

INRA, BIPAA and GenOuest platforms

Gildas Le Corguillé

CNRS-UPMC Station Biologique de Roscoff

Yvan Le Bras

Research engineer, INRIA / EnginesOn
Initially a marine biologist focusing on population structure, Yvan received a PhD in quantitative genetics and genomics from Rennes University. After a one-year postdoc at INSERM dedicated to integrative genomics, he investigated an e-Science approach for the life sciences during a three-year postdoc project at INRIA / IRISA Rennes. One outcome of this project, called e-Biogenouest, is an innovative Virtual Research Environment (VRE) based on...



3:30pm

RNA-seq analysis with Galaxy, using advanced workflows

→ Tutorial

This workshop will cover standard, advanced, and alternative RNAseq analysis pipelines, all using workflows and highlighting their advanced features. Three general pipelines will be addressed:

  • A standard RNAseq analysis pipeline using the Tuxedo suite (Tophat → Cuffdiff) for standard transcript quantification with a reference transcriptome.

  • An advanced analysis pipeline using the Tuxedo suite with StringTie to create de novo transcript structures, merge these with reference transcripts to create a transcriptome database, and then quantify transcripts.

  • An alternative RNAseq analysis pipeline using count-based quantification methods (DESeq2, edgeR, or limma) to generate abundance measurements.

These three pipelines will be used as examples to highlight the use of workflows and their advanced features.

Prerequisites: 

  • A wi-fi enabled laptop with a modern web browser.  Google Chrome, Firefox and Safari will work best. 

Instructors

Pip Griffin

University of Melbourne

Simon Gladman

Bioinformatician, VLSCI / University of Melbourne

Torsten Seemann

University of Melbourne


3:30pm

Visualization of Omics Datasets in Galaxy

Slides, doi: 10.7490/f1000research.1112913.1

This workshop will cover visualization in Galaxy both for primary high-throughput sequencing/next-generation sequencing (NGS) analyses (alignments, variants, expression levels, and annotations) and for downstream and aggregated datasets, using histograms, heat maps, and other numerical plots. First, using datasets from a combined exome and transcriptome (RNA-seq) experiment, participants will visualize data using Galaxy’s genome browser and Circos plot. Participants will learn how to create a genome visualization, add and configure data, move between the genome browser view and the Circos view, and share complex genome visualizations with more than 12 NGS datasets. Second, using an integrated dataset of genomic and other -omics information, participants will create several numerical plots (e.g., scatter plot, histogram) to gain an overview of the data. Based on insight gained from these visualizations, participants will create a heatmap to identify patterns and potential causal factors. All visualizations will be created, saved, and shared using only Galaxy and a web browser; no data or software downloads will be necessary.

Prerequisites:

  • A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101 or have attended Introduction to Galaxy).
  • A wi-fi enabled laptop with a modern web browser.  Google Chrome, Firefox and Safari will work best.

Instructors

Aysam Guerler

Galaxy Project, Johns Hopkins University

Jeremy Goecks

Galaxy Project, George Washington University



3:30pm

Advanced Topics in Galaxy Interactive Environments

Slides, doi: 10.7490/f1000research.1112914.1 

In this session, you will get an in-depth introduction to Interactive Environments (IEs). You will learn how to set up and secure IEs in a production Galaxy instance. Moreover, we will create an IE on the fly to get you started creating your own Interactive Environments.

Prerequisites:

  • Basic understanding of Galaxy from a developer point of view.
  • General knowledge of Docker.
  • Knowledge and comfort with the Unix/Linux command line interface and a text editor. If you don't know what cd, mv, rm, mkdir, chmod, grep, and so on can do, then you will struggle in this workshop.
  • A wi-fi enabled laptop with a modern web browser.  Google Chrome, Firefox and Safari will work best. 

Instructors

Björn Grüning

University of Freiburg

Dave Bouvier

Galaxy Project, Penn State University

John Chilton

Galaxy Project, Penn State University

Marius van den Beek

IBPS / Université Pierre et Marie Curie



3:30pm

Scripting Galaxy using the API and BioBlend

Slides, doi: 10.7490/f1000research.1112915.1

Galaxy has an ever-growing API that allows external programs to upload and download data, manage histories and datasets, run tools and workflows, and even perform admin tasks. This session will cover a variety of approaches to using the API.

Prerequisites:

  • Basic understanding of Galaxy from a developer point of view.
  • Python programming.
  • A wi-fi enabled laptop with a modern web browser.  Google Chrome, Firefox and Safari will work best.

Slides are available here: https://docs.google.com/presentation/d/12wts6oaUH4TLKYMYBzCZPYI3Jf1wzl1ecK0IeCVkJ4s/edit?usp=sharing

Instructors

Dannon Baker

Galaxy Project, Johns Hopkins University

Nicola Soranzo

The Genome Analysis Centre (TGAC)



6:45pm

Conference desk open
The conference desk will move to the Cyberinfrastructure building during the opening reception.

7:00pm

Opening Reception with Bus Schedule
Jetstream, IU's newest National Science Foundation-funded project, and the National Center for Genome Analysis Support at IU, will sponsor a reception at the CIB with local wine/beer, morsels from local eateries, and demonstrations of the 15 million+ pixel IQ-Wall, IU's Data Center, Science on a Sphere, and other IU-centric IT.

Your name badge is your ticket to get on a bus.  The two blue tickets in your name badge holder are the key to receiving libations at the reception.  Make sure you bring your name badge.

For transport, meet on the sidewalk by the stop sign near the guard gate on 7th Street, by the IMU circle drive, at ~6:25pm or ~7:25pm, or at the Wilkie North Tower circle drive at ~6:45pm or ~7:45pm. You can also walk west out 10th Street. See map (the CIB is marked by the two purple balloons on the far right of the map).




9:00pm

Birds-of-a-Feather Flocking

There is no better place than a Galaxy Community Conference to meet and learn from others doing data-intensive biology.  GCC2016 will continue this tradition by again including Birds of a Feather (BoF) meetups.  Birds of a Feather meetups are informal gatherings where participants group together based on common interests.

BoF meetups are encouraged throughout GCC2016.

If you are interested in proposing a BoF, please submit your idea here and we'll add it to the schedule.

 


Monday June 27, 2016 9:00pm - 10:30pm
IMU: Indiana Memorial Union 900 E 7th St, Bloomington, IN

9:00pm

GalaxyScientists Revival Birds-of-a-feather

Following the first Galaxy Data Hackathon at GCC2015, we founded this group to represent the scientific community among Galaxy users, to give it a structured voice on issues, feedback, and needs, and to work together to improve how Galaxy-based research is conducted. We would like to revive this group, so if you liked what we were doing during the Datathon, or if you were not there but would like to contribute to the Galaxy scientific community, please join us.

If you are interested in participating in this BoF, create a Sched login (if you don't already have one), and add this BoF to your personal schedule.


And, if you are interested in proposing a BoF, please submit your idea here and we'll add it to the schedule.


Moderators

Frederik Coppens

Project Leader, VIB

Christian Schudoma

The Genome Analysis Centre (TGAC)

 
Tuesday, June 28
 

8:00am

8:00am

9:00am

Opening and Welcome
Opening comments and welcome to the 2016 Galaxy Community Conference.

Speakers

Robert Ping

Manager of Education and Outreach, Indiana University Pervasive Technology Institute, UITS Research Technologies
Robert Ping was Project Manager for the Data to Insight Center at Indiana University 2008-2013.  In his role as Manager of Education and Outreach for the IU Pervasive Technology Institute he promotes high performance computing, advanced visualization, and nine other service areas supported by UITS Research Technologies to the 228,000+ members of the IU community.  He has been with IU on and off for close to 25 years.


9:00am

Session 1
Session 1 features the Keynote Address by Yoav Gilad of the University of Chicago, and two accepted talks from the Galaxy Community.

Moderators

Robert Ping

Manager of Education and Outreach, Indiana University Pervasive Technology Institute, UITS Research Technologies
Robert Ping was Project Manager for the Data to Insight Center at Indiana University 2008-2013.  In his role as Manager of Education and Outreach for the IU Pervasive Technology Institute he promotes high performance computing, advanced visualization, and nine other service areas supported by UITS Research Technologies to the 228,000+ members of the IU community.  He has been with IU on and off for close to 25 years.


9:15am

Keynote: Genomic variation. Impact of regulatory variation from RNA to protein
Slides, doi: 10.7490/f1000research.1112708.1

Noncoding variants play a central role in the genetics of complex traits, but we still lack a full understanding of the molecular pathways through which they act. We quantified the contribution of cis-acting genetic effects at all major stages of gene regulation, from chromatin to proteins, in Yoruba lymphoblastoid cell lines (LCLs). We found that most QTLs are associated with transcript expression levels, with consequent effects on ribosome and protein levels. However, eQTLs tend to have significantly reduced effect sizes on protein levels, which suggests that their potential impact on downstream phenotypes is often attenuated or buffered. Additionally, we identified a class of cis QTLs that affect protein abundance with little or no effect on messenger RNA or ribosome levels, which suggests that they may arise from differences in posttranslational regulation. Overall, about 65% of eQTLs have primary effects on chromatin, whereas the remaining eQTLs are enriched in transcribed regions. Using a novel method, we also detected 2,893 splicing QTLs, most of which have little or no effect on gene-level expression. These splicing QTLs are major contributors to complex traits, roughly on a par with variants that affect gene expression levels. Our study provides a comprehensive view of the mechanisms linking genetic variation to variation in human gene regulation.

Speakers

Yoav Gilad

University of Chicago
The keynote speaker will be Dr. Yoav Gilad, a professor of human genetics at the University of Chicago. Dr. Gilad earned a PhD in molecular genetics from the Weizmann Institute of Science in Israel and completed EMBO postdoctoral fellowship training at Yale University.



10:00am

Proteogenomics in Galaxy: Identifying novel ‘constellations’ of proteoforms using transcriptomic and proteomic data.
Slides, doi: 10.7490/f1000research.1112709.1

Authors:

Pratik Jagtap, University of Minnesota
Getiria Onsongo, University of Minnesota
Candace Guerrero, University of Minnesota
James Johnson, University of Minnesota
Thomas McGowan, University of Minnesota
Matthew Andrews, University of Minnesota-Duluth
Timothy Griffin, University of Minnesota

Abstract

Proteogenomics has emerged as an effective approach for identifying novel proteoforms and improving genome annotation. For example, matching mass spectrometry proteomic data to customized, sample-specific RNASeq-derived databases facilitates identification of previously unidentified peptides. Proteogenomic identification of such peptides, however, requires greater scrutiny to qualify them as bona fide novel proteoform candidates.

To address these challenges, we have developed a blueprint of modular Galaxy workflows (doi: 10.1021/pr500812t). These include a) database generation from RNASeq (doi: 10.1186/1471-2164-15-703) or cDNA datasets; b) database search strategies that improve the sensitivity of peptide spectral matches (doi: 10.1002/pmic.201200352); c) filtering tools for quality control; and d) modules for visualization and interpretation of results.

These Galaxy workflows were used in several studies to provide biological insights. In a fractionated human salivary dataset, we identified multiple novel peptides that mapped to the basic proline-rich proteins (PRB1 and PRB2) located on chromosome 12. In quantitative studies of heart muscle (doi: 10.1021/acs.jproteome.5b00575) and skeletal muscle protein expression (doi: 10.1021/acs.jproteome.5b01138) during hibernation in the 13-lined ground squirrel, researchers were able to identify peptides corresponding to previously uncharacterized proteins. Identification of these peptides improved the genomic annotation of this non-model organism and provided insights into muscle physiology during hibernation.

We will present recent improvements by the Galaxy-P team to the blueprint workflows described above. These include the development of the Multi-Omics Visualization Platform (MVP) Galaxy plugin, which facilitates viewing novel peptide sequences in the context of reference genome sequences and RNASeq data, enabling interpretation and the generation of testable hypotheses about their biological significance.

Speakers

Pratik Jagtap

Center for Mass Spectrometry and Proteomics, University of Minnesota



10:20am

An Interactive Tool for Reproducible Analysis of Affinity Proteomics Data
→ Slides, doi: 10.7490/f1000research.1112710.1

Authors:
Brent M. Kuenzi (1), Adam Borne (2), Jiannong Li (3), Eric B. Haura (2), John Koomen (4), Paul A. Stewart (2), Uwe Rix (1)

Departments of (1) Drug Discovery, (2) Thoracic Oncology, (3) Biostatistics Core Facility, (4) Molecular Oncology, Moffitt Cancer Center, Tampa, FL 33612

Abstract
Understanding protein interactions and how they are altered in cancer is crucial for identifying new drug targets. Purification methods such as tandem affinity purification, affinity enrichment of labeled baits, and drug affinity chromatography have all been combined with mass spectrometry (affinity purification MS, or AP-MS) to study protein interactions and complexes in cancer. However, if the scientist (e.g., a bench biologist or analytical chemist) lacks a computational background, then managing large proteomics datasets can be challenging, manually formatting data for input into analysis software can be error-prone, and data visualization involving dozens of variables can be laborious. These difficulties presented an opportunity to develop a solution that could move data from unprocessed AP-MS results to publication-quality figures in a single workflow. Here, we present Automated Processing of SAINT Templated Layouts (APOSTL), a Galaxy-based analysis pipeline for reproducible analysis of AP-MS data, and we demonstrate that this application streamlines the AP-MS data analysis workflow, improving both the efficiency and the consistency of the process. APOSTL utilizes Significance Analysis of INTeractome (SAINT), a popular command-line software package for analyzing AP-MS data. APOSTL can process AP-MS results from both MaxQuant and Scaffold, two widely used proteomics software packages, and can create a number of publication-quality visualizations, including interactive bubble plots, protein-protein interaction networks through Cytoscape.js integration, and pathway enrichment/gene ontology plots. All visualizations are accomplished through Shiny, an interactive and open-source visualization package for the R programming language. APOSTL is open-source software released under GPLv3, and it is freely available on the Galaxy Tool Shed and GitHub.

Speakers

Paul A. Stewart

Moffitt Cancer Center



10:40am

11:10am

11:10am

Session 2
Session 2 features a talk from IU about what's happening here to support data-intensive science (and how Galaxy fits into that), plus 3 accepted talks.

Moderators

Tea Muelia

The Ohio State University


11:40am

Sample Size Does Matter: Scaling Up Analysis in Galaxy with Metagenomics
Slides, doi: 10.7490/f1000research.1112712.1

Authors  

  • Daniel Blankenberg, Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA
  • Sarah J. Carnahan-Craig, Department of Biology, Penn State University, University Park, PA

Abstract

Metagenomics provides an exciting opportunity to begin exploring large-scale, multiple-sample analysis with Galaxy. As part of an obesity study, we have obtained over 400 buccal and stool samples from mother-child pairs. These samples have been subjected to 16S RNA extraction and sequencing on a MiSeq instrument. While sequencing 400 samples is no small feat, once the data are generated, their analysis reveals itself as a crippling bottleneck.

Galaxy provides researchers with a vast quantity of tools and methods to analyze a wide array of data, and Workflows make it easy to connect any number of tools together. Although running a workflow individually over a handful of samples is approachable, how does one deal with 10, 20, or even 100 samples without becoming frustrated, introducing errors, breaking one's mouse, or falling back to writing an API script? While Dataset Collection functionality provides a significant part of the solution to this problem, major hurdles still need to be overcome before Galaxy is usable for large multiple-sample analysis.

Here we describe a generalizable metagenomic pipeline as implemented within Galaxy that is able to handle the simultaneous analysis of over 5,000 Human Microbiome Project samples. In addition to integrating a number of third-party algorithms and toolsets, some requiring the creation of upstream fixes and enhancements, we have developed new tools and approaches for dealing with large collections of data. Furthermore, we discuss the problems encountered using Galaxy at a large-scale, what has been done to overcome these issues, as well as initial results. 

Speakers

Daniel Blankenberg

Galaxy Project, Penn State University



12:00pm

FROGS: Find Rapidly OTU with Galaxy Solution
→ Slides, doi: 10.7490/f1000research.1112713.1

Authors

  • Frederic ESCUDIE, INRA Toulouse
  • Lucas AUER, INRA Toulouse
  • Maria BERNARD, INRA Jouy-en-Josas
  • Laurent CAUQUIL, INRA Toulouse
  • Katia VIDAL, INRA Toulouse
  • Sarah MAMAN, INRA Toulouse
  • Mahendra MARIADASSOU, INRA Jouy-en-Josas
  • Guillermina HERNANDEZ-RAQUET, INRA Toulouse
  • Geraldine PASCAL, INRA Toulouse

Abstract

High-throughput sequencing of 16S/18S/23S RNA amplicons has opened new horizons in the study of microbial communities. With sequencing at great depth, current processing pipelines struggle to run rapidly, and the most effective solutions are often designed for specialists. These tools are designed to produce both the abundance table of operational taxonomic units (OTUs) and their taxonomic affiliation. In this context, we developed FROGS (« Find Rapidly OTU with Galaxy Solution »), a pipeline built for biologists on the Galaxy platform.

A preprocessing tool merges paired sequences into contigs with FLASH, cleans the data with cutadapt, removes chimeras with VSEARCH combined with a cross-validation method, and dereplicates sequences with a custom Python script. The clustering tool runs SWARM, which uses a local clustering threshold rather than the global clustering threshold used by other software. The affiliation tool returns the taxonomic affiliation of each OTU using both RDPClassifier and NCBI Blast+ on different databases (Silva, Greengenes). Finally, the post-processing tool lets users process this table with user-specified filters and provides statistical results and numerous graphical illustrations of the data.

FROGS was designed to be very fast, even on large amounts of 454/HiSeq/MiSeq data, by using cutting-edge tools and an optimized design; it is also portable to any Galaxy platform. FROGS was tested on numerous simulated datasets, on which it proved extremely rapid, robust, and highly sensitive for OTU detection, with very few false positives compared to other pipelines widely used by the community.

Speakers

Yvan Le Bras

Research engineer, INRIA / EnginesOn
Initially a marine biologist focusing on population structure, Yvan received a PhD in quantitative genetics and genomics from Rennes University. After a one-year postdoc at INSERM dedicated to integrative genomics, he investigated an e-Science approach for the life sciences during a three-year postdoc project at INRIA / IRISA Rennes. One outcome of this project, called e-Biogenouest, is an innovative Virtual Research Environment (VRE) based on...



12:20pm

Ktoolu and idFusion - Galaxy Solutions for Plant Immunity and Pathogen Informatics
Slides, doi: 10.7490/f1000research.1112714.1

Authors 

  • Christian Schudoma, The Genome Analysis Centre, The Sainsbury Laboratory, Norwich UK
  • Yogesh Gupta, The Sainsbury Laboratory, Norwich, UK
  • Pirasteh Pahlavan, Leibniz-Institut DSMZ, Braunschweig, University of Würzburg, Germany
  • Agathe Jouet, The Sainsbury Laboratory, Norwich, UK
  • Dan MacLean, The Sainsbury Laboratory, Norwich, UK
  • Ksenia Krasileva, The Genome Analysis Centre, The Sainsbury Laboratory, Norwich UK 

Abstract

Background
The analysis of plant immunity and of plant-pathogen interactions is a major topic in plant disease research.

Plant immunity is conferred by so-called nucleotide-binding leucine-rich repeat (NLR) proteins. A specific group of these proteins is fused to additional (integrated) domains that can recognise pathogen effector molecules. In a recent study, 41 plant genomes were computationally screened for such NLR-ID proteins.

Analysis of the interactions between plant pathogens and their hosts can provide insight into both the pathogen’s effector proteins, i.e. its attack mechanisms, and the plant’s defense mechanisms. These interactions can be investigated with tailored metagenomics approaches.


Results
We present Galaxy tools/pipelines for screening plant NLR-ID proteins (idFusion) and for complementing metagenomics analysis of plant-pathogen interactions (Ktoolu: Kraken tools and utilities). idFusion is a Galaxy implementation of the NLR-ID screening pipeline described in Sarris et al. (BMC Biology 2016). Ktoolu is a collection of tools and their Galaxy wrappers that allow users to dissect sequencing datasets using the taxonomy information assigned by the Kraken metagenomics classifier, and to visualise the results using the Krona tools.

Speakers

Christian Schudoma

The Genome Analysis Centre (TGAC)



12:40pm

Arts & Crafts
GCC sure can be overwhelming sometimes! This is a quiet place to do some stress-free, science-related arts and crafts.

Moderators

Saskia Hiltemann

Erasmus Medical Center

Eric Rasche

Sysadmin / Bioinformatician, Center for Phage Technology

12:40pm

Birds-of-a-Feather Flocking

There is no better place than a Galaxy Community Conference to meet and learn from others doing data-intensive biology.  GCC2016 will continue this tradition by again including Birds of a Feather (BoF) meetups.  Birds of a Feather meetups are informal gatherings where participants group together based on common interests.

BoF meetups during this slot are:

BoF meetups are encouraged throughout GCC2016.

If you are interested in proposing a BoF, please submit your idea here and we'll add it to the schedule.

 


Tuesday June 28, 2016 12:40pm - 1:40pm
IMU: Indiana Memorial Union 900 E 7th St, Bloomington, IN

12:40pm

GalaxyAdmins Birds-of-a-feather

Discussion

GalaxyAdmins is a group of people who are responsible for administering Galaxy instances. We meet online every other month and at events like GCC2016, where many of us happen to be.

GCC2016 will be the fourth in-person GalaxyAdmins meetup. Previous GalaxyAdmins BoFs were very well attended and have resulted in several action items, many of which have since been implemented.

This meetup will discuss plans for the coming year, GalaxyAdmins leadership, and whatever else participants want to talk about.


If you are interested in participating in this BoF, create a Sched login (if you don't already have one), and add this BoF to your personal schedule.


And, if you are interested in proposing a BoF, please submit your idea here and we'll add it to the schedule.


Moderators

Dave Clements

Training and Outreach Coordinator, Galaxy Project, Johns Hopkins University

Hans-Rudolf Hotz

Friedrich Miescher Institute for Biomedical Research

12:40pm

Genome Annotation Birds-of-a-feather

We are interested in a general discussion of Genome Annotation problems and solutions.

If you are interested in participating in this BoF, create a Sched login (if you don't already have one), and add this BoF to your personal schedule.


And, if you are interested in proposing a BoF, please submit your idea here and we'll add it to the schedule.


Moderators

Nathan Dunn

Lead Software Engineer, Lawrence Berkeley National Laboratory
I primarily work on the Apollo project, a web-based genome editor used for real-time collaborative manual curation, AKA Google Docs for genome editing. Apollo is built using JBrowse as our genomic viewer. http://genomearchitect.org https://github.com/GMOD/Apollo We use Grails + GWT + Angular (in addition to the JBrowse stack). I'm a scientific programmer having worked in a variety of domains including biology, psychology, automated speech...

Suzanna Lewis

Lawrence Berkeley National Laboratory

1:40pm

Session 3
Galaxy Community Update, two accepted talks, and a sponsor talk

Moderators

Chris Hemmerich

Bioinformatician, Indiana University


2:15pm

GSuite Tools: integrative genomic analyses across cells and epigenetic factors using Galaxy
Slides    doi:10.7490/f1000research.1112716.1

Authors

Boris Simovski, University of Oslo 
Sveinung Gundersen, University of Oslo
Daniel Vodák, Oslo University Hospital
Abdulrahman Azab, University of Oslo 
Diana Domanska, University of Oslo
Eivind Hovig, University of Oslo
Geir Kjetil Sandve, University of Oslo

Abstract

Genomic investigations increasingly involve multiple genome-scale datasets (genomic tracks) representing diverse cell types and epigenetic factors. This raises the need for software tools that allow efficient management and appropriate statistical analysis of such data collections. Dataset lists in Galaxy represent a major advancement in this direction, allowing efficient management and per-dataset analysis of large numbers of datasets within a web-based system. A natural next step is to consider integrative analyses of collections of datasets. An example is to comparatively assess the co-occurrence of a dataset of disease-associated SNPs with a large number of datasets representing chromatin accessibility in diverse cell types. In addition to managing a large number of (chromatin accessibility) datasets, such an analysis requires efficient means of locating and compiling a collection of relevant datasets, as well as appropriate statistical measures.

We have developed GSuite Tools, a Galaxy-based analysis framework that offers a broad range of novel tools for managing and performing statistical analysis on collections of datasets (genomic tracks). The tools operate on GSuite files - an alternative to the standard Galaxy dataset lists that provides additional robustness and flexibility as needed for this range of tools. A prototype of GSuite Tools was presented at GCC2015. After a year of further focused work, GSuite Tools now has much broader capabilities, is based on a deeper statistical treatment of the problems, and is ready for practical use by the community. It is publicly available at: http://hyperbrowser.uio.no/gsuite
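The SNP co-occurrence example in the abstract can be made concrete with a small, purely descriptive sketch. For each track in a collection (here, one chromatin accessibility track per cell type), it counts the fraction of SNP positions that fall inside the track's intervals, then ranks the tracks.

This is not GSuite's method and applies no statistical test or null model. The track names, coordinates and function names are hypothetical, and the sketch assumes half-open, non-overlapping intervals on a single chromosome.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome

def snp_overlap_fraction(snps: List[int], intervals: List[Interval]) -> float:
    """Fraction of SNP positions inside any interval of one track (non-overlapping intervals)."""
    intervals = sorted(intervals)
    starts = [s for s, _ in intervals]
    hits = 0
    for pos in snps:
        i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
        if i >= 0 and pos < intervals[i][1]:
            hits += 1
    return hits / len(snps) if snps else 0.0

def rank_tracks(snps: List[int],
                tracks: Dict[str, List[Interval]]) -> List[Tuple[str, float]]:
    """Rank a collection of tracks by the fraction of SNPs they cover, highest first."""
    scores = {name: snp_overlap_fraction(snps, iv) for name, iv in tracks.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical SNP positions and accessibility tracks
snps = [105, 250, 480, 900]
tracks = {
    "cellA_DNase": [(100, 200), (450, 500)],
    "cellB_DNase": [(240, 260), (470, 490), (880, 950)],
    "cellC_DNase": [(600, 700)],
}
print(rank_tracks(snps, tracks))
# [('cellB_DNase', 0.75), ('cellA_DNase', 0.5), ('cellC_DNase', 0.0)]
```

A raw overlap fraction ignores track size and local genome structure. Deciding whether an overlap is surprising requires a proper null model, which is the kind of statistical treatment the abstract refers to.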

 

Speakers

Boris Simovski

University of Oslo



2:35pm

The Galaxy workflow for epigenetic profiling of progressing melanoma
Slides

Authors
Katarzyna Murat 1 and Krzysztof Poterlowicz 2

1. Faculty of Mathematics, Physics and Informatics, University of Gdansk
2. Centre for Skin Sciences, University of Bradford

Abstract
Recent studies have found that distinct poor-prognosis tumours lack genetic alterations but are epigenetically heterogeneous, pointing to the important role of multi-domain epigenetic regulation in cancer progression.
Researchers studying epigenetic regulation have generated a vast amount of high-throughput sequencing data on processes such as DNA methylation, histone modifications and chromatin remodeler activity, as well as transcriptomic profiling of non-coding DNA.

Although Galaxy offers a range of standalone tools for investigating next-generation sequencing data (e.g. Bismark, MACS, SICER, edgeR), there is a lack of multi-layer epigenetic workflows characterizing the regulation of tumor progression.

Here we extend the use of these tools to the epigenetic profiling of progressing melanoma, a cancer that is almost always curable if recognized early. Otherwise, it spreads very quickly to other parts of the body, making it one of the deadliest cancers.

By integrating DNA methylation profiles and ChIP-Seq profiles for H3K27me3, H3K4me3, MITF and BRG1 for normal melanocytes and different stages of melanoma, we were able to identify novel epigenetic switches responsible for the metastatic progression of this tumor.
Results and experiences using this framework demonstrate the potential of Galaxy as a bioinformatics platform for multi-omics cancer biomarker discovery.

Speakers

Katarzyna Murat

University of Gdansk

Krzysztof Poterlowicz

University of Bradford
Following graduation in mathematics and applied statistics at Wroclaw University in Poland, I continued my education at the University of Bradford, studying computational modelling of the yeast cell cycle, where I obtained an MPhil degree in Bioinformatics. In 2009 I visited the Biotechnology Research Institute of the National Research Council Canada, where my research involved computer simulation of molecular signalling cascades governing the...



2:55pm

Infinite Galaxy!
Slides

Authors

Brian Finley, Principal Architect for Big Data Solutions at Lenovo

Abstract
The efficacy, benefits, and potential impact of running Galaxy on Spark, and the ever-expanding possibilities that open up by leveraging the Big Data paradigm.

Speakers

Brian Finley

Principal Architect for Big Data Solutions, Lenovo
Brian Finley is the Principal Architect for Big Data Solutions at Lenovo. Mr. Finley is an Open Group certified Distinguished IT Specialist and holds a number of other technical certifications, writes articles for industry publications, is the creator of SystemImager (popular Linux mass-deployment software), is an xCAT (cluster management software) developer, and has created and/or contributed to a number of other open source projects. With...



3:10pm

P01: ELIXIR: a distributed infrastructure for life-science information
Poster    doi:10.7490/f1000research.1112469.1

Authors

Frederik Coppens, VIB
ELIXIR Galaxy workgroup, ELIXIR

Abstract
ELIXIR (http://www.elixir-europe.org/) - the European life-science Infrastructure for Biological Information - is a European Research Infrastructure, part of the ESFRI (European Strategy Forum on Research Infrastructures) that consolidates Europe’s national centres, services, and core bioinformatics resources into a single, coordinated infrastructure. Its goal is to orchestrate the collection, quality control, share-ability and archiving of large amounts of biological data produced by life science experiments and the resources to compute with this data.

ELIXIR brings together Europe’s major life-science data archives and, for the first time, connects these with national bioinformatics infrastructures throughout ELIXIR’s member states. By coordinating local, national and international resources the ELIXIR infrastructure will meet the data-related needs of Europe’s 500,000 life-scientists, ensuring a seamless service provision that is easily accessible to all.

Galaxy is widely used in the ELIXIR community. ELIXIR has therefore assessed the needs of its members and how these can be incorporated into ELIXIR’s technical strategy. Galaxy is mainly used through local instances that enable biologists to perform data-intensive, massively parallel sequencing analyses. The main recommendations were 1) to build a user community within ELIXIR, 2) to integrate Galaxy into the ELIXIR platforms, 3) to provide Galaxy training for users, developers and administrators, and 4) to contribute to the development of the Galaxy platform.

Several initiatives aiming to implement these recommendations have been started in the national Nodes and the ELIXIR Hub.

Presenters

Frederik Coppens

Project Leader, VIB


3:10pm

P03: Using Galaxy with Jetstream
Poster    doi:10.7490/f1000research.1112454.1

Authors

Jeremy Fischer 1, Enis Afgan 2, Carrie Ganote 1, David Y. Hancock 1, Tom Doak 1, and Matthew Vaughn 3 

1. Indiana University Pervasive Technology Institute
2. Johns Hopkins University
3. Texas Advanced Computing Center

Abstract 
Jetstream is a new cloud computing resource funded by the National Science Foundation (NSF). As a fully configurable cloud resource, Jetstream adds significantly to the NSF-funded resources available to the Galaxy user community. Jetstream has a total computing capability of 0.5 petaflops and supports interactive users. Jetstream hosts persistent Science Gateways (specifically, Galaxy) and Virtual Machines (VMs) usable by individual researchers within the cloud environment. Here, we explain how to access Galaxy on Jetstream and how to get an allocation on Jetstream through the NSF-mandated XSEDE allocation process.

Galaxy users will have the option of preserving their workflows through persistent storage of VMs. VMs used on Jetstream can be stored in the Indiana University persistent digital repository, IUScholarWorks (scholarworks.iu.edu), and assigned a Digital Object Identifier (DOI) associated with the stored VM. Overall, we anticipate that Jetstream will improve Galaxy usability by reducing job wait times and increasing job throughput.
 

Presenters

Enis Afgan

Galaxy Project, Johns Hopkins University
Everything 'Galaxy on the Cloud' related!

Jeremy Fischer

Senior Technical Advisor, Indiana University


3:10pm

P05: Advantages and Challenges of Using Galaxy CloudMan within an Integrated Data Analysis and Visualization Platform
Poster    doi:10.7490/f1000research.1112495.1

Authors

Ilya Sytchev, Harvard T.H. Chan School of Public Health
David Jones, Sheffield Institute for Translational Neuroscience
Shannan Ho Sui, Harvard T.H. Chan School of Public Health
Fritz Lekschas, Harvard Medical School
Jennifer Marx, Harvard Medical School
Scott Ouellette, Harvard Medical School
Winston Hide, Sheffield Institute for Translational Neuroscience
Peter Park, Harvard Medical School 
Nils Gehlenborg, Harvard Medical School

Abstract

The Stem Cell Commons was developed by the Harvard Stem Cell Institute to create a community for stem cell bioinformatics. This open source environment for sharing and analyzing stem cell data combines genomics data sets with tools for discovery, analysis, visualization, and collaboration. The Commons uses the Refinery Platform, an integrated web-based data analysis and visualization system, to enable reproducible analyses implemented as Galaxy workflows.

We originally deployed Refinery using instances of Galaxy on clusters in two different research computing facilities. However, limited control over scheduling, access, and deployment in these environments prevented us from moving from development to production. To allow for greater flexibility of the system, ensure reliability, optimize cost, scale based on demand, and facilitate collaboration, our recent efforts have focused on making Refinery easy to deploy in a cloud environment backed by Galaxy CloudMan. CloudMan is a cloud manager that orchestrates the provision and management of Galaxy clusters on cloud infrastructures. Here we discuss our experiences of using key CloudMan features, such as automated deployment, cluster sharing, and autoscaling. We also describe some of the problems that we have encountered using CloudMan in novel and perhaps unanticipated ways.

Given our experience with Refinery and CloudMan so far, we believe that Galaxy can be deployed as a cloud-based analysis backend for other systems. We hope that by continued collaboration with the CloudMan and Galaxy developer communities, we can address the challenges that we are still facing.

Presenters


3:10pm

P07: Hardwood Genomics Database (HGD): a web portal and database resource for hardwood tree genomic and genetic research
Poster    doi:10.7490/f1000research.1112718.1

Authors

  • Ming Chen, University of Tennessee at Knoxville, TN;
  • Nathan Henry, University of Tennessee at Knoxville, TN;
  • John Carlson, Pennsylvania State University, University Park, PA, USA;
  • Meg Staton, University of Tennessee at Knoxville, TN; 
Abstract
The Hardwood Genomics Database (HGD, www.hardwoodgenomics.org) serves forest tree scientists by providing online access to hardwood tree genomic and genetic data. HGD currently houses data for economically and phylogenetically important hardwood species, including assembled reference genomes, transcriptomes, and genetic mapping information. The results of bioinformatic analysis, including functional annotation of genes, ontology assignment, analysis of gene expression patterns across RNASeq libraries and simple sequence repeat (SSR) identification, are available to enhance the utility of the information. The web site provides access to online tools for mining and visualization of these data sets, including BLAST for comparing sequences, JBrowse [1] for browsing genomes, Apollo [2] for community annotation, and SyMAP [3] for comparative genomics. However, these tools are limited in scope. To maximize the ability of users of community databases to harness this information and perform their own custom analyses, we are collaborating with other genome databases to build an interface to the Galaxy data analysis platform [4]. Users will be able to select and transfer data from the HGD to a Galaxy instance, or run a pre-designed Galaxy workflow on selected datasets.

Presenters

Margaret Staton

University of Tennessee Knoxville
My lab works on genome databases, web applications, cyberinfrastructure, and RNASeq data. Our main website is hardwoodgenomics.org.


3:10pm

P09: A Galaxy Interactive Environment for exploring the Neo4J Graph Database
Poster    doi:10.7490/f1000research.1112403.1

Authors

Thoba Lose, Peter van Heusden, Alan Christoffels, South African National Bioinformatics Institute

Abstract 
Representing a genome and its annotation involves modeling and storing thousands of entities that are interrelated in complex ways. Graph databases, a recently emerging form of non-relational (NoSQL) database, are seen as a natural fit for the huge network of relationships between these entities. The recently initiated COMBAT-TB project aims to provide a platform for researchers to analyze and visualize their own M. tuberculosis genome sequencing data, primarily through a web interface (the COMBAT TB Explorer). This integrated platform relies on Neo4J, a highly scalable graph database, for storing and querying the annotation of Mycobacterium tuberculosis. To expose the full power of the Neo4J database and its Cypher declarative query language, we implemented a Galaxy Interactive Environment (GIE) to explore a Neo4J database from within Galaxy, and we demonstrate its utility for data mining the COMBAT TB annotation database.



3:10pm

P11: Apollo: Collaborative Manual Annotation for Genomic Sequencing Projects
Poster    doi:10.7490/f1000research.1112335.1

Authors

  • Nathan Dunn, Berkeley Bioinformatics Open-source Projects, Lawrence Berkeley National Laboratory
  • Monica Muñoz-Torres, Berkeley Bioinformatics Open-source Projects, Lawrence Berkeley National Laboratory
  • Colin Diesh, University of Missouri
  • Deepak Unni, University of Missouri
  • Eric Rasche, Department of Biochemistry and Biophysics, Texas A&M University
  • Eric Yao, University of California Berkeley
  • Ian Holmes, University of California Berkeley
  • Chris Elsik, University of Missouri
  • Suzie Lewis, Berkeley Bioinformatics Open-source Projects, Lawrence Berkeley National Laboratory

Abstract

Manual annotation is a crucial step in the annotation portion of a genome sequencing project. It enables curators to improve automated gene predictions by visually comparing a variety of experimental evidence tracks from different sources to more accurately represent the underlying biology.

Apollo is a web-based genome annotation editor that allows curators to manually revise and edit genomic elements. It provides a reporting structure for annotated genomic elements and an ‘Annotator Panel’ that allows users to quickly browse the genome and its annotations. Users can manually edit the structure of a genomic element as well as add metadata, including references to other databases and functional assignments with specific lookup support for Gene Ontology (GO) terms.

Apollo is currently being used in over one hundred genome annotation projects around the world, ranging from annotation of a single species to lineage-specific efforts supporting annotation for dozens of organisms at a time. Collaborators are able to see each other's changes in real time (similar to Google Docs), restrict access to annotations depending on the role of users and groups within the community, and share tracks of evidence data with the public. Users are also able to export their manual annotations via FASTA, GFF3, the Chado database schema, and web services. Finally, Apollo is available for integration with Galaxy via Docker, allowing users to run genome sequencing analyses using the Galaxy platform.

Apollo is an Open-Source project. Further details and code are available at http://genomearchitect.org/. 

Presenters

Nathan Dunn

Lead Software Engineer, Lawrence Berkeley National Laboratory
I primarily work on the Apollo project, a web-based genome editor used for real-time collaborative manual curation, AKA Google Docs for genome editing. Apollo is built using JBrowse as our genomic viewer. http://genomearchitect.org https://github.com/GMOD/Apollo We use Grails + GWT + Angular (in addition to the JBrowse stack). I'm a scientific programmer having worked in a variety of domains including biology, psychology, automated speech...


3:10pm

P13: Galaxy as a Platform for Genome Annotation and Scalable Big Data Training
Poster    doi:10.7490/f1000research.1112719.1

Authors

Rémi Marenco, George Washington University
Wilson Leung, Washington University in St. Louis
Sarah C.R. Elgin, Washington University in St. Louis
Jeremy Goecks, George Washington University

Abstract
We are developing a customized version of Galaxy called G-OnRamp that will enable biologists to annotate the functional elements of eukaryotic genomes using large genomic datasets, a task that can also serve as an introduction to other “big data” biomedical analyses. Genome annotation—identifying functionally active regions within a genome—requires the use of diverse datasets and tools, including sequence similarity to known genes, gene prediction models, and high-throughput genomic data. To construct this interactive Web-based environment for genome annotation, we are building on two successful efforts, the Genomics Education Partnership (GEP) and Galaxy.

GEP (http://gep.wustl.edu) is a consortium of over 100 colleges/universities that provides classroom undergraduate research experiences in genomics for students at all levels. Students perform primary research on selected regions of Drosophila genomes using genomic databases (e.g., FlyBase) and bioinformatics tools (e.g., BLAST) while learning about gene structure, evolution, programming, and other topics. GEP faculty are now interested in annotating other eukaryotic genomes, reflecting their diverse research interests.

G-OnRamp will extend Galaxy by providing (a) analysis pipelines for functional genomic data (e.g., ChIP-Seq, RNA-Seq); (b) interactive visual analytics to annotate a genome (e.g., create UCSC Assembly Hubs); and (c) capacity for collaborative genome annotation. The GEP will serve as a key use case to validate and refine G-OnRamp, ensuring that it satisfies real educational needs. In this poster and demonstration, we will describe G-OnRamp’s vision and showcase its current features. G-OnRamp is available under the Academic Free License, and the software will be available via https://github.com/goeckslab

Presenters

Rémi Marenco

Software engineer, George Washington University


3:10pm

P15: MYcrobiota: An open-source, user-friendly Galaxy application for microbiota determination and dynamic reporting from 16S sequences
Poster    doi:10.7490/f1000research.1112720.1

Authors

S.D. Hiltemann, Erasmus University Medical Center
S.A. Boers, Regional Laboratory for Public Health
A. Kriesels, Erasmus University Medical Center
P.J. van der Spek, Erasmus University Medical Center
R.Jansen, Regional Laboratory for Public Health
J.P. Hays, Erasmus University Medical Center
A.P. Stubbs, Erasmus University Medical Center

Abstract 
Microbiota profiling methods are greatly enhancing our insights into the microbial diversity and taxonomy of many different types of environments and ecosystems. These techniques are provided by an extensive array of sophisticated software, such as Mothur and QIIME. Whilst many of these applications have a graphical user interface (GUI), providing access to these technologies for the research or clinical scientist remains complex.

We have developed a Galaxy workflow for the analysis of metagenomics data using the Mothur suite of tools, incorporating Phinch and Krona for visualisation.
We demonstrate our work using a previously published 16S rRNA gene dataset.

Presenters

Saskia Hiltemann

Erasmus Medical Center


3:10pm

P17: FROGS: Find Rapidly OTU with Galaxy Solution
Poster

Authors

  • Frederic ESCUDIE, INRA Toulouse
  • Lucas AUER, INRA Toulouse
  • Maria BERNARD, INRA Jouy-en-Josas
  • Laurent CAUQUIL, INRA Toulouse
  • Katia VIDAL, INRA Toulouse
  • Sarah MAMAN, INRA Toulouse
  • Mahendra MARIADASSOU, INRA Jouy-en-Josas
  • Guillermina HERNANDEZ-RAQUET, INRA Toulouse
  • Geraldine PASCAL, INRA Toulouse
Abstract
High-throughput sequencing of 16S/18S/23S RNA amplicons has opened new horizons in the study of microbial communities. With sequencing at great depth, current processing pipelines struggle to run rapidly, and the most effective solutions are often designed for specialists. These tools are designed to produce both the abundance table of operational taxonomic units (OTUs) and their taxonomic affiliation. In this context, we developed FROGS (« Find Rapidly OTU with Galaxy Solution »), a pipeline developed for biologists on the Galaxy platform.

A preprocessing tool merges paired sequences into contigs with FLASH, cleans the data with cutadapt, removes chimeras with VSEARCH combined with a cross-validation method, and dereplicates sequences with a home-made Python script. The clustering tool uses SWARM, which applies a local clustering threshold rather than the global clustering threshold used by other software. The affiliation tool returns the taxonomic affiliation of each OTU using both RDPClassifier and NCBI BLAST+ against different databases (Silva, Greengenes). Finally, the post-processing tool allows users to filter the abundance table with user-specified filters, and provides statistical results and numerous graphical illustrations of the data.
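The Python sketch below illustrates two of the preprocessing ideas named in this paragraph:

- **Dereplication:** identical sequences are collapsed into one entry with an abundance count.
- **Local-threshold clustering:** a simplified, Swarm-like scheme in which a sequence joins a cluster if it is within d differences of any existing member, not necessarily of the seed.

This is neither the FROGS script nor the SWARM implementation. It assumes equal-length sequences compared by mismatch count (no alignment or indels) and omits SWARM's additional refinements. All function names and reads are hypothetical.

```python
from collections import Counter
from typing import Dict, List

def dereplicate(reads: List[str]) -> Dict[str, int]:
    """Collapse identical sequences into one entry with an abundance count."""
    return dict(Counter(reads))

def n_differences(a: str, b: str) -> int:
    """Mismatch count between two equal-length sequences (simplification: no indels)."""
    return sum(x != y for x, y in zip(a, b))

def local_threshold_clusters(abundances: Dict[str, int], d: int = 1) -> List[List[str]]:
    """Swarm-like clustering with a local threshold (simplified sketch).

    Starting from the most abundant unassigned sequence, repeatedly add any
    unassigned sequence within d differences of ANY current member, so a
    cluster can extend beyond d differences from its seed.
    """
    # Most abundant first; ties broken alphabetically for determinism.
    unassigned = sorted(abundances, key=lambda s: (-abundances[s], s))
    clusters: List[List[str]] = []
    while unassigned:
        seed = unassigned.pop(0)
        cluster, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            linked = [s for s in unassigned
                      if len(s) == len(current) and n_differences(current, s) <= d]
            for s in linked:
                unassigned.remove(s)
                cluster.append(s)
                frontier.append(s)  # new members can recruit further neighbours
        clusters.append(cluster)
    return clusters

reads = ["ACGT", "ACGT", "ACGT", "ACGA", "ACTA", "TTTT", "TTTT"]
abund = dereplicate(reads)
print(abund)                               # {'ACGT': 3, 'ACGA': 1, 'ACTA': 1, 'TTTT': 2}
print(local_threshold_clusters(abund, d=1))  # [['ACGT', 'ACGA', 'ACTA'], ['TTTT']]
```

Here ACTA is two differences away from the seed ACGT, so a global threshold of d = 1 around the seed would have excluded it. The local threshold reaches it through the intermediate sequence ACGA.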

FROGS has been developed to be very fast, even on large amounts of 454/HiSeq/MiSeq data, by using cutting-edge tools and an optimized design; it is also portable to all Galaxy platforms. FROGS was tested on numerous simulated datasets. The tool proved extremely rapid, robust and highly sensitive for OTU detection, with very few false positives compared to other pipelines widely used by the community.

Presenters

Yvan Le Bras

Research engineer, INRIA / EnginesOn
Initially a marine biologist focusing on population structure, Yvan received a PhD in quantitative genetics and genomics from Rennes University. After a one-year postdoc at INSERM dedicated to integrative genomics, he investigated an e-Science approach for Life Sciences during a 3-year postdoc project at INRIA / IRISA Rennes. One of the outcomes of this project, called e-Biogenouest, is an innovative Virtual Research Environment (VRE) based on...

3:10pm

P19: Interaction Analysis of Vancomycin-Resistome from RNA-seq Data using Galaxy and Functional Examination of sRNA241 in S. aureus
Poster    doi:10.7490/f1000research.1112721.1

Authors

  • Devika Subramanian, Data mining and Text mining laboratory, Department of Bioinformatics, Bharathiar University, Coimbatore, Tamil Nadu, India
  • Jeyakumar Natarajan, Data mining and Text mining laboratory, Department of Bioinformatics, Bharathiar University, Coimbatore, Tamil Nadu, India
Abstract 
The widespread emergence of antibiotic-resistant Staphylococcus aureus is a major bottleneck in the development of novel treatments, as the mechanisms triggering this phenomenon are largely unknown. The role of small RNAs in regulating bacterial responses to environmental stresses, including antibiotic exposure, is now recognized; however, the functions of the majority of them are still unknown. Here, a collection of RNA-seq expression profiles was analyzed to reconstruct a model of vancomycin-resistome interactions, which was then used to predict the functions of sRNA241, a small non-coding RNA that was consistently downregulated upon antibiotic exposure. The state-of-the-art tools Bowtie, StringTie (run from a Galaxy web server) and Ballgown were used to align, assemble and identify the differentially expressed mRNAs that could be responsible for the development of resistance mechanisms. Based on this, an mRNA repertoire encompassing the major resistome components was identified and used to reconstruct a vancomycin-resistome network with 308 nodes and 2477 interactions. Clustering and enrichment analyses of the network indicate that a variety of gene clusters representing various metabolic pathways and defense mechanisms mediate resistance to vancomycin. Subsequently, the resistome network was used to examine the functions of sRNA241. Predicted targets of the sRNA were refined by opposite expression pairing, and a functional subnetwork of the resistome consisting of the specific sRNA-mRNA interactions was identified. Enrichment analysis of the subnetwork indicates that sRNA241 regulates different metabolic pathways, including the quinone/menaquinone biosynthetic pathways. Thus, the resistome network model is a good platform for expanding knowledge of the cellular interactions behind antibiotic resistance.
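Opposite expression pairing can be sketched as a simple sign filter. Under the assumption that sRNA241 represses its targets, a predicted target is kept only if its expression change runs opposite to that of the sRNA: the sRNA goes down, the target goes up.

The Python below is a hypothetical illustration, not the authors' procedure. The gene names, fold changes and the `min_abs_log2fc` cutoff are invented and are not taken from the study.

```python
from typing import Dict, List

def opposite_expression_targets(srna_log2fc: float,
                                target_log2fc: Dict[str, float],
                                min_abs_log2fc: float = 0.0) -> List[str]:
    """Keep predicted targets whose log2 fold change is opposite in sign to the sRNA's."""
    kept = []
    for gene, fc in target_log2fc.items():
        if abs(fc) < min_abs_log2fc:
            continue  # change too small to count as a clear direction
        if srna_log2fc * fc < 0:  # product is negative only for opposite signs
            kept.append(gene)
    return sorted(kept)

# Hypothetical values: the sRNA is downregulated (negative log2 fold change).
predicted = {"geneA": 2.1, "geneB": -0.9, "geneC": 0.4, "geneD": 1.2}
print(opposite_expression_targets(-1.8, predicted, min_abs_log2fc=0.5))
# ['geneA', 'geneD']
```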

Presenters

Devika Subramanian

Research Scholar, Bharathiar University


3:10pm

P21: The TRUElncRNA workflow: A Galaxy based workflow for identification of novel and known high confidence long non-coding RNAs.
Poster    doi:10.7490/f1000research.1112722.1

Authors

Mohammad Heydarian (1,2), Jevon Cutler (1,2), Mike Sauria (3), Barbara Sollner-Webb (1), James Taylor (3), and Karen Reddy (1,2)
1. Department of Biological Chemistry, Johns Hopkins University, Baltimore, MD, USA
2. Center for Epigenetics, Johns Hopkins University, Baltimore, MD, USA
3. Department of Biology, Johns Hopkins University, Baltimore, MD, USA

Abstract 
Long non-coding RNAs (lncRNAs) are a class of RNA that lack protein coding potential and exhibit features similar to protein coding RNAs, in that they are transcribed by RNA polymerase II, are 5' capped, and spliced in most cases. LncRNAs are expressed at levels lower than protein coding RNAs and exhibit tissue/cell type restricted expression. To identify lncRNAs in early B cell development in mouse, we performed RNA-sequencing on two developmentally arrested models of early hematopoiesis, a multi-potent progenitor (MPP) with the capacity to differentiate towards monocyte/lymphocyte lineages and a lineage committed pro-B (pro-B) cell system. Using the Tuxedo RNA-seq analysis suite with a de novo transcriptome reconstruction approach, we identified ~45,000 transcripts deemed to be long non-coding RNAs. To prioritize high confidence lncRNAs, we developed a Galaxy based workflow for discovery of novel and known high confidence lncRNAs that we call the 'TRUElncRNA workflow'. This workflow requires standard output file formats from the Tuxedo suite, as well as widely available reference data from the UCSC table browser, and returns high confidence novel and known lncRNAs. Using the TRUElncRNA workflow, we identified ~200 novel and ~2,300 known high confidence lncRNAs expressed in early B cell development. These high confidence lncRNAs demonstrate low coding potential relative to protein coding RNAs by PhyloCSF scoring. The identified high confidence lncRNAs exhibit chromatin profiles similar to annotated protein coding genes and show tissue restricted expression patterns across a comprehensive array of mouse tissues. Lastly, these lncRNAs also tend to reside in topological domains with genes of relevance to early hematopoiesis and in some cases interact with promoters of relevant genes across hundreds of kilobases.

Presenters


3:10pm

P23: A Multi-omics Visualization Platform (MVP) Plug-in for Galaxy-based Applications
Poster    doi:10.7490/f1000research.1112723.1

Authors

Thomas McGowan, James Johnson, Pratik Jagtap, Getiria Onsongo, Candace Guerrero, Timothy Griffin, University of Minnesota, Minneapolis MN

Abstract 
The Galaxy-P project has extended the popular Galaxy bioinformatics framework by deploying tools for MS-based proteomics data analysis and integrative "multi-omic" applications. The MVP visualization tool extends Galaxy-P's advantages to the visualization of large, complex data sets. This allows researchers to quickly inspect and verify the quality of the results, and offers both an overview and a deeper understanding of the underlying spectral data. This can be especially valuable when results include inputs from diverse domains. The core of the MVP is based on standard JavaScript and JavaScript libraries. In addition, it receives data from a documented Galaxy SQLite data provider. The main visualization is integrated into Galaxy via the Galaxy visualizations registry. Once registered, any dataset of type mz.sqlite will automatically be viewable from the MVP tool. The MVP tool uses 1) the DataTables library to manage the presentation, sorting and filtering of data, 2) the Lorikeet MS/MS viewer to visualize spectra, and 3) the IGV.js package to interactively present features of interest. This enables a researcher to see both genomic and proteomic data, as they relate to one another, in a single HTML window. With the incorporation of the Integrative Genomics Viewer (IGV) and Lorikeet, the MVP platform is already merging proteomic and genomic results into a single, accessible output. A user can, with relatively few keystrokes, filter and order large datasets down to a manageable subset. Due to the tool's use of server-side caching, large datasets are handled as quickly as small ones.
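The filter-and-order interaction described in this abstract can be pictured as a parameterized query run against the SQLite file on the server, so that only a small, sorted subset reaches the browser. The Python sketch below uses the standard `sqlite3` module with a hypothetical `psm(peptide, protein, score)` table. It does not reproduce the real mz.sqlite schema or Galaxy's data-provider API, and the rows are toy values.

```python
import sqlite3
from typing import List, Tuple

def filter_and_order(conn: sqlite3.Connection, protein_pattern: str,
                     limit: int = 50) -> List[Tuple[str, str, float]]:
    """Return rows whose protein matches the pattern, best score first (hypothetical schema)."""
    # Placeholders (?) keep user input out of the SQL text itself.
    sql = ("SELECT peptide, protein, score FROM psm "
           "WHERE protein LIKE ? ORDER BY score DESC LIMIT ?")
    return conn.execute(sql, (f"%{protein_pattern}%", limit)).fetchall()

# Toy in-memory database standing in for a dataset on the server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE psm (peptide TEXT, protein TEXT, score REAL)")
conn.executemany("INSERT INTO psm VALUES (?, ?, ?)", [
    ("PEPTIDEK", "protA", 0.91),
    ("SAMPLEPEPR", "protA", 0.97),
    ("OTHERPEPK", "protB", 0.88),
])
print(filter_and_order(conn, "protA", limit=1))
# [('SAMPLEPEPR', 'protA', 0.97)]
```

Pushing the `WHERE`, `ORDER BY` and `LIMIT` clauses into the database keeps the response small regardless of table size. Caching of repeated queries, as the abstract mentions, would sit on top of this and is not shown.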

Presenters

James (JJ) Johnson

Minnesota Supercomputing Institute, University of Minnesota


3:10pm

P25: Recent developments and new directions for the Galaxy-P project
Poster    doi:10.7490/f1000research.1112724.1

Authors

Pratik Jagtap, University of Minnesota and Center for Mass Spectrometry and Proteomics
James Johnson, University of Minnesota Supercomputing Institute
Thomas McGowan, University of Minnesota Supercomputing Institute
Innocent Onsongo, University of Minnesota Supercomputing Institute
Benjamin Lynch, University of Minnesota Supercomputing Institute
Candace Guerrero, University of Minnesota
Kevin Murray, University of Minnesota
Lloyd M Smith, University of Wisconsin-Madison
Michael R Shortreed, University of Wisconsin-Madison
Anthony J Cesnik, University of Wisconsin-Madison
Lennart Martens, Ghent University and VIB
Adrian Hegeman, University of Minnesota
Timothy Griffin, University of Minnesota and Center for Mass Spectrometry and Proteomics

Abstract 
The Galaxy-P project has extended the popular Galaxy bioinformatics framework into new realms, deploying tools for MS-based proteomics data analysis and integrative “multi-omic” applications. Galaxy-P leverages the many advantages offered by the Galaxy operating environment for informatics and data analysis, including flexibility, transparency and accessibility for bench scientists.

In the past, we have demonstrated Galaxy-P’s effectiveness not only in standard proteomic studies, but also in multi-omic applications such as proteogenomics and metaproteomics. Here, we describe more recent developments and emerging applications using Galaxy-P. These include: 1) expansion of tools for more comprehensive characterization of protein modifications; 2) new visualization functionalities for results interpretation; 3) integration of informatics tools for MS-based metabolomics; and 4) new avenues for dissemination of tools and workflows.


Presenters

Timothy Griffin

Center for Mass Spectrometry and Proteomics, University of Minnesota


3:10pm

P27: GeneSeqToFamily: the Ensembl GeneTree pipeline as a Galaxy workflow
Poster    doi:10.7490/f1000research.1112472.1

Authors

Anil S. Thanki, Nicola Soranzo, Robert P. Davey, The Genome Analysis Centre, Norwich, UK
 
Abstract
The Ensembl GeneTrees pipeline [1] infers the evolutionary history of gene families, represented as gene trees. These are analysed alongside the corresponding species tree to detect duplication and speciation events. This pipeline is a large and complex suite of interconnected tools and scripts with many dependencies and is therefore quite difficult to port and replicate on a different platform.

We have simplified this process by converting the command-line GeneTrees pipeline into an open-source Galaxy workflow, called GeneSeqToFamily. This workflow consists of more than 20 steps and uses existing tools already available in the Galaxy Tool Shed, as well as new tools that we developed, such as wrappers for TreeBest and hcluster_sg, alongside data format converters and output parsers. We have also developed tools for retrieving sequences, features and gene trees from Ensembl using its REST API, which can be used as inputs for the workflow.

The outputs of the GeneSeqToFamily workflow are a collection of gene families discovered from the genes of interest, together with a gene tree and a multiple sequence alignment for each gene family. These are then merged with gene feature information for each family to generate a dataset that can be visualised inside Galaxy with Aequatus.js, a new JavaScript library derived from Aequatus.
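The family-discovery step groups genes whose pairwise similarity hits pass a threshold. hcluster_sg performs its own hierarchical clustering of sparse graphs; the sketch below swaps in a simpler technique, single-linkage clustering with a union-find structure (families are the connected components of the hit graph), purely to illustrate the idea. The `cluster_families` function, the score scale, and the threshold are hypothetical.

```python
def cluster_families(genes, hits, min_score):
    """Group genes into families as connected components of the hit graph.

    genes: iterable of gene identifiers (must include every gene named in hits)
    hits: iterable of (gene_a, gene_b, score) similarity hits
    min_score: hits scoring below this value are ignored
    """
    parent = {g: g for g in genes}

    def find(g):
        # Walk to the root of g's set, halving the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b, score in hits:
        if score >= min_score:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two families

    families = {}
    for g in parent:
        families.setdefault(find(g), []).append(g)
    # Sort members and families so the output is deterministic.
    return sorted(sorted(members) for members in families.values())


if __name__ == "__main__":
    genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]
    hits = [("geneA", "geneB", 90), ("geneB", "geneC", 75), ("geneD", "geneE", 30)]
    print(cluster_families(genes, hits, min_score=50))
```

Single linkage chains genes transitively (geneA and geneC share a family through geneB even without a direct hit), which is one reason dedicated tools such as hcluster_sg add extra constraints when building families.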

1. Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E: EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res. 2009, 19(2):327–335.
 

Presenters

Nicola Soranzo

The Genome Analysis Centre (TGAC)


3:10pm

P29: Trinity CTAT: A Community Resource for De Novo and Reference-based RNA-Seq Analysis
Poster    doi:10.7490/f1000research.1112725.1

Authors

Asma Bankapur 1, Timothy Ticke 1, Carrie Ganote 2, Ben Fulton 2, Tom Doak 2, Brian Haas 1, Aviv Regev 1
  1. Broad Institute
  2. Indiana University 

Abstract

Cancer transcriptome sequencing (RNA-Seq) has highlighted the extent of gene variation in cancer, which leads to unique cancer transcriptomes. We provide best-known practices in cancer transcript analysis, leveraging de novo transcript reconstruction, in the form of simple, user-friendly Galaxy tools accessible to any cancer researcher. Currently available RNA-Seq analysis modules include an ensemble of best-in-class fusion discovery tools, a mutation calling pipeline with cancer-specific annotation, Trinity de novo assembly and downstream analysis (as highlighted in Nature Protocols), and lncRNA detection. Visualization for the fusion and mutation tools is aided by IGV.js; for lncRNA detection, we use the web browser generated by Slncky for ortholog search within the Galaxy framework. These tools are made available through our public Galaxy instance hosted by the National Center for Genome Analysis Support at Indiana University. In addition, we would also like to highlight our in-house implementation at KCO of RNA-Seq and single-cell DropSeq pipelines in multi-sample mode, which runs via an internal job runner.

Presenters

Asma Bankapur

Assoc Computational Biologist, Broad Institute


3:10pm

D01: Demonstrating Galaxy with Jetstream
Authors
Jeremy Fischer 1, Enis Afgan 2, Carrie Ganote 1, David Y. Hancock 1, Tom Doak 1, and Matthew Vaughn 3 

1. Indiana University Pervasive Technology Institute
2. Johns Hopkins University 
3. Texas Advanced Computing Center

Abstract
Jetstream is a new cloud computing resource funded by the National Science Foundation (NSF). As a fully configurable cloud resource, Jetstream adds significantly to the NSF-funded resources available to the Galaxy user community. Jetstream has a total computing capability of 0.5 petaflops and supports interactive users. It also allows Jetstream's computing power to be used during “off-peak hours” for CPU-intensive data processing. Jetstream hosts persistent Science Gateways (specifically, Galaxy) and virtual machines (VMs) usable by individual researchers within a cloud environment.

Here, we demonstrate how to access Galaxy on Jetstream from the perspectives of an end-user and a Galaxy admin. For the end-user, the most important features of Galaxy on Jetstream are easy access to all required tools, the ability to run them without further configuration, and the ability to use histories or workflows from other Galaxies so they do not lose previous work. All of this should be easy to accomplish without resorting to the command line.

For the Galaxy admin, the demo will go into more detail about how to configure Galaxy. We demonstrate: 1) how easy it is to instantiate and log on to a Galaxy VM and get started using Galaxy; 2) how to configure a standalone Galaxy instance by installing new tools via the Tool Shed; and 3) how the image may be customized and saved for future use on Jetstream.

Presenters

Enis Afgan

Galaxy Project, Johns Hopkins University
Everything 'Galaxy on the Cloud' related!

Jeremy Fischer

Senior Technical Advisor, Indiana University
Indiana University

3:10pm

D03: Advantages and Challenges of Using Galaxy CloudMan within an Integrated Data Analysis and Visualization Platform
Authors
Ilya Sytchev, Harvard T.H. Chan School of Public Health
David Jones, Sheffield Institute for Translational Neuroscience
Shannan Ho Sui, Harvard T.H. Chan School of Public Health
Fritz Lekschas, Harvard Medical School
Jennifer Marx, Harvard Medical School
Scott Ouellette, Harvard Medical School
Winston Hide, Sheffield Institute for Translational Neuroscience
Peter Park, Harvard Medical School 
Nils Gehlenborg, Harvard Medical School

Abstract

The Stem Cell Commons was developed by the Harvard Stem Cell Institute to create a community for stem cell bioinformatics. This open source environment for sharing and analyzing stem cell data combines genomics data sets with tools for discovery, analysis, visualization, and collaboration. The Commons uses the Refinery Platform, an integrated web-based data analysis and visualization system, to enable reproducible analyses implemented as Galaxy workflows.

We originally deployed Refinery using instances of Galaxy on clusters in two different research computing facilities. However, limited control over scheduling, access, and deployment in these environments prevented us from moving from development to production. To allow for greater flexibility of the system, ensure reliability, optimize cost, scale based on demand, and facilitate collaboration, our recent efforts have focused on making Refinery easy to deploy in a cloud environment backed by Galaxy CloudMan. CloudMan is a cloud manager that orchestrates the provisioning and management of Galaxy clusters on cloud infrastructures. Here we discuss our experiences with key CloudMan features, such as automated deployment, cluster sharing, and autoscaling. We also describe some of the problems that we have encountered using CloudMan in novel and perhaps unanticipated ways.

Given our experience with Refinery and CloudMan so far, we believe that Galaxy can be deployed as a cloud-based analysis backend for other systems. We hope that by continued collaboration with the CloudMan and Galaxy developer communities, we can address the challenges that we are still facing.

3:10pm

D05: Apollo: Manual Annotation in Galaxy
Authors
  • Nathan Dunn, Berkeley Bioinformatics Open-source Projects Lawrence Berkeley National Laboratory 
  • Monica Muñoz-Torres, Berkeley Bioinformatics Open-source Projects Lawrence Berkeley National Laboratory
  • Colin Diesh, University of Missouri 
  • Deepak Unni, University of Missouri
  • Eric Rasche, Department of Biochemistry and Biophysics, Texas A&M University
  • Eric Yao, University of California Berkeley 
  • Ian Holmes, University of California Berkeley 
  • Chris Elsik,  University of Missouri
  • Suzie Lewis, Berkeley Bioinformatics Open-source Projects Lawrence Berkeley National Laboratory
Abstract
Manual annotation is a crucial step in the annotation process of a genome sequencing project. It enables curators to improve automated gene predictions by visually comparing a variety of experimental evidence tracks from different sources to more accurately represent the underlying biology.

Apollo is a web-based genome annotation editor that allows curators to manually revise and edit genomic elements. It provides a reporting structure for annotated genomic elements and an ‘Annotator Panel’ that allows users to quickly browse the genome and its annotations. Users can manually edit the structure of a genomic element as well as add metadata, including references to other databases and functional assignments with specific lookup support for Gene Ontology (GO) terms.

Apollo is currently being used in over one hundred genome annotation projects around the world, ranging from annotation of a single species to lineage-specific efforts supporting annotation for dozens of organisms at a time. Collaborators are able to visualize each other's changes in real time (similar to Google Docs), restrict access to annotations depending on the role of users and groups within the community, and share tracks of evidence data with the public. Users are also able to export their manual annotations via FASTA, GFF3, the Chado database schema, and web services. Lastly, Apollo is available for integration with Galaxy via Docker, allowing users to run genome analyses using the Galaxy platform.

Apollo is an Open-Source project. Further details and code are available at http://genomearchitect.org/.

Special thanks to Eric Rasche for his help.

Presenters

Nathan Dunn

Lead Software Engineer, Lawrence Berkeley National Laboratory
I primarily work on the Apollo project, a web-based genome editor used for real-time collaborative manual curation, AKA Google Docs for genome editing. Apollo is built using JBrowse as our genomic viewer. http://genomearchitect.org https://github.com/GMOD/Apollo We use Grails + GWT + Angular (in addition to the JBrowse stack). I'm a scientific programmer having worked in a variety of domains including biology, psychology, automated speech...

3:10pm

D07: Galaxy as a Platform for Genome Annotation and Scalable Big Data Training
Authors
  • Rémi Marenco, George Washington University 
  • Wilson Leung, Washington University in St. Louis 
  • Sarah C.R. Elgin, Washington University in St. Louis 
  • Jeremy Goecks, George Washington University
Abstract
We are developing a customized version of Galaxy called G-OnRamp that will enable biologists to annotate the functional elements of eukaryotic genomes using large genomic datasets, a task that can also serve as an introduction to other “big data” biomedical analyses. Genome annotation (identifying functionally active regions within a genome) requires the use of diverse datasets and tools, including sequence similarity to known genes, gene prediction models, and high-throughput genomic data. To construct this interactive web-based environment for genome annotation, we are building on two successful efforts, the Genomics Education Partnership (GEP) and Galaxy. GEP (http://gep.wustl.edu) is a consortium of over 100 colleges/universities that provides classroom undergraduate research experiences in genomics for students at all levels. Students perform primary research on selected regions of Drosophila genomes using genomic databases (e.g., FlyBase) and bioinformatics tools (e.g., BLAST) while learning about gene structure, evolution, programming, and other topics. GEP faculty are now interested in annotating other eukaryotic genomes, reflecting their diverse research interests. G-OnRamp will extend Galaxy by providing (a) analysis pipelines for functional genomic data (e.g., ChIP-Seq, RNA-Seq); (b) interactive visual analytics to annotate a genome (e.g., create UCSC Assembly Hubs); and (c) capacity for collaborative genome annotation. The GEP will serve as a key use case to validate and refine G-OnRamp, ensuring that it satisfies real educational needs. In this poster and demonstration, we will describe G-OnRamp’s vision and showcase its current features. G-OnRamp is available under the Academic Free License, and the software will be available via https://github.com/goeckslab.
 

Presenters

Rémi Marenco

Software engineer, George Washington University

3:10pm

D09: CosmicNotes - Write your daily lab notebook entries within Galaxy.
Authors
Fabrice Hess, Wolfgang Maier, Ralf Baumeister, University of Freiburg

Abstract
CosmicNotes is an electronic lab notebook plugin for Galaxy. It is integrated via Galaxy's visualizations framework and provides two new visualizations for every dataset that enable the creation of richly formatted lab book entries through a web editor. One visualization supports the annotation of individual datasets beyond what is possible with Galaxy's built-in annotation feature. The second lets users maintain lab book pages for whole histories, with the option to cross-link to dataset annotations. CosmicNotes lab book entries are version-controlled on a daily basis (using Git in the background). Combined with an integrated diff viewer, this keeps lab book entries fully editable until the final submission of a project while tracking every change. An overview page lists lab books associated with published histories on the server and offers the option to search by history name, author and/or CosmicNotes project; projects can be used to group lab books and histories. Work in progress includes additional features such as export/import functionality and full-text search support. CosmicNotes is free and open-source software available under the GPLv3 license.
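CosmicNotes keeps one version of each entry per day and shows changes through a diff viewer. The real plugin uses Git for this; the sketch below is a plain-Python stand-in that only illustrates the daily-snapshot idea, using the standard library's `difflib.unified_diff` to produce the diff. The `DailyVersionedNote` class and its methods are hypothetical.

```python
import difflib
from datetime import date


class DailyVersionedNote:
    """Keep at most one snapshot of a lab book entry per calendar day."""

    def __init__(self):
        self.snapshots = {}  # maps datetime.date -> entry text

    def save(self, text, day):
        # A later save on the same day replaces that day's snapshot.
        self.snapshots[day] = text

    def diff(self, day_a, day_b):
        """Unified diff between the snapshots stored for two days."""
        old = self.snapshots[day_a].splitlines(keepends=True)
        new = self.snapshots[day_b].splitlines(keepends=True)
        return "".join(
            difflib.unified_diff(old, new, fromfile=str(day_a), tofile=str(day_b))
        )


if __name__ == "__main__":
    note = DailyVersionedNote()
    note.save("Ran mapping.\n", date(2016, 6, 27))
    note.save("Ran mapping.\nChecked QC.\n", date(2016, 6, 27))
    note.save("Ran mapping.\nChecked QC; reran sample 3.\n", date(2016, 6, 28))
    print(note.diff(date(2016, 6, 27), date(2016, 6, 28)))
```

Collapsing edits to one snapshot per day keeps the history readable while still allowing every day's state to be compared, which mirrors the trade-off described in the abstract.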

4:25pm

The LAPPS Grid and Galaxy
Slides    doi:10.7490/f1000research.1112586.1

Authors

Nancy Ide, Keith Suderman, James Pustejovsky, Marc Verhagen, Eric Nyberg, Chris Cieri 

Abstract
The NSF/SI2-funded Language Applications (LAPPS) Grid project (http://www.lappsgrid.org) is a collaborative effort among Brandeis University, Vassar College, Carnegie-Mellon University (CMU), and the Linguistic Data Consortium (LDC) at the University of Pennsylvania, which has developed an open, web service-based infrastructure through which massive and distributed language resources can be accessed, and tailored language services can be composed, evaluated, disseminated and consumed by researchers, developers, and students.

We recently adopted Galaxy as the primary workflow management system for the LAPPS Grid. We have worked with the Galaxy development team to adapt the system to our domain and continue this collaboration to enhance the capabilities we require and contribute to the expansion of Galaxy to domains outside the life sciences.

We have contributed a “Galaxy Flavor” including all LAPPS Grid services and resources, and have developed or are developing the following capabilities for use in Galaxy: (1) exploitation of our web service metadata to automatically detect input/output requirements and invoke converters where necessary; (2) incorporation of authentication procedures for protected data using OAuth; and (3) addition of a visualization plugin for linguistic analyses.
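Capability (1) amounts to finding a path of converters between the format one service produces and the format the next service accepts. A minimal sketch, assuming the available converters are known as a simple format-to-format graph, is a breadth-first search that returns the shortest conversion chain. The format names and the `find_converter_chain` helper are hypothetical and do not reflect the LAPPS Grid's actual metadata vocabulary.

```python
from collections import deque


def find_converter_chain(converters, source, target):
    """Shortest chain of (from_format, to_format) steps, or None if unreachable.

    converters maps each format to the formats it can be converted into.
    """
    if source == target:
        return []
    came_from = {source: None}
    queue = deque([source])
    while queue:
        fmt = queue.popleft()
        for nxt in converters.get(fmt, ()):
            if nxt in came_from:
                continue  # already reached by a chain at least as short
            came_from[nxt] = fmt
            if nxt == target:
                # Walk back from the target to rebuild the chain.
                chain = []
                node = target
                while came_from[node] is not None:
                    chain.append((came_from[node], node))
                    node = came_from[node]
                return chain[::-1]
            queue.append(nxt)
    return None


if __name__ == "__main__":
    # Hypothetical converter graph between annotation formats.
    converters = {"text": ["gate"], "gate": ["lif"], "lif": ["json"]}
    print(find_converter_chain(converters, "text", "lif"))
```

Breadth-first search guarantees the fewest conversion steps, which matters when each converter is a separate web service call.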

An additional outcome of the LAPPS/Galaxy collaboration is that it provides researchers in the life sciences with access to a wide array of NLP tools. For example, biologists will be able to take advantage of bio-oriented NLP web services to mine bio-entities and relations from textual sources such as PubMed and, via capabilities already present in Galaxy, integrate them into existing bio-data resources and analysis tools.
 

Speakers

Keith Suderman

Research Assistant, Vassar College



4:25pm

Session 4
Accepted and Lightning Talks. The call for lightning talks will go out just before GCC2016 events start.

Moderators

Suzanna Lewis

Lawrence Berkeley National Laboratory


4:45pm

Lightning Talks
The call for Lightning Talks will go out shortly before GCC2016 events begin.

4:48pm

SeqResults - Simple comparisons of results across libraries
Slides

Authors

  • Brad Langhorst, New England Biolabs
Abstract
We have developed SeqResults to enable simple comparison of libraries across experiments. The Galaxy-integrated component captures metadata and results in a relational database. Results are available via a simple website and Tableau visualizations. SeqResults has recently been extended with RNA-seq features. It aggregates simple metrics such as the fractions of reads on exons, introns and other genomic regions, average 5'-3' coverage, and alignment efficiency. However, summary metrics are only part of the story. Accurate representation of transcript levels is important to any RNA-seq experiment. We present a simple interface to compare transcript levels, as well as 5'-3' coverage profiles of individual transcripts, across experiments. SeqResults now contains millions of individual results from 6841 libraries produced during development of NEBNext library preparation reagents.
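Two of the metrics mentioned above are simple to state in code: the fraction of reads falling in each genomic region, and a 5'-3' coverage profile obtained by splitting a transcript into equal-width bins along its length and averaging the per-base coverage in each bin. The sketch below is a generic illustration, not SeqResults' implementation; the function names, region labels, and bin count are assumptions.

```python
def region_fractions(counts):
    """Convert per-region read counts into fractions of all assigned reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads assigned to any region")
    return {region: n / total for region, n in counts.items()}


def coverage_profile(per_base_coverage, n_bins=4):
    """Average per-base coverage in n_bins equal-width bins from 5' to 3'."""
    length = len(per_base_coverage)
    if length < n_bins:
        raise ValueError("transcript shorter than the number of bins")
    profile = []
    for i in range(n_bins):
        # Integer bin edges; every bin gets at least one base when length >= n_bins.
        start = i * length // n_bins
        end = (i + 1) * length // n_bins
        chunk = per_base_coverage[start:end]
        profile.append(sum(chunk) / len(chunk))
    return profile


if __name__ == "__main__":
    print(region_fractions({"exon": 600, "intron": 300, "intergenic": 100}))
    print(coverage_profile([1, 1, 2, 2, 3, 3, 4, 4], n_bins=4))
```

Normalizing every transcript to the same number of bins is what makes coverage profiles comparable across transcripts of different lengths.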


4:55pm

Common Workflow Language v1.0 & How It Will Affect You
Slides    doi:10.7490/f1000research.1112726.1

Author

Michael R. Crusoe, Common Workflow Language Project

Abstract
Version 1.0 of the CWL standards is coming soon. This talk will review what has changed in the last year and how the CWL benefits the Galaxy community. The talk will include a side-by-side demonstration of a popular Galaxy workflow and its CWL incarnation.

Presenters

Michael Crusoe

Community Engineer & Co-founder, Common Workflow Language
Workflows, community standards, standardization process, open use research software sustainability, Debian, Debian Med, Debian packaging.


5:02pm

A resource for metabolomics and transcriptomics analysis
Slides    doi:10.7490/f1000research.1112727.1

Authors

Manhoi Hur, Iowa State University
Jason R. Miller, J Craig Venter Institute
Christopher D. Town, J Craig Venter Institute
Erik Ferlanti, J Craig Venter Institute
Irina Belyaeva, J Craig Venter Institute
Eve Syrkin Wurtele, Iowa State University

Abstract

PMR (Plant/Eukaryotic and Microbial Systems Resource) and its database are a community resource for deposition and analysis of metabolomics data and related transcriptomics data. PMR currently houses terabytes of data and metadata from over 25 species of eukaryotes, and provides a unique resource for computational modeling and hypothesis development. PMR’s web APIs enable PMR data and analytic functions to integrate with other community resources. In this talk, we introduce the PMR database and illustrate its analytic tools. We present a proof of concept for the utility of the API as a research science app, using Araport to provide Arabidopsis metabolomics data and its functionality to diverse users.


5:09pm

Science Gateways Community Institute
Slides    doi:10.7490/f1000research.1112593.1

Authors

Maytal Dahan, University of Texas at Austin
Sandra Gesing, University of Notre Dame
Linda B. Hayden, Elizabeth City State University
Katherine Lawrence, University of Michigan
Marlon E. Pierce, Indiana University
Nancy Wilkins-Diehr, The University of California, San Diego
Michael Zentner, Purdue University

Abstract 
Science gateways, also known as web portals, virtual research environments, or virtual laboratories, are a fundamental part of today’s research landscape, but they can be difficult to develop in a sustainable fashion. This talk will provide an overview of the newly funded NSF Science Gateways Community Institute, which aims to address these challenges by offering services to, and building community among, the research communities developing gateways. The institute comprises five areas to support gateways throughout their lifecycle:
  • Incubator will provide shared expertise in business and sustainability planning, cybersecurity, user interface design, and software engineering practices.
  • Extended Developer Support will provide expert developers for up to one year to projects that request assistance and demonstrate the potential to achieve the most significant impacts on their research communities.
  • Scientific Software Collaborative will offer a component-based, open-source, extensible framework for gateway design, integration, and services, including gateway hosting and capabilities for external developers to integrate their software into Institute offerings.
  • Community Engagement and Exchange will provide a forum for communication and shared experiences among gateway developers, user communities, within NSF, across federal agencies, and internationally. 
  • Workforce Development will increase the pipeline of gateway developers with training programs, including special emphasis on recruiting underrepresented minorities, and by helping universities form gateway support groups.
We envision close collaborations with gateway providers such as the Galaxy developer group to provide best practices for developers and use cases of real-world gateways to improve the experience and efficiency of developers and user communities.

Presenters

Nancy Wilkins-Diehr

Associate Director, San Diego Supercomputer Center
Science gateways and running


5:16pm

Galaxy at the Pittsburgh Supercomputing Center
Slides    doi:10.7490/f1000research.1112729.1

Authors

Alexander J. Ropelewski, Pittsburgh Supercomputing Center
Philip D. Blood, Pittsburgh Supercomputing Center
Robert Light, Pittsburgh Supercomputing Center

Abstract
The Pittsburgh Supercomputing Center's (PSC) new computational system Bridges, funded by the National Science Foundation (NSF), is available to U.S. academic researchers through NSF's XSEDE program. Bridges is a unique system that consists of a variety of specialized nodes including: compute nodes, GPU nodes, database nodes, webserver nodes and data transfer nodes. A unique feature of Bridges is that the compute nodes are tiered in terms of memory, containing either 128GB, 3TB, or 12TB of hardware-supported shared memory, which makes the system ideal for Galaxy workflows involving Next Generation Sequencing data.
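Because Bridges' compute nodes come in three memory tiers, a Galaxy deployment could route each job to the smallest tier that fits its memory requirement. The sketch below shows that selection rule only; the tier labels, the `pick_tier` helper, and the 1 TB = 1024 GB convention are assumptions, and real routing would go through Galaxy's job configuration and the site's scheduler.

```python
# Node memory tiers from the abstract, smallest first (1 TB taken as 1024 GB).
MEMORY_TIERS_GB = [
    ("128GB", 128),
    ("3TB", 3 * 1024),
    ("12TB", 12 * 1024),
]


def pick_tier(required_gb):
    """Return the label of the smallest tier whose node memory covers the job."""
    for label, capacity_gb in MEMORY_TIERS_GB:
        if required_gb <= capacity_gb:
            return label
    raise ValueError(f"no tier offers {required_gb} GB of shared memory")


if __name__ == "__main__":
    for need in (64, 500, 5000):
        print(need, "GB ->", pick_tier(need))
```

Choosing the smallest sufficient tier keeps the scarce 3TB and 12TB nodes free for jobs, such as large de novo assemblies, that genuinely need them.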

In this talk we will discuss the history of Galaxy at the PSC and describe various Galaxy usage scenarios for Bridges. These scenarios include (1) a shared Galaxy instance for users with XSEDE allocations, (2) private "virtualized" instances of Galaxy, and (3) back-end computational support for remote Galaxy instances. We will also discuss the system that we developed to authenticate users and charge usage against specific user-selected projects.



5:23pm

65 Million Observers
Slides    doi:10.7490/f1000research.1112730.1

Collecting and analysing information from increasingly diverse origins is needed to understand the complex systems that life scientists study. In particular, this means mobilizing a large number of human and technical resources for the acquisition and analysis of data. Regarding human resources, it seems appropriate to involve citizens in research projects. At the same time, the relationship between science and citizens has clearly degraded: for example, it is very difficult for citizens to access the results of research projects, let alone participate in them. Citizen science approaches can be a good way to address these issues, but so far most citizen science projects involve citizens only in data production. The "65 millions d'observateurs" project is an interesting French initiative that aims to test involving citizens in other parts of the research lifecycle. Can this be the beginning of a Galaxy-E, for Ecology?


Presenters

Yvan Le Bras

Research engineer, INRIA / EnginesOn
Initially a marine biologist focusing on population structure, Yvan received a PhD in quantitative genetics and genomics from Rennes University. After a one-year postdoc at INSERM dedicated to integrative genomics, he investigated an e-Science approach for Life Sciences during a 3-year postdoc project at INRIA / IRISA Rennes. One of the outcomes of this project, called e-Biogenouest, is an innovative Virtual Research Environment (VRE) based on...


5:30pm

Dinner & Socializing (on your own)

You are on your own for dinner this evening (Tuesday). See the bottom of the conference location page for links to nearby options. Or, if you just want to wander, see the online map for restaurant-enriched neighborhoods. Fourth Street from Indiana Avenue to Walnut St. and Fifth Street (Kirkwood Avenue) from Indiana Avenue to Rogers St. both have an array of amazing options. The square downtown is a great find as well.

Find someone you don't know, share a meal, and learn what others are up to.

And you can certainly socialize past midnight, but in the interests of not consuming all the coffee in Bloomington on Wednesday morning, conference organizers urge you to consider postponing any post-midnight plans until Wednesday night.

Tuesday June 28, 2016 5:30pm - Wednesday June 29, 2016 12:00am
TBA

7:00pm

Birds-of-a-Feather Flocking

There is no better place than a Galaxy Community Conference to meet and learn from others doing data-intensive biology.  GCC2016 will continue this tradition by again including Birds of a Feather (BoF) meetups.  Birds of a Feather meetups are informal gatherings where participants group together based on common interests.

BoF meetups are encouraged throughout GCC2016.  This session will likely be split into several distinct blocks, enabling participants to attend more BoFs. 

If you are interested in proposing a BoF, please submit your idea here and we'll add it to the schedule.



Tuesday June 28, 2016 7:00pm - 10:30pm
IMU: Indiana Memorial Union, 900 E 7th St, Bloomington, IN
 
Wednesday, June 29
 

9:00am

Session 5
Session 5 features four accepted talks from the Galaxy Community.

Moderators

Scott Michaels

Indiana University


9:10am

Enhancements to Galaxy for delivering on NIH Commons
Slides    doi:10.7490/f1000research.1112588.1

Author

Ravi K Madduri, University of Chicago

Abstract
The Big Data for Discovery Science (BDDS) Center is one of the NIH BD2K centers. In BDDS, we are building tools to move, share, analyze, discover and publish big biomedical data. These tools constitute the BDDS platform. We are leveraging the platform to enable data-driven discovery across our center, and are also working with other BD2K centers, both directly and through the Commons initiative, to build standard interfaces for various data management activities. Galaxy is an integral part of our platform, and we are enhancing Galaxy to support working with Digital Object Identifiers (DOIs), to analyze data at scale using identified Docker containers, and to publish results into Globus Publication services, thus providing an end-to-end framework for reproducible research in support of the NIH Commons vision.

Speakers

Ravi K. Madduri

Computation Institute, University of Chicago, and Argonne National Laboratory



9:30am

Moving data from the warehouse to the workbench: a bridge to Galaxy from the Tripal community genome database software platform
Slides    doi:10.7490/f1000research.1112734.1

Authors

Margaret Staton1, Ming Chen1, Nathan Henry1, Emily Grau2, Connor Wytko3, Brian Soto3, Sook Jung3, Kuangching Wang4, Nick Watts5, Chun-huai Cheng3, Lacey A. Sanderson6, Jill Wegrzyn2, Doreen Main3, F. Alex Feltus7, Stephen P. Ficklin3
  1. University of Tennessee Institute of Agriculture Department of Entomology and Plant Pathology, Knoxville, TN 37996, USA 
  2. University of Connecticut Department of Ecology and Evolutionary Biology, Storrs, CT 06269 USA 
  3. Washington State University Department of Horticulture, Pullman, WA 99164 USA 
  4. Clemson University Department of Electrical & Computer Engineering, Clemson, SC 29634 USA 
  5. Clemson University, Clemson Computing and Information Technology, Anderson, SC 29625 USA 
  6. University of Saskatchewan, Department of Plant Sciences, Saskatoon, Saskatchewan, SK S7N Canada 
  7. Clemson University Department of Genetics & Biochemistry, Clemson, SC 29634 USA

Abstract

Online community genome databases offer curated and mission-specific data and information to scientists with shared basic and applied research goals. In an effort to share a common code base, standardize storage formats, and simplify site construction, a coalition of genome databases has developed the software Tripal. Tripal is an open-source platform that bridges Drupal, a popular content management system (CMS), and Chado, a standardized relational database for storage of biological data. Users of community databases need not only to discover, visualize and download genomic information, but also to port it directly into analysis workflow software such as the Galaxy platform. Through development of the new Tripal Galaxy module, site visitors will be able to select custom datasets from within and across Tripal databases and import them directly into a Galaxy instance from within a Tripal-based site. Additionally, a set of pre-designed workflows for common analyses needed by users of community databases will be made publicly available, including functional annotation of gene sequences, genomic variant discovery and genotype/phenotype association. Current efforts are focused on enabling authenticated users to move data from a Tripal community database to the Tripal community Galaxy instance or a public Galaxy instance, creating PHP bindings for the Galaxy API, and establishing the most commonly needed analysis workflows for database users.

Speakers
Margaret Staton

University of Tennessee Knoxville
My lab works on genome databases, web applications, cyberinfrastructure, and RNASeq data. Our main website is hardwoodgenomics.org.



9:50am

Apollo: Collaborative Manual Annotation for Genomic Sequencing Projects
Slides    doi:10.7490/f1000research.1112336.1

Authors

  • Nathan Dunn, Berkeley Bioinformatics Open-source Projects Lawrence Berkeley National Laboratory
  • Monica Muñoz-Torres, Berkeley Bioinformatics Open-source Projects Lawrence Berkeley National Laboratory
  • Colin Diesh, University of Missouri
  • Deepak Unni, University of Missouri
  • Eric Rasche, Department of Biochemistry and Biophysics, Texas A&M University
  • Eric Yao, University of California Berkeley
  • Ian Holmes, University of California Berkeley
  • Chris Elsik,  University of Missouri
  • Suzie Lewis,  Berkeley Bioinformatics Open-source Projects Lawrence Berkeley National Laboratory 

Abstract

Manual annotation is a crucial step in the annotation phase of a genome sequencing project. It enables curators to improve automated gene predictions by visually comparing a variety of experimental evidence tracks from different sources, so that the annotation represents the underlying biology more accurately.

Apollo is a web-based genome annotation editor that allows curators to manually revise and edit genomic elements. It provides a reporting structure for annotated genomic elements and an ‘Annotator Panel’ that allows users to quickly browse the genome and its annotations. Users can manually edit the structure of a genomic element as well as add metadata, including references to other databases and functional assignments with specific lookup support for Gene Ontology (GO) terms.

Apollo is currently being used in over one hundred genome annotation projects around the world, ranging from annotation of a single species to lineage-specific efforts supporting annotation for dozens of organisms at a time. Collaborators can see each other's changes in real time (similar to Google Docs), restrict access to annotations depending on the role of users and groups within the community, and share tracks of evidence data with the public. Users can also export their manual annotations via FASTA, GFF3, the Chado database schema, and web services. Finally, Apollo is available for integration with Galaxy via Docker, allowing users to run genome sequencing analyses using the Galaxy platform.

Apollo is an Open-Source project. Further details and code are available at http://genomearchitect.org/.

Speakers
Nathan Dunn

Lead Software Engineer, Lawrence Berkeley National Laboratory
I primarily work on the Apollo project, a web-based genome editor used for real-time collaborative manual curation, AKA Google Docs for genome editing. Apollo is built using JBrowse as our genomic viewer. http://genomearchitect.org https://github.com/GMOD/Apollo We use Grails + GWT + Angular (in addition to the JBrowse stack). I'm a scientific programmer having worked in a variety of domains including biology, psychology, automated speech...



10:10am

Accurate and Complete Gene Construction with EvidentialGene Pipeline
Slides    doi:10.7490/f1000research.1112467.1

Author

Don Gilbert, Indiana University



Abstract
Precision genomics is essential in medicine, environmental health, sustainable agriculture, and biological research. Yet popular genome informatics methods lag behind the high levels of accuracy and completeness in gene construction that are attainable with current RNA-seq data.

EvidentialGene is a genome informatics pipeline for gene construction that has a measurably high accuracy and completeness rate for animals and plants, from insects, ticks and crustaceans to crop plants and trees, to fishes and other vertebrates. It uses large volumes of data from gene sequencers to generate larger candidate gene sets than alternative methods, then reduces them to accurate species gene sets using biological criteria of protein coding and orthology. EvidentialGene is in production use at compute centers in the USA, Sweden, Australia and elsewhere.

The software pair of MAKER and Trinity is now a common recipe in gene discovery publications, but greater accuracy is possible and easy to obtain. Recent examples with the disease-vector mosquitoes Aedes (yellow fever, Zika virus) and Anopheles (malaria) show that EvidentialGene surpasses the accuracy of published genes from MAKER, Trinity and Vectorbase. For fishes, Evigene surpasses gene sets recently published from MAKER, Trinity and the NCBI Eukaryote genome annotation pipelines.

Galaxy installations that provide genome and transcriptome services will benefit from adding EvidentialGene. This author challenges Galaxy centers with MAKER, Trinity or other gene construction pipelines to reach the accuracy and completeness of EvidentialGene, and will collaborate on this with selected genomics projects.

Speakers
Don Gilbert

Indiana University



10:30am

11:00am

Galaxy security practices in an age of clinical data for point of care services
Slides    doi:10.7490/f1000research.1112735.1

Authors

Carrie Ganote, Indiana University 

Abstract
One of the major users of bioinformatics pipelines is the medical field. This poses a challenge for system administrators and software developers who provide web-facing services: securing the client's data. Certain genomic data sets can be considered sufficiently identifiable to qualify as electronic protected health information (ePHI), which is further protected under HIPAA (Health Insurance Portability and Accountability Act).

This talk will outline the hurdles associated with making Galaxy robust in a clinical setting. Best practices leverage a two-tiered approach at both the operating system and application layers. First, system configuration will be explored, including least privilege for service accounts and database users, file encryption, and system access. Then, best practices for using Galaxy will be highlighted as they apply to data movement, storage, and account policies, following a rigorous NIST-based cyber risk management framework.

Speakers
Carrie Ganote

Indiana University



11:00am

Session 6
Session 6 features 5 accepted talks.

Moderators
Margaret Staton

University of Tennessee Knoxville
My lab works on genome databases, web applications, cyberinfrastructure, and RNASeq data. Our main website is hardwoodgenomics.org.


11:20am

Increasing Beer Time: Decreasing the Galaxy System Administration Burden
Slides    doi:10.7490/f1000research.1112736.1

Author

Nate Coraor, Penn State University 

Abstract
Galaxy is a large application with many moving pieces and dependencies on numerous outside applications and libraries. As Galaxy is a Python application, some of these dependencies are Python modules. Other dependencies include a proxy (web) server, database server, and possibly a distributed resource manager (for cluster job submission). The task of installing and orchestrating the operation of these components can be difficult, in part due to Galaxy’s desire to support the wide variation in computing environments and policies at sites where Galaxy is installed.

In order to ease the burden for Galaxy administrators, we have made several improvements to Galaxy dependency handling and installation. Galaxy’s Python dependencies were tightly controlled by Galaxy and used an outdated format. Significant work was undertaken to modernize the handling of these dependencies while loosening control, in order to give more flexibility to administrators. Galaxy is now fully compatible with the standard Python packaging tool chain, including pip and wheel, and further, it can now be used with the Anaconda Python distribution.

Another point of administrator frustration is installing and updating the Galaxy code itself. Galaxy is currently distributed via git, but administrators often prefer system package managers. Building upon the dependency management changes, it is now possible to create Galaxy packages and install them in the same manner as more traditional system software packages. This also allows for tighter integration with system-level dependencies such as proxy and database servers.

Speakers
Nate Coraor

Galaxy Project, Penn State University



11:40am

The Intergalactic Utilities Commission - driving Galaxy tool development
Slides    doi:10.7490/f1000research.1112466.1

Authors

Marius van den Beek, Daniel Blankenberg, Dave Bouvier, John Chilton, Peter Cock, Nate Coraor, Björn Grüning, Youri Hoogstrate, James Johnson, Greg von Kuster, Eric Rasche and Nicola Soranzo

Abstract
Galaxy provides abstractions that make it easy to integrate tools, so virtually any tool that can be run from the command line can be integrated into Galaxy. The ability to seamlessly integrate tools into Galaxy spawned a large community of Galaxy tool developers, with the Galaxy Tool Shed as a distribution platform for installation into any Galaxy instance. This proliferation of tools resulted in the need for an oversight committee to set standards, define best practices, and vet tools for the Galaxy community. In 2012, the Intergalactic Utilities Commission (IUC) was founded as an organized body to provide these services, and has developed best-practice guidelines for tool development. These standards are a continual work in progress as new technologies are introduced into the Galaxy environment.

We will highlight IUC achievements over the past year, including enhanced reproducible installations via Starforge and cargo-port, new dependency resolution systems like Conda, and various enhancements to Galaxy tool syntax that enable more powerful and user-friendly tools. We’ll introduce new processes that have enhanced Galaxy tool development, testing and maintenance using Planemo and Conda, with details about how these applications can be used as complementary components to Galaxy and the Galaxy Tool Shed.

Important goals of the IUC are to continue growing not only the community but also the committee itself, so that we can provide the benefits of friendly oversight to every interested Galaxy tool developer. This past year the IUC welcomed 3 new members and organised 3 Codefests. We welcome anyone interested in joining this committee and working with us.

Speakers
Björn Grüning

University of Freiburg



12:00pm

Planemo – A Scientific Workflow SDK
Slides

Authors

John Chilton, Galaxy Project
Aysam Guerler, Galaxy Project
Galaxy Team, Galaxy Project 

Abstract
A novel approach to building, refining, and running scientific workflows leveraging Galaxy through Planemo will be presented. The Galaxy workflow editor and workflow extraction interface are great tools enabling any Galaxy user to easily build workflows. However, tool authors using Planemo and sophisticated bioinformaticians may prefer driving workflow development through their existing tool chains such as programming text editors, command-line testing, and revision control. The approach presented leverages YAML-based workflow descriptions as plain files allowing exactly this.

The approach will be used as a lens to highlight these workflow formats (Format 2 Galaxy workflows and Common Workflow Language (CWL) workflows), as well as important features from the myriad recent Galaxy workflow enhancements that have made workflows dramatically more usable, powerful, and performant.

Available today, Format 2 Galaxy workflows map directly to existing Galaxy tool and workflow concepts and are described in a very concise and readable YAML format. CWL specifications for tools and workflows are developed in an open fashion by many organizations with the aim of creating truly portable descriptions. The execution of CWL workflows in Galaxy is being actively worked on and progress will be discussed.

Underlying all of this are core Galaxy enhancements that will be demonstrated. The user interface for workflows has been overhauled and improved. Additionally, workflows now allow nesting, labels, non-data inputs, implicit connections between steps, and many new operations over collections, greatly increasing the expressive power of Galaxy workflows. Finally, recent performance enhancements allow Galaxy workflows to scale to thousands of datasets.

Speakers
John Chilton

Galaxy Project, Penn State University



12:20pm

CloudLaunch as a multi-cloud, multi-application launch platform
Slides    doi:10.7490/f1000research.1112589.1

Authors

Enis Afgan, Johns Hopkins University, USA
Nuwan Goonasekera, University of Melbourne, Australia

Abstract
CloudLaunch started as BioCloudCentral.org, and provided a simple, intuitive way to launch Galaxy CloudMan on the Amazon cloud. The original idea has expanded over the years to accommodate launching of Virtual Machines for multiple applications, on various clouds, with additional configuration options. The Cloud Computing landscape has also evolved to facilitate deployment of complex applications, with increasing support for containers.

To adapt to these new realities, we have rewritten CloudLaunch from the ground up as a general-purpose application launch platform, targeting multiple applications, clouds and containers.

End users can use the new CloudLaunch as their cloud application deployment and management dashboard. From an app-store-like interface, cloud applications can be selected and launched from multiple clouds (Amazon, OpenStack and soon, GCE). Furthermore, users can view their live and shut-down instances from any supported cloud from this single location.

Technically, CloudLaunch has a fully defined, documented and browsable REST API, as well as an extensible web-based front end for easy management. CloudLaunch’s UI allows each application to define its own custom UI, which can be dynamically plugged into CloudLaunch using simple descriptor metadata. This lets each application present complex configuration options and gives the application deployer an easy mechanism for providing launch-time configuration.

This talk will present the new CloudLaunch features, from an end-user perspective as well as describe how developers and deployers can use it to define and deploy applications. Sample applications such as Galaxy on the Cloud, the Genomics Virtual Lab, a SLURM cluster, and RStudio will be showcased.

Speakers
Enis Afgan

Galaxy Project, Johns Hopkins University
Everything 'Galaxy on the Cloud' related!



12:40pm

Arts & Crafts
GCC sure can be overwhelming sometimes! This is a quiet place to do some stress-free, science-related arts and crafts.

Moderators
Saskia Hiltemann

Erasmus Medical Center
Eric Rasche

Sysadmin / Bioinformatician, Center for Phage Technology

12:40pm

Birds-of-a-Feather Flocking

There is no better place than a Galaxy Community Conference to meet and learn from others doing data-intensive biology.  GCC2016 will continue this tradition by again including Birds of a Feather (BoF) meetups.  Birds of a Feather meetups are informal gatherings where participants group together based on common interests.

BoF meetups are encouraged throughout GCC2016.   

If you are interested in proposing a BoF, please submit your idea here and we'll add it to the schedule.


 

 


Wednesday June 29, 2016 12:40pm - 1:40pm
IMU: Indiana Memorial Union 900 E 7th St, Bloomington, IN

12:40pm

12:40pm

Bioinformatics education at undergraduate level

Bioinformatics skills have become essential to modern biologists, yet many schools do not have bioinformatics programs or even a designated bioinformatics course at the undergraduate level. What qualifications do you wish to see in your incoming graduate students? How can we prepare them to meet your expectations? If you have any thoughts on bioinformatics curriculum development and/or faculty training, please join me. (If the meeting time does not work for you, please contact me at zxu@bgsu.edu at any time. Thank you!)

If you are interested in participating in this BoF, create a Sched login (if you don't already have one), and add this BoF to your personal schedule.


And, if you are interested in proposing a BoF, please submit your idea here and we'll add it to the schedule.


Moderators

1:40pm

FlowGalaxy: Developing a workflow for Flow Cytometry Analysis in Galaxy
Slides    doi:10.7490/f1000research.1112455.1

Authors

Cristel G Thomas, Northrop Grumman TS
Elizabeth Thomson, Northrop Grumman TS
Patrick Dunn, Northrop Grumman TS
Henry Schaefer, ESAC, Inc
Jeff Wiser, Northrop Grumman TS
John C Campbell, Northrop Grumman TS

Abstract
Flow cytometry is generating increasingly massive multi-dimensional datasets. Analysis tools are available, but they require extensive human intervention and do not scale readily to the increasing size of the datasets. More effort has recently been put into developing tools for automated analysis of high-throughput flow data, but these are geared toward bioinformaticians.

We are taking advantage of the Galaxy framework to create a workspace for high-throughput flow cytometry data analysis that is easier to understand and more accessible for the average bench immunologist. We leveraged Galaxy’s innate ability to support multiple programming languages to develop a user-friendly analysis workflow supporting conversion of binary flow cytometry data to text and its manipulation, clustering analysis, and interactive visualization of the results. We have ported existing Immport tools written in R, C or Python to Galaxy, and created novel text manipulation tools in Python and interactive data visualization tools in JavaScript. These tools will be made freely available to the public through FlowGalaxy, which is deployed on an AWS Cloud instance.

Speakers
Cristel G. Thomas

Research Scientist, NG



1:40pm

Session 7
Session 7 features a mix of sponsor and accepted talks.

Moderators
Carrie Ganote

Indiana University


2:00pm

Outbreak surveillance and investigation using IRIDA and SNVPhyl
Slides    doi:10.7490/f1000research.1112590.1

Authors

Aaron Petkau (1), Franklin Bristow (1), Thomas Matthews (1), Josh Adam (1), Philip Mabon (1), Cameron Sieffert (1), Eric Enns (1), Jennifer Cabral (2), Joel Thiessen (2), Natalie Knox (1), Damion Dooley (3), Aleisha Reimer (1), Eduardo Taboada (6), Alex Keddy (7), Robert G. Beiko (7), William Hsiao (3,4), Morag Graham (1,2), Gary Van Domselaar (1,2), The IRIDA Consortium and Fiona Brinkman (5)

(1) National Microbiology Laboratory, Winnipeg, Canada
(2) University of Manitoba, Winnipeg, Canada
(3) BC Public Health Microbiology and Reference Laboratory, Vancouver, Canada
(4) University of British Columbia, Vancouver, Canada
(5) Simon Fraser University, Burnaby, Canada
(6) National Microbiology Laboratory, Lethbridge, Canada
(7) Dalhousie University, Halifax, Canada

Abstract
Modern epidemiological investigations of infectious disease outbreaks are transitioning to routinely incorporate Whole Genome Sequencing (WGS) data for microbial pathogens. WGS provides a wealth of information previously unavailable, enabling fine-level resolution of isolates using data from the entire genome, down to Single Nucleotide Variants (SNVs). However, the application of WGS for genomic epidemiology continues to be hindered by the complexities of data management and analysis, often requiring considerable expertise as data progresses from the sequencer into a final report.

Here, we present IRIDA (Integrated Rapid Infectious Disease Analysis), our platform for genomic epidemiology, and SNVPhyl (SNV Phylogenomics), our pipeline for SNV-based phylogenies. IRIDA stores and manages WGS data and associated epidemiological metadata; it executes analysis pipelines via an internal Galaxy instance and supports visualization and evaluation of results. IRIDA-managed data can also be incorporated into external tools, such as independent Galaxy installations, through a REST-like API. SNVPhyl enables the classification and clustering of bacterial isolates by identifying phylogenetically informative SNVs from sequence reads. SNVPhyl is distributed as a Galaxy workflow and suite of tools, enabling incorporation within independent Galaxy instances, batch execution via a provided command-line controller script, or execution as part of the larger IRIDA package.

IRIDA and SNVPhyl have shown considerable success within Canada as we transition towards routine sequencing for surveillance and outbreak investigations. With the help of the Galaxy community we have made significant improvements over previous years and IRIDA and SNVPhyl are now freely available at https://github.com/phac-nml/irida and http://snvphyl.readthedocs.org/.
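As a rough illustration of what "phylogenetically informative" can mean, the sketch below scans a set of already-aligned isolate sequences and keeps the columns that are parsimony-informative: at least two different bases each occur in at least two isolates. This is an assumed, simplified criterion and not SNVPhyl's code; the real pipeline works from sequence reads with mapping, variant calling and filtering steps. The function name and isolate names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def informative_positions(alignment: Dict[str, str]) -> List[int]:
    """Return 0-based columns where at least two bases each occur in at least two isolates."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    informative = []
    for i in range(length):
        # Count only unambiguous bases; gaps ('-') and N calls are skipped.
        counts = Counter(s[i].upper() for s in seqs if s[i].upper() in "ACGT")
        # A column is informative when two or more bases are each shared by >= 2 isolates.
        shared_states = sum(1 for n in counts.values() if n >= 2)
        if shared_states >= 2:
            informative.append(i)
    return informative


if __name__ == "__main__":
    # Hypothetical isolates; columns 1 and 7 split them into two groups of two.
    aln = {
        "isolate_A": "ACGTACGT",
        "isolate_B": "ACGTACGA",
        "isolate_C": "ATGTACGA",
        "isolate_D": "ATGTACGT",
    }
    print(informative_positions(aln))  # prints [1, 7]
```

A column where only one isolate differs (a singleton) is variable but not informative under this criterion, because it cannot group isolates together.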

Speakers
Aaron Petkau

Bioinformatician, Public Health Agency of Canada



2:20pm

Metavisitor, a suite of Galaxy tools and workflows for detection or discovery of viruses in NGS datasets
Slides    doi:10.7490/f1000research.1112468.1

Authors

Marius van den Beek, Institut de Biologie Paris Seine
Guillaume Carissimo, Institut Pasteur
Juliana Pegoraro, Institut de Biologie Paris Seine
Kenneth D Vernick, Institut Pasteur
Christophe Antoniewski, Institut de Biologie Paris Seine

Abstract
With the aim of providing biologists and medical doctors with accessible and adaptable software to detect and reconstruct viral genomes from sequencing datasets, we implemented in Galaxy a set of tools and workflows called Metavisitor. This suite can be used directly on our Mississippi server or installed on any Galaxy server instance. Using the graphical Galaxy workflow editor, the Metavisitor workflows can be adapted to suit specific needs by adding analysis steps or by replacing or modifying existing ones. Metavisitor works with DNA, RNA or small RNA sequencing data that provide different read lengths, and can use a combination of de novo and guided approaches to assemble viral genomes from sequencing reads. Thus, the software has the potential for quick diagnosis as well as discovery of viruses (or other pathogens) from a vast array of organisms. Importantly, we are working on an executable paper on how to use Metavisitor in various use cases, as well as on an Ansible-based procedure to easily deploy a Metavisitor Galaxy instance on available hardware. We hope that these development lines will increase the accessibility and transparency of Metavisitor and help researchers focus on biological or medical issues.

Speakers
Christophe Antoniewski

Head of ARTbio bioinformatics, CNRS - Institut de Biologie Paris Seine
Marius van den Beek

IBPS / Université Pierre et Marie Curie



2:40pm

Chemflow, chemometrics using Galaxy
Slides    doi:10.7490/f1000research.1112573.1

Authors

Virginie Rossard, INRA-LBE
Fabien Gogé, IRSTEA Montpellier
Eric Latrille, INRA-LBE
Jean-Michel Roger, IRSTEA Montpellier
Jean-Claude Boulet, INRA-SPO

Abstract
Infrared spectroscopy is widely used in academic research and industry as a simple, fast, cheap and safe measurement tool. Infrared data are displayed as spectra, and chemometrics is the science of extracting information from such spectra.

We are developing a comprehensive package which contains (1) a MOOC broadcast in September 2016; (2) a chemometric tool named ChemFlow, which is an application of Galaxy; and (3) a spectral database. We will focus on ChemFlow.

The required specifications were:
  • a free tool;
  • a tool which recycles code from Matlab, Scilab, R, Python and C;
  • a tool accessible via internet with new devices such as smartphones.


That's why we chose Galaxy. ChemFlow is being implemented with our own functions. It currently includes most of the processing tools: importing and converting our data, and running chemometric methods such as calibrations and classifications.

We are very satisfied with the performance of ChemFlow running on a server. Nevertheless, some issues had to be fixed, and others are still pending:

  • Speed performance was improved by switching the Galaxy server to Apache and PostgreSQL.
  • Hundreds of users are expected. We plan to deploy two 48-core servers, but we do not yet know how ChemFlow will behave with many users submitting small tasks.
  • The graphical toolbox in Galaxy is our main work in progress, and we are currently implementing several original visualisation tools, including tools based on R-Shiny.
  • The development of a dedicated Tool Shed is under discussion.


In summary, Galaxy is being used in a new domain, chemometrics, for a new user community, and will be a central platform for a new e-learning module delivered as a MOOC.


Speakers
Virginie Rossard

engineer, INRA-LBE
computer scientist, specialized in databases and development



3:15pm

P02: Science Gateways Community Institute
Poster    doi:10.7490/f1000research.1112592.1

Authors

Maytal Dahan, University of Texas at Austin
Sandra Gesing, University of Notre Dame
Linda B. Hayden, Elizabeth City State University
Katherine Lawrence, University of Michigan
Marlon E. Pierce, Indiana University
Nancy Wilkins-Diehr, The University of California, San Diego
Michael Zentner, Purdue University

Abstract 
Science gateways, also known as web portals, virtual research environments or virtual laboratories, are a fundamental part of today’s research landscape, but they can be difficult to develop in a sustainable fashion. This talk will provide an overview of the newly funded NSF Science Gateways Community Institute, which aims to address these challenges by offering services to, and building community among, the research communities developing gateways. The institute comprises five areas to support gateways throughout their lifecycle:
  •  Incubator will provide shared expertise in business and sustainability planning, cybersecurity, user interface design, and software engineering practices.
  • Extended Developer Support will provide expert developers for up to one year to projects that request assistance and demonstrate the potential to achieve the most significant impacts on their research communities.
  • Scientific Software Collaborative will offer a component-based, open-source, extensible framework for gateway design, integration, and services, including gateway hosting and capabilities for external developers to integrate their software into Institute offerings.
  • Community Engagement and Exchange will provide a forum for communication and shared experiences among gateway developers, user communities, within NSF, across federal agencies, and internationally. 
  • Workforce Development will increase the pipeline of gateway developers with training programs, including special emphasis on recruiting underrepresented minorities, and by helping universities form gateway support groups.
We envision close collaborations with gateway providers such as the Galaxy developer group to provide best practices for developers and use cases of real-world gateways to improve the experience and efficiency of developers and user communities.

Presenters
Suresh Marru

Indiana University
Nancy Wilkins-Diehr

Associate Director, San Diego Supercomputer Center
Science gateways and running


3:15pm

P04: Dynamic Tool Destination – A universal rule based job to destination mapper
Poster    doi:10.7490/f1000research.1112739.1

Authors

Eric Enns (1), Philip Mabon (1), Daniel Bouchard (2), Mark Iskander (2), Gary Van Domselaar (1,2)

(1) National Microbiology Laboratory, Winnipeg, Canada
(2) University of Manitoba, Winnipeg, Canada

Abstract
Galaxy has been in use at Canada’s federal public health laboratory, the National Microbiology Laboratory, since 2010. Before we adopted Galaxy, all bioinformatics tools were run manually on our cluster by bioinformaticians and a select few biologists. As we have transitioned to Galaxy as our primary computing environment, more biologists have been empowered to run their own analyses, which has increased the load on our cluster. To prevent job failures, Galaxy was configured to request a static amount of resources per tool, which was suboptimal.

Galaxy allows tools to use a dynamic destination, which permits resource optimization. A survey of available dynamic destinations revealed that each is specific to a single tool. Rather than develop a specific dynamic destination for every tool we have installed, our objective was to develop a universal dynamic destination solution that would work with every tool.

To this end, we have developed Dynamic Tool Destination (DTD), which is tool and destination independent. When DTD is set as the default destination in Galaxy’s job_conf.xml, it can replace all tool destinations. DTD matches tools to destinations using rules set up in its own configuration file. If any rules match, it applies them; if none match, either the tool-specific default or the DTD default destination is applied. The benefit is that once job_conf.xml defines all of your possible destinations, a new tool can be configured to use DTD on the fly, without restarting Galaxy.

Dynamic Tool Destination is freely available at https://github.com/phac-nml/dynamic-tool-destination
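The matching order described above (first matching rule, then a tool-specific default, then a global default) can be sketched in a few lines. This is a hypothetical illustration, not DTD's configuration format or code: the `Rule` class, the choice of total input size as the only criterion, and the tool ids and destination names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Rule:
    """Hypothetical rule: route a job by the total size of its inputs."""
    lower_bytes: int            # inclusive lower bound on total input size
    upper_bytes: Optional[int]  # exclusive upper bound; None means unbounded
    destination: str            # destination id assumed to be defined in job_conf.xml

    def matches(self, input_bytes: int) -> bool:
        if input_bytes < self.lower_bytes:
            return False
        return self.upper_bytes is None or input_bytes < self.upper_bytes


def map_destination(
    tool_id: str,
    input_bytes: int,
    rules: Dict[str, List[Rule]],
    tool_defaults: Dict[str, str],
    global_default: str,
) -> str:
    """Return the first matching rule's destination, else a tool or global default."""
    for rule in rules.get(tool_id, []):
        if rule.matches(input_bytes):
            return rule.destination
    # No rule matched: fall back to the tool-specific default, then the global one.
    return tool_defaults.get(tool_id, global_default)


if __name__ == "__main__":
    rules = {
        "bwa_mem": [
            Rule(0, 500_000_000, "cluster_small"),
            Rule(500_000_000, None, "cluster_large"),
        ]
    }
    tool_defaults = {"fastqc": "cluster_small"}
    print(map_destination("bwa_mem", 2_000_000_000, rules, tool_defaults, "local"))  # prints cluster_large
```

Keeping the rules in a plain data structure, separate from the job configuration, mirrors the benefit the abstract describes: adding a tool means adding rules, not editing the list of destinations.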

Presenters
Eric Enns

Senior Bioinformatician, Public Health Agency of Canada


3:15pm

P08: COMBAT TB Explorer, a TB data exploration workbench
Poster    doi:10.7490/f1000research.1112402.1

Authors

Peter van Heusden, Ziphozake Mashologu, Alan Christoffels, South African National Bioinformatics Institute

Abstract 
Tuberculosis (TB), an infectious disease caused by Mycobacterium tuberculosis, ranks as one of the leading causes of death worldwide, with the WHO recording 9.6 million people falling ill and 1.5 million deaths in 2014. This disease burden has arguably been matched by a continued increase in genomic, transcriptomic and proteomic data for Mycobacterium tuberculosis as a result of NGS technologies. This continued expansion of data is exemplified by the growth of data repositories such as the tuberculosis database (TBDB) and the Pathosystems Resource Integration Center (PATRICBRC). Unfortunately, these resources only present pre-computed data and do not provide a computational toolkit for biomedical researchers to analyze their own data. We have created the COMBAT TB Explorer, a Galaxy-based environment for annotating and exploring M. tuberculosis sequence data. COMBAT TB Explorer currently combines a genomic variant calling pipeline with a web-based tool for exploring the relationship between variants and known annotation, and allows the user to perform gene set enrichment analysis.

Presenters
Ziphozake Mashologu

Developer, UWC - South African National Bioinformatics Institute
I am part of the Software Development Team at SANBI, University of the Western Cape. I've been involved in building web and mobile applications for nearly a decade and have a strong interest in the Galaxy project.


3:15pm

P10: A resource for metabolomics and transcriptomics analysis
Poster    doi:10.7490/f1000research.1112741.1

Authors

Manhoi Hur, Iowa State University
Jason R. Miller, J Craig Venter Institute
Christopher D. Town (J Craig Venter Institute
Erik Ferlanti, J Craig Venter Institute
Irina Belyaeva, J Craig Venter Institute
Eve Syrkin Wurtele, Iowa State University

Abstract

PMR (Plant/Eukaryotic and Microbial Systems Resource) and its database are a community resource for deposition and analysis of metabolomics data and related transcriptomics data. PMR currently houses terabytes of data and metadata from over 25 species of eukaryotes, and provides a unique resource for computational modeling and hypothesis development. PMR’s web APIs enable PMR data and analytic functions to be integrated with other community resources. In this talk, we introduce the PMR database and illustrate its analytic tools. We present a proof of concept of the API's utility as a research science app, using Araport to provide Arabidopsis metabolomics data and functionality to diverse users.



3:15pm

P12: A Galaxy Workflow for the Generation of Synthetic FASTQ Samples
Poster    doi:10.7490/f1000research.1112743.1

Authors

Michael Ta, Philip D. Cotter, Mathew W. Moore, Bioinformatics Department, ResearchDx, Irvine CA, USA 

Abstract 
Developing and validating a bioinformatics pipeline for a clinical assay is frequently a costly process; in some cases it includes the acquisition of limited patient samples with extremely rare genotypes. To aid in the validation process, we have developed a Galaxy workflow to generate synthetic FASTQ files with known mutations provided by the user along with those sourced from dbSNP. Researchers can use these FASTQ files to mimic various mutation types including single nucleotide variants, insertions, deletions, translocations, and copy number variations. Researchers can optimize and evaluate the expected efficiency of their bioinformatics pipeline using synthetic samples simulated at various sequencing depths to mimic both germline and somatic events. In addition, changes to existing pipelines can easily be re-verified using static synthetic datasets as the gold standard reference. The workflow uses several open source tools to retrieve the genomic location of variants in HGVS notation (Transvar) and simulate reads from common sequencing platforms (ART). A custom sequencing profile can be provided to ART to simulate reads with a base quality and call rate similar to specific sequencing machines used in the lab. The workflow described here provides a solution to the regulatory requirement for validation and re-validation of clinical bioinformatics pipelines.
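The workflow itself relies on Transvar and ART. The underlying idea can be illustrated with a much simpler, hypothetical sketch: insert a known single nucleotide variant into a reference sequence, then sample error-free reads from the mutated sequence at a target depth and write them as FASTQ. Unlike ART, this sketch has no error model or platform profile; every base gets the same quality character, and all names are illustrative.

```python
import random
from typing import Iterator, Tuple


def apply_snv(reference: str, pos: int, alt: str) -> str:
    """Return `reference` with the single base at 0-based `pos` replaced by `alt`."""
    if not 0 <= pos < len(reference) or len(alt) != 1:
        raise ValueError("expected a single-base ALT inside the reference")
    return reference[:pos] + alt + reference[pos + 1:]


def simulate_reads(genome: str, read_len: int, depth: int, seed: int = 0,
                   qual_char: str = "I") -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) tuples sampled uniformly from `genome`.

    The read count is chosen so that reads * read_len is about depth * len(genome).
    No sequencing errors are simulated.
    """
    if read_len > len(genome):
        raise ValueError("read length exceeds genome length")
    rng = random.Random(seed)  # seeded so the synthetic dataset is reproducible
    n_reads = max(1, depth * len(genome) // read_len)
    for i in range(n_reads):
        start = rng.randint(0, len(genome) - read_len)  # randint bounds are inclusive
        yield f"read{i}", genome[start:start + read_len], qual_char * read_len


def to_fastq(records) -> str:
    """Format (name, seq, qual) records as four-line FASTQ entries."""
    return "".join(f"@{name}\n{seq}\n+\n{qual}\n" for name, seq, qual in records)
```

A fixed seed fits the re-validation use case described above: the same inputs regenerate the same synthetic dataset.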



3:15pm

P14: Outbreak surveillance and investigation using IRIDA and SNVPhyl
Poster    doi:10.7490/f1000research.1112511.1

Authors

Aaron Petkau (1), Franklin Bristow (1), Thomas Matthews (1), Josh Adam (1), Philip Mabon (1), Cameron Sieffert (1), Eric Enns (1), Jennifer Cabral (2), Joel Thiessen (2), Natalie Knox (1), Damion Dooley (3), Aleisha Reimer (1), Eduardo Taboada (6), Alex Keddy (7), Robert G. Beiko (7), William Hsiao (3,4), Morag Graham (1,2), Gary Van Domselaar (1,2), The IRIDA Consortium and Fiona Brinkman (5)

(1) National Microbiology Laboratory, Winnipeg, Canada
(2) University of Manitoba, Winnipeg, Canada
(3) BC Public Health Microbiology and Reference Laboratory, Vancouver, Canada
(4) University of British Columbia, Vancouver, Canada
(5) Simon Fraser University, Burnaby, Canada
(6) National Microbiology Laboratory, Lethbridge, Canada
(7) Dalhousie University, Halifax, Canada 

Abstract
Modern epidemiological investigations of infectious disease outbreaks are transitioning to routinely incorporate Whole Genome Sequencing (WGS) data for microbial pathogens. WGS provides a wealth of information previously unavailable, enabling fine-level resolution of isolates using data from the entire genome, down to Single Nucleotide Variants (SNVs). However, the application of WGS for genomic epidemiology continues to be hindered by the complexities of data management and analysis, often requiring considerable expertise as data progresses from the sequencer into a final report.

Here, we present IRIDA (Integrated Rapid Infectious Disease Analysis), our platform for genomic epidemiology, and SNVPhyl (SNV Phylogenomics), our pipeline for SNV-based phylogenies. IRIDA stores and manages WGS data and associated epidemiological metadata, executes analysis pipelines via an internal Galaxy instance, and provides visualization and evaluation of results. Capacity also exists for incorporation of IRIDA-managed data into external tools, such as independent Galaxy installations, through a REST-like API. SNVPhyl enables the classification and clustering of bacterial isolates by identifying phylogenetically informative SNVs from sequence reads. SNVPhyl is distributed as a Galaxy workflow and suite of tools, enabling incorporation within independent Galaxy instances, batch execution via a provided command-line controller script, or execution as part of the larger IRIDA package.

IRIDA and SNVPhyl have shown considerable success within Canada as we transition towards routine sequencing for surveillance and outbreak investigations. With the help of the Galaxy community we have made significant improvements over previous years and IRIDA and SNVPhyl are now freely available at https://github.com/phac-nml/irida and http://snvphyl.readthedocs.org/.
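One way to picture how SNV-based clustering separates isolates is a pairwise SNV distance matrix computed over an alignment of variable core positions. The sketch below assumes such an alignment is already available as equal-length strings, one per isolate, and skips positions where either isolate is ambiguous ('N') or missing ('-'). It is an illustration, not SNVPhyl's implementation; the isolate names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple


def snv_distance(a: str, b: str, missing: str = "N-") -> int:
    """Count positions where two aligned sequences differ, ignoring ambiguous/missing calls."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y and x not in missing and y not in missing)


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Symmetric pairwise SNV distances between all isolates in `alignment`."""
    dist: Dict[Tuple[str, str], int] = {name: 0 for name in alignment} and {}
    for name in alignment:
        dist[(name, name)] = 0
    for (n1, s1), (n2, s2) in combinations(alignment.items(), 2):
        d = snv_distance(s1, s2)
        dist[(n1, n2)] = dist[(n2, n1)] = d
    return dist
```

Isolates separated by only a few SNVs in such a matrix are the candidates one would examine together in an outbreak investigation.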

Presenters

Aaron Petkau

Bioinformatician, Public Health Agency of Canada


3:15pm

P16: Comparison of Metagenomics Taxonomy Assignment Methods: Popular Softwares and NCBI Mega-Blast
Poster    doi: 10.7490/f1000research.1112744.1

Authors
Huaiying Lin, Stefan Green, Pinal Kanabar, Neil Bahroos, and Mark Maienschein-Cline, University of Illinois at Chicago

Abstract
There are numerous taxonomy assignment tools available for metagenomics studies, but their performance has not been well studied by a third party on the same set of samples. Our goal is to measure the discrepancies and consistencies of several popular taxonomy classifiers on the same samples and to compare the results obtained with different sequencing technologies. In this study, 8 stool samples were collected and sequenced with both whole genome shotgun and 16S sequencing methods. We compared the consistency of taxonomy profiling using (1) five popular off-the-shelf taxonomy profiling packages: Mothur and QIIME for 16S amplicons, and GOTTCHA, MetaPhlAn2 and MetaPhyler for shotgun reads; and (2) MEGAN’s Lowest Common Ancestor (LCA) taxonomy profiling algorithm applied to NCBI Megablast output against three databases: nt (non-redundant nucleotide), nr (non-redundant amino acid) and 16S microbial nucleotide. From approach (1), we found that the taxonomic composition from 16S amplicons is a useful estimate of that from whole genome shotgun reads. Mothur, QIIME, MetaPhyler and MetaPhlAn2 showed similar clustering patterns, although GOTTCHA returned distinctive results. From approach (2), we found that, when comparing across sequencing/processing methods, shotgun reads are the most stable regardless of database type, while 16S reads show different beta diversity among the three databases. Comparing across databases, we found that shotgun reads have a higher beta diversity than 16S amplicons, and that the nt database gives the most divergent taxonomic profiles.
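MEGAN's LCA approach assigns each read to the lowest common ancestor of the taxa hit by its best-scoring alignments. A simplified sketch, assuming each hit is given as a bit score plus a root-to-leaf lineage, and using a "top percent" cutoff relative to the best bit score (MEGAN exposes a parameter of this kind; the value used here is illustrative). Min-support and other MEGAN filters are omitted, and the lineages are hypothetical.

```python
from typing import List, Sequence, Tuple

Lineage = Sequence[str]  # root-to-leaf taxon names, e.g. ("Bacteria", "Firmicutes", ...)


def lowest_common_ancestor(lineages: List[Lineage]) -> List[str]:
    """Return the longest lineage prefix shared by all input lineages."""
    if not lineages:
        return []
    lca: List[str] = []
    for ranks in zip(*lineages):  # zip stops at the shortest lineage
        if all(r == ranks[0] for r in ranks):
            lca.append(ranks[0])
        else:
            break
    return lca


def assign_read(hits: List[Tuple[float, Lineage]], top_percent: float = 10.0) -> List[str]:
    """Keep hits within `top_percent` of the best bit score, then take their LCA."""
    if not hits:
        return []
    best = max(score for score, _ in hits)
    cutoff = best * (1.0 - top_percent / 100.0)
    return lowest_common_ancestor([lin for score, lin in hits if score >= cutoff])
```

A wider cutoff admits more distant hits and pushes assignments toward the root, which is one reason profiles can differ between databases such as nt, nr and 16S.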



3:15pm

P18: Examining Genomic Variants in a Polymorphic Species
Poster    doi: 10.7490/f1000research.1112745.1

Authors 

Jennifer Callaway, Indiana State University
Rusty Gonser, Indiana State University
Elaina Tuttle, Indiana State University

Abstract 
Populations of individuals may now be sequenced due to new technological advances, allowing for studies into the correlation between the genome and behavior. Understanding the genome of a species is important for analyzing species adaptation and diversity. We seek to utilize genomic data to understand how variations in the genome are influencing behavior. We resequenced white-throated sparrows (Zonotrichia albicollis) with the Ion Torrent Personal Genome Machine (PGM) and identified variants, including single nucleotide polymorphisms, insertions, and deletions. Z. albicollis has two morphological variations due to a chromosomal rearrangement, resulting in different behaviors, including aggression, promiscuity, and nesting in diverse habitat types. To identify potential variants, we compared sequences to the NCBI reference genome of a tan male Z. albicollis. We hypothesize that genomic adaptations are correlated with phenotypic characteristics and expect phenotypic and genotypic differences to exist both within and between morphs due to individual differences. We further hypothesize greater genomic differences between morphs than within morphs due to reduced recombination within the rearrangement known to vary between morphs. We have currently identified 1172 unique variants in Z. albicollis (1088 SNPs, 18 insertions, 23 deletions, and 43 multiple nucleotide polymorphisms) within six sequenced individuals. Significantly more SNPs were identified than other variant types combined. Individual variants have also been identified which are unique to morph or sex. Understanding the genomic components driving behavior and local adaptation can improve management techniques for the conservation of species and habitat and improve understanding of how the genome impacts phenotypes, including behaviors and diseases.
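The breakdown into SNPs, insertions, deletions and multiple nucleotide polymorphisms can be reproduced from VCF-style REF/ALT alleles with a simple length-based rule. The sketch below is a simplification under that assumption (one ALT allele per record, no normalization of complex alleles) and is not the pipeline used in the study. The category counts reported above are consistent with the total: 1088 + 18 + 23 + 43 = 1172.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_variant(ref: str, alt: str) -> str:
    """Classify a biallelic REF/ALT pair by allele lengths (simplified)."""
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"
    return "insertion" if len(alt) > len(ref) else "deletion"


def tally(variants: Iterable[Tuple[str, str]]) -> Counter:
    """Count variant classes over (REF, ALT) pairs."""
    return Counter(classify_variant(ref, alt) for ref, alt in variants)
```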



3:15pm

P20: Analysis of small non-coding RNAs of poorly annotated species in Galaxy with the help of ortholog information of well annotated species
Poster    doi: 10.7490/f1000research.1112746.1

Authors 

Jochen Bick, ETH Zurich
Susanne E. Ulbrich, ETH Zurich
Stefan Bauersachs, ETH Zurich

Abstract 
The analysis of RNA-seq data with a basic analysis pipeline, including quality control, filtering, trimming and adapter clipping followed by mapping to a reference genome or transcriptome, is a straightforward task using Galaxy. For processing smallRNA-Seq data it is necessary to modify this analysis pipeline, because the resulting reads correspond, at least in theory, directly to a small RNA. This leads to a different mapping strategy using BLASTn for short sequences. An additional common problem is that the number of annotated small non-coding RNAs is very low for certain species, including pig and cattle, which makes it very difficult to annotate smallRNA data in such species. In humans, a much greater variety of non-coding RNAs is known than in other mammalian species, which gave us the idea to use the ortholog information of well-annotated species. Our workflow is mainly based on basic Galaxy tools and some in-house scripts. The idea is to use information from well-annotated related species to improve the annotation of each sequence found in smallRNA-Seq results. First, we use the basic analysis pipeline to check quality, filter, trim and clip the adapter sequence. Afterwards, we count and filter the unique sequence reads directly with a combination of different Galaxy tools plus additional tools developed in our group. These reads are aligned with BLASTn-short to all transcripts of our sequenced species, including non-coding RNAs, and of related well-annotated species. The collection of BLAST databases contains sequences from miRBase (precursor and mature microRNAs), sequences from NCBI and Ensembl (mostly non-coding RNAs but also protein-coding transcripts), as well as tRNA and piRNA cluster sequences. Finally, all BLAST results are filtered and joined, removing all duplicated hits. The annotated sequences are used for DEG analysis with edgeR and/or DESeq2.
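Two of the steps above, collapsing reads into unique sequences with counts and removing duplicated BLAST hits, can be sketched in a few lines. This is a hypothetical illustration, not the authors' in-house scripts: the minimum-count threshold is an assumed parameter, and "duplicated" is taken to mean the same read matching the same subject more than once, keeping the highest bit score.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def collapse_reads(sequences: Iterable[str], min_count: int = 2) -> Dict[str, int]:
    """Collapse identical reads into unique sequences with counts, dropping rare ones."""
    counts = Counter(sequences)
    return {seq: n for seq, n in counts.items() if n >= min_count}


def dedupe_hits(hits: Iterable[Tuple[str, str, float]]) -> List[Tuple[str, str, float]]:
    """Keep one hit per (query, subject) pair: the one with the highest bit score."""
    best: Dict[Tuple[str, str], float] = {}
    for query, subject, bitscore in hits:
        key = (query, subject)
        if key not in best or bitscore > best[key]:
            best[key] = bitscore
    return sorted((q, s, score) for (q, s), score in best.items())
```

Collapsing before BLAST also means each distinct small RNA sequence is searched only once, however many reads carry it.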

Presenters

Jochen Bick

ETH Zürich


3:15pm

P22: SeqResults for development of RNA-seq reagents
Poster    doi: 10.7490/f1000research.1112747.1

Authors

Timur Shtatland, NEB; Erbay Yigit, NEB; Keerthana Krishnan, NEB; Mehmet Karaca, NEB; Deyra N. Rodriguez, NEB; Eileen T. Dimalanta, NEB; Theodore B Davis, NEB; Bradley W. Langhorst, NEB

Abstract 
We have developed SeqResults to enable simple comparison of libraries across experiments. The Galaxy-integrated component captures metadata and results in a relational database. Results are available via a simple web site and Tableau visualizations. SeqResults has recently been extended with RNA-seq features. It aggregates simple metrics like fractions of reads on exons, introns and other genomic regions, average 5'-3' coverage and alignment efficiency. However, summary metrics are only part of the story. Accurate representation of transcript levels is important to any RNA-seq experiment. We present a simple interface to compare transcript levels as well as 5'-3' coverage profiles of individual transcripts across experiments. SeqResults now contains millions of individual results from 6841 libraries produced during development of NEBNext library preparation reagents.



3:15pm

P24: Sequence Data Analysis and the Clinical Genomics Database
Poster    doi:10.7490/f1000research.1112591.1

Authors

John H. Letaw, Carol Beadling, Julja Burchard, Charles Scott Dahl, Andrew Hadd, Douglas King, William Moore, Mandy Terrill, Jane Thanner, Richard Press, Sue Richards, Christopher L. Corless, Oscar Barney

Abstract 
Clinical laboratories have begun to offer a range of next-generation sequencing services to bring precision medicine to patients. Delivering secure, cost-effective, timely, clinically informative results from precision-medicine assays requires an end-to-end integrated system. At Oregon Health and Science University, we have designed such a system and reduced it to practice. We describe here the system, the steps of its implementation, and selected use cases spanning from the sequencing of a patient sample to the creation of a report that is handed off to the physician. The first step in the process is properly validating a single invocation of a clinical sequencing analysis workflow against known results. Second, these sequencing analyses are documented and certified in accordance with CLIA (Clinical Laboratory Improvement Amendments) and CAP (College of American Pathologists) regulations. Third, we engineered a robust process to encapsulate the certified analyses using the Galaxy platform as a workflow engine. Fourth, a team from our ITG (Information Technology Group) built a Clinical Genomics Database (CGD) that allows us to collect, annotate, and report results from automated, Galaxy-managed, CLIA/CAP-certified pipelines to clinicians in a convenient and organized manner. The CGD also collects and disseminates all current and relevant clinical genomics knowledge to provide decision support to our physicians and geneticists. Partnered with the Galaxy platform, the CGD has been successful in streamlining sequence data analysis. We illustrate benefits of our system to patients, clinicians, and researchers.



3:15pm

P26: National Resource for Translational and Developmental Proteomics Galaxy Portal
Poster    doi:10.7490/f1000research.1112748.1

Authors

Joseph Greer, Ryan Fellers, Richard LeDuc, Bryan Early, Alexandra Johanna VanNispen, Paul Thomas, Neil Kelleher 

Abstract 
The National Resource for Translational and Developmental Proteomics (NRTDP) at Northwestern University has developed a Galaxy-based top-down proteomics search portal. This portal gives academic researchers free access to the NRTDP’s most advanced search algorithms in an easy-to-use, structured way. Academic researchers request access to the portal through the NRTDP webpage.

The NRTDP has created a standard workflow that allows academics to easily search their instrument data files. This workflow utilizes NRTDP developed tools to convert instrument data files, search those files, estimate false discovery rate and create a search output tdReport. The tdReport is viewable in another free NRTDP software offering, the Top Down Viewer.

After the search has completed, the workflow adds confidently identified proteoforms to the Proteoform Repository at the Consortium for Top Down Proteomics. As of March, 2016, UniProt includes cross-references to this repository.

The portal is accessible through a local Galaxy instance hosted on Northwestern University Virtual Infrastructure, uses Pulsar (version 0.5.0) to connect to a Windows 10 Server and to Northwestern University’s HPC cluster environment and uses DRMAA to access the Northwestern University scheduler.
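The abstract does not describe how the NRTDP tools estimate the false discovery rate in the search workflow. A widely used approach in proteomics is target-decoy searching, where the FDR at a score threshold is estimated as the number of decoy matches divided by the number of target matches at or above that threshold. The sketch below assumes that simple estimator (other variants exist, e.g. for concatenated searches) and a higher-is-better score; it is an illustration, not the NRTDP implementation.

```python
from typing import List, Optional, Tuple

Match = Tuple[float, bool]  # (score, is_decoy); higher scores are better


def estimated_fdr(matches: List[Match], threshold: float) -> float:
    """Decoy/target ratio among matches scoring at or above `threshold`."""
    targets = sum(1 for score, decoy in matches if score >= threshold and not decoy)
    decoys = sum(1 for score, decoy in matches if score >= threshold and decoy)
    if targets == 0:
        return 0.0 if decoys == 0 else float("inf")  # only decoys pass: reject
    return decoys / targets


def score_cutoff(matches: List[Match], max_fdr: float = 0.01) -> Optional[float]:
    """Lowest observed score whose estimated FDR is within `max_fdr`, or None.

    Scanning upward returns the most permissive passing cutoff; because this
    estimate is not strictly monotonic, q-value methods are more robust.
    """
    for threshold in sorted({score for score, _ in matches}):
        if estimated_fdr(matches, threshold) <= max_fdr:
            return threshold
    return None
```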



3:15pm

P28: Aligning with Architecture for NGS
Poster

Authors

John C. Hoag, Ohio University

Abstract 
Designs for large-scale computing in commercial and even scientific settings, including virtual and cloud models, do not serve sequencing tasks well. This research is developed from the perspective of computation, networking, and storage, and it posits that NGS execution requirements drive infrastructure in a different direction. Otherwise, current environments limit inquiry to smaller models that may lack insight and capability; such cases are optimized by in-memory compute, which does not scale.

This research commences with an analysis and taxonomy of components and systems, noting how they are misaligned with pipelines and thus contribute to both latency and underutilization. The scope of this task includes caching, parallel processing, GPU usage and storage-area networking, as well as hypervisor types and their interaction with commodity cloud storage. NGS tools in this analysis include the Burrows-Wheeler Aligner and Novoalign.

The goal of this research is to develop an approach and infrastructure around the Galaxy Project with the intention of acquiring and operating assets to support research and instruction, sensitive to workloads and the need to scale. The project methodology is to discern functional and performance requirements, assess materiel on hand (based on a donated compute cluster), specify remaining components, and prepare for operations. Engagement with the Galaxy community is essential in this endeavor.


3:15pm

3:15pm

D02: The Genomics Virtual Lab - A turnkey Galaxy platform for the cloud
Authors
  • Enis Afgan - Johns Hopkins University
  • Clare Sloggett, Nuwan Goonasekera, Simon Gladman, Yousef Kowsar, Andrew Lonie - Victorian Life Sciences Computation Initiative (VLSCI), University of Melbourne
  • Igor Makunin, Derek Benson, Michael Pheasant, Ron Horst - Research Computing Centre, University of Queensland.
  • Mark Crowe - Queensland Facility for Advanced Bioinformatics (QFAB), University of Queensland

Abstract

Australia’s researchers have access to a national cloud (NeCTAR) comprising 30,000 cores. To maximize the utility of this cloud to the growing number of genomics researchers, we have developed the Genomics Virtual Laboratory (GVL). The GVL allows anyone to create and launch a scalable, flexible personal cluster with all the key components preconfigured, which can be deployed on OpenStack, Amazon and soon Google. It supports three user interfaces: the browser, SSH command line, and VNC desktop. Preinstalled web applications include Galaxy, RStudio, JupyterHub, PacBio SMRT Portal, and Web Apollo. Under the hood, Linuxbrew provides hundreds of command line and GUI bioinformatics tools. All in all, the GVL provides a turn-key solution for most bioinformatics tasks on the cloud.

In our GVL demonstration we will:
  • Launch different flavours of the GVL instance on various clouds
  • Show the various components of these GVLs
  • Import data from GenomeSpace into Galaxy
  • Use tools installed via Galaxy on the command line
  • Access the Galaxy filesystem from the command line
  • Describe how custom flavours of the GVL can be built via Ansible scripts
  • Discuss the success of GVL as a training platform

Presenters

Simon Gladman

Bioinformatician, VLSCI / University of Melbourne

3:15pm

D04: Managing Galaxy services with Ansible and GalaxyKickstart
Authors
Marius van den Beek, Christophe Antoniewski, http://artbio.fr 

Abstract
A production-grade Galaxy server has many different parts that need to be set up and configured. These include a relational database, a proxy server, libraries to interface with computing clusters, Galaxy itself and the tools that are required for a Galaxy instance.

Ansible is a configuration management system for the configuration and maintenance of Unix machines. In Ansible, a set of actions is organized into roles, and roles can be used in plays. The Galaxy team provides a number of roles and playbooks, which can be used as building blocks to set up one or more production servers. We have developed an Ansible playbook that sets up thematic Galaxy servers for different research communities by including specific tools, workflows, references, etc. We will explain how to keep server configurations up to date and online, and how one can use various types of infrastructure to build a well-controlled and tested Galaxy server whose deployed iterations can be followed in a version control system.
https://artbio.github.io/ansible-artimed/ Licensed under GNU GPL
 

Presenters

Marius van den Beek

IBPS / Université Pierre et Marie Curie

3:15pm

D06: BLAST with JBrowse and Galaxy
Authors
  • Eric Yao, JBrowse Lead Developer, CA Institute of Quantitative Biosciences at UC Berkeley
  • Ian Holmes, Professor of Bioengineering, Department of Bioengineering, University of California, Berkeley
  • Monica Muñoz-Torres, Berkeley Bioinformatics Open-source Projects Lawrence Berkeley National Laboratory
  • Colin Diesh, University of Missouri
  • Eric Rasche, Department of Biochemistry and Biophysics, Texas A&M University
  • Nathan Dunn, Berkeley Bioinformatics Open-source Projects Lawrence Berkeley National Laboratory
  • Suzi Lewis,  Berkeley Bioinformatics Open-source Projects Lawrence Berkeley National Laboratory

Abstract

This presentation will demonstrate BLAST analysis with an integration of JBrowse and Galaxy. The integrated platform uses a new, extensible JBrowse server (a Node-based web service framework) with Galaxy and BLAST modules as installable components. This project is intended to serve as an example for others developing JBrowse plugin-based extensions backed by Galaxy tools. The platform hosts an extensible REST API and pub/sub messaging integrated with JBrowse, and has its own job queue and a policy engine supporting OAuth.


3:15pm

D08: MiModD - a streamlined tool suite for genetic variant identification and mapping with Galaxy
Authors
Wolfgang Maier, Mark Seifert, Katharina Moos, Ralf Baumeister, University of Freiburg

Abstract
MiModD (http://www.celegans.de/mimodd) is a GPLv3-licensed comprehensive tool suite for variant mapping and identification. It extends ideas and concepts found in CloudMap, for which it can serve as a drop-in replacement. Package highlights include: i) a fully integrated mapping-by-sequencing analysis pipeline without external dependencies, ii) multisample variant calling and filtering for improved call statistics and straightforward variant identification, and iii) the NacreousMap linkage analysis and plotting engine, which is fully compatible with, but improves on, CloudMap. While MiModD can be installed and used as a standalone package for command line use, it features a full set of tool wrappers for seamless integration into Galaxy and is also available from the Galaxy Main Tool Shed (https://toolshed.g2.bx.psu.edu/view/wolma/mimodd). For users who just want to take advantage of the NacreousMap plotting engine to replot their existing CloudMap analyses, we host a dedicated public Galaxy server at http://mapping-by-sequencing.vm.uni-freiburg.de:8080.

Presenters

Wolfgang Maier

University of Freiburg

3:15pm

D10: PMR (Plant/Eukaryotic and Microbial Systems Resource) and its Science app using Araport
Authors
Manhoi Hur, Iowa State University
Jason R. Miller, J Craig Venter Institute
Christopher D. Town, J Craig Venter Institute
Erik Ferlanti, J Craig Venter Institute
Irina Belyaeva, J Craig Venter Institute
Eve S. Wurtele, Iowa State University

Abstract

PMR (Plant/Eukaryotic and Microbial Systems Resource) uses a RESTful web API to share omics data and its functionality with researchers. In this demo, I will show how to use PMR and its statistical analyses, including volcano plots, co-analysis of transcriptomic and metabolomic data, and data exploration. In addition, we present a proof-of-concept for the utility of the API as a research science app called PMR Plotter (v0.8), using Araport to provide and visualize Arabidopsis metabolomics data.
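A volcano plot places each feature at its log2 fold change (x axis) against -log10 of its p-value (y axis). Below is a minimal sketch of computing those coordinates and labelling points, assuming p-values have already been computed by some statistical test; the pseudocount and the cutoffs (|log2FC| >= 1, p < 0.05) are illustrative choices, not PMR's settings.

```python
import math
from statistics import mean
from typing import Sequence, Tuple


def volcano_point(case: Sequence[float], control: Sequence[float], p_value: float,
                  pseudocount: float = 1e-9) -> Tuple[float, float]:
    """Return (log2 fold change, -log10 p) for one feature."""
    fold_change = (mean(case) + pseudocount) / (mean(control) + pseudocount)
    return math.log2(fold_change), -math.log10(p_value)


def label(point: Tuple[float, float], lfc_cut: float = 1.0, p_cut: float = 0.05) -> str:
    """'up', 'down' or 'ns' (not significant) using fold-change and p-value cutoffs."""
    lfc, neg_log_p = point
    if neg_log_p <= -math.log10(p_cut) or abs(lfc) < lfc_cut:
        return "ns"
    return "up" if lfc > 0 else "down"
```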


3:15pm

3:15pm

4:30pm

Lightning Talks
The call for Lightning Talks will go out shortly before GCC2016 events begin.

4:30pm

Session 8
Lightning Talks and conference close.  

The call for lightning talks will go out just before GCC2016 events start.

Moderators

Nancy Wilkins-Diehr

Associate Director, San Diego Supercomputer Center
Science gateways and running


4:33pm

Annotation integration of M. tuberculosis data using the Neo4j graph database
Slides    doi:10.7490/f1000research.1112749.1

Authors

Peter van Heusden

Abstract
At SANBI we are building a database to integrate annotation related to M. tuberculosis using the Neo4j graph database as a storage platform. We will present the construction of this database, demonstrate some sample queries using the Cypher graph query language, show how Neo4j graph databases can be integrated with Galaxy, and mention some strengths and weaknesses of the Neo4j graph database.



4:40pm

Galaxy & Docker & Users

Slides    doi: 10.7490/f1000research.1112750.1

Authors
Abdulrahman Azab
Björn Grüning

Abstract
This talk is relevant mainly for advanced developers and sysadmins who wish to support Docker on their systems but are skeptical about Docker's security. It is also relevant for running Galaxy in production on top of an HPC system.

We show how to configure the system to run Docker containers as the local user, simply and quickly, without having to worry about, e.g., connecting to LDAP from containers.
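The abstract does not detail the configuration presented in the talk. As a minimal illustration of the general idea, a job runner can pass the submitting user's numeric UID and GID to `docker run --user`, so files written to a bind-mounted working directory are owned by that user on the host. The helper below only builds the command line (it does not run Docker); its name, the image and the mount layout are assumptions. A numeric UID does not have to exist in the container's /etc/passwd, so no directory lookup is needed inside the container, although some tools may warn when the UID has no name.

```python
import os
import shlex
from typing import List, Optional, Sequence


def docker_run_as_user(image: str, command: Sequence[str], workdir: str,
                       uid: Optional[int] = None, gid: Optional[int] = None) -> List[str]:
    """Build a `docker run` argument list that runs `command` as the given (or current) user."""
    uid = os.getuid() if uid is None else uid  # os.getuid/os.getgid are Unix-only
    gid = os.getgid() if gid is None else gid
    return [
        "docker", "run", "--rm",
        "--user", f"{uid}:{gid}",       # numeric IDs: no passwd/LDAP entry needed in the container
        "-v", f"{workdir}:{workdir}",   # bind-mount the job directory at the same path
        "-w", workdir,                  # start in the job directory
        image, *command,
    ]


if __name__ == "__main__":
    args = docker_run_as_user("ubuntu:16.04", ["touch", "out.txt"], "/tmp/job1", uid=1000, gid=1000)
    print(shlex.join(args))
```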



Speakers

Björn Grüning

University of Freiburg

Presenters

Abdulrahman Azab

Head Engineer, University of Oslo


4:47pm

Dynamic Tool Destination - A Universal Rule Based Job to Destination Mapper

Slides    doi: 10.7490/f1000research.1112751.1

As use of Galaxy increases and computational resources are continuously busy, it becomes important to optimize resource usage. To address this issue, we have developed Dynamic Tool Destination (DTD), a dynamic job destination that works with all tools and destinations. In DTD, an administrator sets up rules for each tool in a YAML file; these rules define which destination a tool should go to when particular parameters are present, when the input data is large or small, and so on. DTD is open source under the Apache License and is available on GitHub at https://github.com/phac-nml/dynamic-tool-destination
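The rule-based mapping described above can be sketched as a first-match search over per-tool rules, falling back to a default destination. This is a hypothetical illustration in Python: the `Rule` fields, the tool ID and the destination names are assumptions, and DTD's actual YAML schema and matching semantics may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Rule:
    destination: str
    min_input_mb: float = 0.0
    max_input_mb: float = float("inf")
    required_params: Optional[Dict[str, str]] = None  # param name -> required value

    def matches(self, input_mb: float, params: Dict[str, str]) -> bool:
        """True if the input size is in [min, max) and all required params match."""
        if not self.min_input_mb <= input_mb < self.max_input_mb:
            return False
        return all(params.get(k) == v for k, v in (self.required_params or {}).items())


def choose_destination(tool_id: str, input_mb: float, params: Dict[str, str],
                       rules: Dict[str, List[Rule]], default: str) -> str:
    """Return the destination of the first matching rule for `tool_id`, else `default`."""
    for rule in rules.get(tool_id, []):
        if rule.matches(input_mb, params):
            return rule.destination
    return default


# Hypothetical rules for one tool; the most specific rule is listed first.
RULES = {
    "bwa_mem": [
        Rule("cluster_highmem", required_params={"algorithm": "mem_large"}),
        Rule("cluster_large", min_input_mb=1024),
        Rule("local_small", max_input_mb=1024),
    ],
}
```

Because the first matching rule wins, more specific rules should be listed before broader ones; tools without rules fall through to the default destination.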


Presenters

Eric Enns

Senior Bioinformatician, Public Health Agency of Canada


4:54pm

Applied Bioinformatics - Interdisciplinary Curriculum Built on Open Source Technology

Slides    doi: 10.7490/f1000research.1112752.1

Classic bioinformatics curricula are limited by relatively rigid course compartmentalization, the use of expensive proprietary IT/bioinformatics tools, and a limited grading system as the outcome of completing a course. Here we present a curriculum infused with real-life, research-based projects such as whole genome analysis, gene expression arrays and molecular dynamics, applied to aging, cancer and pharmacogenomics. These projects serve as pivotal points for integrating biomedical science, computer science and statistics into one coherent interdisciplinary subject known as bioinformatics. Each project has scientific objectives that serve as an underlying platform for educational goals. Students join the projects after completing a basic course that familiarizes them with the technical and scientific aspects of the projects. The curriculum is based on 100% open source, cutting-edge, evolving technology. This allows students to be taught the most current technology at a fraction of the price of proprietary software. The use of real-life projects brings the excitement of involvement in pertinent discoveries and facilitates learning and open sharing of ideas. As the outcome of completing the projects, students develop the skills, knowledge, and hands-on experience that will make them competitive in today's intensive and rapidly changing field of computational biology.




5:01pm

Embracing Complexity and Diversity: Metaproteomics Within The Galaxy Framework.

Slides    doi:10.7490/f1000research.1112753.1

Metaproteomics characterizes proteins expressed by microorganism communities (microbiome) present in environmental samples or a host organism. Mass spectrometry (MS)-based metaproteomics has catalyzed new discoveries into the functional dynamics of microbiomes (Wilmes et al 2015, doi: 10.1002/pmic.201500183). Metaproteomic informatics is distinctly challenging due to the large databases and complex processing steps involved. This challenge limits widespread use of metaproteomics. Through modular workflows, we demonstrate the use of the Galaxy bioinformatics framework as a metaproteomic informatics solution (Jagtap et al 2015; doi: 10.1002/pmic.201500074). The workflow output results are compatible with tools for taxonomic and functional characterization (Unipept and MEGAN5). MEGAN5 was used to generate functional characterization of the metaproteome using Inter2Pro pathway analysis. These workflows enable new discoveries from diverse communities such as dental plaques (Rudney et al 2015, doi: 10.1186/s40168-015-0136-z), bronchoalveolar lavage fluid (BALF), lung tissue, and cervical-vaginal fluid (CVF). Our results demonstrate the power of discovery metaproteomics to add functional understanding to microbiomes, beyond what is possible using traditional metagenomic approaches.


Speakers

Pratik Jagtap

Center for Mass Spectrometry and Proteomics, University of Minnesota



5:08pm

Distributing Galaxy Data Through CVMFS

Slides    doi: 10.7490/f1000research.1112754.1

GenAP is a Canadian platform that provides Galaxy instances across different Canadian HPC centers. With more than 7 TB of reference genomes, replicating this data in all HPC centers becomes expensive, and the copies are hard to keep in sync. The CernVM File System (CVMFS) allows us to centralize provisioning, replicate the data and distribute genome references on demand. With CVMFS, the local machine only fetches the genomes needed by the job being run, so the HPC centers need only minimal storage.


Presenters

David Morais

Bioinformatics specialist, Compute Canada


5:15pm

A Generic Circos Galaxy Tool

Slides    doi: 10.7490/f1000research.1112755.1

Circos is a favourite tool of biologists for production-quality plots; however, there is an extremely large activation energy in building the initial plots due to Circos' steep learning curve. We have worked on developing a generic and easily configurable Galaxy tool for generating Circos plots, which also provides the generated configuration files to allow further tweaking and customization after the fact. We have made the tool publicly available during development and have already received contributions during the GCC2016 Hackathon.


Moderators

Eric Rasche

Sysadmin / Bioinformatician, Center for Phage Technology



5:22pm

Integrating workflow support for GenomeSpace into Galaxy

Slides    doi:10.7490/f1000research.1112756.1


The GenomeSpace importer/exporter itself has been rewritten as a standalone, pip-installable tool, available here: https://github.com/gvlproject/python-genomespaceclient. We hope to transfer that code back into GenomeSpace or Galaxy as a set of Python bindings plus a command-line client for GenomeSpace.

A 3-minute video showing how it works is available here:
https://www.youtube.com/watch?v=5QPtWS_ab0I

See https://github.com/galaxyproject/galaxy/pull/1814

 


Instructors

Nuwan Goonasekera

VLSCI / University of Melbourne



5:29pm

Tool Framework Developments

Slides

This talk is aimed at Galaxy tool developers and will serve as an overview of the largest and most relevant Galaxy tool development framework changes over the past year.


Speakers

John Chilton

Galaxy Project, Penn State University



5:36pm

The Monarch Initiative and Phenopackets Wrapped

Slides    doi: 10.7490/f1000research.1112757.1

Monarch (https://monarchinitiative.org) integrates a variety of genomic, phenotypic, and disease data by leveraging ontologies to create relationships across multiple organisms.   
Using Planemo, we have (quickly!) created a Galaxy tool that wraps the web services exposed by Monarch, including the Phenopacket implementation.

Please let us know how we can improve on this first cut. We look forward to getting your feedback.


Moderators

Suzanna Lewis

Lawrence Berkeley National Laboratory

Presenters

Nathan Dunn

Lead Software Engineer, Lawrence Berkeley National Laboratory
I primarily work on the Apollo project, a web-based genome editor used for real-time collaborative manual curation, AKA Google Docs for genome editing. Apollo is built using JBrowse as our genomic viewer. http://genomearchitect.org https://github.com/GMOD/Apollo We use Grails + GWT + Angular (in addition to the JBrowse stack). I'm a scientific programmer having worked in a variety of domains including biology, psychology, automated speech...


5:43pm

5:50pm

7:00pm

Conference Dinner
The conference dinner is included in your conference registration.