Galaxy Community Conference 2016 (GCC16): Full Schedule

Visit the main conference website: http://galaxyproject.org/gcc2016

8:00am EDT

Conference desk open

Tuesday June 28, 2016 8:00am - 6:00pm EDT
IMU Alumni Hall

A. Help E.x Conference - All

9:00am EDT

Opening and Welcome

Opening comments and welcome to the 2016 Galaxy Community Conference.

Speakers

Robert Ping

Program Management Specialist, Indiana University

In 2024, he accepted program management responsibilities for Jetstream2 and the Midwest Research Computing and Data Consortium where he will facilitate the success of the National Science Foundation-sponsored programs. As RDA-US Program Manager, he oversees the multiple projects within...

Tuesday June 28, 2016 9:00am - 9:15am EDT
IMU Alumni Hall

E.x Conference - All

9:00am EDT

Session 1

Session 1 features the Keynote Address by Yoav Gilad of the University of Chicago, and two accepted talks from the Galaxy Community.

Moderators

Robert Ping

Program Management Specialist, Indiana University

Tuesday June 28, 2016 9:00am - 10:40am EDT
IMU Alumni Hall

E.x Conference - All

9:15am EDT

Keynote: Genomic variation. Impact of regulatory variation from RNA to protein

→ Slides doi:10.7490/f1000research.1112708.1

Noncoding variants play a central role in the genetics of complex traits, but we still lack a full understanding of the molecular pathways through which they act. We quantified the contribution of cis-acting genetic effects at all major stages of gene regulation from chromatin to proteins, in Yoruba lymphoblastoid cell lines (LCLs). We found that most QTLs are associated with transcript expression levels, with consequent effects on ribosome and protein levels. However, eQTLs tend to have significantly reduced effect sizes on protein levels, which suggests that their potential impact on downstream phenotypes is often attenuated or buffered. Additionally, we identified a class of cis QTLs that affect protein abundance with little or no effect on messenger RNA or ribosome levels, which suggests that they may arise from differences in posttranslational regulation. Overall, about 65% of eQTLs have primary effects on chromatin, whereas the remaining eQTLs are enriched in transcribed regions. Using a novel method, we also detected 2893 splicing QTLs, most of which have little or no effect on gene-level expression. These splicing QTLs are major contributors to complex traits, roughly on a par with variants that affect gene expression levels. Our study provides a comprehensive view of the mechanisms linking genetic variation to variation in human gene regulation.

Speakers

Yoav Gilad

University of Chicago

The keynote speaker will be Dr. Yoav Gilad, a professor of human genetics at the University of Chicago. Dr. Gilad earned a PhD in molecular genetics from the Weizmann Institute of Science in Israel, and completed an EMBO postdoctoral fellowship training at Yale University.

S1 T1 Galaxy 2016 pdf

Tuesday June 28, 2016 9:15am - 10:00am EDT
IMU Alumni Hall

E.1 Conference - Talks E.x Conference - All

10:00am EDT

Proteogenomics in Galaxy: Identifying novel ‘constellations’ of proteoforms using transcriptomic and proteomic data.

→ Slides doi:10.7490/f1000research.1112709.1
→ Video

Authors:
Pratik Jagtap, University of Minnesota
Getiria Onsongo, University of Minnesota
Candace Guerrero, University of Minnesota
James Johnson, University of Minnesota
Thomas McGowan, University of Minnesota
Matthew Andrews, University of Minnesota-Duluth
Timothy Griffin, University of Minnesota

Abstract
Proteogenomics has emerged as an effective approach for identifying novel proteoforms and improving genome annotation. For example, matching mass spectrometry proteomic data to customized sample-specific RNASeq-derived databases facilitates identification of previously unidentified peptides. Proteogenomic identification of such peptides, however, requires greater scrutiny to qualify them as bona fide novel proteoform candidates.

In order to address these challenges we have developed a blueprint of modular Galaxy workflows (doi: 10.1021/pr500812t). These include a) database generation from RNASeq (doi: 10.1186/1471-2164-15-703) or cDNA datasets; b) database search strategies that improve sensitivity of peptide spectral matches (doi: 10.1002/pmic.201200352); c) filtering tools for quality control; and d) modules for visualization and interpretation of results.

These Galaxy workflows were used in several studies to provide biological insights. In a fractionated human salivary dataset, we identified multiple, novel peptides that mapped to the basic proline-rich proteins (PRB1 and PRB2) located on chromosome 12. In a quantitative study of heart muscle (doi: 10.1021/acs.jproteome.5b00575) and skeletal muscle protein expression (doi: 10.1021/acs.jproteome.5b01138) during hibernation in the 13-lined ground squirrel, researchers were able to identify peptides corresponding to previously uncharacterized proteins. Identification of these peptides allowed for improved genomic annotation of this non-model organism and provided insights into muscle physiology during hibernation.

We will present recent improvements by the Galaxy-P team to the above-described blueprint workflows. These include development of the Multi-Omics Visualization Platform (MVP) Galaxy plugin, which facilitates viewing novel peptide sequences in the context of reference genome sequences and RNASeq data, enabling interpretation and hypothesis generation for testing to understand biological significance.

Speakers

Pratik Jagtap

Research Assistant Professor, University of Minnesota

Metaproteomics . DIA . Proteogenomics

S1 T2 Bloomington GCC 2016 pdf

Tuesday June 28, 2016 10:00am - 10:20am EDT
IMU Alumni Hall

E.1 Conference - Talks E.x Conference - All

10:20am EDT

An Interactive Tool for Reproducible Analysis of Affinity Proteomics Data

→ Slides doi:10.7490/f1000research.1112710.1
→ Video

Authors:
Brent M. Kuenzi (1), Adam Borne (2), Jiannong Li (3), Eric B. Haura (2), John Koomen (4), Paul A. Stewart (2), Uwe Rix (1)

Departments of (1) Drug Discovery, (2) Thoracic Oncology, (3) Biostatistics Core Facility, (4) Molecular Oncology, Moffitt Cancer Center, Tampa, FL 33612

Abstract
Understanding protein interactions and how they are altered in cancer is crucial for identifying new drug targets. Purification methods such as tandem affinity purification, affinity enrichment of labeled baits, and drug affinity chromatography have all been combined with mass spectrometry (affinity purification MS or AP-MS) to study protein interactions and complexes in cancer. However, if the scientist (e.g. a bench biologist or analytical chemist) lacks a computational background, then managing large proteomics datasets can be challenging, manually formatting data for input into analysis software can be error-prone, and data visualization involving dozens of variables can be laborious. These difficulties presented an opportunity to develop a solution that could move data from unprocessed AP-MS results to publication-quality figures in a single workflow. Here, we present Automated Processing of SAINT Templated Layouts (APOSTL), a Galaxy-based analysis pipeline for reproducible analysis of AP-MS data, and we demonstrate that this application streamlines the AP-MS data analysis workflow, improving both efficiency and consistency of the process. APOSTL utilizes Significance Analysis of INTeractome (SAINT), popular command-line software for analyzing AP-MS data. APOSTL can process AP-MS results from both MaxQuant and Scaffold, two widely used proteomics software packages, and APOSTL can create a number of publication-quality visualizations including interactive bubble plots, protein-protein interaction networks through Cytoscape.js integration, and pathway enrichment/gene ontology plots. All visualizations are accomplished through Shiny, an interactive and open-source visualization package for the R programming language. APOSTL is open-source software released under GPLv3, and it is freely available on the Galaxy Tool Shed and GitHub.

Speakers

Paul A. Stewart

Moffitt Cancer Center

S1 T3 stewart gcc presentation 06 27 16 pdf

Tuesday June 28, 2016 10:20am - 10:40am EDT
IMU Alumni Hall

E.1 Conference - Talks E.x Conference - All

11:10am EDT

Coevolution of biological research and cyberinfrastructure from now till the end of Moore’s Law

→ Slides doi:10.7490/f1000research.1112711.1
→ Video

Speakers

Craig A. Stewart

Indiana University

S2 T1 Stewart CA Jetstream Galaxy 2016 jun 28R1 pdf

Tuesday June 28, 2016 11:10am - 11:40am EDT
IMU Alumni Hall

E.1 Conference - Talks E.x Conference - All

11:10am EDT

Session 2

Session 2 features a talk from IU about what's happening here to support data-intensive science (and how Galaxy fits into that), plus 3 accepted talks.

Moderators

Tea Meulia

The Ohio State University

Tuesday June 28, 2016 11:10am - 12:40pm EDT
IMU Alumni Hall

E.x Conference - All

11:40am EDT

Sample Size Does Matter: Scaling Up Analysis in Galaxy with Metagenomics

→ Slides doi:10.7490/f1000research.1112712.1
→ Video

Authors

Daniel Blankenberg, Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA
Sarah J. Carnahan-Craig, Department of Biology, Penn State University, University Park, PA

Abstract
Metagenomics provides an exciting opportunity to begin to explore large-scale multiple sample analysis with Galaxy. As part of an obesity study, we have obtained over 400 buccal and stool samples from mother-child pairs. These samples have been subjected to 16S RNA extraction and sequencing on a MiSeq instrument. While sequencing 400 samples is no small feat, once the data are generated, their analysis reveals itself as a crippling bottleneck.

Galaxy provides researchers with a vast quantity of tools and methods to analyze a wide array of data, and makes connecting any number of tools together easy via Workflows. Although running a workflow individually over a handful of samples is approachable, how does one deal with 10, 20, or even 100 samples without becoming frustrated, introducing errors, breaking their mouse, or falling back to writing an API script? While Dataset Collection functionality provides a significant portion of a solution to this problem, there are still major hurdles that need to be overcome before Galaxy is usable for large multiple sample analysis.

Here we describe a generalizable metagenomic pipeline as implemented within Galaxy that is able to handle the simultaneous analysis of over 5,000 Human Microbiome Project samples. In addition to integrating a number of third-party algorithms and toolsets, some requiring the creation of upstream fixes and enhancements, we have developed new tools and approaches for dealing with large collections of data. Furthermore, we discuss the problems encountered using Galaxy at a large scale, what has been done to overcome these issues, as well as initial results.

Speakers

Daniel Blankenberg

Assistant Professor, Genomic Medicine Institute, Cleveland Clinic Lerner Research Institute

blankenberg gcc 2016 scaling up with metagenomics pdf

Tuesday June 28, 2016 11:40am - 12:00pm EDT
IMU Alumni Hall

E.1 Conference - Talks E.x Conference - All

12:00pm EDT

FROGS: Find Rapidly OTU with Galaxy Solution

→ Slides doi:10.7490/f1000research.1112713.1
→ Video

Authors

Frederic ESCUDIE, INRA Toulouse
Lucas AUER, INRA Toulouse
Maria BERNARD, INRA Jouy-en-Josas
Laurent CAUQUIL, INRA Toulouse
Katia VIDAL, INRA Toulouse
Sarah MAMAN, INRA Toulouse
Mahendra MARIADASSOU, INRA Jouy-en-Josas
Guillermina HERNANDEZ-RAQUET, INRA Toulouse
Geraldine PASCAL, INRA Toulouse

Abstract
High-throughput sequencing of 16S/18S/23S RNA amplicons has opened new horizons in the study of microbe communities. With sequencing at great depth, current processing pipelines struggle to run rapidly, and the most effective solutions are often designed for specialists. These tools are designed to give both the abundance table of operational taxonomic units (OTUs) and their taxonomic affiliation. In this context we developed the pipeline FROGS: « Find Rapidly OTU with Galaxy Solution », designed for biologists and running on the Galaxy platform.

A preprocessing tool merges paired sequences into contigs with flash, cleans the data with cutadapt, deletes the chimeras with VSEARCH combined with a cross-validation method, and dereplicates sequences with a home-made Python script. The clustering tool runs SWARM, which uses a local clustering threshold rather than the global clustering threshold used by other software. The affiliation tool returns the taxonomic affiliation of each OTU using both RDPClassifier and NCBIBlast+ on different databases (Silva, Greengenes). Finally, the post-processing tool lets users process the abundance table with user-specified filters and provides statistical results and numerous graphical illustrations of these data.

FROGS has been developed to be very fast even on large amounts of 454/HiSeq/MiSeq data by using cutting-edge tools and an optimized design. It is also portable to all Galaxy platforms. FROGS was tested on numerous simulated datasets. The tool proved extremely rapid, robust and highly sensitive for OTU detection, with very few false positives compared to other pipelines widely used by the community.

Speakers

Yvan Le Bras

Research engineer, French National Museum of Natural History

S2 T3 FROGS GCC 2016 pdf

Tuesday June 28, 2016 12:00pm - 12:20pm EDT
IMU Alumni Hall

E.1 Conference - Talks E.x Conference - All

12:20pm EDT

Ktoolu and idFusion - Galaxy Solutions for Plant Immunity and Pathogen Informatics

→ Slides doi:10.7490/f1000research.1112714.1
→ Video

Authors

Christian Schudoma, The Genome Analysis Centre, The Sainsbury Laboratory, Norwich, UK
Yogesh Gupta, The Sainsbury Laboratory, Norwich, UK
Pirasteh Pahlavan, Leibniz-Institut DSMZ, Braunschweig, University of Würzburg, Germany
Agathe Jouet, The Sainsbury Laboratory, Norwich, UK
Dan MacLean, The Sainsbury Laboratory, Norwich, UK
Ksenia Krasileva, The Genome Analysis Centre, The Sainsbury Laboratory, Norwich, UK

Abstract
Background
Plant immunity and plant-pathogen interactions are major topics in plant disease research.

Plant immunity is conferred by so-called nucleotide-binding leucine-rich repeat (NLR) proteins. A specific group of these proteins is fused to additional (integrated) domains that can recognise pathogen effector molecules. In a recent study, 41 plant genomes were computationally screened for such NLR-ID proteins.

Analysis of the interactions between plant pathogens and their hosts can provide insight into both the pathogen’s effector proteins, i.e. its attack mechanisms, and the plant’s defense mechanisms. These interactions can be investigated by tailored metagenomics approaches.

Results
We present Galaxy tools/pipelines for screening of plant NLR-ID proteins (idFusion) and for complementing metagenomics analysis of plant-pathogen interactions (Ktoolu - Kraken tools and utilities). idFusion is a Galaxy implementation of the NLR-ID screening pipeline described in Sarris et al. (BMC Biology, 2016). Ktoolu is a collection of tools and their Galaxy wrappers that allow users to dissect sequencing datasets using the taxonomy information assigned by the Kraken metagenomics classifier, as well as to visualise the results using the Krona tools.

Speakers

Christian Schudoma

The Genome Analysis Centre (TGAC)

S2 T4 GCC2016 talk cschu pdf

Tuesday June 28, 2016 12:20pm - 12:40pm EDT
IMU Alumni Hall

E.1 Conference - Talks E.x Conference - All

12:40pm EDT

Arts & Crafts

GCC sure can be overwhelming sometimes! This is a quiet place to do some stress-free, science-related arts and crafts.

Moderators

Saskia Hiltemann

Erasmus Medical Center

Eric Rasche

Sysadmin / Bioinformatician, Center for Phage Technology @ Texas A&M University

Tuesday June 28, 2016 12:40pm - 1:40pm EDT
IMU Alumni Hall

B. Networking & Sustenance E.6 Birds-of-a-feather E.x Conference - All

1:40pm EDT

Galaxy Community Update

→ Slides doi:10.7490/f1000research.1112715.1
→ Video

Speakers

Jeremy Goecks

Galaxy Project, Oregon Health and Science University

Anton Nekrutenko

Penn State University

James Taylor

Professor, Johns Hopkins University

James Taylor is the Ralph S. O'Connor Professor of Biology and professor of computer science at Johns Hopkins University. Until 2014, he was an associate professor in the departments of biology and mathematics and computer science at Emory University. He is one of the original developers...

S3 T1 GCC 2016 Galaxy update pdf

Tuesday June 28, 2016 1:40pm - 2:15pm EDT
IMU Alumni Hall

E.1 Conference - Talks E.x Conference - All

1:40pm EDT

Session 3

Galaxy Community Update, two accepted talks, and a sponsor talk

Moderators

Chris Hemmerich

Bioinformatician, Indiana University

Tuesday June 28, 2016 1:40pm - 3:10pm EDT
IMU Alumni Hall

E.x Conference - All

2:15pm EDT

GSuite Tools: integrative genomic analyses across cells and epigenetic factors using Galaxy

→ Slides doi:10.7490/f1000research.1112716.1
→ Video

Authors
Boris Simovski, University of Oslo
Sveinung Gundersen, University of Oslo
Daniel Vodák, Oslo University Hospital
Abdulrahman Azab, University of Oslo
Diana Domanska, University of Oslo
Eivind Hovig, University of Oslo
Geir Kjetil Sandve, University of Oslo

Abstract
Genomic investigations increasingly involve multiple genome-scale datasets (genomic tracks) representing diverse cell types and epigenetic factors. This raises the need for software tools that allow efficient management and appropriate statistical analysis of such data collections. Dataset lists in Galaxy represent a major advancement in this direction, allowing efficient management and per-dataset analysis of large numbers of datasets within a web-based system. A natural next step is to consider integrative analyses of collections of datasets. An example is to comparatively assess the co-occurrence of a dataset of disease-associated SNPs against a large number of datasets representing chromatin accessibility in diverse cell types. In addition to managing a large number of datasets (of chromatin accessibility), such an analysis requires efficient means of locating and compiling a collection of relevant datasets, as well as the provision of appropriate statistical measures.

We have developed GSuite Tools, a Galaxy-based analysis framework that offers a broad range of novel tools for managing and performing statistical analysis on collections of datasets (genomic tracks). The tools operate on GSuite files - an alternative to the standard Galaxy dataset lists that provides additional robustness and flexibility as needed for this range of tools. A prototype of GSuite Tools was presented at GCC2015. After a year of further focused work, GSuite Tools now has much broader capabilities, is based on a deeper statistical treatment of the problems, and is ready for practical use by the community. It is publicly available at: http://hyperbrowser.uio.no/gsuite

Speakers

Boris Simovski

University of Oslo

S3 T2 GCC2016 GSuite HyperBrowser pdf

Tuesday June 28, 2016 2:15pm - 2:35pm EDT
IMU Alumni Hall

E.1 Conference - Talks E.x Conference - All

2:35pm EDT

The Galaxy workflow for epigenetic profiling of progressing melanoma

→ Slides
→ Video

Authors
Katarzyna Murat 1 and Krzysztof Poterlowicz 2

1. Faculty of Mathematics, Physics and Informatics, University of Gdansk
2. Centre for Skin Sciences, University of Bradford

Abstract
Recent studies have found that distinct poor-prognosis tumours lack genetic alterations but are epigenetically heterogeneous, pointing to the important role that multi-domain epigenetic regulation plays in cancer progression.
Researchers studying epigenetic regulation have generated a vast amount of high-throughput sequencing data for processes such as DNA methylation, histone modifications, chromatin remodeler activity and transcriptomic profiling of non-coding DNA.

Although Galaxy offers a range of standalone tools for investigating next-generation sequencing data (e.g. Bismark, MACS, SICER, edgeR), there is a lack of multi-layer epigenetic workflows characterizing the regulation of tumor progression.

Here we extend the use of this software to investigate the epigenetic profile of progressing melanoma, which is almost always curable when recognized early. Otherwise it spreads very quickly to other parts of the body, making it one of the deadliest cancers.

By integrating DNA methylation profiles and ChIP-Seq profiles for H3K27me3, H3K4me3, MITF and BRG1 for normal melanocytes and different stages of melanoma, we were able to identify novel epigenetic switches responsible for metastatic progression of this tumor.
Results and experiences using this framework demonstrate the potential for Galaxy to be a bioinformatics solution for multi-omics cancer biomarker discovery.

Speakers

Katarzyna Murat

University of Gdansk

Krzysztof Poterlowicz

University of Bradford

S3 T3 GCC2016 pdf

Tuesday June 28, 2016 2:35pm - 2:55pm EDT
IMU Alumni Hall

E.1 Conference - Talks E.x Conference - All

2:55pm EDT

Infinite Galaxy!

→ Slides
→ Video

Authors
Brian Finley, Principal Architect for Big Data Solutions at Lenovo

Abstract
The efficacy, benefit, and potential impact of running Galaxy on Spark. Ever-expanding possibilities by leveraging the paradigm of Big Data.

Speakers

Brian Finley

Principal Architect for Big Data Solutions, Lenovo

Brian Finley is the Principal Architect for Big Data Solutions at Lenovo. Mr. Finley is an Open Group certified Distinguished IT Specialist and holds a number of other technical certifications, writes articles for industry publications, is the creator of SystemImager (popular Linux mass-deployment software), is an xCAT (cluster management software) deve...

S4 T4 Infinite Galaxy! Brian Finley, Lenovo Galaxy Conference 2016 v2016.06.28 4 pdf

Tuesday June 28, 2016 2:55pm - 3:10pm EDT
IMU Alumni Hall

E.1 Conference - Talks E.x Conference - All

4:25pm EDT

The LAPPS Grid and Galaxy

→ Slides doi:10.7490/f1000research.1112586.1
→ Video

Authors
Nancy Ide, Keith Suderman, James Pustejovsky, Marc Verhagen, Eric Nyberg, Chris Cieri

Abstract
The NSF/SI2-funded Language Applications (LAPPS) Grid project (http://www.lappsgrid.org) is a collaborative effort among Brandeis University, Vassar College, Carnegie Mellon University (CMU), and the Linguistic Data Consortium (LDC) at the University of Pennsylvania, which has developed an open, web service-based infrastructure through which massive and distributed language resources can be accessed, and tailored language services can be composed, evaluated, disseminated and consumed by researchers, developers, and students.

We recently adopted Galaxy as the primary workflow management system for the LAPPS Grid. We have worked with the Galaxy development team to adapt the system to our domain and continue this collaboration to enhance the capabilities we require and contribute to the expansion of Galaxy to domains outside the life sciences.

We have contributed a “Galaxy Flavor” including all LAPPS Grid services and resources, and have developed or are developing the following capabilities for use in Galaxy: (1) exploitation of our web service metadata to automatically detect input/output requirements and invoke converters where necessary; (2) incorporation of authentication procedures for protected data using OAuth; and (3) addition of a visualization plugin for linguistic analyses.

An additional outcome of the LAPPS/Galaxy collaboration is that it provides researchers in the life sciences with access to a wide array of NLP tools. So, for example, biologists will be able to take advantage of bio-oriented NLP web services to mine bio-entities and relations from textual sources such as PubMed, and via capabilities already present in Galaxy, integrate them into existing bio-data resources and analysis tools.

Speakers

Keith Suderman

Research Assistant, Vassar College

S4 T1 Lappsgrid gcc2016 pdf

Tuesday June 28, 2016 4:25pm - 4:45pm EDT
IMU Alumni Hall

E.1 Conference - Talks E.x Conference - All

4:25pm EDT

Session 4

Accepted and Lightning Talks. The call for lightning talks will go out just before GCC2016 events start.

Moderators

Suzanna Lewis

Lawrence Berkeley National Laboratory

Tuesday June 28, 2016 4:25pm - 5:30pm EDT
IMU Alumni Hall

E.x Conference - All

4:45pm EDT

Lightning Talks

The call for Lightning Talks will go out shortly before GCC2016 events begin.

Tuesday June 28, 2016 4:45pm - 5:30pm EDT
IMU Alumni Hall

E.5 Conference - Lightning E.x Conference - All

4:48pm EDT

SeqResults - Simple comparisons of results across libraries

→ Slides
→ Video

Authors

Brad Langhorst, New England Biolabs

Abstract
We have developed SeqResults to enable simple comparison of libraries across experiments. The Galaxy-integrated component captures metadata and results in a relational database. Results are available via a simple web site and Tableau visualizations. SeqResults has recently been extended with RNA-seq features. It aggregates simple metrics like fractions of reads on exons, introns and other genomic regions, average 5'-3' coverage and alignment efficiency. However, summary metrics are only part of the story. Accurate representation of transcript levels is important to any RNA-seq experiment. We present a simple interface to compare transcript levels as well as 5'-3' coverage profiles of individual transcripts across experiments. SeqResults now contains millions of individual results from 6841 libraries produced during development of NEBNext library preparation reagents.

Presenters

Brad Langhorst

Development Group Leader, NEB

L1 T1 slides seqresults gcc2016 zip

Tuesday June 28, 2016 4:48pm - 4:55pm EDT
IMU Alumni Hall

E.5 Conference - Lightning E.x Conference - All

4:55pm EDT

Common Workflow Language v1.0 & How It Will Affect You

→ Slides doi:10.7490/f1000research.1112726.1

Author
Michael R. Crusoe, Common Workflow Language Project

Abstract
Version 1.0 of the CWL standards is coming soon. This talk will review what has changed in the last year and how the CWL benefits the Galaxy community. The talk will include a side-by-side demonstration of a popular Galaxy workflow and its CWL incarnation.

Presenters

Michael R. Crusoe

Project Lead & Co-founder, Common Workflow Language project

Michael R. Crusoe is one of the co-founders of the CWL project and is the CWL Project Lead. His facilitation, technical contributions, and training on behalf of the project draw from his time as the former lead developer of C. Titus Brown's k-h-mer project, his previous career as...

L1 T2 2016 06 28 Galaxy Community Conference Lightning Talk pdf

Tuesday June 28, 2016 4:55pm - 5:02pm EDT
IMU Alumni Hall

E.5 Conference - Lightning E.x Conference - All

5:02pm EDT

A resource for metabolomics and transcriptomics analysis

→ Slides doi:10.7490/f1000research.1112727.1

Authors
Manhoi Hur, Iowa State University
Jason R. Miller, J Craig Venter Institute
Christopher D. Town, J Craig Venter Institute
Erik Ferlanti, J Craig Venter Institute
Irina Belyaeva, J Craig Venter Institute
Eve Syrkin Wurtele, Iowa State University

Abstract
PMR (Plant/Eukaryotic and Microbial Systems Resource) and its database are a community resource for deposition and analysis of metabolomics data and related transcriptomics data. PMR currently houses terabytes of data and metadata from over 25 species of eukaryotes, and provides a unique resource for computational modeling and hypothesis development. PMR’s web APIs enable PMR data and analytic functions to integrate with other community resources. In this talk, we introduce the PMR database and illustrate its analytic tools. We present a proof-of-concept for the utility of the API as a research science app using Araport to provide Arabidopsis metabolomics data and its functionality to diverse users.

Presenters

Manhoi Hur

L1 T3 GCC2016 PMR and its APIs final manhoi hur v4 pdf

Tuesday June 28, 2016 5:02pm - 5:09pm EDT
IMU Alumni Hall

E.5 Conference - Lightning E.x Conference - All

5:09pm EDT

Science Gateways Community Institute

→ Slides doi:10.7490/f1000research.1112593.1
→ Video

Authors
Maytal Dahan, University of Texas at Austin
Sandra Gesing, University of Notre Dame
Linda B. Hayden, Elizabeth City State University
Katherine Lawrence, University of Michigan
Marlon E. Pierce, Indiana University
Nancy Wilkins-Diehr, The University of California, San Diego
Michael Zentner, Purdue University

Abstract
Science gateways, also known as web portals, virtual research environments, or virtual laboratories, are a fundamental part of today’s research landscape. But they can be difficult to develop in a sustainable fashion. This talk will provide an overview of the newly funded NSF Science Gateways Community Institute, which aims to address these challenges by offering services to and building community among the research communities developing gateways. The institute comprises five areas to support gateways throughout their lifecycle:

Incubator will provide shared expertise in business and sustainability planning, cybersecurity, user interface design, and software engineering practices.
Extended Developer Support will provide expert developers for up to one year to projects that request assistance and demonstrate the potential to achieve the most significant impacts on their research communities.
Scientific Software Collaborative will offer a component-based, open-source, extensible framework for gateway design, integration, and services, including gateway hosting and capabilities for external developers to integrate their software into Institute offerings.
Community Engagement and Exchange will provide a forum for communication and shared experiences among gateway developers, user communities, within NSF, across federal agencies, and internationally.
Workforce Development will increase the pipeline of gateway developers with training programs, including special emphasis on recruiting underrepresented minorities, and by helping universities form gateway support groups.

We envision close collaborations with gateway providers such as the Galaxy developer group to provide best practices for developers and use cases of real-world gateways to improve the experience and efficiency of developers and user communities.

Presenters

Nancy Wilkins-Diehr

Associate Director, San Diego Supercomputer Center

Science gateways and running

L1 T4 SGCI Galaxy 06 29 16 pdf

Tuesday June 28, 2016 5:09pm - 5:16pm EDT
IMU Alumni Hall

E.5 Conference - Lightning E.x Conference - All

5:16pm EDT

Galaxy at the Pittsburgh Supercomputing Center

→ Slides doi:10.7490/f1000research.1112729.1
→ Video

Authors
Alexander J. Ropelewski, Pittsburgh Supercomputing Center
Philip D. Blood, Pittsburgh Supercomputing Center
Robert Light, Pittsburgh Supercomputing Center

Abstract
The Pittsburgh Supercomputing Center's (PSC) new computational system Bridges, funded by the National Science Foundation (NSF), is available to U.S. academic researchers through NSF's XSEDE program. Bridges is a unique system that consists of a variety of specialized nodes including: compute nodes, GPU nodes, database nodes, webserver nodes and data transfer nodes. A unique feature of Bridges is that the compute nodes are tiered in terms of memory, containing either 128GB, 3TB, or 12TB of hardware-supported shared memory, which makes the system ideal for Galaxy workflows involving Next Generation Sequencing data.

In this talk we will discuss the history of Galaxy at the PSC and describe various Galaxy usage scenarios for Bridges. These scenarios include (1) a shared Galaxy instance for users with XSEDE allocations, (2) private "virtualized" instances of Galaxy, and (3) back-end computational support for remote Galaxy instances. We will also discuss the system that we developed to authenticate and charge usage against specific user-selected projects.

Presenters

Alex Ropelewski

Director, Biomedical Applications Group, Pittsburgh Supercomputing Center

L1 T5 Ropelewski GCC 2016 Talk pdf

Tuesday June 28, 2016 5:16pm - 5:23pm EDT
IMU Alumni Hall

E.5 Conference - Lightning E.x Conference - All

5:23pm EDT

65 Million Observers

→ Slides doi:10.7490/f1000research.1112730.1
→ Video

Collecting and analysing information from increasingly diverse origins is needed to understand the complex systems that life scientists study. In particular, this means mobilizing a large number of human and technical resources for data acquisition and analysis. Regarding human resources, it seems appropriate to involve citizens in research projects. At the same time, it is clear that the relationship between science and citizens has degraded; for example, it is very difficult for a citizen to access the results of research projects, and even more difficult to participate in them. Citizen science approaches can be a good way to address these issues, but until now most citizen science projects have considered citizens only for data production. The "65 millions d'observateurs" project is an interesting French initiative that aims to test involving citizens in other parts of the research lifecycle. Can this be the beginning of a Galaxy-E, for Ecology?

Presenters

Yvan Le Bras

Research engineer, French National Museum of Natural History

L1 T6 65MO lightning talk pdf

Tuesday June 28, 2016 5:23pm - 5:30pm EDT
IMU Alumni Hall

E.5 Conference - Lightning E.x Conference - All

8:00am EDT

Conference desk open

Wednesday June 29, 2016 8:00am - 6:00pm EDT
IMU Alumni Hall

A. Help E.x Conference - All

9:00am EDT

Welcome Day 2

Wednesday June 29, 2016 9:00am - 9:10am EDT
IMU Alumni Hall

E.x Conference - All

9:00am EDT

Session 5

Session 5 features four accepted talks from the Galaxy Community.

Moderators

Scott Michaels

Indiana University

Wednesday June 29, 2016 9:00am - 10:30am EDT
IMU Alumni Hall

E.x Conference - All

9:10am EDT

Enhancements to Galaxy for delivering on NIH Commons

→ Slides doi:10.7490/f1000research.1112588.1
→ Video

Author
Ravi K Madduri, University of Chicago

Abstract
The Big Data for Discovery Science (BDDS) Center is one of the NIH BD2K centers. In BDDS, we are building tools to move, share, analyze, discover and publish big biomedical data. These tools constitute the BDDS platform. We are leveraging the platform to enable data-driven discovery across our center and are also working with other BD2K centers, both directly and through the Commons initiative, to build standard interfaces for various data management activities. Galaxy is an integral part of our platform, and we are enhancing Galaxy to work with digital object identifiers (DOIs), analyze data at scale using identified Docker containers, and publish results to Globus Publication services, thus providing an end-to-end framework for reproducible research in support of the NIH Commons vision.

Speakers

Ravi K. Madduri

Computation Institute, University of Chicago, and Argonne National Laboratory

S5 T1 GCC Madduri June 2016 pdf

Wednesday June 29, 2016 9:10am - 9:30am EDT
IMU Alumni Hall

E.1 Conference - Talks E.x Conference - All

9:30am EDT

Moving data from the warehouse to the workbench: a bridge to Galaxy from the Tripal community genome database software platform

→ Slides doi: 10.7490/f1000research.1112734.1
→ Video

Authors
Margaret Staton (1), Ming Chen (1), Nathan Henry (1), Emily Grau (2), Connor Wytko (3), Brian Soto (3), Sook Jung (3), Kuangching Wang (4), Nick Watts (5), Chun-huai Cheng (3), Lacey A. Sanderson (6), Jill Wegrzyn (2), Doreen Main (3), F. Alex Feltus (7), Stephen P. Ficklin (3)

(1) University of Tennessee Institute of Agriculture Department of Entomology and Plant Pathology, Knoxville, TN 37996, USA
(2) University of Connecticut Department of Ecology and Evolutionary Biology, Storrs, CT 06269 USA
(3) Washington State University Department of Horticulture, Pullman, WA 99164 USA
(4) Clemson University Department of Electrical & Computer Engineering, Clemson, SC 29634 USA
(5) Clemson University, Clemson Computing and Information Technology, Anderson, SC 29625 USA
(6) University of Saskatchewan, Department of Plant Sciences, Saskatoon, Saskatchewan, SK S7N Canada
(7) Clemson University Department of Genetics & Biochemistry, Clemson, SC 29634 USA

Abstract
Online community genome databases offer curated and mission-specific data and information to scientists with shared basic and applied research goals. In an effort to share a common code base, standardize storage formats, and simplify site construction, a coalition of genome databases have developed the software Tripal. Tripal is an open-source platform that bridges Drupal, a popular content management system (CMS), and Chado, a standardized relational database for storage of biological data. There is a need for users of community databases to not only discover, visualize and download genomic information but to directly port it to analysis workflow software such as the Galaxy platform. Through development of the new Tripal Galaxy module, site visitors will be able to select custom datasets from within and across Tripal databases and import those directly to a Galaxy instance from within a Tripal-based site. Additionally, a set of pre-designed workflows for common analyses needed by users of community databases will be made publicly available, including functional annotation of gene sequences, genomic variant discovery and genotype/phenotype association. Current efforts are focused on enabling authenticated users to move data from within a Tripal community database to the Tripal community Galaxy instance or a public Galaxy instance, creation of PHP bindings for the Galaxy API, and establishment of the most commonly needed analysis workflows for database users.

Speakers

Margaret Staton

University of Tennessee Knoxville

My lab works on genome databases, web applications, cyberinfrastructure, and RNASeq data. Our main website is hardwoodgenomics.org.

S5 T2 Galaxy Staton pdf

Wednesday June 29, 2016 9:30am - 9:50am EDT
IMU Alumni Hall

E.1 Conference - Talks E.x Conference - All

9:50am EDT

Apollo: Collaborative Manual Annotation for Genomic Sequencing Projects

→ Slides doi:10.7490/f1000research.1112336.1
→ Video

Authors

Nathan Dunn, Berkeley Bioinformatics Open-source Projects, Lawrence Berkeley National Laboratory
Monica Muñoz-Torres, Berkeley Bioinformatics Open-source Projects, Lawrence Berkeley National Laboratory
Colin Diesh, University of Missouri
Deepak Unni, University of Missouri
Eric Rasche, Department of Biochemistry and Biophysics, Texas A&M University
Eric Yao, University of California Berkeley
Ian Holmes, University of California Berkeley
Chris Elsik, University of Missouri
Suzie Lewis, Berkeley Bioinformatics Open-source Projects, Lawrence Berkeley National Laboratory

Abstract
Manual annotation is a crucial step in the annotation portion of a genome sequencing project. It enables curators to improve automated gene predictions by visually comparing a variety of experimental evidence tracks from different sources to more accurately represent the underlying biology.

Apollo is a web-based genome annotation editor that allows curators to manually revise and edit genomic elements. It provides a reporting structure for annotated genomic elements and an ‘Annotator Panel’ that allows users to quickly browse the genome and its annotations. Users can manually edit the structure of a genomic element as well as add metadata, including references to other databases and functional assignments with specific lookup support for Gene Ontology (GO) terms.

Apollo is currently being used in over one hundred genome annotation projects around the world, ranging from annotation of a single species to lineage-specific efforts supporting annotation for dozens of organisms at a time. Collaborators are able to visualize each other's changes in real time (similar to Google Docs), restrict access to annotations depending on the role of users and groups within the community, and share tracks of evidence data with the public. Users are also able to export their manual annotations via FASTA, GFF3, the Chado database schema, and web services. Finally, Apollo is available for integration with Galaxy via Docker, allowing users to run genome sequencing analyses using the Galaxy platform.

Apollo is an Open-Source project. Further details and code are available at http://genomearchitect.org/.

Speakers

Nathan Dunn

Software Developer, Lawrence Berkeley National Lab

S5 T3 NathanGalaxy2016Talk pdf

Wednesday June 29, 2016 9:50am - 10:10am EDT
IMU Alumni Hall

E.1 Conference - Talks E.x Conference - All

10:10am EDT

Accurate and Complete Gene Construction with EvidentialGene Pipeline

→ Slides doi:10.7490/f1000research.1112467.1
→ Video

Author
Don Gilbert, Indiana University

Abstract
Precision genomics is essential in medicine, environmental health, sustainable agriculture, and biological research. Yet popular genome informatics methods lag behind the high levels of accuracy and completeness in gene construction that are attainable with current RNA-seq data.

EvidentialGene is a genome informatics pipeline for gene construction that has a measurably high accuracy and completeness rate for animals and plants, from insects, ticks and crustaceans to crop plants and trees, to fishes and other vertebrates. It uses big data from gene sequencers, generating bigger gene sets than alternate methods, then reduces those with biological criteria of protein codes and orthology into accurate species gene sets. EvidentialGene is in production use at compute centers in USA, Sweden, Australia and elsewhere.

The software pair of MAKER and Trinity is now a common recipe in gene discovery publications, but greater accuracy is possible and easy to obtain. Recent examples with the disease vector mosquitoes Aedes (yellow fever, Zika virus) and Anopheles (malaria) show that EvidentialGene surpasses the accuracy of published genes from MAKER, Trinity and Vectorbase. For fishes, Evigene surpasses those recently published from MAKER, Trinity and NCBI Eukaryote genome annotation pipelines.

Galaxy installations that provide genome and transcriptome services will benefit by adding EvidentialGene. This author challenges Galaxy centers with MAKER, Trinity or other gene construction pipelines to reach accuracy and completeness comparable to EvidentialGene's, and will collaborate on this with select genomics projects.

Speakers

Don Gilbert

Indiana University

S5 T4 evigenegalx1606iu pdf

Wednesday June 29, 2016 10:10am - 10:30am EDT
IMU Alumni Hall

E.1 Conference - Talks E.x Conference - All

11:00am EDT

Galaxy security practices in an age of clinical data for point of care services

→ Slides doi: 10.7490/f1000research.1112735.1
→ Video

Authors
Carrie Ganote, Indiana University

Abstract
One of the major users of bioinformatics pipelines is the medical field. This poses a challenge for system administrators and software developers who provide web-facing services - securing the client's data. Certain data sets in genomics can be considered sufficiently identifiable and thus qualify as electronic protected health information (ePHI), which is then further protected by HIPAA (Health Insurance Portability and Accountability Act).

This talk will be an outline of hurdles associated with making Galaxy robust in a clinical setting. Best practices leverage a two-tiered approach at both operating system and application layers. Initially, systems configuration will be explored including least privilege for service accounts and database users, encryption of files, and system access. Later, best uses of Galaxy will be highlighted as they apply to moving data, storage, and account policies, following a rigorous NIST-based cyber risk management framework.

Speakers

Carrie Ganote

Indiana University

S6 T1 carrietalk pdf

Wednesday June 29, 2016 11:00am - 11:20am EDT
IMU Alumni Hall

E.1 Conference - Talks E.x Conference - All

11:00am EDT

Session 6

Session 6 features 5 accepted talks.

Moderators

Margaret Staton

University of Tennessee Knoxville

My lab works on genome databases, web applications, cyberinfrastructure, and RNASeq data. Our main website is hardwoodgenomics.org.

Wednesday June 29, 2016 11:00am - 12:40pm EDT
IMU Alumni Hall

E.x Conference - All

11:20am EDT

Increasing Beer Time: Decreasing the Galaxy System Administration Burden

→ Slides doi: 10.7490/f1000research.1112736.1
→ Video

Author
Nate Coraor, Penn State University

Abstract
Galaxy is a large application with many moving pieces and dependencies on numerous outside applications and libraries. As Galaxy is a Python application, some of these dependencies are Python modules. Other dependencies include a proxy (web) server, database server, and possibly a distributed resource manager (for cluster job submission). The task of installing and orchestrating the operation of these components can be difficult, in part due to Galaxy’s desire to support the wide variation in computing environments and policies at sites where Galaxy is installed.

In order to ease the burden for Galaxy administrators, we have made several improvements to Galaxy dependency handling and installation. Galaxy’s Python dependencies were tightly controlled by Galaxy and used an outdated format. Significant work was undertaken to modernize the handling of these dependencies while loosening control, in order to give more flexibility to administrators. Galaxy is now fully compatible with the standard Python packaging tool chain, including pip and wheel, and further, it can now be used with the Anaconda Python distribution.

Another point of administrator frustration is installing and updating the Galaxy code itself. Galaxy is currently distributed via git, but administrators often prefer system package managers. Building upon the dependency management changes, it is now possible to create Galaxy packages and install them in the same manner as more traditional system software packages. This also allows for tighter integration with system-level dependencies such as proxy and database servers.

Speakers

Nate Coraor

System Administrator, Galaxy Project, Penn State University

S6 T2 GCC2016 Coraor Increasing Beer Time (Final) pdf

Wednesday June 29, 2016 11:20am - 11:40am EDT
IMU Alumni Hall

E.1 Conference - Talks E.x Conference - All

11:40am EDT

The Intergalactic Utilities Commission - driving Galaxy tool development

→ Slides doi:10.7490/f1000research.1112466.1
→ Video

Authors
Marius van den Beek, Daniel Blankenberg, Dave Bouvier, John Chilton, Peter Cock, Nate Coraor, Björn Grüning, Youri Hoogstrate, James Johnson, Greg von Kuster, Eric Rasche and Nicola Soranzo

Abstract
Galaxy provides abstractions to make it easy to integrate tools, so virtually any tool that can be run from the command line can be integrated into Galaxy. The ability to seamlessly integrate tools into Galaxy spawned a large community of Galaxy tool developers, with the Galaxy Tool Shed as a distribution platform for installation into any Galaxy instance. This proliferation of tools resulted in the need for an oversight committee to set standards, define best practices, and vet tools for the Galaxy community. In 2012, the Intergalactic Utilities Commission (IUC) was founded as an organized body to provide these services, and has developed best-practice guidelines for tool development. These standards are a continual work-in-progress as new technologies are introduced into the Galaxy environment.

We will highlight IUC achievements over the past year, including enhanced reproducible installations via Starforge and cargo-port, new dependency resolution systems like Conda, and various enhancements to Galaxy tool syntax that enable more powerful and user-friendly tools. We’ll introduce new processes that have enhanced Galaxy tool development, testing and maintenance using Planemo and Conda, with details about how these applications can be used as complementary components to Galaxy and the Galaxy Tool Shed.

Important goals of the IUC are to continue to grow not only the community, but also the committee itself, so that we can provide the benefits of friendly oversight to every interested Galaxy tool developer. This past year the IUC has welcomed 3 new members and organised 3 Codefests. We welcome others who are interested in joining this committee and working with us.

Speakers

Bjorn Gruning

University of Freiburg

S6 T3 The Intergalactic Utilities Commission driving Galaxy tool development pdf

Wednesday June 29, 2016 11:40am - 12:00pm EDT
IMU Alumni Hall

E.1 Conference - Talks E.x Conference - All

12:00pm EDT

Planemo – A Scientific Workflow SDK

→ Slides
→ Video

Authors
John Chilton, Galaxy Project
Aysam Guerler, Galaxy Project
Galaxy Team, Galaxy Project

Abstract
A novel approach to building, refining, and running scientific workflows leveraging Galaxy through Planemo will be presented. The Galaxy workflow editor and workflow extraction interface are great tools enabling any Galaxy user to easily build workflows. However, tool authors using Planemo and sophisticated bioinformaticians may prefer driving workflow development through their existing tool chains such as programming text editors, command-line testing, and revision control. The approach presented leverages YAML-based workflow descriptions as plain files allowing exactly this.

The approach will be used as a lens to highlight these workflow formats (Format 2 Galaxy workflows and Common Workflow Language (CWL) workflows) as well as important highlights from the myriad of recent Galaxy workflow enhancements that have made them dramatically more usable, powerful, and performant.

Available today, Format 2 Galaxy workflows map directly to existing Galaxy tool and workflow concepts and are described in a very concise and readable YAML format. CWL specifications for tools and workflows are developed in an open fashion by many organizations with the aim of creating truly portable descriptions. The execution of CWL workflows in Galaxy is being actively worked on and progress will be discussed.

Underlying all of this are core Galaxy enhancements that will be demonstrated. The user interface for workflows has been overhauled and improved. Additionally, workflows now allow nesting, labels, non-data inputs, implicit connections between steps, and many new operations over collections, greatly increasing the expressive power of Galaxy workflows. Finally, recent performance enhancements allow Galaxy workflows to scale to thousands of datasets.

Speakers

John Chilton

PSU

S6 T4 chilton zip

Wednesday June 29, 2016 12:00pm - 12:20pm EDT
IMU Alumni Hall

E.1 Conference - Talks E.x Conference - All

12:20pm EDT

CloudLaunch as a multi-cloud, multi-application launch platform

→ Slides doi:10.7490/f1000research.1112589.1
→ Video

Authors
Enis Afgan, Johns Hopkins University, USA
Nuwan Goonasekera, University of Melbourne, Australia

Abstract
CloudLaunch started as BioCloudCentral.org, and provided a simple, intuitive way to launch Galaxy CloudMan on the Amazon cloud. The original idea has expanded over the years to accommodate launching of Virtual Machines for multiple applications, on various clouds, with additional configuration options. The Cloud Computing landscape has also evolved to facilitate deployment of complex applications, with increasing support for containers.

To adapt to these new realities - we have rewritten CloudLaunch from the ground up as a general purpose application launch platform, targeting multiple applications, clouds and containers.

End users can use the new CloudLaunch as their cloud application deployment and management dashboard. From an app-store-like interface, cloud applications can be selected and launched from multiple clouds (Amazon, OpenStack and soon, GCE). Furthermore, users can view their live and shut-down instances from any supported cloud from this single location.

Technically, CloudLaunch has a fully-defined, documented and browsable ReST API, as well as an extensible web-based front-end for easy management. CloudLaunch’s UI allows each application to define its own custom UI, which can be dynamically plugged into CloudLaunch using simple descriptor metadata. This allows each application to present complex configuration options, allowing the application deployer an easy mechanism for providing launch-time configuration options.

This talk will present the new CloudLaunch features from an end-user perspective and describe how developers and deployers can use it to define and deploy applications. Sample applications such as Galaxy on the Cloud, the Genomics Virtual Lab, a SLURM cluster, and RStudio will be showcased.

Speakers

Enis Afgan

Research Scientist, Johns Hopkins University (JHU)

A long-standing member of the Galaxy community, spanning roles from system deployment to leadership.

S6 T5 CloudLaunch as a multi cloud, multi application launch platform pdf

Wednesday June 29, 2016 12:20pm - 12:40pm EDT
IMU Alumni Hall

E.1 Conference - Talks E.x Conference - All

12:40pm EDT

Arts & Crafts

GCC sure can be overwhelming sometimes! This is a quiet place to do some stress-free, science-related arts and crafts.

Moderators

Saskia Hiltemann

Erasmus Medical Center

Eric Rasche

Sysadmin / Bioinformatician, Center for Phage Technology @ Texas A&M University

Wednesday June 29, 2016 12:40pm - 1:40pm EDT
IMU Alumni Hall

B. Networking & Sustenance E.6 Birds-of-a-feather E.x Conference - All

1:40pm EDT

FlowGalaxy: Developing a workflow for Flow Cytometry Analysis in Galaxy

→ Slides doi: 10.7490/f1000research.1112455.1
→ Video

Authors
Cristel G Thomas, Northrop Grumman TS
Elizabeth Thomson, Northrop Grumman TS
Patrick Dunn, Northrop Grumman TS
Henry Schaefer, ESAC, Inc
Jeff Wiser, Northrop Grumman TS
John C Campbell, Northrop Grumman TS

Abstract
Flow cytometry is generating increasingly massive multi-dimensional datasets. Available analysis tools exist, but they require extensive human intervention and are not readily scalable for the increasing size of the datasets. More effort has recently been put into developing tools allowing automated analysis for high-throughput flow data, but they are geared toward bioinformaticians.

We are taking advantage of the Galaxy framework to create a workspace for high-throughput flow cytometry data analysis that is more understandable and accessible to the average bench immunologist. We leveraged Galaxy's innate ability to support multiple programming languages to develop a user-friendly analysis workflow allowing conversion and manipulation of flow cytometry binary data to text, clustering analysis and interactive visualization of the results. We have ported existing tools written in R, C or Python from ImmPort to Galaxy, and created novel text manipulation tools in Python and interactive data visualization tools in JavaScript. These tools will be made freely available to the public through FlowGalaxy, which is deployed on an AWS Cloud instance.

Speakers

Cristel G. Thomas

Research Scientist, NG

S7 T1 gcc2016 FlowGalaxy pdf

Wednesday June 29, 2016 1:40pm - 2:00pm EDT
IMU Alumni Hall

E.1 Conference - Talks E.x Conference - All

1:40pm EDT

Session 7

Session 7 features a mix of sponsor and accepted talks.

Moderators

Carrie Ganote

Indiana University

Wednesday June 29, 2016 1:40pm - 3:15pm EDT
IMU Alumni Hall

E.x Conference - All

2:00pm EDT

Outbreak surveillance and investigation using IRIDA and SNVPhyl

→ Slides doi:10.7490/f1000research.1112590.1
→ Video

Authors
Aaron Petkau (1), Franklin Bristow (1), Thomas Matthews (1), Josh Adam (1), Philip Mabon (1), Cameron Sieffert (1), Eric Enns (1), Jennifer Cabral (2), Joel Thiessen (2), Natalie Knox (1), Damion Dooley (3), Aleisha Reimer (1), Eduardo Taboada (6), Alex Keddy (7), Robert G. Beiko (7), William Hsiao (3,4), Morag Graham (1,2), Gary Van Domselaar (1,2), The IRIDA Consortium and Fiona Brinkman (5)

(1) National Microbiology Laboratory, Winnipeg, Canada
(2) University of Manitoba, Winnipeg, Canada
(3) BC Public Health Microbiology and Reference Laboratory, Vancouver, Canada
(4) University of British Columbia, Vancouver, Canada
(5) Simon Fraser University, Burnaby, Canada
(6) National Microbiology Laboratory, Lethbridge, Canada
(7) Dalhousie University, Halifax, Canada

Abstract
Modern epidemiological investigations of infectious disease outbreaks are transitioning to routinely incorporate Whole Genome Sequencing (WGS) data for microbial pathogens. WGS provides a wealth of information previously unavailable, enabling fine-level resolution of isolates using data from the entire genome, down to Single Nucleotide Variants (SNVs). However, the application of WGS for genomic epidemiology continues to be hindered by the complexities of data management and analysis, often requiring considerable expertise as data progresses from the sequencer into a final report.

Here, we present IRIDA (Integrated Rapid Infectious Disease Analysis) and SNVPhyl (SNV Phylogenomics), our platform for genomic epidemiology and our pipeline for SNV-based phylogenies, respectively. IRIDA stores and manages WGS data and associated epidemiological metadata, and provides for the execution of analysis pipelines via an internal Galaxy instance, as well as visualization and evaluation of results. Capacity also exists for incorporation of IRIDA-managed data into external tools, such as independent Galaxy installations, through a REST-like API. SNVPhyl enables the classification and clustering of bacterial isolates by identifying phylogenetically informative SNVs from sequence reads. SNVPhyl is distributed as a Galaxy workflow and suite of tools, enabling incorporation within independent Galaxy instances, batch execution via a provided command-line controller script, or execution as part of the larger IRIDA package.

IRIDA and SNVPhyl have shown considerable success within Canada as we transition towards routine sequencing for surveillance and outbreak investigations. With the help of the Galaxy community we have made significant improvements over previous years and IRIDA and SNVPhyl are now freely available at https://github.com/phac-nml/irida and http://snvphyl.readthedocs.org/.

Speakers

Aaron Petkau

Bioinformatician, Public Health Agency of Canada

S7 T2 IRIDASNVPhylGCC2016Presentation pdf

Wednesday June 29, 2016 2:00pm - 2:20pm EDT
IMU Alumni Hall

E.1 Conference - Talks E.x Conference - All

2:20pm EDT

Metavisitor, a suite of Galaxy tools and workflows for detection or discovery of viruses in NGS datasets

→ Slides doi:10.7490/f1000research.1112468.1
→ Video

Authors
Marius van den Beek, Institut de Biologie Paris Seine
Guillaume Carissimo, Institut Pasteur
Juliana Pegoraro, Institut de Biologie Paris Seine
Kenneth D Vernick, Institut Pasteur
Christophe Antoniewski, Institut de Biologie Paris Seine

Abstract
With the aim of providing biologists and medical doctors with accessible and adaptable software to detect and reconstruct viral genomes from sequencing datasets, we implemented in Galaxy a set of tools and workflows called Metavisitor. This suite of tools and workflows can be used directly on our Mississippi server or installed on any Galaxy server instance. Using the graphical Galaxy workflow editor, the Metavisitor workflows can be adapted to suit specific needs by adding analysis steps or replacing or modifying existing ones. Metavisitor works with DNA, RNA or small RNA sequencing data of different read lengths and can use a combination of de novo and guided approaches to assemble viral genomes from sequencing reads. Thus, the software has the potential for quick diagnosis as well as discovery of viruses (or other pathogens) from a vast array of organisms. Importantly, we are working on an executable paper on how to use Metavisitor in various use cases, as well as on an Ansible-based procedure to easily deploy a Metavisitor Galaxy instance on available hardware. We hope that these lines of development will increase the accessibility and transparency of Metavisitor and help researchers focus on biological or medical issues.

Speakers

Christophe Antoniewski

Head of ARTbio bioinformatics, CNRS - Institut de Biologie Paris Seine

Institut de Biologie Paris-Seine (IBPS)

Marius van den Beek

Penn State University

S7 T3 GCC2016 Metavisitor presentation pdf

Wednesday June 29, 2016 2:20pm - 2:40pm EDT
IMU Alumni Hall

E.1 Conference - Talks E.x Conference - All

2:40pm EDT

Chemflow, chemometrics using Galaxy

→ Slides doi:10.7490/f1000research.1112573.1
→ Video

Authors
Virginie Rossard, INRA-LBE
Fabien Gogé, IRSTEA Montpellier
Eric Latrille, INRA-LBE
Jean-Michel Roger, IRSTEA Montpellier
Jean-Claude Boulet, INRA-SPO

Abstract
Infrared spectroscopy is widely used in academic research and industry as a simple, fast, cheap and safe measurement tool. Infrared data are displayed as spectra, and chemometrics is the science of extracting information from spectra.

We are developing a comprehensive package which contains (1) a MOOC to be broadcast in September 2016; (2) a chemometrics tool, named ChemFlow, which is an application of Galaxy; and (3) a spectral database. We will focus on ChemFlow.

The required specifications were:

a free tool;
a tool which recycles code from Matlab, Scilab, R, Python and C;
a tool accessible via internet with new devices such as smartphones.

That's why we chose Galaxy. ChemFlow is being implemented with our own functions. It now includes most of the processing tools: importing and converting our data, and running chemometric methods such as calibrations and classifications.

We are very satisfied with the performance of ChemFlow running on a server. Some issues have been fixed, while others are still pending:

Speed was improved by switching the Galaxy server to Apache and PostgreSQL.
Hundreds of users are expected. We plan to deploy two servers with 48 cores each, without knowing how ChemFlow will behave with many users submitting small tasks.
The graphical toolbox in Galaxy is our main work in progress, and we are currently implementing several original visualisation tools such as R-Shiny.
The development of a specific toolshed is under discussion.

In summary, Galaxy is being used in a new domain, chemometrics, addressed to a new user community, and will be a central platform for a new e-learning module, a MOOC.

Speakers

Virginie Rossard

French National Institute for Agricultural Research, INRA

INRA

S7 T4 2016 06 29 GCC2016 ChemFlow VirginieRossard pdf

Wednesday June 29, 2016 2:40pm - 3:00pm EDT
IMU Alumni Hall

E.1 Conference - Talks E.x Conference - All

4:30pm EDT

Lightning Talks

The call for Lightning Talks will go out shortly before GCC2016 events begin.

Wednesday June 29, 2016 4:30pm - 5:50pm EDT
IMU Alumni Hall

E.5 Conference - Lightning E.x Conference - All

4:30pm EDT

Session 8

Lightning Talks and conference close.

The call for lightning talks will go out just before GCC2016 events start.

Moderators

Nancy Wilkins-Diehr

Associate Director, San Diego Supercomputer Center

Science gateways and running

Wednesday June 29, 2016 4:30pm - 6:00pm EDT
IMU Alumni Hall

E.x Conference - All

4:33pm EDT

Annotation integration of M. tuberculosis data using the Neo4j graph database

→ Slides doi:10.7490/f1000research.1112749.1
→ Video

Authors
Peter van Heusden

Abstract
At SANBI we are building a database to integrate annotation related to M. tuberculosis using the Neo4j graph database as a storage platform. We will present the construction of this database, demonstrate some sample queries using the Cypher graph query language, show how Neo4j graph databases can be integrated with Galaxy and mention some strengths and weaknesses of the Neo4j graph database.

Presenters

Peter van Heusden

L2 T1 gcc2016 sanbi lightningtalk slides petervh pdf

Wednesday June 29, 2016 4:33pm - 4:40pm EDT
IMU Alumni Hall

E.5 Conference - Lightning E.x Conference - All

4:40pm EDT

Galaxy & Docker & Users

→ Slides doi: 10.7490/f1000research.1112750.1
→ Video

Authors
Abdulrahman Azab
Björn Grüning

Abstract
This talk is relevant mainly for advanced developers and sysadmins who wish to support Docker on their systems but are skeptical because they consider Docker insecure. It is also relevant for running Galaxy in production on top of an HPC system.

The talk shows how to configure the system to run Docker containers as the local user in a simple and quick way, without having to worry about, for example, connecting to LDAP from within containers.
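As a rough illustration of the idea of running a container as the local user (not necessarily the configuration presented in the talk), the sketch below builds a `docker run` command that passes the caller's numeric UID and GID through Docker's `--user` flag. Because the IDs are numeric, the container does not need to resolve user names, for example via LDAP. The helper name `docker_run_as_local_user` and the choice to mount the working directory are assumptions made for this example.

```python
import os


def docker_run_as_local_user(image, command, workdir=None):
    """Build a `docker run` argument list that runs as the calling user.

    Hypothetical helper for illustration only; it builds the command
    but does not execute it.
    """
    uid, gid = os.getuid(), os.getgid()   # numeric IDs of the local user
    workdir = workdir or os.getcwd()
    return [
        "docker", "run", "--rm",
        "--user", f"{uid}:{gid}",            # run the process as the local user
        "--volume", f"{workdir}:{workdir}",  # expose the job directory
        "--workdir", workdir,
        image, *command,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so no Docker daemon is needed.
    print(" ".join(docker_run_as_local_user("busybox", ["id"], workdir="/tmp")))
```

Files written to the mounted directory are then owned by the local user rather than by root, which is usually what an HPC job directory requires.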

Speakers

Björn Grüning

University of Freiburg

Presenters

Abdulrahman Azab

Senior Engineer, University of Oslo

L2 T2 lightning talk pdf

Wednesday June 29, 2016 4:40pm - 4:47pm EDT
IMU Alumni Hall

E.5 Conference - Lightning E.x Conference - All

4:47pm EDT

Dynamic Tool Destination - A Universal Rule Based Job to Destination Mapper

→ Slides doi: 10.7490/f1000research.1112751.1
→ Video

As use of Galaxy increases and computational resources are continuously busy, it becomes important to optimize resource usage. To address this issue, we have developed Dynamic Tool Destination (DTD), a dynamic job destination that works with all tools and destinations. In DTD, an administrator sets up rules for each tool in a YAML file. These rules define which destination a tool should go to when particular parameters are present, when input data is large or small, and so on. DTD is open source under the Apache License and is available on GitHub at https://github.com/phac-nml/dynamic-tool-destination
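The rule-based mapping described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DTD's implementation: the `Rule` fields, the tool ID `bwa_mem`, and the destination names are all hypothetical, and the rules are written as Python objects rather than in DTD's actual YAML format. The first rule that matches the input size and parameters wins; otherwise a default destination is used.

```python
from dataclasses import dataclass, field
from typing import Optional

MB = 1024 ** 2


@dataclass
class Rule:
    """One hypothetical rule: send a job to `destination` when it matches."""
    destination: str
    min_size: int = 0                 # inclusive lower bound, in bytes
    max_size: Optional[int] = None    # exclusive upper bound; None = unbounded
    required_params: dict = field(default_factory=dict)

    def matches(self, input_size, params):
        if input_size < self.min_size:
            return False
        if self.max_size is not None and input_size >= self.max_size:
            return False
        # Every required parameter must be present with the given value.
        return all(params.get(k) == v for k, v in self.required_params.items())


def choose_destination(tool_id, input_size, params, rules, default="local"):
    """Return the destination of the first matching rule for this tool."""
    for rule in rules.get(tool_id, []):
        if rule.matches(input_size, params):
            return rule.destination
    return default


# Hypothetical rule set; in DTD the rules live in a YAML file instead.
RULES = {
    "bwa_mem": [
        Rule("cluster_high_mem", min_size=500 * MB,
             required_params={"mode": "paired"}),
        Rule("cluster_default", min_size=50 * MB),
        Rule("local_small", max_size=50 * MB),
    ],
}

if __name__ == "__main__":
    print(choose_destination("bwa_mem", 600 * MB, {"mode": "paired"}, RULES))
```

Ordering the rules from most to least specific keeps the lookup a simple first-match scan, at the cost of requiring the administrator to keep that order in mind.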

Presenters

Eric Enns

Senior Bioinformatician, Public Health Agency of Canada

L2 T3 DTD GCC2016 pdf

Wednesday June 29, 2016 4:47pm - 4:54pm EDT
IMU Alumni Hall

E.5 Conference - Lightning E.x Conference - All

4:54pm EDT

Applied Bioinformatics - Interdisciplinary Curriculum Built on Open Source Technology

→ Slides doi: 10.7490/f1000research.1112752.1
→ Video

Classic bioinformatics curricula are limited by a relatively rigid course compartmentalization, the use of expensive proprietary IT/bioinformatics tools, and a limited grading system as the outcome of completing the course. Here we present a curriculum infused with real-life research-based projects such as whole genome analysis, gene expression arrays and molecular dynamics, applied to aging, cancer and pharmacogenomics. These projects serve as pivotal points for integrating biomedical science, computer science and statistics into one coherent interdisciplinary subject known as bioinformatics. Each project has scientific objectives that serve as the underlying platform for educational goals. Students join the projects after completing a basic course that familiarizes them with the technical and scientific aspects of the projects. The curriculum is based on 100% open source, cutting-edge, evolving technology. This allows students to be taught the most current technology at a fraction of the price of proprietary software. The use of real-life projects brings the excitement of involvement in pertinent discoveries and facilitates learning and open sharing of ideas. As the outcome of completing the projects, students will develop the skills, knowledge, and hands-on experience that will make them competitive in today's intensive and rapidly changing field of computational biology.

Presenters

Acu Ilie Dorin

L2 T4 ACU Galaxy Presentation 2016lastPPT pdf

Wednesday June 29, 2016 4:54pm - 5:01pm EDT
IMU Alumni Hall

E.5 Conference - Lightning E.x Conference - All

5:01pm EDT

Embracing Complexity and Diversity: Metaproteomics Within The Galaxy Framework.

→ Slides doi:10.7490/f1000research.1112753.1
→ Video

Metaproteomics characterizes proteins expressed by microorganism communities (microbiome) present in environmental samples or a host organism. Mass spectrometry (MS)-based metaproteomics has catalyzed new discoveries into the functional dynamics of microbiomes (Wilmes et al 2015, doi: 10.1002/pmic.201500183). Metaproteomic informatics is distinctly challenging due to the large databases and complex processing steps involved. This challenge limits widespread use of metaproteomics. Through modular workflows, we demonstrate the use of the Galaxy bioinformatics framework as a metaproteomic informatics solution (Jagtap et al 2015; doi: 10.1002/pmic.201500074). The workflow output results are compatible with tools for taxonomic and functional characterization (Unipept and MEGAN5). MEGAN5 was used to generate functional characterization of the metaproteome using Inter2Pro pathway analysis. These workflows enable new discoveries from diverse communities such as dental plaques (Rudney et al 2015, doi: 10.1186/s40168-015-0136-z), bronchoalveolar lavage fluid (BALF), lung tissue, and cervical-vaginal fluid (CVF). Our results demonstrate the power of discovery metaproteomics to add functional understanding to microbiomes, beyond what is possible using traditional metagenomic approaches.

Speakers

Pratik Jagtap

Research Assistant Professor, University of Minnesota

Metaproteomics . DIA . Proteogenomics

L2 T5 Metaproteomics@GCC Lightning Talk pdf

Wednesday June 29, 2016 5:01pm - 5:08pm EDT
IMU Alumni Hall

E.5 Conference - Lightning E.x Conference - All

5:08pm EDT

Distributing Galaxy Data Through CVMFS

→ Slides doi: 10.7490/f1000research.1112754.1
→ Video

GenAP is a Canadian platform that provides Galaxy instances across different Canadian HPC centers. With more than 7 TB of reference genomes, replicating this data in all HPC centers becomes expensive and hard to keep in sync. The CernVM File System (CVMFS) allows us to centralize the provisioning, replicate the data and distribute genome references on demand. With CVMFS, the local machine imports only the genomes necessary for the job being run, so the HPC centers need only minimal storage.

Presenters

David Morais

Bioinformatics specialist, Compute Canada

L2 T6 Distributing Galaxy Data Through CVMFS pdf

Wednesday June 29, 2016 5:08pm - 5:15pm EDT
IMU Alumni Hall

E.5 Conference - Lightning E.x Conference - All

5:15pm EDT

A Generic Circos Galaxy Tool

→ Slides doi: 10.7490/f1000research.1112755.1
→ Video

Circos is a favourite tool of biologists for production-quality plots; however, building the initial plots takes an extremely large activation energy because of Circos' steep learning curve. We have worked on developing a generic and easily configurable Galaxy tool that generates Circos plots and also provides the generated configuration files, to allow further tweaking and customization after the fact. We have made the tool publicly available during development and have already received contributions during the GCC2016 Hackathon.

Moderators

Eric Rasche

Sysadmin / Bioinformatician, Center for Phage Technology @ Texas A&M University

L2 T7 main pdf

Wednesday June 29, 2016 5:15pm - 5:22pm EDT
IMU Alumni Hall

E.5 Conference - Lightning E.x Conference - All

5:22pm EDT

Integrating workflow support for GenomeSpace into Galaxy

→ Slides doi:10.7490/f1000research.1112756.1
→ Video

Integrating workflow support for GenomeSpace into Galaxy.

The GenomeSpace importer/exporter itself has been rewritten as a standalone pip-installable tool, available here: https://github.com/gvlproject/python-genomespaceclient. We hope to transfer that code back into GenomeSpace or Galaxy as a set of Python bindings plus a command-line client for GenomeSpace.

There's a 3-minute video of how things work here:
https://www.youtube.com/watch?v=5QPtWS_ab0I

See https://github.com/galaxyproject/galaxy/pull/1814

Instructors

Nuwan Goonasekera

University of Melbourne

L2 T8 Galaxy workflow support for GenomeSpace pdf

Wednesday June 29, 2016 5:22pm - 5:29pm EDT
IMU Alumni Hall

E.5 Conference - Lightning E.x Conference - All

5:29pm EDT

Tool Framework Developments

→ Slides
→ Video

This talk is aimed at Galaxy tool developers and will serve as an
overview of the largest and most relevant Galaxy tool development
framework changes over the past year.

Speakers

John Chilton

PSU

L2 T9 gxtoolschilton zip

Wednesday June 29, 2016 5:29pm - 5:36pm EDT
IMU Alumni Hall

E.5 Conference - Lightning E.x Conference - All

5:36pm EDT

The Monarch Initiative and Phenopackets Wrapped

→ Slides doi: 10.7490/f1000research.1112757.1
→ Video

Monarch (https://monarchinitiative.org) integrates a variety of genomic, phenotypic, and disease data by leveraging ontologies to create relationships across multiple organisms.
We have (quickly, using Planemo!) created a Galaxy tool to wrap the web services exposed by Monarch, including the phenopacket implementation.

Please let us know how we can improve on this first cut. We look forward to getting some feedback from you.

Moderators