Visit the main conference website:  http://galaxyproject.org/gcc2016
Tuesday, June 28 • 3:10pm - 4:25pm
P27: GeneSeqToFamily: the Ensembl GeneTree pipeline as a Galaxy workflow


Poster    doi:10.7490/f1000research.1112472.1

Authors

Anil S. Thanki, Nicola Soranzo, Robert P. Davey (The Genome Analysis Centre, Norwich, UK)
 
Abstract
The Ensembl GeneTrees pipeline [1] infers the evolutionary history of gene families, represented as gene trees. These are analysed alongside the corresponding species tree to detect duplication and speciation events. This pipeline is a large and complex suite of interconnected tools and scripts with many dependencies and is therefore quite difficult to port and replicate on a different platform.
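
To illustrate what labelling duplication and speciation events means, here is a minimal sketch using the simple species-overlap heuristic: an internal node is called a duplication if its two subtrees share at least one species, and a speciation otherwise. This is not the species-tree reconciliation that the GeneTrees pipeline performs; the nested-list tree encoding, the gene names and the function name `label_events` are all assumptions made for this example.

```python
def label_events(node, events=None):
    """Label internal nodes of a binary gene tree by species overlap.

    A leaf is a (gene_name, species) tuple; an internal node is a
    two-element list [left, right]. Returns the set of species under
    the node and the list of events in post-order.
    """
    if events is None:
        events = []
    if isinstance(node, tuple):
        # Leaf: contributes exactly one species and no event.
        return {node[1]}, events
    left, right = node
    left_species, _ = label_events(left, events)
    right_species, _ = label_events(right, events)
    # Shared species on both sides suggests a duplication before the split.
    kind = "duplication" if left_species & right_species else "speciation"
    all_species = left_species | right_species
    events.append((kind, sorted(all_species)))
    return all_species, events


# Hypothetical family: two paralogous clades, each with human and mouse.
tree = [
    [("geneA_human", "human"), ("geneA_mouse", "mouse")],
    [("geneB_human", "human"), ("geneB_mouse", "mouse")],
]
species, events = label_events(tree)
for kind, covered in events:
    print(kind, covered)
```

In this toy family the two inner nodes are speciations and the root is a duplication, because both subtrees contain human and mouse genes.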

We have simplified this process by converting the command-line GeneTrees pipeline into an open-source Galaxy workflow, called GeneSeqToFamily. This workflow consists of more than 20 steps and uses existing tools already available in the Galaxy ToolShed, as well as new tools that we developed, such as wrappers for TreeBeST and hcluster_sg, alongside data format converters and output parsers. We have also developed tools for retrieving sequences, features and gene trees from Ensembl using its REST API; the retrieved data can be used as inputs for the workflow.
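
As a rough illustration of the kind of requests such retrieval tools make, the sketch below builds URLs for three public Ensembl REST endpoints (`sequence/id`, `lookup/id` and `genetree/id`). It is not the code of the Galaxy tools themselves; the helper names `build_url` and `fetch_json` and the example identifiers are assumptions chosen for this example.

```python
import json
import urllib.parse
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"


def build_url(endpoint, identifier, **params):
    """Build an Ensembl REST URL, e.g. <server>/sequence/id/<identifier>."""
    url = f"{ENSEMBL_REST}/{endpoint}/{urllib.parse.quote(identifier)}"
    # Sort parameters so the generated URL is deterministic.
    query = urllib.parse.urlencode(sorted(params.items()))
    return f"{url}?{query}" if query else url


def fetch_json(url, timeout=30):
    """Request a URL from the REST server and decode the JSON response."""
    request = urllib.request.Request(
        url, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example identifiers (assumed for illustration): a human gene and a gene tree.
gene_id = "ENSG00000139618"
tree_id = "ENSGT00390000003602"

sequence_url = build_url("sequence/id", gene_id, type="genomic")
feature_url = build_url("lookup/id", gene_id, expand=1)
genetree_url = build_url("genetree/id", tree_id)

for url in (sequence_url, feature_url, genetree_url):
    print(url)
```

Calling `fetch_json(genetree_url)` would then return the decoded response for that gene tree; the call is left out of the example so that the block runs without network access.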

The outputs of the GeneSeqToFamily workflow are a collection of discovered gene families from genes of interest, a gene tree and multiple sequence alignments for each gene family. These are then merged with gene feature information for each family to generate a dataset which can be visualised inside Galaxy with Aequatus.js, a new JavaScript library derived from Aequatus.

1. Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E: EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res. 2009, 19(2):327–335.
 

Presenters

Nicola Soranzo

The Genome Analysis Centre (TGAC)

