Poster doi:10.7490/f1000research.1112472.1
Authors: Anil S. Thanki, Nicola Soranzo, Robert P. Davey
The Genome Analysis Centre, Norwich, UK
Abstract: The Ensembl GeneTrees pipeline [1] infers the evolutionary history of gene families, represented as gene trees. These are analysed alongside the corresponding species tree to detect duplication and speciation events. The pipeline is a large and complex suite of interconnected tools and scripts with many dependencies, which makes it difficult to port to, and replicate on, a different platform.
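To illustrate what detecting duplication and speciation events means in practice, the following minimal sketch labels the internal nodes of a toy gene tree. It uses a simple species-overlap heuristic, not the species-tree reconciliation performed by the GeneTrees pipeline, and the nested-tuple tree format and the `gene|species` leaf naming are assumptions made only for this example.

```python
# Toy gene tree: internal nodes are 2-tuples, leaves are "gene|species" strings.
# This representation is hypothetical and chosen only for illustration.

def species_of(node):
    """Return the set of species found at or below a node."""
    if isinstance(node, str):
        return {node.split("|")[1]}
    left, right = node
    return species_of(left) | species_of(right)


def label_events(node, events=None):
    """Label internal nodes in post-order using the species-overlap heuristic.

    A node is called a duplication if its two child subtrees share at least
    one species, and a speciation otherwise.
    """
    if events is None:
        events = []
    if isinstance(node, str):
        return events
    left, right = node
    label_events(left, events)
    label_events(right, events)
    shared = species_of(left) & species_of(right)
    kind = "duplication" if shared else "speciation"
    events.append((kind, sorted(species_of(node))))
    return events


# Two paralogues (A1, A2), each present in human and mouse.
tree = (("A1|human", "A1|mouse"), ("A2|human", "A2|mouse"))
events = label_events(tree)
```

In this toy tree the two inner nodes separate human from mouse and are labelled as speciations, while the root, whose two subtrees both contain human and mouse, is labelled as a duplication.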
We have simplified this process by converting the command-line GeneTrees pipeline into an open-source Galaxy workflow, called GeneSeqToFamily. This workflow consists of more than 20 steps and uses existing tools already available in the Galaxy ToolShed, as well as new tools that we developed, such as wrappers for TreeBest and hcluster_sg, alongside data format converters and output parsers. We have also developed tools for retrieving sequences, features and gene trees from Ensembl using its REST API; the retrieved data can be used as inputs for the workflow.
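As a rough sketch of the kind of request such retrieval tools issue, the snippet below builds URLs for the public Ensembl REST endpoints `sequence/id`, `lookup/id` and `genetree/id` and fetches the JSON response. It is not the code of the Galaxy tools described here; the helper names `rest_url` and `fetch_json` are hypothetical, and the identifiers in the usage comments are placeholders.

```python
import json
import urllib.parse
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"


def rest_url(endpoint, identifier, **params):
    """Build an Ensembl REST URL such as <server>/genetree/id/<identifier>.

    JSON output is requested through the content-type query parameter.
    """
    query = {"content-type": "application/json"}
    query.update(params)
    return f"{ENSEMBL_REST}/{endpoint}/id/{identifier}?{urllib.parse.urlencode(query)}"


def fetch_json(url, timeout=30):
    """Perform the GET request and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (placeholder identifiers; uncomment to query the live server):
# sequence = fetch_json(rest_url("sequence", "ENSG00000157764"))
# features = fetch_json(rest_url("lookup", "ENSG00000157764", expand=1))
# gene_tree = fetch_json(rest_url("genetree", "ENSGT00390000003602"))
```

Keeping URL construction separate from the network call makes the request easy to inspect and test without contacting the server.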
The outputs of the GeneSeqToFamily workflow are a collection of gene families discovered from the genes of interest and, for each gene family, a gene tree and multiple sequence alignments. These are then merged with gene feature information for each family to generate a dataset that can be visualised inside Galaxy with Aequatus.js, a new JavaScript library derived from Aequatus.
1. Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E: EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res. 2009, 19(2):327–335.