Poster doi: 10.7490/f1000research.1112746.1
Authors Jochen Bick, ETH Zurich
Susanne E. Ulbrich, ETH Zurich
Stefan Bauersachs, ETH Zurich
Abstract Analyzing RNA-seq data with a basic pipeline (quality control, filtering, trimming, and adapter clipping, followed by mapping to a reference genome or transcriptome) is a straightforward task in Galaxy. For small RNA-seq data this pipeline has to be modified, because each resulting read corresponds, at least in theory, directly to a small RNA. This calls for a different mapping strategy that uses BLASTn for short sequences. An additional common problem is that the number of annotated small non-coding RNAs is very low for certain species, including pig and cattle, which makes small RNA data from these species very difficult to annotate. In human, far more non-coding RNAs are known than in other mammalian species, which gave us the idea of using ortholog information from well-annotated species. Our workflow is based mainly on basic Galaxy tools and some in-house scripts. The idea is to use information from well-annotated related species to improve the annotation of each sequence found in small RNA-seq results. First, we run the basic analysis pipeline: we check read quality, then filter, trim, and clip the adapter sequence. Afterwards, we count and filter the unique read sequences directly with a combination of Galaxy tools and additional tools developed in our group. These reads are aligned with BLASTn-short against all transcripts of the sequenced species, including non-coding RNAs, and against those of related, well-annotated species. The collection of BLAST databases contains sequences from miRBase (precursor and mature microRNAs), sequences from NCBI and Ensembl (mostly non-coding RNAs but also protein-coding transcripts), and tRNA and piRNA cluster sequences. Finally, all BLAST results are filtered and joined, removing all duplicated hits. The annotated sequences are used for DEG analysis with edgeR and/or DESeq2.
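The counting and hit-filtering steps described above can be illustrated with a minimal Python sketch. It is not the authors' in-house code: the function names `collapse_reads` and `best_hits`, the `min_count` threshold, and the rule of keeping the highest-bitscore hit per read are all assumptions made for illustration. The sketch also assumes that the BLAST results are in the standard 12-column tabular format (`-outfmt 6`), where the bit score is the last column.

```python
from collections import Counter


def collapse_reads(reads, min_count=2):
    """Count identical read sequences and keep those seen at least
    min_count times (the threshold is an illustrative assumption)."""
    counts = Counter(reads)
    return {seq: n for seq, n in counts.items() if n >= min_count}


def best_hits(blast_rows):
    """Reduce BLAST tabular rows to one hit per query sequence.

    Assumes the default 12-column tabular layout: qseqid, sseqid, pident,
    length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    Keeping the highest-bitscore hit is an assumed de-duplication rule.
    """
    best = {}
    for row in blast_rows:
        fields = row.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        bitscore = float(fields[11])
        # Replace the stored hit only if this one scores strictly higher,
        # so exact duplicate rows are dropped as well.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best
```

For example, `collapse_reads` could be applied to the clipped read sequences before they are written out as a FASTA file for BLAST, and `best_hits` to the lines of the joined result files from all databases. A real workflow would probably also need a rule for ties between databases, which this sketch leaves out.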