Poster doi:10.7490/f1000research.1112722.1
Authors: Mohammad Heydarian (1,2), Jevon Cutler (1,2), Mike Sauria (3), Barbara Sollner-Webb (1), James Taylor (3), and Karen Reddy (1,2)
1. Department of Biological Chemistry, Johns Hopkins University, Baltimore, MD, USA
2. Center for Epigenetics, Johns Hopkins University, Baltimore, MD, USA
3. Department of Biology, Johns Hopkins University, Baltimore, MD, USA
Abstract: Long non-coding RNAs (lncRNAs) are a class of RNA that lack protein-coding potential but otherwise resemble protein-coding RNAs: they are transcribed by RNA polymerase II, are 5' capped, and, in most cases, are spliced. LncRNAs are expressed at lower levels than protein-coding RNAs and exhibit tissue- and cell-type-restricted expression. To identify lncRNAs in early B cell development in mouse, we performed RNA sequencing on two developmentally arrested models of early hematopoiesis: a multi-potent progenitor (MPP) with the capacity to differentiate towards monocyte/lymphocyte lineages, and a lineage-committed pro-B cell system. Using the Tuxedo RNA-seq analysis suite with a de novo transcriptome reconstruction approach, we identified ~45,000 transcripts deemed to be long non-coding RNAs. To prioritize high-confidence lncRNAs, we developed a Galaxy-based workflow for the discovery of novel and known high-confidence lncRNAs, which we call the 'TRUElncRNA workflow'. This workflow takes standard output file formats from the Tuxedo suite, together with widely available reference data from the UCSC Table Browser, and returns high-confidence novel and known lncRNAs. Using the TRUElncRNA workflow, we identified ~200 novel and ~2,300 known high-confidence lncRNAs expressed in early B cell development. These high-confidence lncRNAs show low coding potential relative to protein-coding RNAs by PhyloCSF scoring. They exhibit chromatin profiles similar to those of annotated protein-coding genes and show tissue-restricted expression patterns across a comprehensive array of mouse tissues. Lastly, these lncRNAs tend to reside in topological domains with genes relevant to early hematopoiesis and, in some cases, interact with the promoters of relevant genes across hundreds of kilobases.