lncRNA-screen: an interactive platform for computationally screening long non-coding RNAs in large genomics datasets

Long non-coding RNAs (lncRNAs) have emerged as a class of factors that are important for regulating development and cancer. Computational prediction of lncRNAs from ultra-deep RNA sequencing has been successful in identifying candidate lncRNAs. However, the complexity of handling and integrating different types of genomics data poses significant challenges to experimental laboratories that lack extensive genomics expertise. To address this issue, we have developed lncRNA-screen, a comprehensive pipeline for computationally screening putative lncRNA transcripts over large multimodal datasets. The main objective of this work is to facilitate the computational discovery of lncRNA candidates to be further examined by functional experiments. lncRNA-screen provides a fully automated, easy-to-run pipeline that performs data download, RNA-seq alignment, assembly, quality assessment, transcript filtration, novel lncRNA identification, coding potential estimation, expression level quantification, histone mark enrichment profile integration, differential expression analysis, annotation with other types of segmented data (CNVs, SNPs, Hi-C, etc.) and visualization. Importantly, lncRNA-screen generates an interactive report summarizing all interesting lncRNA features, including genome browser snapshots and lncRNA-mRNA interactions based on Hi-C data. lncRNA-screen provides a comprehensive solution for lncRNA discovery and an intuitive interactive report for identifying promising lncRNA candidates. lncRNA-screen is available as open-source software on GitHub.

lncRNA-screen: an interactive platform for computationally screening long non-coding RNAs in large genomics datasets
Gong Y, Huang HT, Liang Y, Trimarchi T, Aifantis I*, Tsirigos A*. (*co-corresponding authors)
BMC Genomics, June 2017.

HiC-bench: comprehensive and reproducible Hi-C data analysis designed for parameter exploration and benchmarking

Chromatin conformation capture techniques have evolved rapidly over the last few years and have provided new insights into genome organization at an unprecedented resolution. Analysis of Hi-C data is complex and computationally intensive, involving multiple tasks and requiring robust quality assessment at each step of the analysis. This has led to the development of several tools and methods for processing Hi-C data. However, most of the existing tools do not cover all aspects of the analysis and offer only a few quality assessment options. Additionally, the availability of a multitude of tools leaves scientists wondering how these tools and their associated parameters can be used optimally, and how potential discrepancies can be interpreted and resolved. Most importantly, investigators need to be assured that slight changes in parameters and/or methods do not affect the conclusions of their studies. Finally, any analysis, no matter how complex, should be reproducible by keeping track of the tool versions, parameters and input data. To address these issues (compare, explore and reproduce), we developed HiC-bench, a configurable computational platform for comprehensive and reproducible analysis of Hi-C sequencing data. HiC-bench performs all common Hi-C analysis tasks, such as alignment, filtering, contact matrix generation and normalization, identification of topological domains, and scoring and annotation of specific interactions, using both published tools and our own. We have also embedded various tasks that perform quality assessment and visualization. HiC-bench is implemented as a data flow platform with an emphasis on analysis reproducibility. Additionally, the user can readily perform parameter exploration and comparison of different tools in a combinatorial manner that takes into account all desired parameter settings in each pipeline task. This unique feature facilitates the design and execution of complex benchmark studies that may involve combinations of multiple tool/parameter choices in each step of the analysis. To demonstrate the usefulness of our platform, we performed a comprehensive benchmark of existing and new TAD callers, exploring different matrix correction methods, parameter settings and sequencing depths. Users can extend our pipeline by adding more tools as they become available. HiC-bench is distributed as free open-source software on GitHub and Zenodo, and our bioinformatics team offers installation and usage support.
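
The combinatorial exploration described above can be illustrated with a minimal Python sketch. It is not taken from HiC-bench: the step names, tool names and parameter values below are hypothetical placeholders, and the sketch only shows how one run can be enumerated for every combination of per-step settings.

```python
from itertools import product

# Hypothetical pipeline steps and settings, chosen only for illustration;
# these are NOT HiC-bench's actual task names, tools or parameters.
PIPELINE = {
    "align": [{"aligner": "aligner_x"}],
    "matrix": [{"resolution_kb": 40}, {"resolution_kb": 100}],
    "normalize": [{"method": "method_a"}, {"method": "method_b"}],
    "domains": [{"caller": "caller_a"}, {"caller": "caller_b"}, {"caller": "caller_c"}],
}


def enumerate_runs(pipeline):
    """Yield one dict per combination, mapping each step to one setting."""
    steps = list(pipeline)
    for combo in product(*(pipeline[step] for step in steps)):
        yield dict(zip(steps, combo))


runs = list(enumerate_runs(PIPELINE))
print(len(runs))  # 1 * 2 * 2 * 3 = 12 combinations
```

Recording each enumerated dict alongside its outputs is one simple way to keep track of which settings produced which result.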

HiC-bench: comprehensive and reproducible Hi-C data analysis designed for parameter exploration and benchmarking
Lazaris C, Kelly S, Ntziachristos P, Aifantis I* and Tsirigos A* (*co-corresponding authors)
BMC Genomics, January 2017.

The role of Phf5a in pluripotency and cell differentiation


Pluripotent embryonic stem cells (ESCs) self-renew and give rise to all adult tissues. In this study, we delineated the function of the PHD-finger protein 5a (Phf5a) in ESC self-renewal and the regulation of pluripotency, as well as in myoblast specification. Bioinformatically, we demonstrated that Phf5a is essential for maintaining pluripotency, since Phf5a-depleted ESCs exhibit defective elongation of pluripotency genes and differentiate. Mechanistically, we attributed Phf5a function to the stabilization of Paf1, a complex with a major role in transcriptional elongation. GRO-Seq analysis showed that Phf5a-depleted cells exhibited defective elongation of pluripotency genes, while ChIP-Seq analysis of the Paf1 complex components in the presence and absence of Phf5a revealed a role of the latter in maintaining the integrity of the complex. Similarly, Gene Ontology analysis showed that Phf5a depletion resembled Paf1 depletion in terms of the genes/pathways affected. We also demonstrated that, beyond its ESC-specific role, Phf5a controls differentiation of adult myoblasts, again through Paf1 stabilization. Our findings suggest a potent role of Phf5a in regulating pluripotency and cellular reprogramming.

Regulation of transcriptional elongation in pluripotency and cell differentiation by the PHD-finger protein Phf5a
Strikoudis A, Lazaris C, Trimarchi T, Galvao Neto AL, Yang Y, Ntziachristos P, Rothbart S, Buckley S, Dolgalev I, Stadtfeld M, Strahl BD, Dynlacht BD, Tsirigos A* and Aifantis I* (*co-corresponding authors)

Nature Cell Biology, October 2016

Contrasting roles of H3K27 demethylases in T cell leukemia


T-cell acute lymphoblastic leukemia (T-ALL) is a hematological malignancy with a dismal overall prognosis, including a relapse rate of up to 25%, mainly because of the lack of non-cytotoxic targeted therapy options. Drugs that target the function of key epigenetic factors have been approved in the context of hematopoietic disorders, and mutations that affect chromatin modulators in a variety of leukemias have recently been identified; however, ‘epigenetic’ drugs are not currently used for T-ALL treatment. Recently, we showed that the polycomb repressive complex 2 (PRC2) has a tumor-suppressor role in T-ALL. Here we delineated the role of the histone 3 lysine 27 (H3K27) demethylases JMJD3 and UTX in T-ALL. We show that JMJD3 is essential for the initiation and maintenance of T-ALL, as it controls important oncogenic gene targets by modulating H3K27 methylation. By contrast, we found that UTX functions as a tumor suppressor and is frequently genetically inactivated in T-ALL. Moreover, we demonstrated that the small-molecule inhibitor GSKJ4 affects T-ALL growth by targeting JMJD3 activity. These findings show that two proteins with a similar enzymatic function can have opposing roles in the context of the same disease, paving the way for treating hematopoietic malignancies with a new category of epigenetic inhibitors.

Contrasting roles for histone 3 lysine 27 demethylases in acute lymphoblastic leukemia
Ntziachristos P, Tsirigos A* et al. (*co-first and co-corresponding author)
Nature, August 2014  

Long non-coding RNAs in T cell leukemia


Accumulating evidence suggests that by focusing on protein-coding genes only, a significant part of cancer biology is being overlooked. Recent studies have revealed that a large portion of the human genome is transcriptionally active despite the fact that only a small portion contains protein-coding genes. In particular, long non-coding RNAs (lncRNAs) have been shown to be important in disease in many different cell types, suggesting a ubiquitous role in the regulation of cellular state. While several groups have investigated possible roles for lncRNAs as players in the TP53 tumor-suppressor transcriptional program and in solid tumors, our overall knowledge of lncRNAs in cancer, including leukemia, remains extremely limited. In this work, we generated sequencing data and used computational methods and pipelines to identify lncRNAs in ALL, in an effort to find novel biomarkers and biological targets for therapy.

Genome-wide mapping and characterization of novel Notch-regulated long non-coding RNAs in acute leukemia 
Trimarchi T, Bilal E, Ntziachristos P, Fabbri G, Dalla-Favera R, Tsirigos A* and Aifantis I*. (*co-corresponding authors)
Cell, July 2014

Genetic inactivation of EZH2 in T cell leukemia


In collaboration with Dr. Aifantis' lab at NYU Medical Center, we study epigenetic mechanisms that transform T cells and cause leukemia. Extensive high-throughput sequencing (RNA-seq and ChIP-seq) of samples obtained from mouse models, cell lines or patients and subsequent analyses help reveal the role of histone modifications and epigenetic modulators such as Ezh2, which we showed to be a tumor suppressor in T cell acute lymphoblastic leukemia. 

Genetic Inactivation of the PRC2 Complex in T-Cell Acute Lymphoblastic Leukemia
Ntziachristos P, Tsirigos A* et al. (*co-first and co-corresponding author)
Nature Medicine, January 2012 



Dr. Tsirigos developed GenomicTools (http://code.google.com/p/ibm-cbc-genomic-tools), a flexible open-source computational platform for the analysis, manipulation and visualization of high-throughput sequencing data. GenomicTools implements a variety of mathematical operations between sets of genomic regions, thereby enabling the prototyping of computational pipelines that can address a wide spectrum of tasks, from preprocessing and quality control to meta-analyses. For example, the user can create average read profiles across transcriptional start sites or enhancer sites, quickly prototype customized peak discovery methods for ChIP-seq experiments, perform genome-wide statistical tests such as enrichment analyses, or design statistical controls via appropriate randomization schemes. We introduced genomic_apps, a series of operations whose goal is to create plots as output (in TIFF format), such as heatmap representations and read profiles for sequencing data. The basic implementation idea is to first compute the necessary data using the GenomicTools API and then call an automatically generated R script. Users are also given the option to modify the script to obtain customized behavior. R version 2.14 (or later) is required.
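
To make the notion of an average read profile concrete, here is a minimal pure-Python sketch. It does not use the GenomicTools API or command-line tools; the function name, the input representation (read start coordinates on a single chromosome) and the example values are assumptions made for illustration only.

```python
from collections import Counter


def average_profile(read_positions, sites, flank=2):
    """Average read count at each offset in [-flank, +flank] around the sites.

    read_positions: iterable of integer read start coordinates (one chromosome).
    sites: iterable of (position, strand) reference points, strand '+' or '-'.
    Returns a list of 2 * flank + 1 averages, ordered from upstream to downstream.
    """
    counts = Counter(read_positions)
    sites = list(sites)
    if not sites:
        raise ValueError("at least one site is required")
    profile = []
    for offset in range(-flank, flank + 1):
        total = 0
        for pos, strand in sites:
            # On the minus strand, upstream lies at higher coordinates.
            coord = pos + offset if strand == "+" else pos - offset
            total += counts[coord]
        profile.append(total / len(sites))
    return profile


# Illustrative (made-up) data: two sites, one on each strand.
reads = [98, 100, 100, 101, 200, 201]
tss = [(100, "+"), (200, "-")]
print(average_profile(reads, tss, flank=1))  # [0.5, 1.5, 0.5]
```

Flipping the offset on the minus strand keeps upstream and downstream consistent across sites, which matters when averaging over genes on both strands.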

GenomicTools: a computational platform for developing high-throughput analytics in genomics
Tsirigos A*, Haiminen N, Bilal E, Utro F (*corresponding author)
Bioinformatics, November 2011

Repeat element evolution and function


Despite their fundamental role in cell regulation, protein-coding genes account for a small fraction of the human genome. Recent studies have shown that non-genic regions of our DNA may also play an important functional role in human cells. In collaboration with Dr. Rigoutsos, we studied Alu and B elements, a specific class of such non-genic elements that account for ~10% of the human genome and ~7% of the mouse genome respectively. Contrary to the prevailing hypothesis, we show that Alu and B elements have been preferentially retained in the proximity of genes that perform specific functions in the cell, whereas we found no evidence for selective loss of these elements in any functional class. Several of the functional classes associated with Alu and B elements in our study, such as DNA repair, are central to the proper working of the cell and their disruption has previously been shown to lead to the onset of disease.

Alu and B1 repeats have been selectively retained in the upstream and intronic regions of genes of specific functional classes
Tsirigos A*, Rigoutsos I* (*co-corresponding authors)
PLoS Computational Biology, December 2009

Intronic motifs


In collaboration with Dr. Rigoutsos, we identified the most frequent, variable-length DNA sequence motifs in the human and mouse genomes and sub-selected those with multiple recurrences in the intergenic and intronic regions and at least one additional exonic instance in the corresponding genome. Surprisingly, we discovered that these genome-specific motifs, although not conserved in sequence between human and mouse, are nevertheless enriched in the introns of genes belonging to the same biological processes and molecular functions in both the human and mouse genomes. We highlight the ramifications of this observation with a concrete example that involves the microsatellite instability gene MLH1. 

Human and mouse introns are linked to the same processes and functions through each genome's most frequent non-conserved motifs
Tsirigos A, Rigoutsos I
Nucleic Acids Research, May 2008

Horizontal gene transfer


DNA exchange between organisms is believed to play an important role in evolution in general as well as in conferring drug resistance in particular. Dr. Tsirigos and Dr. Rigoutsos introduced a generalized computational framework for identifying horizontal transfers based on genomic pattern composition. Through extensive simulations in a wide range of archaeal and bacterial genomes, we evaluated the performance of our models and showed improvements over previously published approaches, such as the Codon Adaptation Index and C+G content. In an extension of our approach, we introduced Wn-SVM, which employs a one-class support-vector machine and can learn using small training sets. 
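
As a rough illustration of what a composition-based scan looks like, the Python sketch below scores sliding windows by how far their k-mer frequencies deviate from those of the whole genome. This is a deliberately simplified stand-in, not the published scoring scheme and not Wn-SVM; the function names, the L1 distance, and the values of k, window size and step are all assumptions chosen for the example.

```python
from collections import Counter


def kmer_freqs(seq, k):
    """Relative frequencies of all overlapping k-mers in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def atypicality(window, genome_freqs, k):
    """L1 distance between the window's and the genome's k-mer frequencies."""
    win = kmer_freqs(window, k)
    kmers = set(win) | set(genome_freqs)
    return sum(abs(win.get(m, 0.0) - genome_freqs.get(m, 0.0)) for m in kmers)


def scan(genome, k=2, window=20, step=10):
    """Return (start, score) for each window; high scores mark atypical regions."""
    ref = kmer_freqs(genome, k)
    return [
        (start, atypicality(genome[start:start + window], ref, k))
        for start in range(0, len(genome) - window + 1, step)
    ]


# Toy sequence (made up): an AT-rich background with a GC-rich insert at 100-119.
toy_genome = "AT" * 50 + "GC" * 10 + "AT" * 50
scores = scan(toy_genome)
best_start, best_score = max(scores, key=lambda item: item[1])
print(best_start)  # 100
```

On real genomes a fixed threshold on such a score would be a crude criterion; the sketch is meant only to show the general shape of a compositional scan.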

A sensitive, support-vector-machine method for the detection of horizontal gene transfers in viral, archaeal and bacterial genomes
Tsirigos A, Rigoutsos I
Nucleic Acids Research, July 2005

A new computational method for the detection of horizontal gene transfer events
Tsirigos A, Rigoutsos I
Nucleic Acids Research, February 2005