# MOLECULAR FUNCTION AND REGULATION OF NON-CODING RNAs IN MULTIFACTORIAL DISEASES

EDITED BY: Mohammadreza Hajjari, Seyed Javad Mowla and Mohammad Ali Faghihi

PUBLISHED IN: Frontiers in Genetics

#### *Frontiers Copyright Statement*

*© Copyright 2007-2016 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

> *All copyright, and all rights therein, are protected by national and international copyright laws.*

*The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88945-002-2 DOI 10.3389/978-2-88945-002-2

# **MOLECULAR FUNCTION AND REGULATION OF NON-CODING RNAs IN MULTIFACTORIAL DISEASES**

Topic Editors:

**Mohammadreza Hajjari,** Shahid Chamran University of Ahvaz, Iran

**Seyed Javad Mowla,** Tarbiat Modares University, Iran

**Mohammad Ali Faghihi,** University of Miami Miller School of Medicine, USA

Our understanding of the mechanisms underlying multifactorial diseases such as diabetes, autism, Alzheimer's disease, and cancer has advanced greatly. Non-coding RNAs (ncRNAs), which include microRNAs and long non-coding RNAs, have recently been found to have potential roles in these diseases and offer new opportunities for developing both specific biomarkers and therapeutic targets. However, the molecular function and regulation of these RNAs remain challenging to elucidate. Numerous studies focus on this field in order to fully appreciate the role and regulation of these molecules in human medicine and biology.

This e-book aims to bring together new findings on non-coding RNAs in different complex diseases. It highlights the characterization, roles, mechanisms, and modes of action of these RNAs in complex disorders. We believe that publications on this topic will grow rapidly in the future. Improved approaches at multiple levels may pave the way for designing and applying new biomarkers and therapeutic targets for specific diseases based on these attractive molecules.

**Citation:** Hajjari, M., Mowla, S. J., Faghihi, M. A., eds. (2016). Molecular Function and Regulation of Non-coding RNAs in Multifactorial Diseases. Lausanne: Frontiers Media. doi: 10.3389/978-2-88945-002-2

# Table of Contents


## **Chapter 2: Long non-coding RNAs in complex diseases**


Alireza Shahryari, Marie Saghaeian Jazi, Nader M. Samaei and Seyed J. Mowla

*Exosomal lncRNA-p21 levels may help to distinguish prostate cancer from benign disease*

Mustafa Işın, Ege Uysaler, Emre Özgür, Hikmet Köseoğlu, Öner Şanlı, Ömer B. Yücel, Uğur Gezer and Nejat Dalay

# Editorial: Molecular Function and Regulation of Non-coding RNAs in Multifactorial Diseases

#### Mohammadreza Hajjari <sup>1</sup> \*, Seyed Javad Mowla<sup>2</sup> \* and Mohammad Ali Faghihi <sup>3</sup> \*

<sup>1</sup> Department of Genetics, Shahid Chamran University of Ahvaz, Ahvaz, Iran, <sup>2</sup> Department of Molecular Genetics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran, <sup>3</sup> Department of Psychiatry and Behavioral Sciences, Center for Therapeutic Innovation, University of Miami Miller School of Medicine, Miami, FL, USA

Keywords: long non-coding RNA (lncRNA), microRNA (miRNA), multifactorial disease, biomarker discovery, non-coding RNA (ncRNA)

**The Editorial on the Research Topic**

#### Edited by:

Subbaya Subramanian, University of Minnesota, USA

#### Reviewed by:

Venugopal Thayanithy, University of Minnesota, USA; Mainá Bitar, Universidade Federal de Minas Gerais, Brazil; Anne Elizabeth Sarver, University of Minnesota, USA

#### \*Correspondence:

Mohammadreza Hajjari, m-hajari@scu.ac.ir, Mohamad.hajari@gmail.com; Seyed Javad Mowla, sjmowla@modares.ac.ir; Mohammad Ali Faghihi, mfaghihi@med.miam.edu

#### Specialty section:

This article was submitted to RNA, a section of the journal Frontiers in Genetics

Received: 31 October 2015 Accepted: 22 January 2016 Published: 19 February 2016

#### Citation:

Hajjari M, Mowla SJ and Faghihi MA (2016) Editorial: Molecular Function and Regulation of Non-coding RNAs in Multifactorial Diseases. Front. Genet. 7:9. doi: 10.3389/fgene.2016.00009

**Molecular Function and Regulation of Non-coding RNAs in Multifactorial Diseases**

Most of the human genome does not code for proteins. Many of these regions instead encode functional RNA molecules that are not translated and are therefore called non-coding RNAs (ncRNAs; Cammaerts et al.; Hajjari et al., 2014). Depending on their length and function, these RNAs are categorized into different types, such as microRNAs (miRNAs) and long non-coding RNAs (lncRNAs). Different studies have shown that these RNAs have crucial roles in cellular and molecular mechanisms. Given these functions, dysregulation of ncRNAs is believed to be involved in different diseases, especially complex ones such as cancers, neurological disorders, and cardiovascular diseases (Nouraee and Mowla; Merelo et al.; Hajjari et al., 2014). Accordingly, some researchers believe that these molecules can help with the diagnosis and treatment of complex diseases. However, our understanding of their functions is still limited, and many aspects of ncRNAs remain to be discovered. These considerations motivated the Research Topic entitled "Molecular function and regulation of non-coding RNAs in multi-factorial diseases." We believe that the articles in this issue and the associated e-book offer interesting viewpoints for researchers interested in this topic.

miRNAs are small RNAs about 20–24 nucleotides long that function in post-transcriptional regulation of gene expression. They usually destabilize and repress target RNAs by binding to the 3′UTRs, 5′UTRs, or coding sequences of the transcripts (Lytle et al., 2007; Qin et al., 2010). miRNAs have been demonstrated to play major roles in a wide range of developmental processes as well as diseases. Two strategies have been proposed for understanding the role of miRNAs in the pathogenesis of diseases such as cancers: expression analysis and genetic approaches (reviewed in Cammaerts et al.). miRNAs are also currently considered potential new biomarkers for different diseases (Angelini and Emanueli). Furthermore, some reports have shown plasma and saliva miRNAs to be potentially sensitive and specific biomarkers (Lin et al.; Khoo et al., 2012). Nonetheless, some concerns about the functionality of miRNAs still need to be addressed.

Because they modulate different targets and pathways, miRNAs were quickly considered as potential therapeutic molecules for diseases such as cancer (reviewed in Naidu et al., 2015) and cardiovascular diseases (Nouraee and Mowla). miRNA-based therapeutics mainly focus on modulating miRNA expression levels (Misso et al., 2014). MRX34, an agent mimicking miR-34 and developed by Mirna Therapeutics, has progressed into phase I clinical trials (identifier: NCT01829971; currently recruiting participants; estimated study completion date: Dec 2016). The results may provide more insights for researchers. Despite extensive research, designing a suitable carrier to deliver miRNAs into the desired cells remains a challenge. Bakhshinejad reviews studies proposing nanocarriers as potential tools to overcome the delivery problems associated with miRNA-based pharmaceuticals (Bakhshinejad).

lncRNAs, RNAs longer than 200 nucleotides, comprise the largest proportion of the human transcriptome. The biology of lncRNAs seems more complicated than that of miRNAs: these RNAs act through a wider variety of pathways and modes of action within the cell. They function at transcriptional and post-transcriptional levels, in the cytosol and the nucleus, through cis- and trans-regulatory mechanisms (Hajjari et al., 2014; Angrand et al.). Different lncRNAs such as HOTAIR, H19, MEG3, ANRIL, HULC, and XIST are transcribed from the genome and play important roles in different cellular pathways (reviewed in Hajjari et al., 2014). Multiple lines of observation increasingly associate mutations and dysregulation of lncRNAs with human diseases such as cancer (reviewed in Angrand et al.; Hajjari et al., 2014). In this Research Topic, Shahryari et al. provide evidence that some lncRNAs, such as SOX2OT, are also involved in pluripotency (Shahryari et al.). Peschansky et al. describe novel facets of FMR4 lncRNA functionality and its relation to neurodevelopment (Peschansky et al.).

The growing evidence of dysregulated lncRNAs not only introduces a new layer of complexity in the molecular mechanisms of human diseases, but also opens up the opportunity to use lncRNAs as therapeutic targets and biomarkers. The prominent example of a lncRNA applied in clinical practice is PCA3 (prostate cancer gene 3), an FDA-approved biomarker for prostate cancer. In this test, the PSA and PCA3 RNA molecules are measured, and the PCA3 Score (a ratio of PCA3 RNA to PSA RNA) is then calculated (http://www.fda.gov). Circulating lncRNAs have also been suggested as potential biomarkers for cancer and some other diseases. Exosomal lncRNAs have been suggested as potential biomarkers for diagnosing the malignant state of patients with prostate cancer (Işın et al.).
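As a rough illustration of the ratio described for the PCA3 test, the sketch below computes a PCA3-style score from two RNA copy counts. The function name, the input values, and the scaling factor of 1000 are assumptions made for illustration; the text specifies only that the score is a ratio of PCA3 RNA to PSA RNA.

```python
def pca3_score(pca3_rna_copies: float, psa_rna_copies: float, scale: float = 1000.0) -> float:
    """Return scale * (PCA3 RNA / PSA RNA).

    The default `scale` is an illustrative assumption, not taken from the text.
    """
    if psa_rna_copies <= 0:
        raise ValueError("PSA RNA copies must be positive to form a ratio")
    if pca3_rna_copies < 0:
        raise ValueError("PCA3 RNA copies cannot be negative")
    return scale * pca3_rna_copies / psa_rna_copies


# Hypothetical copy numbers, chosen only to show the arithmetic.
example_score = pca3_score(pca3_rna_copies=50, psa_rna_copies=2000)
print(example_score)
```

Normalizing PCA3 RNA to PSA RNA from the same sample means that the score does not depend on how much prostate-derived material the sample happens to contain.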

The rapidly growing list of ncRNAs holds the promise that miRNAs and lncRNAs will become ever more important in disease management. Elucidating the mechanisms by which lncRNAs act will help us to design suitable biomarkers or therapeutic agents for some diseases, especially cancer. Deciphering the genetic networks and pathways regulated by abnormally expressed lncRNAs, characterizing miRNA-mRNA-lncRNA interactions, and identifying the genetic and epigenetic mechanisms that regulate lncRNA expression will provide more insights into the application of lncRNA research to human health. Given the different molecular functions of ncRNAs, these RNAs can be the subject of a large diversity of further studies on complex diseases. We trust that publications and novel findings on this subject will grow rapidly in the future.

# AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.

## REFERENCES

Naidu, S., Magee, P., and Garofalo, M. (2015). MiRNA-based therapeutic intervention of cancer. J. Hematol. Oncol. 8:68. doi: 10.1186/s13045-015-0162-0

Qin, W., Shi, Y., Zhao, B., Yao, C., Jin, L., Ma, J., et al. (2010). MiR-24 regulates apoptosis by targeting the open reading frame (ORF) region of FAF1 in cancer cells. PLoS ONE 5:e9429. doi: 10.1371/journal.pone.0009429

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Hajjari, Mowla and Faghihi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# MicroRNAs as clinical biomarkers?

Timothy G. Angelini <sup>1</sup> and Costanza Emanueli <sup>2, 3</sup> \*

*<sup>1</sup> Royal Surrey County Hospital, Guildford, UK, <sup>2</sup> Bristol Heart Institute, School of Clinical Sciences, University of Bristol, Bristol, UK, <sup>3</sup> National Heart and Lung Institute, Imperial College London, London, UK*

Keywords: microRNAs, clinical biomarkers, exosomes, cardiac disease, myocardial infarction

A clinical biomarker has been defined as "any cellular, biochemical, molecular, or genetic alterations by which a normal, abnormal, or simple biological process can be recognized or monitored" (Drucker and Krapfenbauer, 2003; Rahim et al., 2015). Biomarkers are employed throughout medicine on a day-to-day basis to diagnose disease, to prognosticate and predict outcomes, or even to guide treatment. They can be used either as standalone tests or, more commonly, in conjunction with other test results. Moreover, biomarkers are often chosen as "surrogate endpoints" in clinical trials. Biomarkers must fulfill certain criteria to make it into regular clinical use. They must be specific to the condition for which they are indicated, sensitive, and practical in terms of sample accessibility and ease of testing, and they must provide information that can guide clinical decisions. However, many biomarkers have unsatisfactory specificity or are not as sensitive as they are made out to be. In addition, biomarkers for some conditions are not yet available. Consequently, research into novel biomarkers should be encouraged.
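The two performance criteria named above, sensitivity and specificity, have standard definitions that can be made concrete with a small calculation. The following is a minimal sketch; the function names and the cohort counts are hypothetical and are not taken from any study cited here.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of individuals with the condition that the test flags as positive."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no individuals with the condition")
    return true_pos / total


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of individuals without the condition that the test reports as negative."""
    total = true_neg + false_pos
    if total == 0:
        raise ValueError("no individuals without the condition")
    return true_neg / total


# Hypothetical cohort: 100 patients with the condition, 100 without.
tp, fn = 90, 10   # patients with the condition: detected / missed
tn, fp = 80, 20   # individuals without it: correctly negative / falsely positive

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```

A biomarker can score well on one measure and poorly on the other, which is why both are reported together when a candidate marker is evaluated.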

#### Edited by:

*Seyed Javad Mowla, Tarbiat Modares University, Iran*

#### Reviewed by:

*Alessio Paone, Sapienza University of Rome, Italy Vijay Kumar Prajapati, Central University of Rajasthan, India*

> \*Correspondence: *Costanza Emanueli, costanza.emanueli@bristol.ac.uk*

#### Specialty section:

*This article was submitted to RNA, a section of the journal Frontiers in Genetics*

Received: *08 May 2015* Accepted: *29 June 2015* Published: *14 July 2015*

#### Citation:

*Angelini TG and Emanueli C (2015) MicroRNAs as clinical biomarkers? Front. Genet. 6:240. doi: 10.3389/fgene.2015.00240*

Biomarkers can be classified in different ways, one being the source from which they are measured. In this respect, there are two distinct kinds of biomarker: intracellular and extracellular (the latter can be further divided into three groups by means of sampling: invasive, minimally invasive, and non-invasive). Examples of intracellular biomarkers include many of those used in oncology, such as BRCA1/2 (Drooger et al., 2015), estrogen receptor and hormone receptor status in breast cancers (Yang et al., 2014), and BRAF genes in melanoma (Dar et al., 2015). For these tests, cells from the suspected tumor must be invasively sampled, often by core sampling or tissue scrapings. Extracellular biomarkers are often, but not always, linked with less invasive sampling, for example the withdrawal of a peripheral venous blood sample or urine collection. Examples are the urine test for beta-human chorionic gonadotrophin (Chen et al., 2015) as an indicator of pregnancy and prostate specific antigen (PSA) (Ravery, 1999) in a suspected prostate cancer case.

An exciting, but not yet conclusive, new area of research with the potential to yield new clinical biomarkers is microRNAs (miRs). These short RNA molecules were discovered in 1993 in *C. elegans* (Lee et al., 1993) and found in human samples in 2000 (van Rooij, 2011). Since then, their love story with the biomedical community has seen a crescendo of enthusiasm, and they have been highly regarded both as novel therapeutic targets and as possible intracellular and extracellular biomarkers. We might speculate that the reason behind this popularity is twofold:

(1) The mode of action of miRs is relatively straightforward. MiRs do not code for proteins (hence they are classified as non-coding RNAs, ncRNAs), and each miR is capable of post-transcriptional regulation of the expression of a plethora of target messenger RNAs (mRNAs), which it recognizes through semi-complementary base-pairing between its "seed sequence" (of just eight nucleotides) and one or more miR binding sites in the 3′-untranslated region of the targeted mRNA (Jackson and Standart, 2007). The biology of miRs appears much simpler than that of, for example, long ncRNAs, which act via multiple and still largely unclarified mechanisms, at transcriptional and post-transcriptional levels, in the nucleus and the cytosol, and in cis and trans (Rinn and Chang, 2012). Moreover, miRs exist in limited number, thought to be around 2000 in humans, although miR variants are still being discovered through RNA-sequencing approaches.

(2) MiRs are released by their producing cells in protected forms that allow them to persist for long periods in biological fluids. Moreover, miRs were initially believed to be tissue- and even cell-specific. This led to the assumption that miR levels in the blood could reflect the altered status of the cell types or tissues supposed to produce them. The concept of tissue/cell specificity has now been largely dismissed, and it is becoming clear that at best we can talk of cell/tissue-type enrichment for most, if not all, known miRs. Hence, it is quite unrealistic to think that, under most clinical scenarios (with exceptions discussed later), the level of a single miR measured in whole plasma or serum can be informative of a local, slowly or relatively slowly evolving condition such as a developing cancer (Washam et al., 2013).
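The seed-based target recognition described in point (1) can be sketched in a few lines. This is a deliberately simplified illustration: it takes the seed to be the first eight nucleotides of the miR, as stated in the text, and it looks only for perfectly complementary sites, whereas real pairing is semi-complementary. The function names and both sequences are hypothetical and chosen only for demonstration.

```python
from typing import List

# Watson-Crick pairing for RNA bases.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return rna.translate(_COMPLEMENT)[::-1]


def find_seed_sites(mir: str, utr: str, seed_len: int = 8) -> List[int]:
    """Return 0-based positions in `utr` that pair perfectly with the miR seed.

    Simplification: only exact complementarity to the first `seed_len`
    nucleotides of the miR is considered.
    """
    mir = mir.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    site = reverse_complement(mir[:seed_len])  # sequence expected in the mRNA
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)  # step by 1 to allow overlapping sites
    return positions


# Illustrative sequences only (not taken from the text).
example_mir = "UAGCUUAUCAGACUGAUGUUGA"
example_utr = "GGCAUAAGCUACCUUAUAAGCUAGG"

print(find_seed_sites(example_mir, example_utr))
```

Because a seed this short occurs by chance in many transcripts, a single miR can match sites in a large number of 3′-untranslated regions, which is consistent with the "plethora of target mRNAs" noted in point (1).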

Circulating individual miRs, such as miR-21 (suggested to be useful for the detection of various carcinomas) (Wang et al., 2014; Wu et al., 2015), and circulating groups of miRs (such as a serum miR classifier encompassing miR-29a, miR-29c, miR-133a, miR-143, miR-145, miR-192, and miR-505, which has been proposed to detect hepatocellular carcinoma) (Lin et al., 2015) have been suggested as potential biomarkers for cancer detection and staging and for the follow-up of already diagnosed cancer patients. However, a recent article comparing 15 previous reports on potential new breast cancer biomarkers (Leinder et al., 2013) found scarce overlap between results: "Of the 143 circulating miRNAs reported to be differently regulated, 100 were supported by just 1 reference; 25 others had discordant results across publications and of the remaining 18 miRs, 8 had fold changes too low to be confirmed. Of the 10 concordant results, 9 were supported entirely by publications from the same institution and had authors in common" (Leinder et al., 2013; Witwer, 2015). This suggests that further efforts are needed before miR-based biomarkers can benefit cancer patients, and the same could apply to other disease conditions. Cancer was the first translational area for miR work, closely followed by heart failure. Looking at both clinical scenarios, we find a typical example of non-specificity: circulating miR-21 has been proposed as a biomarker of both prostate cancer (Egidi et al., 2013) and myocardial fibrosis in heart failure (Thum et al., 2008). miR-21 is also the miR most highly expressed by vascular endothelial cells (Greco and Martelli, 2014), which directly line the circulating blood and are for this reason supposed to be the highest contributors to miRs circulating in the peripheral blood (Greco and Martelli, 2014; Witwer, 2015). A miR such as this, like others that are widely and highly expressed, might not be an ideal circulating candidate biomarker. miRs that are usually expressed at low levels but upregulated under a particular condition could be better suited to a diagnostic test. For example, we recently found that miR-503 appears in the blood of diabetic patients at the last stage of critical limb ischemia, i.e., when they need an amputation (Caporali et al., 2011). It is possible that circulating miR-503 could have diagnostic or prognostic value when measured at earlier stages of the disease; however, measuring miR-503 in a small leg muscle biopsy could provide more reliable information.
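The tally quoted above for the breast cancer comparison is easier to follow when the subtraction is written out. The short sketch below only re-derives the numbers given in the quotation; the variable names are ours, and the final figure (results not supported entirely by a single institution) is the one step the quotation leaves implicit.

```python
def tally_breast_cancer_mirs() -> dict:
    """Re-derive the counts quoted from the comparison of 15 reports."""
    reported = 143          # circulating miRNAs reported as differently regulated
    single_reference = 100  # supported by just 1 reference
    discordant = 25         # discordant results across publications
    remaining = reported - single_reference - discordant

    low_fold_change = 8     # fold changes too low to be confirmed
    concordant = remaining - low_fold_change

    same_institution = 9    # supported entirely by one institution
    not_single_institution = concordant - same_institution

    return {
        "remaining": remaining,
        "concordant": concordant,
        "not_single_institution": not_single_institution,
    }


print(tally_breast_cancer_mirs())
```

Following the quoted figures through, only one of the 143 reported miRNAs is left as a concordant result that did not rest entirely on publications from a single institution.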

There are areas where we believe circulating miRs show more promise: the recognition of acute events, such as myocardial infarction (MI), and the surgical setting, where time-restricted changes in miR expression have been reported consistently. For example, the heart- and skeletal muscle-enriched miR-1 has been noted to increase in patients after open heart surgery, after MI, or after transcoronary ablation of septal hypertrophy, an interventional procedure that mimics MI (D'Alessandra et al., 2010; Widera et al., 2011; Liebetrau et al., 2013; Nabialek et al., 2013). Diagnostic biomarkers are a key part of emergency service provision, enabling the rapid diagnosis, and therefore treatment, of patients with life-threatening conditions. Among the most widely used in the emergency department are cardiac troponins (cTns: cTn-T and cTn-I) for the diagnosis of MI. cTns are used in conjunction with other investigations, such as electrocardiogram (ECG) changes, which make it possible, for example, to determine whether a patient has had a "STEMI"-type MI (with ST-segment elevation on ECG). ECG cannot pick up non-STEMI cases, and here laboratory biomarkers are highly important (Alpert et al., 2000). However, cTns are not always specific to MI: they can be raised in patients with other cardiac conditions and also after infection. In light of this, research into miRs as potentially better biomarkers has been carried out, showing time-dependent increases in cardiac-enriched, ischemia-responsive miRs in the blood of MI patients (D'Alessandra et al., 2010; Nabialek et al., 2013). It has also been claimed that miRs can help differentiate STEMI from other myocardial conditions, such as stable angina, non-STEMI, and Takotsubo cardiomyopathy (Nabialek et al., 2013; Ward et al., 2013; Jaguszewski et al., 2014). However, it has yet to be demonstrated that miRs can replace cTns as routine biomarkers in the intensive coronary care unit; in-depth investigations of specificity and sensitivity in different cohorts of patients are needed. Additionally, in emergency medicine, the sensitivity and specificity of circulating miR changes are not the only issues, because the time this putative biomarker takes to appear elevated in the blood or another biological fluid is critical. Classically, cTns take a few hours to increase in the blood after an MI. MiR-1 has been proposed to rise earlier than cTns (Liebetrau et al., 2013). However, for these comparisons, the time to obtain the test results and the test reproducibility are big obstacles yet to be overcome. Unlike high-sensitivity cTns, which today are measured by immuno-enzymatic reactions that return results in around 20 min, PCR-based miR analyses are still quite time-consuming. Alternative techniques for miR quantification have been proposed (Arata et al., 2012), but they are far from being commonly employed by the scientific community, let alone the clinical diagnostic lab. Additionally, staff in a clinical laboratory cannot reason as in observational studies, where everybody is often quite satisfied with saying "miR-1 is increased in the blood of MI patients in comparison to a control group." Diagnostic use of miRs needs a different rigor and, first of all, a definition of the "normal" threshold of miR concentration above which we can suspect that a patient is experiencing a heart attack. Scientists working in the miR field are aware that this is not an easy task, and quantitative differences between studies using PCR-based methods are common. In addition, data normalization approaches are still debated, and interference by heparin (used in interventional procedures) with the PCR reaction has been reported (Mayr et al., 2013), although protocols to nullify the "heparin effect" have been adopted. Alternative technologies can be developed, but this will require further investment, time, and validation efforts (Arata et al., 2012).
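To make the normalization issue concrete, the sketch below uses the comparative Ct (2^-ΔΔCt) calculation, a widely used way of expressing PCR results relative to a reference RNA. This method is not described in the article and is shown here only as an assumed example; it presumes 100% amplification efficiency, and all Ct values and the two reference RNAs are hypothetical.

```python
def relative_expression(ct_mir_patient: float, ct_ref_patient: float,
                        ct_mir_control: float, ct_ref_control: float) -> float:
    """Fold change of a miR in a patient sample versus a control sample.

    Comparative Ct method: fold change = 2 ** -(dCt_patient - dCt_control),
    where dCt = Ct(miR) - Ct(reference). Assumes perfect doubling per cycle.
    """
    delta_patient = ct_mir_patient - ct_ref_patient
    delta_control = ct_mir_control - ct_ref_control
    return 2.0 ** -(delta_patient - delta_control)


# Hypothetical Ct values for one miR measured in a patient and a control.
ct_mir_patient, ct_mir_control = 24.0, 27.0

# Reference A is equally abundant in both samples.
fold_with_ref_a = relative_expression(ct_mir_patient, 18.0, ct_mir_control, 18.0)

# Reference B itself differs between the samples.
fold_with_ref_b = relative_expression(ct_mir_patient, 20.0, ct_mir_control, 19.0)

print(fold_with_ref_a, fold_with_ref_b)
```

With the same raw miR measurements, the two hypothetical references give fold changes that differ by a factor of two, which illustrates why the choice of normalizer can produce quantitative disagreement between studies.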

In conclusion, we believe that miRs hold potential value as clinical biomarkers, but their journey to the diagnostic lab is still long and needs improved approaches at multiple levels, starting with technical refinement in the evaluation of miR concentration, the use of RNA-sequencing to possibly identify new miRs that are better candidates (higher tissue/cell specificity, lower expression under healthy conditions, etc.), and the use of blood fractions potentially enriched in miRs (like exosomes and microparticles) for diagnostic tests. It is also possible that miR clusters have more specificity than single miRs. Moreover, miRs could be combined with other biomarkers to improve diagnostic power.

#### Funding and Acknowledgments

This work was funded by the National Institute for Health Research (NIHR) Bristol Cardiovascular Biomedical Research Unit (BRU). The views expressed are those of the Authors and not necessarily those of the NHS, the NIHR or the Department of Health. CE is a PI in the Leducq transatlantic network in vascular microRNAs (MIRVAD).

#### References

cardiomyopathy from acute myocardial infarction. Eur. Heart J. 35, 999–1006. doi: 10.1093/eurheartj/eht392


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Angelini and Emanueli. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# miRNA therapeutics in cardiovascular diseases: promises and problems

#### *Nazila Nouraee and Seyed J. Mowla\**

*Department of Molecular Genetics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran*

microRNAs (miRNAs) are a novel class of non-coding RNAs that have found their way into the clinic owing to their fundamental roles in cellular processes such as differentiation, proliferation, and apoptosis. Recently, miRNAs have also been recognized as micromodulators of cellular communication, involved in cell signaling and microenvironment remodeling. In this review, we focus on the role of miRNAs in cardiovascular diseases (CVDs) and their reliability as diagnostic and therapeutic biomarkers in these conditions. CVDs comprise a variety of blood vessel and heart disorders with high rates of morbidity and mortality worldwide. This necessitates the introduction of novel molecular biomarkers for early detection, prevention, or treatment of these diseases. miRNAs, owing to their stability, tissue-specific expression patterns, and secretion into the corresponding body fluids, are attractive targets for cardiovascular-associated therapeutics. After explaining the challenges facing miRNA-based therapies, we discuss exosomes as delivery packages for miRNA drugs and promising novel strategies for the future of miRNA-based therapeutics. These approaches provide insights into the future of personalized medicine for the treatment of CVDs.

#### *Edited by:*

*Narasaiah Kolliputi, University of South Florida, USA*

#### *Reviewed by:*

*Gaetano Santulli, University of Naples Federico II, Italy Michael Teng, University of South Florida, USA*

#### *\*Correspondence:*

*Seyed J. Mowla, Department of Molecular Genetics, Faculty of Biological Sciences, Tarbiat Modares University, P.O.Box 14115-175, Tehran, Iran sjmowla@modares.ac.ir*

#### *Specialty section:*

*This article was submitted to RNA, a section of the journal Frontiers in Genetics*

*Received: 27 February 2015 Accepted: 17 June 2015 Published: 30 June 2015*

#### *Citation:*

*Nouraee N and Mowla SJ (2015) miRNA therapeutics in cardiovascular diseases: promises and problems. Front. Genet. 6:232. doi: 10.3389/fgene.2015.00232*

Frontiers in Genetics | www.frontiersin.org June 2015 | Volume 6 | Article 232 |

Keywords: miRNA, cardiovascular disease, cardio-miR, exosome, cell communication, delivery vehicle, secretory miRNA, miRNA therapeutics

# Introduction

Among multifactorial diseases, cardiovascular diseases (CVDs) are significant because of their variable symptoms and high mortality rate, accounting for one third of global deaths (Santulli, 2013). They include a wide range of disorders of the blood vessels and heart, ranging from coronary artery disease (CAD), pulmonary arterial hypertension (PAH), and congenital heart disease to deep vein thrombosis and cerebrovascular disease. Major CVD risk factors include family history, obesity, hypertension, diabetes mellitus, and hypercholesterolemia (Mendis et al., 2011). Since CVDs are hard to cure, several investigations have focused on the mechanisms underlying CVD in order to manage the symptoms. microRNAs (miRNAs) have emerged as one of the most favorable molecular targets in this regard. They are a class of non-coding RNAs with a short length of 18–24 nucleotides that mainly act as post-transcriptional repressors of gene expression. Because they regulate fundamental cellular mechanisms such as cell differentiation, proliferation, growth, and apoptosis, miRNAs have received enormous attention for therapeutic applications, and substantial research effort and investment have gone into bringing these molecules into the clinic. In this review, we explain the novel diagnostic, therapeutic, and prognostic approaches based on miRNAs in the field of CVDs (cardio-miRs) and the problems ahead of this research area: where we are and where we expect to be in the future.

# miRNA Mechanisms of Action

Calin et al. (2002) were the first to link miRNAs with cancer progression. Since then, several studies have focused on the role of these small molecules in the pathogenesis of different human diseases. The primary findings emphasized the post-transcriptional regulatory role of miRNAs through base pairing with the 3′ untranslated region (UTR) of their target mRNAs. This leads to mRNA degradation or translation inhibition, and in both cases miRNA binding suppresses the target mRNA. This is the main mechanism reported for the regulatory role of miRNAs; however, many variations have also been reported (Ha and Kim, 2014; Lin and Gregory, 2015). miRNA binding to the 5′ UTR of transcripts has also been demonstrated to be able to activate or suppress target genes (Lee et al., 2009). Some miRNAs can also bind to the open reading frame (ORF) of mRNA transcripts and repress translation. This mechanism was first reported by Tay et al. (2008), who demonstrated that miRNA binding to the coding region of pluripotency genes can regulate embryonic stem cell differentiation. miRNA binding might also occur at the promoter of a target gene, which represses expression of that gene. On the other hand, some miRNAs modulate their target expression by binding to RNA-binding proteins that regulate the expression of mRNA transcripts (Eiring et al., 2010). Salmena et al. (2011) proposed the competing endogenous RNA (ceRNA) hypothesis, according to which miRNAs, long non-coding RNAs (lncRNAs), and target mRNAs are in a finely tuned interaction through miRNA response elements (MREs), and their competition for target binding, based on their total concentration in the cytoplasm, defines a higher level of regulation. Recent studies have indicated another fascinating aspect of the miRNA regulatory network. Fabbri et al. (2012) showed that miRNAs secreted from cancer cells can directly bind the toll-like receptors (TLRs) at the surface of neighboring immune cells and activate the relevant signaling pathways in the recipient cells. The complexity of miRNA-mediated regulatory systems highlights the importance of these small molecules in the clinic, and further technological advances are needed to bring them closer to clinical therapies.

# miRNA Diagnostics in CVD

microRNAs show tissue-specific and time-dependent expression patterns, a key reason for using these small molecules as diagnostic and therapeutic targets (Lu et al., 2005). As with other developmental processes in embryogenesis, miRNAs play critical roles in cardiovascular development, and their expression profiles change under different pathological conditions. Cardiac resynchronization therapy (CRT) after heart failure helps improve heart arrhythmia and also affects myocardial miRNA expression. In responder patients, CRT affects cardiac processes including cardiac fibrosis, apoptosis, angiogenesis, and channel alterations, and consequently alters the expression levels of miR-29 (implicated in cardiac fibrosis), miR-30, miR-92, and miR-145 (involved in cardiac angiogenesis), miR-30 (modulated in cardiac apoptosis), and miR-26 (affected by modified ionic channel function; Sardu et al., 2014).

Recently, circulatory miRNA expression profiling has provided strong molecular markers for detection of various diseases, including CVD-related conditions such as myocardial hypertrophy, infarction, angiogenesis, and fibrosis (Charan Reddy, 2014; Wronska et al., 2015). van Rooij et al. (2006) defined a miRNA signature for cardiac hypertrophy and showed that miR-195 overexpression is sufficient to drive cardiac hypertrophy. The miR-1 family, miR-133a/b, and miR-208 are well known for their implication in myogenesis in both skeletal muscle and cardiac development, and their dysregulation has been detected in several CVDs including myocardial infarction, hypertrophy, and arrhythmias (Thum et al., 2008a). While miR-208 showed significant upregulation (pro-hypertrophic), miR-1 and miR-133a expression levels decreased significantly (anti-hypertrophic) in acute myocardial infarction (AMI) compared with the hearts of normal individuals (Bostjancic et al., 2010). In CAD, miR-1 is upregulated in the left ventricular endocardium (Yang et al., 2007), and plasma levels of miR-1, miR-133, and miR-208b have been reported to be elevated after AMI (Widera et al., 2011; Devaux et al., 2015). However, circulatory miR-1 is not sensitive or specific enough to serve as an AMI-specific biomarker, since it is also affected by factors other than AMI. Moreover, by targeting protein phosphatase 2A regulatory subunit B56 alpha (*PP2A*), miR-1 has been linked to arrhythmia, mainly through hyperphosphorylation of the ryanodine receptor (RyR2; Terentyev et al., 2009). Following hypertension or cardiac infarction, cardiac hypertrophy and cardiac fibrosis are common conditions in which excess amounts of extracellular matrix (ECM) proteins accumulate in cardiac tissue in order to adapt the system to the pathological conditions. Overexpression of miR-29 and miR-21 and downregulation of miR-133 and miR-30 have been linked to ECM remodeling and fibrosis by regulating several components of the ECM, including collagen type I alpha 1 and 2 (*Col1A1* and *Col1A2*) as targets of miR-29 (van Rooij et al., 2008), sprouty homolog 1 (*SPRY1*) as a target of miR-21 (Thum et al., 2008b), and connective tissue growth factor (*CTGF*) as a target of miR-133 and miR-30c (Duisters et al., 2009). miR-133 is specifically expressed in cardiomyocytes, while miR-30 is detectable in cardiac fibroblasts as well as cardiomyocytes. These and other miRNAs (including miR-328) have been linked to atrial fibrillation, which mainly results from structural remodeling and fibrosis (Santulli et al., 2014a). **Table 1** summarizes a list of miRNAs involved in different CVDs (cardio-miRs).

# miRNA-Bearing Exosomes as Communicators of CVD

Recent miRNA-based studies have focused on the role of exosomes as natural delivery vehicles for some proteins, mRNAs, and miRNAs, which facilitate communication between cells and their neighboring stroma. Exosomes are small vesicles (40–100 nm) originating from the plasma membrane or multivesicular bodies (MVBs) and are present in almost all biological fluids. They shuttle between neighboring cells and transfer their cargoes, which can be cellular components including proteins, mRNA, and non-coding RNAs. These cargoes can exert regulatory effects and control gene expression in the recipient cells (Pegtel et al., 2010). Exosomes are well known as communicators of the microenvironment. Exosomal membrane proteins, as well as other components, are usually similar to those of the originating cell, although exosomal membranes usually contain higher levels of sphingomyelin, cholesterol, and phosphatidylserine than the originating cells. Their cargo might represent the cellular composition of RNAs and proteins, or it might show a separate profile. Methodologies including immunoblotting, affinity extraction onto magnetic beads, and flow cytometry have been used to identify the protein components of exosomes. Profiling of RNAs – miRNAs in particular – is usually performed by means of microarrays, qPCR-based arrays, and most recently by next generation sequencing techniques, which analyze the transcriptome – miRnome – of each sample.

*AMI, acute myocardial infarction; AP, angina pectoris; NYHA, New York Heart Association.*

Recent studies have demonstrated targeted packaging of miRNAs and their biogenesis components, including Dicer and AGO2, in exosomes (Melo et al., 2014). Exosome shuttling is implicated in a variety of diseases including cancer, CVDs, and viral infections. In CVD, exosomal transfer of miRNAs is a well-known mechanism through which cells educate their adjacent environment in order to confront the pathological condition. In heart ischemia or fibrosis, paracrine or endocrine secretion of miRNA-bearing exosomes into cardiomyocytes or active fibroblasts of the heart has been demonstrated. This leads to trans-differentiation of fibroblasts into an active state with a different rate of growth factor secretion (van Rooij and Olson, 2009; Yoo et al., 2011). Bang et al. (2014) showed that, following heart infarction, exosome secretion from cardiomyocytes with a new miRNA profile can reprogram the cardiac fibroblasts, leading to cardiac hypertrophy. These mechanisms can introduce potential novel therapeutic targets. In addition, recognizing tissue-specific CAF markers, the mechanisms underlying their reactivation, and the miRNA signals in these processes will potentially provide promising targets for CVD therapies. Beyond the fibroblasts in the microenvironment, exosomal transfer of signals is also used for cardiovascular protection after a disease. For example, in heart ischemia, secretory exosomes with miRNA cargoes from cardiomyocytes have been shown to activate and induce bone marrow-derived stem cells. These latter cells release a second subset of exosomes that lead to myocardial regeneration or protection (Sahoo and Losordo, 2014). Several investigations have demonstrated the selective packaging of miRNAs in exosomes and their secretion to the stroma of cells by means of signal molecules, a process that goes beyond the shedding of vesicles by the plasma membrane (Yang et al., 2011; Montecalvo et al., 2012; Stoorvogel, 2012). These disease-specific expression patterns of exosomal miRNAs – which can be non-invasively detected in the body fluids – provide potential molecular markers and promising therapeutic targets for treatment of CVDs. Several circulating miRNAs have been linked to some CVDs. In patients with AMI and myocardial injury, higher levels of miR-208a have been detected in plasma, and they showed higher specificity and sensitivity than conventional biomarkers such as cardiac troponin I (TnI) for diagnosis of the disease (Ji et al., 2009; Wang et al., 2010). Another potential biomarker for AMI was reported by Adachi et al. (2010), who detected higher plasma levels of miR-499 in patients with AMI compared with normal individuals. Using microarray analysis of exosomes obtained from cardiac progenitor cells, Gray et al. (2015) identified 11 miRNAs significantly up-regulated in response to hypoxic conditions, with miR-292 showing the highest variation in these exosomes. Moreover, exosomes from cardiac fibroblasts were shown to be enriched with miR-21-3p (miR-21∗). These fibroblasts affect cardiomyocytes and secrete the mediators of cardiac hypertrophy (Bang et al., 2014). Knowing the mechanisms underlying these communications and recognizing potential strong biomarkers for diagnosis and treatment of CVDs will help in the development of future molecular therapeutics.
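The comparison between circulating miR-208a and troponin I above rests on sensitivity and specificity. As a reminder of how these two figures are derived from a diagnostic confusion table, a minimal Python sketch follows. The function name and the patient counts are hypothetical illustrations and are not taken from any of the cited studies.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-table counts.

    tp: diseased patients the marker flags as positive
    fn: diseased patients the marker misses
    tn: healthy controls the marker correctly leaves negative
    fp: healthy controls the marker wrongly flags
    """
    sensitivity = tp / (tp + fn)  # fraction of true cases detected
    specificity = tn / (tn + fp)  # fraction of controls correctly excluded
    return sensitivity, specificity


# Hypothetical counts for an imaginary plasma miRNA assay (not real data).
sens, spec = sensitivity_specificity(tp=45, fn=5, tn=40, fp=10)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A candidate biomarker is only preferable to an established one if it improves at least one of these figures without an unacceptable loss in the other, which is the point made about circulatory miR-1 earlier in the text.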

# miRNA Therapeutics in CVD

In the past decade, outstanding research in the field of miRNA drugs has changed the face of molecular medicine. miRNA-based therapeutics mainly focus on restoring miRNA expression levels. The two main approaches are overexpression of downregulated miRNAs and suppression of overexpressed ones. These approaches have been subject to different modifications in order to improve the efficiency of delivery and reduce off-target effects. Promising tools in this regard are oligonucleotides that mimic the endogenous miRNA or suppress the mature miRNA by sequence complementarity. Addition of locked nucleic acids (LNAs) or 2′-*O*-methylation of the antisense oligonucleotides increases the binding specificity, while cholesterol conjugation enhances the circulation time, serum stability, and cellular uptake. Expression vectors bearing pre-miRNA sequences or tandem miRNA target sites (sponges) sound promising for *in vitro* overexpression or suppression of miRNAs, respectively. Another issue to overcome is the efficient delivery of each of the miRNA drugs. Naked oligonucleotides are less efficient because of their instability *in vitro* and *in vivo*, where they are exposed to different nucleases. Lipid-based vehicles, viral systems, and cationic polymers are among the main delivery tools for miRNA-based therapeutics. Each of these strategies has its own challenges and still needs improvement to address problems such as cytotoxicity, immunogenicity, and low efficiency (van Rooij and Olson, 2012). Fiedler et al. (2011) compared the effect of different doses of cholesterol-based anti-miR-24 (antagomirs) on miR-24 expression in cardiomyocytes and cardiac endothelial cells. They showed a cell-type-specific tendency or mechanism for antagomir uptake by these tissues. Cholesterol-based antagomirs for silencing miR-21 were also shown to inhibit cardiac fibrosis and dysfunction (Kumarswamy et al., 2012). van Rooij et al. (2008) used cholesterol-modified anti-miR-29b oligonucleotides after myocardial infarction and observed upregulation of ECM proteins leading to cardiac fibrosis. Also, Bonauer et al. (2009) demonstrated that intravenous injection of a miR-92a antagomir improved the function of damaged tissue in models of myocardial infarction. Inhibition of miR-92a results in neoangiogenic effects and functional recovery of ischemic tissues (Bonauer et al., 2009). Another promising miRNA in CVD therapeutics is miR-208, which is implicated in cardiac remodeling. LNA-modified anti-miR-208 oligonucleotides have successfully prevented pathological cardiac remodeling during diastolic heart disease (van Rooij et al., 2007). A common issue in these approaches is the *in vivo* instability and low homing efficiency of oligonucleotides, which result in modest changes in the expression levels of the target protein. Accordingly, improved stabilization or efficient delivery vehicles are required. Examples of such vehicles are adeno-associated viruses (AAVs). AAV9, a specific serotype of AAV, has tropism for cardiomyocytes and is enriched in the heart (Bish et al., 2008). In another study, Santulli et al. (2014b) showed that miR-126-3p down-regulation using an adenoviral vector containing miR-126-3p target sites inhibits proliferative vascular smooth muscle cells and prevents restenosis in animal models. miR-126 is an endothelial-specific miRNA that regulates vascular integrity and angiogenesis. These strategies provide an efficient tool for cardiac gene transfer and targeted delivery of miRNA-based therapeutics.
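The antisense strategy described above depends on sequence complementarity between the inhibitor and the mature miRNA. The following minimal Python sketch shows only that base-pairing step, under the simplifying assumption that a fully complementary, unmodified RNA oligonucleotide is wanted. The function name and the example sequence are hypothetical; the chemical modifications mentioned in the text (LNA, 2′-*O*-methyl, cholesterol conjugation) are not modeled.

```python
# Watson-Crick partners for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense_oligo(mature_mirna):
    """Return the reverse complement (5'->3') of a mature miRNA sequence.

    Assumption for illustration: the inhibitor is simply the fully
    complementary RNA strand; real anti-miR designs may be shorter or
    carry modified nucleotides.
    """
    rna = mature_mirna.upper().replace("T", "U")
    return "".join(COMPLEMENT[base] for base in reversed(rna))


# Hypothetical 22-nt sequence, used purely as a placeholder.
toy_mirna = "UACGGAUCAGCUUAGGCAUUCG"
print(antisense_oligo(toy_mirna))
```

Taking the reverse complement of the result returns the input sequence, which is a quick sanity check when sequences are handled programmatically.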

Due to their natural role in miRNA secretion and shuttling between different cells, exosomes are of great interest in miRNA therapeutics. Exosomes are flexible in size and cargo type, and their non-synthetic nature makes them suitable for more efficient and non-immunogenic delivery of cargo while maintaining cargo integrity and stability. Exosomal membranes contain certain proteins which have binding affinity for specific receptors on the surface of recipient cells. They can therefore selectively target cell types of interest, and manipulating their miRNA components – as well as other molecular cargoes – will provide promising tools for the future of personalized medicine. In CVD, exosomes can be used as therapeutic agents, as protein delivery carriers, or as gene therapy devices. Mesenchymal stem cell-derived exosomes have been investigated in the field of cardiac regenerative medicine, especially in myocardial ischemia/reperfusion injuries (Lai et al., 2010). Martinez et al. (2006) demonstrated that exosomes bearing differentiation signals can affect neovascularization and are promising for treatment of angiogenic defects. Moreover, several exosomal miRNAs are implicated in angiogenesis and vascular repair. Zhang et al. (2010) showed that exosomes enriched with pre-miR-150 enhance endothelial cell migration. They transfected pre-miR-150/anti-miR-150 oligonucleotides into the THP-1 cell line and collected the conditioned medium of these cells, which was enriched with exosomes bearing the exogenous oligonucleotides (Zhang et al., 2010). This conditioned medium affects the migration ability of endothelial cells. Exosomes are promising novel elements for the future of CVD treatment because of their targeted delivery capacity and their microenvironment-dependent nature. The latter property triggers their activation in response to features of the pathological microenvironment such as pH or substrate concentration. However, further studies are necessary to overcome obstacles such as engineering and purifying exosomes, loading cargo into them, and optimizing their quality and characterization for targeted delivery.

# Conclusion

Cardiovascular diseases are one of the leading causes of death worldwide. Most CVDs can be prevented by controlling behavioral and environmental risk factors, and early diagnosis and management play an important role in this regard. In this review, we focused on miRNAs as small non-coding RNAs involved in a variety of key cellular processes. Several miRNA biomarkers have been introduced for different cardiovascular conditions. miRNAs have emerged as attractive novel therapeutics, and they have several advantages over other molecular therapeutics because of their small size, conserved sequences, and stability in body fluids. The first anti-cancer miRNA-based drug, MRX-34 (a liposome-based miR-34 mimic) developed by Mirna Therapeutics, came to the clinic in 2013 for the treatment of hepatocellular carcinoma (mirnarx.com; NCT01829971). Current achievements portray a promising future for miRNA-based therapeutics, although several obstacles still need to be overcome, including their stability, renal clearance, off-target effects, inefficient endocytosis by target cells, and the immunogenicity of delivery vehicles. In CVD, some miRNA-based strategies have produced promising findings, including anti-miR-126 approaches that inhibited vascular smooth muscle cell proliferation and restenosis (Santulli et al., 2014b). Also, novel methodologies such as exosome-based delivery of miRNA drugs show promise for overcoming impediments such as inefficient or unspecific delivery and immunogenic reactions. These drugs nevertheless still require investigation of different aspects of miRNA biology, their long-term effects, subsequent biochemical and off-target effects, and miRNA pathway analysis.

# Acknowledgment

NN is supported by a grant (#92002177) from Iran National Research Foundation (INSF).

# References

implications for a role of microRNAs in myocardial matrix remodeling. *Circ. Res.* 104, 170–178. doi: 10.1161/CIRCRESAHA.108.182535


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Nouraee and Mowla. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Genetic variants in microRNA genes: impact on microRNA expression, function, and disease

*Sophia Cammaerts<sup>1</sup>, Mojca Strazisar<sup>1</sup>, Peter De Rijk<sup>1</sup> and Jurgen Del Favero<sup>1,2</sup>\**

*<sup>1</sup> Applied Molecular Genomics Unit, Department of Molecular Genetics, VIB, University of Antwerp, Antwerp, Belgium, <sup>2</sup> Multiplicom N.V., Niel, Belgium*

MicroRNAs (miRNAs) are important regulators of gene expression and like any other gene, their coding sequences are subject to genetic variation. Variants in miRNA genes can have profound effects on miRNA functionality at all levels, including miRNA transcription, maturation, and target specificity, and as such they can also contribute to disease. The impact of variants in miRNA genes is the focus of the present review. To put these effects into context, we first discuss the requirements of miRNA transcripts for maturation. In the last part an overview of available databases and tools and experimental approaches to investigate miRNA variants related to human disease is presented.

#### *Edited by:*

*Mohammadreza Hajjari, Shahid Chamran University of Ahvaz, Iran*

#### *Reviewed by:*

*David Langenberger, ecSeq Bioinformatics, Germany Jun Yasuda, Tohoku Medical Megabank Organization, Japan*

#### *\*Correspondence:*

*Jurgen Del Favero, Applied Molecular Genomics Unit, Department of Molecular Genetics, VIB, University of Antwerp, Universiteitsplein 1, Antwerp 2610, Belgium jurgen.delfavero@molgen.vib-ua.be*

#### *Specialty section:*

*This article was submitted to RNA, a section of the journal Frontiers in Genetics*

*Received: 30 March 2015 Accepted: 05 May 2015 Published: 21 May 2015*

#### *Citation:*

*Cammaerts S, Strazisar M, De Rijk P and Del Favero J (2015) Genetic variants in microRNA genes: impact on microRNA expression, function, and disease. Front. Genet. 6:186. doi: 10.3389/fgene.2015.00186*

Keywords: microRNA, genetic variants, expression, function, disease

### Introduction

miRNAs are short, non-protein-coding RNA molecules that mediate post-transcriptional regulation by affecting mRNA stability and translational repression or activation (Vasudevan et al., 2007; Filipowicz et al., 2008).

miRNAs were first discovered in 1993 in *Caenorhabditis elegans* (Lee et al., 1993). In 2000–2001 many other miRNA genes were identified in *C. elegans* and miRNAs were shown to be widely present in other species (Pasquinelli et al., 2000; Lagos-Quintana et al., 2001; Lau et al., 2001; Lee and Ambros, 2001). Since then, given their impact on expression, miRNAs have been extensively studied and the miRBase database, the central miRNA sequence repository, continues to expand, collecting novel miRNA genes in a wide variety of species (Kozomara and Griffiths-Jones, 2014). Even in the human genome, with 1881 precursors and 2588 mature miRNA sequences deposited in miRBase version 21, many novel miRNA genes are being identified and published continuously (Friedländer et al., 2014; Cheng et al., 2015; Londin et al., 2015). Despite the extensive research, we are still uncovering aspects of the miRNA maturation process and we have to date only a limited understanding of the functions of specific miRNAs.

miRNAs add a layer of complexity to gene regulation. They target transcripts mainly by complementarity to the seed region, nucleotides (nt) 2–7 of the mature miRNA molecule (Bartel, 2009). This minimal requirement for complementarity results in targeting of many mRNAs by a single miRNA and targeting of one mRNA by several miRNAs (Lim et al., 2005; Peter, 2010; Wu et al., 2010). In addition, miRNAs can have sequence and length variability (isomiRs), potentially resulting in altered targeting capacity and/or specificity. Genetic variation is a further factor that adds to the complexity of the miRNA repertoire.

This review provides an overview of the current knowledge of the influence of genetic variants on miRNA biogenesis and function. First, miRNA biogenesis and requirements of the miRNA processing enzymes are described, followed by a discussion of isomiRs and their functional implications. In the second part, we highlight the effect of genetic variants on the expression and functioning of miRNAs. In the last part, approaches that are used to identify miRNAs involved in diseases are discussed, with a focus on genetic strategies. Furthermore, we present available tools, databases, and experimental approaches to aid functional characterization studies of disease-associated miRNA variants.

# MicroRNA Biogenesis and Generation of IsomiRs

miRNA genes can be classified in several categories according to their location in the genome. A majority of the currently known human miRNA genes deposited in miRBase are intergenic (68%). Of the intragenic miRNAs, most are intronic (12% of all genes). The remaining genes are located in repeats, lncRNAs, UTRs, or coding regions of host genes (Londin et al., 2015). miRNA genes are often located close to other miRNA genes in so-called clusters. miRNAs located in host genes or clustered with other miRNAs can be cotranscribed, which is supported by good correlation of their expression patterns for several genes (Baskerville and Bartel, 2005). Alternatively, intronic miRNAs can be transcribed independently from their host gene and polycistronic miRNA transcripts can undergo alternative splicing to yield specific miRNA expression (Ramalingam et al., 2014).

In the canonical biogenesis pathway, primary miRNA transcripts (pri-miRNAs) are cleaved in the nucleus by the Microprocessor complex, a complex consisting of ribonuclease Drosha and its cofactor DGCR8, to release the shorter precursor miRNA (pre-miRNA). This precursor molecule is transported to the cytosol by the nucleocytoplasmic transport protein Exportin-5, where it is cleaved by the ribonuclease Dicer to result in a mature miRNA duplex consisting of the mature 5p and 3p strands. The duplex is loaded onto an Argonaute protein, the core component of the RNA induced silencing complex (RISC). One of the strands is discarded, while the other mediates posttranscriptional regulation by base pairing to target mRNAs.

There are different types of target sites. Canonical sites for miRNA targeting are seed matches (nt 2–7 of the mature miRNA) with an adenine opposite the first nt of the mature miRNA and/or complementarity to the eighth nt of the miRNA. This seed match can be supplemented with complementarity to nt 13–16 (3′ supplementary sites). Extended sites of complementarity at the 3′ end of the miRNA can also compensate for seed mismatches (3′ compensatory sites) (Bartel, 2009). Though a majority of the miRNA-target interactions include seed matches, many non-seed interactions are also observed (Helwak et al., 2013). Similarly, though mRNAs are the predominant miRNA targets, 30% of all miRNA targets are other classes of RNA, such as rRNA, tRNA, snRNA, miRNA, lincRNA, and pseudogenes (Helwak et al., 2013).
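The canonical site definitions above can be turned into a simple string search. The Python sketch below is a minimal illustration, not a target prediction tool: it looks for perfect matches to nt 2–7 of a miRNA in a target sequence and labels each hit by whether it also pairs with nt 8 and/or carries an adenine opposite nt 1. The labels 6mer, 7mer-A1, 7mer-m8, and 8mer follow common usage for these site classes; the function names and the sequences are hypothetical, and 3′ supplementary or compensatory pairing is ignored.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def classify_seed_sites(mirna, target):
    """List (position, site_type) for perfect seed matches in `target`.

    Both sequences are read 5'->3'. Position is the 0-based index in
    `target` where the match to miRNA nt 2-7 begins.
    """
    mirna = mirna.upper().replace("T", "U")
    target = target.upper().replace("T", "U")
    seed_match = reverse_complement(mirna[1:7])  # pairs with miRNA nt 2-7
    nt8_partner = COMPLEMENT[mirna[7]]           # pairs with miRNA nt 8

    sites = []
    start = target.find(seed_match)
    while start != -1:
        end = start + 6
        # The base pairing with nt 8 lies just 5' of the seed match.
        pairs_nt8 = start > 0 and target[start - 1] == nt8_partner
        # The base opposite nt 1 lies just 3' of the seed match.
        has_a1 = end < len(target) and target[end] == "A"
        if pairs_nt8 and has_a1:
            kind = "8mer"
        elif pairs_nt8:
            kind = "7mer-m8"
        elif has_a1:
            kind = "7mer-A1"
        else:
            kind = "6mer"
        sites.append((start, kind))
        start = target.find(seed_match, start + 1)
    return sites


# Hypothetical miRNA and target fragment, for illustration only.
toy_mirna = "UACGGAUCAGCUUAGGCAUUCG"
toy_target = "CCGAUCCGUACCUUAUCCGUCC"
print(classify_seed_sites(toy_mirna, toy_target))
```

Because the search requires only six matching nucleotides, even short target sequences often contain hits, which illustrates why a single miRNA can be predicted to target many transcripts.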

Mirtrons, miRNAs encoded within short introns of host genes, bypass processing by the Microprocessor complex. The mirtron is released by splicing and can, if needed after 5′ or 3′ tail trimming, be transported to the cytosol for further processing by Dicer. In the study of Ladewig et al. (2012), 240 human mirtrons were identified, indicating that bypassing Drosha is not a rare phenomenon. Other non-canonical miRNA biogenesis pathways have also been discovered and were recently reviewed by Abdelfattah et al. (2014). In the next sections, we will focus on the general miRNA characteristics and substrate requirements of the key enzymes of the canonical miRNA maturation pathway.

# Maturation Enzyme Requirements

## Microprocessor Complex

Drosha, a nuclear ribonuclease, is the enzyme that cleaves the miRNA hairpin from the primary transcript, whereby it generates a precursor with 3′ end overhangs of 2 nt (Lee et al., 2003). An essential cofactor for its pri-miRNA processing activity is the protein DGCR8. The complex containing Drosha and DGCR8, also called the Microprocessor complex, is sufficient for pri-miRNA processing (Denli et al., 2004; Gregory et al., 2004; Han et al., 2004; Landthaler et al., 2004). The capacity of DGCR8 to bind pri-miRNAs is necessary for Drosha cleavage (Yeom et al., 2006). DGCR8 trimerizes upon binding of pri-miRNAs, and formation of this protein-pri-miRNA complex may allow specific recognition of pri-miRNAs as opposed to other types of RNA and trigger subsequent cleavage by Drosha (Faller et al., 2010).

Several common features of pri-miRNAs were determined by thermodynamic profiling of secondary structure predictions on a set of human and fly pri-miRNAs: the transcript contains a local hairpin structure with an imperfectly paired stem of ∼33 base pairs (bp), consisting of an upper stem (∼22 bp, region of the mature miRNA duplex) and a lower stem (∼11 bp), a terminal loop region and flanking segments that are usually single-stranded or have large bulges or internal loops (Han et al., 2006). Essential requirements of pri-miRNAs for Drosha processing were experimentally determined. The stem-loop structure within the pri-miRNA needs to be larger than the pre-miRNA: it needs an extended stem (i.e., the lower stem, ∼11 bp) and an unstructured region flanking the hairpin (Lee et al., 2003; Chen et al., 2004; Zeng and Cullen, 2005; Zeng et al., 2005; Han et al., 2006; Auyeung et al., 2013). *In vitro* cleavage assays showed that Drosha was able to process pri-miR-30a and pri-miR-16-1 with as little as ∼20–25 nt of flanking regions (Lee et al., 2003; Han et al., 2004). However, for correct *in vivo* processing longer flanking regions are required (Chen et al., 2004). The presence of a large unstructured terminal loop region is beneficial for Drosha processing, as reduction of the size of the predicted terminal loop results in reduced processing (Zeng et al., 2005; Han et al., 2006; Zhang and Zeng, 2010). These properties correspond with the *in silico* determined characteristics of known miRNA hairpins.

Two different models have been proposed to explain how Drosha determines where to cleave the pri-miRNA. One study proposed that the Microprocessor mainly determines the distance from the terminal loop region of the hairpin and cleaves ∼22 bp upstream (Zeng et al., 2005). Another hypothesis proposed that Drosha predominantly cleaves ∼11 bp away from the junction between the lower hairpin stem and the flanking regions (Han et al., 2006). A recent study showed that the complex determines the distance to both ssRNA–dsRNA junctions and that both distances need to be optimal in order to result in precise cleavage (Ma et al., 2013).
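The relationship between the two measuring rules can be made concrete with a small calculation. The Python sketch below is a deliberately simplified illustration, assuming an idealized, bulge-free stem whose length is known in base pairs. It places the cut once by counting ∼11 bp up from the basal junction and once by counting ∼22 bp down from the apical junction, and reports whether the two estimates coincide. The function name and default values are assumptions chosen to mirror the approximate distances quoted in the text; they are not parameters from the cited studies.

```python
def drosha_cut_estimates(stem_bp, lower_stem_bp=11, upper_stem_bp=22):
    """Estimate the Drosha cut position from both ssRNA-dsRNA junctions.

    Positions are expressed in bp counted from the basal junction.
    Returns (cut_from_basal_rule, cut_from_apical_rule, rules_agree).
    """
    cut_from_basal = lower_stem_bp             # ~11 bp above the basal junction
    cut_from_apical = stem_bp - upper_stem_bp  # ~22 bp below the apical junction
    return cut_from_basal, cut_from_apical, cut_from_basal == cut_from_apical


# A ~33 bp stem satisfies both rules at once; a longer stem does not.
print(drosha_cut_estimates(33))
print(drosha_cut_estimates(36))
```

For a stem of ∼33 bp both rules point to the same position, consistent with the typical hairpin dimensions described earlier; when the stem length deviates, the two estimates diverge, which fits the observation that both distances need to be optimal for precise cleavage.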

Next to the structural prerequisites of the hairpin, position-specific sequence motifs present in the hairpin flanking segments of the primary transcript (UG motif, CNNC motif) and in the loop region (UGU or GUG motif) can enhance processing efficiency in human cells. Nearly 80% of the human pri-miRNAs that are conserved between human and mouse contain at least one of these motifs (Auyeung et al., 2013).
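A presence check for these motifs is straightforward to script. The Python sketch below is a minimal illustration that assumes the flanking segments and the terminal loop have already been extracted as separate strings. It only tests whether each motif occurs anywhere in the relevant segment and does not enforce the position-specific constraints mentioned in the text, which are not detailed here. The function name and the example sequences are hypothetical.

```python
import re


def find_processing_motifs(five_flank, three_flank, loop):
    """Report which processing-enhancing motifs occur in a pri-miRNA.

    Simplification: motifs are searched anywhere in the given segment,
    without checking their distance from the hairpin.
    """
    flanks = (five_flank.upper(), three_flank.upper())
    loop = loop.upper()
    return {
        "UG_in_flank": any("UG" in flank for flank in flanks),
        # CNNC: a C, any two nucleotides, then another C.
        "CNNC_in_flank": any(
            re.search(r"C[ACGU]{2}C", flank) is not None for flank in flanks
        ),
        "UGU_or_GUG_in_loop": re.search(r"UGU|GUG", loop) is not None,
    }


# Hypothetical segments, for illustration only.
print(find_processing_motifs("AAUGAA", "AACAGCAA", "AAGUGAA"))
```

Applied across a set of annotated hairpins, a check of this kind would give the fraction carrying at least one motif, the type of statistic reported above.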

## Exportin-5

The transport of pre-miRNAs from the nucleus to the cytosol is mediated by Exportin-5 in the presence of cofactor RanGTP (Yi et al., 2003; Bohnsack et al., 2004; Lund et al., 2004). The RNA structure is the main determinant for precursor binding: a stem of at least 18 bp and a blunt end or a 3′ overhang, such as the overhang created by Drosha (Lee et al., 2003), is preferential (Zeng and Cullen, 2004). Upon binding, the 2 nt 3′ overhang of the precursor and a large part of the stem are bound by Exportin-5–RanGTP in a sequence-independent manner, which protects the precursor from degradation by nucleases (Bohnsack et al., 2004; Zeng and Cullen, 2004; Okada et al., 2009).

**Dicer.** Dicer, a ribonuclease that processes dsRNA into duplexes with a length of ∼22 nt, processes pre-miRNAs into mature miRNA duplexes (Bernstein et al., 2001; Grishok et al., 2001; Hutvágner et al., 2001; Ketting et al., 2001; Knight and Bass, 2001). Dicer cleaves its substrates gradually with a preference to start at the termini of the RNA duplex and generates 2 nt 3′ overhangs (Zhang et al., 2002, 2004). The structure of the termini contributes to the size of the end product, with 2 nt 3′ overhangs resulting in products less than 24 nt, while blunt ends result in longer fragments (Vermeulen et al., 2005). Dicer efficiency is affected by several substrate parameters, such as the size of the loop (in case of pre-miRNAs) and the sequence and the size of the overhangs, with 3′ overhangs of 2 nt being the most efficient substrate for human Dicer (Vermeulen et al., 2005; Lund and Dahlberg, 2006; Zhang and Zeng, 2010; Park et al., 2011). Dicer binds both the 5′ and 3′ end of the precursor molecule and mainly cleaves at a distance of 22 nt from the 5′ end (Park et al., 2011). The accuracy of the cleavage seems to depend on the distance between this canonical cleavage site and the nearest bulge or the terminal loop (Gu et al., 2012). The Drosha-mediated generation of pre-miRNAs with a 3′ overhang of 2 nt and a stem of ∼22 bp thus provides suitable substrates for Dicer. This results in the removal of the loop by Dicer and the generation of mature miRNA duplexes of ∼22 nt bearing 2 nt overhangs at the 3′ end of both strands, on one side generated by Drosha and on the other side generated by Dicer.
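
The geometry of this step can be illustrated with a deliberately idealized model. The sketch below assumes a perfectly paired stem without bulges, a Drosha-generated 2 nt 3′ overhang, and a Dicer cut exactly 22 nt from the 5′ end with a 2 nt stagger on the opposite strand; real precursors deviate from all of these assumptions, and the function name `dice_idealized` is hypothetical.

```python
def dice_idealized(pre_mirna, cut_from_5p=22, overhang=2):
    """Split an idealized pre-miRNA into 5p strand, loop and 3p strand.

    In a fully paired stem with a 2 nt 3' overhang, position i (0-based)
    pairs with position n - 1 - overhang - i. Dicer cuts the 5' arm after
    `cut_from_5p` nt and the 3' arm with a stagger of `overhang` nt, so the
    3p strand starts at the partner of position cut_from_5p - 1 - overhang.
    """
    n = len(pre_mirna)
    if n < 2 * cut_from_5p:
        raise ValueError("precursor too short for two non-overlapping strands")
    last_paired_5p = cut_from_5p - 1 - overhang
    start_3p = n - 1 - overhang - last_paired_5p  # simplifies to n - cut_from_5p
    return {
        "strand_5p": pre_mirna[:cut_from_5p],
        "loop": pre_mirna[cut_from_5p:start_3p],
        "strand_3p": pre_mirna[start_3p:],
    }
```

In this model both strands are 22 nt long regardless of the loop size, which mirrors the ∼22 nt duplex with 2 nt 3′ overhangs described above.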

**Argonaute.** The mature miRNA duplex is loaded onto an Argonaute protein by the RISC loading complex. This complex matures to RISC when one of the strands, termed the passenger strand, is removed (Czech and Hannon, 2011). The other strand, termed the guide strand, can then, in complex with Argonaute, target RNAs via sequence complementarity for posttranscriptional regulation. The process of strand selection is asymmetric: for many miRNAs one of the strands is preferentially retained. This bias is in part explained by the relative stability of the 5′ end of both strands, where the strand with the lowest stability is preferentially retained within RISC (Khvorova et al., 2003; Schwarz et al., 2003). Another contributing factor is the sequence composition of the strands: for human miRNA duplexes with a large difference in expression between both strands, the highest expressed strand has a bias toward a uracil at position 1 and a high purine content, while the lower expressed strand has a bias toward a cytosine at position 1 and a high pyrimidine content (Hu et al., 2009). However, as the dominant strand can vary between tissues (Cloonan et al., 2011), other, yet unknown, factors than sequence and structure of the miRNA duplex itself must be contributing to this variable selection bias.
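
The 5′ end stability rule can be sketched with a crude proxy. The example below is not a thermodynamic calculation: it assumes a duplex with 2 nt 3′ overhangs and no bulges, and scores the first four base pairs at each strand's 5′ end with arbitrary weights (G-C = 3, A-U = 2, G-U = 1, mismatch = 0). The weights and function names are hypothetical; a real analysis would use nearest-neighbor free energies.

```python
# Arbitrary pair weights used as a stand-in for thermodynamic stability.
PAIR_SCORE = {
    ("G", "C"): 3, ("C", "G"): 3,
    ("A", "U"): 2, ("U", "A"): 2,
    ("G", "U"): 1, ("U", "G"): 1,
}


def five_prime_end_score(strand, partner, n_pairs=4, overhang=2):
    """Score the first `n_pairs` base pairs at the 5' end of `strand`."""
    if min(len(strand), len(partner)) < n_pairs + overhang:
        raise ValueError("strands too short for the requested window")
    score = 0
    for i in range(n_pairs):
        # Position i pairs with the partner base counted from its 3' end,
        # skipping the partner's unpaired 3' overhang.
        j = len(partner) - 1 - overhang - i
        score += PAIR_SCORE.get((strand[i], partner[j]), 0)
    return score


def predicted_guide(strand_5p, strand_3p):
    """Return the strand whose 5' end is less stably paired."""
    s5 = five_prime_end_score(strand_5p, strand_3p)
    s3 = five_prime_end_score(strand_3p, strand_5p)
    if s5 == s3:
        return "no preference"
    return "5p" if s5 < s3 else "3p"
```

The sketch covers only the stability component; the nucleotide at position 1 and the tissue-dependent factors mentioned above are not modeled.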

### Proteins Involved in Regulation of Biogenesis

In addition to the biogenesis key proteins described above, a number of other proteins are known to be involved in the regulation of miRNA biogenesis. The following examples highlight the relevance of the terminal loop region in this regulation. The first example is the protein KSRP. This protein binds directly to the terminal loop of a set of miRNAs and enhances their processing by Drosha and Dicer (Trabucchi et al., 2009). Another example is the regulation by Lin28 and terminal uridyl transferase TUTase4. Lin28 binds a GGAG sequence motif in the terminal loop of specific pre-miRNAs, including let-7 family members, upon which TUTase4 binds to the Lin28-pre-miRNA complex and uridylates the 3′ end of the pre-miRNA. This oligo-uridylation prevents processing by Dicer (Heo et al., 2008, 2009; Hagan et al., 2009). In addition to the Lin28-mediated inhibitory role of TUTase4, this enzyme (and other terminal uridyl transferases) also mono-uridylates a subset of pre-let-7 family members and other miRNAs with a 1 nt 3′ overhang, promoting Dicer processing (Heo et al., 2012). An overview of all proteins involved in the regulation of miRNA biogenesis can be found in Siomi and Siomi (2010).

#### IsomiR Repertoire

Mature miRNAs originating from one arm of a pre-miRNA can have sequence and length heterogeneity *in vivo*, with sequences and lengths closely related to, but different from the canonical miRNA sequence reported in miRBase. Such alternative sequences are termed isomiRs (Morin et al., 2008). Different types of isomiRs have been observed: templated additions or deletions at the 5′ and/or 3′ end of the miRNA, non-templated additions at the 3′ end and substitutions within the sequence (Landgraf et al., 2007; Morin et al., 2008; Martí et al., 2010). Alternative processing by Drosha or Dicer, Dicer-independent Argonaute2-mediated cleavage and exonuclease degradation of the 3′ terminus of miRNAs can be a source of templated isomiR genesis. Non-templated changes can be established by nucleotidyl transferases, posttranscriptional editing or due to the presence of genetic variants within the transcript (Morin et al., 2008; Cloonan et al., 2011; Neilsen et al., 2012; Lee et al., 2013; Ma et al., 2013).
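
The isomiR types listed above can be told apart by aligning a sequenced read to the precursor. The following is a minimal sketch, assuming exact matching, 0-based coordinates with an exclusive end, and a search window of a few nucleotides around the canonical 5′ end; internal substitutions are not handled and would be reported as a long non-templated tail. The function name and the `max_shift` parameter are hypothetical.

```python
def classify_isomir(precursor, canon_start, canon_end, read, max_shift=3):
    """Describe a read relative to the canonical miRNA on its precursor.

    canon_start is inclusive and canon_end exclusive (0-based). Returns the
    templated 5' and 3' shifts and any non-templated 3' tail, or None if the
    read does not match near the canonical 5' end.
    """
    best_start, best_len = None, 0
    first = max(0, canon_start - max_shift)
    for start in range(first, canon_start + max_shift + 1):
        # Length of the exact, templated match beginning at this start.
        n = 0
        while (n < len(read) and start + n < len(precursor)
               and precursor[start + n] == read[n]):
            n += 1
        if n > best_len:
            best_start, best_len = start, n
    if best_start is None:
        return None
    return {
        "five_prime_shift": best_start - canon_start,
        "three_prime_shift": best_start + best_len - canon_end,
        "non_templated_3p_tail": read[best_len:],
    }
```

A read with a non-zero `five_prime_shift` is a 5′ isomiR and therefore carries a shifted seed. Note that a 3′ addition matching the precursor cannot be distinguished from a templated extension by sequence alone.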

Since miRNAs target transcripts via imperfect sequence complementarity, isomiRs with a changed sequence for the nucleotides contributing to the target specificity are expected to affect the target spectrum of the miRNA. Given that the majority of miRNA-target interactions include matches to the miRNA seed (Helwak et al., 2013), isomiRs with an altered seed sequence, such as 5′ isomiRs, may have a large influence on target specificity. Sets of isomiRs with an altered seed sequence were predicted to gain the capacity to regulate many additional genes and/or lose the capacity to regulate a subset of genes compared to the canonical miRNA (Gong et al., 2012; Tan et al., 2014). Gain or (partial) loss of function of isomiRs with altered seed sequence and even of isomiRs with a preserved seed has been confirmed experimentally (Kawahara et al., 2007; Gong et al., 2012; Chan et al., 2013; Tan et al., 2014). The size of the effect will, however, be largely determined by the absolute and relative expression and stability of the isomiR compared to the canonical miRNA *in vivo* and its binding efficiency to RISC, which can also be affected by the sequence variability (Chan et al., 2013; Llorens et al., 2013). Alternatively, it has been suggested that isomiRs, which often show a high expression correlation with their canonical miRNAs, may function to regulate the same pathways as the canonical miRNA and that the sequence heterogeneity within isomiR populations may act to reduce off-target effects (Cloonan et al., 2011). The broader functional implications of isomiRs, their contribution to gene expression regulation and how their expression is controlled remain to be further elucidated.
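
The effect of a shifted 5′ end on the seed can be made concrete with a short example. The sketch below assumes the common convention that the seed spans nucleotides 2 to 8 of the miRNA and that a candidate site is a perfect reverse-complement match to this 7-mer; actual target prediction tools use additional criteria, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna, seed_start=1, seed_length=7):
    """Return the target-site sequence (5'->3') matching the miRNA seed.

    Assumes the seed is nucleotides 2-8 (0-based slice 1:8).
    """
    seed = mirna[seed_start:seed_start + seed_length]
    # Reverse complement: the site pairs antiparallel to the seed.
    return seed.translate(COMPLEMENT)[::-1]


def count_seed_sites(utr, mirna):
    """Count (possibly overlapping) perfect seed matches in a 3'UTR."""
    site = seed_site(mirna)
    return sum(utr.startswith(site, i) for i in range(len(utr) - len(site) + 1))
```

Applied to a canonical sequence and to the same sequence lacking its first nucleotide, the two seed sites differ by a one-nucleotide shift, so a UTR that matches one does not necessarily match the other.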

# Impact of Genetic Variants on miRNA Expression and Function

As presented above, miRNA transcripts need to fulfill structural and sequence prerequisites in order to result in expression of the correct mature miRNA sequence(s). Sequence variability in miRNA genes can therefore influence both the expression level and the functionality of the miRNA and consequently will result in differential regulation of their target genes.

Large-scale *in silico* analyses of single nucleotide polymorphisms (SNPs) in human miRNA genes have demonstrated that miRNA genes have lower SNP densities than their flanking regions or the human genome (Saunders et al., 2007; Gong et al., 2012; Han and Zheng, 2013). Within the miRNA gene the mature sequence has a lower SNP density than the precursor, with the seed having the lowest SNP density, reflecting their functional importance. In addition, it was shown that there is a negative correlation between the number of SNPs a miRNA gene harbors and the number of diseases the miRNA is associated with (Han and Zheng, 2013). This again highlights the importance of genetic variants in miRNA genes related to their function and their involvement in human diseases.

Genetic variants can affect miRNAs on several levels. Variants in miRNA promoter regions and other regulatory regions may result in an altered transcription rate. Variants in splice sites of the host gene (for intronic miRNAs) or of the polycistron (clustered miRNAs) could result in aberrant expression patterns. Next, variants within the miRNA transcript can have an effect on miRNA maturation in multiple aspects. They can change the binding affinity of the miRNA hairpin to biogenesis enzymes or accessory proteins (sequence motifs or structural motifs that are changed due to the modified underlying sequence). Variants can lead to altered processing accuracy or to changed frequency of alternative cleavage sites of biogenesis enzymes. They can also lead to altered strand loading bias into RISC. These changes in maturation can all result in altered expression of the canonical miRNA and its existing isomiRs, resulting in deregulation of target genes. It may also result in production of novel isomiRs, which can lead to altered functionality of the miRNA. Lastly, genetic variants within the mature sequence can affect target specificity by generating isomiRs.

Given the potentially huge impact of genetic variants on the tightly regulated miRNA repertoire, and the importance of miRNA-mediated gene regulation, it is not surprising that genetic variants have been found to be causal for or associated with human diseases (Mencía et al., 2009; Hughes et al., 2011). Variants in miRNA biogenesis enzymes and in miRNA binding sites can also lead to impaired miRNA regulation. However, the latter category of variants would likely have a smaller impact than the former two categories, because only the targeting of one gene by one miRNA would be disrupted in case the target does not contain multiple binding sites for that miRNA. Genetic variants in miRNA genes, biogenesis genes and target binding sites associated with human diseases were recently reviewed by Kawahara (2014). Here, we will focus on illustrating the different effects genetic variants can have on miRNA transcription, maturation, and targeting with examples from human miRNA variants studied in disease where possible.

### Variants Influencing miRNA Transcription or Splicing

Though bioinformatics approaches have been used to predict miRNA gene promoters (Saini et al., 2007; Marsico et al., 2013), many miRNA gene promoters have not yet been experimentally validated. Below we describe an example of a variant in a miRNA promoter, which was experimentally delineated. The second example is a case of a variant in a host gene promoter for an intronic miRNA cluster. However, experimental investigation of this type of variant can be complicated by the presence of an independent miRNA promoter in addition to the host gene promoter.

SNP rs57095329 is located in the promoter region of MIR146A, 17 kb upstream of the pre-miR-146a sequence. This variant was found to be associated with systemic lupus erythematosus (SLE). The risk allele reduces the binding to transcription factor Ets-1, another SLE susceptibility gene, and results in decreased promoter activity. Consistent with this mode of action, risk allele carriers have lower expression of miR-146a-5p (Luo et al., 2011).

SNP rs999885 is located in the promoter region of the protein-coding gene MCM7. This variant was associated with a decreased risk of chronic hepatitis B infection, but also with an increased risk of hepatocellular carcinoma (HCC) in individuals with chronic hepatitis B virus infection. The miR-106b-25 cluster, coding for miR-106b, miR-93, and miR-25, is located in the 13th intron of MCM7. HCC patients that carry the risk allele have higher expression of the miRNA transcript in non-tumor liver tissue than non-carriers (Liu et al., 2012). Recently it was shown that this cluster can be transcribed independently of MCM7 via different promoters and that the miRNA polycistronic transcript can undergo alternative splicing (Ramalingam et al., 2014). Therefore it would be very interesting to uncover whether and how this variant influences the relative and absolute expression of the different mature miRNAs in this cluster.

To the best of our knowledge, variants in splice sites that affect miRNA expression in human disease have not been published yet. Nevertheless, the effect of host gene splice variants on mature miRNA expression has been demonstrated by mutagenesis experiments (Janas et al., 2011).

#### Variants Influencing miRNA Maturation

Variants within the pri-miRNA can affect miRNA expression levels by inducing changes in the maturation process, as described above. This effect can take place at several steps in the maturation pathway: Drosha processing, Dicer processing, and/or altering strand preference.

An example of a variant with an effect on the Drosha processing step is rs2910164. This variant is located in the seed sequence of miR-146a-3p and was predicted to lead to a mismatch in the hairpin stem. The heterozygous variant genotype was associated with increased risk of papillary thyroid carcinoma (PTC). After overexpression of pri-miR-146a with the G allele or the C allele in cells, the C allele resulted in nearly twofold reduction of pre-miR-146a compared to the G allele. The reduced production of pre-miR-146a from pri-miR-146a with a C allele was also confirmed by an *in vitro* processing assay (Jazdzewski et al., 2008). Another example of a pri-miRNA variant affecting the first processing step is a variant located four bases upstream of the miR-510-5p sequence. This variant was identified in a study screening X-linked miRNA genes in male schizophrenia patients and control individuals. The variant was predicted to alter the secondary structure around the Drosha processing site. Functional validation showed that the variant results in increased expression of pre-miR-510, miR-510-5p, and miR-510-3p (Feng et al., 2009; Sun et al., 2009).

A variant affecting Dicer processing is rs546098287, located within the seed sequence of miR-96-3p. This variant was identified as the causative mutation in an Italian family with non-syndromic hearing loss (Soldà et al., 2012). Two segregating point mutations in miR-96-5p were previously identified in two Spanish families with non-syndromic hearing loss, demonstrating the importance of this gene in the disease pathogenesis (Mencía et al., 2009). The mutation in miR-96-3p reduces the processing from pre-miR-96 to both miR-96-5p and miR-96-3p. This is likely induced by a structural change, because the variant was predicted to enlarge an internal loop in the hairpin stem and when a second variant was introduced to restore the original stem structure, the expression was largely brought back to normal levels (Soldà et al., 2012).

An example of a variant influencing the relative strand abundance is a variant in MIR133A2, identified in a patient with atrial fibrillation. The variant is located at the 3′ end of the miR-133a-3p sequence and was predicted to alter the secondary structure of the hairpin at the base of the miR-133a duplex. miR-133a was found to be highly expressed in atrial tissue, with the 3p strand being the dominant one, while miR-133a-5p represented less than 1% of all the miR-133a reads. Functional validation experiments demonstrated that the variant resulted in increased miR-133a-5p expression, but had no significant effect on miR-133a-3p expression. The variant thus increases the relative abundance of the 5p strand compared to the 3p strand (Ohanian et al., 2013).

The cases described above are all variants located either within the precursor sequence or close to it. A study by Diederichs and Haber (2006) reported the functional validation of 15 pri-miRNA variants ranging from 5 to 133 nt outside the pre-miRNA sequence, identified in cancer cell lines. Although some of these variants had predicted structural changes at or near the base of the miRNA hairpin, none of the variants displayed detectable differences in mature miRNA expression. On the other hand, the variant rs11134527, which is located 96 nt outside the MIR218-2 hairpin sequence, was shown to change the expression of miR-218-5p significantly (Gao et al., 2013). Therefore variants located further outside the hairpin can still affect miRNA biogenesis, although variants closer to the hairpin are more likely to have a higher impact.

#### miRNA Variants Influencing Targeting

Besides resulting in altered expression, miRNA variants can also result in altered target specificity by creating isomiRs.

An interesting example of a variant creating a polymorphic isomiR is rs2910164 in miR-146a-3p. This variant also affects Drosha processing as described above (Jazdzewski et al., 2008). The isomiR and canonical miR-146a-3p are predicted to share only a small set of target genes. Transcriptome analysis in healthy and in tumor tissue of PTC patients with GG or GC genotype showed that hundreds of genes (358 in unaffected tissue, 575 genes in tumor tissue) were differentially expressed (Jazdzewski et al., 2009). Even though this does not provide direct evidence that all those genes are direct targets of the isomiR, it does give insight into the large-scale downstream changes isomiRs may induce.

Direct evidence of altered targeting was shown in a study investigating the effect of eight seed SNPs and three mature sequence SNPs (among which one deletion) identified in dbSNP in nine miRNA genes. Four of the tested miRNAs (miR-627-5p, miR-379-5p, miR-499-3p, and miR-124-3p) partially or completely lost their potential to suppress their original target due to a SNP in their seed sequence. The seed sequence variant rs2620381 in miR-627-5p not only resulted in loss of targeting of SEMA3F, but also in gained capacity to suppress ATP6V0E1 (Gong et al., 2012). In addition, miR-379-5p, miR-940, and miR-34a-5p containing variants in the mature sequence also displayed partial or total loss of function of their original mRNA targets tested.

# Studying miRNAs Involved in Disease

#### miRNA Profiling and Genetic Studies

To study if and which miRNAs play a role in the pathogenesis of a disease, two main strategies can be used: a miRNA profiling approach and a genetic approach.

In the miRNA profiling approach, the affected tissue of a group of patients is subjected to miRNA expression profiling and compared to the expression profile of either the same tissue in healthy individuals, tissue of patients with a different disease or subphenotype, or to adjacent non-affected tissue of the same patients (Calin et al., 2005; Kan et al., 2012; Feliciano et al., 2013; Huang et al., 2014). The tested miRNAs can range from selected candidate miRNAs to a genome-wide approach and different technologies are available, such as RT-qPCR, microarray based methods or RNA sequencing (Pritchard et al., 2012). The miRNA expression profiles are compared between the disease phenotype and the controls in order to determine whether and which miRNAs are differentially expressed. Resulting signatures may be used as diagnostic or prognostic biomarkers (Calin and Croce, 2006) and can provide clues about the role of miRNAs in the disease mechanism. Additionally, interesting miRNAs can be further investigated to dissect their mode of action.

In the genetic approach, the starting point is a linkage or an association analysis (genome-wide or candidate gene approach) including but not necessarily limited to miRNA genes between patients and control individuals (Mencía et al., 2009; Schizophrenia Psychiatric Genome-Wide Association Study (GWAS) Consortium, 2011; Zhang et al., 2015). When a genetic variant in or near a miRNA gene is associated with or is causal for the disease, functional validation can be initiated to assess its effect and its role in the pathogenesis of the disease. Variants in miRNA genes linked to the disease could also be used for disease diagnosis, as exemplified by the patent on rare miRNA gene variants identified in schizophrenia patients (Sommer and Rossi, 2012). The genetic screening approach focused on miRNA genes is not yet as frequently applied on a large scale as genome-wide miRNA profiling.

Both strategies have limitations and benefits. General considerations regarding miRNA profiling were reviewed by Pritchard et al. (2012). In the study of disease, genome-wide miRNA profiling can provide a global insight into which miRNAs may play a role in the disease by generating disease-specific miRNA expression signatures. An advantage of this approach is that the deregulated miRNA provides a first indication of its possible functional role in the disease. The main limitation, however, is that the differential expression of a miRNA in itself does not provide an indication of causality: the altered expression can be a causal factor, a consequence, or could even be unrelated to the disease pathogenesis (Kan et al., 2012). Another constraint is that the affected tissue has to be accessible. This is especially problematic in the study of neurological diseases. Alternatively, brain tissue of deceased patients and healthy individuals, if available, can be used. However, though many miRNAs seem to be relatively stable (Bail et al., 2010; Winter and Diederichs, 2011), the half-lives of some miRNAs are less than 1 hour (Sethi and Lukiw, 2009). This implies that when brain samples with different postmortem delay from death to tissue processing are compared, variability may arise due to differential instability of the miRNAs in the samples. The restricted access to postmortem brain samples can also result in suboptimal sample sizes. A much more accessible proxy for unavailable affected tissues is blood. While this may provide a good avenue for determining diagnostic or prognostic biomarkers, it is not clear whether the miRNAs deregulated in blood would also be deregulated in the affected tissue. In addition, several factors can influence miRNA expression in blood, such as age, drug use, and other diseases (Margis et al., 2011; de Boer et al., 2013; Meder et al., 2014).

The main advantage of a genetic approach is that when a variant is associated with disease risk, it gives an indication that the miRNA involved has a primary effect in the disease pathomechanism (large or small contributing effect), which cannot be determined by expression profiling. This advantage also has a downside: while the association or linkage analysis may indicate a miRNA variant is likely to be involved in the disease, it is not at all clear whether the identified variant has a functional effect and whether the expression of that miRNA is altered in the disease state. Another asset of the genetic approach is that diseases for which the main affected tissue is not accessible can still be studied because genomic DNA can be isolated from more accessible patient material, such as blood or saliva. There is, however, still a chance of missing a mutation, if the patient is mosaic for the mutation and the load of the mutation is lower in the examined tissue than in the affected tissue.

Both approaches are complementary and should both be used to determine the involvement of a miRNA in relation to a disease: genetic approaches can pinpoint the causal or risk factors involved in the disease, while expression studies reveal whether and how the expression of the disease-associated miRNAs is altered in the affected tissue. Neither approach is sufficient to determine both. An example of a study combining both approaches is the study of Calin et al. (2005) where miRNA profiling in affected tissue of patients was performed, followed by variant screening in a set of relevant miRNA genes. An expression signature of thirteen miRNA genes was found to be able to distinguish between patient subgroups. In two of these genes, patient-specific variants were found. One of the variants was a functional germline mutation in pri-miR-15a/16-1. On the other hand, when a genetic variant is associated with a disease, usually only the expression of that particular miRNA is assessed in patient material. The profiling and genetic approaches have also been combined on a large scale by miRNA profiling in blood samples of patients with different diseases and investigating whether the deregulated miRNA genes were located within 250 kb of published GWAS significant SNPs (Keller et al., 2011). While on average 103 deregulated miRNAs were identified in each disease, of all the deregulated miRNAs they found only six miRNA genes in proximity to a GWAS hit associated with the same disease. Combining the genetic and the profiling approach on the same set of patients could thus provide an invaluable amount of information on both DNA and RNA level from which the disease-contributing miRNAs can be separated from the miRNAs that are probably deregulated due to secondary disease processes.
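
The proximity criterion used in such a combined analysis (a miRNA gene within 250 kb of a GWAS SNP) is straightforward to express in code. The sketch below is an illustration only: the tuple layouts, function names and any identifiers passed in are hypothetical, and coordinates are assumed to refer to the same genome build.

```python
def distance_to_gene(pos, start, end):
    """Distance in bp from a position to a gene interval (0 if inside)."""
    if pos < start:
        return start - pos
    if pos > end:
        return pos - end
    return 0


def genes_near_snps(genes, snps, window=250_000):
    """Pair each miRNA gene with the SNPs lying within `window` bp.

    genes: iterable of (name, chromosome, start, end)
    snps:  iterable of (snp_id, chromosome, position)
    """
    hits = []
    for name, chrom, start, end in genes:
        for snp_id, snp_chrom, pos in snps:
            if chrom == snp_chrom and distance_to_gene(pos, start, end) <= window:
                hits.append((name, snp_id))
    return hits
```

For genome-scale inputs an interval index or a sorted sweep would be preferable to this quadratic loop; the nested form is kept here for readability.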

#### Databases and Tools for Genetic Approaches

To provide direct evidence that a genetic variant in or near a miRNA gene and the deregulated miRNA expression in disease tissue are linked, detailed functional validation is required. Functional validation is labor-intensive, costly, and time-consuming; researchers therefore usually prioritize the potential candidate variants using *in silico* approaches. In this section, we will focus on available databases and tools for the impact prediction of genetic variants on miRNA function (**Tables 1** and **2**). Different types of resources for miRNA research have been reviewed by Vlachos and Hatzigeorgiou (2013).

A first step in the assessment of the relevance of a genetic variant or a set of genetic variants is determining whether the variant is located near or in a specific region of a miRNA gene. For a single variant of interest, the variant location may be assessed in a genome browser to determine the position of the variant compared to neighboring miRNA genes. To evaluate the exact location of the variant relative to the different miRNA gene regions (e.g., seed, mature, terminal loop), one also has to compare this location with the information provided about the gene in miRBase (Kozomara and Griffiths-Jones, 2014). This procedure is tedious and error-prone, especially when investigating multiple miRNA variants. For variant lists such as those derived from massively parallel sequencing projects, the first annotation step can be done with any of the annotation tools available to determine which (miRNA) genes, if any, are located near the variants.
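
The region lookup described here amounts to an interval test once the gene annotation is available. The sketch below assumes hairpin-relative, 1-based inclusive coordinates; the region boundaries in `HYPOTHETICAL_REGIONS` are invented for illustration and do not describe any real miRNA, for which the coordinates would have to be taken from miRBase.

```python
# Invented example annotation: (region name, start, end), 1-based inclusive.
# More specific regions (seed) are listed before the regions containing them.
HYPOTHETICAL_REGIONS = [
    ("seed_5p", 7, 13),
    ("mature_5p", 6, 27),
    ("terminal_loop", 28, 38),
    ("seed_3p", 40, 46),
    ("mature_3p", 39, 60),
]


def annotate_variant(pos, regions, hairpin_start=1, hairpin_end=65):
    """Return the first region containing `pos`, else a generic label."""
    for name, start, end in regions:
        if start <= pos <= end:
            return name
    if hairpin_start <= pos <= hairpin_end:
        return "hairpin_other"
    return "flank"
```

Because the list is scanned in order, a position inside the seed is reported as seed rather than as the enclosing mature sequence.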

Online repositories compiling information about known SNPs in miRNA genes, such as miRNASNP (Gong et al., 2012), miRNA SNiPer (Zorc et al., 2015), and miRvar (Bhartiya et al., 2011), can be used to extract exact location information for these variants relative to the miRNA hairpin. A limitation for the last two databases is that the search needs to be initiated by using the miRNA gene name instead of the SNP ID. In addition, databases cannot be used when investigating novel variants and are less suitable when investigating multiple variants in several miRNA genes. To eliminate the time-consuming processes related to assessing the position of variants in the context of miRNA genes and what their predicted structural impact is, we recently developed the freely available software miRVaS<sup>1</sup>. Required input is the genomic location of a (known or novel) variant or of a list of variants.

The next possible step in determining the potential relevance of a genetic variant is predicting its effect on the secondary structure of the miRNA. For known SNPs within miRNA genes, structure prediction results of the miRNA hairpin with and without the variant can be found in the miRNASNP database (Gong et al., 2012). The difference in free energy is calculated and a predicted effect on the miRNA expression is provided, based on the assumption that variants destabilizing the hairpin will reduce the expression and stabilizing variants will increase the expression of the mature miRNA (Gong et al., 2012). For novel variants the RNA secondary structures can be predicted for wild type and variant sequences using web-based RNA structure prediction tools, such as the Mfold or RNAfold web servers (Zuker, 2003; Gruber et al., 2008), and compared. Again, this process is time-consuming and not suitable for the analysis of large variant sets. miRVaS<sup>1</sup> automates this type of analysis, predicting structural impact for lists of variants, needing only the genomic location and alternative allele of the variant.

<sup>1</sup>http://mirvas.bioinf.be/

TABLE 2 | Overview of described software.
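
The free-energy comparison described above reduces to a sign test once the two folding energies are known. The sketch below assumes that the minimum free energies (in kcal/mol, more negative meaning more stable) have already been obtained from a structure prediction tool; the function name and the `tolerance` parameter are hypothetical, and the mapping from stability to expression is the stated assumption of the database, not an established rule.

```python
def predicted_expression_effect(dg_wild_type, dg_variant, tolerance=0.0):
    """Classify a variant by its change in hairpin folding energy.

    Returns (ddG, label), where ddG = dg_variant - dg_wild_type.
    A positive ddG means the variant hairpin is less stable.
    """
    ddg = dg_variant - dg_wild_type
    if ddg > tolerance:
        return ddg, "reduced"      # destabilizing variant
    if ddg < -tolerance:
        return ddg, "increased"    # stabilizing variant
    return ddg, "unchanged"
```

A non-zero `tolerance` can be used to ignore energy differences that are too small to be meaningful given the uncertainty of the prediction.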

If a variant is located in a miRNA seed sequence, the effect on target binding can also be assessed *in silico*. Both PolymiRTS database (Bhattacharya et al., 2014) and miRNASNP (Gong et al., 2012) contain lists of predicted gain and loss of target binding sites for seed sequence variants that can be browsed. To assess the impact of variants in miRNA promoter regions on transcription factor binding, the dPORE-miRNA database can be queried (Schmeier et al., 2011). Algorithms to predict Dicer cleavage patterns and preferential strand loading into RISC, such as PHDcleav and RISC binder (Ahmed et al., 2009, 2013) could also be useful to assess the impact of variants. The predicted effect on Dicer cleavage and RISC binding induced by variants is incorporated in the miRvar database (Bhartiya et al., 2011).

#### Functional Characterization of a Disease-Associated miRNA Variant

Using tools and databases, a set of genetic variants can be prioritized based on their potential functional relevance. The most promising candidates can then be chosen for direct functional validation in cell-based assays. The assays used depend on the location of the variant and the hypothesis.

For variants in miRNA promoter regions, the wild type and variant promoter can be cloned into a promoter-less reporter gene vector and transfected (transiently or stably) into cell lines to assess whether there is a difference in promoter activity due to the variant and whether the variant promoter has different binding affinity to transcription factors (Luo et al., 2011).

To assess the impact of a genetic variant on miRNA biogenesis and/or targeting, cell lines overexpressing wild type and mutant miRNAs are usually established. The constructs should include the full precursor gene and upstream and downstream flanking sequences. The inclusion of flanking regions is crucial: the biogenesis machinery in the cell may not efficiently recognize or process the transcript when the flanking regions are absent or too small, as evidenced by the study of Chen et al. (2004) where only constructs for miR-223 with at least 40 nt of flanking region resulted in detectable miR-223 expression. Using up- and downstream flanks of 125 nt resulted in mature miRNA expression for all 13 tested miRNAs in this study, showing that the incorporation of longer flanks in miRNA constructs is generally a good strategy. To assess the effect on biogenesis, miRNA expression analysis is performed. Comparison of wild type and variant miRNA levels can be done at the level of the mature miRNA (Shen et al., 2009), assessing the total effect of the variant on the whole biogenesis process, or it can be done on several maturation levels of the miRNA, to deduce the exact step of the biogenesis that is affected by the variant (Duan et al., 2007).

If a variant is located in the mature miRNA sequence, targeting may be affected. Investigating which genes and pathways are affected due to the variant may provide insight into the disease mechanism by identifying new pathways and/or by confirming pathways previously hypothesized to be involved in the disease. Investigation and validation of direct interactions between miRNA and target RNAs can be done using reporter assays. To this end, the potential binding site is cloned downstream of a reporter gene and cells are cotransfected with the miRNA and the reporter vector to assess whether the miRNA (or the variant miRNA) can target the site and thus reduce the reporter gene activity (Duan et al., 2007; Gong et al., 2012). While this allows confirmation of direct interactions, overexpression of the binding site and the miRNA may also lead to validation of interactions that may not take place endogenously in the cell type or tissue of interest, due to, for instance, absence of colocalization or co-expression.

Investigation of variant-induced targeting alterations can also be performed on a large scale by subjecting the wild type and variant miRNA expressing cells to transcriptome and/or proteome profiling. Single genes or proteins can be analyzed for differential expression. Pathway analysis can also identify which sets of genes are influenced by the variant miRNA and may lead to biologically meaningful findings about the variant miRNA and the pathways in the cell it affects in the disease state (Jazdzewski et al., 2009). If a variant is located outside the mature miRNA sequence, this approach can still be used if the variant affects mature miRNA expression, as the altered expression of the miRNA will also deregulate its target genes (Strazisar et al., 2015). However, in contrast to the reporter gene assays, these large-scale approaches cannot distinguish between direct and indirect targets of the miRNA, so target prediction algorithms need to be used to determine which of the deregulated genes can potentially be targeted directly by the miRNA.
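The filtering step described above can be sketched as a plain seed-match scan over the 3' UTRs of deregulated genes. This is a deliberately minimal stand-in for a target prediction algorithm, which would weigh many more features. It assumes a canonical seed covering miRNA nucleotides 2 to 8, and all sequences, gene names, and function names are hypothetical.

```python
def revcomp_rna(seq):
    """Reverse complement of an RNA sequence (A, C, G, U only)."""
    comp = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(comp[base] for base in reversed(seq))


def seed_site(mirna):
    """Target-side sequence that pairs with miRNA nucleotides 2-8."""
    return revcomp_rna(mirna[1:8])


def candidate_direct_targets(mirna, deregulated_utrs):
    """Deregulated genes whose 3' UTR contains a seed match."""
    site = seed_site(mirna)
    return sorted(gene for gene, utr in deregulated_utrs.items() if site in utr)


# Toy data: the variant differs from the wild type at seed position 4.
wild_type = "UGAGGUAGUAGGUUGUAUAGUU"
variant = "UGACGUAGUAGGUUGUAUAGUU"
utrs = {
    "GENE_A": "AAAACUACCUCAAAA",
    "GENE_B": "GGGGCUACGUCGGGG",
    "GENE_C": "AUAUAUAUAUAUAU",
}
print(candidate_direct_targets(wild_type, utrs))
print(candidate_direct_targets(variant, utrs))
```

In this toy case a single seed substitution exchanges one candidate target for another, which is the kind of shift that profiling of wild type and variant miRNA expressing cells is meant to reveal.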

Different crosslinking and immunoprecipitation (CLIP) approaches were developed and applied for global miRNA–RNA target interaction identification by Argonaute precipitation, such as HITS-CLIP and PAR-CLIP (Ule et al., 2005; Chi et al., 2009; Hafner et al., 2010). In these methods, RNA and proteins are crosslinked in cells by UV irradiation (in the case of PAR-CLIP, after incorporation of photoactivatable nucleosides). After cell lysis and partial RNA digestion, Argonaute-RNA complexes are pulled down and purified stringently. Subsequently, coprecipitated RNA is sequenced to identify which miRNAs and which target RNA regions were bound to Argonaute. Because miRNA and target RNAs are not physically linked to each other during the procedure, *in silico* target prediction is required to assess the interactions between the identified RNA sequences. These approaches thus provide enrichment for actual target RNA sites bound to Argonaute and limit the target prediction to these sites and to the detected miRNAs.

A method that can identify direct miRNA-target interactions on a large scale is the crosslinking, ligation, and sequencing of hybrids (CLASH) technology (Kudla et al., 2011; Helwak et al., 2013). Similar to the CLIP techniques, cells are UV irradiated to mediate crosslinking, after which Argonaute-RNA complexes are pulled down and RNA is partially digested. Then a ligation step is performed to ligate miRNAs to their target RNAs. The RNA is sequenced to identify chimeric RNAs. The non-chimeric reads provide the same information as the CLIP experiments, while the chimeric RNAs contain linked miRNA and target RNA fragments and hence provide direct proof of miRNA-target interactions. This technology may thus be used to directly identify whether mature miRNA variants alter targeting capacity of a disease-associated miRNA.
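How a chimeric read carries both partners can be shown with a toy parser. The sketch below assumes, purely for simplicity, that the miRNA sits at the 5' end of the chimera and matches a known mature sequence exactly. Analysis of real CLASH data would need to handle both orientations, mismatches, and alignment of the target part to a transcriptome. The names `split_chimera` and `miR-toy` and the sequences are hypothetical.

```python
def split_chimera(read, mirnas, min_target_len=15):
    """Split a read into (miRNA name, target fragment).

    Returns None for non-chimeric reads, i.e. reads that do not start
    with a known mature miRNA or that carry fewer than min_target_len
    additional nucleotides after it.
    """
    for name, seq in mirnas.items():
        if read.startswith(seq) and len(read) - len(seq) >= min_target_len:
            return name, read[len(seq):]
    return None


known_mirnas = {"miR-toy": "UGAGGUAGUAGGUUGUAUAGUU"}
chimeric_read = known_mirnas["miR-toy"] + "AAAACUACCUCAAAAGGGCC"
plain_read = known_mirnas["miR-toy"]

print(split_chimera(chimeric_read, known_mirnas))
print(split_chimera(plain_read, known_mirnas))
```

Running the same split for reads derived from wild type and variant miRNAs would, under these assumptions, give two lists of target fragments that can be compared directly.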

Together, these approaches can be used to assess what the effect of the genetic variant is on the expression and the functionality of the miRNA.

# Concluding Remarks

In this review we summarized the current knowledge of the miRNA maturation process, focusing on the prerequisites of the miRNA transcript to be processed. From these requirements it is clear that the RNA structure, and thus also the underlying sequence determining the structure, are of paramount importance for maturation into mature miRNAs. Thus genetic variants in miRNA genes can have large effects, not only on the miRNA gene itself, but due to its role as a fine-tuner of gene expression and translation, also on its downstream targets. In line with this, different miRNA variants have been found to be involved or hypothesized to be involved in human disease. We also highlighted the biological relevance of genetic variants located within or near miRNA genes and provided an overview of *in silico* and experimental approaches to investigate the effect of these variants on miRNA expression and function. This in turn could then be correlated to changes in miRNA and target gene abundance in the affected tissue of patients, as a first step in the understanding of the role of the miRNA in the disease pathomechanism.

As we are now coming to realize that the miRNA repertoire is much more complex than previously appreciated with the recognition of the wide-spread presence of isomiRs and their functional relevance, the possibilities of intronic miRNAs being transcribed independently from their host genes and even alternative splicing, many more questions arise. Though many aspects of miRNA biogenesis and functioning have already been unraveled, more investigations will need to be done to be able to fully grasp the regulation and the genesis of the correct expression pattern of the full miRNA repertoire in the cell. We also anticipate that, especially for the investigation of genetic variants in disease, we will need to undertake comprehensive and integrative approaches to be able to fully appreciate how certain genetic variants can undermine this regulatory system and lead to disease, while others remain completely harmless.

# Author Contributions

All authors contributed to the conception of the work. SC wrote the manuscript. MS, PD, and JD provided input and revised the manuscript. All authors approved the final manuscript.

# Acknowledgments

This research was funded by Research Foundation – Flanders (G027410N to JD) and by the University Research Fund (UA-KP BOF/MS-2013 6264 to MS).




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Cammaerts, Strazisar, De Rijk and Del Favero. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Noncoding RNAs in human saliva as potential disease biomarkers

Xianzhi Lin1 †, Hsien-Chun Lo1 †, David T. W. Wong2, 3, 4 and Xinshu Xiao1, 2, 4 \*

*<sup>1</sup> Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA, USA, <sup>2</sup> Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA, USA, <sup>3</sup> School of Dentistry, University of California, Los Angeles, Los Angeles, CA, USA, <sup>4</sup> Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA, USA*

#### Keywords: RNA-Seq, miRNA, piRNA, circular RNA, extracellular RNA, biomarker, saliva

#### Edited by:

*Mohammadreza Hajjari, Shahid Chamran University of Ahvaz, Iran*

#### Reviewed by:

*Edward K. L. Chan, University of Florida, USA Daniel Wai Hung Ho, The University of Hong Kong, China Jun-An Chen, Academia Sinica, Taiwan Helen K. W. Law, The Hong Kong Polytechnic University, China*

#### \*Correspondence:

*Xinshu Xiao, gxxiao@ucla.edu* † *These authors have contributed equally to this work.*

#### Specialty section:

*This article was submitted to RNA, a section of the journal Frontiers in Genetics*

#### Received: *03 March 2015* Paper pending published: *27 March 2015* Accepted: *22 April 2015* Published: *07 May 2015*

#### Citation:

*Lin X, Lo H-C, Wong DTW and Xiao X (2015) Noncoding RNAs in human saliva as potential disease biomarkers. Front. Genet. 6:175. doi: 10.3389/fgene.2015.00175*

Human saliva emerged as a research material as early as the 17th century, when investigators sought to understand the basis for salivary secretion (Garrett, 1975). Over the centuries, the focus of salivary research has evolved greatly and a wide range of topics has been examined (Garrett, 1975; Schipper et al., 2007). It is now known that the functions of saliva include at least lubrication, digestion of food, remineralization, prevention of demineralization, protection against microbial and viral infection, speech facilitation, and maintenance of oral and general health (Schipper et al., 2007; Malathi et al., 2014).

One active research area related to human saliva is the discovery of biomarker molecules for a variety of diseases. Compared to other body fluids, saliva is easily accessible in a noninvasive manner. However, it is also immediately exposed to the outside environment and thus may be confounded by a wide variety of environmental factors. Nevertheless, previous studies have established that at least some human molecules in saliva are highly stable, and potential biomarkers have been examined for a number of oral and systemic diseases (Bonne and Wong, 2012; Schafer et al., 2014).

# Key Components of Human Saliva

Saliva consists of both cellular and fluid contents. Epithelial cells, leukocytes, and erythrocytes are the three major human cell types, which co-exist with bacterial cells in human whole saliva (Aps et al., 2002). The fluid content of saliva is primarily generated by the salivary glands, but with additional contributions from blood, oral tissue, bacteria, viruses, and food remnants (Schipper et al., 2007). It mainly consists of water, macromolecules (such as glycoproteins, enzymes), small organic molecules, inorganic components (e.g., electrolytes), and metabolites from oral bacteria (Almstahl and Wikstrom, 2003; Aps and Martens, 2005; Schipper et al., 2007).

Many biomarker studies focused on profiling and quantification of proteins or RNA molecules in saliva. Since the diseases investigated for salivary biomarkers are often systemic, it is of great interest to identify circulating protein or RNA molecules that may have originated from disease-relevant cells (such as tumor cells). Such molecules reside outside of the cells in saliva and are often captured in cell-free saliva (CFS), the fraction of saliva with cellular contents removed (often by centrifugation). Most of the salivary RNAs appear to be highly degraded compared to full-length mRNAs in cellular compartments, possibly due to the presence of RNA degradation enzymes in saliva and other body fluids (for circulating mRNAs) (Park et al., 2007). Notably, certain miRNA and mRNA molecules were shown to be highly stable, possibly owing to protection by exosomes or protein complexes (Park et al., 2006, 2009; Palanisamy and Wong, 2010; Palanisamy et al., 2010).

# Technologies for Salivary RNA Profiling

About a decade ago, microarrays were applied to characterize the global profile of mRNAs in saliva (Li et al., 2004; Park et al., 2007). These studies revealed that there were over one thousand distinct mRNA molecules in human CFS (Li et al., 2004). In addition to mRNAs derived from coding genes, many noncoding RNAs (ncRNAs) were also detected. Data from these studies demonstrate that there are hundreds of microRNAs (miRNAs) in human saliva, and most of them likely exist in exosomes (Michael et al., 2010; Gallo et al., 2012).

However, microarray techniques have inherent limitations, such as the dependence on gene annotation and cross-hybridization noise. In recent years, more powerful techniques based on next generation sequencing (NGS) revealed additional coding and ncRNA species in human saliva (Spielmann et al., 2012; Bahn et al., 2015). In contrast to the hybridization-based microarrays, RNA sequencing (RNA-Seq) offers single nucleotide information, high sensitivity and accuracy in transcript detection, and the capability to detect novel RNA species and transcript isoforms (Lee et al., 2011, 2013; Li et al., 2012). An increasing number of bioinformatic tools are emerging for analysis of RNA-Seq data, ranging from rapid short read aligners to detailed examination of RNA expression patterns (Oshlack et al., 2010). Owing to these improvements, the catalog of human genes, especially ncRNA genes, has been greatly expanded (refer to Sai Lakshmi and Agrawal, 2008; Kozomara and Griffiths-Jones, 2014; Xie et al., 2014 for ncRNA databases).

# ncRNA Molecules in Saliva

In 2012, the Wong group reported the first global characterization of the human salivary transcriptome using high-throughput RNA-Seq (Spielmann et al., 2012). This study demonstrated that saliva harbors a wide variety of RNA species. More than 4000 distinct RNA molecules derived from coding or noncoding human RNAs were identified, including a small number of miRNAs. This study established that the RNA content in saliva is very diverse, which should be fully explored in future biomarker studies.

Recently, another in-depth analysis of human salivary extracellular ncRNA revealed novel insights regarding its RNA content and provided a comparative view of salivary ncRNAs relative to those of other body fluids (Bahn et al., 2015). Using human CFS, this study confirmed previous findings that miRNAs are stably and abundantly present in saliva (Patel et al., 2011), often harbored within exosomes (Gallo et al., 2012). miRNA expression profiles of healthy individuals were quantified and compared. Highly concordant miRNA expression was observed across individuals. Furthermore, considerable similarity was observed between miRNA expression levels of saliva and other body fluids (blood and cerebrospinal fluid, CSF). Thus, these data suggest that salivary miRNAs could serve as candidate biomarkers, with at least equivalent promise to those derived from more invasive fluids.
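One simple way to put a number on such concordance is to correlate log-transformed expression profiles between two individuals. The sketch below is an illustration only: the text above does not state which statistic was used, and the miRNA names, read counts, and function names are invented.

```python
import math


def log_profile(counts, mirnas):
    """log2(count + 1) for each miRNA, in a fixed order."""
    return [math.log2(counts.get(m, 0) + 1) for m in mirnas]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical read counts for four miRNAs in two healthy individuals.
mirnas = ["miR-a", "miR-b", "miR-c", "miR-d"]
individual_1 = {"miR-a": 1023, "miR-b": 255, "miR-c": 15, "miR-d": 0}
individual_2 = {"miR-a": 2047, "miR-b": 511, "miR-c": 31, "miR-d": 1}

r = pearson(log_profile(individual_1, mirnas), log_profile(individual_2, mirnas))
print(round(r, 3))
```

The same calculation applied to profiles from two different fluids, for example saliva and blood, would give a comparable measure of cross-fluid similarity.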

A surprising observation from this study was the relative abundance of human piwi-interacting RNAs (piRNAs) in saliva. piRNAs are small ncRNAs typically ∼26–32 nt in length observed in germ cells of both vertebrates and invertebrates (Aravin et al., 2006; Girard et al., 2006; Grivna et al., 2006; Lau et al., 2006; Watanabe et al., 2006; Das et al., 2008). piRNAs are known to target transposons and repress their mobility (Das et al., 2008; Malone and Hannon, 2009). The number of abundant piRNAs is less than that of miRNAs in saliva, despite the large number of annotated piRNAs in various databases (Sai Lakshmi and Agrawal, 2008; Bahn et al., 2015). Nevertheless, piRNA expression levels were highly concordant between healthy individuals, as were miRNA levels. However, in contrast to the consistent expression profile of miRNAs across body fluids, piRNAs were highly exclusive to saliva, with very low abundance in blood or CSF. These observations indicate that salivary piRNAs may have originated from cells in the oral mucosa or salivary glands, rather than circulating from systemic organs via blood. Nevertheless, salivary piRNAs may have systemic functional impact, which needs to be further investigated.

Another novel finding in this study was the discovery of circular RNAs (circRNAs) in CFS, which is the first report of the presence of circRNAs in an extracellular fluid (Bahn et al., 2015). CircRNAs were originally identified in RNA viruses (Sanger et al., 1976; Kos et al., 1986). Later, intracellular circRNAs generated from specific exons of coding genes were reported (Nigro et al., 1991; Cocquerelle et al., 1992; Capel et al., 1993). Recent studies demonstrated that circRNAs exist in many different cell types and species (Salzman et al., 2012, 2013; Jeck et al., 2013; Memczak et al., 2013). Some circRNAs are likely noncoding (Capel et al., 1993; Memczak et al., 2013; Guo et al., 2014), but others may code for proteins (Wang and Wang, 2015). The function of most circRNAs remains unknown. Two circRNAs were shown to function as miRNA sponges (Hansen et al., 2013; Memczak et al., 2013). However, this function may not apply to the majority of other circRNAs as they lack bioinformatic evidence of significant miRNA complementarity (Guo et al., 2014). The discovery of circRNAs in CFS indicates that this type of molecule may have extracellular function and should be considered as a type of candidate biomarker (Li et al., 2015).

Although mRNAs are highly degraded in saliva and other body fluids, small ncRNAs are often stable with reproducible expression across individuals. Indeed, miRNAs have been extensively studied in blood and other body fluids as potential disease biomarkers (Chen et al., 2008; Gilad et al., 2008; Mitchell et al., 2008; Wang et al., 2009, 2010; Fichtlscherer et al., 2010; Li et al., 2010; Liu et al., 2011). The similarity between miRNA profiles of saliva and other body fluids (Weber et al., 2010; Bahn et al., 2015) strongly supports the potential of using miRNAs (and possibly other ncRNAs) from human CFS as biomarkers for various human diseases.

# Salivary ncRNAs as Potential Biomarkers for Diseases

Although at an early stage, salivary ncRNA studies have revealed potential disease biomarkers. Thus far, most studies focused on miRNA expression in saliva. **Table 1** summarizes a number of studies where miRNAs were assessed as putative biomarkers for oral squamous cell carcinoma (OSCC) (Park et al., 2009), parotid gland tumors (Matse et al., 2013), and esophageal cancer (Xie et al., 2013). In addition to oral and esophageal diseases, salivary ncRNAs were also examined as potential biomarkers for systemic diseases. In a clinical study focusing on Sjögren's Syndrome, a chronic autoimmune disease, the authors observed different miRNA expression patterns in minor salivary glands of Sjögren's Syndrome patients compared to healthy individuals (Alevizos et al., 2011). The disease group could be clearly distinguished from the normal group on the basis of the miRNA expression profile, using principal components and hierarchical clustering analyses. A very recent study focused on pancreatic cancer, using samples of patients with pancreatic cancer, benign pancreatic tumor or healthy controls (Xie et al., 2015). The authors observed significant down-regulation of miR-3679-5p and up-regulation of miR-940 in the cancer group compared to the other groups (**Table 1**), suggesting salivary miRNA may potentially be used for early detection of pancreatic cancer.

*<sup>a</sup>Validated by qRT-PCR.*

*<sup>b</sup>Area under Receiver Operating Characteristic (ROC) curve.*

*<sup>c</sup>Change in disease relative to control. The number in the parentheses represents the fold change.*

*<sup>d</sup>A combination of four miRNAs exhibits discriminating power.*
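The area under the ROC curve referred to in the notes to Table 1 can be computed directly from expression values as the probability that a randomly chosen patient scores higher than a randomly chosen control, with ties counting one half. The following is a minimal sketch on invented expression values. It is not the procedure of any of the cited studies.

```python
def roc_auc(cases, controls):
    """AUC as the fraction of case/control pairs ranked correctly."""
    wins = 0.0
    for case in cases:
        for control in controls:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical normalized expression of one up-regulated miRNA.
patients = [5.1, 6.3, 4.8, 7.0]
healthy = [3.9, 4.9, 4.1, 5.0]
auc = roc_auc(patients, healthy)
print(auc)  # 0.875
```

For a down-regulated marker the same function returns a value below 0.5. Swapping the two groups, or negating the expression values, puts it on the usual scale.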

In addition to human ncRNAs, exogenous ncRNAs in saliva may also serve as potential disease biomarkers. The human-associated microbial communities have a profound impact on the individual's physiological outcome (Human Microbiome Project Consortium, 2012). In human saliva, over 1500 bacteria have been identified and completely sequenced [Human Oral Microbiome Database; http://www.homd.org/]. Some studies have shown that saliva can be used to detect microbial infection (Schafer et al., 2014). In addition, both DNA and RNA viruses were detected in human saliva from virally infected hosts (Liou et al., 1992; Chen et al., 1997; Vieira et al., 1997; Shugars et al., 2001; Hermida et al., 2002; Mackiewicz et al., 2004; Goncalves et al., 2005; Balamane et al., 2010; Pride et al., 2012), and thus could serve as biomarkers of viral infection. Thus far, little is known regarding the landscape and function of exogenous ncRNAs in saliva.

# Future Challenges and Perspectives

A comprehensive ncRNA expression profile is emerging for human saliva including the presence of miRNAs, piRNAs, and circular RNAs (Ogawa et al., 2013; Bahn et al., 2015). More RNA species may be discovered in the future given the rapid evolution of new technologies and powerful bioinformatic methods. The value of saliva as a body fluid for biomarker discovery is just becoming widely recognized. However, there are a number of challenges in this field, most of which are general to usage of any body fluid in biomarker discoveries. One challenge lies in the unbiased isolation of short and long RNA molecules from saliva samples. Although this topic is under intensive investigation, improved methods that can retain most RNA species unbiasedly in an operator-independent manner are highly desired.

Another challenge is accurate quantification of ncRNA abundance, which is key to biomarker assessment. RNA yield from different samples may vary greatly, which calls for effective experimental and bioinformatic methods for normalization of RNA expression. Most RNA-Seq studies discussed above calculated RNA expression levels by normalizing the number of reads of a particular RNA molecule against its length and the total number of mapped reads (i.e., the RPKM measure; Mortazavi et al., 2008). However, to estimate the absolute concentration of an RNA molecule in a sample, synthetic spike-in RNAs with known concentration should be added to the RNA sample before library generation. This approach necessitates accurate measurement of RNA concentration of the sample and synthesis of a large number of spike-in RNAs with varying sequence contents and concentrations (see Williams et al., 2013 for a demonstration of this approach). This challenging approach, though highly desirable and necessary for clinical usage of a biomarker, has not been widely adopted.
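The two quantification strategies can be contrasted in a few lines. RPKM is the number of reads per kilobase of transcript per million mapped reads. The spike-in step shown here is only one simple possibility, a least-squares line through the origin relating known spike-in amounts to their observed RPKM values. All numbers and function names are illustrative assumptions.

```python
def rpkm(reads, length_nt, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return reads * 1e9 / (length_nt * total_mapped_reads)


def spike_in_slope(spike_ins):
    """Least-squares slope through the origin.

    spike_ins is a list of (known_amount, observed_rpkm) pairs; the
    slope is the RPKM observed per unit of input amount.
    """
    numerator = sum(amount * value for amount, value in spike_ins)
    denominator = sum(amount * amount for amount, _ in spike_ins)
    return numerator / denominator


# Relative measure: 500 reads on a 2000-nt RNA, 10 million mapped reads.
target_rpkm = rpkm(500, 2000, 10_000_000)

# Absolute estimate: calibrate against hypothetical spike-ins
# (known amount in arbitrary units, observed RPKM).
slope = spike_in_slope([(1.0, 10.0), (2.0, 20.0), (4.0, 40.0)])
estimated_amount = target_rpkm / slope
print(target_rpkm, estimated_amount)  # 25.0 2.5
```

RPKM alone only ranks RNAs within and between libraries, whereas the calibration step converts that relative value into an amount in the units of the spike-ins.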

A third major challenge is a better understanding of the biogenesis pathways of human ncRNAs in saliva, which constitutes the basis to assess whether and to what extent ncRNA expression can reflect a person's health condition. Salivary RNAs could be derived from systemic organs or local cells of the oral cavity. Cellular origins of candidate biomarkers for various diseases should be further examined to substantiate our understanding of the validity of the biomarkers. Indeed, the presence, origin, and functional roles of disease biomarkers are all essential questions general to studies of different types of biomarkers and diseases. A valid disease biomarker should be directly involved in disease mechanisms or indirectly associated/correlated with key pathways driving the pathogenesis of disease. The ultimate question is how knowledge gained in biomarker studies could be utilized to develop effective strategies for disease prevention and treatment, which closely relies on a clear understanding of disease mechanisms.

# Acknowledgments

This work was supported by NIH grants R01HG006264 (to XX) and UH2 TR000923 (DTWW).


**Conflict of Interest Statement:** David T. W. Wong is co-founder of RNAmeTRIX Inc., a molecular diagnostic company. He holds equity in RNAmeTRIX, and serves as a company Director and Scientific Advisor. The University of California also holds equity in RNAmeTRIX. Intellectual property that David Wong invented and which was patented by the University of California has been licensed to RNAmeTRIX. Additionally, he is a consultant to PeriRx. The other authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Lin, Lo, Wong and Xiao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Phage display and targeting peptides: surface functionalization of nanocarriers for delivery of small non-coding RNAs

Babak Bakhshinejad\*

Department of Genetics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran

Keywords: phage display, peptide library, targeting, delivery, nanocarrier, small non-coding RNA

#### Edited by:

Mohammadreza Hajjari, Shahid Chamran University of Ahvaz, Iran

#### Reviewed by:

Tohru Yoshihisa, University of Hyogo, Japan Venugopal Thayanithy, University of Minnesota, USA João Conde, Massachusetts Institute of Technology, USA

#### \*Correspondence:

Babak Bakhshinejad, babak.bakhshinejad@modares.ac.ir; babak\_bakhshinejad@yahoo.com

#### Specialty section:

This article was submitted to RNA, a section of the journal Frontiers in Genetics

Received: 25 February 2015 Accepted: 26 April 2015 Published: 12 May 2015

#### Citation:

Bakhshinejad B (2015) Phage display and targeting peptides: surface functionalization of nanocarriers for delivery of small non-coding RNAs. Front. Genet. 6:178. doi: 10.3389/fgene.2015.00178

# Introduction

Small non-coding RNAs are a clinically relevant category of non-coding RNAs (ncRNAs) that have gained growing attention for their therapeutic value. miRNA and siRNA are among the most important and best-documented types of small ncRNAs. These regulatory RNA molecules are short (approximately 22 nucleotides long) and share a similar mechanism of generation, in which they are excised from longer double-stranded RNA precursor molecules. Both miRNA and siRNA target protein-coding genes via an antisense-based strategy and follow the same processing route to achieve gene silencing (Chen et al., 2015). Functionally, small ncRNAs play a determining part in the modulation of cellular gene expression. The role of these molecules as the controlling machinery of cells suggests a promising capacity for their translation into the clinic. In this context, tuning the regulatory components (such as ncRNAs) rather than the regulated components (protein-coding genes) has been suggested to be a more convenient and more effective approach to correct many cellular dysfunctions (Taft et al., 2010). This emerging concept reflects the potential of small ncRNAs, as a significant part of the regulatory apparatus of cells, to treat human diseases.

# Small ncRNA Delivery: Active Targeting and the Dream of Magic Bullet

In recent years, nanotechnology-based approaches have emerged as potentially powerful and efficient systems for cellular delivery of ncRNAs. The exploitation of nanocarriers shows promise to bypass a variety of biological barriers for systemically administered RNA therapeutics (Miele et al., 2012; Segovia et al., 2015). Nanoparticles protect RNA-based therapeutic molecules from serum nucleases, decrease their sequestration by phagocytes of the reticuloendothelial system (RES), and are able to carry both hydrophilic and hydrophobic substances (Aagaard and Rossi, 2007; Peer et al., 2007). These characteristics of nanocarrier platforms aid in enhancing overall bioavailability, extending in vivo stability, and improving the cellular delivery of RNA therapeutics such as small ncRNAs.

For the successful delivery of small ncRNAs, one of the most important problems that needs to be solved is the targeting of these drugs to the desired sites. The delivery of small ncRNAs to the diseased cells, tissues or organs not only raises their silencing potency, but also leads to a considerable reduction of side effects by avoiding normal cells (Daka and Peer, 2012). Active targeting plays a critical part in achieving efficient in vivo biodistribution and optimized systemic delivery of ncRNA-based therapeutics. Active targeting becomes possible when the surface of nanocarriers loaded with small ncRNAs is modified through the covalent attachment of a cell/tissue-specific ligand. These surface-conjugated ligands will specifically interact with receptors that are expressed on the surface of target cells (Bertrand et al., 2014). In this manner, the delivery of therapeutic small ncRNA molecules will be poised to realize the dream of constructing a magic bullet, the concept introduced by Paul Ehrlich to describe targeted transportation of therapeutic agents to the desired sites of the body. Peptides are one of the most interesting categories of targeting ligands and have recently received considerable attention. Due to some unique properties, peptides have become favorable targeting ligands for formulating powerful and sophisticated drug delivery platforms.

# Phage Display Libraries: A Treasure of Cell-targeting Peptides

Phage display, defined as the ability to genetically modify bacteriophages so that specific amino acid sequences are introduced onto their surface, is a valuable tool in biomedicine. The birth of phage display, as a kind of virus-based molecular chimera, dates back three decades, when Smith first reported the expression of a foreign polypeptide on the surface of phage particles (Smith, 1985). The power of phage display as a strategy for gene/drug delivery studies comes from two distinctive features: (i) establishing a physical connection between the phenotype (the displayed ligand) and the genotype (the DNA sequence encoding the displayed ligand) within the same viral particle and (ii) producing very large libraries of ligands displayed on the surface of the phage particles. The most widely used phage display libraries are random peptide libraries (RPL). In simple terms, a random peptide library is a collection of phages carrying extremely diverse peptides on their surface.

Due to the phenotype-genotype link, phage peptide libraries can be screened against any target of interest for selecting target-specific binders. This screening strategy resembles natural selection, with the distinctive difference that it is performed in the test tube instead of in nature (Bakhshinejad and Sadeghizadeh, 2014). Affinity selection is a hallmark of phage display screening through which binding peptides specific to any target, including organic and inorganic structures, can be obtained. The procedure of affinity-based library screening is called "panning." Whole cells in culture are one of the most important targets used in panning studies. Cell panning involves incubating the phage library with target cells. Some phages interact with and bind to the target with higher affinity. The unbound or weakly bound phages are then washed away and the bound phages are recovered. Subsequently, the recovered phages are amplified through infecting host bacterial cells. This process is repeated for several rounds to ultimately isolate the most strongly binding phages, whose surface-displayed peptides are identified by DNA sequencing (Bakhshinejad et al., 2014).
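Why repeated rounds enrich the strongest binders can be seen in a toy model of panning. In the sketch below, each clone survives the wash with a fixed probability and amplification restores the library size without changing proportions. These binding probabilities, the clone names, and the deterministic treatment are all simplifying assumptions, not measured values.

```python
def pan(frequencies, bind_prob, rounds=3):
    """Track clone frequencies over successive panning rounds.

    frequencies: starting fraction of each clone in the library.
    bind_prob: hypothetical probability that a clone stays bound
    to the target cells through the wash step.
    """
    history = [dict(frequencies)]
    current = dict(frequencies)
    for _ in range(rounds):
        # Binding and washing: only the bound fraction of each clone remains.
        bound = {clone: freq * bind_prob[clone] for clone, freq in current.items()}
        total = sum(bound.values())
        # Recovery and amplification: proportions among bound phages are kept.
        current = {clone: amount / total for clone, amount in bound.items()}
        history.append(current)
    return history


library = {"pepA": 0.01, "pepB": 0.49, "pepC": 0.50}
affinity = {"pepA": 0.5, "pepB": 0.05, "pepC": 0.01}
for round_number, freqs in enumerate(pan(library, affinity)):
    print(round_number, {clone: round(f, 3) for clone, f in freqs.items()})
```

In this model a clone that starts as a small minority of the library comes to dominate it within a few rounds, which is why sequencing is performed only after several cycles.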

# Phage Library Selected Peptides and Targeted Delivery of miRNA/siRNA-based Nanocarriers

Currently, RPLs are known to be one of the major sources of clinically relevant peptides. The marriage of combinatorial peptide chemistry and phage display has turned phage peptide libraries into a treasure for the discovery of useful targeting peptides. As we are beginning to more deeply understand the biological functions of ncRNAs in cellular processes, and thereby their vital roles in the pathogenesis of numerous disorders, phage peptide libraries are attracting increasing attention for the delivery of small ncRNAs. Cell-specific peptides derived from phage libraries have been demonstrated to be capable of acting as important targeting components in nanocarriers and delivering both miRNA and siRNA into target cells. A variety of nanocarriers have been used for the delivery of small ncRNAs (Conde et al., 2015). Liposomes, polymers, micelles, protamine, and virus-like particles (VLPs) are delivery platforms that in different studies have shown potential to achieve delivery of miRNA and siRNA to target cells/tissues. **Table 1** lists studies in which peptides selected by phage display screening have been used for targeted delivery of miRNA/siRNA-containing nanocarriers to desired cells.

Cancer is a disease that has been the focus of much investigation for the delivery of small ncRNA-based therapeutics. Peptides obtained from screening of phage libraries against various tumor cells have served to target nanocarriers with encapsulated ncRNAs to the cells of interest. In line with this, several attempts have been made to identify phage-borne peptides specific to different tumors. A number of these peptides have shown favorable outcomes for the delivery of peptide-targeted miRNA/siRNA-containing nanocarriers to malignant cells. DS 4-3, breast cancer-specific peptides, conjugated to branched polyethylenimine (bPEI) have been shown to deliver the tumor suppressor miR-145 to metastatic breast cancer cells, thereby inhibiting tumor cell growth and suppressing cell invasion (Lee et al., 2013). In a novel strategy for intracellular delivery of siRNA into cancer cells, a research group led by Petrenko targeted liposomes by using preselected intact phage proteins (Bedi et al., 2011). In this work, not only the cancer-avid peptide but the whole fusion phage coat protein in combination with its displayed peptide was used as a targeting ligand. This tumor-specific complex (phage fusion protein and peptide) was isolated by affinity screening of a phage library against MCF-7 breast cancer cells and then inserted into the lipid bilayer of the liposome. The result was a siRNA-encapsulated, phage protein-targeted liposome that specifically down-regulated the expression of the PRDM14 gene, which has important roles in the carcinogenesis of breast cancer, and inhibited the synthesis of its protein. This strategy avoids the need to purify targeting peptides and subsequently conjugate them to nanocarriers.

Phage display selected peptides have also been exploited for targeted delivery of small ncRNAs involved in disorders other than cancer. Cardiovascular diseases, one of the leading causes of death worldwide, are an obvious area in which phage display technology can benefit the development of targeted therapies. Since cardiomyocyte apoptosis is an important player in several heart diseases, inhibiting this type of cell death has been viewed as a useful strategy for the potential treatment of pathological conditions of the cardiovascular system. This can be achieved by abolishing the activity of factors that stimulate apoptosis. A primary cardiomyocyte (PCM)-specific peptide selected by phage display was conjugated to a polymer-based nanocarrier and used to deliver siRNA specific to Fas (an apoptosis inducer) to cardiomyocytes under hypoxic conditions (Nam et al., 2010). This PCM-targeted siRNA delivery system prevented apoptosis by down-regulating the Fas gene. Sometimes the inhibitors of ncRNAs, not the ncRNAs themselves, have to be delivered to target cells. miRNA inhibitors are a prime example of a miRNA-based approach that can be used for therapeutic purposes. As dysregulated miRNAs are associated with some disorders, the ability to inhibit the cellular activity of these miRNAs can provide new opportunities to treat human diseases. In this context, a micelle-based nanocarrier targeted with a phage display selected peptide specific to vascular endothelial cells was shown to successfully deliver a miR-92a inhibitor to endothelial cells, a strategy that is helpful for cell-specific delivery and targeted inhibition of miRNAs that contribute to the pathology of atherosclerosis (Kuo et al., 2014).

TABLE 1 | Studies in which peptides obtained through screening of phage display libraries have been used to functionalize the surface of nanocarriers for targeted delivery of miRNA and siRNA.

# Conclusion

The capacity of small ncRNAs to specifically silence the expression of disease-associated genes underlines their importance as novel forms of therapeutic intervention in numerous human disorders. Nanocarriers, owing to their tunable features, have considerable potential to overcome the delivery problems of miRNA/siRNA-based pharmaceutical platforms. Recent findings highlight the significance of phage display methodology in providing a wide variety of peptides for targeting nanocarriers to desired cells. Although the approach is still in its infancy, phage peptide libraries are receiving growing attention for targeted cellular delivery of miRNA/siRNA therapeutic cargoes. Active targeting of nanocarriers through surface modification with phage selected peptides will provide new opportunities to exploit the potential of small ncRNAs in the clinical setting. The quickly advancing role of phage peptide libraries in the targeted delivery of RNA therapeutics is rooted in the great diversity of these libraries and the convenience of screening them against a wide variety of cellular targets that cover numerous human diseases. Cell-targeting peptides derived from phage display libraries have raised hopes of achieving the goal of delivering miRNAs and siRNAs specifically into the cells that need to take up these therapeutic agents.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Bakhshinejad. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The role of long non-coding RNAs in genome formatting and expression

*Pierre-Olivier Angrand, Constance Vennin, Xuefen Le Bourhis and Eric Adriaenssens\**

*Cell Plasticity and Cancer – Inserm U908, University of Lille, Lille, France*

Long non-coding RNAs (lncRNAs) are transcripts without protein-coding potential that nevertheless play a pivotal role in numerous biological functions. They act as regulators at different levels of gene expression, including chromatin organization, transcriptional regulation, and post-transcriptional control. Misregulation of lncRNA expression has been found to be associated with cancer and other human disorders. Here, we review the different types of lncRNAs and their mechanisms of action on genome formatting and expression, and we emphasize the multifaceted action of the H19 lncRNA.

#### *Edited by:*

*Mohammadreza Hajjari, Shahid Chamran University of Ahvaz, Iran*

#### *Reviewed by:*

*Venugopal Thayanithy, University of Minnesota, USA; Jan-Wilhelm Kornfeld, Max-Planck Institute for Neurological Research, Germany; Maryam Tahmasebi Birgani, Ahvaz Jundishapur University of Medical Sciences, Iran*

#### *\*Correspondence:*

*Eric Adriaenssens, Cell Plasticity and Cancer – Inserm U908, University of Lille, Bâtiment SN3, Cité Scientifique, F-59655, Villeneuve d'Ascq, Lille, France eric.adriaenssens@univ-lille1.fr*

#### *Specialty section:*

*This article was submitted to RNA, a section of the journal Frontiers in Genetics*

*Received: 28 February 2015 Paper pending published: 16 March 2015 Accepted: 12 April 2015 Published: 29 April 2015*

#### *Citation:*

*Angrand P-O, Vennin C, Le Bourhis X and Adriaenssens E (2015) The role of long non-coding RNAs in genome formatting and expression. Front. Genet. 6:165. doi: 10.3389/fgene.2015.00165*

#### Keywords: lncRNAs, H19, chromatin organization, transcriptional regulation, post-transcriptional control

The advent of DNA tiling arrays and deep sequencing technologies has revealed that a much larger part of the genome is transcribed into RNAs than previously assumed. It is estimated that up to 70% of the genome is transcribed, but only 2% of the human genome codes for proteins (Bertone et al., 2004; Birney et al., 2007; Kapranov et al., 2007; ENCODE Project Consortium, 2012). RNAs without coding potential are collectively referred to as non-coding RNAs (ncRNAs).

Non-coding RNAs include the well-known ribosomal (r) RNAs, ribozymes, transfer (t) RNAs, small nuclear (sn) RNAs, telomere-associated RNAs (TERRA, TERC), as well as a plethora of far less characterized RNAs. Based on their size, these ncRNAs are subdivided into two groups: small ncRNAs (<200 nt) and long ncRNAs (lncRNAs, >200 nt). Small ncRNAs, such as microRNAs (miRs), small interfering RNAs (siRNAs), or PIWI-interacting RNAs (piRNAs), have received much attention and were shown to act mainly as negative regulators of gene expression. In contrast, lncRNAs represent a more functionally diverse class of transcripts. LncRNAs are found in a large diversity of animal species (Guttman et al., 2009; Jia et al., 2010; Pauli et al., 2012), but also in plants (Swiezewski et al., 2009), yeast (Houseley et al., 2008), and even in prokaryotes (Bernstein et al., 1993) and viruses (Reeves et al., 2007). LncRNAs remain poorly conserved among species (Pang et al., 2006; Derrien et al., 2012). However, accumulating evidence indicates that this RNA class plays an important role in a variety of biological processes and may be involved in cancer and other human diseases (Wapinski and Chang, 2011; Tano and Akimitsu, 2012).

The majority of lncRNAs are 5′-capped, 3′-polyadenylated, and multi-exonic, and are subject to transcriptional regulation in the same way as coding mRNAs (Carninci et al., 2005; Guttman et al., 2010; Cabili et al., 2011; Derrien et al., 2012). Some lncRNAs, such as XIST, MALAT1, or NEAT1, are almost exclusively localized in the nucleus (Brown et al., 1992; Hutchinson et al., 2007), whereas others are mostly found in the cytoplasm (Coccia et al., 1992; Yoon et al., 2012). In terms of genomic organization, lncRNAs can be classified according to their proximity to protein-coding genes into five categories: sense, when overlapping one or more exons of another transcript on the same strand; antisense, when overlapping one or more exons of another transcript on the opposite strand; bidirectional, when its expression and the expression of the neighboring coding transcript on the opposite strand are initiated in close proximity; intronic, when arising from an intron of another transcript; or intergenic, when produced from an independent transcription unit in the interval between two protein-coding genes. This crude classification illustrates that lncRNA expression may be controlled by different molecular mechanisms, but it accounts neither for their modes of action nor for their cellular functions.
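The five positional categories can be expressed as a simple decision procedure. The sketch below is only an illustration of the classification described above, not a published annotation pipeline: the `Transcript` structure, the half-open exon coordinates, and the 1,000 bp `bidir_window` used to decide what counts as "close proximity" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transcript:
    """Hypothetical minimal transcript model (not a real annotation format)."""
    chrom: str
    strand: str                      # "+" or "-"
    exons: List[Tuple[int, int]]     # sorted, half-open [start, end) intervals

    @property
    def start(self) -> int:
        return self.exons[0][0]

    @property
    def end(self) -> int:
        return self.exons[-1][1]

    @property
    def tss(self) -> int:
        # Approximate transcription start site: left end on "+", right end on "-".
        return self.start if self.strand == "+" else self.end


def overlaps(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def classify(lnc: Transcript, gene: Transcript, bidir_window: int = 1000) -> str:
    """Classify a lncRNA relative to one protein-coding gene.

    bidir_window is an assumed cutoff for "initiated in close proximity".
    """
    if lnc.chrom != gene.chrom:
        return "intergenic"
    # Exon overlap decides between sense and antisense.
    if any(overlaps(le, ge) for le in lnc.exons for ge in gene.exons):
        return "sense" if lnc.strand == gene.strand else "antisense"
    # No exon overlap but fully inside the gene span: arises from an intron.
    if gene.start <= lnc.start and lnc.end <= gene.end:
        return "intronic"
    # Opposite strand with nearby start sites: bidirectional.
    if lnc.strand != gene.strand and abs(lnc.tss - gene.tss) <= bidir_window:
        return "bidirectional"
    return "intergenic"


gene = Transcript("chr1", "+", [(1000, 1200), (2000, 2300)])
print(classify(Transcript("chr1", "-", [(1100, 1500)]), gene))
```

A real annotation would compare each lncRNA against every neighboring gene and would need a rule for transcripts that satisfy several categories at once; the fixed order of the checks here is one arbitrary way to break such ties.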

While only a limited number of lncRNAs have been studied, considerable evidence indicates that lncRNAs interact with a plethora of proteins. Furthermore, Watson–Crick base pairing between complementary sequences provides an efficient way by which lncRNAs may selectively interact with other nucleic acid species. It is believed that lncRNAs are involved in a diversity of cellular functions through the regulation of gene expression at different levels, including chromatin organization, transcriptional regulation, and post-transcriptional mRNA processing (Mercer et al., 2009; Wilusz et al., 2009).

To complicate matters further, Anderson et al. (2015) recently described that a conserved micropeptide is encoded by a skeletal muscle-specific RNA previously annotated as a putative long non-coding RNA. This finding leads to the proposal that several lncRNAs could also have a biological function through the production of micropeptides.

# LncRNAs in the Control of mRNA Processing

The ability of lncRNAs to recognize complementary sequences allows the regulation of mRNA processing at various steps, including degradation, splicing, translation, or transport (**Figure 1**).

FIGURE 1 | […] transcript induces stabilization of the target mRNA and increases protein abundance. (B) mRNA degradation. Staufen double-stranded RNA-binding protein 1 (STAU1)-mediated mRNA decay is induced when base pairing is formed between the mRNA and a lncRNA. (C) Ribosome targeting. Through homologous base pairing with mRNAs and interactions with ribosomal proteins […] (D) […] formation and maintenance of nuclear structures involved in alternative splicing of nascent transcripts. (E) miR sponge. By sequestering miRs through base pairing, lncRNAs affect the expression of the miR target genes. (F) Precursor of miRs. LncRNAs can serve as a source of miRs after processing. LncRNAs are shown in red, whereas mRNAs are in blue. See text for examples.

Base pairing between defined regions of the human β-site APP-cleaving enzyme 1 (BACE1) transcript and its antisense lncRNA BACE1-AS stabilizes the mRNA and consequently increases BACE1 protein abundance (Faghihi et al., 2008). Similarly, the lncRNA TINCR (terminal differentiation-induced ncRNA) interacts with a range of differentiation mRNAs, including FLG, LOR, ALOXE3, ALOX12B, ABCA12, CASP14, and ELOVL3, to increase their stability (Kretz et al., 2013). In contrast, the recognition of mRNAs by other lncRNAs, such as half-STAU1-binding site RNAs (1/2-sbsRNAs), decreases target mRNA stability by inducing STAU1 recruitment and the STAU1-mediated mRNA decay pathway (Gong and Maquat, 2011).

The translational process may also be modulated positively or negatively by lncRNA–mRNA pairing. For example, the antisense lncRNA UCHL1-AS1 (ubiquitin carboxy-terminal hydrolase L1 antisense RNA 1) enhances UCHL1 mRNA translation (Carrieri et al., 2012), whereas lincRNA-p21 and pseudo-NOS suppress target mRNA translation (Korneev et al., 1999; Yoon et al., 2012).

The lncRNA MALAT1 (metastasis associated lung adenocarcinoma transcript 1) regulates pre-mRNA alternative splicing by modulating the levels of active serine/arginine splicing factors (Tripathi et al., 2010). In this case, mRNA processing is not modulated by a lncRNA–mRNA pairing mechanism but rather by MALAT1-mediated changes in the distribution of various splicing factors in nuclear speckle domains. However, antisense transcripts may also affect alternative splicing of their sense transcripts by masking splice sites through base complementarity (Krystal et al., 1990; Khochbin et al., 1992; Beltran et al., 2008). For example, a specific isoform of the lncRNA NPPA-AS is capable of down-regulating the intron-retained NPPA (natriuretic peptide precursor A) mRNA variant through RNA duplex formation between the sense and antisense transcripts (Annilo et al., 2009).

# LncRNAs and the Connection with the MicroRNA World

Some lncRNAs act on post-transcriptional regulation through the modulation of the microRNA (miR) pathways. MiRs, a large class of small ncRNAs, function by annealing to complementary sites in the coding sequences or 3′-untranslated regions (UTRs) of target mRNAs, where they favor the recruitment of protein factors that impair translation and/or promote transcript degradation, leading to a decrease in protein abundance (Baek et al., 2008; Bartel, 2009). Specifically, one mechanism by which the BACE1-AS lncRNA enhances BACE1 sense mRNA stability could be by masking the binding site for miR-485-5p (Faghihi et al., 2010). Rather than competing for miR-binding sites, a number of lncRNAs contain miR-binding sites in their own sequence and therefore act as "sponges" that sequester miRs away from their mRNA targets. The pseudogene PTENP1, previously considered biologically inactive, was found to sequester miRs, consequently affecting their action on target gene regulation (Poliseno et al., 2010). In particular, the 3′-UTR of the PTENP1 lncRNA binds the same set of miRs that target the tumor suppressor gene PTEN, thereby reducing the downregulation of this transcript and enhancing PTEN protein abundance. A number of other lncRNAs, including KRASP1, linc-MD1, HULC, and linc-ROR, were shown to control mRNA activity through a miR sponge mechanism (Poliseno et al., 2010; Wang et al., 2010, 2013; Cesana et al., 2011). These examples illustrate that lncRNAs can counteract miR actions, but lncRNAs can themselves give rise to miRs and thus favor post-transcriptional control by miR pathways, as is the case for the mouse Dlk1–Dio3 cluster or the BIC lncRNA (Eis et al., 2005; Hagan et al., 2009). Within the Dlk1–Dio3 cluster, *Meg3/Gtl2* contains in its last intron the evolutionarily conserved microRNA miR-770, whereas *Meg8* transcripts carry the intron-encoded miR-341, miR-1188, and miR-370. Similarly, miR-155 is processed from sequences present in the BIC lncRNA, which accumulates in lymphoma cells.
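The sponge idea rests on a lncRNA and an mRNA carrying sites complementary to the same miR. A minimal way to make this concrete is to scan sequences for perfect matches to the miR seed. The sketch below assumes the common simplification that the seed is nucleotides 2 to 8 of the miR and that only perfect seed complementarity counts; the miR and transcript sequences are short illustrative strings rather than annotated transcripts, and the function names are hypothetical.

```python
def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (A, C, G, U only)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(rna))


def seed_sites(mirna: str, transcript: str) -> list:
    """Return 0-based positions of perfect seed matches in a transcript.

    Assumption: the seed is nucleotides 2-8 of the miR (mirna[1:8]).
    """
    site = reverse_complement(mirna[1:8])
    width = len(site)
    return [i for i in range(len(transcript) - width + 1)
            if transcript[i:i + width] == site]


# Illustrative sequences only.
example_mir = "UGAGGUAGUAGGUUGUAUAGUU"
toy_mrna = "GGGAAACUACCUCUUU"            # one seed match
toy_sponge = "AAACUACCUCGGGCUACCUCAAA"   # two seed matches

print(seed_sites(example_mir, toy_mrna))
print(seed_sites(example_mir, toy_sponge))
```

Under these assumptions, a transcript with more seed matches than the mRNA it competes with is a candidate sponge; actual target prediction also weighs site context, pairing beyond the seed, and relative transcript abundance, none of which is modeled here.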

# LncRNAs in the Transcriptional Control

Several lines of evidence indicate that lncRNAs can act at the level of transcription, either negatively or positively, through a variety of molecular mechanisms (**Figure 2**). The dihydrofolate reductase (DHFR) gene contains a major and a minor promoter. The minor promoter gives rise to a lncRNA that forms a stable lncRNA–DNA triplex at the major DHFR promoter and interacts with the general transcription factor II B (TFIIB), leading to the dissociation of the transcriptional preinitiation complex at this major promoter and thereby reducing DHFR expression (Martianov et al., 2007).

Other lncRNAs act as decoys to negatively control transcription by titrating transcription factors away from their cognate promoters. The lncRNA PANDAR (promoter of CDKN1A antisense DNA damage activated RNA) is induced in a TP53-dependent manner and inhibits apoptotic gene expression to favor cell-cycle arrest through direct interaction with, and sequestration of, NFYA, a transcription factor controlling the apoptotic program upon DNA damage (Hung et al., 2011). Similarly, the lncRNA GAS5 (growth arrest-specific 5) contains an RNA motif derived from a stem-loop structure that mimics the DNA motif of the glucocorticoid response element. GAS5 binds to the DNA-binding domain of the glucocorticoid receptor, acts as a decoy glucocorticoid response element, and thus competes with DNA sites for binding to the glucocorticoid receptor (Kino et al., 2010).

Rather than acting as molecular decoys, lncRNAs can modulate transcription by recruiting factors to target gene promoters or by acting as transcription factor co-activators. For example, a lncRNA produced at the 5′ regulatory region of the cyclin D1 (CCND1) gene in response to genotoxic stress tethers and modulates the activity of the RNA-binding protein TLS (translocated in liposarcoma), which in turn inhibits the activity of the histone acetyltransferases CBP (CREB binding protein) and EP300, leading to CCND1 transcriptional repression (Wang et al., 2008). The lncRNA Evf-2 (DLX6-AS1) forms a stable complex with the homeodomain-containing protein DLX2 to induce expression of the adjacent genes at the DLX5/6 locus (Feng et al., 2006). In this latter case, the Evf-2 lncRNA functions as a co-factor regulating transcription factor activity.

Other lncRNAs regulate transcription by controlling transcription factor trafficking. As such, the lncRNA NRON (non-protein coding RNA, repressor of NFAT) interacts with importin-beta family members to inhibit nuclear translocation of the inactive dephosphorylated nuclear factor of activated T cells (NFAT) trans-activator (Willingham et al., 2005).

# LncRNAs and Epigenetics

LncRNAs have been implicated in the control of gene expression through the recruitment of epigenetic modifiers to specific genomic loci. In eukaryotic chromatin, epigenetic regulation is conveyed by covalent modifications of DNA (methylation, hydroxymethylation), modifications of histone tails (acetylation, methylation, phosphorylation, ubiquitinylation), and the incorporation of various histone variants. These modifications locally change chromatin organization and regulate gene expression without changes in the DNA sequence. Several lines of evidence indicate that lncRNAs, acting as guides that target enzymes involved in chromatin modifications, are part of this picture (**Figure 3**).

The lncRNA HOTAIR (HOX transcript antisense RNA) is transcribed from the HOXC locus and targets Polycomb Repressive Complex 2 (PRC2) to silence distantly located genes, including genes at the HOXD locus and hundreds of other genes on various chromosomes (Rinn et al., 2007; Zhang et al., 2015). Components of PRC2 trimethylate lysine 27 of histone H3 (H3K27me3), establishing the silent chromatin state (Völkel and Angrand, 2007; Völkel et al., 2015). Interestingly, HOTAIR also binds the LSD1–CoREST complex, which possesses histone H3 lysine 4 demethylase activity and thus removes the active H3K4me2 chromatin mark (Tsai et al., 2010). Furthermore, deletion analysis of HOTAIR revealed that distinct parts of the lncRNA interact with PRC2 and LSD1, indicating that HOTAIR is able to bridge two independent chromatin modifying activities at a target locus. Indeed, knockdown of HOTAIR causes the concomitant loss of occupancy of PRC2 and LSD1, with concurrent loss of H3K27me3 and gain of H3K4me2 at target loci. Thus, HOTAIR acts as an RNA scaffold targeting two different histone modification activities involved in heterochromatin formation.

The interplay between one lncRNA and different chromatin modifying complexes is also found at the INK4A tumor-suppressor locus. The antisense lncRNA ANRIL (antisense noncoding RNA in the INK4 locus, CDKN2B-AS), which is produced by the INK4B/ARF/INK4A locus, specifically binds two Polycomb proteins, CBX7 (PRC1) and SUZ12 (PRC2). Disruption of the interaction with both PRC1 and PRC2 proteins impacts transcriptional repression at the INK4B locus in *cis* (Yap et al., 2010; Kotake et al., 2011). As another example, the lncRNA KCNQ1OT1 (KCNQ1 opposite strand/antisense transcript 1) mediates bidirectional silencing by interacting with chromatin and recruiting the PRC2 complex, as well as the histone methyltransferase G9a (EHMT2), resulting in an increase in the repressive histone modifications H3K27me3 and H3K9me3 at the KCNQ1 domain (Pandey et al., 2008). Thus, similar to HOTAIR and ANRIL, KCNQ1OT1 represents a prototype of a scaffold RNA recruiting multiple sets of chromatin modifying activities involved in target gene silencing. Approximately 20% of lncRNAs, including HOTAIR, ANRIL, KCNQ1OT1, but also XIST, RepA, HEIH, PCAT-1, H19, or linc-UBC1 (Zhao et al., 2008; Maenner et al., 2010; Prensner et al., 2011; Yang et al., 2011; Luo et al., 2013; He et al., 2013), are believed to guide PRC2 activity to target genes, indicating that lncRNA-mediated targeting of PRC2 to chromatin is a widely used strategy to repress gene expression through a chromatin reorganization mechanism (Khalil et al., 2009).

FIGURE 3 | […] represses transcription in *cis* at the INK4B/ARF/INK4A locus by […] H3K4me3 activating marks.

In contrast, the lncRNA HOTTIP (HOXA transcript at the distal tip) mediates transcriptional activation by controlling chromatin modification and organization (Wang et al., 2011). HOTTIP is produced from the 5′ end of the HOXA locus, downstream of HOXA13. The knockdown of HOTTIP decreases expression of HOXA genes in *cis*, with an efficacy that correlates with the proximity of the HOXA genes to the HOTTIP transcriptional unit. At the target genes, knockdown of HOTTIP results in the loss of the activating H3K4me3 and H3K4me2 epigenetic marks, together with a decrease in occupancy of the MLL1 protein complex responsible for establishing these histone modifications. Furthermore, chromosome conformation capture carbon copy (5C) assays revealed abundant long-range looping interactions that bring the transcribed target HOXA genes into proximity with the HOTTIP transcriptional unit. Thus, the mechanism by which the lncRNA HOTTIP controls HOXA expression relies on its potential to guide the histone methyltransferase MLL1 to target HOXA gene promoters, and on the formation of chromatin loops that connect distantly expressed HOXA genes to HOTTIP transcripts.

A role for lncRNAs in chromatin loop formation has also been described for the lncRNA CCAT1-L (Xiang et al., 2014). Indeed, CCAT1-L is transcribed from a locus upstream of MYC and plays a role in MYC transcriptional regulation by promoting long-range chromatin looping.

Thus, through the recruitment of chromatin modifiers and/or the induction of chromatin loops, lncRNAs modulate chromatin conformation and format the genome in a particular configuration. This lncRNA-mediated genome formatting emerges as a crucial and fundamental mechanism by which lncRNAs may act on gene expression programs.

## H19, a Prototype of a Multitask lncRNA

As discussed above, lncRNAs can regulate genome expression through different molecular mechanisms. However, several lncRNAs use multiple strategies that, in combination, may be required for their biological function. The action of the lncRNA H19 on gene expression illustrates the complexity of the combinatorial mechanisms of regulation achieved by a single lncRNA. H19 was the first lncRNA discovered (Brannan et al., 1990). Furthermore, H19 and its neighboring IGF2 gene, located at position 11p15.5, are subject to genomic imprinting, and the study of gene regulation at this locus serves as a model for understanding the molecular mechanisms involved in this type of genomic regulation. In addition, alterations of gene expression at the H19/IGF2 locus are associated with malignancies and developmental disorders. Loss of heterozygosity, including loss of imprinting, could be responsible for a loss of expression or for biallelic expression of these genes. Patients suffering from Beckwith–Wiedemann syndrome (BWS, OMIM 130650; Choufani et al., 2010) exhibit a loss of H19 expression and biallelic expression of IGF2. BWS is associated with fetal and postnatal overgrowth and an increased risk of embryonic or childhood cancers such as Wilms' tumors. Loss of IGF2 expression with biallelic H19 expression is responsible for 20 to 60% of cases of Silver–Russell syndrome (SRS, OMIM 180860; Penaherrera et al., 2010). SRS is an intrauterine growth delay associated with altered postnatal growth, facial dysmorphia, and body asymmetry. Numerous studies, including ours, indicate that H19 may play a key role in tumorigenesis and could contribute to tumor progression and aggressiveness. H19 overexpression has also been reported in various cancer tissues, including breast (Adriaenssens et al., 1998; Lottin et al., 2002), bladder (Cooper et al., 1996), lung (Kondo et al., 1995), and esophageal cancers (Hibi et al., 1996). Several lines of evidence indicate that H19 could play a role in tumor invasion and angiogenesis. In breast cancer, the oncogenic role of H19 has been well established (Berteaux et al., 2005), even if the precise molecular mechanisms involved in tumorigenesis are not yet fully understood.

FIGURE 5 | […] (1) and on post-transcriptional control as a miR decoy sequestering miR-106a […] 91H and HOTS may have biological outcomes (5).

At the H19/IGF2 locus, both genes share a common set of enhancers located downstream of the H19 gene (**Figure 4**). The ICR (imprinting control region), located 2 kbp upstream of the H19 promoter, controls the monoallelic expression of H19 and IGF2 by insulating communication between the 3′ enhancers and the IGF2 promoter. The chromatin insulator property of the H19/IGF2 ICR is regulated by the insulator protein CTCF (CCCTC-binding factor), which binds specifically to the unmethylated maternal allele. On the paternal allele, ICR methylation does not allow CTCF binding and leads to IGF2 expression (reviewed in Lewis and Murrell, 2004). The H19/IGF2 locus contains other differentially methylated regions (DMRs), with DMR1 being a methylation-sensitive silencer and DMR2 being a methylation-sensitive activator (Constancia et al., 2000; Murrell et al., 2004). CTCF binding to the maternal ICR regulates its interaction with matrix attachment region 3 (MAR3) and DMR1 at IGF2, thus forming a tight loop around the maternal IGF2 locus which may contribute to its silencing. These interactions restrict the physical access of distal enhancers to the IGF2 promoter (Weber et al., 2003; Murrell et al., 2004; Kurukuti et al., 2006). Furthermore, several lncRNAs are produced at the H19/IGF2 locus, adding further complexity to its regulation. The first antisense transcript at the H19/IGF2 locus is the lncRNA IGF2-AS (3–4 kb), discovered in 1991 in chicken (Rivkin et al., 1993; Moore et al., 1997). IGF2-AS and IGF2 are coregulated at the transcriptional level, but the function of the IGF2-AS lncRNA remains unclear. The lncRNA 91H (about 120 kb) is transcribed from the maternal allele (Berteaux et al., 2008). Recently, a new protein-coding gene, HOTS (6 kbp), has been described at the same position (Onyango and Feinberg, 2011), but the relationship between HOTS and 91H is still not clear. However, both transcripts are transcribed in an antisense orientation relative to H19. An additional lncRNA produced by the H19/IGF2 locus has been identified (Court et al., 2011). This PIHit (paternally expressed IGF2/H19 intergenic transcript) lncRNA is a 5–6 kb transcript expressed from the paternal allele after birth. Thus, the genomic organization of coding and non-coding transcripts illustrates the complexity of the interleaved networks of lncRNAs expressed from the H19/IGF2 locus.
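The insulator logic at the ICR can be summarized as a small truth table. The sketch below is a deliberately reduced toy model, not a description of the full locus: it considers only ICR methylation and CTCF binding, ignores DMR1, DMR2, MAR3, and the additional transcripts, and assumes the textbook simplification that H19 is expressed from the allele on which CTCF is bound. The function and key names are hypothetical.

```python
def allele_state(icr_methylated: bool) -> dict:
    """Toy model of one parental allele at the H19/IGF2 locus."""
    # CTCF binds only the unmethylated ICR.
    ctcf_bound = not icr_methylated
    return {
        "CTCF_bound": ctcf_bound,
        # Bound CTCF insulates the IGF2 promoter from the 3' enhancers.
        "IGF2_expressed": not ctcf_bound,
        # Simplifying assumption: H19 is expressed when the ICR is unmethylated.
        "H19_expressed": ctcf_bound,
    }


def locus_summary(maternal_icr_methylated: bool = False,
                  paternal_icr_methylated: bool = True) -> dict:
    """List which parental alleles express H19 and IGF2 in the toy model."""
    alleles = {
        "maternal": allele_state(maternal_icr_methylated),
        "paternal": allele_state(paternal_icr_methylated),
    }
    return {
        gene: [name for name, state in alleles.items()
               if state[f"{gene}_expressed"]]
        for gene in ("H19", "IGF2")
    }


print(locus_summary())                                   # normal imprinting
print(locus_summary(maternal_icr_methylated=True))       # both ICRs methylated
print(locus_summary(paternal_icr_methylated=False))      # both ICRs unmethylated
```

In this toy model, methylating both ICRs gives no H19 expression with biallelic IGF2, and leaving both unmethylated gives the reverse, which matches the expression patterns described earlier for BWS and SRS; the model says nothing about how such states arise in patients.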

To complicate matters further, the mechanisms of action of the H19 lncRNA appear to be extremely diverse, acting at various levels (**Figure 5**). H19 has been shown to guide chromatin modifying enzymes to specific loci. In particular, Luo et al. (2013) have shown that H19 binds to and recruits the histone methyltransferase EZH2 to the E-cadherin promoter, leading to an increase in H3K27me3 repressive marks and to the silencing of the E-cadherin gene in bladder cancer. PRC2 protein members are not the only chromatin modifying factors interacting with H19, since this lncRNA has been shown to physically bind the methyl-CpG-binding domain protein 1 (MBD1). The H19–MBD1 complex is then recruited to several imprinted genes, including IGF2, SLC38A4, and PEG1 (Monnier et al., 2013). This recruitment also induces methylation at lysine 9 of histone H3 (H3K9me3), probably via an additional interaction with an H3K9 histone methyltransferase.

The multifaceted action of H19 is also illustrated by its dual interaction with miR pathways. On the one hand, the lncRNA H19 acts as a miR sponge to sequester miR-106a as well as let-7 family members (Kallen et al., 2013; Imig et al., 2015). On the other hand, H19 serves as a precursor of miR-675, which in turn post-transcriptionally regulates a number of targets involved in cell tumorigenicity, including RB, IGF1R, SMAD1, SMAD5, CDC6, NOMO1, and RUNX1 (Cai and Cullen, 2007; Tsang et al., 2010; Gao et al., 2012; Keniry et al., 2012; Dey et al., 2014; Zhuang et al., 2014). The role of H19 in tumor progression could also be mediated through its interaction with the tumor-suppressor TP53 protein. This association results in partial TP53 inactivation (Yang et al., 2012).

Several lines of evidence also indicate that the H19 lncRNA controls IGF2 expression at the translational and/or post-translational levels (Li et al., 1998), suggesting that other mechanisms by which H19 exerts its action remain to be deciphered. Similarly, the possible role of RNA duplex formation between H19 and the antisense transcripts 91H and HOTS requires investigation.

# LncRNAs in Human Diseases

Given the wide range of molecular actions of lncRNAs and their roles in various physiological processes, it is not surprising that they have been shown to be involved in many human diseases. A number of studies indicate that alterations of lncRNA expression lead to tumorigenesis through changes at the chromatin, transcriptional, or post-transcriptional levels that affect target gene expression (**Table 1**). Since lncRNAs regulate diverse cellular pathways, growing evidence suggests that they could play a role in a large number of other human disorders, including metabolic diseases, neurodegenerative and psychiatric disorders, and cardiovascular and immune dysfunctions (Taft et al., 2010; Esteller, 2011; Harries, 2012; Shi et al., 2013; Clark and Blackshaw, 2014).

### Perspectives and Concluding Remarks

LncRNAs represent a large part of the transcriptome and a very heterogeneous class of transcripts in terms of genomic organization and modes of action. Many of them are considered key regulators of gene expression, and lncRNAs thus constitute an additional layer controlling cellular programs. LncRNAs regulate diverse expression steps at the levels of chromatin rearrangement, transcriptional control, and/or post-transcriptional processing. Through these actions, lncRNAs are involved in numerous physiological functions, and in many cases lncRNA alterations are associated with human disorders.

The fact that lncRNAs can be deregulated in tumors and other human pathologies makes them attractive candidates as biomarkers and as targets for therapy. LncRNAs may be downregulated at the RNA level by targeting their sequence. To this end, short interfering RNAs (siRNAs) designed to perfectly match exact stretches of nucleotides guarantee a high degree of specificity, leading to lncRNA degradation. The power of the siRNA approach is illustrated by the success of a number of preclinical studies in which siRNAs targeted mRNAs (Kaur et al., 2014). Similar approaches can thus be envisioned to target non-coding RNAs. Indeed, siRNAs have also been used to target miRs, leading to heart regeneration in an *in vivo* mouse model (Aguirre et al., 2014), and the use of siRNAs has been proposed in a therapeutic strategy targeting the lncRNA HOTAIR in endometrial carcinoma (Huang et al., 2014). Similarly, antisense oligonucleotides, which are single-stranded DNA or RNA molecules of 8 to 50 nucleotides, can be used to target lncRNAs. Specifically, *in vivo* and *in vitro* experiments revealed that antisense oligonucleotides directed against the lncRNA MALAT1 inhibit its expression and drastically reduce lung cancer metastasis (Gutschner et al., 2013; Tripathi et al., 2013).
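Both siRNAs and antisense oligonucleotides rely on perfect complementarity to a chosen stretch of the target RNA. As a minimal sketch of that principle, the function below derives a DNA antisense oligonucleotide as the reverse complement of a window in a target sequence, enforcing the 8 to 50 nucleotide range mentioned above. The target sequence and the function name are hypothetical, and no real design criteria (off-target screening, chemistry, accessibility of the site) are modeled.

```python
def antisense_oligo(target_rna: str, start: int, length: int = 20) -> str:
    """Return a DNA oligo perfectly complementary to target_rna[start:start+length].

    The result is written 5'->3', so it is the reverse complement of the window.
    """
    if not 8 <= length <= 50:
        raise ValueError("length must be between 8 and 50 nucleotides")
    window = target_rna[start:start + length]
    if start < 0 or len(window) < length:
        raise ValueError("window does not fit inside the target sequence")
    rna_to_dna = {"A": "T", "U": "A", "G": "C", "C": "G"}
    return "".join(rna_to_dna[base] for base in reversed(window))


# Hypothetical target sequence, used only to show the call.
toy_lncrna = "AUGGCUAGCUAGGCUAACGUUAGC"
print(antisense_oligo(toy_lncrna, start=0, length=8))
```

The same reverse-complement step, kept in the RNA alphabet and applied to a window of about 21 nucleotides, would give the guide strand of an siRNA under the same simplifying assumptions.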

In this context, further exploration in the complexity of the lncRNA world promises the emergence of novel therapeutic opportunities.

# Acknowledgments

This work is supported by Inserm, the University of Lille, and grants from INCa (PLBio 2010), le Comité du Nord de la Ligue Contre le Cancer and l'ITMO Biologie Cellulaire, Développement et Evolution (BCDE).




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Angrand, Vennin, Le Bourhis and Adriaenssens. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Changes in expression of the long non-coding RNA *FMR4* associate with altered gene expression during differentiation of human neural precursor cells

#### *Edited by:*

*William Cho, Queen Elizabeth Hospital, Hong Kong*

#### *Reviewed by:*

*Francesco Nicassio, Istituto Italiano di Tecnologia, Italy Guney Bademci, University of Miami, USA Mohammadreza Hajjari, Shahid Chamran University of Ahvaz, Iran*

#### *\*Correspondence:*

*Claes Wahlestedt, Center for Therapeutic Innovation and Department of Psychiatry and Behavioral Sciences, Miller School of Medicine, University of Miami, 1501 NW 10th Avenue, BRB # 407 (M-860), Miami, FL 33136, USA cwahlestedt@med.miami.edu*

#### *Specialty section:*

*This article was submitted to RNA, a section of the journal Frontiers in Genetics*

*Received: 14 May 2015 Accepted: 27 July 2015 Published: 10 August 2015*

#### *Citation:*

*Peschansky VJ, Pastori C, Zeier Z, Motti D, Wentzel K, Velmeshev D, Magistri M, Bixby JL, Lemmon VP, Silva JP and Wahlestedt C (2015) Changes in expression of the long non-coding RNA FMR4 associate with altered gene expression during differentiation of human neural precursor cells. Front. Genet. 6:263. doi: 10.3389/fgene.2015.00263*

*Veronica J. Peschansky<sup>1</sup>, Chiara Pastori<sup>1</sup>, Zane Zeier<sup>1</sup>, Dario Motti<sup>2</sup>, Katya Wentzel<sup>1</sup>, Dmitry Velmeshev<sup>1</sup>, Marco Magistri<sup>1</sup>, John L. Bixby<sup>2,3,4,5</sup>, Vance P. Lemmon<sup>2,3,4</sup>, José P. Silva<sup>1</sup> and Claes Wahlestedt<sup>1</sup>\**

*<sup>1</sup> Center for Therapeutic Innovation and Department of Psychiatry and Behavioral Sciences, Miller School of Medicine, University of Miami, Miami, FL, USA, <sup>2</sup> Miami Project to Cure Paralysis, University of Miami, Miami, FL, USA, <sup>3</sup> Department of Neurological Surgery, Miller School of Medicine, University of Miami, Miami, FL, USA, <sup>4</sup> Center for Computational Science, University of Miami, Coral Gables, FL, USA, <sup>5</sup> Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL, USA*

CGG repeat expansions in the Fragile X mental retardation 1 (*FMR1*) gene are responsible for a family of associated disorders characterized by either intellectual disability and autism [Fragile X Syndrome (FXS)] or adult-onset neurodegeneration [Fragile X-associated Tremor/Ataxia Syndrome (FXTAS)]. However, the *FMR1* locus is complex and encodes several long non-coding RNAs (lncRNAs), whose expression is altered by repeat expansion mutations. The role of these lncRNAs is thus far unknown; therefore, we investigated the functionality of *FMR4*, which we previously identified. "Full"-length expansions of the *FMR1* triplet repeat cause silencing of both *FMR1* and *FMR4*; thus, we are interested in a potential loss of function that may add to the phenotypic manifestation of FXS. Since the two transcripts do not exhibit *cis*-regulation of one another, we examined the potential for *FMR4* to regulate target genes at distal genomic loci using gene expression microarrays. We identified *FMR4*-responsive genes, including the methyl-CpG-binding domain protein 4 (*MBD4*). Furthermore, we found that in differentiating human neural precursor cells, *FMR4* expression is developmentally regulated in opposition to expression of both *FMR1* (which is expected to share a bidirectional promoter with *FMR4*) and *MBD4*. We therefore propose that *FMR4* functions as a gene-regulatory lncRNA and that this transcript may function in normal development. Closer examination of *FMR4* increases our understanding of the role of regulatory lncRNAs and the consequences of *FMR1* repeat expansions.

Keywords: lncRNA, intellectual disability, epigenetics, differentiation, chromatin remodeling, *MBD4, FMR4,* Fragile X

**Abbreviations:** ASD, autism spectrum disorder; DIV, days *in vitro*; FMRP, Fragile X mental retardation protein; FXPOI, Fragile X-related primary ovarian insufficiency; FXS, Fragile X syndrome; FXTAS, Fragile X-associated tremor/ataxia syndrome; hNPCs, human neural precursor cells; hNS, human neurospheres; lncRNAs, long non-coding RNAs; mRNA, messenger RNA; ncRNAs, non-coding RNAs.

# Introduction

Fragile X Syndrome, FXTAS, and FXPOI are X-linked disorders that arise from expansions in a CGG-repeat region in the 5′-UTR of the *FMR1* gene. Normal *FMR1* alleles contain 6–54 repeats; expansions from 55 to 200 repeats are considered "premutations," and all larger repeat sizes are known as "full mutations." Individuals with a premutation may develop the adult-onset neurodegenerative disorder known as FXTAS, while women carrying the premutation are at risk for FXPOI. Only the full mutation leads to FXS, which is a common cause of inherited intellectual disability and autism (Oostra and Willemsen, 2003). *FMR1* premutations result in overproduction of toxic, expanded mRNAs that contribute to the development of FXPOI and FXTAS pathology (Tassone et al., 2000; Kenneson et al., 2001). Full mutations lead to DNA methylation and repressive histone methylation of the *FMR1* locus (Sutcliffe et al., 1992; Hornstra et al., 1993; Coffee et al., 1999, 2002; Kumari and Usdin, 2010). Thus, FXS derives from the loss of *FMR1* mRNA and its protein product, FMRP. We and others have identified four non-coding transcripts with abnormal expression in response to Fragile X repeat expansions at the *FMR1* locus (Ladd et al., 2007; Khalil et al., 2008; Pastori et al., 2014), but their role in FXS/FXTAS/FXPOI phenotypes remains to be determined.
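The repeat-size categories described above can be written as a simple lookup. The following is only an illustrative sketch: the function name `classify_fmr1_allele` is hypothetical, the thresholds are taken directly from the ranges stated in this paragraph, and the handling of alleles with fewer than 6 repeats (which the text does not discuss) is an assumption.

```python
def classify_fmr1_allele(cgg_repeats: int) -> str:
    """Classify an FMR1 allele by CGG repeat count.

    Thresholds follow the ranges quoted in the text:
    6-54 normal, 55-200 premutation, >200 full mutation.
    """
    if cgg_repeats < 0:
        raise ValueError("repeat count must be non-negative")
    if cgg_repeats < 6:
        # Assumption: the text gives no category below 6 repeats.
        return "below stated normal range"
    if cgg_repeats <= 54:
        return "normal"
    if cgg_repeats <= 200:
        return "premutation"
    return "full mutation"


if __name__ == "__main__":
    for n in (30, 55, 200, 201):
        print(n, classify_fmr1_allele(n))
```

Clinical laboratories may use additional categories or slightly different cut-offs; this sketch mirrors only the definitions given here.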

The vast majority of the human transcriptome is comprised of either long [>200 nucleotides (nt)] or short ncRNAs (Cheng et al., 2005; Banfai et al., 2012; Djebali et al., 2012). While short ncRNAs typically regulate gene expression through posttranscriptional mechanisms or by interfering with translation (Rother and Meister, 2011; Fabian and Sonenberg, 2012), lncRNAs (which can be many kilobases long) often act in *cis* or *trans* to regulate gene expression at their locus of origin or elsewhere in the genome, respectively. Evidence suggests that lncRNAs perform scaffolding functions by recruiting epigenetic complexes or ribonucleoproteins that cause chromatin remodeling (Wang and Chang, 2011). Other lncRNAs act posttranscriptionally by targeting mRNAs or translational machinery. Regardless of the mechanism, a growing body of evidence implicates lncRNAs in a myriad of normal cellular functions such as DNA damage response and mitosis (Tsai et al., 2010; Yap et al., 2010; Hung et al., 2011; Kotake et al., 2011; Wang and Chang, 2011) and in diseases, such as cancer (Hajjari et al., 2014).

Recent attention has focused more specifically on the role of lncRNAs in neurodevelopmental programs and diseases of the nervous system. For example, lncRNAs are involved in the differentiation of neural cell types, and in synaptic signaling and maturation (Mercer et al., 2010; Qureshi et al., 2010). In addition, both short and long ncRNAs are known to be involved in Prader–Willi syndrome, which is a developmental disorder caused by paternal deletion of a maternally imprinted region and can present with metabolic dysregulation including circadian rhythm defects (Sahoo et al., 2008; De Smith et al., 2009; Powell et al., 2013). Both syndromic and non-syndromic ASD susceptibility loci also contain aberrantly expressed lncRNAs that may contribute to disease (Velmeshev et al., 2013; Ziats and Rennert, 2013). Dysfunction of lncRNAs has also been linked to pathogenesis of neurodegenerative disorders including Alzheimer's disease (Faghihi et al., 2008) and spinocerebellar ataxia type 7, another repeat expansion disorder (Sopher et al., 2011). In sum, there is a growing body of evidence supporting the involvement of lncRNAs in both the normal and diseased nervous system, spurring further mechanistic inquiries.

*FMR4*, an untranslated, antisense lncRNA originating at the *FMR1* gene locus was shown to have anti-apoptotic functions in HEK293T and HeLa cells but to have no effect on expression of *FMR1* (Khalil et al., 2008). Here, we describe *FMR4*'s function as a regulator of gene expression in *trans* by identifying mRNA expression changes induced by *FMR4*. In particular, these effects in HEK293T cells are mirrored by discordant developmental regulation between *FMR4* and one of its targets, *MBD4*, in hNPCs.

# Materials and Methods

### HEK293T Cell Culture, Transfection, and RNA Extraction

HEK293T cells were cultured in DMEM with 10% FBS. In overexpression experiments, cells were transfected with pcDNA3.1-FMR4 or the empty pcDNA3.1 control vector using Lipofectamine 2000CD. For knockdown experiments, the siRNA FMR4(C) (Khalil et al., 2008) or the Silencer Negative Control siRNA #1 (Ambion) was used with the Lipofectamine RNAiMAX transfection reagent, according to manufacturer's instructions (Invitrogen). For microarray experiments, 1 × 10<sup>6</sup> cells were plated and transfected with 0.5 µg plasmid or 40 nM siRNA on the same day, and incubated for 6 or 24 h after transfection. For validation, *FMR4* was knocked down using two sequential siRNA transfections over 72 h. RNA was extracted using Trizol (Invitrogen) and the RNeasy Mini Kit, and treated with DNase on-column using the RNase-free DNase Set (Qiagen) according to manufacturer's instructions.

### Microarray Hybridization and Analysis

At 6 or 24 h post transfection, RNA was extracted and samples were submitted to the Hussman Institute for Human Genomics Center for Genome Technology for microarray analysis using Affymetrix Human Gene 2.0 ST Arrays. Total RNA samples were first prepared using the Ambion WT Expression Kit (cat# 4411974). Briefly, the kit generates sense-strand cDNA from total RNA using a reverse transcription priming method that specifically primes non-ribosomal RNA, including both poly(A) and non-poly(A) mRNA. Next, samples were fragmented and labeled using the Affymetrix GeneChip WT Terminal Labeling Kit (cat# 902280). The final product was hybridized onto the array, washed, and stained using the Affymetrix GeneChip Hybridization, Wash, and Stain Kit (cat# 900720). Arrays were scanned using the GeneChip Scanner 3000 7G system. Background subtraction, GC-RMA normalization, and quality control were performed using the Affymetrix GeneChip Command Console Software and Bioconductor packages in R. Data have been archived in the Gene Expression Omnibus at the National Center for Biotechnology Information and assigned the accession number GSE70817.

### cDNA Synthesis and Quantitative Real-Time PCR (qPCR)

cDNA was synthesized using the High Capacity cDNA Reverse Transcription Kit (Applied Biosystems) with 500 ng of total RNA per reaction. Gene-specific FMR4 cDNA was primed separately ("FMR4 RT": ATTGCTGGCAGTCGTTTCTT), in order to specifically detect the antisense transcript and prevent capture of overlapping sense transcripts arising from that genomic region. Random hexamer-primed cDNA libraries were used for detection of all other genes. FMR4 RNA expression was quantified using SYBR Green quantitative real-time PCR (qPCR) with the following primers, validated by melting curve: "FMR4 FW" – ACCAAACCAAACCAAACCAA and "FMR4 REV" – GTGGGAAATCAAATGCATCC. Commercially available TaqMan probes (Invitrogen) were used for all other transcripts (MBD4, cat# Hs00187498\_m1; FMR1, cat# Hs00924547\_m1; MALAT1, cat# Hs01910177\_s1). The endogenous control was glyceraldehyde 3-phosphate dehydrogenase (GAPDH, cat# 4326317E) where necessary, and data were analyzed by the ΔΔCt method. For polyA detection, cDNA was synthesized from 1 µg total RNA using the High Capacity cDNA kit and oligo-dT primers at 50 nM final concentration. In the no-RT control, we omitted the reverse-transcriptase enzyme from the cDNA synthesis reaction. Reverse transcription products were amplified using qPCR as described above and visualized on a 1.5% agarose gel.
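For readers unfamiliar with comparative Ct quantification, the calculation can be sketched as below. This is a minimal illustration rather than the authors' analysis script: the function name and the Ct values are hypothetical, and the sketch assumes 100% amplification efficiency (a doubling of product per cycle) for both the target and the endogenous control.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of a target transcript by the comparative Ct approach.

    Assumes perfect doubling per cycle for both target and reference.
    """
    # Normalize the target to the endogenous control (e.g. GAPDH) in each condition.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the treated sample against the control condition.
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values, chosen only to illustrate the arithmetic.
    fold = relative_expression(24.0, 18.0, 25.0, 18.0)
    print(f"fold change vs. control: {fold:.2f}")
```

With these made-up inputs the target crosses threshold one cycle earlier in the sample than in the control after normalization, which corresponds to a two-fold increase under the stated efficiency assumption.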

### Human Neural Precursor Cell (hNPC) Culture, Differentiation, and Viral Transduction

hNPCs used in this study were derived from human fetal brains collected from third trimester aborted fetuses received from the Birth Defects Research Lab at the University of Washington in Seattle. This work was classified as "Non-Human Subject Research" by the Human Subject Research Office at the University of Miami, and therefore was not subject to Institutional Review Board approval. Briefly, brain tissues were dissected and dissociated using the trypsin-based Neural Tissue Dissociation Kit (Miltenyi Biotec, cat #130-093-231). Five million single cells were seeded into each 75-mm tissue culture flask in proliferation media supplemented with B27 (1X of proprietary formula), human Epidermal Growth Factor (hEGF) at 20 ng/mL, human Fibroblast Growth Factor (hFGF) at 10 ng/mL, heparin at 20 µg/mL, GlutaMax (1X, Gibco), and Primocin at 0.1 mg/mL (InvivoGen). Neural stem cells formed neurosphere colonies after approximately 7 DIV, while other cells remained in suspension as single cells or formed a monolayer on the flask surface. Neurospheres can be maintained as such in suspension for several months using proliferation media, or hNPCs can be differentiated, forming a mixture of neurons and astrocytes. Neurospheres were transduced with a lentiviral vector expressing *FMR4* (pLentiCMV/TO-mCherry-FMR4) or the control vector (pLentiCMV/TO-mCherry), and collected for qPCR analysis after 2 days. For differentiation, neurospheres were dissociated into single cells with Accutase and cultured in Advanced DMEM/F12 media without hFGF or hEGF, but with Bottenstein's N2 at 1X (Invitrogen, proprietary formula), 2.5% fetal bovine serum, heparin at 20 µg/mL, GlutaMax (1X, Gibco), and Primocin at 0.1 mg/mL (InvivoGen).

### Subcellular Fractionation

Neurospheres were collected by centrifugation at 250 × *g* in order to form 50 µL pellets. Pellets were washed with PBS and fractionated with the NE-PER Nuclear and Cytoplasmic Extraction Reagents (Thermo Scientific, cat# 78835) according to manufacturer's instructions. Briefly, cells were sequentially lysed and centrifuged to first separate pelleted nuclei from cytoplasm, then to separate chromatin from nucleoplasmic components. RNA was extracted from solid chromatin with 1 mL of Trizol, and with 0.75 mL Trizol LS for every 0.25 mL of liquid fraction. cDNA prepared using random primers was used for TaqMan qPCR (*MALAT1, GAPDH,* and *FMR1*), while gene-specific priming and SYBR Green was used for *FMR4* as noted above. In each case, starting material for cDNA reactions was 2 µL total RNA (not equal mass of RNA), to enable comparison of absolute quantities of each transcript between compartments. Relative quantification (RQ) for each transcript in each compartment was calculated from Cq values by qPCR. Individual fraction RQ values were normalized to the total detected amount for each transcript.
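The normalization described above, in which each compartment's signal is expressed as a share of the total detected for that transcript, can be sketched as follows. The exact formula used by the authors is not given, so this is an assumption-laden illustration: the function name and Cq values are hypothetical, relative quantity is taken to be proportional to 2<sup>−Cq</sup> (perfect amplification efficiency), and equal input volumes per fraction are assumed.

```python
def compartment_fractions(cq_by_compartment: dict) -> dict:
    """Convert per-compartment Cq values for one transcript into percentages.

    Assumes relative quantity is proportional to 2**(-Cq) and that each
    compartment contributed the same input volume to the cDNA reaction.
    """
    # Reference every Cq to the lowest one to keep the numbers well scaled.
    min_cq = min(cq_by_compartment.values())
    rq = {name: 2.0 ** (min_cq - cq) for name, cq in cq_by_compartment.items()}
    total = sum(rq.values())
    # Express each compartment as a percentage of the total detected amount.
    return {name: 100.0 * value / total for name, value in rq.items()}


if __name__ == "__main__":
    # Hypothetical Cq values for a single transcript.
    example = {"cytoplasm": 22.0, "nucleoplasm": 23.0, "chromatin": 23.0}
    for compartment, percent in compartment_fractions(example).items():
        print(f"{compartment}: {percent:.1f}%")
```

Because the values are normalized within each transcript, the percentages describe distribution across compartments and cannot be used to compare absolute abundance between different transcripts.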

# Results

### *FMR4* Induces Genome-Wide Changes in Gene Expression

Previous studies of the 2.4 kb antisense lncRNA *FMR4* described no *cis*-regulation of *FMR1* (Khalil et al., 2008); therefore we hypothesized that *FMR4* would regulate gene expression in *trans*, which is a well-documented function of other lncRNAs (Rinn et al., 2007; Kino et al., 2010; Miyagawa et al., 2012). In order to comprehensively measure gene regulation in response to *FMR4* at the mRNA level, we treated HEK293T cells with either an siRNA against *FMR4* or a scrambled control siRNA (knockdown), or with pcDNA3.1-*FMR4* or the empty pcDNA3.1 vector (overexpression), and processed them for microarray hybridization after 6 or 24 h. Using LIMMA, a linear modeling approach (Smyth, 2004), we identified differential expression of over 3,700 genes between *FMR4* knockdown, overexpression, and their respective control conditions, and characterized the pattern of target gene expression relative to *FMR4*. To this end, we used the Cluster Affinity tool of TIGR MultiExperiment Viewer to identify genes with opposite behavior in the knockdown condition compared to the overexpression condition. This strategy narrowed our focus to the 238 transcripts represented in **Figure 1A** and Supplementary Figure S1, which are further classified by whether they are concordant or discordant with respect to *FMR4* changes. This analysis yielded 155 and 83 target genes with concordant and discordant changes in mRNA expression relative to *FMR4*, respectively. These data support the idea that *FMR4* is a regulator of gene expression through *trans* activity.
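The concordant/discordant labeling can be illustrated with a small sketch. This is not the Cluster Affinity procedure used in the study; it is a simplified, hypothetical rule that assumes a log fold change is available for each gene in the knockdown and overexpression comparisons. The function name, gene labels, and values are invented for illustration.

```python
def classify_response(logfc_knockdown: float, logfc_overexpression: float) -> str:
    """Label a gene by how its expression tracks FMR4 levels.

    concordant: follows FMR4 (down on knockdown, up on overexpression).
    discordant: opposes FMR4 (up on knockdown, down on overexpression).
    unclassified: does not behave oppositely in the two conditions.
    """
    if logfc_knockdown < 0 < logfc_overexpression:
        return "concordant"
    if logfc_overexpression < 0 < logfc_knockdown:
        return "discordant"
    return "unclassified"


if __name__ == "__main__":
    # Hypothetical log2 fold changes: (knockdown, overexpression).
    genes = {
        "GENE_A": (-0.8, 0.6),
        "GENE_B": (0.4, -0.5),
        "GENE_C": (0.5, 0.7),
    }
    for gene, (kd, oe) in genes.items():
        print(gene, classify_response(kd, oe))
```

A real analysis would also apply significance thresholds from the linear model before assigning labels; that step is omitted here for brevity.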

### Pathway Analysis and *FMR4* Target Validation

We then analyzed the cohort of concordant and discordant *FMR4*-responsive genes with GeneGo Metacore (**Tables 1** and **2**). Cell cycle regulation and apoptosis were highly ranked biological processes affected by *FMR4*, which is consistent with our earlier findings (Khalil et al., 2008). Additionally, *FMR4*-sensitive genes are enriched in developmental processes in general and neurodevelopmental processes in particular (e.g., adrenergic and opioid signaling, Wnt pathway, cytoskeletal elements, synaptogenesis). Informed by the pathway analysis and previous insights into *FMR4*'s function, we used HEK293T cells with *FMR4* knocked down by sequential siRNA transfections (**Figure 1B**) to validate a 28% increase in expression of methyl-CpG-binding domain protein 4 (*MBD4*; **Figure 1C**).

### Developmental Regulation of Gene Expression by *FMR4* in hNPCs

To investigate the putative role of *FMR4* in neurodevelopment and Fragile X-associated neurological disorders, we used an *in vitro* model system consisting of human fetal-derived neurospheres (hNSs). These cells can be maintained as precursors in hNSs, or induced to differentiate into a mixed culture of early neurons and glia (see Materials and Methods). This system has the advantage of being a primary culture of human brain cells (critical for studying a primate-specific transcript *in vitro*) without the need for reprogramming, as would be the case with embryonic or induced pluripotent stem cells. We focused on the relationship between *FMR4* and its putative target gene, *MBD4*, which we identified by microarray analysis. *MBD4* is a transcriptional repressor involved in DNA repair (Ballestar and Wolffe, 2001; Kondo et al., 2005). Expression of *MBD4* developmentally regulates several tissues (Ruddock-D'cruz et al., 2008; Zhang et al., 2014a), and its aberrant expression in hippocampal GABAergic neurons in psychiatric disease may be linked to abnormal differentiation in these cells (Benes et al., 2009).

The *FMR1* locus is crucial to normal brain development; thus, we wanted to determine whether *FMR4* expression is dependent on developmental stage. We extracted RNA from undissociated hNS ["0 days *in vitro*" (DIV)] and cultured cells from dissociated hNS up to five DIV in differentiation media. At both time points, we measured *FMR4* expression as well as that of *FMR1*, to determine whether these transcripts are independently regulated. We observed that *FMR4* expression is significantly decreased in differentiating cells at 5 DIV while *FMR1* is increased at the same time point (**Figure 2A**). We then measured *MBD4* and found that it is also developmentally regulated. As indicated by the microarray analysis, *MBD4* was increased at a time point when *FMR4* expression is low (**Figure 2A**). We also observed changes in expression level of other *FMR4* target genes in undifferentiated, proliferating hNPCs transduced with an mCherry-tagged lentiviral vector expressing *FMR4* (Supplementary Figure S2A). We found that *FMR4* overexpression significantly upregulated two putative targets, the deubiquitinase *YOD1* and the G-protein subunit *GNG12*, and downregulated the ribonucleotide reductase *RRM2* (Supplementary Figure S2B). These data corroborate our finding that *FMR4* regulates gene expression in *trans*.

### Molecular Mechanisms of *FMR4*

To better understand the molecular role of *FMR4*, we performed subcellular fractionation of hNS. As expected, *GAPDH* and *FMR1* mRNA were highest in the cytoplasmic fraction (87.6 and 67.0%, respectively) where they are translated (Duszczyk et al., 2011) (**Figure 2B**). Some lncRNAs are expressed predominantly in the nucleus. A well-known example of this is the lncRNA *NEAT2/MALAT1* (Gutschner et al., 2013). We measured this transcript as a positive control for nuclear transcripts, and found it enriched in the nucleoplasm (41.6%) and chromatin (54.0%) relative to the cytoplasm (**Figure 2C**). *FMR4* RNA was primarily localized (73.4%) to chromatin (**Figure 2C**), which is consistent with a transcriptional regulatory function of *FMR4*.

Since RNA polyadenylation frequently targets mature RNA for export into the cytoplasm (Zhang et al., 2014b), many lncRNAs are not polyadenylated. Therefore, to establish whether *FMR4* can be polyadenylated, we performed first-strand cDNA synthesis with oligo-dT primers, thereby capturing polyadenylated transcripts from total hNS RNA. We observed *via* qPCR that in addition to *FMR1* and *GAPDH* mRNAs (which are expected to be polyadenylated), a detectable portion of *FMR4* is also polyadenylated (Supplementary Figure S3). These data suggest that a fraction of *FMR4* may be stabilized by polyadenylation; this would increase the molecule's half-life and permit its diffusion to distant genomic loci.

# Discussion

Loss of *FMR1* mRNA and FMRP function has been widely studied, but there is relatively little known about ncRNA encoded by the locus. Here we show that one such ncRNA, *FMR4*, may regulate mRNA expression genome-wide *via* a developmentally regulated transcriptional mechanism, thereby impacting important biological processes.

Similar to *FMR4*, *trans*-acting lncRNAs [such as *HOTAIR*, *MALAT1,* and *GAS5* (Rinn et al., 2007; Kino et al., 2010; Miyagawa et al., 2012)] affect loci far from their genomic locus of origin. In this study, we confirmed that knockdown and overexpression of *FMR4* cause changes in the expression of genes involved in proliferation and differentiation. As a chromatin-associated transcript, *FMR4* may act at the transcriptional level by forming complexes with histone-modifying enzymes or by directly targeting mRNA stability, splicing, or editing. It remains to be seen whether these observations are dependent on RNA-protein interactions, and whether they result from direct epigenetic changes or *via* downstream effects.

Our data show that *FMR4* is developmentally regulated in an hNPC model. After 5 days of *in vitro* differentiation, *FMR4* expression is significantly reduced, while that of both *FMR1* and the *FMR4* target gene *MBD4* is increased. Since *FMR4* and *FMR1* do not interact in *cis* but are discordantly expressed with differentiation of hNPCs, the bidirectional promoter responsible for expression of both of these transcripts might be activated in only one direction at a time. An alternative possibility is that *FMR4* RNA is degraded at a higher rate during this period while *FMR1* is not. Nevertheless, it would be interesting to determine whether transcription factors normally regulate this locus as a whole or target each transcript individually. It is also unclear whether the decrease in *FMR4* lncRNA expression contributes to disease in addition to loss of *FMR1*, or is an artifact of the locus-wide, full mutation-induced epigenetic changes causing transcriptional repression. Future studies will be necessary to distinguish between these possible mechanisms and consequences of *FMR4* regulation.

Discordant regulation of *MBD4* and *FMR4* leads us to conclude that *FMR4* regulates expression of genes at distal loci, as the same relationship was identified by our overexpression and knockdown studies. Our independent validation studies confirm this was not due to false positive identification, which is a common problem in genome-wide analyses such as microarray studies. *FMR4*'s localization to the chromatin fraction points to a transcriptional mechanism for this effect; however, studies evaluating the direct binding of *FMR4* to nucleic acids or proteins are warranted. Such studies could suggest a direct role for *FMR4* in specifically regulating *MBD4* and gene expression more broadly, although this cannot be concluded definitively based solely on changes in mRNA levels. We acknowledge that differences in RNA processing or any number of effects downstream of *FMR4* could be responsible for the observed differential gene expression; therefore, it would be useful to examine precise interactions between *FMR4* and its targets. Chromatin immunoprecipitation experiments could, for example, establish an interaction between *FMR4* and promoter regions of target genes. Based on the function of other lncRNAs, one could speculate that *FMR4* helps form three-dimensional interactions between distant sequence elements such as promoters and enhancers (Ma et al., 2014), and that *FMR4* polyadenylation targets this transcript for further RNA processing (Norbury, 2013). Answering these questions would improve our understanding of *FMR4* function in particular and continue the rapid expansion of evidence on the elements that govern lncRNA actions in general.

In this study we have described novel facets of *FMR4* functionality as it relates to neurodevelopment, and suggest that perturbation in the expression of this lncRNA may contribute to pathogenesis of the Fragile X repeat expansion-associated disorders. We report that *trans*-regulatory activity of *FMR4* is corroborated by changes in target gene expression during differentiation of hNPCs. With this new information, we have further developed the evidence supporting the role of primate-specific lncRNAs in complex developmental programs and opened new avenues of research into the causes of and therapies for Fragile X-associated neurological disorders.

# Funding

This work was supported by the National Institutes of Health (MH084880 to CW, HD057521 to VL and NS059866 to JB), as well as the Lois Pope LIFE Foundation Development Award and the Louis J. Elsas Research Award in Biochemical Genetics to VP.

# Author Contributions

VP, CP, DM, KW, DV, and MM contributed to the acquisition, analysis, and interpretation of data. VP, CP, ZZ, JB, VL, JS, and CW made substantial contributions to the conception and design of the work. All authors participated in drafting and/or revising the manuscript for intellectual content, approved it for publication, and agree to be accountable for all aspects of the work.

# Acknowledgments

The authors would like to acknowledge Yan Shi, MS, for her help with high content screening assays and Nagi Ayad, Ph.D., for his guidance and critical reading of the manuscript.

# Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fgene.2015.00263


**Conflict of Interest Statement:** The reviewer Guney Bademci declares that, despite being affiliated with the same institute as the authors Veronica J. Peschansky, Chiara Pastori, Zane Zeier, Dario Motti, Katya Wentzel, Dmitry Velmeshev, Marco Magistri, John L. Bixby, Vance P. Lemmon, José P. Silva, and Claes Wahlestedt, the review process was conducted objectively.

*Copyright © 2015 Peschansky, Pastori, Zeier, Motti, Wentzel, Velmeshev, Magistri, Bixby, Lemmon, Silva and Wahlestedt. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **Long non-coding RNA SOX2OT: expression signature, splicing patterns, and emerging roles in pluripotency and tumorigenesis**

*Alireza Shahryari <sup>1</sup> , Marie Saghaeian Jazi <sup>2</sup> , Nader M. Samaei <sup>3</sup> and Seyed J. Mowla <sup>4</sup> \**

*<sup>1</sup> Stem Cell Research Center, Golestan University of Medical Sciences, Gorgan, Iran, <sup>2</sup> Department of Molecular Medicine, Faculty of Advanced Medical Technologies, Golestan University of Medical Sciences, Gorgan, Iran, <sup>3</sup> Department of Medical Genetics, Faculty of Advanced Medical Technologies, Golestan University of Medical Sciences, Gorgan, Iran, <sup>4</sup> Department of Molecular Genetics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran*

#### *Edited by:*

*Michael Rossbach, Genome Institute of Singapore, Singapore*

#### *Reviewed by:*

*Xin-An Liu, The Scripps Research Institute, USA Ralf Jauch, Chinese Academy of Sciences, China*

#### *\*Correspondence:*

*Seyed J. Mowla, Department of Molecular Genetics, Faculty of Biological Sciences, Tarbiat Modares University, P.O. Box 14115-175, Tehran, Iran sjmowla@modares.ac.ir*

#### *Specialty section:*

*This article was submitted to RNA, a section of the journal Frontiers in Genetics*

*Received: 27 February 2015 Accepted: 18 May 2015 Published: 17 June 2015*

#### *Citation:*

*Shahryari A, Saghaeian Jazi M, Samaei NM and Mowla SJ (2015) Long non-coding RNA SOX2OT: expression signature, splicing patterns, and emerging roles in pluripotency and tumorigenesis. Front. Genet. 6:196. doi: 10.3389/fgene.2015.00196*

SOX2 overlapping transcript (SOX2OT) is a long non-coding RNA that harbors one of the major regulators of pluripotency, the SOX2 gene, in its intronic region. The *SOX2OT* gene maps to the human chromosome 3q26.3 (Chr3q26.3) locus and extends over a highly conserved region of more than 700 kb. Little is known about the exact role of SOX2OT; however, recent studies have demonstrated a positive role for it in the transcriptional regulation of the *SOX2* gene. Similar to SOX2, SOX2OT is highly expressed in embryonic stem cells and down-regulated upon the induction of differentiation. SOX2OT is dynamically regulated during the embryogenesis of vertebrates, and restricted to the brain in adult mice and humans. Recently, the dysregulation of SOX2OT expression and its concomitant expression with SOX2 have been highlighted in some somatic cancers, including esophageal squamous cell carcinoma, lung squamous cell carcinoma, and breast cancer. Interestingly, SOX2OT is differentially spliced into multiple mRNA-like transcripts in stem and cancer cells. In this review, we describe the structural and functional features of *SOX2OT*, with an emphasis on its expression signature, its splicing patterns, and its critical function in the regulation of *SOX2* expression during development and tumorigenesis.

#### **Keywords: lncRNA, SOX2OT, splicing pattern, expression signature, pluripotency, tumorigenesis, stem cell**

# **Introduction**

According to recent genome-wide studies, most of the human genome is transcribed, yielding a complex network of large and small RNA molecules in human cells. However, only 1–2% of the transcripts have the capacity for protein translation (Kapranov et al., 2007; Guttman et al., 2009). The new class of long (or large) non-coding RNAs (lncRNAs) comprises the largest proportion of the human transcriptome. Little is known about the exact functional roles of lncRNAs in humans. Nevertheless, some recent studies have reported dysregulation of lncRNAs in several human disorders. Key roles of lncRNAs in the regulation of pluripotency, stem cell differentiation, and tumorigenesis are emerging (Perez et al., 2007; Gupta et al., 2010; Loewer et al., 2010; Esteller, 2011; Ng et al., 2011; Prensner and Chinnaiyan, 2011). Furthermore, a number of studies have achieved a therapeutic effect for some genetic disorders by targeting an lncRNA *in vitro* and *in vivo* (Gupta et al., 2010; Gutschner et al., 2013; Meng et al., 2015).

SOX2 is an HMG-box transcription factor that is essential for the maintenance of self-renewal and pluripotency of undifferentiated embryonic stem cells (Avilion et al., 2003; Fong et al., 2008). More interestingly, SOX2, along with OCT4, c-Myc and Klf4, plays a critical role in the generation of induced pluripotent stem cells (iPSC) from both adult human and mouse somatic cells (Takahashi and Yamanaka, 2006; Takahashi et al., 2007). Recently, it has been suggested that SOX2 promotes tumor initiation and controls cancer stem cell properties in squamous cell carcinoma (SCC) of the skin (Boumahdi et al., 2014). The single-exon *SOX2* gene was mapped to the human chromosome 3q26.3 (Chr3q26.3) locus, where it is embedded within the intronic region of a multi-exon lncRNA known as *SOX2 overlapping transcript (SOX2OT)*, which is transcribed in the same orientation as *SOX2* (Fantes et al., 2003). While little is known about the exact role of *SOX2OT*, recent studies have demonstrated a positive role for it in the regulation of the *SOX2* gene in human stem cells (Amaral et al., 2009; Shahryari et al., 2014).

The human *SOX2OT* gene has a high nucleotide identity with its ortholog in mouse and other vertebrates, demonstrating a high degree of evolutionary conservation. The multi-exon SOX2OT has no open reading frame (ORF), but is spliced into several mRNA-like transcripts, the longest of which is approximately 3.5 kb in human (Amaral et al., 2009; Shahryari et al., 2014). The closely concomitant expression patterns of SOX2OT and SOX2 in stem cells and some human cancers suggest that they may be co-regulated and involved in similar molecular pathways. Accordingly, some recent reports have demonstrated the transcriptional regulation of *SOX2* by SOX2OT (Amaral et al., 2009; Askarian-Amiri et al., 2014; Hou et al., 2014; Shahryari et al., 2014).

In this review, we have delineated the complex structure and functional features of *SOX2OT* locus, with more emphasis on its expression and splicing patterns, and its potential role in the regulation of SOX2 expression during the development and cancer progression.

# **Genomic Architecture of** *SOX2OT* **Gene Region in Vertebrates**

*SOX2OT* gene with the official symbol of *SOX2-OT* (also known as NCRNA00043) was originally mapped to the human chromosome 3q26.3-q27 locus [current location of NC\_000003.12 (181056680..181742228)], and harbors one of the main regulators of pluripotency, *SOX2* gene [also known as *ANOP3* and *MCOPS3*, with current location of NC\_000003.12 (181711924..181714436)], in its intronic region overlapping in the same transcriptional orientation (Fantes et al., 2003). *SOX2OT* gene is located and extended in a highly conserved region of over 700 kb in human and other vertebrates (Fantes et al., 2003; Amaral et al., 2009; **Figure 1A**).
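The nesting of *SOX2* inside *SOX2OT* can be checked directly from the NC\_000003.12 coordinates quoted above. The following is a minimal sketch, assuming closed, 1-based intervals (the usual convention for RefSeq feature locations); the `Interval` type and the `contains` helper are illustrative names, not part of any published tool.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Closed, 1-based genomic interval on a single chromosome (assumed convention)."""
    name: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Both endpoints are included, hence the +1.
        return self.end - self.start + 1


def contains(outer: Interval, inner: Interval) -> bool:
    """Return True if `inner` lies entirely within `outer`."""
    return outer.start <= inner.start and inner.end <= outer.end


# Coordinates on NC_000003.12 as quoted in the text.
SOX2OT = Interval("SOX2OT", 181_056_680, 181_742_228)
SOX2 = Interval("SOX2", 181_711_924, 181_714_436)

print(SOX2OT.length)           # 685549
print(SOX2.length)             # 2513
print(contains(SOX2OT, SOX2))  # True
```

With these coordinates the annotated *SOX2OT* interval spans roughly 686 kb and fully encloses the approximately 2.5 kb *SOX2* gene.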

Amplification of several genomic regions at chromosome 3q26-qter is associated with multiple human cancers (Massion et al., 2002; Jiang et al., 2004). Gene amplification events in those regions, particularly in the q26–q29 region of chromosome 3, are present in multiple types of SCC of different tissues, including lung, head and neck, esophagus, and cervix (Gebhart and Liehr, 2000; Balsara and Testa, 2002; Bass et al., 2009). Interestingly, a 2 Mb gained/amplified genomic region in 3q26.3 which encompasses *SOX2* and *SOX2OT* has been reported in lung SCC (Hussenet et al., 2010).

Chromatin modification maps of chromosome 3q26.3-q27, acquired from chromatin immunoprecipitation sequencing (ChIP-Seq) data, revealed several transcription start sites (TSSs) for the *SOX2OT* gene (Mikkelsen et al., 2007; Amaral et al., 2009). Those promoter regions, embedded within 1–7 non-coding, highly conserved sequence blocks in vertebrates known as highly conserved elements (HCEs), are probably associated with the regulatory region of *SOX2OT*. These transposon-free blocks, each over 5 kb long, have remained resistant to transposon invasion throughout vertebrate evolution and encompass regulatory sequences controlling the expression of genes that are involved in early development (Simons et al., 2006; Amaral et al., 2009).

Interestingly, analysis of the alternative TSSs of *Sox2ot* orthologs in various vertebrates demonstrated the existence of a distal promoter, located over 500 kb upstream of the *SOX2OT* sequence in mouse and human (*SOX2OT* refers to human and *Sox2ot* to non-human). This distal promoter region, which is associated with the transposon-free region, highly positionally conserved elements, and histone modification marks of promoters, gives rise to a novel isoform of Sox2ot termed Sox2dot (Sox2 distal overlapping transcript), which contains HCE 1, an element with an enhancer-like function in the developing mouse forebrain (Amaral et al., 2009). Here, we bioinformatically analyzed the potential binding sites for transcription factors in highly conserved genomic regions upstream of *SOX2OT* and *SOX2DOT*. As illustrated in **Figure 1**, the data show binding sites in those regulatory regions for several transcription factors involved in cancer progression as well as in stem cell pluripotency and differentiation. Notably, the number and distribution of binding sites for some transcription factors belonging to the POU domain and HMG-box families are striking (**Figures 1B,C**).

Primary sequence analysis of *Sox2ot* in vertebrates, including fish, reptiles, amphibians and mammals, highlighted some highly conserved regions, including a 400-nt segment in exons near the *SOX2* gene, as well as an upstream region with more than 90% identity between the mouse and human genomes. However, there is only a low degree of conservation when the full-length sequence of the *SOX2OT* gene (*∼*750 kb) is compared among different species (Amaral et al., 2009; **Figure 2**).

# **Splicing Patterns of** *SOX2OT* **Gene in Human and Other Vertebrates**

Protein-coding capacity parameters, including ORF length, synonymous versus non-synonymous base substitution rates, and similarity to known proteins, demonstrated that the human and mouse SOX2OT/Sox2ot full-length sequences have no significant protein-coding potential. Nevertheless, some transcripts may still encode small peptides (Dinger et al., 2008; Amaral et al., 2009). Hallmarks of mRNAs, including numerous cap and polyadenylation signals, suggest that the *SOX2OT* gene is transcribed by RNA polymerase II and produces an mRNA-like lncRNA transcript (Numata et al., 2003; Amaral et al., 2009).

**FIGURE 1 | Genomic architecture of Chr3q26.33 region in human and vertebrates. (A)** The banding pattern of chromosome 3 and the location of the *SOX2OT* locus at 3q26.33 are presented according to the UCSC genome browser (hg19 assembly). **(B)** The conserved transcription factor binding sites upstream of the human genomic regions of SOX2OT and its isoform SOX2DOT are presented. The distribution of binding sites for multiple transcription factors of the POU domain and HMG-box families is noticeable. **(C)** A high degree of conservation upstream of the genomic regions of SOX2OT and SOX2DOT in 100 vertebrates is presented, using the Multiz alignment program (adapted from http://genome.ucsc.edu).

Human and mouse SOX2OT have multiple TSSs, and several alternatively spliced variants and polyadenylation sites have already been reported for them (Amaral et al., 2009). Several full-length clones of mouse Sox2ot have been registered with a wide range of sizes, from 638 nucleotides (GenBank accession no. BY721402) to an approximately 3.5 kb form (accession no. AK031919). The various sizes of the registered cDNA clones are in accordance with the Northern blot data obtained from several mouse tissues. While the most abundant isoform of Sox2ot has a size of *∼*3 kb, several rarer ones with approximate sizes of 1, 4, 6, and *>*10 kb have also been reported in some mouse tissues. In zebrafish embryos, Northern blot analysis revealed an abundant 2.5 kb transcript variant and two less abundant transcripts of 1.5 and 6 kb (Amaral et al., 2009).

As we have previously reported, SOX2OT is spliced into several transcript variants, including SOX2OT, SOX2OT-S1, and SOX2OT-S2, which are co-upregulated with the master regulators of pluripotency, SOX2 and OCT4, in esophageal squamous cell carcinoma (ESCC). SOX2OT-S1 (Accession no: JN711430, GI: 379031002) lacks exon 4 of the main transcript, whereas SOX2OT-S2 (Accession no: JN882275, GI: 379031003) lacks exons 3 and 4. In addition to these experimentally validated novel transcripts, the human EST database (dbEST) also provides some ESTs, with GenBank accession numbers BX423294.2, BX442540.2, BX459910.2, DA268964.1, and DA282731.1, which correspond to the novel exon 3-exon 5 junction in SOX2OT-S1, and DA308672.1, which corresponds to the novel exon 2-exon 5 junction in the SOX2OT-S2 variant (Shahryari et al., 2014).
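The relationship between skipped exons and novel exon-exon junctions described above can be made explicit with a small sketch. It assumes, purely for illustration, that the main transcript can be modeled as consecutive exons 1 to 5 (the text only states which exons are skipped and which junctions result); the function names are hypothetical.

```python
def junctions(exons):
    """Return the ordered exon-exon junctions of a transcript as (upstream, downstream) pairs."""
    return list(zip(exons, exons[1:]))


def novel_junctions(variant_exons, reference_exons):
    """Return junctions present in the variant but absent from the reference transcript."""
    reference = set(junctions(reference_exons))
    return [j for j in junctions(variant_exons) if j not in reference]


# Assumption: the main SOX2OT transcript is modeled as exons 1-5.
MAIN = [1, 2, 3, 4, 5]
SOX2OT_S1 = [e for e in MAIN if e != 4]          # lacks exon 4
SOX2OT_S2 = [e for e in MAIN if e not in (3, 4)]  # lacks exons 3 and 4

print(novel_junctions(SOX2OT_S1, MAIN))  # [(3, 5)]
print(novel_junctions(SOX2OT_S2, MAIN))  # [(2, 5)]
```

Under this simplified model, skipping exon 4 yields the exon 3-exon 5 junction and skipping exons 3 and 4 yields the exon 2-exon 5 junction, which are the junction sequences that the listed ESTs would be expected to span.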

More than 15 different *Major Class* (GT-AG) introns, at least 13 spliced variants, and six TSSs have been described for SOX2OT using bioinformatics analysis and AceView annotation (Amaral et al., 2009). Our group has also identified several novel variants of SOX2DOT, which demonstrates a complex pattern of TSSs and alternative splicing of SOX2OT (**Figure 3A**). According to the validated NCBI Reference Sequences (RefSeqs), the splicing patterns of SOX2OT, as illustrated in **Figure 3**, generate at least six transcript variants. Among those, three variants are generated by alternative splicing of SOX2OT, while the other three originate from SOX2DOT (**Figures 3B,C**).

# **Expression Signature of SOX2OT in Somatic, Stem, and Cancer Cells**

Sox2ot isoforms are widely expressed in the whole embryo and newborn mouse, but in adult tissues their expression is primarily restricted to the brain. Sox2ot is also expressed at lower levels in tissues where Sox2 is expressed, such as lung, as well as in tissues where Sox2 is not expressed, such as testis. The Sox2dot isoform, however, is exclusively expressed in adult mouse brain tissues. Concomitant with Sox2, Sox2ot is mainly expressed in mouse embryonic stem cells and downregulated during the course of differentiation. Nevertheless, only Sox2ot is upregulated during the late events of mouse embryoid body differentiation. Moreover, expression of Sox2 and Sox2ot is coregulated during mouse neurosphere differentiation *in vitro*. Accordingly, the Sox2dot isoform is also upregulated upon the induction of differentiation in neurospheres. Similar to mouse, Sox2ot and Sox2 are also dynamically regulated during embryogenesis of other vertebrates, including chicken and zebrafish (Mercer et al., 2008; Amaral et al., 2009).

The lncRNA SOX2OT is co-upregulated with the master regulators of pluripotency, SOX2 and OCT4, in ESCC. qRT-PCR analysis revealed a high level of SOX2OT expression in ESCC tumor samples compared to the apparently non-tumor marginal tissues from the same patients, which suggests a potential role for it in tumorigenesis of the esophagus (Shahryari et al., 2014).

A concomitant expression pattern of *SOX2OT* with that of the *SOX2* and *OCT4* genes has been reported in a pluripotent cell line, NT2. SOX2OT and its variants also proved to have a distinct expression pattern during neural differentiation of NT2 cells. The expression pattern of the SOX2OT variants was similar to those of SOX2 and OCT4, being downregulated upon the induction of neural differentiation. However, in contrast to the complete shut-down of SOX2 and OCT4 expression, a low level of expression of SOX2OT and its variants persists at later time points of differentiation (Shahryari et al., 2014).

Distinct differences in the expression patterns of SOX2OT and SOX2 were observed in breast cancer tissue samples. Analysis of the genome-wide RNA transcript profiles from the Cancer Genome Atlas (breast invasive carcinoma gene expression), based on an RNA-Seq data set of 1106 breast cancer tissue samples, revealed concordant expression of SOX2OT and SOX2 in this somatic cancer. SOX2OT and SOX2 are highly expressed in estrogen receptor positive (ER+) breast cancer cell lines in comparison with the ER*−* ones. In ER+ breast cancer cell lines, expression of SOX2OT is positively correlated with the SOX2 expression level, albeit at lower levels. Moreover, SOX2OT and SOX2 are co-upregulated in suspension culture conditions of breast cancer cell lines, which favor the growth of a cellular subpopulation with cancer stem cell-like properties (Askarian-Amiri et al., 2014).

Overexpression of both SOX2OT and SOX2 has been reported in human primary lung cancer tissues in comparison with the corresponding non-tumor samples. Furthermore, SOX2OT showed a significantly higher expression level in SCC of the lung than in adenocarcinoma. There was a positive correlation between SOX2OT and SOX2 expression levels in the same lung cancer tissue samples (Hou et al., 2014).

In order to expand our knowledge of expression regulation, we reviewed some resources on the gene expression profiles of SOX2OT and SOX2. The expressed sequence tag (EST) profiles available from NCBI show the expression patterns of SOX2 and its overlapping transcript in multiple pools of different human tissues and tumors. The data indicate possible SOX2OT and SOX2 expression in a wide range of human tissues, including brain, connective tissue, esophagus, eye, intestine, kidney, lung, muscle, nerve, and testis. More interestingly, the data hint at a possible upregulation of SOX2OT expression in glioma and kidney tumors. In agreement with the data reported by Amaral et al. (2009), our results also revealed a high enrichment of SOX2OT expression in CNS libraries (**Figures 4A,B**). The high expression of SOX2OT and some other lncRNAs in CNS tissues suggests a potential role for them in animal brain development and function (Amaral and Mattick, 2008).

We also evaluated Cancer Genome Anatomy Project (CGAP) resources to look for correlations between the expression signature of *SOX2OT* and those of other genes. Based on SAGE (Serial Analysis of Gene Expression) data, SOX2OT showed significant positive and negative correlations with multiple key genes involved in neuronal development (e.g., *LRRC4B*), pointing to a function in CNS development. Furthermore, cancer-associated genes (e.g., *ROCK2*, *NFKB*) are also significantly correlated with SOX2OT in SAGE libraries, which highlights the potential function of SOX2OT in cancer progression. Notably, a significant positive correlation was observed with the *POU3F2* transcription factor, which has multiple binding sites in the genomic regulatory region of SOX2OT (**Figure 4C**).

# **The Potential Roles of SOX2OT in Pluripotency and/or Tumorigenesis Through Regulation of SOX2 Expression**

The transcription factor SOX2 regulates the expression of more than one thousand genes in stem cells, where small changes in its expression strikingly alter the self-renewal and pluripotency properties; hence SOX2 acts as a molecular *rheostat* in those cells (Boyer et al., 2005; Boer et al., 2007; Kopp et al., 2008; Amaral et al., 2009; Mandalos et al., 2014). Recent evidence has demonstrated that gene amplification and/or aberrant expression of SOX2 plays a role in the development and tumorigenesis of many types of cancer, including pancreatic carcinoma and prostate, breast, lung, gastric, and esophageal cancers (Gure et al., 2000; Sattler et al., 2000; Sanada et al., 2006; Rodriguez-Pinilla et al., 2007; Chen et al., 2008; Jia et al., 2011; Hütz et al., 2013). SOX2 is also involved in the proliferation and anchorage-independent growth of esophageal and lung cell lines. SOX2-driven tumors expressed both squamous differentiation and pluripotency markers, which introduced SOX2 as a lineage-survival oncogene in SCC of both lung and esophagus (Bass et al., 2009). Nevertheless, the exact regulation of SOX2 in pathway-dependent pluripotency and tumorigenesis has not yet been fully addressed.

LncRNAs have been suggested to regulate the expression of neighboring overlapped or antisense genes via different mechanisms (Mercer et al., 2009; Hung and Chang, 2010). The location of the *SOX2* gene within the intronic region of the *SOX2OT* gene raised the possibility that SOX2 expression is regulated by SOX2OT. This hypothesis is further supported by several experimental observations of gene expression changes during stem cell differentiation or carcinogenesis, and also by manipulation of SOX2OT expression *in vitro* (Amaral et al., 2009; Askarian-Amiri et al., 2014; Hou et al., 2014; Shahryari et al., 2014). The similar dynamic regulation of Sox2ot transcripts and Sox2 suggests a conserved role for Sox2ot in vertebrate embryogenesis and nervous system development (Amaral et al., 2009).

Using the RNA interference strategy, our group performed a functional assay on SOX2OT, where the data supported our hypothesis on the existence of a positive regulation of SOX2 and OCT4 by SOX2OT (Shahryari et al., 2014). In line with the data, Askarian-Amiri et al. (2014) demonstrated that ectopic expression of SOX2OT caused increased SOX2 expression level. They also demonstrated that the enriched suspension culture of breast cancer cells, which favors stem cell growth, exhibited upregulation of both SOX2 and SOX2OT expression, in comparison to the original adherent cells (Askarian-Amiri et al., 2014).

Furthermore, SOX2OT exerts a regulatory function in cell cycle progression; hence its association with carcinogenesis in breast (Askarian-Amiri et al., 2014), esophagus (Shahryari et al., 2014), and lung (Hussenet et al., 2010; Hou et al., 2014) cancers is not surprising. SOX2OT controls lung cancer cell proliferation, and represents a novel prognostic indicator for this cancer (Hou et al., 2014). Knockdown of SOX2OT induced G2/M arrest, prevented S phase entry, and inhibited cell proliferation, which correlated with reduced protein levels of Cyclin B1 and Cdc2 in human lung cancer cell lines. SOX2OT moderated lung cancer cell cycle progression through regulating the EZH2 expression level, although no evidence of a physical interaction between them has been observed (Hou et al., 2014). EZH2 (a histone-lysine *N*-methyltransferase enzyme) is a major component of the polycomb repressive complex 2 (PRC2), which is involved in maintaining the transcriptionally repressed state of its target genes (Cardoso et al., 2000; Cao et al., 2002).

High expression levels of SOX2OT and SOX2 are associated with estrogen receptor status and tamoxifen sensitivity of breast cancer cells (Askarian-Amiri et al., 2014). Co-upregulation of SOX2OT and SOX2 has been reported in lung tumor tissues, particularly in squamous cell lung carcinoma (Hussenet et al., 2010; Hou et al., 2014), which is related to 3q26.33 genomic amplification (Hou et al., 2014). A statistically significant correlation between SOX2 and SOX2OT expression in cancer tissues (Askarian-Amiri et al., 2014; Hou et al., 2014; Shahryari et al., 2014) suggests a possible role for SOX2OT in the regulation of SOX2 expression.
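For readers unfamiliar with how such an expression correlation is quantified, the following is a minimal sketch of a Pearson correlation coefficient between paired expression measurements. It is a generic illustration, not the statistical procedure of the cited studies (which may have used other tests), and the function name is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)


# Placeholder values only (not data from any study): a perfectly linear pair.
print(pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
```

In practice, `x` and `y` would hold the SOX2OT and SOX2 expression levels measured in the same samples, and a significance test on the coefficient would be reported alongside it.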

Altogether, current evidence indicates a functional association between SOX2OT and SOX2 in tumorigenesis, cellular differentiation, and pluripotency (**Table 1**). Yet, more remains to be investigated on the mechanisms underlying this regulation.

# **Concluding Remarks**

According to recent findings, a large number of lncRNAs primarily exert their biological functions through the induction of epigenetic events, including DNA methylation or histone modifications, in their target genes. This is mediated by the well-known chromatin modifying complexes PRC1 and PRC2, as well as other related complexes, in a *cis*- or *trans*-acting manner (Prensner and Chinnaiyan, 2011; Wang and Chang, 2011; Brockdorff, 2013). Multiple lncRNAs, including HOTAIR, ANCR, and ANRIL, are able to recruit PRC1 or PRC2 complexes to genomic regulatory regions of their target genes to reshape the chromatin state and regulate their expression (Gupta et al., 2010; Aguilo et al., 2011; Kretz et al., 2012).

**TABLE 1 | Recent studies which highlighted emerging roles of SOX2OT in pluripotency and carcinogenesis.**

LncRNA ANRIL is involved in various mechanisms of epigenetic regulation including triggering a repression of INK4 locus by SUZ12 in PRC2 (Kotake et al., 2011), an induction of chromatin silencing of the CDKN2A/B genes through interaction with CBX7 in PRC1 (Yap et al., 2010), and an alteration of DNA methylation of the locus in differentiated cells (Yu et al., 2008). Genomic association of *SOX2* and *SOX2OT* remarkably resembles that of *ANRIL* and *CDKN2B*. Similarly, the lncRNA *ANRIL* holds the protein-coding gene *CDKN2B* in its intronic region, albeit in the antisense/opposite strand.

A brain-specific lncRNA known as RMST, which is involved in modulating neurogenesis, physically interacts with SOX2. By acting as a transcriptional coregulator, RMST helps SOX2 bind to the regulatory regions of target genes that have a role in the regulation of neural stem cell fate (Ng et al., 2013). Although recent studies on SOX2OT and SOX2 have not claimed the existence of any physical interaction between them, functional assays based on both knockdown and overexpression have demonstrated that SOX2OT has a positive effect on SOX2 expression (Askarian-Amiri et al., 2014; Shahryari et al., 2014). As mentioned above, SOX2OT regulates the expression of EZH2 (in PRC2); however, whether SOX2OT regulates SOX2 expression through PRC2 or through another molecular mechanism remains largely unclear.

Several isoforms of Sox2ot which originated from alternative TSSs are associated with chromatin modifications characteristic of well-known promoters in HCEs. These isoforms have tissue or cell type specific signature, and are differentially regulated (Kimura et al., 2006; Denoeud et al., 2007; Amaral et al., 2009). This event is more prominent in SOX2DOT isoform which has a specific tissue expression pattern restricted to the adult mouse brain. SOX2DOT also demonstrates different expression patterns during differentiation of ESCs and neurospheres. The existence of alternative splicing and alternative TSSs suggests that the different transcripts of Sox2ot might have differential regulation and function (Amaral et al., 2009).

Moreover, the sequences registered for SOX2OT in the NCBI EST database suggest that SOX2OT could have more than three splicing variants, each with a unique tissue or cell type specific expression signature. In addition, the SOX2DOT isoform indicates a more complex splicing pattern for SOX2OT. Altogether, the overlapping expression of SOX2OT and SOX2, and the conserved association between them in different developmental systems of vertebrates as well as in human cancer and stem cells, support the existence of a complex functional regulatory relationship. The latter could be a consequence of shared regulatory elements that control the expression of both Sox2ot and Sox2 (Amaral et al., 2009; Askarian-Amiri et al., 2014; Shahryari et al., 2014).

Several conserved genomic regions upstream of *SOX2OT* and *SOX2DOT* serve as the binding sites for key transcription factors responsible for controlling the pluripotency as well as tumorigenesis processes. This observation along with the observed correlations between the expression of SOX2OT variants with that of key genes promoting those events, all suggested a key role for SOX2OT in pluripotency and tumorigenesis.

In this review we have provided insights into the structural characteristics, epigenetic modifications, and splicing patterns of the *SOX2OT* gene. Furthermore, the expression patterns of its variants and their emerging roles in stem cell biology and tumorigenesis are discussed. It is clear that SOX2OT has a positive regulatory effect on SOX2 expression; however, the exact molecular mechanism remains to be elucidated. Specifying SOX2OT-dependent molecular pathways in organ tissue culture or engineered animal models may identify more common pathways between development, pluripotency and tumorigenesis.

In conclusion, current evidence supports the idea that the lncRNA SOX2OT is a key regulatory molecule in mediating pluripotency and tumorigenesis events, probably through regulation of SOX2 expression. The positive effect of SOX2OT on SOX2 expression also supports a role for it in promoting the generation of iPSCs. SOX2OT has the potential to be employed as a novel prognostic indicator or therapeutic target for several human cancers, including breast, lung and esophageal cancers.

# **References**


# **Acknowledgments**

This work was supported by a research grant from the Iranian Council of Stem Cell Technology, and Deputy of Research and Technology of Golestan University of Medical Sciences.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Shahryari, Saghaeian Jazi, Samaei and Mowla. This is an openaccess article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **Exosomal lncRNA-p21 levels may help to distinguish prostate cancer from benign disease**

*Mustafa Işın<sup>1</sup>, Ege Uysaler<sup>1</sup>, Emre Özgür<sup>1</sup>, Hikmet Köseoğlu<sup>2</sup>, Öner Şanlı<sup>3</sup>, Ömer B. Yücel<sup>2</sup>, Uğur Gezer<sup>1</sup> and Nejat Dalay<sup>1</sup>\**

*<sup>1</sup> Department of Basic Oncology, Oncology Institute, Istanbul University, Istanbul, Turkey, <sup>2</sup> Department of Urology, School of Medicine, Istanbul Hospital, Başkent University, Istanbul, Turkey, <sup>3</sup> Department of Urology, Istanbul Medical Faculty, Istanbul University, Istanbul, Turkey*

#### *Edited by:*

*Mohammad Faghihi, University of Miami Miller School of Medicine, USA*

#### *Reviewed by:*

*Georges St. Laurent, St. Laurent Institute, USA Gregory C. Sartor, University of Miami Miller School of Medicine, USA Mohammadreza Hajjari, Shahid Chamran University of Ahvaz, Iran*

#### *\*Correspondence:*

*Nejat Dalay, Department of Basic Oncology, Oncology Institute, Istanbul University, Millet Cad., Istanbul 34093, Turkey ndalay@yahoo.com*

#### *Specialty section:*

*This article was submitted to RNA, a section of the journal Frontiers in Genetics*

*Received: 28 February 2015 Paper pending published: 24 March 2015 Accepted: 14 April 2015 Published: 06 May 2015*

#### *Citation:*

*Işın M, Uysaler E, Özgür E, Köseoğlu H, Şanlı Ö, Yücel ÖB, Gezer U and Dalay N (2015) Exosomal lncRNA-p21 levels may help to distinguish prostate cancer from benign disease. Front. Genet. 6:168. doi: 10.3389/fgene.2015.00168*

Exosomes are membranous vesicles containing various biomolecules, including lncRNAs, which are involved in cellular communication and are secreted from many cells, including cancer cells. In our study, we investigated the exosomal GAS5 and lincRNA-p21 lncRNA levels in urine samples from 30 patients with prostate cancer (PCa) and 49 patients with benign prostatic hyperplasia (BPH). Quantification of lncRNA molecules was performed by real-time PCR. We observed a significant difference in the exosomal lincRNA-p21 levels between PCa and BPH patients, whereas the GAS5 levels did not reveal a difference. Our data suggest that the discriminative potential of exosomal lincRNA-p21 levels may help to improve the diagnostic prediction of the malignant state for patients with PCa.

**Keywords: prostate cancer, exosome, lncRNA, benign prostatic hyperplasia, non-invasive diagnosis**

# **Introduction**

Prostate cancer (PCa) is the second most common malignancy and accounts for 15% of the cancers in men. It is the fifth leading cause of death and almost 70% of the cases occur in the developed countries (Ferlay et al., 2015). The diagnosis of PCa is performed via histopathological evaluation of biopsy samples, a procedure with several known disadvantages such as bleeding and infection (Loeb et al., 2013). Although use of the prostate specific antigen (PSA) as a diagnostic marker has improved the detection and management of PCa (Lippi et al., 2009), its low specificity and the lack of other predictive parameters for the progression of the disease make the stratification of patients with high-risk or indolent PCa difficult (Roddam et al., 2005; Nogueira et al., 2010).

Recently, a prostate-specific lncRNA, the PCa Gene 3 (PCA3), has been approved as an additional test to determine the need for biopsies in PCa. Unfortunately, negative PCA3 results in indolent cancer carriers and high grade prostatic intraepithelial neoplasia (HGPIN; Morote et al., 2010; Auprich et al., 2011a) render the biomarker insufficient. The level of TMPRSS2-ERG fusion transcripts in cells collected from urine after digital examination has also been evaluated together with the PCA3 levels for a more accurate diagnosis (Leyten et al., 2014). However, none of these approaches can satisfactorily distinguish high-risk from indolent cancer. Therefore, identification of new biomarkers to correctly identify patients needing more aggressive treatment would help to spare individuals with localized tumors unnecessary biopsies and the side effects of overtreatment.

Exosomes are small membranous vesicles originating from the endosomal compartment which function as messengers in intercellular communication (Simpson et al., 2008; Bang and Thum, 2012; Gezer et al., 2014). They are secreted and released by cells and bind to the receptors on recipient cells, thereby transferring the signal. Exosome secretion is an evolutionarily conserved cellular mechanism dating back to Archaea, but the role of exosomes as cellular messengers has been described only recently (Taylor et al., 2006; Deatherage and Cookson, 2012). Exosomes contain proteins and various types of RNA molecules, including lncRNAs. Non-exonic transcripts as a whole constitute the majority of non-ribosomal RNA molecules in the cell, and lincRNAs constitute a significant portion of this fraction (Kapranov et al., 2010). Exosomes secreted from the prostate can be detected in semen (Ronquist and Brody, 1985) and urine (Pisitkun et al., 2004). In recent years it has been shown that PCA3 and several microRNAs are present in exosomes isolated from PCa patients (Dijkstra et al., 2014; Huang et al., 2015).

LincRNA-p21 and GAS5 lncRNA act as tumor suppressor molecules in the cellular machinery (Schneider et al., 1988; Huarte et al., 2010). Expression of lincRNA-p21 is stimulated by the p53 tumor suppressor protein and upon transcription it suppresses expression of the genes transcriptionally regulated by p53 by binding to the hnRNP-K complex (Huarte et al., 2010). GAS5 plays a role in the induction of apoptosis. It suppresses several antiapoptotic genes by binding to the glucocorticoid receptor (GR) and hence prevents GR from binding to the glucocorticoid response elements on the target DNA molecule (Kino et al., 2010). Recent data indicate that lncRNA molecules may exhibit tissue- and disease-specific expression which can provide important potential biomarkers specific to the particular cancer types (Yaman Agaoglu et al., 2011; Gezer et al., 2014). However, it should be noted that lncRNA molecules are associated both with cancer and pluripotency which can be a confounding factor (St. Laurent et al., 2013). In this study, we aimed to evaluate the diagnostic utility of exosomal lincRNA-p21 and GAS5 levels in individuals with benign prostatic hyperplasia (BPH) and PCa.

## **Materials and Methods**

#### **Exosomal RNA Isolation**

Thirty patients with PCa (median age 64 *±* 6) and 49 patients with BPH (median age 68 *±* 9) were enrolled in the study. The study was approved by the Ethics Committee of the Istanbul Faculty of Medicine and informed consent was obtained from the participants. All patients were Caucasian and the disease scores of the patients are given in **Table 1**. Urine samples were collected from the patients after digital rectal examination and were centrifuged for 10 min at 1000 rpm in a conical tube to remove cells and debris. The supernatant was transferred to another conical tube and centrifuged at 2000 rpm for 10 min to remove remaining debris and bacteria. Then 10 ml of cell-free urine was transferred to a new 15 ml tube and stored at *−*80°C until use. Exosomal RNA was extracted according to the manufacturer's protocol using the "Urine Exosome RNA Isolation Kit" (Norgen Biotek, Thorold, ON, Canada).

#### **Quantification of lncRNAs**

Exosomal RNA isolated from urine samples was used for cDNA synthesis using the First Strand cDNA Synthesis kit (Thermo Scientific, West Palm Beach, FL, USA) according to the manufacturer's instructions. The real-time amplification of lncRNA molecules was performed using the LightCycler 480 system (Roche, Germany). SYBR Green (Roche) was used as the fluorescent molecule. The primer sequences are shown in **Table 2**.

**TABLE 1 | Disease scores of the patients.**

**TABLE 2 | The primer sequences used in the study.**

The PCR reaction included an initial "hot start" for 10 min., followed by 45 cycles of amplification. Each cycle consisted of a denaturation step at 95°C for 10 s, annealing starting at 60°C for 20 s and decreasing by 2°C every two cycles down to 55°C, and amplification at 72°C for 30 s. For quantification of lncRNAs, the ∆∆Ct method was used. The GAPDH gene was used as the reference. All experiments were performed twice and the mean values were calculated.
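The relative quantification described above can be sketched in a few lines. This is only an illustration of the general ∆∆Ct calculation, not the authors' actual analysis: the text names GAPDH as the reference but does not state which sample served as the calibrator, so the calibrator arguments, the function names and all Ct values below are hypothetical. The sketch also assumes the usual simplification of 100% amplification efficiency (a doubling per cycle).

```python
def mean_ct(replicates):
    """Average the Ct values of replicate runs (the text reports duplicate experiments)."""
    return sum(replicates) / len(replicates)


def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative expression by the delta-delta-Ct method, i.e. 2 ** (-ddCt).

    ct_target / ct_reference: Ct of the lncRNA and of GAPDH in the sample.
    ct_target_cal / ct_reference_cal: the same two Ct values in a calibrator
    sample (hypothetical here; the calibrator is not specified in the text).
    Assumes 100% PCR efficiency for both amplicons.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical duplicate Ct readings for one urine sample.
linc_ct = mean_ct([26.0, 26.2])   # lincRNA-p21
gapdh_ct = mean_ct([20.0, 20.2])  # GAPDH reference
fold = relative_expression(linc_ct, gapdh_ct, ct_target_cal=27.1, ct_reference_cal=20.1)
print(round(fold, 3))
```

A sample whose normalized Ct is one cycle lower than the calibrator's comes out at roughly twice the calibrator's expression, which is the sense in which the method reports fold differences.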

#### **Statistical Analyses**

SPSS® ver.21.0 statistical program was used for statistical analysis. Kruskal–Wallis or Mann–Whitney U tests were used when appropriate to compare the parameters (lncRNA and PSA levels). A *p*-value *≤* 0.05 was considered statistically significant.
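For readers without access to SPSS, the two-group comparison can be illustrated with a small standard-library sketch of the Mann–Whitney U test. It is an assumption-laden illustration rather than a reproduction of the SPSS procedure: it uses the normal approximation with a tie correction and no continuity correction, the function names are hypothetical, and the input values are made up.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mid_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for group x, two-sided p-value from the normal approximation)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    u_small = min(u1, n1 * n2 - u1)

    # Variance of U with the standard correction for tied values.
    n = n1 + n2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # all values identical: no evidence of a difference

    z = (u_small - n1 * n2 / 2.0) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u1, p_two_sided


# Hypothetical expression levels, for illustration only.
u, p = mann_whitney_u([0.16, 0.21, 0.30, 0.12], [0.07, 0.05, 0.11, 0.09])
print(u, round(p, 4))
```

For samples as small as this example an exact test would normally be preferred; the approximation is shown because it keeps the sketch short.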

## **Results**

Exosomal levels of the GAS5 and lincRNA-p21 lncRNAs were evaluated in the urine samples of 49 patients diagnosed with BPH and 30 patients with PCa. The distribution of exosomal lncRNA levels is shown in **Figure 1**. The lincRNA-p21 levels were significantly higher in PCa than in BPH (median: 0.163 vs. 0.071; *p* = 0.016; AUC: 0.663). Exosomal GAS5 levels were similar in the two disease groups (median: 1.197 vs. 1.235; *p* = 0.127). The data are shown in **Table 3**.
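The AUC quoted above has a simple interpretation: it is the probability that a randomly chosen PCa sample has a higher marker level than a randomly chosen BPH sample, with ties counted as one half. The following sketch computes it by direct pairwise comparison. The function name and the sample values are hypothetical and serve only to illustrate the calculation, not to reproduce the reported figure.

```python
def roc_auc(case_values, control_values):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    Each (case, control) pair scores 1 if the case value is higher,
    0.5 if the two are tied and 0 otherwise; the AUC is the mean score.
    """
    if not case_values or not control_values:
        raise ValueError("both groups must contain at least one value")
    score = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                score += 1.0
            elif case == control:
                score += 0.5
    return score / (len(case_values) * len(control_values))


# Hypothetical marker levels, for illustration only.
pca_levels = [0.2, 0.3, 0.1]
bph_levels = [0.1, 0.05]
print(round(roc_auc(pca_levels, bph_levels), 3))
```

An AUC of 0.5 corresponds to no discrimination between the groups and 1.0 to perfect separation, which places a value such as 0.663 in the modest range.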

The PSA levels were higher in the PCa group than in the patients with BPH (*p <* 0.001; median values 7.7 and 2.16, respectively). In the BPH group no correlation was observed between the IPSS (International Prostate Symptom Score) and GAS5 or lincRNA-p21 levels. Likewise, no correlation between the clinical stage (Gleason score) and exosomal lncRNA levels was observed in the PCa group. There was no correlation between the PSA levels and the IPSS in patients with BPH, but a correlation was observed between the PSA values and the Gleason score in the PCa group (*p* = 0.123 and 0.049, respectively).

**TABLE 3 | Median values of exosomal lncRNAs and their statistical significances.**

The sensitivity and specificity of lincRNA-p21 alone and of lincRNA-p21 in combination with PSA were calculated using a cut-off value of 2.5 ng/ml for PSA and 0.181 for exosomal lincRNA-p21 expression (**Figure 2**). The specificity for predicting PCa increased from 63 to 94% when the two parameters were combined, while the sensitivity did not change (**Table 4**).
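How cut-off values translate into sensitivity and specificity can be sketched as follows. The two cut-offs are those given in the text; everything else is an assumption made for illustration. In particular, the text does not state how the two markers were combined, so the sketch assumes a sample is called positive only when both markers reach their cut-offs, it assumes that a value equal to the cut-off counts as positive, and the patient records are invented.

```python
PSA_CUTOFF_NG_ML = 2.5   # cut-off stated in the text
LINC_P21_CUTOFF = 0.181  # cut-off stated in the text


def sensitivity_specificity(predicted_positive, has_cancer):
    """Return (sensitivity, specificity) for two paired lists of booleans."""
    pairs = list(zip(predicted_positive, has_cancer))
    tp = sum(1 for pred, truth in pairs if pred and truth)
    fn = sum(1 for pred, truth in pairs if not pred and truth)
    tn = sum(1 for pred, truth in pairs if not pred and not truth)
    fp = sum(1 for pred, truth in pairs if pred and not truth)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    return tp / (tp + fn), tn / (tn + fp)


def linc_positive(linc_p21):
    """Marker-only rule: positive if lincRNA-p21 reaches its cut-off."""
    return linc_p21 >= LINC_P21_CUTOFF


def combined_positive(psa, linc_p21):
    """Assumed combination rule: positive only if BOTH markers reach their cut-offs."""
    return psa >= PSA_CUTOFF_NG_ML and linc_p21 >= LINC_P21_CUTOFF


# Hypothetical records: (PSA in ng/ml, lincRNA-p21 level, has PCa).
patients = [
    (7.7, 0.20, True),
    (3.0, 0.10, True),
    (2.0, 0.25, False),
    (5.0, 0.05, False),
    (1.0, 0.05, False),
]
truth = [cancer for _, _, cancer in patients]
linc_only = [linc_positive(linc) for _, linc, _ in patients]
combined = [combined_positive(psa, linc) for psa, linc, _ in patients]

print(sensitivity_specificity(linc_only, truth))
print(sensitivity_specificity(combined, truth))
```

In this toy data set, requiring both markers removes the one false positive without losing a true positive, so specificity rises while sensitivity stays the same. In general, an "both positive" rule can only keep or lower sensitivity.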

## **Discussion**

Our study is the first report revealing the presence of lincRNA-p21 and GAS5 lncRNA molecules in exosomes derived from urine samples. Circulating GAS5 and lincRNA-p21 have previously been detected in B-cell malignancies (Isin et al., 2014), but exosomal GAS5 and lincRNA-p21 have so far been reported only in exosomes secreted from HeLa and MCF-7 cell lines and not in human tumors (Gezer et al., 2014). Cellular lincRNA-p21 expression has been suggested to affect global gene expression in different cancers by modulating mRNA translation and suppressing the p53 and Wnt/β-catenin signaling pathways (Dimitrova et al., 2014; Wang et al., 2014). Cellular GAS5 expression suggests a role of GAS5 in the regulation of apoptosis in breast cancer cell lines and tumors (Pickard and Williams, 2014) and an inverse association with mTOR expression in PCa cell lines (Yacqub-Usman et al., 2015). In the present study, we observed significantly higher exosomal lincRNA-p21 levels in the patients with PCa.

**TABLE 4 | Sensitivity and specificity of exosomal lincRNA-p21 and of the lincRNA-p21/PSA combination.**

It has been shown that one in six of the prostatectomy specimens may contain indolent cancers which usually do not progress to clinical detection during the lifetime of the patient (Epstein et al., 1994). On the other hand, several studies suggest that more than 50% of the cancers which are initially diagnosed as localized tumors are actually advanced at the time of treatment (Chun et al., 2008; Jeldres et al., 2008).

The PSA test is not sensitive enough to predict the presence, extent and risk of recurrence of PCa (Albertsen, 2010; Zeliadt et al., 2010; Friedrich, 2011). Therefore, there is a definite need for non-invasive diagnostic biomarkers which can distinguish low- and high-risk patients in clinical decision-making. Even though non-invasive detection of PCA3 levels in urine after digital rectal examination may provide some useful information on the need for a repeated biopsy, evaluation of PCA3 also fails to be specific (Bradley et al., 2013). Dijkstra et al. (2014) have shown that the diagnostic performance of PCA3 was better in exosomes than in urine, where the PCA3 score was normalized with PSA mRNA in order to achieve a higher performance. Although their study group was quite small, the authors suggested an advantage of exosomal PCA3 evaluation while indicating that their data should be validated in larger clinical cohorts. The discriminative capacity achieved by exosomal PCA3 in that study (AUC: 0.524) is lower than that of lincRNA-p21 in the present study (AUC: 0.663, CI: 95%). These data suggest that analysis of exosomal lincRNA-p21 in urine may perform better than PCA3 in detecting PCa.

There has recently been active controversy over decreasing the cut-off level of PCA3 from 35 to 25 (Nakanishi et al., 2008; Auprich et al., 2011b; de la Taille et al., 2011; Ploussard et al., 2011), which is expected to increase its diagnostic sensitivity. A meta-analysis by Luo et al. (2014) reported that the sensitivity of PCA3 for detecting PCa ranges from 46.9 to 82.3% and the specificity from 55 to 92%. In our study the specificity of lincRNA-p21 for PCa was 94% when combined with PSA.

Expression of TMPRSS2-ERG has also been analyzed in urine samples (Leyten et al., 2014). It has been reported that combination of PCA3 and TMPRSS2-ERG expression increased the sensitivity of detecting PCa. However, the biomarker pair still failed to detect indolent tumors with a Gleason score of *≥*7.

A recent study of exosomal microRNA molecules derived from plasma samples of 29 patients with castration-resistant PCa (CRPCa) reported two significant miRNA molecules (miR-1290 and miR-375). These were later evaluated in 100 patients with CRPCa as prognostic markers and were significantly associated with poor prognosis, a finding that needs prospective validation (Huang et al., 2015).

In the absence of reliable markers for the detection and classification of PCa, urinary exosomal lncRNAs may provide an alternative, non-invasive source of biomarkers. The stability and longevity of the RNA molecules are ideal for non-invasive diagnosis and characterization of the tumors. Our study demonstrates for the first time that exosomal lncRNAs detected in urine may act as suitable biomarkers with potential therapeutic implications. LincRNA-p21 provides a promising marker with therapeutic potential for the detection and stratification of PCa. Further studies with larger patient groups are needed to validate the therapeutic utility of exosomal lincRNA-p21 levels in urine.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Işın, Uysaler, Özgür, Köseoglu, Şanlı, Yücel, Gezer and Dalay. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*