In this work we investigated gene expression profiles from clinical samples of pancreatic cancer using a custom cDNA microarray enriched in probes that interrogate long potentially noncoding RNAs mapping to intronic and intergenic regions of the human genome, plus a collection of protein-coding genes previously associated with cancer in the literature. By comparing expression profiles of 38 pancreatic clinical samples with four distinct tissue histologies (primary adenocarcinoma, adjacent non-tumor tissue, chronic pancreatitis, metastasis), we detected in all types of pancreatic tissues studied a proportion of intronic and intergenic transcripts comparable to the one observed for protein-coding mRNAs. There are several reports of aberrant expression of microRNAs [21–24], but this is to our knowledge the first time that the expression of lncRNAs has been studied in pancreatic cancer.
We observed that most intronic and intergenic transcripts expressed in pancreatic tissues have little or no coding potential (96% of total). Comparison with sequence contigs resulting from the assembly of EST/mRNA data produced in our group showed that these transcripts have a mean size of at least 779 nt and are therefore longer than the EST probes deposited on the microarrays, which represent only parts of longer noncoding RNAs transcribed from intronic regions. Most putative intergenic transcripts (~81%) were located more than 1 kb away from a UTR of an annotated gene, suggesting that, for the most part, these are indeed intergenic transcripts rather than uncharacterized untranslated regions of incomplete mRNAs.
While it is clear that lncRNAs may exert diverse cellular functions through multiple molecular mechanisms [12, 56, 57], it has been suggested that a fraction of the noncoding complement of the transcriptome may correspond to transcriptional noise resulting from RNA polymerase activity in regions of open chromatin or to intronic segments of processed mRNAs. Our expression measurements of intronic lncRNAs do not allow us to distinguish between i) intron lariats resulting from splicing of a pre-mRNA and ii) independent transcriptional units located within intron-annotated genomic regions. We used poly(A+)-selected RNA fractions followed by oligo-dT primed reverse transcription to minimize the chance of labeling targets from non-polyadenylated spliced lariats. We argue that the identification of subsets of transcripts that map to intronic regions and whose steady-state levels allow detection by microarrays indicates that these are not rapidly turned-over intron lariats. We have also performed a series of analyses to obtain additional evidence supporting the notion that the intronic/intergenic lncRNAs detected in pancreatic tissues are bona fide cellular transcripts, as discussed below.
We first sought independent confirmation of intronic/intergenic lncRNA expression using RNAseq data generated from 9 distinct tissue libraries. We found that approximately 80% of the intronic/intergenic lncRNAs detected in pancreatic tissues were also detected in at least one RNAseq library (Figure 1). Most transcripts confirmed by the RNAseq data were detected either i) in only a single tissue type other than pancreas, or ii) in all 9 tissue libraries plus pancreas, indicating the prevalence of subsets of noncoding transcripts with tissue-specific or broad expression patterns, respectively (Figure 1).
While only a fraction of the intronic/intergenic lncRNAs expressed in pancreatic tissues overlapped evolutionarily conserved DNA elements in vertebrates, mammals and primates, we observed a significant enrichment (p < 0.05) compared to randomly selected control regions. This result suggests that at least a fraction of these lncRNAs are under purifying selection in the vertebrate lineage and are therefore likely to be biologically functional. For the remaining transcripts, absence of sequence conservation should not be taken as evidence of a lack of biological relevance, since well-characterized functional lncRNAs are known to be poorly conserved across their full-length sequence.
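The comparison against randomly selected control regions can be illustrated with a small resampling sketch. It is not the procedure used in this study: the interval representation, the single linear "genome" and the function names `overlaps` and `empirical_enrichment_p` are all assumptions made for illustration. The idea is to count how many observed transcripts overlap a conserved element, then ask how often length-matched regions placed at random reach at least that count.

```python
import random

def overlaps(region, elements):
    """Return True if the half-open (start, end) region overlaps any element."""
    start, end = region
    return any(start < e_end and e_start < end for e_start, e_end in elements)

def empirical_enrichment_p(observed_regions, conserved, genome_length,
                           n_draws=1000, seed=0):
    """Empirical one-sided p-value for enrichment of overlap with conserved elements.

    Hypothetical helper: each draw relocates every observed region to a random
    position on a single linear sequence of length `genome_length`, keeping its
    length, and counts how many relocated regions overlap a conserved element.
    """
    rng = random.Random(seed)
    observed = sum(overlaps(r, conserved) for r in observed_regions)
    at_least_as_extreme = 0
    for _ in range(n_draws):
        count = 0
        for start, end in observed_regions:
            length = end - start
            new_start = rng.randrange(0, genome_length - length + 1)
            count += overlaps((new_start, new_start + length), conserved)
        if count >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_draws + 1)
    return observed, p_value
```

A real analysis would also match control regions for chromosome, genomic context (intronic versus intergenic) and mappability, which this sketch ignores.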
As proposed by Washietl et al., mapping conserved RNA secondary structure may lead to the discovery of novel functional lncRNAs. We found that a small fraction of the lncRNAs expressed in pancreas (15%, i.e., 49/335) are predicted to form stable structural domains that could be important for their processing or biological function. It is well documented in the literature that small regulatory RNAs can be generated by processing of long RNA precursors transcribed from intronic and intergenic regions of the genome. To ask what fraction of our set of lncRNAs expressed in pancreatic tissues could be precursors of small RNAs, we compared their sequences to those of known microRNAs and snoRNAs [32, 33]. Only a small overlap was found, indicating that long intronic/intergenic transcripts are predominantly not precursors of known microRNAs/snoRNAs, yet leaving open the possibility that these transcripts may represent precursors of uncharacterized novel small RNAs.
We found significant enrichment of H3K4me3, a promoter-associated chromatin mark frequently found in RNA Pol II transcribed regions [35, 61], in the vicinity (up to 2 kb) of intronic noncoding transcripts (p < 0.05) as compared to randomly selected genomic DNA sequences. A comparable H3K4me3 enrichment was observed near known protein-coding transcripts, suggesting that transcription of protein-coding mRNAs and intronic lncRNAs initiates at promoter regions with similar chromatin contexts. We also observed a significant enrichment of CAGE tags proximal to known start sites of intronic lncRNAs expressed in pancreatic tissues, corroborating the notion that at least a fraction of these are independent transcriptional units. Since pancreatic tissues were absent from the study that generated the CAGE tags used for cross-reference, these results possibly underestimate the co-localization of intronic/intergenic lncRNAs with bona fide transcription start sites of capped transcripts.
In contrast to protein-coding mRNAs, we did not find significant enrichment of CpG islands in the vicinity of intronic and intergenic RNA sequences expressed in pancreatic tissues. Based on this observation, we propose that methylation of CpG islands is not involved in the transcriptional regulation of most intronic/intergenic lncRNAs expressed in pancreatic tissues. Nonetheless, the full set of observations regarding structure, conservation and genomic context argues that at least a fraction of the intronic/intergenic transcripts detected in pancreatic tissues are independent transcriptional units rather than transcriptional noise originating from random Pol II firing, prompting us to investigate in more detail their relative expression levels in tumor and non-tumor pancreatic tissues.
Differential expression of intronic lncRNAs in prostate and renal cancer has already been documented [17, 28]. Here we extend these observations to pancreatic cancer, asking whether there are sets of intronic/intergenic lncRNAs deregulated in clinical samples of pancreatic tumor. Comparing expression profiles from primary tumors with samples from histologically non-malignant pancreatic tissue and chronic pancreatitis (CP), we identified a 147-gene signature correlated with primary pancreatic tumor. This strategy was devised to favor the identification of tumor-specific markers rather than transcripts associated with the stromal cell component, which is augmented in both tumor and CP samples [36, 37]. We sought to validate the pancreatic cancer expression signature by performing a meta-analysis with published gene expression studies of pancreatic cancer. Only 23% of the protein-coding mRNAs present in our pancreatic cancer signature were also identified in other reports. This modest overlap can be accounted for by differences in platforms and by the heterogeneity of pancreatic tumor samples. Nevertheless, we observed high agreement (17/24, 71%) between the expression changes measured in our signature and those retrieved from published data, which provides independent support for our result and validates our sample set and methodological approach. This set included genes already reported in the literature as differentially expressed in pancreatic cancer and that have been investigated as biomarkers for pancreatic cancer (i.e. S100A6, S100P, TIMP1 and NF-κB). In agreement with previous findings, the analysis of enriched gene categories in the pancreatic cancer expression signature indicated the over-representation of genes involved in focal adhesion.
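The agreement figure from the meta-analysis (17 of 24 shared genes, about 71%) is a direction-of-change concordance. A minimal sketch of that calculation follows. It assumes each study is summarized as a mapping from gene symbol to log fold change (tumor versus non-tumor); the function name `direction_concordance` and the toy gene labels in the usage lines are hypothetical and not taken from the study.

```python
def direction_concordance(ours, published):
    """Count shared genes whose expression change has the same sign in both sets.

    `ours` and `published` are assumed to map gene symbols to log fold changes.
    Returns (number of concordant genes, number of shared genes).
    """
    shared = sorted(set(ours) & set(published))
    concordant = [g for g in shared if (ours[g] > 0) == (published[g] > 0)]
    return len(concordant), len(shared)

# Toy usage with placeholder gene labels (not real measurements).
ours = {"geneA": 1.2, "geneB": -0.8, "geneC": 0.5, "geneD": 2.0}
published = {"geneA": 0.9, "geneB": 0.4, "geneC": 1.1, "geneE": -1.0}
n_concordant, n_shared = direction_concordance(ours, published)
print(f"{n_concordant}/{n_shared} shared genes change in the same direction")
```

With the counts reported in the text, the same ratio is 17/24, which rounds to 71%.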
Over-representation of focal adhesion genes in the pancreatic cancer signature suggests that deregulation of genes encoding proteins involved in the connection and signaling to the extracellular matrix plays an important role in the malignant transformation and/or maintenance of pancreatic adenocarcinomas. This set included integrin beta 5 (ITGB5), which we found to be upregulated in pancreatic adenocarcinoma. The ITGB5 protein has been investigated as a diagnostic biomarker in non-small cell lung cancer and is a target of the inhibitor drug EMD121974, which is under clinical trial. Thus, ITGB5 is an attractive candidate to be tested as a biomarker and/or new drug target in pancreatic cancer.
Interestingly, a significant fraction (29%) of the 147-gene signature correlated with primary pancreatic tumor consisted of lncRNAs mapping to intronic or intergenic regions, suggesting that noncoding RNAs could play roles in the tumorigenesis of pancreatic cancer. This result prompted us to investigate the existence of subsets of lncRNAs with altered expression levels in metastatic samples.
We identified a statistically significant metastasis signature of 355 differentially expressed transcripts that includes 220 protein-coding genes, 134 intronic/intergenic transcripts and 6 known lncRNAs (Figure 4 and Additional file 5, Table S3). In addition to protein-coding genes previously shown to be deregulated in pancreatic metastasis (7 out of 19), the metastasis signature comprises known genes already associated with metastasis in other types of cancer (Additional file 7, Table S4), thus pointing to potentially interesting candidates for testing as new targets for treatment of metastatic disease in pancreatic cancer.
The significant fraction of lncRNAs in the metastasis signature (38% of total) suggests that deregulation of these lncRNAs could also be associated with the metastatic process. Expression changes of protein-coding mRNAs from genes of the MAPK pathway have already been described in pancreatic carcinoma [67–69]. Here we found, in the metastasis signature, 9 intronic lncRNAs mapping to genes related to the MAPK pathway. We also identified expression changes in gene loci related to apoptosis, including 42 protein-coding mRNAs and 6 intronic lncRNAs; this pathway was one of 12 described by Jones et al. as genetically altered in pancreatic cancer. Four intronic lncRNAs belong to both categories. These results prompted us to document in more detail the nature of the 11 transcripts mapping to intronic regions of gene loci associated with the MAPK pathway or related to apoptosis, i.e., their orientation relative to the corresponding protein-coding mRNAs.
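The count of 11 transcripts follows from inclusion-exclusion over the two categories. Writing $M$ for the intronic lncRNAs in MAPK-related loci and $A$ for those in apoptosis-related loci (symbols introduced here only for this check):

```latex
\[
|M \cup A| \;=\; |M| + |A| - |M \cap A| \;=\; 9 + 6 - 4 \;=\; 11 .
\]
```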
Strand-specific RT-PCR assays using RNA aliquots from tumor tissue samples showed that 4 intronic transcripts have antisense orientation relative to the protein-coding mRNA: PPP3CB, ATF2, TGFBR2 and MAPK1. Antisense transcripts originating from PPP3CB intronic regions were also detected in MIA PaCa-2 cells. The antisense orientation relative to the corresponding protein-coding mRNA provides strong evidence that these noncoding RNAs are produced from independent transcriptional units, possibly under the control of a different promoter region.
Transcripts mapping to intronic regions with the same orientation as the corresponding protein-coding mRNA were detected in the ATF2, TGFBR2 and MAPK1 loci, as well as in the 7 other gene loci tested (ARRB1, MAP3K1, MAP3K14, MAP2K5, PTEN, DAPK1 and RAPGF2), in both tissue and MIA PaCa-2 RNA samples. These sense-oriented intronic transcripts could indeed be bona fide RNAs originating from independent transcription, but they could also result from reverse transcription of unprocessed mRNA precursors or of stable RNA lariats generated during pre-mRNA splicing. Further experiments will be necessary to determine the precise nature of these sense-oriented intronic RNAs.
The relative abundance of two sense-oriented (DAPK1, MAP3K14) and one antisense-oriented (PPP3CB) intronic transcripts in samples of primary pancreatic adenocarcinoma and pancreatic metastases was independently assessed by qRT-PCR, confirming the results measured in the microarray hybridizations. Four additional intronic lncRNAs showed concordant results between qRT-PCR and the microarrays (ARRB1, RAPGF2, ATF2 and PTEN). Expression changes of 4 intronic lncRNAs were not concordant between qRT-PCR and microarray (MAP3K1, TGFBR2, MAP2K5 and MAPK1). The amount of RNA and the number of patient tissue samples available for the qRT-PCR experiments were limiting, and the marginally significant and non-validated lncRNA candidates were tested in only a few samples in an initial round of validation. It is possible that some of the intronic lncRNA candidates that failed the initial round of validation would still be validated as differentially expressed if tested in additional tissue samples. However, an alternative explanation for the non-validation of some candidates is the presence of array hybridization artifacts such as cross-hybridization or target amplification biases.
Intragenic lncRNAs have been shown to modulate in cis the expression of mRNAs expressed in the same locus [29, 70, 71]. We measured the relative abundance of mRNAs produced in the PPP3CB, DAPK1 and MAP3K14 loci in the same samples and did not observe statistically significant expression differences between primary tumors and metastases. This result suggests that the intronic RNAs produced in these loci do not affect in cis the abundance of the corresponding protein-coding transcripts. This conclusion is also supported by the absence of significant correlation between the expression levels of protein-coding and noncoding RNAs originating from the PPP3CB and DAPK1 loci. The possibility that intronic lncRNAs differentially expressed in metastatic samples may exert regulatory functions acting in trans is compelling and warrants further studies.
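The correlation check mentioned above can be sketched as follows. The text does not state which correlation coefficient was used, so Pearson's r is an assumption here, and the function name `pearson_r` and its inputs (paired per-sample expression values for the lncRNA and the mRNA of one locus) are illustrative only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of expression values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)
```

A coefficient near zero across samples is consistent with no cis relationship, although with few samples a significance test on r has little power, so absence of correlation is weak evidence on its own.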
It has been shown that a significant portion of the noncoding component of the human transcriptome is composed of non-polyadenylated RNAs. We note that our analysis was limited to the set of lncRNAs interrogated by the array platform (Table 1) and to poly(A+)-enriched RNA, and therefore does not describe the full complement of lncRNAs expressed in pancreatic tissues. Thus, additional studies using unbiased approaches such as RNAseq or tiling arrays will be required to catalog all poly(A+) and poly(A-) transcripts expressed in pancreatic tissues with distinct degrees of malignancy and to identify novel regulatory lncRNA candidates involved in malignant transformation and tumor progression.