Triplex DNA-binding proteins are associated with clinical outcomes revealed by proteomic measurements in patients with colorectal cancer
© Nelson et al.; licensee BioMed Central Ltd. 2012
Received: 28 October 2011
Accepted: 26 March 2012
Published: 8 June 2012
Tri- and tetra-nucleotide repeats in mammalian genomes can induce formation of alternative non-B DNA structures such as triplexes and guanine (G)-quadruplexes. These structures can induce mutagenesis, chromosomal translocations and genomic instability. We wanted to determine if proteins that bind triplex DNA structures are quantitatively or qualitatively different between colorectal tumor and adjacent normal tissue and if this binding activity correlates with patient clinical characteristics.
Extracts from 63 human colorectal tumor and adjacent normal tissues were examined by gel shifts (EMSA) for triplex DNA-binding proteins, which were correlated with clinicopathological tumor characteristics using the Mann-Whitney U, Spearman’s rho, Kaplan-Meier and Mantel-Cox log-rank tests. Biotinylated triplex DNA and streptavidin agarose affinity binding were used to purify triplex-binding proteins in RKO cells. Western blotting and reverse-phase protein array were used to measure protein expression in tissue extracts.
Increased triplex DNA-binding activity in tumor extracts correlated significantly with lymphatic disease, metastasis, and reduced overall survival. We identified three multifunctional splicing factors with biotinylated triplex DNA affinity: U2AF65 in cytoplasmic extracts, and PSF and p54nrb in nuclear extracts. Super-shift EMSA with anti-U2AF65 antibodies produced a shifted band of the major EMSA H3 complex, identifying U2AF65 as the protein present in the major EMSA band. U2AF65 expression correlated significantly with EMSA H3 values in all extracts and was higher in extracts from Stage III/IV vs. Stage I/II colon tumors (p = 0.024). EMSA H3 values and U2AF65 expression also correlated significantly with GSK3 beta, beta-catenin, and NF-κB p65 expression, whereas p54nrb and PSF expression correlated with c-Myc, cyclin D1, and CDK4. EMSA values and expression of all three splicing factors correlated with ErbB1, mTOR, PTEN, and Stat5. Western blots confirmed that full-length and truncated beta-catenin expression correlated with U2AF65 expression in tumor extracts.
Increased triplex DNA-binding activity in vitro correlates with lymph node disease, metastasis, and reduced overall survival in colorectal cancer, and increased U2AF65 expression is associated with total and truncated beta-catenin expression in high-stage colorectal tumors.
DNA and RNA are dynamic molecules that adopt several different secondary and tertiary structures. DNA can form a stable triple helix in which a purine- or pyrimidine-rich third strand forms sequence-specific H-bonds (Hoogsteen and reverse-Hoogsteen) with a purine-rich strand in the major groove of the Watson-Crick duplex in polypyrimidine-polypurine repeat sequences. Guanine (G)-rich DNA and RNA can also form G-quadruplexes, which use Hoogsteen and reverse-Hoogsteen G*G bonds in a non-canonical four-stranded topology. G-quadruplexes specifically have been implicated at DNA telomere ends, the purine-rich DNA strands of oncogenic promoters, and in RNA 5’-untranslated regions (UTRs) near translation start sites. For example, a nuclease-sensitive element in the human c-MYC promoter that can form either a DNA triplex or G-quadruplex interferes with DNA transcription. Transient Hoogsteen base pairs have been detected in DNA duplexes bound to transcription factors and in damaged DNA, suggesting that the DNA double helix can resonate and form excited-state Hoogsteen base pairs that can expand its structural complexity.
Genomic instability in association with carcinogenesis is well established and promotes multiple hallmarks of cancer. Repetitive DNA, such as tri- and tetranucleotide sequences, is genetically unstable, and expansions of such DNA repeats are associated with numerous hereditary neurological diseases including Fragile X syndrome, myotonic dystrophy, and Friedreich’s ataxia [6, 7]. Many of these DNA repeat sequences can exist in at least two different conformations, and at least 10 non-B DNA conformations can form, perhaps transiently, at specific sequences due to negative supercoiling generated by DNA replication, transcription, protein binding, or during DNA repair. Non-B DNA structures such as cruciforms, triplexes and G-quadruplexes can cause mutations such as deletions, expansions, and translocations [9, 10]. Bacolla et al. found that genes containing long polypyrimidine-polypurine sequences are more susceptible to chromosomal translocations than genes that do not contain these sequences. Researchers have located “hotspot” regions of the genome at or near sequences with the potential to form non-B DNA structures, including the region in the promoter of the human c-MYC gene capable of forming triplex or G-quadruplex DNA that overlaps with one of the major breakpoint hotspots in c-MYC-induced lymphomas and leukemias [12, 13]. The recently created Non-B Database (http://nonb.abcc.ncifcrf.gov) can be used to predict the capability of a DNA sequence in mammalian genomes to form any of a variety of non-B structures.
While the existence of triplex or G-quadruplex nucleic acids in vivo has yet to achieve mainstream acceptance, eukaryotic proteins that recognize and bind to these alternative structures do exist. For example, the Fragile X mental retardation protein (FMRP) binds an intramolecular G-quartet in target mRNAs, and loss of function of this protein causes the Fragile X mental retardation syndrome. We have studied proteins in Saccharomyces cerevisiae and HeLa carcinoma cells that bind specifically to a purine-motif triplex DNA probe in gel shifts (EMSA) where the third strand is G-rich and photo-crosslinked with a psoralen group (Ps~) [16–18]. Stm1, the major purine-motif triplex DNA-binding protein in S. cerevisiae, also binds to G-quartet DNA and RNA in vitro. Using Southwestern blotting where HeLa nuclear extracts were separated by SDS-PAGE, blotted and probed with the same radio-labeled purine triplex DNA used in EMSA, we found that 100-, 60-, and 15-kDa bands were hybridized with the triplex DNA probe, whereas only the 100-kDa band was also hybridized with the parent duplex DNA probe. RecQ-family helicases, including the WRN helicase, have been shown to preferentially bind to and unwind aberrant DNA structures such as triplex and G-quadruplex DNAs, which are believed to exist in vivo as intermediates in DNA replication, recombination, and repair. The WRN helicase is deficient in patients with Werner syndrome, an autosomal recessive disease causing premature aging that is associated with numerous age-related phenotypes, including a high predisposition to cancer. Others have examined specific aspects of WRN expression in colorectal cancer, such as the presence of allelic variants and colorectal cancer risk and WRN promoter methylation as it correlates with a CpG island methylation phenotype (CIMP)-high diagnosis [21, 22].
These studies led us to ask whether triplex DNA-binding proteins and WRN helicase expression are quantitatively and/or qualitatively different in human colorectal tumors and corresponding normal tissues and whether there is any correlation with clinical prognosis, and to identify purine-motif triplex DNA-binding proteins in human cells.
Numerous genetic, cytogenetic, and epigenetic aberrations act at specific stages in colorectal cancer initiation and progression and influence response to therapy, such as inactivation of tumor suppressor APC as an initiating event and KRAS or BRAF mutations as markers of non-response to EGFR-targeted therapy . High-throughput studies have suggested the existence of additional undiscovered cancer genes that may promote colorectal cancer development [24–26]. Colorectal cancer is also one of the more genetically unstable cancers, with about 65% of sporadic adenomas and cancers being characterized by chromosomal instability (CIN), 10-15% characterized by microsatellite instability (MSI), and approximately 20% having a CIMP phenotype, with some overlap among these characteristics.
We have found higher triplex DNA-binding activity in vitro in colorectal tumor extracts than in corresponding normal tissue extracts using EMSA, and that this increased binding activity correlated significantly with the spread of cancer to the lymph nodes, metastasis, and reduced overall survival. We also found that expression of the triplex/G-quadruplex-unwinding helicase WRN correlated significantly with total triplex DNA-binding activity in EMSAs in both normal and tumor tissue extracts. Biotin purine-motif triplex DNA affinity identified three multifunctional splicing factors: U2AF65, PSF, and p54nrb, and an anti-U2AF65 antibody produced a super-shifted EMSA band. High U2AF65 expression was associated with advanced colon tumor stages and with p54nrb and PSF expression in tumors. U2AF65 expression also correlated significantly with both total and truncated beta-catenin, as well as NF-κB p65, PCNA, EGFR, mTOR, PTEN, and Stat5 in colorectal tumors.
Materials and methods
Preparation of cytoplasmic and nuclear extracts of tissue and cell lines. Tissue samples of tumor and adjacent normal mucosa were collected from surgical resections with informed consent, verified by a pathologist, and snap-frozen in liquid nitrogen. The patients had not previously received any chemotherapy, so the tissues are chemotherapy-naïve. Frozen tissue samples were prepared as described by Asangani et al. The samples were pulverized with a Sartorius Mikrodismembrator, then extracted for 30 min on ice with Schaffner lysis buffer A (10 mM HEPES-Na+ pH 7.9, 10 mM KCl, 0.1 mM EDTA pH 8.0, 0.1 mM EGTA pH 8.0, 1 mM dithiothreitol, 0.5% Triton X-100, Sigma phosphatase inhibitor cocktail 2, and Roche Complete Mini protease inhibitor) and centrifuged at 13,000 rpm, 4°C in a microcentrifuge to produce cytoplasmic extracts. The nuclear pellet was extracted for 30 min on ice with Schaffner buffer C (20 mM HEPES-Na+ pH 7.9, 0.4 M NaCl, 0.1 mM EDTA pH 8.0, 0.1 mM EGTA pH 8.0, 1 mM dithiothreitol, 20% glycerol, with phosphatase and protease inhibitors) and centrifuged at 13,000 rpm, 4°C in a microcentrifuge to produce nuclear extracts. Total protein concentrations were determined using the Pierce BCA Protein Assay kit. Colorectal cancer cell lines and HeLa cytoplasmic or nuclear extracts were similarly prepared using Schaffner buffers A and C, respectively.
Purine-motif triplex DNA formation and 33P-labeling
Purine triplex DNA oligonucleotide sequences and probe formation were as previously described [16, 17]. The parent duplex oligonucleotides are PuGA: 5’ – AATTCCTAAGGGAGGGGAGGGGAGGGTAGCT – 3’ and complementary strand PuCT: 5’ – AGCTACCCTCCCCTCCCCTCCCTTAGG – 3’. The parent duplex DNA was made by annealing equimolar (0.1 mM) concentrations of the PuGA and PuCT oligonucleotides at room temperature after boiling for 2 min in 40 mM Tris-HCl pH 8.0, 10 mM MgCl2, 0.01% NP-40. The purine-motif triplex-forming oligonucleotide (TFO) contained a 4’-(hydroxymethyl)-4,5’,8-trimethylpsoralen-hexyl (Ps~) moiety at the 5’-terminus (Eurogentec): 5’ – Ps~GGG TGG GGT GGG GTG GGT – 3’. To form triplex DNA, the parent duplex DNA and a 10-fold molar excess of TFO were incubated for 4 h at 30°C in 40 mM Tris-HCl pH 8.0, 100 mM MgCl2, 0.01% NP-40. Psoralenated TFO was then cross-linked with the parent DNA duplex with a 366 nm UV transilluminator for 10 min on ice. Purine triplex DNA (1 × 10⁻⁷ M) was 3’ end-labeled with T4 kinase (New England Biolabs) and γ-33P dATP for 1 h at 37°C. Unincorporated labeling dATP was removed from the reaction by centrifuging the reaction mixture with an equal volume of 10 mM Tris-HCl pH 8.0, 10 mM MgCl2, 0.05% Triton X-100 through a G25 Microspin column (GE Healthcare).
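The relationship between the oligonucleotides above can be checked with a short script. The sketch below is illustrative only: it uses the sequences exactly as printed in the text, and the helper name `reverse_complement` is our own, not part of the published protocol. It confirms that PuCT pairs with PuGA over its full length, leaving a 4-nt single-stranded 5’ overhang on PuGA, and that the third strand consists only of G and T.

```python
# Sequences copied from the text (5' to 3'); helper names are illustrative.
PUGA = "AATTCCTAAGGGAGGGGAGGGGAGGGTAGCT"
PUCT = "AGCTACCCTCCCCTCCCCTCCCTTAGG"
TFO = "GGGTGGGGTGGGGTGGGT"  # third strand, 5' psoralen group omitted

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


# Where on PuGA does the strand complementary to PuCT start?
paired_region = reverse_complement(PUCT)
overhang_length = PUGA.find(paired_region)  # -1 would mean no perfect pairing
overhang = PUGA[:overhang_length]

# Base composition of the triplex-forming oligonucleotide.
tfo_composition = {base: TFO.count(base) for base in sorted(set(TFO))}

print("PuGA 5' overhang:", overhang)
print("Paired duplex length:", len(paired_region))
print("TFO composition:", tfo_composition)
```

The same helper can be reused for the biotinylated PuCT variant described later, where extra 3’ bases precede the biotin group.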
Electrophoretic mobility shift assay (EMSA) and super-shift EMSA
Gel shifts were performed as previously described [16, 17]. In this study 5 μg total protein from tissue extracts or 1.5 μg HeLa or colorectal cancer cell line cytoplasmic or nuclear extracts were mixed with 1 nM 33P-labeled purine triplex DNA and 2 μg poly (dIdC) carrier DNA in binding buffer (25 mM HEPES-Na+ pH 7.9, 50 mM KCl, 10% glycerol, 0.5 mM dithiothreitol, 2 mM MgCl2) for 30 min at room temperature. Protein-triplex DNA probe complexes were resolved by nondenaturing PAGE at 7 V/cm for 90 min through a 5% acrylamide/0.25% bisacrylamide gel containing 22 mM Tris borate, 0.5 mM EDTA, and 5% glycerol. Protein-probe complexes were visualized using autoradiography and quantitated with a Storm 840 PhosphorImager (Molecular Dynamics). Major EMSA H3 bands from each tissue sample were normalized by dividing by the H3 band value of HeLa nuclear extract present in each gel. For super-shift EMSA, protein extracts were incubated in the same binding buffer with purine triplex DNA probe for 30 min at room temperature, then 400 ng of anti-U2AF65 MC3 antibody or mouse IgG antibody as a negative control (Santa Cruz) was added to the reaction and incubated for 1 h at room temperature. PAGE gels were run as for regular EMSA with the addition of a circulating cooling water bath to the gel apparatus.
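The per-gel normalization described above can be written out as a minimal sketch. The function names and the band intensities are hypothetical and serve only to show the arithmetic: each H3 band value is divided by the HeLa nuclear extract H3 value from the same gel, and a tumor/normal (T/N) ratio is then formed from the normalized values.

```python
def normalize_h3(band_values, hela_reference):
    """Divide each H3 band value by the HeLa H3 value from the same gel."""
    if hela_reference <= 0:
        raise ValueError("HeLa reference band must be positive")
    return {sample: value / hela_reference
            for sample, value in band_values.items()}


def tumor_normal_ratio(normalized, tumor_key, normal_key):
    """Ratio of normalized tumor to normalized normal H3 values."""
    return normalized[tumor_key] / normalized[normal_key]


# Hypothetical PhosphorImager readings from a single gel (arbitrary units).
gel = {"patient1_tumor": 300.0, "patient1_normal": 150.0}
normalized = normalize_h3(gel, hela_reference=600.0)
ratio = tumor_normal_ratio(normalized, "patient1_tumor", "patient1_normal")
print(normalized, ratio)
```

Because both samples of a patient are divided by a reference from their own gel, values become comparable across gels; if tumor and normal lanes were run on the same gel, the reference cancels in the T/N ratio.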
The Wilcoxon signed-rank test was used to compare the level of the major EMSA H3 complex and WRN expression in total, cytoplasmic, and nuclear extracts of colorectal tumors and corresponding normal tissues. The Mann-Whitney U test was used with SPSS version 13.0 to compare quantitative variables in two independent groups. Spearman correlations among continuous variables were computed. Chi-square tests (Bonferroni-corrected) were used for grouped/dichotomized variables. Survival was estimated using Kaplan-Meier analysis, and differences were calculated using Mantel-Cox log-rank statistics; primary endpoints were tumor-related death (disease-specific survival), death (overall survival), and tumor recurrence (recurrence-free survival, R0 patients only). The following variables were dichotomized: tumor/normal ratios of protein levels in nuclear and total (cytoplasmic plus nuclear) extracts, split at the median value into high levels in tumor (values above the median) vs. low levels in tumor (values below the median) as compared with normal tissue; involved lymph nodes as pN0 vs. pN1-3; distant metastasis as M0 vs. M1; and surgical curability as curative vs. non-curative resection (R0 vs. R1/2).
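For readers who want to see the arithmetic behind three of these procedures, the following standard-library sketch computes the median split, Spearman's rho, and the Mann-Whitney U statistic. It is not the SPSS workflow used in the study: it returns test statistics only (no p-values), all function names are our own, and assigning values equal to the median to the "low" group is an assumption, since the text only defines values above and below the median.

```python
from statistics import mean, median


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def mann_whitney_u(group_a, group_b):
    """Smaller of the two Mann-Whitney U statistics for two groups."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    return min(u_a, n_a * n_b - u_a)


def dichotomize_at_median(values):
    """Label values above the median 'high'; all others 'low' (assumption)."""
    cut = median(values)
    return ["high" if v > cut else "low" for v in values]


# Hypothetical T/N ratios and expression values, for illustration only.
tn_ratios = [0.5, 1.0, 2.0, 4.0]
print(dichotomize_at_median(tn_ratios))
print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```

In practice a statistics package would also supply p-values and the tie corrections used for the asymptotic significance reported in the tables.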
Purification of triplex DNA-binding proteins using biotin/streptavidin affinity
Biotinylated purine triplex DNA was formed using a 3’ biotinylated PuCT oligonucleotide (Eurogentec): 5’ – AGCTACCCTCCCCTCCCCTCCCTTAGGAATTTT-biotin-3’ annealed to the PuGA complementary strand, then annealed and crosslinked with the Ps~TFO as described above. Purification of DNA-binding proteins using biotin/streptavidin affinity systems, as described in Current Protocols in Molecular Biology, was performed in separate 2 ml reactions containing either 800 μg RKO colorectal cancer cell nuclear extract or 1085 μg RKO cytoplasmic extract, EMSA binding buffer (25 mM HEPES-Na+ pH 7.9, 50 mM KCl, 10% glycerol, 0.5 mM dithiothreitol, 2 mM MgCl2), 600 μg poly (dIdC), 1 nM biotinylated purine triplex DNA, and 150 μl pretreated streptavidin agarose (Fluka) while rotating for 2 hr at room temperature. Streptavidin agarose was gently pelleted and washed three times with binding buffer. Laemmli buffer was added directly to the agarose pellet and boiled for 5 min to elute bound protein(s). Proteins were separated using 10% SDS-PAGE and stained with Coomassie blue. Two bands (100 and 60 kDa) from the nuclear extract reaction and one band (65 kDa) from the cytoplasmic extract reaction were excised from the gel and submitted to the German Cancer Research Center (DKFZ) Functional Proteome Analysis laboratory for sequencing and analysis using nano-HPLC ESI-MS-MS and identified using MASCOT database searches.
Western blot analysis was performed using standard procedures as described in Current Protocols in Molecular Biology. Total protein (25 μg) from tissue or cell line cytoplasmic or nuclear extract was separated by 10% SDS-PAGE, then electro-transferred to nitrocellulose membranes in 25 mM Tris, 190 mM glycine with 20% methanol. After blocking in 5% milk in Tris-buffered saline with 0.2% Tween-20 (TBST) for 1 hr at room temperature, membranes were incubated with antibodies against WRN (H-300 Santa Cruz sc-5629, 1:500), U2AF65 (MC3 Santa Cruz sc-53942, 1:2000), PSF (39-1 Santa Cruz sc-101137, 1:2000), or p54nrb (H-85 Santa Cruz sc-67016, 1:2000) in 5% milk-TBST for 1 hr at room temperature, or with antibodies against beta-catenin (L87A12 Cell Signaling CS-2698, 1:1000) or actin (Sigma A2066, 1:1000) in 5% milk in TBST overnight at 4°C. Blots were washed with TBST, incubated with the appropriate HRP-conjugated secondary antibody at 1:4500, and detected by enhanced chemiluminescence (Pierce, Thermo Scientific) and autoradiography. Protein bands were quantitated by densitometry using NIH ImageJ software and normalized to actin.
Reverse phase protein array (RPPA)
RPPA was performed as described by Mannsperger et al. Cytoplasmic (2.7 ng) or nuclear (2.8 ng) protein extract per spot was printed with a non-contact spotter onto nitrocellulose slides (Oncyte Avid, Grace Bio-labs, Bend OR) using an Aushon 2470 Microarrayer (Billerica, MA). Slides were mounted in a customized incubation chamber (Metecon, Mannheim Germany), blocked for 1 hr at room temperature with 50% (v/v) Odyssey blocking buffer in PBS, and individually stained with 37 validated primary antibodies at 1:300 in blocking buffer at 4°C overnight and Alexa 680-labeled secondary antibodies (Invitrogen) at 1:8000 in PBS with 0.05% Tween for 1 hr at room temperature. Slides were scanned with the Licor Odyssey system and spot intensities were calculated with GenePix Pro 5.0 microarray analysis software (Molecular Devices). To estimate the total protein concentration per spot, a slide from each run was stained with Fast Green FCF (Sigma-Aldrich) as described by Loebke et al. Data analysis was done using R with the RPPanalyzer package from CRAN (http://cran.r-project.org). For each antibody, the logged Fast Green FCF signal of a spot was subtracted from the logged mean of the raw foreground pixel intensities of the same spot to normalize for the total protein per spot.
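The per-spot normalization amounts to a difference of logarithms, which is the log of the ratio of antibody signal to total protein signal. The sketch below illustrates this in Python rather than with the RPPanalyzer package itself; the function name is hypothetical, the pixel values are invented, and the use of base-2 logarithms is an assumption because the text does not state the base.

```python
import math
from statistics import mean


def normalize_spot(foreground_pixels, fast_green_signal):
    """Log2 mean antibody signal minus log2 Fast Green FCF signal.

    Base 2 is an assumption; the source only says the values were 'logged'.
    """
    if fast_green_signal <= 0:
        raise ValueError("Fast Green FCF signal must be positive")
    return math.log2(mean(foreground_pixels)) - math.log2(fast_green_signal)


# Hypothetical raw foreground pixel intensities for one spot.
antibody_pixels = [1000, 1048]  # mean = 1024
total_protein_signal = 256      # Fast Green FCF signal of the same spot

normalized = normalize_spot(antibody_pixels, total_protein_signal)
print(normalized)
```

A normalized value of 0 would mean that the antibody signal equals the total protein signal; each unit above that corresponds to a doubling under the log2 assumption.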
Colorectal tumors have higher triplex DNA-binding activity than corresponding normal tissue
Patient clinical characteristics, given as absolute numbers (n = 63), including lymph node status
Increased triplex DNA-binding activity in colorectal tumors correlates with lymph node disease, metastasis, and overall survival
Correlation of the ratio of tumor (T) to normal (N) (T/N) EMSA H3 values for each patient with clinical features: test statistics with asymptotic two-tailed significance, grouped (a) by presence of disease in lymph nodes (N stage, dichotomized) and (b) by presence of metastasis in distant organs (distant metastasis)
Identification of U2AF65 as the protein present in the EMSA H3 complex
PSF (polypyrimidine tract binding-associated splicing factor, or SFPQ) [NCBI Protein AAH04534]
p54nrb (nuclear RNA-binding protein) or NonO [NCBI Protein NP_031389]
U2AF65 (U2 small nuclear RNA auxiliary factor 2 isoform b) [NCBI Protein NP_001012496]
PSF and p54nrb are known to function as RNA polymerase II-associated splicing factors, bind as heterodimers, and are implicated in the regulation of expression of the Myc family of oncoproteins, COX2, etc. They also bind to and stimulate topoisomerase I and promote homologous DNA pairing and the incorporation of a single-stranded oligonucleotide into homologous superhelical double-stranded DNA D-loop formation [33, 34]. U2AF65, identified from cytoplasmic extracts, is also an RNA polymerase II-associated splicing factor that can associate with mRNAs that include a predominance of transcription factors and cell cycle regulators, and shuttle continuously between the nucleus and cytoplasm [35, 36].
U2AF65 expression correlates with EMSA H3 values and p54nrb and PSF expression in tumor tissues and with a higher tumor stage
(a) Spearman correlation p values of EMSA H3 values with expression of triplex DNA-binding proteins (3BP) and (b) correlations of U2AF65 expression to PSF and p54nrb expression in normal and tumor tissue extracts
Expression of the WRN helicase correlates with EMSA H3 binding activity
We wanted to test the hypothesis that proteins that bind to or stabilize triplexes and G-quadruplexes can act in a yin-yang fashion (in complementary opposition) with proteins such as helicases that unwind or destabilize these structures, and that expression and/or function of these binding and unwinding proteins may be imbalanced in tumors, which could contribute to genomic instability. We tested 51 patient colorectal tumor and normal tissue extracts for expression of the RecQ-family helicase WRN because it is known to act preferentially on aberrant structures such as triplexes and G-quadruplexes and to promote genomic integrity. We used the Wilcoxon signed-rank test to determine if WRN is differentially expressed in normal and tumor tissue extracts and Spearman’s rho to correlate WRN helicase expression in normal and tumor tissue extracts with EMSA H3 data. We detected no significant differences in normalized WRN expression between normal and tumor extracts or according to tumor stage (mean cytoplasmic expression in tumor tissue = 0.424, in normal tissue = 0.283; mean nuclear expression in tumor tissue = 0.275, in normal tissue = 0.196; total expression mean in tumor tissue = 0.679, in normal tissue = 0.465). However, we did observe that total WRN expression correlated significantly with total EMSA H3 binding values in both normal tissue (rho 0.296, p = 0.03) and tumor extracts (rho 0.460, p < 0.001).
Reverse-phase protein array and western blot analysis of tissue extracts show a correlation of U2AF65 expression with total and truncated beta-catenin expression
Spearman correlations of EMSA H3 values and triplex DNA-binding protein expression to other proteins by reverse phase protein array (RPPA), including p38α and PI3K p110α
The data support the hypothesis that the major triplex DNA-binding protein in human cells is more abundant and has higher binding activity in vitro in extracts from colorectal cancer tissues than in extracts from adjacent normal tissues. This increased binding activity correlated significantly with the expression of the triplex/G-quadruplex DNA-unwinding helicase WRN, and with the spread of cancer to the lymph nodes, metastasis, and reduced overall survival. The major triplex DNA-binding protein in gel shifts was identified as the U2AF65 splicing factor. U2AF65 expression was higher in more advanced colon tumor stages and correlated significantly with total and truncated beta-catenin expression.
U2AF is a non-small nuclear ribonucleoprotein (snRNP) splicing factor required for the binding of U2 snRNP to the pre-mRNA branch site [41, 42]. Purified U2AF is composed of two polypeptides of 65 kDa (U2AF65) and 35 kDa (U2AF35). U2AF65 binds to the polypyrimidine (Py) tract adjacent to the 3’ splice site using RNA-recognition motifs and cross-links to the branch point in an ATP-independent manner at the earliest stage of spliceosome formation. Both subunits of U2AF are essential for the viability of many model organisms, such as zebrafish, Drosophila, C. elegans, and S. pombe. Both U2AF65 and U2AF35 shuttle continuously between the nucleus and cytoplasm by a mechanism that involves carrier receptors and is independent of binding to mRNA. It has also been suggested that U2AF participates in the nuclear export of mRNA.
U2AF65 binds to single-stranded RNA and recognizes a wide variety of pyrimidine (Py)-tracts. The Py-tracts of higher eukaryotic pre-mRNAs are often interrupted with purines, yet U2AF65 must identify these degenerate Py-tracts for accurate pre-mRNA splicing. Based on in vitro studies, investigators have proposed that U2AF35 assists U2AF65 recruitment to nonconsensus polypyrimidine tracts. Pacheco et al. analyzed the roles of the two U2AF subunits in vivo in the selection of alternative 3' splice sites associated with polypyrimidine tracts of different strengths. Their results revealed a feedback mechanism by which RNA interference-mediated depletion of U2AF65 triggers down regulation of U2AF35 expression. They also showed that knockdown of each U2AF subunit inhibits weak 3' splice site recognition, while over-expression of U2AF65 alone is sufficient to activate selection of this splice site [46, 47]. It would be interesting to examine if over-expression of U2AF65 alone in the context of cancer activates splicing of weak or nonconsensus polypyrimidine tracts that could tip the balance of splicing regulation in a subset of cellular transcripts which could promote tumorigenesis.
The proteins we identified in RKO nuclear extracts using biotin triplex DNA affinity were PSF, a 100-kDa protein that also binds to the polypyrimidine tract, and its heterodimeric binding partner p54nrb. We speculate that the 100- and 60-kDa proteins identified in previous studies using Southwestern blotting with HeLa nuclear extracts probed with the same purine triplex DNA probe used in this study are indeed PSF and p54nrb, but this has yet to be tested. Both PSF and p54nrb bind to double-stranded (ds)DNA, single-stranded (ss)DNA, and RNA, and contain DNA- and RNA-binding domains. PSF participates in constitutive pre-mRNA splicing and is a component of later spliceosomal B and C complexes (when U2AF65 is no longer present). PSF and p54nrb also bind and function in nuclear retention of defective RNAs and are involved in transcriptional regulation and the DNA damage response [48–51]. Interestingly, PSF also functions in DNA annealing, where PSF requires ssDNA and dsDNA with sequence homology, as well as divalent cations, for its in vitro pairing activity. PSF can promote the incorporation of ssDNA within the two separated strands of a homologous superhelical DNA duplex and produce a three-stranded D-loop structure, which is required for homologous recombination. The splicing factors SF2/ASF and U2AF65 also caused DNA annealing but could not form D-loops. PSF and p54nrb, as well as GRSF-1, YB-1, and polypyrimidine tract-binding protein (PTB), also bind to the MYC family of internal ribosome entry sites (IRES) and positively regulate translation of the Myc family of oncoproteins in vitro and in vivo. Protein array data in this study showed that expression of both PSF and p54nrb in colorectal tissue extracts correlated significantly with c-Myc expression levels, which is consistent with a role for PSF and p54nrb in the regulation of c-Myc protein expression.
Researchers identified both U2AF and PSF, as well as hnRNP C and PTB, as RNA-binding proteins that bind to two regions 3’ of the (CUG)n repeat expansion in the 3’-UTR of the DMPK gene, where expansion of this trinucleotide repeat causes the neuromuscular disorder myotonic dystrophy . Their study explored RNA-binding proteins interacting with non-CUG regions or higher order structures in the DMPK 3’-UTR that may be involved in RNA-mediated pathogenesis. Their finding that both U2AF and PSF can bind near this triplet repeat sequence with the potential to form higher order structures such as triplexes is consistent with our data on biotin triplex DNA affinity identification of both U2AF65 and PSF. Another group identified an RNA/protein complex in both Drosophila and 293 cells that consisted of expanded CAG RNA, U2AF65, and the NXF1 nuclear export receptor, providing further evidence that in other models, U2AF65 interacts with these triplet repeat sequences . We believe that the purine triplex DNA EMSA probe can be a surrogate multiplex nucleic acid structure that acts as a “bait and hook” to capture proteins that may be binding D-loops, R-loops, triplexes, G-quadruplexes, or other multi-stranded structures containing Hoogsteen or reverse Hoogsteen base pairs in vivo.
PTB also binds to polypyrimidine tracts in pre-mRNAs, and numerous studies have shown that PTB competes with U2AF65 for binding to these sequences [56–61]. Since PSF is a PTB-associated protein, binding competition between PSF and U2AF65 may be possible as well, which may explain why we identified both PSF with the biotinylated triplex DNA in RKO nuclear extracts and U2AF65 in RKO cytoplasmic extracts. Gama-Carvalho and colleagues performed immunoprecipitation of U2AF65- and PTB-associated RNAs from HeLa cells followed by microarray analysis to determine which mRNAs are associated with these two splicing factors that can compete for binding to polypyrimidine tracts . Among U2AF65-associated mRNAs was a predominance of transcription factors and cell cycle regulators, whereas PTB-associated transcripts were enriched in mRNAs that encode proteins implicated in intracellular transport, vesicle trafficking, and apoptosis.
Related to cancer, researchers using the SEREX technique (serologic identification by recombinant expression cloning) found that 2 of 14 patients with malignant mesothelioma, a pulmonary malignancy, had antibodies against U2AF65. Additionally, a patient with liver cirrhosis that progressed to hepatocellular carcinoma had antinuclear antibodies that recognized a nuclear protein putatively identified as U2AF65. Other splicing factors, most notably SFRS1 (ASF/SF2), are reported to be over-expressed in colon, thyroid, kidney, lung and breast cancer cells. Other splicing factors shown to be over-expressed in colorectal cancer cells are hnRNP-F and -K, SPF45, and SRPK1. However, the present report is the first to describe correlation of increased expression or binding activity of U2AF65 in primary colorectal tumors with tumor stage, lymph node disease, metastasis and reduced overall survival.
Why U2AF65 is over-expressed in colorectal tumor cells, and whether this over-expression is important to the development and/or progression of colorectal cancer or a passive effect of general gene deregulation, are unknown. About 75% of sporadic colorectal cancers are characterized by a chromosomal instability (CIN) phenotype. The most common reported chromosomal losses involve 5q (APC), 18q (DCC), and 17p (p53), while the most common gains involve 8q and 20q. The gene encoding U2AF65 (U2AF2) is located at c19q13.42. Chromosomal amplifications at c19q13.42 have been found in a rare embryonal tumor using array CGH and FISH [65, 66]. Other groups have reported amplifications or aberrations at c19q13 in colorectal tumors, particularly in liver metastases compared to primary tumors, and in other solid tumors including pancreatic and ovarian.
Regarding genomic instability, Vasquez and colleagues recently showed that both non-B DNA sequences and WRN helicase deficiency induce mutations characterized by single base changes, mostly at C-G base pairs, in an additive but not synergistic manner . Because no synergy was observed, the authors concluded that a role for WRN in reducing mutation frequencies via a mechanism dependent on its cellular helicase activity (for example, of non-B DNA sequences) is unlikely. Their data do not directly support our present hypothesis, which is similar to their hypothesis that if one function of the WRN helicase were to resolve non-B (triplex and Z-DNA) structures, as observed in vitro, then mutation frequencies may be higher in WRN-deficient cells than in WRN-wild type cells because both the number and stability of such structures would be greater in WRN-deficient cells. However, they did verify that purified WRN protein was able to unwind the third purine-rich strand of a synthetic triplex in vitro. Although our data suggest a correlation between expression of the WRN helicase with triplex DNA-binding activity in both normal and tumor tissue extracts, defining a functional role and mechanism of non-B DNA unwinding activity by WRN helicase and G*G multiplex binding (for example, by U2AF65) will require further study.
Beta-catenin, as a transcription factor complexed with TCF4, is known to upregulate expression of many relevant proteins in colorectal cancer, such as c-myc, cyclin D1, LEF-1, CD44, and c-jun. Whether beta-catenin influences the expression of U2AF65 is unknown, but a search of transcription factor binding sites in the U2AF65 (U2AF2) gene promoter did not indicate any beta-catenin or TCF family transcription factor sites among the 55 high-scoring (>85%) sites we identified (Cold Spring Harbor Laboratory Mammalian Promoter Database http://rulai.cshl.edu/CSHLmpd2/; Transcription Factor Search http://www.cbrc.jp/research/db/TFSEARCH.html). Similarly, mining microarray expression studies revealed no reports describing U2AF65 (U2AF2) as a beta-catenin, TCF4, or Wnt target gene (NCBI GEO; R Nusse Wnt/Beta catenin targets list: http://www.stanford.edu/~rnusse/pathways/targets.html). The biological significance of the correlation between U2AF65 and beta-catenin expression in colorectal tumor tissues, such as whether beta-catenin as a transcription factor affects U2AF65 expression, or whether U2AF65 as a splicing factor affects the splicing or expression of beta-catenin, remains to be determined.
Several studies have examined the interaction of beta-catenin with splicing factors and the role of beta-catenin in mRNA splicing. Researchers identified alternative splicing of SLC39A14, a divalent cation transporter, in colorectal tumors and found it to be regulated by the Wnt pathway, probably through regulation of splicing factor SRSF1. The beta-catenin/TCF4 pathway also modifies alternative splicing through modulation of expression of splicing factors SRp20 and SF1 and direct interaction with FUS/TLS (translocated in liposarcoma) and various other RNA-binding proteins, including p54nrb. Others have shown that beta-catenin regulates multiple steps of RNA metabolism in colon cancer cells and may coordinate RNA metabolism.
Authors have also reported identification of truncated beta-catenin isoforms, mostly in colorectal cancer cells. In primary colorectal tumors, a relatively small proportion (7 of 58 examined) contained somatic interstitial deletions that included all or part of exon 3 of the beta-catenin gene, and RT-PCR analysis of 3 of the 7 tumors detected transcripts that lacked exon 3 as well as the normal transcript. Researchers also detected two novel beta-catenin mRNA splice variants in the SW480 colon cancer cell line and in primary colorectal tumors. A truncated beta-catenin protein of 80 kDa was also detected in three colorectal metastases to the liver. Several of these isoforms have truncations in the NH2-terminus of the protein that delete key serine and threonine residues phosphorylated by GSK-3 beta. Because this phosphorylation is important for proteasomal degradation, the deletions were hypothesized to stabilize the protein and have a dominant oncogenic effect. Data from this and other studies lead us to speculate that U2AF65 could be binding in vivo to a multi-stranded nucleic acid structure, such as R-loops, D-loops, or G-quartet mRNA, that is mimicked by the purine triplex DNA probe in our study. Overexpression or increased EMSA binding activity of U2AF65 in tumor tissues could then cause deregulation of mRNA splicing and protein isoform expression, such as that of beta-catenin, which could contribute to colorectal cancer initiation and/or progression.
We found that increased triplex DNA-binding activity in colorectal tumor extracts in vitro is associated with WRN helicase expression, increased total beta-catenin expression, lymph node disease, metastasis, and reduced overall survival in patients with colorectal cancer. Multifunctional splicing factor U2AF65 was identified as the major triplex-binding protein in human tissues and cell lines. Increased expression of U2AF65 is also associated with expression of splicing factors PSF and p54nrb, a higher tumor stage, and increased truncation of beta-catenin in colorectal tumors. We believe that our results contribute to and generate interest in the growing fields of alternative non-B DNA structures and genomic instability, aberrantly regulated splicing factors, and mRNA splicing and protein isoforms related to cancer, both as basic research objectives regarding the etiology of cancer and cancer diversity and as novel translational research in the search for promising prognostic, diagnostic and targeting tools.
We thank Mohammed Abba, Irfan Asangani, Nitin Patil, Christian Schmidt, and Frederick Wenz for insightful discussions and assistance. We also thank Donald Norwood for critical reading of the manuscript.
LDN is supported by a Guest Scientist Scholarship from the German Cancer Research Center (DKFZ) and the National Cancer Institute (NCI; 1RO1CA149501-01A1). CB is supported by the network SB-Cancer in the Helmholtz Alliance on Systems Biology, Heidelberg, Germany. HM is supported by the German Federal Ministry of Education and Science in the framework of the program for medical genome research (01GS0890 and 01GS0864). DB is supported by the Department of Anesthesiology and Intensive Care Medicine, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany.
PK is supported by Experimental Surgery, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany. GM is supported by the Stiftung für Krebs- und Scharlachforschung Mannheim, University of Heidelberg, Mannheim, Germany, and the Wilhelm Sander Foundation, Munich, Germany. UK is supported by the German Federal Ministry of Education and Science in the framework of the program for medical genome research (01GS0890 and 01GS0864), Heidelberg. DPH is supported by the National Cancer Institute (NCI; 1RO1CA149501-01A1).
MWVD is supported by the North Carolina Biotechnology Center. HA is supported by the Alfried Krupp von Bohlen and Halbach Foundation (Award for Young Full Professors), Essen, the Hella-Bühler-Foundation, Heidelberg, the Dr. Ingrid zu Solms Foundation, Frankfurt/Main, the Hector Foundation, Weinheim, Germany, the FRONTIER Excellence Initiative of the University of Heidelberg, the BMBF, Bonn, Germany, the Walter Schulz Foundation, Munich, Germany, the German-Israeli Project Cooperation, DKFZ Heidelberg, the Deutsche Krebshilfe, Germany, the Wilhelm Sander Foundation, Munich, Germany, and the DKFZ-MOST German Israel Program.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.