Evaluation of genome-wide chromatin library of Stat5 binding sites in human breast cancer

Background There is considerable interest in identifying target genes and chromatin binding sites for transcription factors in a genome-wide manner. Such information may become useful in diagnosis and treatment of disease, drug target identification, and for prognostication. In cancer diagnosis, patterns of transcription factor binding to specific regulatory chromatin elements are expected to complement and enhance current diagnostic predictions of tumor behavior based on protein and mRNA analyses. Signal transducer and activator of transcription-5 (Stat5) is a cytokine-activated transcription factor implicated in growth and progression of many malignancies, including hematopoietic, prostate, and breast cancer. We have explored immunoaffinity purification of Stat5-bound chromatin from breast cancer cells to identify Stat5 target sites in an unbiased, genome-wide manner. Results In this report, we evaluate the efficacy of a Stat5-bound chromatin library to identify valid Stat5 chromatin binding sites within the oncogenome of T-47D human breast cancer cells. A general problem with cloning of immunocaptured, transcription factor-bound chromatin fragments is contamination with non-specific chromatin. However, using an optimized strategy, five out of ten randomly selected clones could be experimentally verified to bind Stat5 both in vitro and in vivo as tested by electrophoretic mobility shift assay and chromatin immunoprecipitation, respectively. While there was no binding to fragments lacking a Stat5 consensus binding sequence, presence of a Stat5 binding sequence did not assure binding. Conclusion A chromatin library coupled with experimental validation may productively identify novel in vivo Stat5 chromatin binding sites in cancer, including abnormal regulatory sites in tumor-specific neochromatin.


Background
Transcription factors function uniquely at the interface of the genome and the proteome. A significant portion of transcription factors serve not only as executors of gene transcription programs, but also as biochemical sensors of extracellular stimuli. For instance, members of the nuclear receptor family are directly activated by lipophilic extracellular ligands, and transcription factors of the Smad and Stat families are activated by phosphorylation in response to cytokine stimulation of cell surface receptors. Chromatin-bound transcription factors that act both as sensors of extracellular cues and as transcriptional effectors carry exceptional instructive value about the biological state of individual cells. Their high biological information value makes such factors particularly attractive for use as markers to predict disease activity and outcome, as well as predictive markers of disease responsiveness to drugs.
Based on related and broader rationale, the second phase of the human genome project, ENCODE (ENCyclopedia Of DNA Elements), has been initiated with the ambitious goal of identifying all regulatory elements of the human genome, including chromatin binding sites for individual transcription factors [1]. In fact, several transcription factors are prognostic biomarkers in cancer, including estrogen and progesterone receptors [2] and signal transducers and activators of transcription (Stats) [3,4]. However, as a result of tumor-specific alterations in chromatin accessibility and structure, individual transcription factors may regulate distinct gene sets in different tumor specimens. Specifically, genes that are actively regulated vary as a result of chromatin structure, DNA methylation, histone modifications, and the presence of additional cofactors. Tumor-specific patterns of transcription factor binding to target chromatin are expected to enhance diagnostic information beyond what is achieved through protein and mRNA analyses. Such added diagnostic information may directly improve disease prognostication and prediction of tumor responsiveness to therapy.
Our laboratory is particularly interested in the role of transcription factor Stat5 in human breast cancer, in which Stat5 activation is associated with favorable prognosis, especially in early stage malignancy [3]. Stat5 belongs to the Stat transcription factor family, which represents latent cytoplasmic transcription factors that are activated by phosphorylation of a conserved tyrosine residue in response to extracellular cytokines and hormones, such as prolactin, growth hormone, erythropoietin, and several interleukins. Basal activation of Stat5 has been shown in healthy breast epithelial cells [5] and in many early stage breast cancers, but is gradually lost during metastatic progression [3]. Furthermore, active Stat5 correlated with higher histological differentiation and reduced mitotic rate [6]. Additional evidence suggests that Stat5 may actively inhibit metastatic progression by promoting homotypic adhesion and inhibiting tumor cell scattering [7].
Based on technical progress with immunocapture of transcription factor-bound chromatin fragments, genome-wide mapping of interaction sites may be achieved either by hybridization of captured DNA to linear microarrays of genomic DNA, or by cloning and sequencing of captured chromatin fragments. Microarray-based hybridization has been successfully used in the small yeast genome [8], but high cost and technical hurdles remain for human genome-wide DNA arrays at sufficient nucleotide resolution. Early efforts have focused on medium resolution arrays of restricted portions of the genome, such as the small chromosome 22 [9], arrays of classical promoter regions immediately upstream of transcriptional start sites [10], and arrays that contain CpG island clones [11]. In contrast, generation of a genome-wide library of transcription factor-bound chromatin fragments, a chromatin library, represents an inclusive and unbiased approach to the entire human genome [12]. Chromatin libraries also hold the potential to identify transcription factor binding to abnormal, tumor-specific neochromatin arising from genomic instability. However, progress has been hampered by a high degree of non-specific capture of irrelevant chromatin fragments and lack of methods for effective validation of captured sequences.
The purpose of the work described here is to optimize parameters to generate and validate a chromatin library for genome-wide identification of Stat5 target chromatin in human breast cancer. We identify novel Stat5 binding sites from a genome-wide chromatin library and validate the sites by prolactin-inducible Stat5 binding by electrophoretic mobility shift assay (EMSA) and chromatin immunoprecipitation (ChIP).

Results and Discussion
In contrast to transcription factors that bind to chromatin in a constitutive manner, Stat5 is a latent cytoplasmic transcription factor that is activated by tyrosine phosphorylation and binds tightly to DNA in response to extracellular cytokines, such as prolactin [13]. We used the well-differentiated, estrogen receptor positive T-47D human breast cancer cell line, which maintains robust prolactin-induced Stat5 activation [5,14], to generate a library of Stat5-bound chromatin fragments.
Sonication is necessary to shear genomic DNA into fragments that can be easily manipulated for PCR amplification, cloning, and sequencing. Fragments of approximately 400 base pairs (bp) allow for a complete sequencing read-through and are of sufficient size to localize the fragment within the human genome with a high degree of statistical certainty. Optimal shearing of chromatin from formaldehyde-fixed T-47D cells into approximately 400 bp fragments was established empirically (Figure 1A, lane 5). This target size of chromatin fragments was confirmed by agarose gel electrophoresis of two parallel sets of sonicates of cells treated with or without prolactin for 30 min (Figure 1B), prior to immunocapture of Stat5-bound fragments. An antibody that recognizes the highly homologous Stat5a and Stat5b isoforms [13] was used to capture Stat5-bound chromatin as detailed in Methods. Before subcloning of the captured chromatin fragments, the specificity of immunocapture was verified by analysis of binding to known human Stat5 target chromatin. In particular, we took advantage of earlier work that has identified a group of Stat5-regulated genes that have been shown to contain the Stat5 consensus sequence, TTCNNNGAA, in the traditional promoter element [13,15]. Aliquots of the captured chromatin pool were amplified by PCR using oligonucleotide primers flanking Stat5 binding sites within the gene promoters of Cytokine-Inducible SH2 Protein (CISH), β-Casein, and α2-Macroglobulin. Due to the average chromatin fragment size of 400 bp, primers were designed to yield shorter PCR products of 200-300 bp.
[Figure caption: Specificity of immunocapture of Stat5-bound chromatin in T-47D human breast cancer cells]
Stat5 was inducibly associated with the response element in the promoter of the CISH gene in T-47D cells (Figure 1C). The capture was specific, since binding was only detected in prolactin-stimulated cells, and only when Stat5 antibody and not control IgG was used. PCR amplification of intact genomic DNA is shown as a control to verify the specificity of the PCR reaction, in addition to amplification of the pre-immunoprecipitation chromatin fragment pool (input DNA). Likewise, Stat5 was inducibly and specifically bound to the β-Casein gene promoter in T-47D cells (Figure 1D, upper panel). In contrast, Stat5 did not associate with the Stat5-response element of the α2-Macroglobulin gene (Figure 1D, lower panel), a gene reportedly responsive to Stat5 in liver [16], uterine stromal cells [17], and ovarian cells [18,19]. However, pretreatment of T-47D cells with the glucocorticoid hormone analog, dexamethasone, for four days prior to Stat5 activation made the α2-Macroglobulin promoter accessible to Stat5 binding (Figure 1E). It has been well established that glucocorticoids play a vital role in many cell types and cell processes, including mammary differentiation. In fact, several genes have been shown to be regulated by cooperative Stat5-glucocorticoid receptor interactions [20-23]. In summary, based on Stat5 inducibility and antibody specificity in testing of known Stat5 chromatin interaction sites, we concluded that the conditions for effective immunocapture of Stat5-bound chromatin from T-47D cells were established.
The enriched, Stat5-bound chromatin pool was then cloned into a bacterial vector to generate a chromatin library. Because sonication generates random overhangs in double stranded DNA [24], T4 DNA polymerase was first used to blunt-end DNA fragments. Subsequently, a single 3' adenosine residue was added using Taq polymerase, and the resulting fragments were ligated into the pCR2.1 TA cloning vector. Transformed bacteria were plated on ampicillin- and S-gal-containing selection plates for blue/white screening. PCR amplification of inserts was performed directly on white bacterial colonies with common primers flanking the vector cloning site. A PCR reaction under standard conditions, which also lyses the bacteria and inactivates endogenous nucleases, was cycled 36 times, and the products were separated by agarose gel electrophoresis. Initial analysis of 389 white colonies yielded 185 (48%) insert-containing PCR products. Figure 2 shows a display of PCR products from a run of 17 clones, in which three clones did not contain an insert and 14 clones had inserts of a median size of approximately 300 bp. PCR products containing an insert were purified and sequenced directly without plasmid minipreps in a cost-effective and time-saving manner. BLAST analysis was used to localize sequences to the human genome. Of 185 inserts, 31 (17%) sequences could be unambiguously matched to a location within the human genome. Sequences that could not be localized were either repetitive or did not produce a statistically significant homology to the published human genome.
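The yield figures quoted above can be rechecked with a few lines of arithmetic. The following is a minimal sketch in Python; the variable and function names are illustrative and are not part of the original work, and only the three counts come from the text.

```python
# Counts reported in the text; all names here are illustrative.
white_colonies = 389   # white colonies screened by colony PCR
insert_positive = 185  # colonies yielding an insert-containing PCR product
localized = 31         # inserts matched unambiguously to the human genome


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


insert_rate = percent(insert_positive, white_colonies)
localization_rate = percent(localized, insert_positive)
overall_rate = percent(localized, white_colonies)

print(f"Insert-containing colonies: {insert_rate:.1f}%")
print(f"Localized inserts: {localization_rate:.1f}%")
print(f"Localized clones per white colony screened: {overall_rate:.1f}%")
```

The first two values round to the 48% and 17% stated in the text. The third value is not reported in the text and is included only to show the combined yield of the two steps.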
To validate the quality of the chromatin library, ten clones were randomly selected and first tested for ability to bind activated Stat5 from nuclear extracts of T-47D cells in vitro by EMSA. Stat5 binds to the consensus sequence TT(N5)AA and to relatively conservative variations, with TTC(N3)GAA considered to be optimal [13,15]. Typically, Stat5-DNA interaction by EMSA is performed on synthetic oligonucleotides of 20-40 bp size [25]. To effectively determine whether Stat5 interacts directly with large immunocaptured chromatin fragments, we established conditions for rapid validation by EMSA on chromatin fragments up to 400 bp in length by reducing gel polyacrylamide concentration to 3% and using amplification and isotope labeling by PCR directly from the bacterial clones. Stat5 binding in vitro was detected in seven of the ten chromatin fragments, as evidenced by prolactin-inducible DNA-binding complexes that could be supershifted by a specific anti-Stat5 antibody, but not by non-specific IgG (Figure 3A). Of the three non-binding cloned chromatin fragments (CCF #21, #25, and #30), negative data are presented only for CCF #30.
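Screening a cloned sequence for the two consensus forms named above can be sketched as a simple pattern search. The Python example below is an illustration only, not the procedure used in this work: the function name and the example sequence are hypothetical, and the search accepts only unambiguous A, C, G, and T bases. Both patterns read the same on the reverse-complement strand, so scanning one strand is sufficient.

```python
import re

# Lookahead patterns so that overlapping candidate sites are all reported.
BROAD = re.compile(r"(?=(TT[ACGT]{5}AA))")       # TT(N5)AA
OPTIMAL = re.compile(r"(?=(TTC[ACGT]{3}GAA))")   # TTC(N3)GAA


def find_sites(sequence: str, pattern: re.Pattern) -> list[tuple[int, str]]:
    """Return (0-based start, matched 9-mer) for every site in the sequence."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]


# Hypothetical sequence carrying one optimal site and one broad-only site.
example = "GGATTCCTGGAATCCTTAGCGTAAGG"

print(find_sites(example, BROAD))    # [(3, 'TTCCTGGAA'), (15, 'TTAGCGTAA')]
print(find_sites(example, OPTIMAL))  # [(3, 'TTCCTGGAA')]
```

A match only marks a candidate site. As the binding data in this report show, the presence of a consensus sequence does not assure Stat5 binding.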
As a second and independent means to validate the quality of the Stat5 chromatin library, the same randomly selected fragments were analyzed for inducible Stat5 binding in vivo using the ChIP assay. Independent pools of immunocaptured chromatin fragments from T-47D cells were analyzed in which Stat5 was either inactivated by serum deprivation or activated by prolactin. Densitometry of the PCR products was used to verify at least a 2-fold increase in intensity between the (-) prolactin and (+) prolactin samples. In all cases there was no detectable product from the replicate samples that had been immunoprecipitated with a non-specific IgG antibody (data not shown), indicating a specific Stat5-mediated capture of genomic elements. All PCR amplifications were performed at least twice on at least two separate pools of immunoprecipitated genomic elements.
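The 2-fold densitometry criterion can be written out as a short check. This is a minimal sketch that assumes background-corrected band intensities in arbitrary units; the function names and the example values are hypothetical and do not reproduce measurements from this study.

```python
def fold_induction(minus_prl: float, plus_prl: float) -> float:
    """Ratio of the (+) prolactin band intensity to the (-) prolactin intensity."""
    if minus_prl <= 0:
        raise ValueError("the (-) prolactin intensity must be positive")
    return plus_prl / minus_prl


def is_inducible(minus_prl: float, plus_prl: float, threshold: float = 2.0) -> bool:
    """True when the increase meets the threshold (at least 2-fold by default)."""
    return fold_induction(minus_prl, plus_prl) >= threshold


# Hypothetical intensities in arbitrary densitometry units.
print(is_inducible(120.0, 300.0))  # 2.5-fold increase
print(is_inducible(120.0, 180.0))  # 1.5-fold increase
```

Rejecting a non-positive (-) prolactin value is a design choice of this sketch. A lane with no detectable band would need a defined background floor before a ratio is meaningful.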
Of the seven chromatin fragments that were positive for in vitro Stat5 binding by EMSA, five (CCFs #5, #18, #23, #28, and #29) were also consistently positive for in vivo Stat5 binding by ChIP assay (Figure 3B, upper panel). In addition, CCF #30 was positive by ChIP, but not by EMSA, possibly reflecting indirect binding via other proteins. Conversely, CCFs #11 and #14, which bound Stat5 in vitro by EMSA, were both negative by ChIP. Corresponding PCR products from the pre-IP DNA are shown (Figure 3B, lower panel) and no product was detected in samples immunoprecipitated with non-specific IgG (data not shown). Due to sequence complexity, fragment-specific flanking primers could not be designed for fragments CCF #21 and #25, neither of which bound Stat5 in EMSA. The localization data and binding validation of the ten clones are summarized in Figure 4. Each of the CCFs that bound Stat5 by EMSA contained at least one broad consensus TT(N5)AA site as expected. Correspondingly, of the three EMSA-negative CCFs, #21 and #25 lacked consensus binding sites, while a single TT(N5)AA site was present in CCF #30. Furthermore, the five CCFs verified to bind both by EMSA and ChIP may be involved in transcriptional control of nearby genes (Figure 4), or alternatively, control transcription of small regulatory RNAs [26].
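The EMSA and ChIP outcomes for the ten clones can be tabulated to make the concordance explicit. The sketch below encodes only the results stated in the text; the data layout and variable names are illustrative, and `None` marks the two fragments for which ChIP primers could not be designed.

```python
# Results as stated in the text, keyed by CCF number.
emsa = {5: True, 11: True, 14: True, 18: True, 21: False,
        23: True, 25: False, 28: True, 29: True, 30: False}
chip = {5: True, 11: False, 14: False, 18: True, 21: None,
        23: True, 25: None, 28: True, 29: True, 30: True}

both = sorted(c for c in emsa if emsa[c] and chip[c] is True)
emsa_only = sorted(c for c in emsa if emsa[c] and chip[c] is False)
chip_only = sorted(c for c in emsa if not emsa[c] and chip[c] is True)
chip_untested = sorted(c for c in chip if chip[c] is None)

print("EMSA and ChIP positive:", both)
print("EMSA positive, ChIP negative:", emsa_only)
print("ChIP positive, EMSA negative:", chip_only)
print("ChIP not testable:", chip_untested)
print(f"Verified by both assays: {len(both)} of {len(emsa)} clones")
```

Five of the ten clones fall into the first group, which corresponds to the validation rate of approximately one half reported for this library.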

Conclusions
While the present work was being completed, an independent report also identified Stat5 binding sites using a genome-wide approach in mouse lymphoma cells [27], although direct binding by EMSA was not verified. We conclude that unbiased, genome-wide strategies can now be used to identify novel Stat5 binding sites by cloning immunocaptured chromatin fragments. At the current efficacy, approximately 20% of cloned Stat5-immunocaptured fragments from T-47D breast cancer cells could be localized within the normal human genome, and Stat5 binding in vitro and in vivo was confirmed in approximately half of those. Further reductions in capture of nonspecific chromatin, combined with refinements in cloning procedures [28], reduced sequencing cost, direct and high-throughput sequencing from bacterial colonies, and direct labeling of PCR products for EMSA testing, will allow streamlining of the procedure for genome-wide identification of Stat5-chromatin interaction sites. Ongoing efforts are also exploring whether some of the clones with only weak homology to the normal human genome represent Stat5-bound neochromatin unique to the cancer cells as a result of genomic instability.

Chromatin Immunoprecipitation
After growth to confluence (~2 × 10⁷ cells/T175 cm² flask), T-47D cells were serum starved for 24 h and then treated with or without 10 nM human prolactin (hPRL) for 30 min. For pro-differentiation experiments with glucocorticoid pretreatment, confluent cultures were maintained in serum-free RPMI-1640 with 1 µM dexamethasone (Dex) dissolved in DMSO or in DMSO alone for 96 hours, then stimulated with or without hPRL for 30 min. Proteins were then crosslinked to the chromatin by the addition of formaldehyde (Fisher Scientific, Fairlawn, NJ) to a final concentration of 1% and incubated for 30 min at 37°C. The cells were rinsed, scraped, and pelleted in ice-cold PBS with 1 mM PMSF, 2 µg/ml aprotinin, and 2 µg/ml pepstatin A. Cell pellets were resuspended in 400 µl lysis buffer (1% SDS, 10 mM EDTA, 50 mM Tris-HCl, pH 8.0, 1 mM PMSF, 2 µg/ml aprotinin, and 2 µg/ml pepstatin A) and the lysates were sonicated using a Sonic Dismembrator (Fisher Scientific, Pittsburgh, PA) fitted with a tapered microtip set to an amplitude of 50%, and were pulsed twice for 30 s. Debris was pelleted by centrifugation for 10 min at 13,200 RPM at 4°C. The supernatants were then diluted 10-fold in IP buffer (0.1% SDS, 1.1% Triton-X, 1.2 mM EDTA, 16.7 mM Tris-HCl, pH 8.1, 16.7 mM NaCl, 1 mM PMSF, 2 µg/ml aprotinin, and 2 µg/ml pepstatin A) and 1% was set aside as the pre-IP (input) sample. The solution was then pre-cleared with beads (50% protein A-sepharose, 1 mg/ml poly dI-dC, 0.1% BSA, in TE, pH 7.4) for 30 min at 4°C. Next, the samples were incubated overnight with Stat5 antibody (N-20, Santa Cruz Biotechnology) or IgG IP-control followed by immunoprecipitation with preincubated beads.
[Figure caption: Genomic localization of cloned Stat5-chromatin interaction sites from T-47D human breast cancer cells]
The samples (beads) were then washed once with each of these buffers in the following order: Wash Buffer 1 (0.1% SDS, 1% Triton-X, 2 mM EDTA, 20 mM Tris-HCl, pH 8.0, 150 mM NaCl), Wash Buffer 2 (0.1% SDS, 1% Triton-X, 2 mM EDTA, 20 mM Tris-HCl, pH 8.0, 500 mM NaCl), Wash Buffer 3 (0.25 M LiCl, 1% NP-40, 1% Na-deoxycholate, 10 mM Tris-HCl, pH 8.0, 1 mM EDTA), then twice with Wash Buffer 4 (TE, pH 8.0). The samples were eluted from the beads in 1% SDS and 0.1 M NaHCO3. Next, the cross-links were reversed with 0.2 M NaCl overnight at 65°C followed by protein digestion with proteinase K for 1 hour at 45°C. The DNA was recovered by phenol:chloroform extraction and ethanol precipitation. This pool was then either used as PCR template or blunted with T4 DNA polymerase (New England Biolabs, Beverly, MA) at 37°C for 5 minutes under standard conditions for uniform cloning. Taq DNA polymerase (New England Biolabs) was then used to add a 3' A overhang for cloning with a TOPO TA cloning kit (Invitrogen, Carlsbad, CA). The ligated vectors were transformed into electrocompetent bacteria, DH-5αE (Invitrogen), using the manufacturer's recommendations for transformation in a Bio-Rad Gene-Pulser (Hercules, CA), and plated on S-Gal/Ampicillin plates (Sigma Chemical Co., St. Louis, MO). Clones positive by blue-white screening were then analyzed directly by colony PCR in a standard reaction with vector-specific M13 and T7 primers that flank the cloning site; reactions were incubated at 94°C for 4 min and cycled at 94°C for 30 s, 55°C for 45 s, and 72°C for 30 s. PCR products from colonies with detectable inserts were then purified (QiaQuick PCR purification kit, Qiagen, Valencia, CA), sequenced, and localized using BLAST (NCBI, NIH, Bethesda, MD).

Preparation of cellular extracts for EMSA
After reaching confluence, T-47D cells were infected with adenovirus containing wild type Stat5 at a multiplicity of infection (MOI) of 6.67, as described previously [29]. Parallel samples of T-47D cells were not exposed to adenovirus and served as a mock infection (standard control). After infection, the cells were cultured in serum-free medium for 24 hours, then stimulated for 30 min with 10 nM hPRL. The culture medium was then removed, and the cells were collected as described above. Nuclear lysates were collected as described previously [5,25].

Generation of radiolabeled DNA probes
Radiolabeled products were generated by PCR in a 10 µl reaction under standard conditions with the addition of 0.25 µl [α-32P]dATP (10 mCi/ml; Amersham-Pharmacia, Piscataway, NJ). Initially the samples were incubated at 94°C for 1 min, then cycled 36 times at 94°C for 30 s, 55°C for 30 s, and 72°C for 30 s. The reaction was then held at 72°C for 5 min following the cycling to allow for product fill-in and addition of a 3' terminal "A". After completion of the cycling, the PCR products were purified using the Qiagen PCR purification kit, according to manufacturer's instructions. The final products were eluted in 50 µl and stored at -20°C until use.

DNA-protein binding reaction
The DNA-protein binding reactions were performed in a 10 µl mixture containing 3 µl of nuclear extract from the respective sample, and 1 µg of double-stranded poly dI:dC (Boehringer Mannheim, Indianapolis, IN), as previously described [25]. After 1 h on ice, samples (containing 1 ng specific anti-Stat5 antibody (Santa Cruz Biotechnology), 1 ng non-specific, purified IgG (Sigma), or no antibody) were incubated with 2.0 µl 32P-labeled PCR probe for 20 min at room temperature. The samples were then resolved in a 3% non-denaturing polyacrylamide gel, as previously described [25].

ChIP in vivo validation
As an independent validation technique, we designed primers specific to each cloned chromatin fragment (CCF) then performed PCR on a separately-generated enriched pool of immunoprecipitated Stat5-chromatin interaction sites. The following primers were used for PCR: CCF #5 forward 5' TGA CAT CAG TGA GAG TGG AGG 3', reverse http://www.molecular-cancer.com/content/4/1/6