Genome-wide cell-free DNA methylation analyses improve accuracy of non-invasive diagnostic imaging for early-stage breast cancer


Early detection is crucial to improving outcomes and survival for patients with breast cancer (BC). Mammography and ultrasound, interpreted with the Breast Imaging Reporting and Data System (BI-RADS) categorization, are widely used for early BC detection but suffer from a high false-positive rate that leads to unnecessary biopsies, especially in BI-RADS category 4 patients. Plasma cell-free DNA (cfDNA), which carries DNA methylation information, has emerged as a non-invasive approach to cancer detection. Here we present a prospective multi-center study using whole-genome bisulfite sequencing data to assess the clinical utility of cfDNA methylation markers in 203 female patients with breast lesions suspected of malignancy. The cfDNA was enriched in hypomethylated genomic regions. We devised a practical computational framework to identify optimal cfDNA-rich DNA methylation markers, which significantly improved the early diagnosis of BI-RADS category 4 patients (AUC from 0.78–0.79 to 0.93–0.94). As a proof of concept, we performed the first blood-based whole-genome DNA methylation study at single-base resolution to distinguish early-stage breast cancer from benign tumors. The results suggest that combining liquid biopsy with traditional diagnostic imaging can improve current clinical practice by reducing the false-positive rate and avoiding unnecessary harm.

Main text

Breast cancer (BC) is the most common cancer in women worldwide [1]. Mammography and ultrasound are routinely used to detect early BC in asymptomatic women, but they are prone to underestimation or over-diagnosis [1, 2]. The Breast Imaging Reporting and Data System (BI-RADS) categories, based on mammography and ultrasonography, have been used to standardize risk assessment for breast lesions [3]. However, the risk of malignancy for BI-RADS category 4 lesions varies from 3 to 94%, and under current clinical guidelines this wide range can lead to unnecessary biopsies [4].

Mutation-based circulating tumor DNA (ctDNA) analysis has been used for detecting early relapse, analyzing acquired resistance, and guiding adjuvant therapy in several cancers [5,6,7]. Moreover, combining liquid biopsy with diagnostic imaging has recently been shown to perform better than either alone [8]. However, the lack of multiple common mutations in BC has limited the sensitivity of mutation-based ctDNA detection [9]. DNA methylation-based markers, on the other hand, have been effective for the early detection of many cancer types [10, 11]. Most methylation-based studies, however, were designed to detect individuals with cancer among a population of non-conditional healthy controls, using locus-specific or CpG-rich-region technologies [10, 11], methylated DNA immunoprecipitation sequencing (MeDIP-seq) [12], or reduced representation bisulfite sequencing (RRBS) [13]. The genome-wide DNA methylation characteristics of cfDNA that distinguish malignant from benign tumors at single-base resolution remain largely unknown.

Study design

Herein, we recruited 210 consecutive female patients with BI-RADS category 4 breast lesions that were biopsied after mammography and ultrasonography examinations at the Cancer Hospital of the Chinese Academy of Medical Sciences and Peking Union Medical College (CHCAMS, n = 160, discovery cohort) and the Harbin Medical University Cancer Hospital (HMUCH, n = 50, validation cohort) from April 1, 2019, to August 31, 2019. This study was reviewed and approved by the ethics committee of each participating hospital, and each participant provided written informed consent. The diagnosis of each patient was based on pathology results from specimens obtained by surgical biopsy or core needle biopsy. Twenty tumor samples (10 malignant and 10 benign) were collected for whole-genome bisulfite sequencing (WGBS) from patients who underwent biopsies at CHCAMS. A median of 3 mL of plasma was collected from all participants (n = 210) before surgery (Fig. 1), and the cfDNA methylome of each participant was also measured by WGBS. We then assessed whether a diagnostic model using the identified cfDNA methylation markers, alone or in combination with imaging findings, improved the accuracy of early-stage BC detection (Fig. 2a). Because ctDNA makes up only a small fraction of total cfDNA, we devised a computational framework to boost the detection of cfDNA methylation markers. It consisted of identification of differentially methylated regions (DMRs) in the tumor samples, cfDNA enrichment analysis, fragment size selection, fragment-based statistical inference of cfDNA malignant ratios, and a prediction model for early-stage BC (Fig. 2b; Supplementary methods).

Fig. 1

Patient enrollment and sample collection. We recruited 210 consecutive female patients from the Cancer Hospital of the Chinese Academy of Medical Sciences and Peking Union Medical College (CHCAMS, n = 160, the discovery cohort) and the Harbin Medical University Cancer Hospital (HMUCH, n = 50, the validation cohort) from April 1, 2019, to August 31, 2019, as part of the DETEct study (Deciphering Epigenetic signatures in Tumor and Exploiting ctDNA)

Fig. 2

The workflow of data analysis and model development. a The images of standard mammography and ultrasonography were interpreted and classified according to the fifth edition of the Breast Imaging Reporting and Data System (BI-RADS) standard by two experienced radiologists independently at each center. The cfDNA methylome of each participant was measured by whole-genome bisulfite sequencing (WGBS). Machine learning was then applied to identify cfDNA methylation markers of early-stage breast cancer. Finally, a diagnostic model using the identified cfDNA methylation markers in combination with mammography and ultrasound findings was assessed for its ability to improve the accuracy of early-stage breast cancer detection. DCIS, ductal carcinoma in situ. b A comprehensive framework was devised to develop the classifier for identifying early-stage breast cancer based on cfDNA methylation markers from massive numbers of cfDNA WGBS fragments. It includes four processes: differentially methylated region (DMR) calling, cfDNA enrichment, cfDNA origin inference, and model development. First, 5613 hyper-DMRs and 51,962 hypo-DMRs were identified from WGBS data of 10 benign and 9 malignant primary breast tissue samples. Second, cfDNA enrichment scores were computed as the mean number of fragments in DMRs. Then, fragment size selection was conducted to reduce the effect of abundant non-tumor cfDNA in plasma, on the basis that ctDNA fragments are shorter than non-tumor cfDNA fragments. Next, a fragment-based strategy was devised to statistically infer the origin (malignant or not) of each fragment, based on the distributions of tissue DNA methylation patterns in DMRs. Finally, a predictive score (cfMeth score) based on the cfDNA malignant ratio in each plasma sample was computed using a random forest classifier
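The enrichment and size-selection steps named in the Fig. 2b caption can be illustrated with a minimal Python sketch. It is not the authors' implementation (that is described in the Supplementary methods, which are not reproduced here). The tuple layout, the function names, and especially the `max_len=150` cutoff are assumptions chosen for illustration; the caption states only that shorter fragments are preferred and that the enrichment score is the mean number of fragments in DMRs.

```python
from typing import List, Tuple

# Hypothetical representation: (chromosome, start, end), 0-based half-open.
Interval = Tuple[str, int, int]


def select_short_fragments(fragments: List[Interval], max_len: int = 150) -> List[Interval]:
    """Keep fragments no longer than max_len bp.

    The 150 bp default is an illustrative assumption, not a value from the paper.
    """
    return [f for f in fragments if (f[2] - f[1]) <= max_len]


def enrichment_score(fragments: List[Interval], regions: List[Interval]) -> float:
    """Mean number of fragments overlapping each region (e.g. each DMR)."""
    if not regions:
        raise ValueError("at least one region is required")
    counts = []
    for chrom, r_start, r_end in regions:
        # Two half-open intervals overlap when each starts before the other ends.
        n = sum(
            1
            for c, s, e in fragments
            if c == chrom and s < r_end and e > r_start
        )
        counts.append(n)
    return sum(counts) / len(counts)


# Toy data, invented for the example.
toy_fragments = [
    ("chr1", 100, 267),    # 167 bp
    ("chr1", 150, 290),    # 140 bp
    ("chr1", 1000, 1120),  # 120 bp
    ("chr2", 50, 200),     # 150 bp
]
toy_dmrs = [("chr1", 0, 300), ("chr1", 900, 1200)]

print(enrichment_score(toy_fragments, toy_dmrs))    # 1.5
print(len(select_short_fragments(toy_fragments)))   # 3
```

The linear scan over fragments is quadratic and only suitable for toy inputs; genome-scale data would need an interval index or a sorted sweep.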

Clinical characteristics of the participants

Six patients in the discovery cohort and one patient in the validation cohort were excluded due to low data quality. A total of 77 patients with BCs and 77 patients with benign lesions from CHCAMS constituted the discovery cohort. Forty-nine patients from HMUCH constituted the validation cohort, 24 of whom had biopsy-confirmed BCs (Fig. 1). The breast lesions were further interpreted as BI-RADS subcategories 4a, 4b, and 4c, depending on the probability of malignancy. Patients were followed up every 3 months for 6 months using mammograms and ultrasounds until a final diagnosis was made. All of the patients with confirmed BCs were in the early stages of the disease, and most of them were hormone receptor positive/luminal (Tables S1 and S2).

Identification of cfDNA methylation markers

The cfDNA concentrations were surveyed in each cohort, and none of them could directly discriminate between malignant and benign tumors (Fig. S1A). The quality and size distribution of plasma DNA samples were assessed with the Agilent 2100 Bioanalyzer (Agilent, USA). All distributions showed a distinct peak for the cfDNA (Fig. S1B). Furthermore, to remove genomic DNA contamination, fragments shorter than 500 bp were retained during bead-based library purification. After library preparation and sequencing, we inferred the size profile of cfDNA from the WGBS data (n = 101 malignant and 102 benign), which showed a typical cfDNA size distribution with a prominent mode at 167 bp (Fig. S1C). The cfDNA samples together with the 19 tissue samples yielded a total of 4.0 Tb of WGBS data, covering roughly 88% of the reference genome at an average depth of 11.2× (Table S3).

We found that cfDNA fragments from WGBS were enriched in coding and intergenic regions and depleted in gene promoter regions (Fig. 3a). The amount of cfDNA mapped to different genomic regions was negatively correlated with CpG density and GC content (Fig. 3b and Fig. S2). A total of 57,575 DMRs (51,962 hypo-DMRs and 5613 hyper-DMRs) were identified between 9 malignant and 10 benign tumor samples (Fig. S3). As expected, CpG density was significantly higher in hyper-DMRs than in hypo-DMRs (p < 0.0001; Fig. 3c). Accordingly, the average number of cfDNA fragments was significantly higher in hypo-DMRs than in hyper-DMRs in all malignant and benign samples (p < 0.0001; Fig. 3d). The depletion of cfDNA fragments in CpG-rich hyper-DMRs could be due to the preferential digestion of open chromatin regions in cfDNA [14]. These findings guided us to select cfDNA methylation markers in hypomethylated regions. Aberrant hypomethylation, generally in the closed chromatin of cfDNA-rich gene bodies or intergenic regions, is a common feature of various malignant tumors [14]. To ensure that markers were quantified from sufficient high-quality cfDNA, the hypo-DMRs were selected as candidate DNA methylation markers.
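As a rough illustration of the quantities in this paragraph, the following Python sketch computes CpG density and GC content for a sequence and a Pearson correlation between two lists. The definition of CpG density used here (CpG dinucleotides per 100 bp) is an assumption, since the text does not state the one used, and the toy numbers are invented purely to exercise the functions.

```python
import math
from typing import Sequence


def cpg_density(seq: str) -> float:
    """CpG dinucleotides per 100 bp (an assumed definition)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * seq.count("CG") / len(seq)


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)


# Toy values: regions with higher CpG density receive fewer fragments.
toy_density = [1.0, 2.0, 3.0, 4.0]
toy_fragment_counts = [8.0, 6.0, 4.0, 2.0]

print(cpg_density("ACGTCGCG"))   # 37.5
print(gc_content("ACGTCGCG"))    # 0.75
print(pearson_r(toy_density, toy_fragment_counts))
```

With a perfectly linear decreasing toy relationship, the coefficient comes out at or very near −1, which is the sign one would expect for a negative correlation.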

Fig. 3

The cell-free DNA methylation landscapes and the cfDNA methylation analysis for breast cancer diagnosis. a Profiles of mean sequencing coverage of cfDNA fragments from WGBS across gene bodies ±10 kb. The gene lengths were normalized to 20 kb. The cfDNA fragments were enriched in coding and intergenic regions and depleted in gene promoter regions. TSS, transcription start site; TES, transcription end site. b The amount of cfDNA in different genomic regions negatively correlating with their CpG density (r = 0.98, p < 0.0001). c Boxplots showing that CpG density was significantly higher in hyper-DMRs than in hypo-DMRs (p < 0.0001). Boxplots represent the interquartile range (25–75%), with the median; whiskers correspond to 1.5 times the interquartile range. d The average number of cfDNA fragments was significantly higher in hypo-DMRs than in hyper-DMRs in all of the malignant and benign samples (p < 0.0001 by Pearson’s chi-squared test). e The cfDNA malignant ratio of the top 10 optimal cfDNA hypo-DMR markers in plasma samples from patients with breast cancer and benign breast lesions. ns, not significant; * p ≤ 0.05; ** p ≤ 0.01; *** p ≤ 0.001; **** p ≤ 0.0001. f and g Receiver operating characteristic (ROC) curves of the diagnostic model based on the cfMeth score. The areas under the curve (AUCs) of the cfMeth score obtained for the discovery (f) and validation (g) cohorts were 0.89 (95% CI, 0.84–0.94) and 0.81 (95% CI, 0.69–0.93). h and i ROC curves of the combined diagnostic model. The AUCs of the combined model obtained for the discovery (h) and validation (i) cohorts were 0.94 (95% CI, 0.90–0.97) and 0.93 (95% CI, 0.84–1.00)

The cfDNA malignant ratio was computed for the hypo-DMRs in each patient sample (Supplementary methods). The 10 optimal hypo-DMRs for distinguishing malignant from benign plasma samples were selected using data from the discovery cohort. Most were located in intergenic regions, and their malignant ratios were significantly higher in patients with malignant tumors than in patients with benign lesions (p < 0.05 for all; Fig. 3e). Nevertheless, the ten methylation markers include four functional genes (RYR2, RYR3, GABRB3, and DCDC2C) and two lncRNAs (AC096570.1 and LINC00923) (Table S4).
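The exact inference procedure is given in the Supplementary methods and is not reproduced in this text, so the following Python sketch is only one plausible reading of a "fragment-based malignant ratio". It assumes each fragment is summarized as (methylated CpGs, total CpGs), assumes a simple binomial likelihood with a single methylation level per tissue class, and labels a fragment malignant-like when the malignant likelihood is higher. The function names and the methylation levels 0.2 and 0.8 are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical fragment summary: (methylated CpG count, total CpG count).
FragmentCalls = Tuple[int, int]


def binom_loglik(k: int, n: int, p: float) -> float:
    """Log-likelihood of k methylated CpGs out of n at methylation level p."""
    p = min(max(p, 1e-6), 1.0 - 1e-6)  # guard against log(0)
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p)


def is_malignant_like(k: int, n: int, p_malignant: float, p_benign: float) -> bool:
    """True when the fragment is better explained by the malignant tissue level."""
    return binom_loglik(k, n, p_malignant) > binom_loglik(k, n, p_benign)


def malignant_ratio(fragments: List[FragmentCalls], p_malignant: float, p_benign: float) -> float:
    """Fraction of CpG-containing fragments in a region classified as malignant-like."""
    informative = [(k, n) for k, n in fragments if n > 0]
    if not informative:
        return 0.0
    hits = sum(is_malignant_like(k, n, p_malignant, p_benign) for k, n in informative)
    return hits / len(informative)


# Toy hypo-DMR: malignant tissue is less methylated than benign tissue.
toy_calls = [(0, 4), (1, 5), (4, 4), (3, 4)]
print(malignant_ratio(toy_calls, p_malignant=0.2, p_benign=0.8))  # 0.5
```

In this toy hypo-DMR, the two mostly unmethylated fragments are assigned to the malignant class and the two mostly methylated ones to the benign class, giving a ratio of 0.5.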

A prediction model for early-stage breast cancer using cfDNA methylation

Using the cfDNA malignant ratios of the markers, a predictive score of cfDNA methylation (cfMeth score) was computed for each plasma sample in the discovery cohort using a random forest classifier. To reduce overfitting, 10-fold cross-validation was used (Fig. S4). The areas under the curve (AUCs) of the model for the discovery and validation cohorts were 0.89 (95% CI, 0.84–0.94; Fig. 3f) and 0.81 (95% CI, 0.69–0.93; Fig. 3g), respectively, which were superior to the AUCs of BI-RADS findings (AUC = 0.78–0.79 for mammography and ultrasound) and of relevant tumor biomarkers (AUC = 0.57–0.70 for CA15–3 and CEA; Fig. S5). The cfMeth scores in the prediction model were positively and robustly associated with BC stage, including ductal carcinoma in situ (Fig. S6). Additionally, lower cfMeth scores were associated with lower histologic grade and lower proliferation indices (Ki-67 measured by immunohistochemistry; Figs. S6 and S7).
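For readers unfamiliar with the AUC reported here, a minimal Python sketch of how it can be computed from scores and labels is given below. It uses the rank-based (Mann–Whitney) formulation: the AUC equals the probability that a randomly chosen malignant sample scores higher than a randomly chosen benign one, with ties counted as one half. The scores are invented for illustration and are not cfMeth scores from the study.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (label 1 = malignant)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


# Toy scores: three malignant (label 1) and three benign (label 0) samples.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]

print(roc_auc(toy_scores, toy_labels))  # 7 of 9 pairs are ordered correctly
```

The pairwise loop is quadratic in sample count, which is acceptable for cohorts of a few hundred samples; a sort-based rank sum gives the same value for larger data.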

A combinational model of cfMeth scores and diagnostic imaging

A diagnostic model combining cfMeth scores with mammography and ultrasound was developed in the discovery cohort using ridge logistic regression. The combined model performed better than either approach alone (AUC = 0.94, 95% CI, 0.90–0.97; Fig. 3h). The malignant and benign groups had statistically different distributions of the combined scores in both the discovery and validation cohorts (Fig. S8). The cutoff of the combined model was set at the point where the false-negative rate was less than 2% in the discovery cohort, which gave a sensitivity of 98.7%, a specificity of 68.8%, and an accuracy of 83.8% (Table S6). In the validation cohort, the performance was similar (AUC = 0.93 [95% CI, 0.84–1.00]; Fig. 3i), with a sensitivity of 91.7%, a specificity of 88.0%, and an accuracy of 89.8%, showing no evidence of overfitting. Overall, the detection rate of the combined model was 93.3% (42/45), 100% (34/34), and 100% (22/22) for stages I, II, and III, respectively, at a specificity of 73.5% (Table S5).
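The reported percentages can be cross-checked against the cohort sizes with a short Python calculation. The confusion-matrix counts below are not taken from Table S6. They are back-calculated here from the stated cohort sizes (77 malignant and 77 benign in discovery; 24 malignant and 25 benign in validation) and the stated percentages, so they should be read as an inferred reconstruction.

```python
from typing import Tuple


def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) as fractions."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def as_percent(x: float) -> float:
    """Round a fraction to a percentage with one decimal place."""
    return round(100.0 * x, 1)


# Inferred counts (assumptions, see lead-in).
discovery = diagnostic_metrics(tp=76, fn=1, tn=53, fp=24)
validation = diagnostic_metrics(tp=22, fn=2, tn=22, fp=3)

# Pooled specificity across both cohorts: (53 + 22) true negatives of 102 benign.
pooled_specificity = (53 + 22) / (77 + 25)
# Pooled detection rate: 42 + 34 + 22 detected of 45 + 34 + 22 malignant cases.
pooled_sensitivity = (42 + 34 + 22) / (45 + 34 + 22)

print([as_percent(v) for v in discovery])    # [98.7, 68.8, 83.8]
print([as_percent(v) for v in validation])   # [91.7, 88.0, 89.8]
print(as_percent(pooled_specificity))        # 73.5
print(as_percent(pooled_sensitivity))        # 97.0
```

Under these inferred counts, the figures are internally consistent: one missed cancer in the discovery cohort corresponds to a false-negative rate of about 1.3%, below the 2% cutoff criterion, and the pooled true negatives reproduce the 73.5% specificity quoted for the stage-wise detection rates.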


Liquid biopsies are emerging as a non-invasive adjunct or alternative to standard tumor biopsies [5,6,7, 10, 11]. Compared with previous locus-specific methylation analyses [10,11,12,13], this study provides an extensive data resource for uncovering cfDNA methylation characteristics at the genome-wide scale in breast cancer. Using WGBS covering ~88% of the human genome, we have demonstrated that the amount of cfDNA in different genomic regions is negatively correlated with their CpG density.

Compared with a recent study of a cfDNA methylation-based classifier for identifying BCs among healthy female controls [15], this study demonstrated better performance with the combination strategy and succeeded in distinguishing malignant breast lesions from benign ones, especially in stages I and II. Clinical use of the combined approach might reduce the number of unnecessary biopsies in women with BI-RADS category 4 findings.

However, this study also had some limitations. As a preliminary study, it had a relatively small tissue sample size. In addition, the ages at onset were not adequately matched, which might introduce bias into the selection of methylation markers. Large-scale replication studies with more tumor tissue and cfDNA from multiple centers will help to investigate subtype-specific methylation biomarkers and the clinical utility and stability of the combined non-invasive strategy in future work.


In conclusion, we performed a blood-based whole-genome DNA methylation study at single-base resolution for detecting early-stage breast cancer. The results suggest that combining liquid biopsy with traditional diagnostic imaging can improve the accuracy of early-stage breast cancer diagnosis.

Availability of data and materials

The supplement data that support the findings of this study are openly available in the supplementary materials. All data can be viewed in NODE ( by pasting the accession (OEP000860) into the text search box or through the URL: Scripts used to generate the findings in this study have been deposited on



Abbreviations

BC: Breast cancer
BI-RADS: Breast Imaging Reporting and Data System
ctDNA: Circulating tumor DNA
cfDNA: Cell-free DNA
MeDIP-seq: Methylated DNA immunoprecipitation sequencing
RRBS: Reduced representation bisulfite sequencing
WGBS: Whole-genome bisulfite sequencing
DMRs: Differentially methylated regions
AUC: Area under the curve
ROC: Receiver operating characteristic


References

1. Waks AG, Winer EP. Breast cancer treatment: a review. JAMA. 2019;321:288–300.
2. Ohuchi N, Suzuki A, Sobue T, Kawai M, Yamamoto S, Zheng Y-F, Shiono YN, Saito H, Kuriyama S, Tohno E, et al. Sensitivity and specificity of mammography and adjunctive ultrasonography to screen for breast cancer in the Japan Strategic Anti-cancer Randomized Trial (J-START): a randomised controlled trial. Lancet. 2016;387:341–8.
3. American College of Radiology. American College of Radiology Breast Imaging Reporting and Data System Atlas (BI-RADS Atlas). Reston: American College of Radiology; 2013.
4. Bevers TB, Helvie M, Bonaccio E, Calhoun KE, Daly MB, Farrar WB, Garber JE, Gray R, Greenberg CC, Greenup R, et al. Breast cancer screening and diagnosis, version 3.2018, NCCN clinical practice guidelines in oncology. J Natl Compr Cancer Netw. 2018;16:1362–89.
5. Rothwell DG, Ayub M, Cook N, Thistlethwaite F, Carter L, Dean E, Smith N, Villa S, Dransfield J, Clipson A, et al. Utility of ctDNA to support patient selection for early phase clinical trials: the TARGET study. Nat Med. 2019;25:738–43.
6. Ye Q, Ling S, Zheng S, Xu X. Liquid biopsy in hepatocellular carcinoma: circulating tumor cells and circulating tumor DNA. Mol Cancer. 2019;18:114.
7. Lim SY, Lee JH, Diefenbach RJ, Kefford RF, Rizos H. Liquid biomarkers in melanoma: detection and discovery. Mol Cancer. 2018;17:8.
8. Lennon AM, Buchanan AH, Kinde I, Warren A, Honushefsky A, Cohain AT, Ledbetter DH, Sanfilippo F, Sheridan K, Rosica D, et al. Feasibility of blood testing combined with PET-CT to screen for cancer and guide intervention. Science. 2020;369:eabb9601.
9. Cohen JD, Li L, Wang Y, Thoburn C, Afsari B, Danilova L, Douville C, Javed AA, Wong F, Mattox A, et al. Detection and localization of surgically resectable cancers with a multi-analyte blood test. Science. 2018;359:926–30.
10. Xu RH, Wei W, Krawczyk M, Wang W, Luo H, Flagg K, Yi S, Shi W, Quan Q, Li K, et al. Circulating tumour DNA methylation markers for diagnosis and prognosis of hepatocellular carcinoma. Nat Mater. 2017;16:1155–61.
11. Luo H, Zhao Q, Wei W, Zheng L, Yi S, Li G, Wang W, Sheng H, Pu H, Mo H, et al. Circulating tumor DNA methylation profiles enable early diagnosis, prognosis prediction, and screening for colorectal cancer. Sci Transl Med. 2020;12:eaax7533.
12. Shen SY, Singhania R, Fehringer G, Chakravarthy A, Roehrl MHA, Chadwick D, Zuzarte PC, Borgida A, Wang TT, Li T, et al. Sensitive tumour detection and classification using plasma cell-free DNA methylomes. Nature. 2018;563:579–83.
13. Guo S, Diep D, Plongthongkum N, Fung HL, Zhang K, Zhang K. Identification of methylation haplotype blocks aids in deconvolution of heterogeneous tissue samples and tumor tissue-of-origin mapping from plasma DNA. Nat Genet. 2017;49:635–42.
14. Snyder MW, Kircher M, Hill AJ, Daza RM, Shendure J. Cell-free DNA comprises an in vivo nucleosome footprint that informs its tissues-of-origin. Cell. 2016;164:57–68.
15. Liu MC, Oxnard GR, Klein EA, Swanton C, Seiden MV, Cummings SR, Absalan F, Alexander G, Allen B, Amini H, et al. Sensitive and specific multi-cancer detection and localization using methylation signatures in cell-free DNA. Ann Oncol. 2020;31:745–59.


This study is part of the DETEct study (Deciphering Epigenetic signatures in Tumor and Exploiting ctDNA; Chinese Clinical Trial Registry number: ChiCTR1900026080). We thank all the individuals, families, and physicians involved in the study for their participation.


This research was funded in part by the National Natural Science Foundation of China (61871294 to J.S., 81802669 to J.L., 81501852 to N.W., 81872149 to S.X., 11701385 to Y.Z., 81472046 and 81772299 to Z.W.), Science Foundation of Zhejiang Province (LR19C060001 to J.S.), the Fundamental Research Funds for Wenzhou Institute of University of Chinese Academy of Sciences (WIBEZD2017009–05 to J.S.), Tsinghua University-Peking Union Medical College Hospital Initiative Scientific Research Program, the CAMS Innovation Fund for Medical Sciences (2020-I2M-C&T-B-068 to J.L.), the CAMS Initiative Fund for Medical Sciences (2016-I2M-3-003 to N.W., 2016-I2M-2-006 and 2017-I2M-2-001 to Z.W., and 2016-I2M-1-001 to X.W. and Z.L.), Beijing Natural Science Foundation (JQ20032 to N.W.), Non-profit Central Research Institute Fund of Chinese Academy of Medical Sciences (No. 2019PT320025), and the National Key Research and Development Program of China (2018YFC0910506 to N.W. and Z.W.).

Author information




J.S., J.L., N.W., and X.W. conceived the study. J.L., S.X., T.Q., Z.X., X.W., D.P., H.C., and Z.L. collected study materials or patients. H.Z., Y.H., J.L., X.W., K.L., Y.M., M.Z., W.Z., and J.L. performed data cleaning and statistical analysis. J.S., H.Z., Y.H., Y.Z., S.X., and X.W. devised the algorithm and performed data analysis and interpretation. J.L., J.S., H.Z., and N.W. wrote the manuscript. The author(s) read and approved the final manuscript.

Corresponding authors

Correspondence to Xiang Wang or Nan Wu or Jianzhong Su.

Ethics declarations

Ethics approval and consent to participate

This study followed the REMARK criteria (REporting recommendations for tumor MARKer prognostic studies) and was reviewed and approved by the ethics committees of the Cancer Hospital of the Chinese Academy of Medical Sciences and Peking Union Medical College and the Harbin Medical University Cancer Hospital. Each participant provided written informed consent. This study complied with the ethics committees' requirements for patient data release and privacy.

Consent for publication

All the authors have read and approved the final manuscript for publication.

Competing interests

The authors have no conflict of interest to declare.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Liu, J., Zhao, H., Huang, Y. et al. Genome-wide cell-free DNA methylation analyses improve accuracy of non-invasive diagnostic imaging for early-stage breast cancer. Mol Cancer 20, 36 (2021).
