합성 데이터를 활용한 폐암 환자의 생존분석 가능성 검정 (A Study on the Availability of Survival Analysis of Lung Cancer Patients Using Synthetic Data) [open access]
- Other Titles
- A Study on the Availability of Survival Analysis of Lung Cancer Patients Using Synthetic Data
- Authors
- 유제형; 이승희; 김종엽; 손지웅; 구관우; 이수현
- Issue Date
- Nov-2022
- Publisher
- The Korean Society of Health Informatics and Statistics (한국보건정보통계학회)
- Keywords
- Survival analysis; Synthetic data; Kaplan-Meier estimation; Cox regression model; Lung cancer
- Citation
- Journal of Health Informatics and Statistics (보건정보통계학회지), v.47, no.4, pp.279-289
- Journal Title
- Journal of Health Informatics and Statistics (보건정보통계학회지)
- Volume
- 47
- Number
- 4
- Start Page
- 279
- End Page
- 289
- URI
- https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/88117
- DOI
- 10.21032/jhis.2022.47.4.279
- ISSN
- 2465-8014
- Abstract
- Objectives: This pilot study investigated whether synthetic data can support clinical analysis when real data are insufficient in sample size. Because real data carry many limitations, such as ethical concerns and cost, there have been many attempts to create realistic synthetic data. The focus is on whether synthetic data can be used in place of real data. Methods: In this quasi-experimental study, synthetic data on 11,978 lung cancer patients who received anticancer drug therapy were analyzed. Clinically significant variables were extracted, and tables containing patient status and treatment records were preprocessed. Propensity score matching was applied to reduce covariate bias. The preprocessed data were then analyzed using Kaplan-Meier estimation and the Cox proportional hazards model. Results: The survival curves plotted from the synthetic data did not match the curves from the actual data for the other covariates. In Cohort 1, Gen I had a better 5-year overall survival (OS) than Gen II [S1 = 0.973, S2 = 0.953, p < 0.05]. Similarly, in Cohort 2, Gen I anticancer therapy performed better than Gen III [S1 = 0.990, S3 = 0.884, p < 0.05]. In the exploratory subgroup analysis, hazard ratios (HR) were estimated with the Cox regression model, and Gen I showed a more favorable HR than Gen II and Gen III. However, these results differed from the actual trend. Conclusions: The analysis based on the DATA-FREE-BOX data showed a trend different from that of the survival analysis conducted with real data, so analyses of such synthetic data may diverge from real trends. These findings can contribute to data validation. Moreover, the methodology used in this study is expected to be applicable to clinical studies based on actual data.
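The Kaplan-Meier estimation named in the abstract can be illustrated with a minimal sketch of the product-limit estimator. The function name `kaplan_meier` and the toy data are hypothetical and are not taken from the study. This record does not describe the study's own implementation, so this is only an illustration of the general technique.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per patient
    events: 1 if the death was observed, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # everyone leaving the risk set at time t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= exits[t]
    return curve


# Toy data (hypothetical, not from the study): months of follow-up and event flags.
toy_durations = [6, 12, 12, 24, 36, 48, 60, 60]
toy_events = [1, 0, 1, 1, 0, 1, 0, 0]
curve = kaplan_meier(toy_durations, toy_events)
```

On this toy input the estimated survival drops only at the observed event times (months 6, 12, 24, and 48). Censored patients lower the number at risk without lowering the curve.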