The Japan Cardiovascular Database-Keio Interhospital Cardiovascular Studies (JCD-KiCS) is a large, ongoing, prospective, multicenter (n = 15) PCI registry that collects clinical data from consecutive patients undergoing PCI in Japan, in collaboration with the National Cardiovascular Data Registry (NCDR) CathPCI9,10,11. In JCD-KiCS, all PCI procedures were performed under the direction of the intervention team of each participating hospital according to the standard of care. Participating hospitals were instructed to register data from consecutive PCIs using an electronic data collection system equipped with a data query engine and validations to maintain data quality. Data entry was performed by dedicated clinical research coordinators specifically trained for JCD-KiCS. Data quality was ensured through an automated validation system and bimonthly standardized education and training for the clinical research coordinators. Proper registration of each patient was ensured by the lead study coordinator (IU) and by extensive on-site auditing by the investigator (SK).
The study protocol conformed to the principles of the Declaration of Helsinki and was approved by the Ethics Committee of Keio University School of Medicine and by the committee of each participating hospital (the National Hospital Organization Review Board for Clinical Trials; the Eiju General Hospital Ethics Committee; the Ethics Committee of the Saiseikai Utsunomiya Hospital; the Tokyo Saiseikai Central Hospital Research Ethics Committee; the Japanese Red Cross Ashikaga Hospital Ethics Committee; the Kawasaki City Hospital Institutional Review Board; the Saitama City Hospital Ethical Review Board; the Isehara Kyodo Hospital Institutional Review Board; the Tokyo Dental College Ichikawa General Hospital Institutional Review Board; the Hiratsuka City Hospital Independent Ethics Committee; the Saint Luke’s Health System Institutional Review Board; the Hino Municipal Hospital Institutional Review Board; and the Yokohama Municipal Citizen’s Hospital Ethics Committee). All participants gave verbal or written consent to the collection of baseline data, and informed consent was obtained from each participant individually.
We extracted data on 24,848 consecutive patients who underwent PCI between July 2008 and September 2020. Because several parameters serve as input variables for one model and as exclusion criteria for another (e.g., hemodialysis before PCI is an input variable of the in-hospital mortality model and an exclusion criterion of the AKI model), we created each outcome-specific cohort using a two-step process. First, we excluded patients with missing indications (n = 967), those with hemoglobin missing both before and after the procedure (n = 901), and those with serum creatinine missing both before and after the procedure (n = 22); the remaining patients formed the analytical cohort. Next, we applied outcome-specific exclusion criteria, followed by imputation of missing values, to form each cohort (detailed in Fig. 1). Each cohort was randomly divided into a training set (75% of patients) and a test set (the remaining 25%) with approximately the same proportion of events.
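The 75%/25% split with matched event proportions can be illustrated with a minimal sketch. It assumes a simple per-class shuffle; the function name `stratified_split`, the seed, and the toy labels are hypothetical, and this is not the registry's actual code (the analyses were run in R).

```python
import random

def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split indices so that each outcome class keeps the same share in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for value in sorted(set(labels)):
        # Shuffle the patients of one class, then cut off the test share.
        idx = [i for i, y in enumerate(labels) if y == value]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy data (hypothetical): 1000 patients with a 10% event rate.
labels = [1] * 100 + [0] * 900
train_idx, test_idx = stratified_split(labels)
train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
```

In this toy example both parts keep the 10% event rate, which is the property the split described above is meant to preserve.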
Definitions and Outcomes
The definitions of AKI, bleeding, and in-hospital mortality were consistent with those of the original NCDR-CathPCI models4,5,6. Briefly, AKI was defined as a ≥ 0.3 mg/dL absolute or ≥ 1.5-fold relative increase in post-PCI creatinine, or new initiation of dialysis. Bleeding was defined as any of the following occurring within 72 hours of PCI or before hospital discharge (whichever occurred first): site-reported bleeding at the arterial access site; retroperitoneal, gastrointestinal, or genitourinary hemorrhage; intracranial hemorrhage; cardiac tamponade; a post-procedural hemoglobin decrease of ≥ 3 g/dL in patients with a pre-procedural hemoglobin ≤ 16 g/dL; or a post-procedural blood transfusion not related to bypass surgery in patients with a pre-procedural hemoglobin ≥ 8 g/dL. In-hospital mortality was defined as any post-procedural death during the same hospital admission. Because JCD-KiCS was developed in collaboration with NCDR-CathPCI, most of the clinical variables were defined in accordance with its Data Dictionary (version 4.1)9. For example, cardiogenic shock was defined as a sustained (> 30 min) episode of systolic blood pressure < 90 mm Hg and/or cardiac index < 2.2 L/min/m² determined to be secondary to cardiac dysfunction, and/or the need for intravenous inotropic or vasopressor agents or mechanical support to maintain blood pressure and cardiac index above the specified values within 24 hours post-procedure.
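As an illustration of the AKI criteria stated above (≥ 0.3 mg/dL absolute rise, ≥ 1.5-fold relative rise, or new dialysis), a minimal sketch follows. The function name `is_aki` and the rounding of the absolute rise to two decimals (to avoid floating-point artifacts at the 0.3 mg/dL boundary) are assumptions of this sketch, not part of the registry definition.

```python
def is_aki(pre_creatinine, post_creatinine, new_dialysis=False):
    """Flag AKI from pre- and post-PCI serum creatinine values in mg/dL."""
    # Round the difference so that, e.g., 1.3 - 1.0 is compared as exactly 0.30.
    absolute_rise = round(post_creatinine - pre_creatinine, 2)
    relative_rise = post_creatinine / pre_creatinine
    return new_dialysis or absolute_rise >= 0.3 or relative_rise >= 1.5

# Hypothetical patients:
no_aki = is_aki(1.0, 1.2)                      # rise of 0.2 mg/dL, ratio 1.2
aki_absolute = is_aki(1.0, 1.3)                # rise of 0.3 mg/dL
aki_relative = is_aki(0.4, 0.65)               # rise of 0.25 mg/dL, ratio 1.625
aki_dialysis = is_aki(1.0, 1.1, new_dialysis=True)
```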
Dealing with missing data
After defining the analytical cohort, we imputed missing preprocedural hemoglobin values with the postprocedural hemoglobin values for the AKI and in-hospital mortality models, and imputed missing preprocedural creatinine values with the postprocedural creatinine values for the bleeding and in-hospital mortality models. Because the missing rate for all other variables was < 5%, we used median imputation for continuous variables and mode imputation for categorical variables.
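The two imputation steps described above can be sketched as follows. This is a minimal illustration that assumes missing values are encoded as `None`; the helper names `fill_from_paired` and `impute_simple` and the toy values are hypothetical.

```python
from statistics import median, mode

def fill_from_paired(pre_values, post_values):
    """Replace a missing preprocedural value with the postprocedural one."""
    return [post if pre is None else pre for pre, post in zip(pre_values, post_values)]

def impute_simple(values, kind):
    """Median imputation for continuous variables, mode imputation for categorical ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed) if kind == "continuous" else mode(observed)
    return [fill if v is None else v for v in values]

# Hypothetical examples:
hemoglobin = fill_from_paired([13.0, None], [12.5, 11.0])
bmi = impute_simple([21.0, None, 24.0, 30.0], "continuous")
diabetes = impute_simple(["no", "yes", "no", None], "categorical")
```

In a train/test setting the fill values would normally be estimated on the training set only and then applied to the test set; the text does not state which approach was used, so the sketch leaves that choice open.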
We developed two types of models: logistic regression (LR) models and extreme gradient boosting (XGB) models. XGB is an ML algorithm that builds a series of relatively simple decision trees and combines them by boosting to produce more robust final predictions. In the LR models, we used the same categorized variables as the original NCDR-CathPCI risk scores (original models); in the XGB models, we used the same variables but entered as raw continuous values those that were categorized in the original models. The full list of variables was as follows:
AKI model: age (categorized as < 50, 50–59, 60–69, 70–79, 80–89, and ≥ 90 years), heart failure within 2 weeks, estimated glomerular filtration rate (eGFR) (categorized as < 30, 30–44, 45–59, and ≥ 60 mL/min/1.73 m²), diabetes mellitus, history of heart failure, history of cerebrovascular disease, non-ST elevation acute coronary syndrome (NSTEACS), ST elevation myocardial infarction (STEMI), cardiogenic shock at presentation, cardiopulmonary arrest at presentation, anemia (defined as hemoglobin at admission < 10 g/dL), and use of an intra-aortic balloon pump (IABP).
Bleeding model: STEMI, age (categorized as < 60, 60–70, 71–79, and ≥ 80 years), BMI (categorized as < 20, 20–30, 30–39, and ≥ 40 kg/m²), previous PCI, eGFR (categorized as < 30, 30–44, 45–59, and ≥ 60 mL/min/1.73 m²), cardiogenic shock at presentation, female gender, hemoglobin at presentation (categorized as < 13, 13–15, and ≥ 15 g/dL), and PCI status (emergency, salvage, urgent, and elective).
In-hospital mortality model: age (categorized as < 60, 60–69, 70–79, and ≥ 80 years), cardiogenic shock at presentation, history of heart failure, peripheral arterial disease, chronic obstructive pulmonary disease, eGFR (categorized as < 30, 30–44, 45–59, 60–89, and ≥ 90 mL/min/1.73 m²), NYHA class IV at presentation, STEMI, and PCI status (emergency, salvage, urgent, and elective).
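The difference between the two feature representations (categorized variables for LR, raw continuous values for XGB) can be illustrated with eGFR, using the AKI-model cut points listed above. This is a sketch only: the names `categorize`, `lr_features`, and `xgb_features` are hypothetical, and one-hot encoding is an assumed way of entering the categories into a regression.

```python
from bisect import bisect_right

# Cut points for eGFR in the AKI model (mL/min/1.73 m^2), taken from the list above.
EGFR_CUTS = [30, 45, 60]
EGFR_LABELS = ["<30", "30-44", "45-59", ">=60"]

def categorize(value, cuts, labels):
    """Map a continuous value to its category; each cut point opens the next bin."""
    return labels[bisect_right(cuts, value)]

def lr_features(egfr):
    """Categorized (one-hot) representation, as assumed here for the LR model."""
    label = categorize(egfr, EGFR_CUTS, EGFR_LABELS)
    return {f"egfr_{name}": int(name == label) for name in EGFR_LABELS}

def xgb_features(egfr):
    """Raw continuous representation, as used for the XGB model."""
    return {"egfr": egfr}

lr_row = lr_features(44.9)
xgb_row = xgb_features(44.9)
```

Passing the raw value lets the tree-based model choose its own split points instead of relying on the predefined categories.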
To optimize the hyperparameters of the XGB models, we used stratified three-fold cross-validation with a random search. After determining the best hyperparameters, the XGB models were fitted on the full training set (hold-out method; see the supplementary material for a more detailed explanation). In addition, we constructed extended LR and XGB models using additional variables selected for their clinical significance. The additional variables were as follows:
Extended AKI model: Contrast volume and timing of PCI (i.e. during work or holiday periods).
Extended bleeding model: Number of antiplatelet drugs used, use of anticoagulants during PCI, and timing of PCI.
Extended in-hospital mortality model: Technical failure of PCI (defined as failure to pass the guidewire or a post-PCI TIMI flow grade of 0 or 1, i.e., no flow or slow flow) and timing of PCI.
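The tuning procedure described above (stratified three-fold cross-validation combined with a random search) can be sketched in outline. The sketch makes several assumptions: `score_fn` stands in for "fit a model on the training folds and score it on the held-out fold", the toy scorer below is a hypothetical placeholder rather than a real model, and the search space values are illustrative, not the study's.

```python
import random

def stratified_folds(labels, k=3, seed=0):
    """Assign indices to k folds so that each fold has a similar event rate."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for value in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == value]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % k].append(i)
    return folds

def random_search(labels, search_space, score_fn, n_iter=20, k=3, seed=0):
    """Sample hyperparameters at random and keep the set with the best mean CV score."""
    rng = random.Random(seed)
    folds = stratified_folds(labels, k, seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(options) for name, options in search_space.items()}
        scores = []
        for held_out in range(k):
            valid_idx = folds[held_out]
            train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
            scores.append(score_fn(params, train_idx, valid_idx))
        mean_score = sum(scores) / k
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

# Hypothetical search space and placeholder scorer (no model is actually fitted here).
search_space = {"max_depth": [2, 3, 4, 5, 6], "eta": [0.01, 0.05, 0.1, 0.3]}

def toy_score(params, train_idx, valid_idx):
    return -abs(params["max_depth"] - 4) - abs(params["eta"] - 0.1)

labels = [1] * 30 + [0] * 270
folds = stratified_folds(labels)
best_params, best_score = random_search(labels, search_space, toy_score)
```

Once the best hyperparameters are chosen, the final model would be refitted on the whole training set and evaluated once on the held-out test set, as the text describes.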
Statistics and metrics
Continuous variables were summarized as medians with interquartile ranges and compared using Mann-Whitney U tests; categorical variables were summarized as frequencies and compared using chi-square tests or Fisher’s exact tests, as appropriate.
C-statistics with 95% confidence intervals (95% CIs) calculated by DeLong’s method and the area under the precision-recall curve (PRAUC) were used to assess model discrimination. Model calibration was assessed using the Brier score and calibration plots. The Brier score is defined as the mean squared difference between the predicted probabilities and the observed outcomes and ranges from 0 to 1.00, with 0 representing the best possible calibration. The two main components into which the Brier score can be decomposed, i.e., reliability and resolution, were also assessed. Calibration plots showed the mean predicted risk against the observed outcome rate within each quintile of predicted risk. In addition, we used the net reclassification index (NRI) to assess the clinical utility of the LR and XGB models, with cut-off values of 10%, 4%, and 2.5% for AKI, bleeding, and in-hospital mortality, respectively. A P value of < 0.05 was considered statistically significant. This study followed the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guideline.
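For readers unfamiliar with these metrics, the following minimal sketch computes the Brier score, the C-statistic (as the proportion of concordant event/non-event pairs), and a two-category NRI at a single cut-off on toy data. The function names and data are hypothetical; confidence intervals by DeLong's method and the Brier decomposition are omitted, and the NRI formula is the standard categorical form, which is assumed rather than confirmed to match the software output used in the study.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and observed outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def c_statistic(y_true, y_prob):
    """Share of event/non-event pairs in which the event has the higher prediction."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5  # ties count as half
    return concordant / (len(events) * len(non_events))

def categorical_nri(y_true, prob_old, prob_new, cutoff):
    """Two-category NRI: net upward moves in events minus net upward moves in non-events."""
    def net_up(indices):
        up = sum(prob_old[i] < cutoff <= prob_new[i] for i in indices)
        down = sum(prob_new[i] < cutoff <= prob_old[i] for i in indices)
        return (up - down) / len(indices)
    events = [i for i, y in enumerate(y_true) if y == 1]
    non_events = [i for i, y in enumerate(y_true) if y == 0]
    return net_up(events) - net_up(non_events)

# Toy data (hypothetical).
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
brier = brier_score(y, p)
c_stat = c_statistic(y, p)

# Reclassification between an "old" and a "new" model at a 10% cut-off.
outcome = [1, 1, 0, 0]
old = [0.05, 0.20, 0.05, 0.20]
new = [0.15, 0.20, 0.05, 0.05]
nri = categorical_nri(outcome, old, new, cutoff=0.10)
```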
As a sensitivity analysis, we used multiple imputation instead of median imputation to handle missing values. The multiple imputation model included all predefined predictors and outcomes, as recommended12. Ten imputed data sets were generated, and the C-statistics were combined using Rubin’s rules.
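Rubin's rules combine the per-data-set estimates into one pooled estimate whose variance reflects both within- and between-imputation uncertainty. The sketch below is a minimal illustration; the function name `pool_rubin` and the numbers are hypothetical, only three imputations are shown for brevity, and the text does not say whether pooling was done on the raw or a transformed scale.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m estimates and their variances with Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (divides by m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical C-statistics and their variances from three imputed data sets.
pooled_c, total_var = pool_rubin([0.80, 0.82, 0.84], [0.0004, 0.0004, 0.0004])
standard_error = total_var ** 0.5
```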
All analyses were performed in R (version 4.0.4; R Project for Statistical Computing, Vienna, Austria) using the tidymodels package (version 0.1.2) for data preprocessing, hyperparameter tuning, model training, and performance metrics13,14,15. We used xgboost (version 188.8.131.52) for extreme gradient boosting16, pROC (version 184.108.40.206) for calculating C-statistics17, verification (version 1.42) for calculating Brier scores18, PredictABEL (version 1.2.4) for calculating the NRI19, and mice (version 3.14.0) for multiple imputation20.