SIR 2024

General IR

Session: JVIR: Trials, Abstracts, and What It Takes to Publish an Award-Winning Article

Interpretable Machine Learning Models for Predicting the Publication Success Using SIR Abstracts

Tuesday, March 26, 2024

3:18 PM – 3:27 PM MT

Presenting Author(s)

Okan Ince, MD

Research Fellow
Rush University Medical Center

Financial relationships: Full list of relationships is listed on the CME information page.

Author/Co-author(s)

Bulent Arslan, MD, FSIR

Professor and Chair, Vascular and Interventional Service Line
Rush University Medical Center

Financial relationships: Full list of relationships is listed on the CME information page.

Purpose:

To evaluate the ability of machine learning models to predict whether prior SIR abstracts were subsequently published.

Materials and Methods: Scientific oral presentation abstracts from the SIR 2018 annual meeting were reviewed in the JVIR supplemental volume. The topic and design of each abstract were categorized, and sample size, author count, and maximum follow-up span were recorded. Studies with positive conclusions were categorized as positive studies. Publication status was determined by advanced searches on PubMed and Google Scholar. The initial search combined the main keywords with the first author’s name; if no results were found, the other authors' names were tried. Google Scholar was searched if no result was found in PubMed. After standardization, one-hot encoding of categorical features, and splitting of the data into train and test sets, a backward sequential feature selection (SFS) algorithm was used for feature selection. Four machine learning models, namely support vector machine (SVM), logistic regression (LR), CatBoost, and LightGBM, were built using the selected features, and performance metrics were calculated. For interpretability, Shapley additive explanations (SHAP) values were calculated for each model.
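The preprocessing and feature-selection steps described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the toy rows are made up, the helper names (`standardize`, `one_hot`, `loo_1nn_accuracy`, `backward_sfs`) are hypothetical, and a leave-one-out 1-nearest-neighbour score stands in for whatever scoring the study used on its training split.

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale a numeric column to zero mean and unit variance (z-scores)."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma if sigma else 0.0 for x in column]


def one_hot(column):
    """Turn a categorical column into one 0/1 column per category."""
    return {cat: [1.0 if v == cat else 0.0 for v in column]
            for cat in sorted(set(column))}


def loo_1nn_accuracy(table, labels, subset):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on `subset`.

    Assumption: this scorer is only a stand-in for the study's own scoring.
    """
    n = len(labels)
    hits = 0
    for i in range(n):
        def dist(j):
            return sum((table[f][i] - table[f][j]) ** 2 for f in subset)
        nearest = min((j for j in range(n) if j != i), key=dist)
        hits += labels[nearest] == labels[i]
    return hits / n


def backward_sfs(features, score_fn, n_keep):
    """Start from all features; repeatedly drop the one whose removal scores best."""
    selected = list(features)
    while len(selected) > n_keep:
        # Score every subset obtained by removing exactly one feature.
        candidates = [[f for f in selected if f != drop] for drop in selected]
        selected = max(candidates, key=score_fn)
    return selected


# Made-up toy rows (NOT data from the study), one entry per abstract.
sample_size = [20, 35, 150, 400, 60, 250, 15, 300]
follow_up = [3, 6, 24, 36, 12, 30, 2, 18]
design = ["retro", "retro", "prosp", "prosp", "retro", "prosp", "retro", "retro"]
published = [0, 0, 1, 1, 0, 1, 0, 1]

table = {
    "sample_size": standardize(sample_size),
    "follow_up": standardize(follow_up),
}
table.update({f"design={k}": v for k, v in one_hot(design).items()})

selected = backward_sfs(
    list(table),
    lambda subset: loo_1nn_accuracy(table, published, subset),
    n_keep=2,
)
print(selected)
```

In practice a library routine would likely replace the hand-written loop; for example, scikit-learn provides `SequentialFeatureSelector` with `direction="backward"`, which scores candidate subsets by cross-validation.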

Results: There were 406 oral presentation abstracts at the SIR 2018 annual meeting. Of these, 240 (59.1%) were categorized as Oncology and 308 (75.9%) had a retrospective design. Median sample size was 72 (IQR: 175), median number of authors was 6 (IQR: 4), and median follow-up duration was 10 months (IQR: 21). A total of 376 (92.6%) abstracts concluded with positive results, and 192 (47.3%) abstracts were published after the meeting. Sample size, follow-up duration, positive-study status, retrospective design, and the management category were selected by the SFS algorithm. The accuracies of the SVM, LR, CatBoost, and LightGBM models for prediction of publication were 64%, 65%, 65%, and 62%, respectively. SHAP values indicated that, for all models, retrospective design and follow-up duration were the most influential features.
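To put the reported accuracies in context, the short calculation below recomputes the percentages from the stated counts and compares each model with a majority-class baseline. It is a sketch under one explicit assumption: the abstract does not report the class balance of the test set, so the baseline here uses the publication rate of the full 406-abstract cohort.

```python
n_total = 406
counts = {"oncology": 240, "retrospective": 308, "positive": 376, "published": 192}

# Recompute the percentages reported in the Results.
percentages = {name: round(100 * k / n_total, 1) for name, k in counts.items()}
for name, pct in percentages.items():
    print(f"{name}: {counts[name]}/{n_total} = {pct}%")

# Majority-class baseline: always predict the more common outcome.
# Assumption: the test split has the same class balance as the full cohort.
published_rate = counts["published"] / n_total
majority_baseline = max(published_rate, 1 - published_rate)

accuracies = {"SVM": 0.64, "LR": 0.65, "CatBoost": 0.65, "LightGBM": 0.62}
gains = {model: acc - majority_baseline for model, acc in accuracies.items()}
for model, gain in gains.items():
    print(f"{model}: {100 * gain:.1f} percentage points above baseline")
```

Under that assumption, always predicting "not published" would already be correct for about 52.7% of abstracts, so the models' gains over the baseline are roughly 9 to 12 percentage points.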

Conclusion: Machine learning models bear great potential for predicting the publication of an abstract from certain criteria. With further refinement and additional data, these models could become useful tools for authors in gauging the publishability of their work.