Using NLP to Predict CPT Codes from IR Post-Procedure Notes

Monday, March 25, 2024

12:00 PM – 1:00 PM MT

Poster Presenter(s)

Hossam A. Zaki, BS

Medical Student
Warren Alpert Medical School of Brown University

Financial relationships: Full list of relationships is listed on the CME information page.

Author/Co-author(s)

Xiao Wu, MD (she/her/hers)

Resident Physician
University of California, San Francisco

Financial relationships: Full list of relationships is listed on the CME information page.
Vishal Kumar, Associate Professor

Associate Program Director
UCSF / ZSFG

Financial relationships: Full list of relationships is listed on the CME information page.

Jae Ho Sohn, MD

Attending
UCSF

Disclosure information not submitted.

Purpose:

The objective is to predict Current Procedural Terminology (CPT) billing codes for standard interventional radiology (IR) procedures from post-procedure notes, reducing the time spent on this tedious task and improving reimbursement.

Materials and methods:

Using the MIMIC-IV database, we queried procedures whose titles included the words “embolization” or “catheter” and extracted the corresponding post-procedure notes. These terms were chosen because they represent frequently performed IR procedures with distinct CPT codes. Codes with fewer than 50 associated notes were excluded, and a code appearing more than once in the same note was counted only once. Each model was trained on two datasets: one from a query for “embolization” alone in procedure titles (1,590 notes across 17 codes) and one from a query for either “embolization” or “catheter” (5,590 notes across 42 codes).
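The dataset construction steps above (keyword filter on procedure titles, one count per code per note, a 50-note minimum per code) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not describe how notes and codes were joined in MIMIC-IV, so the `ProcedureNote` record and the `build_dataset` helper are hypothetical, and dropping notes left with no eligible code is also an assumption.

```python
from collections import Counter
from dataclasses import dataclass, field


# Hypothetical in-memory record; not the MIMIC-IV schema.
@dataclass
class ProcedureNote:
    note_id: str
    title: str
    text: str
    codes: list[str] = field(default_factory=list)


def build_dataset(notes, keywords, min_notes=50):
    """Keep notes whose title contains any keyword, count each code once per
    note, and drop codes that appear in fewer than min_notes notes."""
    kws = [k.lower() for k in keywords]
    selected = [n for n in notes if any(k in n.title.lower() for k in kws)]

    # A code repeated within one note is counted only once (order preserved).
    deduped = [(n, list(dict.fromkeys(n.codes))) for n in selected]

    # Number of notes in which each code appears.
    support = Counter(code for _, codes in deduped for code in codes)
    kept = {code for code, count in support.items() if count >= min_notes}

    dataset = []
    for n, codes in deduped:
        labels = [c for c in codes if c in kept]
        if labels:  # assumption: notes with no remaining eligible code are dropped
            dataset.append((n.text, labels))
    return dataset, sorted(kept)
```

Calling `build_dataset(notes, ["embolization"])` would correspond to the embolization-only query and `build_dataset(notes, ["embolization", "catheter"])` to the combined one; matching is a case-insensitive substring test, which is itself an assumption about how titles were searched.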

We evaluated two NLP models, BERT and XLNet. BERT, developed by Google, is bidirectional: it interprets each word in the context of the words both before and after it. XLNet was designed to address some of BERT’s limitations; it is pretrained over many possible orderings (permutations) of the words in a sentence, which helps it capture the context of each word. Notes were tokenized with each model’s pre-trained tokenizer, and a probability threshold of 0.375 was used to decide whether each code was assigned.
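Because a note can carry several codes, one plausible reading of the 0.375 threshold is a multi-label decision: each code gets an independent sigmoid probability, and every code at or above the threshold is assigned. The sketch below assumes that setup (the abstract states only the threshold value); `assign_codes` and the code names are hypothetical, and whether the comparison was `>=` or `>` is not stated.

```python
import math

THRESHOLD = 0.375  # value reported in the abstract


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def assign_codes(logits: list[float], codes: list[str], threshold: float = THRESHOLD) -> list[str]:
    """Assumed multi-label rule: keep every code whose probability >= threshold."""
    return [code for code, z in zip(codes, logits) if sigmoid(z) >= threshold]


# Hypothetical logits for three hypothetical codes.
print(assign_codes([2.0, -0.3, -3.0], ["code_a", "code_b", "code_c"]))  # → ['code_a', 'code_b']
```

With the common default of 0.5, `code_b` (probability about 0.43) would not be assigned. A threshold below 0.5 generally trades some precision for higher recall.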

Each model was individually fine-tuned on 80% of the data and validated on the remaining 20%.
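A minimal 80/20 split might look like the sketch below. The abstract does not say whether the split was random, stratified by code, or grouped by patient, so a seeded random shuffle is assumed here; `train_val_split` is a hypothetical helper.

```python
import random


def train_val_split(examples, train_frac=0.8, seed=0):
    """Shuffle a copy of the examples and split it into training and validation sets."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    cut = int(len(items) * train_frac)
    return items[:cut], items[cut:]


train, val = train_val_split(range(1590))  # size of the embolization-only dataset
print(len(train), len(val))  # → 1272 318
```

A patient-level split would keep notes from the same patient out of both sets; whether the authors did this is not stated.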

Results:

For the embolization-only dataset, BERT’s average F1 score, precision, and recall were 0.79, 0.76, and 0.82, respectively. XLNet’s were 0.84, 0.84, and 0.84. Both models predicted billing codes accurately, and XLNet outperformed BERT by 0.05 in average F1 score.
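The abstract reports “average” precision, recall, and F1 without naming the averaging scheme. The sketch below assumes macro averaging: metrics are computed per code from true positives, false positives, and false negatives, then averaged with equal weight per code. `per_code_metrics` and `macro_average` are hypothetical helpers; scikit-learn's `precision_recall_fscore_support(..., average="macro")` computes the same quantities on binarized label matrices.

```python
def per_code_metrics(y_true, y_pred, codes):
    """y_true, y_pred: one set of codes per note. Returns {code: (precision, recall, f1)}."""
    results = {}
    for c in codes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if c in t and c in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if c not in t and c in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if c in t and c not in p)
        precision = tp / (tp + fp) if tp + fp else 0.0  # 0.0 when nothing was predicted
        recall = tp / (tp + fn) if tp + fn else 0.0      # 0.0 when the code never occurs
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[c] = (precision, recall, f1)
    return results


def macro_average(results):
    """Unweighted mean of (precision, recall, f1) across codes."""
    n = len(results)
    return tuple(sum(m[i] for m in results.values()) / n for i in range(3))


# Hypothetical labels for three notes and two codes.
y_true = [{"a"}, {"a", "b"}, {"b"}]
y_pred = [{"a"}, {"a"}, {"a", "b"}]
print(macro_average(per_code_metrics(y_true, y_pred, ["a", "b"])))
```

Macro averaging weights a rare code as heavily as a common one, so poorly performing underrepresented codes pull the averages down more than they would under micro averaging.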

For the combined embolization and catheter dataset, BERT’s average F1 score, precision, and recall were 0.81, 0.81, and 0.82, respectively, and XLNet’s were 0.85, 0.83, and 0.88. Again, both models predicted billing codes accurately, with XLNet outperforming BERT. Expanding from 17 to 42 codes did not reduce performance: average F1 rose slightly for both models (BERT from 0.79 to 0.81, XLNet from 0.84 to 0.85).
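F1 is the harmonic mean of precision $P$ and recall $R$. If the reported figures are macro averages over codes (an assumption; the abstract does not say), the averaged F1 need not equal the harmonic mean of the averaged $P$ and $R$ exactly. Still, plugging in XLNet’s combined-dataset averages reproduces the reported 0.85:

$$
F_1 = \frac{2PR}{P + R}, \qquad
\frac{2(0.83)(0.88)}{0.83 + 0.88} = \frac{1.4608}{1.71} \approx 0.854
$$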

Underperforming codes were often complex, underrepresented in the dataset, and nested within other codes, such as 36246 (selective placement of a catheter in a second-order arterial branch).

Conclusion:

This study shows that NLP can be used for IR medical coding from post-procedure notes. Such a tool could help human coders work more efficiently and reduce costs caused by coding errors.