A Hybrid AI-based Method for ICD Classification of Medical Documents

Date:

Short talk presenting our ontology-based transfer learning method for classifying medical documents with ICD codes. The core idea: rather than transferring statistical model weights, which risks leaking patient data and conflicts with the GDPR, we normalize the classifier’s feature space through medical ontologies, so that trained models can be reused across institutions without additional training data.

Problem

Hospitals must annotate patient records with ICD and OPS codes for billing and documentation. This is expensive manual work requiring medical expertise. Standard ML classifiers trained on one hospital’s data cannot simply be transferred to another due to GDPR restrictions and vocabulary differences between institutions.

Approach

We replace raw text features with ontology-grounded concepts. A modified Neural Concept Recognition (NCR) algorithm maps unstructured text to concepts in MeSH, enriched with UMLS terminology and colloquial synonyms from OpenThesaurus and Wikidata. The classifier ensemble (Random Forest, Logistic Regression, SVM with soft voting) operates on this normalized feature space — so the model itself carries no patient-identifiable information.
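The two ideas in this paragraph, concept normalization and soft voting, can be illustrated with a minimal Python sketch. It is not the method from the paper: a naive dictionary lookup stands in for the NCR model, the lexicon and its concept IDs are made-up placeholders rather than real MeSH descriptors, and the classifier probabilities are invented for illustration.

```python
from collections import Counter

# Hypothetical toy lexicon: surface form -> concept ID.
# The IDs are placeholders, not real MeSH or UMLS identifiers.
LEXICON = {
    "syncope": "CONCEPT_SYNCOPE",
    "fainted": "CONCEPT_SYNCOPE",
    "passed out": "CONCEPT_SYNCOPE",
    "concussion": "CONCEPT_HEAD_INJURY",
    "commotio cerebri": "CONCEPT_HEAD_INJURY",
    "hit my head": "CONCEPT_HEAD_INJURY",
}


def concept_features(text):
    """Count concept occurrences via substring lookup (a stand-in for NCR)."""
    lowered = text.lower()
    counts = Counter()
    for surface, concept in LEXICON.items():
        occurrences = lowered.count(surface)
        if occurrences:
            counts[concept] += occurrences
    return dict(counts)


def soft_vote(probabilities):
    """Average per-class probabilities of several classifiers, return the argmax."""
    labels = list(probabilities[0])
    averaged = {
        label: sum(p[label] for p in probabilities) / len(probabilities)
        for label in labels
    }
    return max(averaged, key=averaged.get), averaged


# Clinical and colloquial wording end up with the same concept features.
clinical = "Patient admitted after syncope; commotio cerebri suspected."
forum = "I passed out yesterday and hit my head on the table."
clinical_features = concept_features(clinical)
forum_features = concept_features(forum)

# Invented outputs of three classifiers for one document and one label.
votes = [
    {"R55": 0.7, "other": 0.3},
    {"R55": 0.4, "other": 0.6},
    {"R55": 0.8, "other": 0.2},
]
predicted_label, averaged_probabilities = soft_vote(votes)
```

Because both example texts map to the same concept counts, a classifier trained on one vocabulary sees familiar features in the other. The trained model is expressed in terms of ontology concepts rather than the words of the original records.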

Key Results

Models trained on clinical reports (source domain) were applied to text from a social media health forum (target domain), which uses deliberately different vocabulary for the same underlying medical concepts:

  • Baseline (bag-of-words, no ontology): F1 = 0.51 (S06) / 0.60 (R55)
  • MeSH-based transfer: F1 = 0.90 (S06) / 0.73 (R55)
  • Enriched ontology transfer: F1 = 0.92 (S06) / 0.77 (R55)

No training data in the target domain was required.
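For readers less familiar with the metric, the sketch below shows how a per-label F1 score is computed, assuming the standard definition as the harmonic mean of precision and recall. The function name and the counts are illustrative only and are not taken from the evaluation.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Standard F1: harmonic mean of precision and recall for one label."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0 or true_positives == 0:
        # No correct positive predictions: F1 is conventionally 0.
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    return 2 * precision * recall / (precision + recall)


# Made-up counts: 9 correct hits, 1 false alarm, 1 missed document.
example_f1 = f1_score(9, 1, 1)
```

Since F1 balances precision and recall, a label scores high only if the classifier both finds most relevant documents and raises few false alarms.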

Publication

D. Bruneß, M. Bay, C. Schulze, M. Guckert, M. Minor. A Hybrid AI-based Method for ICD Classification of Medical Documents. In: ICIMTH ’23, Studies in Health Technology and Informatics, IOS Press, 2023.

Based on the full paper: An Ontology-based Transfer Learning Method Improving Classification of Medical Documents (IEEE ICMLA 2022).