25-1 Text analytics project - The DRAGON Challenge

Recently, I started an NLP project for my <Text Analytics> class. Following our professor’s recommendation to choose an ongoing or recently finished competition, our team decided to take on the DRAGON Challenge. Over the next few months, I’ll be sharing a series of posts detailing our project experience.

(FYI: I used the List of Data Science Competition Platforms when searching for a project.)

Overview

The DRAGON Challenge (Diagnostic Report Analysis: General Optimization of NLP) involves developing NLP algorithms for automated medical data curation.
🏥 Data Scope: 28,824 annotated medical diagnosis reports from 22,895 patients, collected from five Dutch care centers.
📋 Tasks: The challenge comprises 28 clinically relevant tasks.
🤖 Pre-trained Models: The challenge provides models pre-trained on roughly 4 million clinical reports (4,152,762 to be exact). All models are available on HuggingFace.

Data

Clinical Reports: A total of 28,824 reports from 22,895 patients were included, gathered from five Dutch care centers.
Patient Visits: The data covers patients with diagnostic or interventional visits between January 1, 1995, and February 12, 2024.
Sample Reports: You can view sample reports on GitHub.

Annotation
- For 27 of the 28 tasks, all reports were manually annotated.
- For task 18, the 4,803 development cases were automatically annotated using GPT-4, while the 172 test cases were manually annotated.

Data Access: Due to privacy restrictions, participants cannot directly access or download the medical report data. It is only available through the Grand Challenge (GC) platform for model training and testing.

Pre-trained models

Models Available: BERT-base (Dutch), RoBERTa-base/large (Multilingual), Longformer-base/large (English)
Training data:
- Medical reports from Ziekenhuisgroep Twente hospital.
- Period: July 13, 2000 – April 25, 2023
- Size: 4,152,762 reports
- Split: training (80%), validation (10%), test (10%)
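
To get a feel for what the 80/10/10 split means in absolute numbers, here is a minimal sketch that turns the percentages above into report counts. The helper name `split_sizes` and the rounding rule (round down, give the remainder to the last partition) are my own assumptions; the organizers do not state how they rounded, so the real partition sizes may differ by a report or two.

```python
def split_sizes(total, percentages):
    """Turn integer percentages into partition sizes that sum to `total`.

    Assumption: every partition except the last is rounded down, and the
    last partition absorbs the remainder. This is an illustrative rule,
    not the one documented for the DRAGON pre-training data.
    """
    if sum(percentages) != 100:
        raise ValueError("percentages must sum to 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    sizes = [total * p // 100 for p in percentages[:-1]]
    sizes.append(total - sum(sizes))
    return sizes


# 4,152,762 reports split into training / validation / test.
train, val, test = split_sizes(4_152_762, (80, 10, 10))
print(f"training:   {train:,}")   # 3,322,209
print(f"validation: {val:,}")     # 415,276
print(f"test:       {test:,}")    # 415,277
```

So the pre-training set alone is on the order of 3.3 million reports, more than a hundred times the 28,824 annotated reports used in the challenge tasks themselves.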
