MIC-DICOM-Deidentifier

A Hybrid AI and Rule-based Approach to DICOM De-identification: A Solution for the MIDI-B Challenge at MICCAI 2024

Abstract

This project presents a robust de-identification system for DICOM files, developed for the MIDI-B challenge at MICCAI 2024. The solution integrates AI-based and rule-based techniques to de-identify Protected Health Information (PHI) within DICOM metadata and image data, achieving near-perfect accuracy. Through iterative improvements, our approach ensures privacy while preserving image utility, achieving an accuracy of 99.91% on the test dataset.

Keywords

DICOM De-identification, AI-based de-identification, Rule-based de-identification, Protected Health Information (PHI), RoBERTa model, I2B2 2014 dataset, PaddleOCR

Introduction

This project addresses the challenge of DICOM file de-identification, crucial for protecting patient privacy in medical imaging. Leveraging the MIDI-B challenge framework, we built a system that removes PHI from both metadata and image data, based on the DICOM PS 3.15 confidentiality guidelines and the Safe Harbor method by TCIA.

Download the Data

To use this system, download the dataset from The Cancer Imaging Archive (TCIA) using this link. Place the raw images under an images directory and the de-identified images under an images-2 directory.

Installation

Install the necessary dependencies using poetry:

poetry install --no-root

Running Jupyter Notebooks

To run Jupyter notebooks, follow these steps:

Install Jupyter Notebook globally:
```
pip install jupyter
```
Add the current Poetry environment to the Jupyter notebook kernels:
```
poetry run python -m ipykernel install --user --name dcm-deid
```
Start the Jupyter notebook:
```
jupyter notebook
```
Open the notebooks in your browser and select the dcm-deid kernel. Ensure it is selected for running the notebooks, especially if another kernel is chosen by default.

Dataset Directory

Download the required dataset from TCIA as mentioned above. Ensure the data is organized as follows:

dataset_directory/
├── images/         # Raw DICOM images
└── images-2/       # De-identified DICOM images

Usage

To de-identify DICOM files, execute the following command:

python deidentify.py --input_dir <path_to_dicom_files> --output_dir <output_path>

Results

Our hybrid method achieved a final accuracy of 99.91% on the MIDI-B test dataset, with specific improvements in handling private tags and reducing false positives.

Acknowledgements

Supported by the Helmholtz Metadata Collaboration, Hub Health, and the RACOON project in NUM 2.0.

Name		Name	Last commit message	Last commit date
Latest commit History 60 Commits
data		data
dcm_deidentifiers		dcm_deidentifiers
dcm_validator		dcm_validator
deid_app		deid_app
utils		utils
.env		.env
.gitignore		.gitignore
.python-version		.python-version
LICENSE		LICENSE
README.md		README.md
deid.py		deid.py
evaluation.py		evaluation.py
image-1.png		image-1.png
main.py		main.py
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MIC-DICOM-Deidentifier

A Hybrid AI and Rule-based Approach to DICOM De-identification: A Solution for the MIDI-B Challenge at MICCAI 2024

Abstract

Keywords

Introduction

Download the Data

Installation

Running Jupyter Notebooks

Dataset Directory

Usage

Results

Acknowledgements

About

Releases

Packages

Languages

License

MIC-DKFZ/miccai2024_midi-b-submission

Folders and files

Latest commit

History

Repository files navigation

MIC-DICOM-Deidentifier

A Hybrid AI and Rule-based Approach to DICOM De-identification: A Solution for the MIDI-B Challenge at MICCAI 2024

Abstract

Keywords

Introduction

Download the Data

Installation

Running Jupyter Notebooks

Dataset Directory

Usage

Results

Acknowledgements

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages