Generate synthetic health data with a dockerized version of Synthea.

Instructions for synthea-v270

This repository provides a Docker instance of Synthea, a tool for generating synthetic patient records.

Setup

0) Prerequisites:

  • An up-to-date Docker instance (tested with v20.10.14).
  • git installed.
  • A stable internet connection.
  • Only for Linux users: Please grant Docker and its corresponding user group(s) read AND write access to your file system (specifically, the folder from which you start your Docker containers and all of its subdirectories).
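
The prerequisites above can be checked quickly before continuing. The following is a minimal sketch, not part of this repository; the helper name check_tool is hypothetical, and the write test only covers the current user, not the Docker daemon or its group.

```shell
#!/bin/sh
# Hypothetical helper: report whether a required tool is on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# The two tools named in the prerequisites.
check_tool docker || true
check_tool git || true

# Rough check that the current user can write to the working directory.
# This does not prove that Docker itself has the same access.
if [ -w . ]; then
    echo "ok: working directory is writable"
else
    echo "warning: working directory is not writable"
fi
```

If both tools are found, docker --version shows the installed Docker version for comparison with the tested one (v20.10.14).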

1) Clone this repository:

Use either of the following two commands to clone this repository to your computer:

git clone https://github.com/hpi-dhc/synthea-v270.git
git clone git@github.com:hpi-dhc/synthea-v270.git

On your machine, cd to the cloned folder and create an empty folder called output.

mkdir output

This folder is needed to store the output generated by the dockerized Synthea.

2) Build Docker image:

Initially, you need to build the Docker image (name: synthea-v270) from the Dockerfile provided within this repository. Make sure that your working directory is this folder.

docker build -t synthea-v270 .

3) Run image as a Docker container:

Once the image has been built, run it as a container (named 'synthea' here). The command below mounts the synthea.properties file into the container and binds the container's output folder to the output folder on your host file system, so that you can access the data in a later project. The synthea.properties file shipped with this repository uses the default settings, except for one parameter that produces additional .csv output for further processing in an ETL pipeline for an OMOP CDM-formatted database. Set populationSize to the desired value (123 in this example). If you wish to see the console output of Synthea, remove the -d flag from the command below. The container is stopped and removed automatically once the data has been generated.

docker run --rm -d --name synthea \
    --mount type=bind,source="$(pwd)/synthea.properties",target=/app/synthea.properties,readonly \
    --mount type=bind,source="$(pwd)/output",target=/app/output \
    -e populationSize=123 \
    synthea-v270
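
If you run the container repeatedly with different cohort sizes, the command above can be wrapped in a small function. This is only a sketch: the function names is_positive_int and run_synthea are hypothetical and not part of this repository, and the -d flag is omitted here so that the console output stays visible.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if $1 is a positive integer.
is_positive_int() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
        *) [ "$1" -gt 0 ] ;;       # digits only: must be greater than zero
    esac
}

# Hypothetical wrapper around the docker run command from this README.
# Usage: run_synthea <populationSize>
run_synthea() {
    size="$1"
    if ! is_positive_int "$size"; then
        echo "populationSize must be a positive integer" >&2
        return 1
    fi
    # Same mounts and image name as in the README; runs in the foreground.
    docker run --rm --name synthea \
        --mount type=bind,source="$(pwd)/synthea.properties",target=/app/synthea.properties,readonly \
        --mount type=bind,source="$(pwd)/output",target=/app/output \
        -e populationSize="$size" \
        synthea-v270
}
```

For example, run_synthea 500 would start a run with populationSize=500, while run_synthea abc is rejected before Docker is invoked.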

4) Access your files:

The synthetic patient data output (persisted as .csv files on the host machine) is available in the ./output/csv folder for inspection and further processing in other projects, e.g., the OMOP import container environment. The corresponding JSON-formatted FHIR (v4) data is in the ./output/fhir folder.
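
To get a quick overview of what was generated, the .csv files can be summarized from the host. The sketch below is not part of this repository; the function names csv_rows and summarize_csv are hypothetical. It counts lines minus the header, so it would miscount files whose fields contain embedded line breaks.

```shell
#!/bin/sh
# Hypothetical helper: print the number of data rows in a CSV file,
# i.e. the line count minus one header line (0 for an empty file).
csv_rows() {
    awk 'END { print (NR > 0 ? NR - 1 : 0) }' "$1"
}

# Hypothetical helper: print "<file name><TAB><row count>" for every
# .csv file in a folder (default: ./output/csv).
summarize_csv() {
    dir="${1:-./output/csv}"
    for f in "$dir"/*.csv; do
        [ -e "$f" ] || continue   # no .csv files: the glob stays unexpanded
        printf '%s\t%s\n' "$(basename "$f")" "$(csv_rows "$f")"
    done
}
```

Calling summarize_csv without arguments lists the files in ./output/csv. The row counts are only a rough sanity check and need not match populationSize exactly.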

Notes

  • All settings for Synthea can be changed in the ./synthea.properties file, e.g., if you wish to have additional STU 3- or DSTU 2-formatted FHIR output. All FHIR JSON files will also be in the ./output folder.
  • Every time you build a new cohort, you need to delete the contents of the ./output folder on your host before running the container again.
  • The only Synthea version currently (2022-04-05) compatible with the OHDSI ETL-Synthea scripts is v2.7.0, even though Synthea is already at v3.x. One of the notable differences is that the newer versions use a slightly different syntax for the synthea.properties file.
  • If not otherwise specified, all commands are executed on the host machine with the working directory being the cloned repository.
  • This repository is intended for local use only. Even though easily implementable best practices for creating Dockerfiles were followed, deployment in a production setting would require additional security mechanisms.
  • This project intentionally refrains from using a copyleft license. Nevertheless, all users are kindly invited to contribute to the project, specifically to leave a note to the author if you find parts of the code to be broken or the explanations in this README ambiguous.
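
Regarding the note on emptying the ./output folder before each new cohort: one way to do this is sketched below. The function name clear_output is hypothetical and not part of this repository. It relies on the -mindepth and -delete options of find, which GNU and BSD find provide but POSIX does not require. On Linux, files written by the container may belong to another user, in which case elevated permissions might be needed.

```shell
#!/bin/sh
# Hypothetical helper: delete everything inside a folder (default: ./output),
# including hidden files and subfolders, but keep the folder itself.
clear_output() {
    dir="${1:-./output}"
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    # -mindepth 1 excludes the folder itself; -delete removes contents depth-first.
    find "$dir" -mindepth 1 -delete
}
```

Run clear_output from the cloned repository before the next docker run. Because the deletion is irreversible, copy any data you still need elsewhere first.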

License

Copyright 2022 Hasso Plattner Institute

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this work except in compliance with the License. You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

(Written by Jan Philipp Sachs on May 17, 2022)
