This library contains the SEER Java implementations of the Multiple Primary and Histology Coding Rules.
The implementation was partially based on the KCR Multiple Primary Rules Library developed by the Kentucky Cancer Registry.
The library is available on Maven Central.
To include it to your Maven or Gradle project, use the group ID com.imsweb
and the artifact ID mph
.
You can check out the release page for a list of the releases and their changes.
The entry point to the library is the MphUtils.computePrimary method. It takes two MphInput representing the two tumors to evaluate, and returns an MphOutput that contains the result (SINGLE_PRIMARY, MULTIPLE_PRIMARIES, QUESTIONABLE or INVALID_INPUT) and a description of the rules that were executed to obtain that result.
Here is a typical call to the library:
public class MphTest {
public static void main(String[] args) {
MphInput input1 = new MphInput();
input1.setDateOfDiagnosisYear("2016");
input1.setDateOfDiagnosisMonth("06");
input1.setDateOfDiagnosisDay("15");
input1.setPrimarySite("C509");
input1.setLaterality("0");
input1.setHistologyIcdO3("8000"); // use for DX year 2001+
input1.setBehaviorIcdO3("3"); // use for DX year 2001+
//input1.setHistologyIcdO2("8000"); // use for DX year prior to 2001
//input1.setBehaviorIcdO2("3"); // use for DX year prior to 2001
MphInput input2 = new MphInput();
input2.setDateOfDiagnosisYear("2016");
input2.setDateOfDiagnosisMonth("06");
input2.setDateOfDiagnosisDay("15");
input2.setPrimarySite("C501");
input2.setLaterality("0");
input2.setHistologyIcdO3("8000"); // use for DX year 2001+
input2.setBehaviorIcdO3("3"); // use for DX year 2001+
//input2.setHistologyIcdO2("8000"); // use for DX year prior to 2001
//input2.setBehaviorIcdO2("3"); // use for DX year prior to 2001
MphOutput output = MphUtils.getInstance().computePrimaries(input1, input2);
System.out.println("Result: " + output.getResult());
}
}
That simple example was copied from this lab class.
Different sets of rules are used based on the diagnosis year and the histology.
If histology is in the range 9590-9993, one of the Hematopoietic set of rules will be used:
- If DX year 2010 and later, the "Hematopoietic 2010+" rules will be used.
- If DX year 2001-2009, the "Hematopoietic 2001-2009" rules will be used.
- If DX year 2000 or earlier, the "Hematopoietic 2000 and earlier" rules will be used.
If histology is not in the range 9590-9993, one of the following solid tumors sets will be used:
Solid Tumor
- 2021+ Cutaneous Melanoma
- 2023+ Other Sites
- 2018+ all other modules
MPH
- 2007-2020 Cutaneous Melanoma
- 2007-2022 Other Sites
- 2007-2017 for all other modules
If DX year is 2006 or earlier and the case is not Benign Brain (C700-C729, C751-C753 with behavior 0/1), the "2006 and earlier Solid Malignant" rules will be used.
If DX year is 2006 or earlier and the case is Benign Brain (C700-C729, C751-C753 with behavior 0/1), the "2006 and earlier Benign Brain" rules will be used.
The project contains a lab class that can be used to generate CSV files that contains fake data along with the library result.
A sample testing file is available in the project: mph-testing-2000-2022.csv.gz
To create larger file, clone the project and execute the main method of that class locally.
The class allows the following parameters (defined in the top of the main method):
- numTests: the number of rows for the generated CSV file
- minDxYear: the minimum DX year to use
- maxDxYear: the maximum DX year to use
The CSV files will contain the following columns:
- year1
- month1
- day1
- site1
- hist1
- beh1
- lat1
- year2
- month2
- day2
- site2
- hist2
- beh2
- lat2
- result
- reason
The lab class uses the Data Generator library to create the fake data.
This library was developed through the SEER program.
The Surveillance, Epidemiology and End Results program is a premier source for cancer statistics in the United States. The SEER program collects information on incidence, prevalence and survival from specific geographic areas representing a large portion of the US population and reports on all these data plus cancer mortality data for the entire country.
The Kentucky Cancer Registry is the population-based central cancer registry for the Commonwealth of Kentucky.