Tooling to determine original dcm file from anonymised path #1280

Open
rkm opened this issue Aug 29, 2022 · 4 comments
Labels
priority/low type/enhancement New feature or request

Comments

@rkm
Member

rkm commented Aug 29, 2022

When investigating issues with an anonymised file in an extraction, it is often useful to review the original file for comparison. This is currently difficult to do as there is no direct link from the anonymised file back to the source file.

A tool, or a new application in the smi binary, could achieve this by looking up the original path:

  • Either in the CohortPackager database for the extraction, or
  • in the metadata database
rkm added the type/enhancement (New feature or request) label Aug 29, 2022
@tznind
Contributor

tznind commented Aug 29, 2022

I think the metadata database would be most powerful. That way it could support either identifiable or anonymised UIDs, and it wouldn't depend on the image having been extracted before it could be looked up.

It would also help answer other questions, such as 'for this image in the SR NLP db / MongoDB, is it also in the relational database or not?'

@tznind
Contributor

tznind commented Aug 29, 2022

Nothing stopping it drawing info from both though.
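A minimal sketch of a lookup that draws on both sources, assuming each store can be wrapped as a function from anonymised path to original path. The names `find_original` and `dict_source` are hypothetical, and the in-memory dicts only stand in for real queries against the CohortPackager and metadata databases, whose schemas aren't described in this thread:

```python
from typing import Callable, Dict, Optional, Sequence

# A lookup source maps an anonymised file path to the original path,
# returning None when that source has no record of the file.
Lookup = Callable[[str], Optional[str]]


def find_original(anon_path: str, sources: Sequence[Lookup]) -> Optional[str]:
    """Try each source in order; return the first original path found."""
    for source in sources:
        original = source(anon_path)
        if original is not None:
            return original
    return None


def dict_source(mapping: Dict[str, str]) -> Lookup:
    """Wrap an in-memory mapping as a source (stand-in for a real database)."""
    return mapping.get


# Hypothetical example data; real sources would query the databases.
metadata_db = dict_source({"proj1/1.2.3-an.dcm": "/raw/2019/img001.dcm"})
cohort_packager_db = dict_source({"proj1/4.5.6-an.dcm": "/raw/2020/img002.dcm"})

print(find_original("proj1/4.5.6-an.dcm", [metadata_db, cohort_packager_db]))
# prints /raw/2020/img002.dcm
```

Sources are tried in order, so listing the metadata database first favours it, while the CohortPackager database acts as a fallback for files it doesn't know about.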

@howff
Contributor

howff commented Mar 15, 2023

At the moment I've just got a big text file of filenames which I grep ;-)

Another method might be to see if MongoDB can give you a list of keys in the index (by quickly reading the index rather than slowly reading the database), which you could then grep. If it only stores hashes then this won't work.

Another method might be to see whether MongoDB can create a computed index: you could create a new index called FileName, computed from Basename(dicomFilePath). Postgres supports computed indexes, so maybe MongoDB does too. You could then replace the -an.dcm in the anonymised filename and look up the result in the computed index.
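The computed-index idea can be approximated in memory from the text file of filenames. This is a minimal sketch, assuming that swapping the -an.dcm suffix for .dcm yields the original basename; that mapping, the file name `filenames.txt`, and the function names are all hypothetical:

```python
import posixpath
from collections import defaultdict
from typing import Dict, Iterable, List

ANON_SUFFIX = "-an.dcm"


def build_basename_index(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each basename to every full path that has it.

    Basenames may repeat across the archive, so each maps to a list.
    """
    index: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        path = line.strip()
        if path:  # skip blank lines
            index[posixpath.basename(path)].append(path)
    return dict(index)


def original_candidates(anon_path: str, index: Dict[str, List[str]]) -> List[str]:
    """Replace the -an.dcm suffix with .dcm and look up the basename."""
    name = posixpath.basename(anon_path)
    if not name.endswith(ANON_SUFFIX):
        raise ValueError(f"not an anonymised filename: {name}")
    original_name = name[: -len(ANON_SUFFIX)] + ".dcm"
    return index.get(original_name, [])


# In practice: with open("filenames.txt") as f: index = build_basename_index(f)
index = build_basename_index([
    "/raw/2019/01/1.2.840.1.dcm\n",
    "/raw/2020/05/1.2.840.2.dcm\n",
    "\n",
])
print(original_candidates("proj1/1.2.840.1-an.dcm", index))
# prints ['/raw/2019/01/1.2.840.1.dcm']
```

Building the dict costs one pass over the file, after which each lookup is a hash probe rather than a fresh grep.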

Or have I completely misunderstood what you mean by "metadata database"? Were you referring to one of the MySQL or SQL Server databases?

@howff
Contributor

howff commented Aug 15, 2023

Unless I'm mistaken, the anonymised path ends with the SOPInstanceUID followed by -an.dcm, so adding a MongoDB index on SOPInstanceUID would help immensely. Could we also index the study and series IDs?
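If the filename layout is as described, the UID can be pulled out of the path and sanity-checked before querying. A minimal sketch, assuming the `<SOPInstanceUID>-an.dcm` layout from the comment (the function name is hypothetical); the validation follows the DICOM UID syntax of dot-separated numeric components with no leading zeros and at most 64 characters:

```python
import posixpath
import re
from typing import Optional

ANON_SUFFIX = "-an.dcm"
# DICOM UID syntax: dot-separated numeric components, no leading zeros
# within a component, at most 64 characters overall.
_UID_RE = re.compile(r"(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*")


def sop_uid_from_anon_path(anon_path: str) -> Optional[str]:
    """Return the SOPInstanceUID encoded in an anonymised filename, or None."""
    name = posixpath.basename(anon_path)
    if not name.endswith(ANON_SUFFIX):
        return None
    uid = name[: -len(ANON_SUFFIX)]
    if len(uid) > 64 or not _UID_RE.fullmatch(uid):
        return None
    return uid


print(sop_uid_from_anon_path("/extract/proj1/1.2.840.10008.1-an.dcm"))
# prints 1.2.840.10008.1
```

The resulting UID would then be the key for a query on the indexed field. With pymongo, for example, `collection.create_index("SOPInstanceUID")` creates such an index, assuming the documents store the UID under a top-level `SOPInstanceUID` key. The comment doesn't say whether the filename carries the original or the anonymised UID; if it is the anonymised one, it would need mapping back first.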
