Adding feature: strain mapping locally #281

Open
wants to merge 2 commits into
base: dev
from

Conversation

rtlortega

This PR adds a new feature that generates the strain_mappings file needed to run NPLinker in local mode.

The source and tests were modified in:

  • nplinker
    -> antismash_loader.py -> creates a dict mapping genome --> bgcs
    -> strain -> utils.py -> four new functions:
  1. extract_strain_metadata (creates the JSON file)
  2. extract features metabolite id -> builds a strain --> features dictionary
  3. extract bgcs genome id -> builds a strain --> bgcs dictionary
  4. merge features and bgcs
  5. build a dictionary with the info needed to run NPLinker in local mode
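The merge and build steps listed above could look roughly like the sketch below. This is not the code from the PR: the function names `merge_strain_ids` and `build_strain_mappings`, the example IDs, and the output layout (a `strain_mappings` list of `strain_id`/`strain_alias` entries plus a `version` field) are all assumptions made for illustration.

```python
import json


def merge_strain_ids(strain_to_features, strain_to_bgcs):
    """Union the IDs linked to each strain across both input dictionaries."""
    merged = {}
    for mapping in (strain_to_features, strain_to_bgcs):
        for strain, ids in mapping.items():
            merged.setdefault(strain, set()).update(ids)
    return merged


def build_strain_mappings(merged, version="1.0"):
    """Arrange merged IDs into a JSON-serializable dict (assumed layout)."""
    return {
        "strain_mappings": [
            {"strain_id": strain, "strain_alias": sorted(ids)}
            for strain, ids in sorted(merged.items())
        ],
        "version": version,
    }


# Hypothetical example data, not taken from the PR.
strain_to_features = {"strain_A": {"feature_1", "feature_2"}}
strain_to_bgcs = {
    "strain_A": {"genome_1.region001"},
    "strain_B": {"genome_2.region001"},
}

merged = merge_strain_ids(strain_to_features, strain_to_bgcs)
print(json.dumps(build_strain_mappings(merged), indent=2))
```

Sets are used while merging so that an ID appearing in both inputs is stored only once; sorting at the end keeps the generated file stable between runs.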

I also added tests:

  • one test for each function

There is a notebook for running all the functions step by step: ~/nplinker/tests/unit/local_strain_mapping.ipynb
I strongly suggest updating the test/data files so that they match what antiSMASH actually generates. The current strain_mapping is incorrect, as is the folder in test/unit/data/antismash. I created another folder where the data is correct, but I was not sure whether to add it to the pull request yet.

@justinjjvanderhooft

@rtlortega - are the antismash results different due to the use of a different version? I.e., what version did you use for the latest results?

@rtlortega
Author

rtlortega commented Oct 29, 2024

@justinjjvanderhooft I haven't worked with antiSMASH so far, so I can't tell. I didn't use antiSMASH results themselves; I just rearranged the folder names as they should be, according to what Dora and Annette know about it. I also know that the antismash results folder is incorrect because the strain_mappings in the test/data folder is wrong.

@justinjjvanderhooft

justinjjvanderhooft commented Oct 29, 2024

@rtlortega well, it would be "wrong" if it is not compatible with the version of antiSMASH used for the current code base of NPLinker, but it wouldn't be "wrong" if a newer version of antiSMASH produces different output from what NPLinker expects. Anyway, in both cases, it would be good to update the loading of the antiSMASH results to make it compatible.

@CunliangGeng CunliangGeng self-requested a review October 30, 2024 12:54
Member

@CunliangGeng CunliangGeng left a comment

Hi Rosina, thanks for the PR. Before I review the changes in detail, let's first make sure they follow the required formats/styles and that all tests have been added and pass. So, please fix the following issues first:

  1. Please update your changes so that the code style and static typing pass the checks run by the tools ruff and mypy. See the related errors in this check: https://github.com/NPLinker/nplinker/actions/runs/11557614894/job/32277179283?pr=281

  2. I did not see any new unit tests in the folder tests/unit for your new functions/methods. Please add unit tests for your changes, and make sure they pass before pushing them to the PR.

Regarding your comments:

I strongly suggest updating the test/data files so that they match what antiSMASH actually generates. The current strain_mapping is incorrect, as is the folder in test/unit/data/antismash. I created another folder where the data is correct, but I was not sure whether to add it to the pull request yet.

The data/files in tests/unit/data are independent of each other (as much as possible). So you don't need to worry about the relationship between the antismash data and strain_mappings.json; they are independent and are used for different unit tests.
You should add a new folder with new example data to test your new functions. Please remember: make the example data as simple/small as possible.
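As a rough illustration of how small such example data can be, the sketch below builds a throwaway folder with a few empty files and tests a helper against it. The helper `map_genome_to_bgcs`, the folder layout (one subfolder per genome containing `*.region*.gbk` files), and the file names are assumptions for this example, not the functions added in this PR.

```python
import tempfile
from pathlib import Path


def map_genome_to_bgcs(antismash_dir):
    """Hypothetical helper: map each genome folder name to its BGC file stems."""
    mapping = {}
    for genome_dir in sorted(Path(antismash_dir).iterdir()):
        if genome_dir.is_dir():
            mapping[genome_dir.name] = sorted(
                p.stem for p in genome_dir.glob("*.region*.gbk")
            )
    return mapping


def test_map_genome_to_bgcs():
    with tempfile.TemporaryDirectory() as tmp:
        genome_dir = Path(tmp) / "genome_1"
        genome_dir.mkdir()
        # Empty placeholder files are enough: only the names are inspected.
        (genome_dir / "contig_1.region001.gbk").touch()
        (genome_dir / "contig_1.region002.gbk").touch()
        # A non-BGC file that the helper should ignore.
        (genome_dir / "genome_1.json").touch()

        assert map_genome_to_bgcs(tmp) == {
            "genome_1": ["contig_1.region001", "contig_1.region002"]
        }


test_map_genome_to_bgcs()
```

When run under pytest, the built-in `tmp_path` fixture can replace the `tempfile` context manager, and the direct call on the last line can be dropped.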

Labels
None yet
Projects
Status: In review
3 participants