Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix so Export tool does not return duplicate data when storage contains multiple experiments with same ensemble names #8587

Merged
merged 8 commits into from
Sep 3, 2024

Conversation

dafeda
Copy link
Contributor

@dafeda dafeda commented Aug 29, 2024

Issue
Resolves #8555

Have to stop using storage.get_ensemble_by_name as ensemble names are not unique across experiments.

  • PR title captures the intent of the changes, and is fitting for release notes.
  • Added appropriate release note label
  • Commit history is consistent and clean, in line with the contribution guidelines.
  • Make sure tests pass locally (after every commit!)

When applicable

  • When there are user facing changes: Updated documentation
  • New behavior or changes to existing untested code: Ensured that unit tests are added (See Ground Rules).
  • Large PR: Prepare changes in small commits for more convenient review
  • Bug fix: Add regression test for the bug
  • Bug fix: Create Backport PR to latest release

@dafeda dafeda changed the title Commitin Fix so Export Data does not return duplicate data Aug 29, 2024
@dafeda dafeda changed the title Fix so Export Data does not return duplicate data Fix so Export Data does not return duplicate data when storage contains multiple experiments with same ensemble names Aug 29, 2024
@dafeda dafeda added the release-notes:bug-fix Automatically categorise as bug fix in release notes label Aug 29, 2024
@dafeda dafeda self-assigned this Aug 29, 2024
@dafeda dafeda changed the title Fix so Export Data does not return duplicate data when storage contains multiple experiments with same ensemble names Fix so Export tool does not return duplicate data when storage contains multiple experiments with same ensemble names Aug 29, 2024
@dafeda dafeda force-pushed the csv-export-get-ensemble-second-attempt branch from b16b814 to 2b4f1c0 Compare August 29, 2024 10:13
@codecov-commenter
Copy link

codecov-commenter commented Aug 29, 2024

Codecov Report

Attention: Patch coverage is 92.59259% with 2 lines in your changes missing coverage. Please review.

Project coverage is 90.86%. Comparing base (e9c4a92) to head (7f6a037).
Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
src/ert/gui/ertwidgets/listeditbox.py 90.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8587      +/-   ##
==========================================
+ Coverage   90.80%   90.86%   +0.05%     
==========================================
  Files         339      339              
  Lines       20845    20866      +21     
==========================================
+ Hits        18929    18960      +31     
+ Misses       1916     1906      -10     
Flag Coverage Δ
gui-tests 75.76% <92.59%> (+0.01%) ⬆️
integration-tests 53.73% <22.22%> (-0.02%) ⬇️
unit-tests 68.70% <22.22%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dafeda dafeda force-pushed the csv-export-get-ensemble-second-attempt branch 3 times, most recently from 882805a to d6c27d3 Compare August 30, 2024 12:15
@xjules
Copy link
Contributor

xjules commented Aug 30, 2024

PR title suggestion: Prevent exporting duplicate data when storage contains multiple experiments with same ensemble names

)

data = pandas.concat([data, ensemble_data])
except KeyError as exc:
Copy link
Contributor

@xjules xjules Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure where this exception belongs to. Maybe where the ensemble is loaded: ensemble = self.storage.get_ensemble(ensemble_id)? Since previously:

 try:
                ensemble = self.storage.get_ensemble_by_name(ensemble)
            except KeyError as exc:
                raise UserWarning(f"The ensemble '{ensemble}' does not exist!") from exc


df = pd.read_csv(file_name)
# Make sure data is not duplicated.
assert df.iloc[0]["COEFFS:a"] != df.iloc[20]["COEFFS:a"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires some explanation. I guess it should be 10 rows of 1 run experiment and 10 rows with another run and this will make sure that the values are not duplicate? But then we should firstly assert that both runs have the same experiment name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is not beautiful.
I've pushed an update that is hopefully easier to understand.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one more thingy: assert exp1_name==exp2_name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I've pushed a fix.

Copy link
Contributor

@xjules xjules left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice PR! Just had some smaller comments.

Happens when there are multiple experiments that have ensembles with same names.
@dafeda dafeda force-pushed the csv-export-get-ensemble-second-attempt branch from d6c27d3 to 3686bbe Compare September 2, 2024 10:21
@dafeda dafeda force-pushed the csv-export-get-ensemble-second-attempt branch from 3686bbe to d669a13 Compare September 2, 2024 10:24
Copy link
Contributor

@xjules xjules left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice PR @dafeda ! 🚀 Could you squash the commits?

@dafeda dafeda merged commit e1b0f3e into equinor:main Sep 3, 2024
35 checks passed
@dafeda dafeda deleted the csv-export-get-ensemble-second-attempt branch September 3, 2024 11:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes:bug-fix Automatically categorise as bug fix in release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Export data tool returns duplicated data when multiple experiments have ensembles with the same names
3 participants