Fix so Export tool does not return duplicate data when storage contains multiple experiments with same ensemble names #8587
b16b814 to 2b4f1c0
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #8587      +/-   ##
==========================================
+ Coverage   90.80%   90.86%   +0.05%
==========================================
  Files         339      339
  Lines       20845    20866      +21
==========================================
+ Hits        18929    18960      +31
+ Misses       1916     1906      -10
882805a to d6c27d3
PR title suggestion: Prevent exporting duplicate data when storage contains multiple experiments with same ensemble names
    )

    data = pandas.concat([data, ensemble_data])
except KeyError as exc:
Not sure where this exception belongs. Maybe where the ensemble is loaded: ensemble = self.storage.get_ensemble(ensemble_id)? Previously it was:
try:
    ensemble = self.storage.get_ensemble_by_name(ensemble)
except KeyError as exc:
    raise UserWarning(f"The ensemble '{ensemble}' does not exist!") from exc
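A minimal sketch of what this comment suggests: wrap only the lookup by ID in the try block, so a KeyError raised later (for example while reading data) is not mistaken for a missing ensemble. FakeStorage and load_ensemble are hypothetical names used for illustration, not ert's real classes, and the assumption that an unknown ID raises KeyError is mine.

```python
from uuid import uuid4


class FakeStorage:
    """Hypothetical stand-in for the storage object; not ert's real class."""

    def __init__(self):
        self._ensembles = {}

    def add(self, name):
        ensemble_id = uuid4()
        self._ensembles[ensemble_id] = name
        return ensemble_id

    def get_ensemble(self, ensemble_id):
        # Assumption: an unknown ID raises KeyError, like a plain dict lookup.
        return self._ensembles[ensemble_id]


def load_ensemble(storage, ensemble_id):
    # Only the lookup sits inside the try block, so the warning is raised
    # exactly where the ensemble is loaded.
    try:
        return storage.get_ensemble(ensemble_id)
    except KeyError as exc:
        raise UserWarning(f"The ensemble '{ensemble_id}' does not exist!") from exc


storage = FakeStorage()
known_id = storage.add("default")
print(load_ensemble(storage, known_id))
```

Keeping the try block this narrow means the data concatenation step can fail with its own, unmasked error.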
src/ert/resources/workflows/jobs/internal-gui/scripts/gen_data_rft_export.py
df = pd.read_csv(file_name)
# Make sure data is not duplicated.
assert df.iloc[0]["COEFFS:a"] != df.iloc[20]["COEFFS:a"]
This requires some explanation. I guess there should be 10 rows from one experiment run and 10 rows from another, and this makes sure the values are not duplicated? But then we should first assert that both runs have the same experiment name?
I agree that this is not beautiful.
I've pushed an update that is hopefully easier to understand.
one more thingy: assert exp1_name==exp2_name
Good idea, I've pushed a fix.
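The shape of the check discussed in this thread could be sketched roughly as follows. This is not the test that was pushed: the frame is built in memory instead of being read from the exported CSV, and the "Realization" and "Ensemble" columns, the rows_for helper, and the sample values are all assumptions made for illustration.

```python
import pandas as pd


def rows_for(ensemble_name, values):
    # Hypothetical helper: one row per realization of a single ensemble.
    return pd.DataFrame(
        {
            "Realization": range(len(values)),
            "Ensemble": ensemble_name,
            "COEFFS:a": values,
        }
    )


# Two runs whose ensembles share a name but hold different parameter values.
first = rows_for("default", [0.1, 0.2, 0.3])
second = rows_for("default", [0.7, 0.8, 0.9])
df = pd.concat([first, second], ignore_index=True)

n = len(first)  # offset of the second run's first row

# Precondition: the names really do collide, otherwise the test proves nothing.
assert df.iloc[0]["Ensemble"] == df.iloc[n]["Ensemble"]
# Same realization in both runs ...
assert df.iloc[0]["Realization"] == df.iloc[n]["Realization"]
# ... but the exported value differs, so the data was not duplicated.
assert df.iloc[0]["COEFFS:a"] != df.iloc[n]["COEFFS:a"]
```

Deriving the row offset from the length of the first block, rather than hard-coding an index, makes the comparison easier to follow.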
Nice PR! Just had some smaller comments.
Happens when there are multiple experiments that have ensembles with same names.
d6c27d3 to 3686bbe
3686bbe to d669a13
Very nice PR @dafeda ! 🚀 Could you squash the commits?
Issue
Resolves #8555
Have to stop using storage.get_ensemble_by_name as ensemble names are not unique across experiments.
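To illustrate why a name-based lookup is ambiguous here, a small sketch with plain Python objects follows. The Ensemble dataclass, find_by_name, and find_by_id are hypothetical stand-ins and do not reflect ert's storage API; the sketch only shows that a name can match more than one ensemble while an ID matches exactly one.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Ensemble:
    # Hypothetical record; the field names are assumptions for illustration.
    id: UUID
    name: str
    experiment: str


ensembles = [
    Ensemble(uuid4(), "default", "experiment_1"),
    Ensemble(uuid4(), "default", "experiment_2"),
]


def find_by_name(name):
    # A name can match several ensembles, one per experiment.
    return [e for e in ensembles if e.name == name]


def find_by_id(ensemble_id):
    # An ID identifies exactly one ensemble.
    by_id = {e.id: e for e in ensembles}
    return by_id[ensemble_id]


matches = find_by_name("default")
print(len(matches))  # two candidates, so "the" ensemble is not well defined
print(find_by_id(ensembles[1].id).experiment)
```

A lookup that silently returns the first match for each requested name would export the same ensemble's data repeatedly, which is consistent with the duplication described in the issue.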