Fix so `Export tool` does not return duplicate data when storage contains multiple experiments with same ensemble names #8587

dafeda · 2024-08-29T10:07:01Z

Issue
Resolves #8555

Have to stop using storage.get_ensemble_by_name as ensemble names are not unique across experiments.

PR title captures the intent of the changes, and is fitting for release notes.
Added appropriate release note label
Commit history is consistent and clean, in line with the contribution guidelines.
Make sure tests pass locally (after every commit!)

When applicable

When there are user facing changes: Updated documentation
New behavior or changes to existing untested code: Ensured that unit tests are added (See Ground Rules).
Large PR: Prepare changes in small commits for more convenient review
Bug fix: Add regression test for the bug
Bug fix: Create Backport PR to latest release

codecov-commenter · 2024-08-29T10:34:11Z

Codecov Report

Attention: Patch coverage is 92.59259% with 2 lines in your changes missing coverage. Please review.

Project coverage is 90.86%. Comparing base (e9c4a92) to head (7f6a037).
Report is 7 commits behind head on main.

Files with missing lines	Patch %	Lines
src/ert/gui/ertwidgets/listeditbox.py	90.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #8587      +/-   ##
==========================================
+ Coverage   90.80%   90.86%   +0.05%     
==========================================
  Files         339      339              
  Lines       20845    20866      +21     
==========================================
+ Hits        18929    18960      +31     
+ Misses       1916     1906      -10

Flag	Coverage Δ
gui-tests	`75.76% <92.59%> (+0.01%)`	⬆️
integration-tests	`53.73% <22.22%> (-0.02%)`	⬇️
unit-tests	`68.70% <22.22%> (-0.02%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

xjules · 2024-08-30T12:53:44Z

PR title suggestion: Prevent exporting duplicate data when storage contains multiple experiments with same ensemble names

src/ert/gui/tools/export/export_panel.py

src/ert/resources/workflows/jobs/internal-gui/scripts/csv_export.py

xjules · 2024-08-30T13:09:46Z

src/ert/resources/workflows/jobs/internal-gui/scripts/csv_export.py

-            )
-
-            data = pandas.concat([data, ensemble_data])
+            except KeyError as exc:


Not sure where this exception belongs to. Maybe where the ensemble is loaded: ensemble = self.storage.get_ensemble(ensemble_id)? Since previously:

try: ensemble = self.storage.get_ensemble_by_name(ensemble) except KeyError as exc: raise UserWarning(f"The ensemble '{ensemble}' does not exist!") from exc

src/ert/resources/workflows/jobs/internal-gui/scripts/gen_data_rft_export.py

tests/unit_tests/gui/test_csv_export.py

xjules · 2024-08-30T13:32:21Z

tests/unit_tests/gui/test_csv_export.py

+
+    df = pd.read_csv(file_name)
+    # Make sure data is not duplicated.
+    assert df.iloc[0]["COEFFS:a"] != df.iloc[20]["COEFFS:a"]


This requires some explanation. I guess it should be 10 rows of 1 run experiment and 10 rows with another run and this will make sure that the values are not duplicate? But then we should firstly assert that both runs have the same experiment name?

I agree that this is not beautiful.
I've pushed an update that is hopefully easier to understand.

one more thingy: assert exp1_name==exp2_name

Good idea, I've pushed a fix.

xjules

Nice PR! Just had some smaller comments.

Happens when there are multiple experiments that have ensembles with same names.

xjules

Very nice PR @dafeda ! 🚀 Could you squash the commits?

dafeda changed the title ~~Commitin~~ Fix so Export Data does not return duplicate data Aug 29, 2024

dafeda changed the title ~~Fix so Export Data does not return duplicate data~~ Fix so Export Data does not return duplicate data when storage contains multiple experiments with same ensemble names Aug 29, 2024

dafeda added the release-notes:bug-fix Automatically categorise as bug fix in release notes label Aug 29, 2024

dafeda self-assigned this Aug 29, 2024

dafeda changed the title ~~Fix so Export Data does not return duplicate data when storage contains multiple experiments with same ensemble names~~ Fix so Export tool does not return duplicate data when storage contains multiple experiments with same ensemble names Aug 29, 2024

dafeda force-pushed the csv-export-get-ensemble-second-attempt branch from b16b814 to 2b4f1c0 Compare August 29, 2024 10:13

dafeda force-pushed the csv-export-get-ensemble-second-attempt branch 3 times, most recently from 882805a to d6c27d3 Compare August 30, 2024 12:15