Attribute aggregation and transformation #31

glitt13 · 2024-11-15T00:00:16Z

An approach to aggregate and transform existing attribute data to create new attribute data.

Additions

xssa_attrs_tform.yaml: example configuration file describing how variables are aggregated and transformed.
fs_tfrm_attrs.py: the main processing script; it calls functions inside tfrm_attr.py.
tfrm_attr.py: contains new functions for the fs_proc package.
test_tfrm_attr.py: unit tests.
fs_attrs_miss.R: reads the missing comid-attributes file that fs_tfrm_attrs.py sometimes generates and attempts to find the missing attribute data. When attribute data are missing, fs_tfrm_attrs.py calls this Rscript to try to acquire the data and use them in the attribute transformation.
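
As a rough illustration of the aggregate-and-transform idea (not the actual tfrm_attr.py implementation), the sketch below combines existing attributes for a single comid into new attributes and records which source attributes were missing. The config layout, the attribute names, and the transform_attrs helper are all hypothetical and do not reflect the real xssa_attrs_tform.yaml schema.

```python
from statistics import mean

# Hypothetical config: each new attribute lists its source attributes and
# the aggregation function to apply. This is NOT the real
# xssa_attrs_tform.yaml schema, only an illustration.
TFORM_CONFIG = {
    "precip_mean_annual": {"vars": ["precip_2019", "precip_2020"], "func": "mean"},
    "elev_range": {"vars": ["elev_min", "elev_max"], "func": "range"},
}

# Map the function names used in the config to callables.
AGG_FUNCS = {
    "mean": mean,
    "max": max,
    "min": min,
    "sum": sum,
    "range": lambda vals: max(vals) - min(vals),
}


def transform_attrs(attrs, config):
    """Return (new_attrs, missing) for a single comid.

    attrs: dict of existing attribute name -> value.
    new_attrs: dict of new attribute name -> aggregated value.
    missing: dict of new attribute name -> list of absent source attributes.
    """
    new_attrs, missing = {}, {}
    for new_name, spec in config.items():
        absent = [var for var in spec["vars"] if var not in attrs]
        if absent:
            # Record what is missing rather than computing a partial value.
            missing[new_name] = absent
            continue
        values = [attrs[var] for var in spec["vars"]]
        new_attrs[new_name] = AGG_FUNCS[spec["func"]](values)
    return new_attrs, missing


existing = {"precip_2019": 900.0, "precip_2020": 1100.0, "elev_min": 250.0}
new_attrs, missing = transform_attrs(existing, TFORM_CONFIG)
```

In this sketch the missing dictionary plays the role of the missing comid-attributes file: it lists what a follow-up retrieval step would need to fetch before the transformation can be completed.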

Removals

Changes

Converted scripts/config/attr_gen_camels.R from hard-coding into a generalizable form that uses the config file scripts/config/attr_gen_camels_config.yaml

Testing

Unit testing with test_tfrm_attr.py has been challenging to implement under a normal unittest approach owing to an unexplained error related to the dask.dataframe as dd import. The work-around implemented here partially tests this package by avoiding most uses of classes.
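
For readers unfamiliar with class-free tests, here is a minimal sketch of a function-style test that uses unittest.mock.patch. It does not address the dask error itself, and find_attr_files is a hypothetical helper, not a function from tfrm_attr.py; only the assertion style mirrors the test snippets in this PR.

```python
from pathlib import Path
from unittest.mock import patch


def find_attr_files(dir_db_attrs, comid):
    """Hypothetical helper: list files whose name contains the comid."""
    return sorted(Path(dir_db_attrs).rglob(f"*{comid}*"))


def test_find_attr_files_no_match():
    """With no matching files, the helper should return an empty list.

    Path.rglob is patched so the test never touches the file system.
    """
    with patch("pathlib.Path.rglob", return_value=iter([])) as mock_rglob:
        result = find_attr_files("/fake/dir", "67890")
    # The search pattern should wrap the comid in wildcards.
    mock_rglob.assert_called_once_with("*67890*")
    assert result == []


# A plain function can be run directly or collected by a test runner.
test_find_attr_files_no_match()
```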

Screenshots

Notes

Todos

Checklist

Testing checklist

Target Environment support

Windows
Linux
Browser

Accessibility

Keyboard friendly
Screen reader friendly

Other


Ben-Choat

I've left some comments, mostly (or maybe all) related to formatting and commenting. I did not execute this code or run any tests. I've approved, but recommend reviewing my comments. Let me know if you have any questions.

There are several dozen functions available in RaFTS. Getting those documented in a common location will be important for enabling users to pick it up and use it without scouring through code.

Ben-Choat · 2024-11-25T16:00:28Z

pkg/fs_algo/fs_algo/fs_algo_train_eval.py

I recommend refactoring long commands like "dir_std_base = list([x for x in self.attr_config['file_io'] if 'dir_std_base' in x][0].values())[0].format(dir_base=dir_base, home_dir=home_dir)". Breaking it down into multiple lines/commands would be helpful in terms of readability.

Also, if we are aiming for PEP 8 compliance, note that PEP 8 limits lines to a maximum of 79 characters. The line above is 150 characters, which indicates it should be broken into multiple lines, not only for formatting but also for interpretability.
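
One possible way to break the one-liner up is sketched below. The attr_config contents, the example paths, and the get_config_value helper are assumptions made for illustration; the sketch also assumes each entry under 'file_io' is a single-key dictionary, which is what the original expression implies but does not guarantee.

```python
# Hypothetical config mirroring the shape implied by the one-liner:
# 'file_io' is a list of single-key dictionaries.
attr_config = {
    "file_io": [
        {"dir_base": "{home_dir}/data"},
        {"dir_std_base": "{dir_base}/input/user_data_std"},
    ]
}
home_dir = "/home/user"
dir_base = "/home/user/data"


def get_config_value(entries, key):
    """Return the value stored under `key` in a list of single-key dicts."""
    matches = [entry for entry in entries if key in entry]
    if not matches:
        raise KeyError(f"'{key}' not found in config entries")
    return matches[0][key]


# Step 1: look up the raw format string.
dir_std_base_template = get_config_value(attr_config["file_io"], "dir_std_base")

# Step 2: fill in the placeholders; unused keyword arguments are ignored.
dir_std_base = dir_std_base_template.format(
    dir_base=dir_base, home_dir=home_dir
)
```

Every line stays within 79 characters, and a missing key now raises a descriptive KeyError instead of an IndexError from indexing an empty list.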

Ben-Choat · 2024-11-25T16:03:51Z

pkg/fs_algo/fs_algo/tests/test_tfrm_attr.py

+        mock_rglob.assert_called_once_with("*67890*")
+        mock_read_parquet.assert_not_called()
+
+


Recommend removing the commented-out code or, if we are leaving it for potential later use, adding a comment to indicate that.

Ben-Choat · 2024-11-25T16:08:52Z

pkg/fs_algo/fs_algo/tests/test_tfrm_attr.py

Several functions are missing docstrings. Additional comments that clearly explain what each function does, and why each test is performed, would help lead the user/coder/reader/reviewer through the code.

Ben-Choat · 2024-11-25T16:13:38Z

pkg/fs_algo/fs_algo/tfrm_attr.py

+        raise ValueError("Expecting path to file containing comids to be csv or parquet file")
+    return df
+
+


"Likely" is a bit confusing in the comments in this function. Does it mean you are not sure what is actually happening?

Ben-Choat · 2024-11-25T16:16:51Z

pkg/fs_algo/fs_algo/tfrm_attr.py

+                # Determine which column identifies the comids in a given metadata file
+                loc_id_col = [x for x in loc_id_cols if x in df_meta.columns]
+                if len(loc_id_col) != 1:
+                    raise ValueError("Could not find any of the location ID " +


PEP 8 compliance would have us put the '+' at the beginning of the following line as opposed to the end of the current line. Also, since each of these strings is within the ValueError() parentheses, the '+' operators are not needed.
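
A small sketch of the second point, relying on Python's implicit concatenation of adjacent string literals inside parentheses. The pick_loc_id_col function and the wording of the error message are hypothetical stand-ins; the original message is truncated in the diff above and is not reproduced here.

```python
def pick_loc_id_col(columns, loc_id_cols):
    """Return the single name in `columns` that identifies comids."""
    matches = [col for col in loc_id_cols if col in columns]
    if len(matches) != 1:
        # Adjacent string literals inside the parentheses are joined by
        # the parser, so no '+' is needed to continue the message.
        raise ValueError(
            "Expected exactly one location ID column from "
            f"{loc_id_cols}, but found {matches} "
            f"in columns {list(columns)}"
        )
    return matches[0]


loc_id_col = pick_loc_id_col(["featureID", "value"], ["comid", "featureID"])
```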

Ben-Choat · 2024-11-25T16:34:11Z

pkg/proc.attr.hydfab/R/proc_attr_grabber.R

@@ -15,6 +15,90 @@ library(data.table)
 library(pkgcond)


Probably want to check if packages are installed, and install them if not.

Ben-Choat · 2024-11-25T16:37:36Z

pkg/proc.attr.hydfab/R/proc_attr_grabber.R

+  Retr_Params <- base::list(paths = base::list(
+    # Note that if a path is provided, ensure the
+    # name includes 'path'. Same for directory having variable name with 'dir'
+    dir_db_hydfab=dir_db_hydfab,


Consistency in formatting: use either name = value or name=value throughout. I think most R style guides suggest spaces around "=".

Ben-Choat · 2024-11-25T16:39:52Z

pkg/proc.attr.hydfab/R/proc_attr_grabber.R

+  }
+
+  path_missing_attrs <- std_miss_path(Retr_Params$paths$dir_db_attrs)
+  df_miss <- utils::read.csv(path_missing_attrs,)


I don't think the comma is needed in utils::read.csv(xxxx,). Not sure if it will matter in terms of functioning.

Ben-Choat · 2024-11-25T16:40:34Z

pkg/proc.attr.hydfab/R/proc_attr_grabber.R

+
+  path_missing_attrs <- std_miss_path(Retr_Params$paths$dir_db_attrs)
+  df_miss <- utils::read.csv(path_missing_attrs,)
+  if(nrow(df_miss)>0){


Suggest adding spaces around operators like > and = in the following chunk.

Ben-Choat · 2024-11-25T16:42:28Z

scripts/config/attr_gen_camels.R

@@ -28,32 +40,40 @@ main <- function(){
    lapply( function(x) gsub(pattern = "Gage_", replacement = "",x=x)) |>
    unlist()

-  utils::write.table(gage_ids,glue::glue('{home_dir}/noaa/camels/gagesII_wood/camels_ii_gage_ids.txt'),row.names = FALSE,col.names = FALSE)


In general, arguments to a function should be separated by a space after each ','; e.g., write.table(xxxx, row...., col...)

glitt13 added 24 commits November 1, 2024 12:49

Add alternate comid retrieval via sf geometry in case nwissite returns comid of NA

7c0a582

merge upstream/main

44e1182

fix: add gage_id inside each loc_attrs df; fix: set fill=TRUE for rbindlist

8a937ce

fix: add usgs_vars sublist to Retr_Params

3951059

feat: add a format checker on Retr_Params

8c7ed2c

feat: add attribute variable name checker, incorporate check_attr_selection() into standard processing

339279c

feat: developing approach to transform attributes

df57202

feat: add cmd/config file capability to retrieving camels attributes script

e7d5e0f

fix: update script to work with return of a data.table rather than a list

f5b01b7

fix: address path/glue format issues

34efc2e

refactor: negligible change

96b01a9

feat: add parquet file read option based on check for comid in filename

3a7d4c1

doc: update fs_read_attr_comid documentation based on read_type

218eba0

doc: update yaml config files to jive with latest developments in attribute transforms

08b867b

feat: core functionality that aggregates & transforms attributes

a9b215e

refactor: move config file read out of for-loop; fix: ensure str format of comid

d61c9c2

fix: add error if Null vals returned following aggregation/transformation

2ec53f1

feat: create file listing needed comid-attributes pairings

e2d79c1

doc: describe steps in creating transformed attributes; feat: update package version.

6cf7a7b

feat: add attribute generation script for camels catchments

5d288c3

fix: remove deprecated wrapper function from tfrm_attr

c9bdad6

fix: resolve merge conflicts

35f0663

fix: change dask dataframe to eager evaluation

0ea2a92

feat: partially-created unit tests corresponding to attribute transformation functions

329a8e1

glitt13 requested a review from Ben-Choat November 15, 2024 00:00

glitt13 added 5 commits November 15, 2024 07:58

feat: convert missing comid/attrs scripts into functions; doc: augment transformation script's documentation

3e0f378

fix: add in home_dir as optional part of attr config's directory format strings just-in-case user doesn't use f'{dir_base}'

67398a3

fix: add logic on whether a warning prints after first checking if missing comids or variables have been identified, else write message that there could be an issue in the logic

b14c63a

feat: add attribute config file parser function to R package proc.attr.hydfab

60b776c

fix: address undefined objects in attr_cfig_parse

02de2b0

glitt13 added 10 commits November 18, 2024 12:47

fix: remove duplicated attr_cfig_parse from package file

33d11ec

feat: create missing attributes finder wrapper function

2c40272

doc: update descriptive documentation for fs_attrs_miss_wrap()

5ee6d28

feat: add the missing attributes Rscript and wrapper documentation

b8ff3cb

fix: remove items in transformation config file no longer used; doc: add documentation to transformation config file

5b57a09

cherry-pick transform config file doc updates and remove deprecated items

d00fa8b

fix: patch the attribute metadata comid column read by searching for expected column name from a list of possible colnames

21a41a7

feat: add Rscript call that attempts to retrieve missing attributes if missing attributes identified

f2dfdd1

doc: add printout explaining Rscript called to retrieve missing attribute-comid pairings

7ada133

merge missing comid-attribute grabber functionality

93d0655

Ben-Choat approved these changes Nov 25, 2024
