Attribute aggregation and transformation #31
…ection() into standard processing
…ribute transforms
…rmation functions
…t transformation script's documentation
…at strings just-in-case user doesn't use f'{dir_base}'
…ssing comids or variables have been identified, else write message that there could be an issue in the logic
…add documentation to transformation config file
…expected column name from a list of possible colnames
…f missing attributes identified
…bute-comid pairings
I've left some comments, mostly (or maybe all) related to formatting and commenting. I did not execute this code or run any tests. I've approved, but recommend reviewing my comments. Let me know if you have any questions.
There are several dozen functions available in RaFTS. Getting those documented in a common location will be important for enabling users to pick it up and use it without scouring through code.
I recommend refactoring long commands like "dir_std_base = list([x for x in self.attr_config['file_io'] if 'dir_std_base' in x][0].values())[0].format(dir_base=dir_base, home_dir=home_dir)". Breaking it down into multiple lines/commands would be helpful in terms of readability.
Also, if we are aiming for PEP 8 compliance, PEP 8 limits lines to a maximum of 79 characters. The line above is 150 characters, which indicates it should be broken into multiple lines, not only for formatting but also for interpretability.
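One way the long command above could be broken up is sketched below. This is only an illustration: the `attr_config` contents, the paths, and the helper name `get_file_io_value` are hypothetical stand-ins, and the sketch assumes each entry under 'file_io' is a single-key dictionary, which is what the original one-liner appears to rely on.

```python
# Hypothetical stand-in for self.attr_config; the real structure may differ.
attr_config = {
    "file_io": [
        {"dir_base": "{home_dir}/data"},
        {"dir_std_base": "{dir_base}/input/user_data_std"},
    ]
}
home_dir = "/home/user"
dir_base = "/home/user/data"


def get_file_io_value(config, key):
    """Return the value of the first 'file_io' entry that contains `key`."""
    matches = [entry for entry in config["file_io"] if key in entry]
    if not matches:
        raise KeyError(f"'{key}' not found in the file_io section")
    return matches[0][key]


# Step 1: look up the unformatted template string.
dir_std_base_template = get_file_io_value(attr_config, "dir_std_base")

# Step 2: fill in the placeholders; unused keyword arguments are ignored.
dir_std_base = dir_std_base_template.format(
    dir_base=dir_base, home_dir=home_dir
)
print(dir_std_base)
```

Splitting the lookup from the formatting keeps every line under 79 characters and gives a clearer error when the key is missing than an IndexError from `[0]` would.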
mock_rglob.assert_called_once_with("*67890*")
mock_read_parquet.assert_not_called()
Recommend removing the commented-out code, or, if we are leaving it for potential use later, adding a comment to indicate that.
Several functions are missing docstrings. Additional commenting that clearly explains what each function does, and why each test is performed, would help lead the user/coder/reader/reviewer through the code.
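As a rough illustration of the kind of docstrings meant here, a documented test might look like the sketch below. The helper `find_comid_files` and the test class are hypothetical and are not taken from this PR; only the mocking pattern mirrors the test code under review.

```python
import unittest
from pathlib import Path
from unittest.mock import patch


def find_comid_files(dir_db, comid):
    """Hypothetical helper: find files under dir_db matching comid."""
    return list(Path(dir_db).rglob(f"*{comid}*"))


class TestFindComidFiles(unittest.TestCase):
    """Tests for the hypothetical find_comid_files helper."""

    @patch("pathlib.Path.rglob")
    def test_no_matching_files(self, mock_rglob):
        """An empty search result should yield an empty list.

        Why: downstream code decides whether to read parquet files based
        on this list, so a missing comid must not raise or return None.
        """
        # Simulate a directory search that finds nothing.
        mock_rglob.return_value = []
        result = find_comid_files("/fake/dir", "67890")
        # The search pattern should wrap the comid in wildcards.
        mock_rglob.assert_called_once_with("*67890*")
        self.assertEqual(result, [])
```

The docstring states both what is checked and why the check matters, which is the part that is hardest for a reviewer to reconstruct from assertions alone.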
raise ValueError("Expecting path to file containing comids to be csv or parquet file")
return df
"Likely" is a bit confusing in the comments in this function. Does it mean you are not sure what is actually happening?
# Determine which column identifies the comids in a given metadata file
loc_id_col = [x for x in loc_id_cols if x in df_meta.columns]
if len(loc_id_col) != 1:
    raise ValueError("Could not find any of the location ID " +
PEP 8 compliance would have us put the '+' at the beginning of the following line as opposed to the end of the current line. Also, since each of these strings is within the ValueError() parentheses, the '+' operators are not needed.
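A minimal sketch of the implicit concatenation mentioned above follows. The function name `find_loc_id_col`, its signature, and the wording of the error message are hypothetical; only the list comprehension and the length check follow the diff excerpt.

```python
def find_loc_id_col(columns, loc_id_cols):
    """Return the one entry of `loc_id_cols` present in `columns`."""
    loc_id_col = [x for x in loc_id_cols if x in columns]
    if len(loc_id_col) != 1:
        # Adjacent string literals inside parentheses are joined
        # automatically, so no '+' is required between them.
        raise ValueError(
            "Expected exactly one location ID column from "
            f"{loc_id_cols}, but found {loc_id_col}."  # hypothetical wording
        )
    return loc_id_col[0]


print(find_loc_id_col(["comid", "area"], ["comid", "featureID"]))
```

If an explicit operator is preferred, PEP 8 suggests breaking before it, so the '+' would start the continuation line.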
@@ -15,6 +15,90 @@ library(data.table)
library(pkgcond)
Probably want to check if packages are installed, and install them if not.
Retr_Params <- base::list(paths = base::list(
# Note that if a path is provided, ensure the
# name includes 'path'. Same for directory having variable name with 'dir'
dir_db_hydfab=dir_db_hydfab,
Be consistent in formatting: use either name = value or name=value throughout. I think most R style guides suggest spaces around "=".
}

path_missing_attrs <- std_miss_path(Retr_Params$paths$dir_db_attrs)
df_miss <- utils::read.csv(path_missing_attrs,)
I don't think the comma is needed in utils::read.csv(xxxx,). Not sure if it will matter in terms of functioning.
path_missing_attrs <- std_miss_path(Retr_Params$paths$dir_db_attrs)
df_miss <- utils::read.csv(path_missing_attrs,)
if(nrow(df_miss)>0){
Suggest adding spaces around operators like > and = in the following chunk.
@@ -28,32 +40,40 @@ main <- function(){
lapply( function(x) gsub(pattern = "Gage_", replacement = "",x=x)) |>
unlist()

utils::write.table(gage_ids,glue::glue('{home_dir}/noaa/camels/gagesII_wood/camels_ii_gage_ids.txt'),row.names = FALSE,col.names = FALSE)
In general, arguments to a function should be separated by a space after each ',', e.g., write.table(xxxx, row...., col...)
An approach to aggregate and transform existing attribute data to create new attribute data.
Additions
Removals
Changes
Testing
test_tfrm_attr.py has been challenging to implement under a normal unittest package approach owing to a mysterious "dask.dataframe as dd" error. Implemented a work-around that partially tests this package by nixing most instances of using classes.
Screenshots
Notes
Todos
Checklist
Testing checklist
Target Environment support
Accessibility
Other