Preprocessing to create SUMMA input #142
base: develop
Conversation
Merge latest version of pysumma from develop branch
It looks like there is a lot of functionality here for creating the various input files for SUMMA from "raw" data sources. Because of that, this is a rather large pull request and I think we'll need a couple of rounds on it. I just went through and looked at some of the general architecture and style, and I think that once we get that cleaned up it'll be easier to see how to make this "mesh" with the rest of the pysumma codebase a little more cleanly.
One thing that does concern me a bit is the inclusion of the SSURGO database. I would prefer not to maintain external datasets in this repo. That is something that users should download on their own.
@@ -8,6 +8,10 @@
from .force_file_list import ForcingList
from . import utils
from .calibration import Ostrich, OstrichParam
from . import hydroshare_utils
from .preprocessing import LocalAttributes, InitialCondition, ParamTrial
I think it might be good to have these not only live in `preprocessing`, but be more general classes which act as interfaces to these methods. Then we could do something like `LocalAttributes.from_netcdf(filepath)` or `LocalAttributes.from_tif(filepath)`, which might end up being a bit cleaner. Let's also make sure these follow the conventions in the `FileManager` class, so we should update `ParamTrial` -> `TrialParams`.
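Roughly what I have in mind, as a sketch only (the constructor body and the xarray usage here are my assumptions, not existing code):

```python
import xarray as xr


class LocalAttributes:
    """Interface to SUMMA local attributes (sketch of the suggested API)."""

    def __init__(self, data: xr.Dataset):
        self.data = data

    @classmethod
    def from_netcdf(cls, filepath):
        # Load an already-built local attributes file
        return cls(xr.open_dataset(filepath))

    @classmethod
    def from_tif(cls, filepath):
        # Derive attributes from a raster source; the conversion logic
        # from the preprocessing module would live behind this
        raise NotImplementedError
```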
@@ -0,0 +1,152 @@
from osgeo import gdal
Minor comment, but I think a better name would just be `initial_condition.py` rather than `create_initial_condition.py`, so that we can have it be more general than just methods for making the initial conditions, but also for modifying them.
Same goes for `create_local_attribute` and `create_param_trial`.
ssurgo_soil_db = pkg_resources.resource_filename(
    __name__, 'meta/ssurgo_soils_db.csv')


class InitialCondition(object):
I think we need some documentation in here, especially for the more complex functions.
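Even something minimal would help, e.g. numpy-style docstrings. An illustration based on the merge logic further down (the function name and split are made up):

```python
def soil_layers_from_mukey(gru_mukey, ssurgo_depths):
    """Attach SSURGO soil layer counts to each GRU.

    Parameters
    ----------
    gru_mukey : pandas.DataFrame
        Mapping of GRU identifiers to SSURGO MUKEY values.
    ssurgo_depths : pandas.DataFrame
        SSURGO soil depth table, including an 'nSoil' column per MUKEY.

    Returns
    -------
    pandas.DataFrame
        Per-GRU soil layer counts.
    """
    return gru_mukey.merge(ssurgo_depths, on='MUKEY')[['nSoil']]
```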
Also, could you explain exactly what assumptions/datasets we're using to create this here?
gru_mukey_depth = gru_mukey_1_col.merge(SSURGO_soil_depth, on='MUKEY')
gru_mukey_layer = gru_mukey_depth[['nSoil']]
gru_mukey_layer['nSoil'].unique().max()
gru_initial_cond_hru = gru_mukey_layer.assign(**{'scalarv': hru_ic["scalarv"], 'scalarSnowAlbedo': hru_ic["scalarSnowAlbedo"],
There are a lot of really long lines in this PR; could you work to split them so that line length is < 100 characters?
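For example, the `assign` call above can be wrapped at the dict entries:

```python
gru_initial_cond_hru = gru_mukey_layer.assign(**{
    'scalarv': hru_ic['scalarv'],
    'scalarSnowAlbedo': hru_ic['scalarSnowAlbedo'],
    # ... remaining keys, one per line
})
```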
if fdr.loc[index_r, index_c] == 1 and index_r-1 >= 0 and index_c+1 >= 0:
    downHRUindex.loc[index_r, index_c] = hruid.loc[index_r-1, index_c+1]
elif fdr.loc[index_r, index_c] == 2 and index_r-1 >= 0 and index_c >= 0:
    downHRUindex.loc[index_r, index_c] = hruid.loc[index_r-1, index_c]
elif fdr.loc[index_r, index_c] == 3 and index_r-1 >= 0 and index_c-1 >= 0:
    downHRUindex.loc[index_r, index_c] = hruid.loc[index_r-1, index_c-1]
elif fdr.loc[index_r, index_c] == 4 and index_r >= 0 and index_c-1 >= 0:
    downHRUindex.loc[index_r, index_c] = hruid.loc[index_r, index_c-1]
elif fdr.loc[index_r, index_c] == 5 and index_r+1 >= 0 and index_c-1 >= 0:
    downHRUindex.loc[index_r, index_c] = hruid.loc[index_r+1, index_c-1]
elif fdr.loc[index_r, index_c] == 6 and index_r+1 >= 0 and index_c >= 0:
    downHRUindex.loc[index_r, index_c] = hruid.loc[index_r+1, index_c]
elif fdr.loc[index_r, index_c] == 7 and index_r+1 >= 0 and index_c+1 >= 0:
    downHRUindex.loc[index_r, index_c] = hruid.loc[index_r+1, index_c+1]
elif fdr.loc[index_r, index_c] == 8 and index_r >= 0 and index_c+1 >= 0:
    downHRUindex.loc[index_r, index_c] = hruid.loc[index_r, index_c+1]
elif fdr.loc[index_r, index_c] < 0 and index_r >= 0 and index_c+1 >= 0:
    downHRUindex.loc[index_r, index_c] = -9999
else:
    downHRUindex.loc[index_r, index_c] = 0
I have no idea what any of this means - could you provide either some comments about what these are or rename some of the variables/raw integers so that this is easier to understand?
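If these are D8-style flow direction codes (which is what the offsets suggest; I'm inferring the code meanings from the branches), a lookup table would make the intent self-documenting. A sketch:

```python
# (row_offset, col_offset) per flow-direction code, transcribed from the
# branches above; the direction semantics are my guess from the offsets
D8_OFFSETS = {
    1: (-1, +1), 2: (-1, 0), 3: (-1, -1), 4: (0, -1),
    5: (+1, -1), 6: (+1, 0), 7: (+1, +1), 8: (0, +1),
}

code = fdr.loc[index_r, index_c]
if code in D8_OFFSETS:
    dr, dc = D8_OFFSETS[code]
    if index_r + dr >= 0 and index_c + dc >= 0:
        downHRUindex.loc[index_r, index_c] = hruid.loc[index_r + dr, index_c + dc]
    else:
        downHRUindex.loc[index_r, index_c] = 0  # neighbor off-grid
elif code < 0:
    downHRUindex.loc[index_r, index_c] = -9999  # missing / no-data
else:
    downHRUindex.loc[index_r, index_c] = 0
```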
        if lulc_df['vegTypeIndex'].loc[index] == 11 and 12 and 13 and 14 and 15 and 16 and 17:
            lulc['vegTypeIndex'][index] = 1
        elif lulc_df['vegTypeIndex'].loc[index] == 22 and 24:
            lulc['vegTypeIndex'][index] = 4
        elif lulc_df['vegTypeIndex'].loc[index] == 21:
            lulc['vegTypeIndex'][index] = 5
        elif lulc_df['vegTypeIndex'].loc[index] == 23:
            lulc['vegTypeIndex'][index] = 7
        elif lulc_df['vegTypeIndex'].loc[index] == 32 and 33:
            lulc['vegTypeIndex'][index] = 9
        elif lulc_df['vegTypeIndex'].loc[index] == 41:
            lulc['vegTypeIndex'][index] = 11
        elif lulc_df['vegTypeIndex'].loc[index] == 42:
            lulc['vegTypeIndex'][index] = 13
        elif lulc_df['vegTypeIndex'].loc[index] == 43:
            lulc['vegTypeIndex'][index] = 15
        elif lulc_df['vegTypeIndex'].loc[index] == 51 and 52 and 53 and 54 and 72:
            lulc['vegTypeIndex'][index] = 16
        elif lulc_df['vegTypeIndex'].loc[index] == 31 and 62:
            lulc['vegTypeIndex'][index] = 17
        elif lulc_df['vegTypeIndex'].loc[index] == 61:
            lulc['vegTypeIndex'][index] = 18
        elif lulc_df['vegTypeIndex'].loc[index] == 82:
            lulc['vegTypeIndex'][index] = 20
        elif lulc_df['vegTypeIndex'].loc[index] == 81:
            lulc['vegTypeIndex'][index] = 21
        elif lulc_df['vegTypeIndex'].loc[index] == 84:
            lulc['vegTypeIndex'][index] = 22
        elif lulc_df['vegTypeIndex'].loc[index] == 71 and 74 and 75 and 76 and 77 and 83:
            lulc['vegTypeIndex'][index] = 23
        elif lulc_df['vegTypeIndex'].loc[index] == 91 and 92:
            lulc['vegTypeIndex'][index] = 24
        elif lulc_df['vegTypeIndex'].loc[index] == 73:
            lulc['vegTypeIndex'][index] = 27
        else:
            pass
elif option == "MODIFIED_IGBP_MODIS_NOAH":
    for index, raster in lulc_df.iterrows():
        if lulc_df['vegTypeIndex'].loc[index] == 42:
            lulc['vegTypeIndex'][index] = 2
        elif lulc_df['vegTypeIndex'].loc[index] == 41:
            lulc['vegTypeIndex'][index] = 4
        elif lulc_df['vegTypeIndex'].loc[index] == 43:
            lulc['vegTypeIndex'][index] = 5
        elif lulc_df['vegTypeIndex'].loc[index] == 61 and 62:
            lulc['vegTypeIndex'][index] = 11
        elif lulc_df['vegTypeIndex'].loc[index] == 21 and 22:
            lulc['vegTypeIndex'][index] = 12
        elif lulc_df['vegTypeIndex'].loc[index] == 11 and 12 and 13 and 14 and 16 and 17:
            lulc['vegTypeIndex'][index] = 13
        elif lulc_df['vegTypeIndex'].loc[index] == 23 and 24 and 31 and 32 and 33:
            lulc['vegTypeIndex'][index] = 14
        elif lulc_df['vegTypeIndex'].loc[index] == 91 and 92:
            lulc['vegTypeIndex'][index] = 15
        elif lulc_df['vegTypeIndex'].loc[index] == 51 and 52 and 53 and 54:
            lulc['vegTypeIndex'][index] = 17
        elif lulc_df['vegTypeIndex'].loc[index] == 84:
            lulc['vegTypeIndex'][index] = 19
        elif lulc_df['vegTypeIndex'].loc[index] == 71 and 72 and 73 and 74 and 75 and 76 and 77 and 81 and 82 and 83:
            lulc['vegTypeIndex'][index] = 20
        else:
            pass
return lulc
As above, just having these big chains of `elif`s on raw integers is very mysterious. I have no idea what each of these stands for.
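If the intent is a class-to-class remap, a named mapping would both document the codes and remove the loop entirely. A sketch with the values transcribed from the USGS branch above, assuming each `== a and b and c` was meant as membership in `{a, b, c}` (see my comment below about `and`):

```python
# source land-cover class -> USGS vegTypeIndex, transcribed from the
# elif chain above; the source classification itself is an assumption
LULC_TO_USGS = {
    11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 16: 1, 17: 1,
    22: 4, 24: 4, 21: 5, 23: 7, 32: 9, 33: 9,
    41: 11, 42: 13, 43: 15,
    51: 16, 52: 16, 53: 16, 54: 16, 72: 16,
    31: 17, 62: 17, 61: 18, 82: 20, 81: 21, 84: 22,
    71: 23, 74: 23, 75: 23, 76: 23, 77: 23, 83: 23,
    91: 24, 92: 24, 73: 27,
}

# unmapped classes fall back to 0, matching the zero-initialized frame
lulc['vegTypeIndex'] = (lulc_df['vegTypeIndex']
                        .map(LULC_TO_USGS)
                        .fillna(0)
                        .astype('int64'))
```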
lulc = pd.DataFrame(
    np.zeros((lulc_df.shape[0], lulc_df.shape[1])), columns=['vegTypeIndex'])
lulc = lulc.astype('int64')
if option == "USGS" and "USGS-RUC":
I think the logic here is not what you want. The `and "USGS-RUC"` part doesn't actually do anything. I think you mean `if option == "USGS" or option == "USGS-RUC"`, which could also be written `option in ["USGS", "USGS-RUC"]`.
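For reference, `==` binds tighter than `and`, so the condition reduces to just `option == "USGS"` (a non-empty string on the right of `and` is always truthy and never acts as a second comparison):

```python
>>> option = "USGS-RUC"
>>> option == "USGS" and "USGS-RUC"  # (option == "USGS") is False, so it short-circuits
False
>>> option in ["USGS", "USGS-RUC"]
True
```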
    pass
return lulc


def gru_local_attributes(self, gru_name, gru_num, dir_grassdata, hru_start_num, hru_area, contour_length=10, slopeTypeIndex=1, vegTypeIndex_option='USGS'):
Again, a lot of very long lines here.
""" | ||
|
||
# Create a netCDF file | ||
initial_cond = Dataset(name, "w", format="NETCDF3_CLASSIC") |
I'd prefer to use `xarray` here just to be internally consistent. Also, is there a reason you're using `NETCDF3_CLASSIC`? If it's not a hard constraint I'd rather use something either newer, or (even better IMO) agnostic to the version.
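Something like this keeps the on-disk format out of the construction logic (the variable and coordinate names below are placeholders, not the actual SUMMA schema):

```python
import numpy as np
import xarray as xr

hru_ids = np.arange(1, 4)                # placeholder HRU ids
scalarv_values = np.zeros(len(hru_ids))  # placeholder data

# build the initial conditions as an in-memory Dataset first
initial_cond = xr.Dataset(
    {'scalarv': (('hru',), scalarv_values)},
    coords={'hru': hru_ids},
)

# choose the format only at write time, if at all
initial_cond.to_netcdf('initial_cond.nc')  # or format='NETCDF3_CLASSIC' if SUMMA requires it
```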
@@ -0,0 +1,95 @@
library(rgrass7)
In my first look through it's not entirely clear where these are used. Could you point me to where these are called? I'd prefer not to introduce too many external scripts.
This goes for all of the scripts in preprocessing/meta/
Also, could you just provide some sort of header comments on what these actually do?
Hi David and Andrew,

The full preprocessing workflow is available as a set of Jupyter notebooks on HydroShare (https://www.hydroshare.org/resource/303f64250a1d45b8b3c24031a4500781/):

- Step_1_Download_RawData_of_CoweetaSub18.ipynb
- Step_2_Delineate_watershed_and_extract_local_attributes_using_GRASS_GIS.ipynb (this notebook needs GRASS GIS and several R packages and libraries, so Drew and I preconfigured these computational environments in CyberGIS-Jupyter for Water)
- Step_3_Create_Local_Attribute_CSV_file_with_GRUs_and_HRUs.ipynb
- Step_4_Create_Initial_Condition_of_SUMMA_Input.ipynb
- Step_5_Create_NetCDF_SUMMA_Input.ipynb
- Step_6_Create_3hour_Forcing_using_Metsim.ipynb
- Step_7_SUMMA_execution.ipynb