DAS-2232 - Functionality added to support SMAP L3 products #15
Conversation
Sudha, there's a lot here to review, and unfortunately I haven't worked with HOSS directly before. I tried to look through what I could and make decent suggestions.
But for sure, you need to test your changes. When you run the tests, a coverage report is generated. Before your changes the results were:
Test Coverage Estimates
Name Stmts Miss Cover
--------------------------------------------------
hoss/dimension_utilities.py 156 2 99%
hoss/spatial.py 61 0 100%
--------------------------------------------------
TOTAL 668 8 99%
And after your changes, they dropped considerably:
Test Coverage Estimates
Name Stmts Miss Cover
--------------------------------------------------
hoss/dimension_utilities.py 242 65 73%
hoss/spatial.py 70 8 89%
--------------------------------------------------
TOTAL 771 81 89%
It is very difficult to understand the changes when I can't look at a test and see what a function was supposed to do. Likewise, the function comments were not updated to describe the new functionality. Hopefully I'm not confusing anything with my lack of familiarity. I will defer to Owen if there are differences of opinion on what should be done.
The final test instructions assume a strong understanding of the problem you were solving, but I was able to eventually run the two test URLs and get output. Should the output files be geolocated somehow? They open in Panoply, but aren't geo-2D, just 2D.
Thanks @sudha-murthy for putting a lot of effort into this PR!
This review is not 100% thorough. I think there are a few bigger things here that need to be addressed, and that will make it easier to digest in full:
- I think the new functionality to deal with overriding dimensions should not just go in `get_projected_x_y_ranges`. By doing so, it's adding complexity to that function, which makes things harder to understand overall. I also think it is forcing you to do some things that would otherwise be unnecessary - for example, I don't think you need to write the variables to the NetCDF-4 file, you just need to ask a function for 1-D dimension variables and spit out 1-D `numpy` arrays.
- I'm really worried about the use of `set` objects to contain overriding dimensions. The ordering of dimensions is important, and must match the underlying array axes. I suspect this is the reason you are having issues with 3-D variables (it's certainly part of the issue). (See the sketch after this comment for an illustration.)
- A lot of the additional code is written in ways that add high levels of cyclomatic complexity. I've suggested a few ways to cut that down, but it's worth taking a step back and working out what is common to code branches and what is truly different. Try to minimise what has to go in an `if`/`elif`/`else`, and then make sure each branch gives you the same output. The `add_index_range` function is perhaps the area of most concern.
- There are a few places where the code branches and only assigns variables in one branch, which are then used after the branching is finished. This will cause errors when something skips that block and later tries to access a non-existent variable.
- This PR has no new or updated unit tests, either for new functions or new branches of code. I know I'm religious about these things, but they are absolutely needed, particularly on a PR of this scope. There are a bunch of changes, and they are complicated and hard to understand. Not only do tests make sure things work; reading them makes it easier to understand how the code flows.
- There are a lot of places where code has changed significantly, but the documentation strings describing the functions have not. They should definitely be updated to keep in sync with what's going on.
Sorry for all the comments, but hopefully they are helpful!
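To illustrate the concern about `set` ordering, here is a minimal, self-contained sketch; the dimension names and index ranges are made up for the example, not taken from the PR:

```python
# Dimension names must stay aligned with the numpy array axes (row, column).
dims_list = ['/projected_y', '/projected_x']  # ordered, matches array axes
dims_set = set(dims_list)                     # iteration order is arbitrary

index_ranges = {'/projected_y': (10, 20), '/projected_x': (30, 40)}

# Building a DAP4-style constraint from the ordered list is deterministic:
constraint = ''.join(
    f'[{index_ranges[dim][0]}:{index_ranges[dim][1]}]' for dim in dims_list
)
print(constraint)  # always '[10:20][30:40]'

# Iterating dims_set instead may yield '[30:40][10:20]' on some runs (string
# hashing is randomised per process), silently swapping the row and column
# subsets against the array axes.
```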
hoss/dimension_utilities.py (Outdated)
@@ -55,6 +59,42 @@ def is_index_subset(message: Message) -> bool:
    )

def get_override_projected_dimensions(
This PR does not include any changes to unit tests. Each new function you are adding (or new branch of code within an existing conditional block) needs unit testing that hits every branch of the code within a function. I've not reviewed any of the code changes in detail yet, but this lack of tests is an absolute blocker for me to approve the PR.
Yes, Owen. Will add the unit tests.
hoss/dimension_utilities.py
Outdated
if variable.dimensions == []: | ||
override_dimensions = get_override_dimensions(varinfo, [variable_name]) | ||
if len(override_dimensions) > 0: | ||
for override in reversed(list(override_dimensions)): |
Why are you reversing things here? The ordering should be preserved. If the order of override dimensions does not reflect the ordering of array axes, then that needs to be fixed.
I will revisit that. I had a problem with fixing the override dimensions list.
Changes are in commit 756f7c0
@@ -422,22 +559,48 @@ def add_index_range(
    """
For this function: READ THIS COMMENT FIRST.
I think this is really overcomplicated. Fundamentally, all this function is doing is getting a list of dimensions for a variable. Then it is looking in a cache to see if there is an index range for any of the dimension names, before using that cached value to tack something on the end of a string.
Maybe I'm being naive, but it feels like all you really need is something to determine the correct list of variable dimensions; then all the rest of the logic (looking in the cache and appending strings to the end of the variable name) is the same. That latter stuff 100% does not need to be duplicated in multiple condition branches. It's making this function unnecessarily hard to read.
The other, really important comment: I am super suspicious of the bit where you are needing to reverse the order of the dimensions list. However that list is derived, it should reliably be in the ordering of the variable axes. I'm wary that this means you have found that, for your particular use case, the "random" ordering in a set is predictable for the pseudo-dimensions you have for SMAP L3, and you can coincidentally impose the order you need by reversing the set. I really think dimensions should not be passed around in sets, because you need that ordering. I strongly suspect this is the root cause of your issues with 3-D variables.
The ordering and the shape are things I need to get from varinfo. I don't have that information. Updated DAS-2241 for this.
Maybe rephrasing a different comment: I'm very uncomfortable with the thought of merging code with known problems into `main`. Going back to one of the things mentioned in the TRT breakout last week:

> Teams ought to have a high-quality, maintainable baseline without known defects (particularly in new features) that is deployable to the production environment at all times.

Instead of making a massive PR that does 80% of a lot of stuff, this probably should be submitted piecemeal, with each piece being made 100% solid in a series of smaller PRs.
If you and David feel we're at a point where we can't break the changes down in that way, then a compromise might be to update this PR to merge into a feature branch. Then, once all of the multiple PRs are merged into the feature branch, the feature branch can be merged into `main`.
I think terminology has confused some of this discussion. The question of reversing is not one of reversing dimensions, but of the lat/lon variables coming from the Coordinates attribute. The recommendation here is that the `get_coordinates` method itself should return values in a standard order (reversed, in this case), based upon the variable name - and not using reverse as shown here.
I don't think this is a case of a "known issue".
We are planning to release a version for NSIDC to start testing with, which may not handle the 3-D cases or other issues, but this release should not break any existing use cases. It should not truly break anything, but simply not handle some cases as desired. Incremental feature release is a tenet of agile development.
Changes are in commit 756f7c0
I have simplified the method. The order for SMAP is 'projected_y', 'projected_x'. The override section of the code is only used by SMAP at the moment. It can be generalized if we can get that order of dimensions from varinfo. I am not sure if the order of dimensions is used for other products currently handled by HOSS.
The use of sets for `required_dimensions` is based on how it is returned by varinfo, and how it was originally in HOSS before my changes. The bounds update requires the dimensions to be a set; it fails for a list.
There's a difference between a set of all dimensions for all required variables (as used in the prefetch function), which aggregates all `Variable.dimensions` attributes (which individually are lists), and the dimensions on an individual variable. `Variable.dimensions` is not a set, for ordering purposes; it is a list.
With regards to `bounds` - we know that the pseudo-dimensions won't have a `bounds` attribute, so you might be better off not trying to find bounds variables for them. Then you'll avoid any compatibility issues there.
The `add_index_range` function is a lot simpler now.
A couple of quick thoughts:
I agree that this PR is busy with a lot of comments. I'd recommend that @sudha-murthy picks a few of the comments in the PR to address at a time, pushes a commit up with them, and indicates in the comments that she has addressed them (with a link to the commit). Then the person making the comment can decide whether to mark it as resolved. Resolving the comments will make the PR look a lot less cluttered, and allow us all to focus on the remaining pieces.
Yup - agreed.
hoss/coordinate_utilities.py (Outdated)
) -> bool:
    """
    returns the list of required variables without any
    dimensions
    """
Two quick things here:

- The return type in the type hint is wrong: it should be `set[str]`, not `bool`.
- It would probably be clearer in the documentation string to say "set of required variables", not "list of required variables". (I'm guessing you meant "list" in a generic sense, rather than the Python type, but it's a little ambiguous.)
hoss/coordinate_utilities.py (Outdated)
def get_variables_with_anonymous_dims(
    varinfo: VarInfoFromDmr, required_variables: set[str]
Nitpick: it probably isn't necessary in this function to refer to the variables as `required_variables`. That is a piece of information outside of the scope of this function; maybe a better choice of name would be `variable_names`.
Thanks for adding some unit tests - I'll keep an eye out for more in future commits.
I resolved older comments I had left, which now look to be taken care of (thanks for that). There are still outstanding older items, and I've added some more comments on things that I looked at in more detail this time around. (To be honest, given the huge scale of this PR, it's hard to review it all in a single go, and so there are still bits I'm spotting only now, despite trying to go through the PR a couple of times)
hoss/coordinate_utilities.py (Outdated)
if lat_arr.ndim > 1:
    col_size = lat_arr.shape[0]
    row_size = lat_arr.shape[1]
    if (lon_arr.shape[0] != lat_arr.shape[0]) or (lon_arr.shape[1] != lat_arr.shape[1]):
        raise IrregularCoordinateDatasets(lon_arr.shape, lat_arr.shape)
if lat_arr.ndim and lon_arr.ndim == 1:
    col_size = lat_arr.size
    row_size = lon_arr.size
This is a bit wonky: the middle check that the array sizes are equal explicitly checks the size of the 0th and 1st axes of the array. But after that, you have a condition for whether the arrays only have one dimension. This means one of two things:

- Either the last bit, where `lat_arr.ndim` and `lon_arr.ndim` are both 1, will never be reached (because the arrays are always 2-D or higher), or
- The coordinate arrays could be 1-D, and then the check for `lon_arr.shape[1] != lat_arr.shape[1]` will raise an exception, because the `shape` tuple doesn't have enough elements.
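A sketch of how the branching could be restructured so the 1-D and 2-D cases are each handled explicitly; this assumes the `IrregularCoordinateDatasets` exception from the diff above, and the import path and function wrapper are illustrative, not the PR's actual code:

```python
import numpy as np

from hoss.exceptions import IrregularCoordinateDatasets  # import path assumed


def get_col_row_sizes(lat_arr: np.ndarray, lon_arr: np.ndarray) -> tuple:
    """Return (col_size, row_size) for matching 1-D or 2-D coordinate arrays."""
    if lat_arr.ndim == 2 and lon_arr.ndim == 2:
        if lat_arr.shape != lon_arr.shape:
            raise IrregularCoordinateDatasets(lon_arr.shape, lat_arr.shape)
        return lat_arr.shape[0], lat_arr.shape[1]
    if lat_arr.ndim == 1 and lon_arr.ndim == 1:
        return lat_arr.size, lon_arr.size
    # Anything else (mixed dimensionality, 3-D arrays) is also irregular.
    raise IrregularCoordinateDatasets(lon_arr.shape, lat_arr.shape)
```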
It is corrected.
hoss/coordinate_utilities.py
Outdated
row_size = lat_arr.shape[1] | ||
if (lon_arr.shape[0] != lat_arr.shape[0]) or (lon_arr.shape[1] != lat_arr.shape[1]): | ||
raise IrregularCoordinateDatasets(lon_arr.shape, lat_arr.shape) | ||
if lat_arr.ndim and lon_arr.ndim == 1: |
This condition isn't doing what I believe you think it is.
My guess is that you are trying to check whether both `lat_arr` and `lon_arr` have 1 dimension. What's really happening is that you are checking whether `lat_arr.ndim` has a "truthy" value (so isn't 0, where it's an integer), and then you are checking whether `lon_arr.ndim == 1`. If it helps, the checks are really:

```python
if (lat_arr.ndim) and (lon_arr.ndim == 1):
```

I think what you are trying to do is:

```python
if lat_arr.ndim == 1 and lon_arr.ndim == 1:
```
Sorry about that. I think it was a typo. I corrected it.
hoss/coordinate_utilities.py (Outdated)
    """
override_variable = varinfo.get_variable(variable_name)
projected_dimension_name = ''
It would fit much better with the rest of the style of the code if you used `else` for these sorts of default values, instead of declaring them and then overriding them if a condition is met.
Yes, it's a personal preference, but it's consistent with the rest of the code (which is the more important thing here).
I think I had to declare it above as a default value to keep the variable in scope for the return call.
I don't think you've understood my comment - you have two choices here: declare it first, or declare it in an `else`. I am asking you to be consistent with the rest of the style of the repository and put the default value in an `else` statement.
To avoid two `else` statements, you could restructure the logic below as:

```python
if override_variable is not None and override_variable.is_latitude():
    projected_dimension_name = 'projected_y'
elif override_variable is not None and override_variable.is_longitude():
    projected_dimension_name = 'projected_x'
else:
    projected_dimension_name = ''
```

I still have two other concerns here:

- Returning an empty string seems weird.
- The projected dimension name should be distinct in the case of two grids. Using just `projected_x` and `projected_y` will cause clashes for those SMAP L3 products with both global and polar grids. (This must be resolved before merging this PR.)
- Corrected the names to add the group name (in next commit).
- Simplified the structure to not individually check for lat/lon (in next commit).
- About returning an empty string: technically it should never be called from our code unless we have the coordinates. I could throw an exception - it just seemed like overkill.
row_size = 0
col_size = 0
This is one example, but there are quite a few places in this code where I think default values are being used instead of raising a genuine exception. If none of the conditions below are met, then it would be much better to catch the issue now and raise a user-friendly exception, before later code tries to use these values, finds it can't do what it needs to, and raises some unintelligible exception to the end user.
I think the question to ask with a bunch of these functions is: if they fall back on the default values, can the rest of the code work using those return values? I think in this case (and a few others) the honest answer is no.
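A minimal sketch of the fail-fast pattern being suggested, applied to the `projected_dimension_name` example above; the exception name and message are placeholders, not HOSS code:

```python
class MissingCoordinateVariable(Exception):
    """Placeholder user-facing error for the sketch below."""


def get_projected_dimension_name(override_variable, variable_name: str) -> str:
    # Raise immediately instead of returning '' and letting later code fail
    # with an unintelligible error far from the real cause.
    if override_variable is not None and override_variable.is_latitude():
        return 'projected_y'
    if override_variable is not None and override_variable.is_longitude():
        return 'projected_x'
    raise MissingCoordinateVariable(
        f'No coordinate information found for variable: {variable_name}'
    )
```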
hoss/coordinate_utilities.py
Outdated
lat_col = lat_arr[:, 0] | ||
lon_row = lon_arr[0, :] |
This is making an assumption that latitude and longitude always represent the same axes in an array for all collections. We know that isn't true: for example, the GPM_3IMERGHH collection from GES DISC has things the wrong way around (time, lon, lat).
It would be better to take both arrays and check one row and one column of each to find which array is varying along each dimension.
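A rough sketch of that suggestion: compare the variation along one row against one column of each coordinate array, instead of assuming which axis is which (illustrative only, and ignoring fill values):

```python
import numpy as np


def latitude_varies_along_columns(lat_arr: np.ndarray) -> bool:
    """True if latitude changes down a column (i.e., latitude is the row axis)."""
    column_spread = np.ptp(lat_arr[:, 0])  # variation moving down one column
    row_spread = np.ptp(lat_arr[0, :])     # variation moving along one row
    return column_spread > row_spread


# Example: a (lat, lon) ordered grid, as HOSS usually assumes.
lon_arr, lat_arr = np.meshgrid(np.linspace(-180, 180, 4), np.linspace(-85, 85, 3))
print(latitude_varies_along_columns(lat_arr))    # True

# For a (lon, lat) ordered collection such as GPM_3IMERGHH, the same check
# returns False, signalling that the axes are swapped.
print(latitude_varies_along_columns(lat_arr.T))  # False
```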
I need to check both to get the same index values
Your answer does not explain what will happen for collections where the ordering is reversed. I'm about to write a big old essay on the way to do this that will avoid any assumptions about latitude and longitude ordering (which I genuinely think will lead to problems).
The new method gets a row and a column on both the lat and lon datasets. Will add a test for lat/lon with both types of ordering.
updated - de350b6
def test_get_x_y_index_ranges_from_coordinates(
    self,
    mock_get_x_y_extents,
    mock_get_dimension_index_range,
):
This is a good test!
You definitely need either one more that requests SMAP L3 data for multiple grids, or this test could be updated to do so using a collection that has multiple grids.
# lat_arr = prefetch_dataset[self.latitude][:]
# lon_arr = prefetch_dataset[self.longitude][:]
These commented out lines should be removed.
removed - de350b6
    lat_fill,
    lon_fill,
)
for actual, expected in zip(actual_geo_corners, expected_geo_corners):
A couple of things here:

- Why only `self.assertAlmostEqual`? You are defining the expected values, so you could define them to be exactly what you need.
- (Possibly a personal preference) `zip` makes things a little more complicated. Instead you could do something like:

```python
for index, expected_corner in enumerate(expected_geo_corners):
    self.assertTupleEqual(actual_geo_corners[index], expected_corner)
```

If you stick with `zip`, could you maybe tweak the variables in the loop so they are `actual_corner` and `expected_corner`? Just to make it easier when reading this test back. Thanks!
"""
expected_result1 = np.array([0, 1, 2, 3, 4])
Something of a theme here - there are a bunch of variable names in these tests that are really vague (`expected_result1`, `expected`, `expected_result`, and similar things with `actual`). It would be clearer to the reader if the names were more specific - something like `expected_valid_indices`.
Unit tests are a secondary form of documentation for developers, and can be really informative. It's really helpful (to me at least 😅) if they are as easy to read as possible.
done - de350b6
Thanks for starting to add in the tests! A recommendation for next time would be to couple the writing of a function with the accompanying tests, instead of leaving all the tests to the end. (I tend to write tests for functions just after writing the code for the function itself, because that way it's still fresh in my head, but also when I move on from that function, I feel confident that it works when I call it elsewhere)
if variable.dimensions:
    range_strings = get_range_strings(variable.dimensions, index_ranges)
else:
    override_dimensions = get_override_projected_dimensions(varinfo, variable_name)
@sudha-murthy - this is one of the things we discussed this morning, and I wanted to capture a comment on the PR so we don't lose it.
`get_override_projected_dimensions` only returns a list of fake spatial dimensions for the variable. However, for variables that have more dimensions (e.g., 3-D variables), this list is returning a 2-element list for a `numpy` array with 3 axes. This is causing the failure you reported with requests to OPeNDAP. When making requests to OPeNDAP, each variable must follow one of two formats in the DAP4 constraint expression:

- Not including any index information for the variable, e.g.: `/variable_name`.
- Including index information for all dimensions. If you want the full range, you can just indicate that with `[]`, so it's something like: `/variable_name[][1:2][3:4]`.

Because the returned list of override dimensions only includes spatial information, this function is only constructing part of the index information (assuming the example from the second bullet point above is what you want, this function is actually just getting `/variable_name[1:2][3:4]`).
The answer here is to use a function that retrieves something for all axes on the variable, and in the correct order. (The naming of the non-spatial pseudo-dimensions doesn't matter, because all we're doing is seeing if there is information in the cache of index ranges. If not, the `[]` will be applied for that dimension.)
This is the logic that is done for a variable with real dimensions:

```python
for dimension in variable_dimensions:
    dimension_range = index_ranges.get(dimension)
    if dimension_range is not None and dimension_range[0] <= dimension_range[1]:
        range_strings.append(f'[{dimension_range[0]}:{dimension_range[1]}]')
    else:
        range_strings.append('[]')
```

You can see that if the dimension name is not in the cache of `index_ranges`, then `[]` is used.
The override code uses that same method, but there is no "dimension name" for the third dimension.
Yes - it sounds like we're pretty much agreed on the problem here:

> `get_override_projected_dimensions` only returns a list of fake spatial dimensions for the variable. However, for variables that have more dimensions (e.g., 3-D variables), this list is returning a 2-element list for a `numpy` array with 3 axes.

So the solution is probably to retrieve the list of variable dimensions in a different way (specifically how is still being hotly debated in a number of places).
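A sketch of the kind of helper being discussed: something that returns one dimension name per array axis, in axis order, so the constraint builder above always has a full list. The naming scheme and the assumption that the trailing two axes are spatial are hypothetical:

```python
def get_per_axis_dimension_names(variable_name: str, shape: tuple) -> list:
    """Return one pseudo-dimension name per axis, in axis order.

    Leading (non-spatial) axes get placeholder names that will never be
    found in the index-range cache, so '[]' is used for them.
    """
    leading = [f'{variable_name}_dim{index}' for index in range(len(shape) - 2)]
    return leading + ['/projected_y', '/projected_x']


print(get_per_axis_dimension_names('/soil_moisture', (5, 406, 964)))
# ['/soil_moisture_dim0', '/projected_y', '/projected_x']
```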
hoss/spatial.py (Outdated)
crs = get_variable_crs(non_spatial_variable, varinfo)
projected_x = 'projected_x'
projected_y = 'projected_y'
override_dimension_datasets = update_dimension_variables(
Please can you be consistent with the rest of the code and use "variable" to refer to variables, and reserve "dataset" to refer to the `netCDF4.Dataset` object. Using "datasets" to mean variables will cause confusion in the future. (I acknowledge that "dataset" is also often used to mean "variable" by other software, but it is not so far used that way in the HOSS code, due to the terminology of the `netCDF4` Python package.)
done - 0666248
Thanks for the update. I took a quick skim and there's one place remaining that uses `dataset` instead of "variable": the `get_valid_indices_in_dataset` function. Otherwise, I think the language is now consistent.
`update_dimension_variables` is now called `create_dimension_scales_from_coordinates`, based on David's feedback (this will be in the new commits in the new branch).
`get_valid_indices_in_dataset` does get the valid indices from the `numpy` array passed in. Not sure if we can call it `get_valid_indices_in_variable`...
Okay, this is me trying to capture my thoughts after our meeting yesterday (and mulling it over a bit more since): I think, in general, I am fine with the way in which the projected x and projected y index ranges are calculated. While I don't think the calculation of the corners is strictly necessary, as you just need any two elements (preferably more, but two is enough) in a single row/column to infer the resolution and extents of a grid dimension, once there is a full battery of unit tests up, it is a methodology that should successfully get the index ranges. For the sake of moving this monster PR along, I am fine to proceed with the corners methodology. I think there are three bigger pieces here that I'm holding out for (along with outstanding smaller items about general coding best practices):
Why do we need this now? The current assumption that there is a specific ordering of these variables (…)
(Note - I believe the solution below would account for both N-D science variables and also degenerate dimension sizes.) Likely this only matters in the retrieval of the index ranges in the (…) How can we do this:
Just as a general point, the problem of dimension ordering (once you have the correct index ranges) in no way depends on what is a projected x coordinate and what is a projected y coordinate. It is entirely determined by which of those 2-D arrays varies along a row and which varies up a column. Making an assumption that couples dimension ordering to which is x and which is y will inevitably break things for some future collection. (See GPM_3IMERGHH for an example where latitude and longitude are ordered opposite to how we would expect.)
I have changed the corners method to just getting any two valid points. Version 8 had fill values in the lower-right corner.
You mean latitude varies over row/column versus longitude varies over row/column, right?
I think my latest commit has some more checks to move to the next row or next column. It is not yet perfect, since the same valid points have to be on the latitude and longitude grids.
Yes. Will make sure I add that test case, and verify and correct the uniqueness issues.
I suppose I was still in the mindset of projecting the whole latitude and longitude arrays, which still feels simpler. Given the corners approach, that's fair. That said, this indicates that the corners approach is losing the information we need to determine which projected spatial dimension corresponds to which dimension axis. I think this comment has actually made me realise the corners approach is insufficient to do what we need. We need the 2-D projected coordinate arrays (…)
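For reference, a minimal sketch of the full-array projection approach described above, using `pyproj`; the EASE-Grid 2.0 EPSG code and demo arrays are stand-ins, as the real CRS comes from the grid mapping overrides in `hoss_config.json`:

```python
import numpy as np
from pyproj import CRS, Transformer

# Small demo grid; in HOSS these would be the prefetched 2-D coordinate arrays.
lon_arr, lat_arr = np.meshgrid(np.linspace(-179, 179, 5), np.linspace(-85, 85, 4))

# Project the full 2-D arrays from geographic coordinates (EPSG:4326) to an
# EASE-Grid 2.0 Global CRS (EPSG:6933), as an example target projection.
transformer = Transformer.from_crs(
    CRS.from_epsg(4326), CRS.from_epsg(6933), always_xy=True
)
x_values, y_values = transformer.transform(lon_arr, lat_arr)

# The axis along which projected x actually varies identifies the projected_x
# dimension, with no assumption about latitude/longitude ordering.
x_varies_along_row = np.ptp(x_values[0, :]) > np.ptp(x_values[:, 0])
print('projected_x is the trailing axis:', x_varies_along_row)
```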
Description
SMAP L3 products do not have the 1-D dimension scales and grid mapping variable that HOSS needs to do spatial subsetting.
Methods have been added to override the missing grid mapping with overrides in `hoss_config.json`, along with supporting methods.
Methods have also been added to use the 2-D coordinate datasets to generate 1-D dimension scales that can be used to calculate
the index ranges and provide spatially subsetted outputs.
Jira Issue ID
DAS-2232
Local Test Steps
PR Acceptance Checklist

- `CHANGELOG.md` updated to include a high-level summary of PR changes.
- `docker/service_version.txt` updated if publishing a release.