Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAS-2236 - Updates to methods that calculate the points used to create the dimensions #18

Open
wants to merge 16 commits into
base: DAS-2208-SMAP-L3
Choose a base branch
from

Conversation

sudha-murthy
Copy link
Collaborator

Description

  • functions added to be able to do end toe nd test
  • updated functions that calculated the two sets lat/lon points to calculate the dimension arrays
  • A couple of unit tests need to be added
  • This does not support any test for ordering, multiple grids or 3D support

Jira Issue ID

DAS-2236

Local Test Steps

  • Make sure the unit tests pass
  • if you can test with harmony in a box with mask fill disabled - you can test the following url

http://localhost:3000/C1268452378-EEDTEST/ogc-api-coverages/1.0.0/collections/parameter_vars/coverage/rangeset?forceAsync=true&granuleId=G1268452388-EEDTEST&subset=lat(70%3A80)&subset=lon(-160%3A-150)&variable=Soil_Moisture_Retrieval_Data_AM%2Fstatic_water_body_fraction&skipPreview=true
and verify you can check the outputs in panoply

PR Acceptance Checklist

  • [X ] Jira ticket acceptance criteria met.
  • CHANGELOG.md updated to include high level summary of PR changes.
  • docker/service_version.txt updated if publishing a release.
  • [X ] Tests added/updated and passing.
  • Documentation updated (if needed).

hoss/coordinate_utilities.py Show resolved Hide resolved
hoss/coordinate_utilities.py Show resolved Hide resolved
hoss/coordinate_utilities.py Show resolved Hide resolved
hoss/coordinate_utilities.py Show resolved Hide resolved
hoss/coordinate_utilities.py Outdated Show resolved Hide resolved
hoss/coordinate_utilities.py Outdated Show resolved Hide resolved
hoss/coordinate_utilities.py Show resolved Hide resolved
hoss/coordinate_utilities.py Outdated Show resolved Hide resolved
hoss/coordinate_utilities.py Outdated Show resolved Hide resolved
hoss/coordinate_utilities.py Outdated Show resolved Hide resolved
hoss/dimension_utilities.py Outdated Show resolved Hide resolved
hoss/dimension_utilities.py Show resolved Hide resolved
hoss/dimension_utilities.py Show resolved Hide resolved
Copy link
Member

@owenlittlejohns owenlittlejohns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a look, and I appreciate this PR being another more bite-sized piece of progress towards the end goal.

I think there are few themes to my comments:

  • You haven't added sufficient unit tests to provide thorough coverage of your changes. In most cases, you've got some tests for new functions, but not all. The tests you've added are generally good, but often are only testing the happy path. It's pretty important to also test realistic edge-cases, which isn't really done in this PR.
  • There are quite a lot of places where you are performing very similar tasks, but in some cases along a row, rather than up-and-down a column. Some of those tasks are basically the same thing (performing some operation on a 1-D slice of an array), so it would be cleaner to write a function to do something to a 1-D slice and then invoke it with the correct 1-D slice for either a row or a column.
  • I think generally, there are a few places where the documentation strings for each function could do with a bit more descriptive text. This will be the number one location that future developers try to read to understand what the code is meant to do.
  • General coding comments.

hoss/coordinate_utilities.py Outdated Show resolved Hide resolved
hoss/coordinate_utilities.py Outdated Show resolved Hide resolved
hoss/coordinate_utilities.py Outdated Show resolved Hide resolved
for coordinate in coordinate_variables_list
if varinfo.get_variable(coordinate).is_latitude()
for coordinate in coordinate_variables
if varinfo.get_variable(coordinate) is not None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Same comment for the similar change in the list comprehension for longitude_coordinate_variables)

Where do you catch any issues if there is a reference to a variable in a coordinates metadata attribute that isn't present in the file? If it's pointing to a latitude or longitude variable that isn't present in the file, at some point that's going to cause a failure when you try to access that variable. Is that gracefully handled somewhere?

(It's likely this is already nicely supported, but it's been a while since the previous PR and I just can't remember)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate comment - you have not added unit tests to prove this case is being handled as expected. (Needed both for latitude or longitude)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • this method is first called before prefetch. it should not fail if the variables don't exist.
  • if it is present in the variable and not in the file - it will fail during prefetch. That is when it is first accessed from the file. - will check if I have a test on that
  • There is a unit test for the function - will add subtests if the variables don't exist

Copy link
Collaborator Author

@sudha-murthy sudha-murthy Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it is present in the variable and not in the file - it will fail during prefetch. That is when it is first accessed from the file. - will check if I have a test on that

That makes sense. To test that, you'd have to mock a failed response from OPeNDAP. This might be a good place to look for an example of mocking a failed request.

There are unit tests to check for the above cases in line 205 of test_coordinate_utilities.py

That test is great, but it's not specifically for this function, it's for get_coordinate_array. The condition is present here in get_coordinate_variables, and so you need to write unit tests that isolate behaviour within this function (in addition to whatever already exists for other functions).

It's entirely plausible that get_coordinate_array could be updated in some way that means those unit tests you have a link to are removed. You would then lose the coverage of those test cases for get_coordinate_variables.

hoss/coordinate_utilities.py Outdated Show resolved Hide resolved
hoss/coordinate_utilities.py Outdated Show resolved Hide resolved
hoss/coordinate_utilities.py Outdated Show resolved Hide resolved
tests/unit/test_coordinate_utilities.py Outdated Show resolved Hide resolved
tests/unit/test_coordinate_utilities.py Show resolved Hide resolved
tests/unit/test_dimension_utilities.py Show resolved Hide resolved
@D-Auty
Copy link

D-Auty commented Nov 22, 2024

I'll check in after Owen has had a chance to review. I've reviewed my suggestions and updates made are ok.

Copy link
Member

@owenlittlejohns owenlittlejohns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for taking a bit to re-review. I've replied to some of the comment threads. I'll also check now to see if other comments got lost from the diff where they may have been on code that was removed or updated.

for coordinate in coordinate_variables_list
if varinfo.get_variable(coordinate).is_latitude()
for coordinate in coordinate_variables
if varinfo.get_variable(coordinate) is not None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it is present in the variable and not in the file - it will fail during prefetch. That is when it is first accessed from the file. - will check if I have a test on that

That makes sense. To test that, you'd have to mock a failed response from OPeNDAP. This might be a good place to look for an example of mocking a failed request.

There are unit tests to check for the above cases in line 205 of test_coordinate_utilities.py

That test is great, but it's not specifically for this function, it's for get_coordinate_array. The condition is present here in get_coordinate_variables, and so you need to write unit tests that isolate behaviour within this function (in addition to whatever already exists for other functions).

It's entirely plausible that get_coordinate_array could be updated in some way that means those unit tests you have a link to are removed. You would then lose the coverage of those test cases for get_coordinate_variables.

hoss/coordinate_utilities.py Show resolved Hide resolved
return max_y_spread_pts, max_x_spread_pts


def get_max_x_spread_pts(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood - it is called twice: once to get the row and once to get the columns. That means calling the function get_max_x_spread_pts is misleading as it implies you only call it for x. I think my original request to rename the function still is applicable.

hoss/coordinate_utilities.py Show resolved Hide resolved

required_dimensions.update(bounds)
if latitude_coordinates and longitude_coordinates:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I'm following this comment. Are you saying you are trying to guard against the possibility that you've only found either latitudes or longitudes, but not both?

Honest question: what should happen in that case?

prefetch_path = 'tests/data/SC_SPL3SMP_008_prefetch.nc4'
mock_get_opendap_nc4.return_value = prefetch_path
access_token = 'access'
output_dir = 'tests/output'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't you just use the self.temp_dir defined in the setUp method for the class?

prefetch_path, required_dimensions, self.varinfo, self.logger
)

@patch('hoss.dimension_utilities.check_add_artificial_bounds')
@patch('hoss.dimension_utilities.get_opendap_nc4')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice patching.

mock_check_add_artificial_bounds,
mock_get_coordinate_variables,
):
"""Ensure that when a list of required variables is specified, a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence is missing its end.

prefetch_path = 'tests/data/SC_SPL3SMP_008_prefetch.nc4'
mock_get_opendap_nc4.return_value = prefetch_path
access_token = 'access'
output_dir = 'tests/output'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment about using self.temp_dir.

requested_variables,
)
mock_get_opendap_nc4.assert_called_once_with(
url, set(), output_dir, self.logger, access_token, self.config
Copy link
Member

@owenlittlejohns owenlittlejohns Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Just so I am reading this properly...

If there are no normal dimensions or coordinates that are found for required variables, we make a request to OPeNDAP that has nothing specified in the DAP4 constraint expression? If so, that's not great - because that implies there's nothing for us to prefetch, and that we shouldn't be trying to do one. Making a request to OPeNDAP without a constraint expression will mean OPeNDAP returns the entire granule.

Just to help trace the code through:

(This is another reason why unit tests are great - they uncover behaviour like this!)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prefetch should only be called if a request is made that requires HOSS to determine index ranges (for spatial, temporal or dimension ranges). But this test is saying: sure you want us to do index-based subsetting, but we can't find anything that will allow us to calculate those index ranges.

To me, that feels like it should be an error.

)
self.assertListEqual(actual_x_y_points, expected_x_y_points)

def test_get_dimension_array_from_geo_points(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First up - thanks for adding unit tests for this new function!

I think the coverage of the two branches of conditional logic (x versus y) is good. I wonder, though, is it possible for the latitude array, longitude array and/or row indices have either only 1 value or 0 values? If so we probably need tests for those cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants