DAS-2232 -small functions added to support the main solution in the t… #16

Merged
merged 9 commits into from
Nov 1, 2024
2 changes: 1 addition & 1 deletion docs/requirements.txt
@@ -16,5 +16,5 @@
#
harmony-py~=0.4.10
netCDF4~=1.6.4
-notebook~=7.0.4
+notebook~=7.2.2
xarray~=2023.9.0
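The `~=` pins above use the PEP 440 compatible-release operator: `notebook~=7.2.2` accepts any 7.2.x release at or above 7.2.2, but not 7.3.0. A minimal sketch of that rule, assuming purely numeric release strings (no pre-release, post-release or local tags; the `packaging` library implements the full grammar):

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Split a purely numeric version such as '7.2.2' into integers."""
    return tuple(int(part) for part in version.split("."))


def compatible_release(candidate: str, pinned: str) -> bool:
    """Return True if `candidate` satisfies `~=pinned` (numeric versions only).

    `~=X.Y.Z` is equivalent to `>=X.Y.Z, ==X.Y.*`: the candidate must be at
    least the pinned version and share every release segment but the last.
    """
    pinned_parts = parse_release(pinned)
    candidate_parts = parse_release(candidate)
    if len(pinned_parts) < 2:
        raise ValueError("~= needs at least two release segments")
    prefix = pinned_parts[:-1]
    return (
        candidate_parts[: len(prefix)] == prefix
        and candidate_parts >= pinned_parts
    )


print(compatible_release("7.2.5", "7.2.2"))  # True
print(compatible_release("7.3.0", "7.2.2"))  # False
print(compatible_release("7.2.1", "7.2.2"))  # False
```

So the bump from `~=7.0.4` to `~=7.2.2` moves the allowed window from 7.0.x to 7.2.x.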
210 changes: 210 additions & 0 deletions hoss/coordinate_utilities.py
@@ -0,0 +1,210 @@
"""This module contains utility functions used for coordinate variables,
and functions to convert coordinate variable data to projected x/y
dimension values.
"""

import numpy as np
from netCDF4 import Dataset

# from numpy import ndarray
from varinfo import VariableFromDmr, VarInfoFromDmr

from hoss.exceptions import (
    IncompatibleCoordinateVariables,
    InvalidCoordinateDataset,
    InvalidCoordinateVariable,
    MissingCoordinateVariable,
    MissingVariable,
)


def get_projected_dimension_names(varinfo: VarInfoFromDmr, variable_name: str) -> list[str]:
    """Return the projected y and x dimension names that match the group
    of the input variable. The names 'projected_y' and 'projected_x' are
    returned, in that order, prefixed with the variable's group path.

    """
    variable = varinfo.get_variable(variable_name)

    if variable is not None:
        projected_dimension_names = [
            f'{variable.group_path}/projected_y',
            f'{variable.group_path}/projected_x',
        ]
    else:
        raise MissingVariable(variable_name)

    return projected_dimension_names
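The naming rule can be shown without `varinfo`. In this sketch, `StubVariable` is a hypothetical stand-in exposing only the `group_path` attribute the function reads, `LookupError` substitutes for HOSS's `MissingVariable`, and the group name is made up:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubVariable:
    """Hypothetical stand-in for varinfo's VariableFromDmr (group_path only)."""

    group_path: str


def projected_dimension_names(
    variable: Optional[StubVariable], name: str
) -> list[str]:
    """Return [y, x] projected dimension names in the variable's group."""
    if variable is None:
        # HOSS raises MissingVariable here; LookupError keeps this standalone.
        raise LookupError(name)
    return [
        f"{variable.group_path}/projected_y",
        f"{variable.group_path}/projected_x",
    ]


print(projected_dimension_names(StubVariable("/Soil_Moisture"), "/Soil_Moisture/sm"))
# ['/Soil_Moisture/projected_y', '/Soil_Moisture/projected_x']
```

Because the names are derived only from the group path, every variable in the same group maps to the same pair of projected dimensions.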


def get_projected_dimension_names_from_coordinate_variables(
    varinfo: VarInfoFromDmr,
    variable_name: str,
) -> list[str]:
    """
    Return the projected dimension names derived from the variable's
    latitude and longitude coordinate variables.
    """
    latitude_coordinates, longitude_coordinates = get_coordinate_variables(
        varinfo, [variable_name]
    )

    if len(latitude_coordinates) == 1 and len(longitude_coordinates) == 1:
        projected_dimension_names = get_projected_dimension_names(
            varinfo, latitude_coordinates[0]
        )

    # If the override is the variable itself (a latitude or longitude)
    elif (
        varinfo.get_variable(variable_name).is_latitude()
        or varinfo.get_variable(variable_name).is_longitude()
    ):
        projected_dimension_names = get_projected_dimension_names(
            varinfo, variable_name
        )
    else:
        projected_dimension_names = []
    return projected_dimension_names


def get_variables_with_anonymous_dims(
    varinfo: VarInfoFromDmr, variables: set[str]
) -> set[str]:
    """
    Return the subset of variables that have no dimensions
    associated with them.
    """

    return set(
        variable
        for variable in variables
        if len(varinfo.get_variable(variable).dimensions) == 0
Contributor

Here is where we need the additional check:

if (len(varinfo.get_variable(variable).dimensions) == 0
    or all([ for dimension in varinfo.get_variable(variable.dimensions) : 
                  varinfo.get_variable(dimension) not None and not [] )

(excuse the pidgin Python)

Collaborator Author

Maybe I will include it as a comment till we make the configuration change?

Member

I strongly prefer what we have now. Also, the function contents exactly match the function name.

Also, the snippet supplied above is not correct Python code, so it's hard to know for sure what you are trying to achieve. Trying to decompose that snippet:

if (len(varinfo.get_variable(variable).dimensions) == 0
    or all([ for dimension in varinfo.get_variable(variable.dimensions) : varinfo.get_variable(dimension) not None and not [] )
  • The first bit still makes sense - if the variable in question doesn't have dimensions.
  • Then, I think you are trying to check that the VarInfoFromDmr instance does not have any of the listed dimensions as variables.
  • The "and not []" part is a no-op. It will always evaluate to True, because it is being evaluated in isolation, and you are asking whether an empty list is "falsy", which it is.

While I don't like this approach, I think what you are trying to suggest would be more like:

return set(
    variable
    for variable in variables
    if (
        len(varinfo.get_variable(variable).dimensions) == 0
        or all(
            varinfo.get_variable(dimension) is None
            for dimension in varinfo.get_variable(variable).dimensions
        )
    )
)

If this were to be augmented in such a way, I would recommend breaking this check out into its own function, because the set comprehension will become very hard to read.

@D-Auty D-Auty Oct 28, 2024

I can see splitting out the function to clarify the code and document the comprehension.
I'm less clear on making the upstream code use this function without the additional check, and then adding a call to the new function at every usage. That additional check is now essential, because when OPeNDAP creates "empty" dimensions this check by itself is not sufficient. And of course, OPeNDAP's "empty" dimensions are pretty much always going to be present.

Suggested change
- if len(varinfo.get_variable(variable).dimensions) == 0
+ if (
+     len(varinfo.get_variable(variable).dimensions) == 0
+     or any_absent_dimension_variables(varinfo, variable)
+ )
...
def any_absent_dimension_variables(varinfo: VarInfoFromDmr, variable: str) -> bool:
    """Return True if any dimension of the variable has no corresponding variable."""
    return any(
        varinfo.get_variable(dimension) is None
        for dimension in varinfo.get_variable(variable).dimensions
    )

Collaborator Author

function updated and unit tests added - d972777

)
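The combined check discussed in the thread above can be sketched without `varinfo`. Here `dimensions_by_variable` is a hypothetical dict standing in for `VarInfoFromDmr`: it maps each variable in the file to its dimension names, so a dimension counts as "absent" when it has no entry of its own (the situation the "empty" OPeNDAP dimensions create). It follows the `any(...)` form of the suggested change; the variable and dimension names are made up:

```python
def has_absent_dimension_variables(
    dimensions_by_variable: dict[str, list[str]], variable: str
) -> bool:
    """True if any dimension of `variable` is not itself a variable in the file."""
    return any(
        dimension not in dimensions_by_variable
        for dimension in dimensions_by_variable[variable]
    )


def variables_with_anonymous_dims(
    dimensions_by_variable: dict[str, list[str]], variables: set[str]
) -> set[str]:
    """Variables with no dimensions, or with dimensions lacking a variable."""
    return {
        variable
        for variable in variables
        if len(dimensions_by_variable[variable]) == 0
        or has_absent_dimension_variables(dimensions_by_variable, variable)
    }


# Hypothetical file: '/lat' and '/lon' are real dimension variables, while
# '/phony_dim_0' and '/phony_dim_1' are dimension names with no variable.
file_contents = {
    "/lat": ["/lat"],
    "/lon": ["/lon"],
    "/gridded": ["/lat", "/lon"],
    "/swath": ["/phony_dim_0", "/phony_dim_1"],
    "/scalar": [],
}
print(sorted(variables_with_anonymous_dims(file_contents, {"/gridded", "/swath", "/scalar"})))
# ['/scalar', '/swath']
```

Keeping the second condition in its own named function leaves the set comprehension readable, which was the main concern raised in the review.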


def get_coordinate_variables(
    varinfo: VarInfoFromDmr,
    requested_variables: list,
) -> tuple[list, list]:
    """Return the latitude and longitude variables listed in the CF-Convention
    'coordinates' metadata attribute of the requested variables, as a tuple
    of two lists in (latitude, longitude) order.
    """

    coordinate_variables_list = varinfo.get_references_for_attribute(
        requested_variables, 'coordinates'
    )
    latitude_coordinate_variables = [
        coordinate
        for coordinate in coordinate_variables_list
        if varinfo.get_variable(coordinate).is_latitude()
    ]

    longitude_coordinate_variables = [
        coordinate
        for coordinate in coordinate_variables_list
        if varinfo.get_variable(coordinate).is_longitude()
    ]

    return latitude_coordinate_variables, longitude_coordinate_variables
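The CF `coordinates` attribute is a blank-separated list of variable names. This sketch partitions such a list into latitude and longitude names, assuming each variable is classified by its CF `units` attribute; `units_by_variable` is a hypothetical dict, varinfo's own `is_latitude()`/`is_longitude()` may apply different or additional criteria, and the sketch skips the relative-path resolution that `get_references_for_attribute` performs:

```python
# Latitude and longitude unit strings accepted by the CF Conventions.
LATITUDE_UNITS = {"degrees_north", "degree_north", "degree_N", "degrees_N", "degreeN", "degreesN"}
LONGITUDE_UNITS = {"degrees_east", "degree_east", "degree_E", "degrees_E", "degreeE", "degreesE"}


def partition_coordinates(
    coordinates_attribute: str, units_by_variable: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Split a CF `coordinates` attribute into (latitudes, longitudes)."""
    names = coordinates_attribute.split()
    latitudes = [n for n in names if units_by_variable.get(n) in LATITUDE_UNITS]
    longitudes = [n for n in names if units_by_variable.get(n) in LONGITUDE_UNITS]
    return latitudes, longitudes


units = {
    "/lat": "degrees_north",
    "/lon": "degrees_east",
    "/time": "seconds since 2000-01-01",
}
print(partition_coordinates("/time /lat /lon", units))
# (['/lat'], ['/lon'])
```

Non-spatial coordinates such as time simply fall into neither list, which is why the caller checks for exactly one latitude and one longitude.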


def get_row_col_sizes_from_coordinate_datasets(
    lat_arr: np.ndarray,
    lon_arr: np.ndarray,
) -> tuple[int, int]:
    """
    This function returns the row and column sizes of the coordinate datasets.

    """
    # ToDo - if the coordinates are 3D
    if lat_arr.ndim > 1 and lon_arr.shape == lat_arr.shape:
        col_size = lat_arr.shape[1]
        row_size = lat_arr.shape[0]
    elif (
        lat_arr.ndim == 1
        and lon_arr.ndim == 1
        and lat_arr.size > 0
        and lon_arr.size > 0
    ):
        # Todo: The ordering needs to be checked
        col_size = lon_arr.size
        row_size = lat_arr.size
    else:
        raise IncompatibleCoordinateVariables(lon_arr.shape, lat_arr.shape)
    return row_size, col_size
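A standalone sketch mirroring the branches above, with `ValueError` substituting for `IncompatibleCoordinateVariables`. It shows that 2-D and 1-D coordinate arrays describing the same made-up 4 x 3 grid yield the same (rows, columns):

```python
import numpy as np


def row_col_sizes(lat_arr: np.ndarray, lon_arr: np.ndarray) -> tuple[int, int]:
    """Return (row_size, col_size) for 2-D or 1-D latitude/longitude arrays."""
    if lat_arr.ndim > 1 and lon_arr.shape == lat_arr.shape:
        # 2-D coordinates: rows run along axis 0, columns along axis 1.
        return lat_arr.shape[0], lat_arr.shape[1]
    if lat_arr.ndim == 1 and lon_arr.ndim == 1 and lat_arr.size > 0 and lon_arr.size > 0:
        # 1-D coordinates: latitude gives the rows, longitude the columns.
        return lat_arr.size, lon_arr.size
    # HOSS raises IncompatibleCoordinateVariables here.
    raise ValueError(f"incompatible shapes: lon {lon_arr.shape}, lat {lat_arr.shape}")


lat_1d = np.linspace(-60.0, 60.0, 4)
lon_1d = np.linspace(0.0, 90.0, 3)
lat_2d, lon_2d = np.meshgrid(lat_1d, lon_1d, indexing="ij")

print(row_col_sizes(lat_2d, lon_2d))  # (4, 3)
print(row_col_sizes(lat_1d, lon_1d))  # (4, 3)
```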


def get_coordinate_array(
    prefetch_dataset: Dataset,
    coordinate_name: str,
) -> np.ndarray:
    """This function returns the `numpy` array from a
    coordinate dataset.

    """
    try:
        coordinate_array = prefetch_dataset[coordinate_name][:]
    except IndexError as exception:
        raise MissingCoordinateVariable(coordinate_name) from exception

    return coordinate_array


def get_1D_dim_array_data_from_dimvalues(
    dim_values: np.ndarray, dim_indices: np.ndarray, dim_size: int
) -> np.ndarray:
    """
    Return a full dimension data array based on two projected points
    (values at known indices) and the grid size.
    """

    if (dim_indices[1] != dim_indices[0]) and (dim_values[1] != dim_values[0]):
        dim_resolution = (dim_values[1] - dim_values[0]) / (
            dim_indices[1] - dim_indices[0]
        )
    else:
        raise InvalidCoordinateDataset(dim_values[0], dim_indices[0])

    dim_min = dim_values[0] - (dim_resolution * dim_indices[0])
    dim_max = dim_values[1] + (dim_resolution * (dim_size - 1 - dim_indices[1]))
    return np.linspace(dim_min, dim_max, dim_size)
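A worked example helps check the arithmetic. With value v0 at index i0 and v1 at index i1, the resolution is (v1 - v0) / (i1 - i0); the first element is v0 - resolution * i0 and the last is v1 + resolution * (size - 1 - i1). This sketch repeats that logic standalone (with `ValueError` in place of `InvalidCoordinateDataset`) on made-up values:

```python
import numpy as np


def dim_array_from_two_points(
    dim_values: np.ndarray, dim_indices: np.ndarray, dim_size: int
) -> np.ndarray:
    """Rebuild an evenly spaced 1-D dimension from two (index, value) samples."""
    if dim_indices[1] == dim_indices[0] or dim_values[1] == dim_values[0]:
        # HOSS raises InvalidCoordinateDataset here.
        raise ValueError("need two distinct indices and two distinct values")
    resolution = (dim_values[1] - dim_values[0]) / (dim_indices[1] - dim_indices[0])
    dim_min = dim_values[0] - resolution * dim_indices[0]
    dim_max = dim_values[1] + resolution * (dim_size - 1 - dim_indices[1])
    return np.linspace(dim_min, dim_max, dim_size)


# 100.0 and 300.0 at indices 2 and 6 of a 10-element row:
# resolution = 200 / 4 = 50, first = 100 - 2 * 50 = 0, last = 300 + 3 * 50 = 450.
ascending = dim_array_from_two_points(np.array([100.0, 300.0]), np.array([2, 6]), 10)

# A negative resolution handles descending axes, e.g. projected y values.
descending = dim_array_from_two_points(np.array([9000.0, 8000.0]), np.array([1, 3]), 5)
print(descending)  # [9500. 9000. 8500. 8000. 7500.]
```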


def get_valid_indices(
    coordinate_row_col: np.ndarray, coordinate: VariableFromDmr
) -> np.ndarray:
    """
    Return the indices of valid elements in a coordinate row or column:
    elements that are not the fill value (when a _FillValue attribute is
    provided) and that fall within the valid latitude or longitude range.
    """

    coordinate_fill = coordinate.get_attribute_value('_FillValue')
    if coordinate_fill is not None:
Member

@owenlittlejohns owenlittlejohns Oct 22, 2024

I think I asked this on the other PR: should this check be mutually exclusive with the other checks in the longitude or latitude blocks, or should it be done in addition to those checks? Right now, if you have a fill value, you are only checking where the coordinate is not fill, and not applying your other checks. (I tend to prefer this first check, but wanted to confirm what the logic was intended to be.)

Collaborator Author

@sudha-murthy sudha-murthy Oct 23, 2024

Right. If we have a fill value, we use that to check the validity of the data, and if it is not available we check against the geographic extent range. I guess we could also check the range even when a fill value is provided, in case the coordinate data itself is bad.

If we check the lat/lon valid range first, the fill check does become redundant, since the fill value would definitely be outside that range.

I guess @autydp can weigh in.

I would check the coordinates regardless, and let the fill value fall outside that range. It simplifies the code, and those checks need to happen anyway.

Collaborator Author

updated - 51b110c

Member

Thanks for reworking this so that the fill value check and the latitude/longitude range checks can both happen.

I think the coordinate.is_latitude() and coordinate.is_longitude() checks could benefit from some numpy vectorization, rather than looping through each element individually. You could use an element-wise AND, which can be written either as & or np.logical_and. You could do something like:

if coordinate_fill is not None:
    is_not_fill = ~np.isclose(coordinate_row_col, float(coordinate_fill))
else:
    # Creates an entire array of `True` values.
    is_not_fill = np.ones_like(coordinate_row_col, dtype=bool)

if coordinate.is_longitude():
    valid_indices = np.where(
        np.logical_and(
            is_not_fill,
            np.logical_and(
                coordinate_row_col >= -180.0,
                coordinate_row_col <= 360.0
            )
        )
    )[0]
elif coordinate.is_latitude():
    valid_indices = np.where(
        np.logical_and(
            is_not_fill,
            np.logical_and(
                coordinate_row_col >= -90.0,
                coordinate_row_col <= 90.0
            )
        )
    )[0]
else:
    valid_indices = np.empty((0, 0))

Note, in the snippet above, I've also changed the first check from if coordinate_fill to if coordinate_fill is not None. That's pretty important, as zero could be a valid fill value, but if coordinate_fill = 0, then this check will evaluate to False.

Ultimately, I think the conditions you have now are equivalent to this, just maybe not as efficient. So the only thing I'd definitely like to see changed is that first if coordinate_fill condition, so that a fill value of 0 is not treated as if there were no fill value at all.

        valid_indices = np.where(
            ~np.isclose(coordinate_row_col, float(coordinate_fill))
        )[0]
    else:
        # No fill value: every index is a candidate, including 0.0 coordinates.
        valid_indices = np.arange(coordinate_row_col.size)

    if coordinate.is_longitude():
        filtered_valid_indices = np.array(
            [
                index
                for index in valid_indices
                if coordinate_row_col[index] >= -180.0
                and coordinate_row_col[index] <= 360.0
            ]
        )
    elif coordinate.is_latitude():
        filtered_valid_indices = np.array(
            [
                index
                for index in valid_indices
                if coordinate_row_col[index] >= -90.0
                and coordinate_row_col[index] <= 90.0
            ]
        )
    else:
        filtered_valid_indices = np.empty((0, 0))
    return filtered_valid_indices
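The vectorized approach suggested in the review can be sketched as a standalone function. The `is_longitude`/`is_latitude` flags stand in for the `VariableFromDmr` methods, and `fill_value=None` means no `_FillValue` attribute. Two details matter: testing `fill_value is not None` keeps a fill value of 0 in play, and starting from an all-`True` mask keeps genuine 0.0 coordinates, whereas `np.where(array)` alone returns only the indices of nonzero elements. The sample values are made up:

```python
from typing import Optional

import numpy as np


def valid_coordinate_indices(
    coordinate_row_col: np.ndarray,
    fill_value: Optional[float],
    is_longitude: bool,
    is_latitude: bool,
) -> np.ndarray:
    """Indices that are neither fill nor outside the valid lat/lon range."""
    if fill_value is not None:
        is_not_fill = ~np.isclose(coordinate_row_col, float(fill_value))
    else:
        # No fill value: every element starts out as a candidate.
        is_not_fill = np.ones_like(coordinate_row_col, dtype=bool)

    if is_longitude:
        in_range = (coordinate_row_col >= -180.0) & (coordinate_row_col <= 360.0)
    elif is_latitude:
        in_range = (coordinate_row_col >= -90.0) & (coordinate_row_col <= 90.0)
    else:
        return np.empty((0, 0))
    # Element-wise AND of both masks, then the positions that remain True.
    return np.where(is_not_fill & in_range)[0]


latitudes = np.array([-9999.0, 0.0, 45.0, 95.0])
print(valid_coordinate_indices(latitudes, -9999.0, False, True))  # [1 2]
print(valid_coordinate_indices(np.array([0.0, 10.0, 0.0]), 0.0, False, True))  # [1]
```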
75 changes: 74 additions & 1 deletion hoss/exceptions.py
@@ -57,7 +57,7 @@ class InvalidRequestedRange(CustomError):
     def __init__(self):
         super().__init__(
             'InvalidRequestedRange',
-            'Input request specified range outside supported ' 'dimension range',
+            'Input request specified range outside supported dimension range',
         )


@@ -108,6 +108,79 @@ def __init__(self):
)


class MissingVariable(CustomError):
    """This exception is raised when HOSS tries to get variables and
    they are missing or empty.

    """

    def __init__(self, referring_variable):
        super().__init__(
            'MissingVariable',
            f'"{referring_variable}" is not present in source granule file.',
        )


class MissingCoordinateVariable(CustomError):
    """This exception is raised when HOSS tries to get latitude and longitude
    variables and they are missing or empty. These variables are referred to
    in the science variables with coordinate attributes.

    """

    def __init__(self, referring_variable):
        super().__init__(
            'MissingCoordinateVariable',
            f'Coordinate: "{referring_variable}" is '
            'not present in source granule file.',
        )


class InvalidCoordinateVariable(CustomError):
    """This exception is raised when HOSS tries to get latitude and longitude
    variables and they contain so many fill values that they cannot be used.
    These variables are referred to in the science variables with coordinate
    attributes.

    """

    def __init__(self, referring_variable):
        super().__init__(
            'InvalidCoordinateVariable',
            f'Coordinate: "{referring_variable}" is '
            'not valid in source granule file.',
        )


class IncompatibleCoordinateVariables(CustomError):
    """This exception is raised when HOSS tries to get latitude and longitude
    coordinate variables and they do not match in shape or have a size of 0.

    """

    def __init__(self, longitude_shape, latitude_shape):
        super().__init__(
            'IncompatibleCoordinateVariables',
            f'Longitude coordinate shape: "{longitude_shape}" '
            f'does not match the latitude coordinate shape: "{latitude_shape}"',
        )
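Messages like the ones above are built from adjacent string literals, which Python joins with no separator, so any space between two pieces has to be written explicitly inside one of the literals. A small standalone illustration with made-up shapes:

```python
longitude_shape, latitude_shape = (3, 4), (4, 3)

# No trailing space in the first f-string: the closing quote runs into "does".
unspaced = (
    f'Longitude coordinate shape: "{longitude_shape}"'
    f'does not match the latitude coordinate shape: "{latitude_shape}"'
)

# Trailing space inside the first f-string: the words are separated.
spaced = (
    f'Longitude coordinate shape: "{longitude_shape}" '
    f'does not match the latitude coordinate shape: "{latitude_shape}"'
)

print('"(3, 4)"does' in unspaced)  # True
print('"(3, 4)" does' in spaced)  # True
```

The same rule explains why `'... supported ' 'dimension range'` needed its trailing space, and why merging such pairs into one literal is easier to read.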


class InvalidCoordinateDataset(CustomError):
    """This exception is raised when the two values, or the two indices,
    passed to the function computing the resolution are equal. This could
    occur when there are too many fill values and distinct valid indices
    could not be obtained.

    """

    def __init__(self, dim_value, dim_index):
        super().__init__(
            'InvalidCoordinateDataset',
            'Cannot compute the dimension resolution for '
            f'dim_value: "{dim_value}" dim_index: "{dim_index}"',
        )


class UnsupportedShapeFileFormat(CustomError):
    """This exception is raised when the shape file included in the input
    Harmony message is not GeoJSON.