Support for one-hot encoded features in minimization #87

Merged
merged 11 commits into from
Dec 24, 2023

abigailgold

Columns in the input data that together represent a one-hot encoded feature can now be minimized as a group, so that the encoding remains valid. This holds both for the output of the transform method and for the representative values within the generalizations structure.
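
To make the invariant concrete, here is a minimal sketch (not taken from the PR) of what a correct encoding means: within each group of one-hot columns, every row holds exactly one 1 and otherwise 0s. The helper name `is_valid_one_hot` and the column names are hypothetical; only the idea of grouping columns into slices, as in the `feature_slices` argument visible in the diff, comes from the PR.

```python
def is_valid_one_hot(rows, feature_names, feature_slices):
    """Return True if every row has exactly one 1 (and otherwise 0s) in each slice."""
    index = {name: i for i, name in enumerate(feature_names)}
    for encoded in feature_slices:
        cols = [index[name] for name in encoded]
        for row in rows:
            values = [row[c] for c in cols]
            if any(v not in (0, 1) for v in values) or sum(values) != 1:
                return False
    return True


# Hypothetical column names, for illustration only.
features = ["age", "color_red", "color_blue", "color_green"]
slices = [["color_red", "color_blue", "color_green"]]

consistent = [[30, 1, 0, 0], [45, 0, 0, 1]]
# Generalizing each column on its own could produce a row like this one,
# where two columns of the same encoding are set at once.
broken = [[30, 1, 1, 0]]
```

Minimizing the columns of a slice together is what keeps rows in the `consistent` shape rather than the `broken` one.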

Signed-off-by: abigailt <[email protected]>
Signed-off-by: abigailt <[email protected]>
…ons for 1-hot encoded features are consistent.

Signed-off-by: abigailt <[email protected]>
self.categorical_features = []
if categorical_features:
    self.categorical_features = categorical_features
self.features_to_minimize = features_to_minimize
self.feature_slices = feature_slices
if self.feature_slices:
    self.all_one_hot_features = set([str(feature) for encoded in self.feature_slices for feature in encoded])

You can use a set comprehension directly instead of building a list comprehension and converting it to a set.
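
A small sketch of the suggestion, with hypothetical slice contents; both forms produce the same set, but the second skips the intermediate list:

```python
# Hypothetical one-hot groups, for illustration only.
feature_slices = [["color_red", "color_blue"], ["size_s", "size_m", "size_l"]]

# Pattern from the diff: build a list, then convert it to a set.
via_list = set([str(feature) for encoded in feature_slices for feature in encoded])

# Suggested form: a set comprehension builds the set directly.
via_set = {str(feature) for encoded in feature_slices for feature in encoded}
```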

@@ -375,6 +396,8 @@ def fit(self, X: Optional[DATA_PANDAS_NUMPY_TYPE] = None, y: Optional[DATA_PANDA
    x_test_dataset = ArrayDataset(x_test, features_names=self._features)
    self._ncp_scores.fit_score = self.calculate_ncp(x_test_dataset)
    self._ncp_scores.generalizations_score = self.calculate_ncp(x_test_dataset)
else:
    print('No fitting was performed as some information was missing')

Can the message be a bit more helpful?
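
One possible shape for a more helpful message is to name what was missing. This is only a sketch: the helper `build_skip_message` is hypothetical, and which inputs `fit` actually requires is an assumption, not something shown in the diff.

```python
def build_skip_message(**required):
    """Build a message naming every required input whose value is None."""
    missing = sorted(name for name, value in required.items() if value is None)
    return "No fitting was performed; missing: " + ", ".join(missing)


# Assumed inputs, for illustration only.
message = build_skip_message(X=None, y=[0, 1], estimator=None)
```

The resulting string could be passed to `print` as in the diff, or to a logger or `warnings.warn`.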

elif range['end'] is None and range['start'] > 0:
    feature_value = 1
elif range['start'] is not None and range['end'] is not None:
    print(range)

What's the feature_value in this case? Seems like an unassigned feature_value will be appended in the next line.
Is the print the only thing that happens here? And even if so, shouldn't some text explain the meaning of this print?

    feature_value = 1
elif range['start'] is not None and range['end'] is not None:
    print(range)
new_cell['categories'][feature].append(feature_value)

And what's the feature_value if none of the ifs match?
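
One way to address both questions is to make every path either return a value or fail loudly. The sketch below is an assumption-laden illustration, not the PR's code: the function name `one_hot_value_from_range` is hypothetical, the branch returning 0 is a guess at the case not shown in the hunk, and the parameter is called `value_range` to avoid shadowing the built-in `range`.

```python
def one_hot_value_from_range(value_range):
    """Map a range over a 0/1 column to a single value (hypothetical semantics)."""
    start, end = value_range["start"], value_range["end"]
    if start is None and end is not None and end < 1:
        # Assumed case: only an upper bound below 1, so the column must be 0.
        return 0
    if end is None and start is not None and start > 0:
        # Case visible in the diff: only a lower bound above 0, so the column is 1.
        return 1
    # Any other combination is reported instead of leaving the value unassigned.
    raise ValueError(f"cannot derive a single one-hot value from range {value_range!r}")
```

With this shape, a range that has both bounds set raises an error with the offending range in the message, rather than printing it and appending an unassigned value.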

def _get_other_features_in_encoding(feature, feature_slices):
    for encoded in feature_slices:
        if feature in encoded:
            return (list(set(encoded) - set([feature]))), encoded

nit: set([feature]) can be replaced with {feature}
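
A sketch of the helper with the set literal applied. The explicit fallback return for a feature that appears in no encoding is an assumption added for illustration; the hunk above shows no such branch, so the original falls through and returns None implicitly. The column names are hypothetical.

```python
def get_other_features_in_encoding(feature, feature_slices):
    """Return (other features in the same encoding, the full encoding)."""
    for encoded in feature_slices:
        if feature in encoded:
            # {feature} is a set literal, equivalent to set([feature]).
            return list(set(encoded) - {feature}), encoded
    # Assumed fallback: the feature is not part of any one-hot encoding.
    return [], []


slices = [["color_red", "color_blue", "color_green"]]
others, encoded = get_other_features_in_encoding("color_red", slices)
```

Because the result passes through a set, the order of `others` is not guaranteed; sort it if a stable order matters.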

new_cell['categories'][other_feature].append(1)
else:
new_cell['categories'][other_feature].append(0)
new_cell['categories'][other_feature].append(1)

I'm not sure I understand where this is narrowed down to the single correct value.

…nding so that options are narrowed down

Signed-off-by: abigailt <[email protected]>
Signed-off-by: abigailt <[email protected]>
Signed-off-by: abigailt <[email protected]>
Signed-off-by: abigailt <[email protected]>
@abigailgold abigailgold merged commit 6d81cd8 into main Dec 24, 2023
4 checks passed
@abigailgold abigailgold deleted the one_hot_minimization branch December 24, 2023 23:18