-
Notifications
You must be signed in to change notification settings - Fork 215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support ragged aux_labels in k2.compose #686
Conversation
Ready for review. |
@@ -1346,6 +1346,9 @@ Array1<T> ComputeHash(Ragged<int32_t> &src); | |||
repeats (i.e., multiplicity) of each output sequence. | |||
The caller does not need to pre-allocate it. It is | |||
allocated inside the function. | |||
@param [out] new2old_indexes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each output sublist could have appeared multiple times in the input, so is there a rule for which one is listed?
Should probably specify that this would be an idx0 if src
had 2 axes, and an idx01 if src
had 3 axes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each output sublist could have appeared multiple times in the input, so is there a rule for which one is listed?
Sublists of a seq are reordered by their hash values by calling SortSublists
.
Although the implementation ofSortSublists
does not mention that it is stable, the Python tests
show that the first one is kept, other repeats are discarded.
I think we don't care which one is kept since their contents are identical (assuming there are no hash conflicts).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably specify that this would be an idx0 if src had 2 axes, and an idx01 if src had 3 axes.
Thanks. Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it just puts an arbitrary one of the inputs, you should specify that. (Also specify if this is nondeterministic, i.e.
may give different outputs for the same input on different run.s).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it just puts an arbitrary one of the inputs, you should specify that. (Also specify if this is nondeterministic, i.e.
may give different outputs for the same input on different run.s).
Fixed. It is described in the comment that it uses a deterministic algorithm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
It is needed in
where
aux_labels
indecoding_lats
is a ragged tensor.This pull-request extends
k2.compose
to support that.[EDITED]: Mark it as WIP. Will merge after testing it in snowfall.
Used in k2-fsa/snowfall#106