Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support ragged aux_labels in k2.compose #686

Merged
merged 6 commits into from
Mar 18, 2021

Conversation

csukuangfj
Copy link
Collaborator

@csukuangfj csukuangfj commented Mar 16, 2021

It is needed in

k2.compose(decoding_lats, word_fsas)

where aux_labels in decoding_lats is a ragged tensor.

This pull-request extends k2.compose to support that.


[EDITED]: Mark it as WIP. Will merge after testing it in snowfall.

Used in k2-fsa/snowfall#106

@csukuangfj csukuangfj changed the title Support ragged aux_labels in k2.compose WIP: Support ragged aux_labels in k2.compose Mar 16, 2021
@csukuangfj csukuangfj changed the title WIP: Support ragged aux_labels in k2.compose Support ragged aux_labels in k2.compose Mar 18, 2021
@csukuangfj
Copy link
Collaborator Author

Ready for review.

@@ -1346,6 +1346,9 @@ Array1<T> ComputeHash(Ragged<int32_t> &src);
repeats (i.e., multiplicity) of each output sequence.
The caller does not need to pre-allocate it. It is
allocated inside the function.
@param [out] new2old_indexes
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each output sublist could have appeared multiple times in the input, so is there a rule for which one is listed?

Should probably specify that this would be an idx0 if src had 2 axes, and an idx01 if src had 3 axes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each output sublist could have appeared multiple times in the input, so is there a rule for which one is listed?

Sublists of a seq are reordered by their hash values by calling SortSublists.
Although the implementation ofSortSublists does not mention that it is stable, the Python tests
show that the first one is kept, other repeats are discarded.

I think we don't care which one is kept since their contents are identical (assuming there are no hash conflicts).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably specify that this would be an idx0 if src had 2 axes, and an idx01 if src had 3 axes.

Thanks. Done.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it just puts an arbitrary one of the inputs, you should specify that. (Also specify if this is nondeterministic, i.e.
may give different outputs for the same input on different run.s).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it just puts an arbitrary one of the inputs, you should specify that. (Also specify if this is nondeterministic, i.e.
may give different outputs for the same input on different run.s).

Fixed. It is described in the comment that it uses a deterministic algorithm.

Copy link
Collaborator

@danpovey danpovey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@danpovey danpovey merged commit b33eab3 into k2-fsa:master Mar 18, 2021
@csukuangfj csukuangfj deleted the fix-compose branch March 18, 2021 10:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants