WIP: add compute-post. #210
base: master
Conversation
I just created a pull-request in Lhotse (lhotse-speech/lhotse#319) to add ... Also, I find the alignment information contained in the supervision is too simple; it only has:

```
symbol: str
start: Seconds
duration: Seconds
```

Can we move the alignment class from snowfall to lhotse? (See snowfall/snowfall/tools/ali.py, lines 20 to 28 in bce7330.)
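For reference, such a supervision-level alignment entry can be sketched as a small dataclass (a sketch only; `Seconds` is treated as a plain float alias here, and the field names follow the snippet above):

```python
from dataclasses import dataclass

Seconds = float  # alias matching the type name used in the snippet above


@dataclass
class AlignmentItem:
    """One aligned unit (e.g. a phone or a word), CTM-style."""
    symbol: str        # the aligned symbol
    start: Seconds     # start time within the recording
    duration: Seconds  # how long the symbol lasts

    @property
    def end(self) -> Seconds:
        return self.start + self.duration
```

The `end` property is a convenience addition, not part of the snippet being discussed.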
The usage is:

```
$ snowfall ali compute-ali -l data/lang_nosp -p ./exp/cuts_post.json --max-duration=500 -o exp
```
snowfall/tools/ali.py
Outdated
```python
phone_ids_with_blank = [0] + phone_ids
ctc_topo = k2.arc_sort(build_ctc_topo(phone_ids_with_blank))

if not (lang_dir / 'HLG.pt').exists():
```
I think this could be refactored into a function and re-used across this script and the decode scripts (and possibly others):

```python
def load_or_compile_HLG(lang_dir: Path) -> k2.Fsa: ...
```
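The load-or-compile pattern being suggested could look like the sketch below. This is not Snowfall's actual code: the k2-specific compilation is replaced by an injected `compile_fn`, and pickle stands in for `torch.save`/`torch.load` so the sketch stays self-contained.

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def load_or_compile(cache_path: Path, compile_fn: Callable[[], Any]) -> Any:
    """Load a cached object if it exists; otherwise compile and cache it.

    In the real script, compile_fn would build HLG with k2, and the cache
    file would be lang_dir / 'HLG.pt' saved via torch.save.
    """
    if cache_path.exists():
        with open(cache_path, 'rb') as f:
            return pickle.load(f)
    obj = compile_fn()  # expensive step, done only on a cache miss
    with open(cache_path, 'wb') as f:
        pickle.dump(obj, f)
    return obj
```

Calling it twice with the same cache path compiles only once; the second call hits the cache.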
Thanks! Will refactor it and add options to enable/disable LM rescoring.
snowfall/tools/ali.py
Outdated
```python
HLG = k2.Fsa.from_dict(d)

HLG = HLG.to(device)
HLG.aux_labels = k2.ragged.remove_values_eq(HLG.aux_labels, 0)
```
What is this line doing? It looks like it's "sparsifying" the aux_labels (word ids), but how does HLG know which labels correspond to which aux_labels after that?
This just removes 0's from the word sequences. Actually, it may not be necessary anymore, because we changed some defaults of what happens when you do `remove_epsilons` and convert linear to ragged attributes.
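For intuition, here is a plain-Python illustration of the effect (not the k2 implementation, which operates on ragged tensors): every element equal to 0 (the epsilon word id) is dropped from each sublist, while the ragged list-of-lists structure is preserved.

```python
def remove_values_eq(ragged, value):
    """Drop every element equal to `value` from each sublist,
    keeping the ragged (list-of-lists) structure intact."""
    return [[x for x in row if x != value] for row in ragged]


aux_labels = [[0, 13, 0, 42], [0], [7, 0, 0]]
print(remove_values_eq(aux_labels, 0))  # [[13, 42], [], [7]]
```

Note that a row consisting only of 0's becomes an empty row; the number of rows never changes, which is how the arc-to-word correspondence survives.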
snowfall/tools/ali.py
Outdated
```python
supervision_segments,
allow_truncate=sf - 1)

lattices = k2.intersect_dense_pruned(HLG, dense_fsa_vec, 20.0,
```
The pruning-related arguments here could be function parameters.
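A sketch of what that could look like: the hard-coded pruning values become a small options object that is forwarded to the intersection call. Only the 20.0 search beam comes from the snippet above; the other defaults are hypothetical, and `intersect_fn` stands in for `k2.intersect_dense_pruned` so the sketch does not depend on k2.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class PruningOpts:
    search_beam: float = 20.0       # value from the snippet above
    output_beam: float = 8.0        # hypothetical default
    min_active_states: int = 30     # hypothetical default
    max_active_states: int = 10000  # hypothetical default


def get_lattices(intersect_fn: Callable[..., Any], HLG, dense_fsa_vec,
                 opts: PruningOpts = PruningOpts()):
    # Forward the pruning options instead of hard-coding them at the call site.
    return intersect_fn(HLG, dense_fsa_vec, opts.search_beam, opts.output_beam,
                        opts.min_active_states, opts.max_active_states)
```

Callers that need tighter or looser pruning can then pass `PruningOpts(search_beam=15.0)` etc. without editing the script.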
```python
output_dir.mkdir(exist_ok=True)
storage_path = output_dir / 'posts'

posts_writer = lhotse.NumpyFilesWriter(storage_path=storage_path)
```
This is going to create a lot of files; I'm not sure if `NumpyHdf5Writer` would be preferable.
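To illustrate the trade-off with stdlib code only (this is not Lhotse code; pickle stands in for numpy/HDF5 serialization): a files-style writer produces one file per utterance, while an archive-style writer keeps everything in a single container file.

```python
import pickle
import tempfile
from pathlib import Path

# Fake posterior arrays for 100 utterances.
posts = {f'utt-{i}': [0.1 * i] * 4 for i in range(100)}

with tempfile.TemporaryDirectory() as tmp:
    # Files-style storage: one file per utterance -> many small files.
    files_dir = Path(tmp) / 'posts'
    files_dir.mkdir()
    for key, arr in posts.items():
        (files_dir / f'{key}.bin').write_bytes(pickle.dumps(arr))
    n_files = len(list(files_dir.iterdir()))

    # Archive-style storage (the HDF5 idea): one container file for everything.
    archive = Path(tmp) / 'posts.pkl'
    archive.write_bytes(pickle.dumps(posts))

print(n_files)  # 100 files vs. 1 archive
```

With real corpora (hundreds of thousands of cuts), the many-small-files layout can strain filesystems, which is presumably the concern here.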
Can you describe the issue more? I'm not sure I understand what's missing there. We could move Snowfall's frame-wise alignment to Lhotse, but I'm not sure how to make the two representations compatible with each other (the CTM-like description seems more general to me, as you can cast it to a frame-wise representation with different frame shifts).
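To make the "more general" point concrete: a CTM-like (symbol, start, duration) alignment can be expanded into a frame-wise label sequence for any frame shift. The sketch below is illustrative only (not Lhotse or Snowfall code), and the blank symbol name is a placeholder.

```python
from typing import List, Tuple


def to_frames(ali: List[Tuple[str, float, float]],
              num_frames: int, frame_shift: float,
              blank: str = '<blk>') -> List[str]:
    """Expand (symbol, start, duration) entries into one label per frame."""
    labels = [blank] * num_frames
    for symbol, start, duration in ali:
        first = int(round(start / frame_shift))
        last = int(round((start + duration) / frame_shift))
        for t in range(first, min(last, num_frames)):
            labels[t] = symbol
    return labels


# Two phones aligned with times in seconds, expanded at a 10 ms frame shift.
ali = [('HH', 0.00, 0.03), ('AY', 0.03, 0.05)]
print(to_frames(ali, num_frames=10, frame_shift=0.01))
```

Re-running with a different `frame_shift` yields the frame-wise view at that resolution, which is exactly the casting being described; going the other way (frame-wise to CTM-like) loses the original time resolution.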
BTW I wonder if we should support piping these programs together, Kaldi-style. Click easily allows doing that with file-type arguments. We could do that by writing/reading JSONL-serialized manifests in a streaming manner. Since most operations on ... WDYT?
... there is also some code for line-by-line incremental JSONL writing in Lhotse that could be extended to support this.
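The line-by-line JSONL mechanism itself is simple; here is a minimal stdlib sketch (not Lhotse's actual implementation) of incremental writing and streaming reading. Because each line is a complete JSON object, a consumer can start reading before the producer has finished.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, Iterator


def write_jsonl(path: Path, items: Iterable[Dict]) -> None:
    """Append one JSON object per line, flushing so each finished line
    is immediately visible to any downstream reader."""
    with open(path, 'w') as f:
        for item in items:
            f.write(json.dumps(item) + '\n')
            f.flush()


def read_jsonl(path: Path) -> Iterator[Dict]:
    """Yield manifests one at a time without loading the whole file."""
    with open(path) as f:
        for line in f:
            yield json.loads(line)
```

The reader is a generator, so memory use stays constant regardless of manifest size.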
This is cool; I'm afraid I'm not following it in detail.
Fair enough. The idea is to allow something like: ... But I just realized that with the current way things are done in Lhotse, we would have to store the actual arrays/tensors on disk and just pass the manifests around, which might not be optimal. Maybe it's not relevant for now, and we can see how to do that in the future, if needed at all.
BTW, I tend to think that being able to do something at all tends to be more important than that thing being efficient (premature optimization being the root of all evil, etc.), although I did plenty of it in Kaldi. I don't know what the optimal solution is here; I'm afraid I have not been following this PR closely enough.
Agreed. But for the record, the full quote is actually:

> We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.
Usage:

I find that there is one issue with the Torch-scripted module: we have to know the signature of the model's `forward` function, as well as its subsampling factor.

Working on `compute-ali` and will submit them together.
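One way to avoid hard-coding that knowledge (a sketch only, not what this PR does) is to save a small JSON sidecar next to the scripted model that records the `forward` argument names and the subsampling factor, and read it back at alignment time. All names below are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_model_metadata(model_path: Path, *, subsampling_factor: int,
                        forward_args: list) -> Path:
    """Write a JSON sidecar describing the scripted model's interface."""
    meta_path = model_path.with_suffix('.json')
    meta_path.write_text(json.dumps({
        'subsampling_factor': subsampling_factor,
        'forward_args': forward_args,
    }))
    return meta_path


def load_model_metadata(model_path: Path) -> dict:
    """Read the sidecar back, e.g. inside compute-post / compute-ali."""
    return json.loads(model_path.with_suffix('.json').read_text())


with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / 'model.pt'
    save_model_metadata(model_path, subsampling_factor=4,
                        forward_args=['features', 'supervisions'])
    meta = load_model_metadata(model_path)

print(meta['subsampling_factor'])  # 4
```

The alignment tool could then derive the frame shift from `subsampling_factor` and build the `forward` call from `forward_args` instead of assuming a fixed model interface.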