Skip to content

hon9kon9ize/yue-g2p-benchmark

Repository files navigation

Cantonese G2P Evaluation Benchmark

This project is a benchmark for evaluating Cantonese Grapheme-to-Phoneme (G2P) systems. The benchmark is based on the Jyutping Romanization System. The data is obtained from word.hk.

The dataset includes character pairs along with their corresponding ground truth phonemes for G2P model evaluation. Each text file line contains a word, an underscore (_), and the target character for phoneme prediction. This format focuses on single-character phoneme prediction within words, making it suitable for assessing G2P models on specific characters.

Metrics

The benchmark uses the following metrics:

  • Accuracy: The percentage of words that are correctly converted the specified character from graphemes to phonemes.
  • Levenshtein Distance: The average Levenshtein distance between the predicted phonemes and the ground truth phonemes.

Usage

Pre-requisites

# pull submodules
git submodule update --init --recursive
# install dependencies
pip install -r requirements.txt
# install g2pW-Cantonese dependencies
pip install -r g2pW-Cantonese/requirements.txt

Run the Benchmark

python run.py

Leaderboard

Leaderboard

Runtime Comparison

runtime

How to Submit

To submit your G2P system, please subclass the G2PModel class in models and implement the _predict method. Then, add your model to the models list in run.py. Finally, run run.py to generate the results.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages