Cantonese G2P Evaluation Benchmark

This project is a benchmark for evaluating Cantonese Grapheme-to-Phoneme (G2P) systems. The benchmark is based on the Jyutping Romanization System. The data is obtained from word.hk.

The dataset pairs each word with one target character and the ground-truth phonemes of that character, for use in G2P model evaluation. Each line of a text file contains a word, an underscore (_), and the target character whose phonemes are to be predicted. Because the format targets a single character within a word, it is suited to assessing how well a G2P model resolves the pronunciation of a specific character in context.
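As an illustration of the line format described above, here is a minimal parsing sketch. It assumes a line looks like `word_char` with nothing else on it; the `Entry` type and `parse_line` function are hypothetical names and are not taken from data.py, and where the ground-truth phonemes are stored is not covered here.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One benchmark item: a word and the character to predict (hypothetical type)."""
    word: str
    target: str


def parse_line(line: str) -> Entry:
    """Split a line of the assumed form 'word_char' on its last underscore."""
    word, sep, target = line.strip().rpartition("_")
    if not sep or not word or len(target) != 1:
        raise ValueError(f"unexpected line format: {line!r}")
    if target not in word:
        # The target is described as a character within the word.
        raise ValueError(f"target {target!r} does not occur in {word!r}")
    return Entry(word, target)


entry = parse_line("香港_香\n")
print(entry.word, entry.target)
```

Splitting on the last underscore keeps the sketch tolerant of a word that itself contains an underscore, which is an assumption about the data rather than a documented guarantee.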

Metrics

The benchmark uses the following metrics:

Accuracy: The percentage of entries for which the specified character is correctly converted from grapheme to phonemes.
Levenshtein Distance: The average Levenshtein distance between the predicted phonemes and the ground truth phonemes.
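The two metrics above can be sketched as follows. This is not the code in run.py: it assumes predictions and references are Jyutping strings and that the edit distance is computed over their characters (the benchmark may instead compare phoneme tokens). The names `levenshtein` and `evaluate` are hypothetical.

```python
from typing import Dict, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def evaluate(predictions: Sequence[str], references: Sequence[str]) -> Dict[str, float]:
    """Return accuracy (in percent) and the mean Levenshtein distance."""
    if len(predictions) != len(references) or not references:
        raise ValueError("predictions and references must be non-empty and equal in length")
    n = len(references)
    correct = sum(p == r for p, r in zip(predictions, references))
    total = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    return {"accuracy": 100.0 * correct / n, "levenshtein": total / n}


scores = evaluate(["hoeng1", "gong2", "jan4", "waa6"],
                  ["hoeng1", "gong2", "jan2", "waa2"])
print(scores)
```

In this toy example two of the four predictions differ from the reference only in the tone digit, so accuracy counts them as plain errors while the Levenshtein distance records that each is one edit away.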

Usage

Pre-requisites

# pull submodules
git submodule update --init --recursive
# install dependencies
pip install -r requirements.txt
# install g2pW-Cantonese dependencies
pip install -r g2pW-Cantonese/requirements.txt

Run the Benchmark

python run.py

Leaderboard

Runtime Comparison

How to Submit

To submit your G2P system, please subclass the G2PModel class in models and implement the _predict method. Then, add your model to the models list in run.py. Finally, run run.py to generate the results.
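A submission might look roughly like the sketch below. The real G2PModel base class lives in models and its interface is not reproduced here, so the snippet defines a hypothetical stand-in to stay self-contained; the `predict` wrapper, the `_predict(word, char)` signature, and the `LookupG2P` class are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Mapping


class G2PModel(ABC):
    """Hypothetical stand-in for the base class in models (real interface may differ)."""

    def predict(self, word: str, char: str) -> str:
        # Assumed public entry point that delegates to the subclass hook.
        return self._predict(word, char)

    @abstractmethod
    def _predict(self, word: str, char: str) -> str:
        """Return the Jyutping for `char` as it is pronounced in `word`."""


class LookupG2P(G2PModel):
    """Toy model: context-free dictionary lookup, ignoring the surrounding word."""

    def __init__(self, table: Mapping[str, str]) -> None:
        self.table = dict(table)

    def _predict(self, word: str, char: str) -> str:
        # Unknown characters yield an empty prediction.
        return self.table.get(char, "")


model = LookupG2P({"香": "hoeng1", "港": "gong2"})
print(model.predict("香港", "港"))
```

In the repository itself, the subclass would inherit from the actual G2PModel and an instance would be added to the models list in run.py.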
