From 22d256bdfcc2a0e18f09856f9b6e57534f5afe77 Mon Sep 17 00:00:00 2001
From: laubonghaudoi
Date: Tue, 19 Nov 2024 14:56:07 -0800
Subject: [PATCH] Update index.html

---
 index.html | 53 ++++++++++++++++++++++++++---------------------------
 1 file changed, 26 insertions(+), 27 deletions(-)

diff --git a/index.html b/index.html
index 6f8ecab..9cd4f09 100644
--- a/index.html
+++ b/index.html
@@ -76,7 +76,7 @@

Total Duration

- 66.02 個鐘 hours (3960.67 分鐘 minutes
+ 66.02 個鐘 hours (3960.67 分鐘 minutes)

@@ -98,19 +98,18 @@

介紹 Introduction

- 本數據集由廣州最出名嘅話劇演員、説書藝人(講古佬)張悦楷講《三國演義》錄音製成。所有錄音均錄於
- 1980
- 年代。數據集所有文本均由人工轉寫,並根據《三國演義》原文校對嚟確保準確性。
+ 本數據集由廣州最出名嘅話劇演員、説書藝人(講古佬)張悦楷喺 1980
+ 年代電台播講《三國演義》嘅錄音製成。數據集所有文本均由人工轉寫,並根據《三國演義》原文校對嚟確保準確性。

-

+

 This dataset was made from recordings of Zoeng Jyut Gaai, the most famous drama actor and storyteller in Canton, storytelling
- Romance of the Three Kingdoms. All recordings were recorded
- in the 1980s. All texts in the dataset were transcribed manually and
- proofread according to the original text of
- Romance of the Three Kingdoms to ensure accuracy.
+ Romance of the Three Kingdoms during the 1980s. All texts
+ in the dataset were transcribed manually and proofread according to
+ the original text of Romance of the Three Kingdoms to
+ ensure accuracy.

-

+

本數據集可用於各種用途,例如語音合成(TTS)、語音識別(ASR)、語言模型(LLM)、語言學分析等等。介紹 Introduction

張悦楷語音合成 就係一個用本數據集訓練出嚟嘅 TTS 系統。

-

+

 This dataset is multi-purposed. It can be used for Text-To-Speech (TTS), Automatic Speech Recognition (ASR), Language Modeling, linguistics analysis, etc. As an example,
@@ -163,53 +162,53 @@

數據樣例 Data samples

下載 Download

-

+

如果你想單純克隆所有 wav 文件,可以用下面嘅命令嚟凈係克隆個 wav/ 路徑,避免 clone 晒成個 repo:

-

+

If you want to clone only the wav files without cloning the entire repo, use the following commands to clone the wav/ directory only:

- git clone --filter=blob:none --sparse https://huggingface.co/datasets/laubonghaudoi/zoengjyutgaai_saamgwokjinji
+ git clone --filter=blob:none --sparse https://huggingface.co/datasets/CanCLID/zoengjyutgaai_saamgwokjinji
 
 cd zoengjyutgaai_saamgwokjinji
 
 git sparse-checkout init --cone
 git sparse-checkout set wav
 git checkout
-

+

所有文字轉寫都喺 wav/metadata.csv入面。

-

+

All text transcriptions are in wav/metadata.csv.
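
Once the sparse checkout above has finished, the transcriptions in wav/metadata.csv can be inspected with a few lines of Python. The following is a minimal sketch, not part of the patch: the helper name `load_metadata` is hypothetical, and because the page does not describe the column layout of the CSV, the code only assumes a UTF-8 file whose first row is a header.

```python
import csv
from pathlib import Path


def load_metadata(csv_path):
    """Return (header, rows) from a metadata CSV file.

    Assumption: the file is UTF-8 and its first row is a header.
    The actual column names are not documented on the page, so the
    rows are returned as plain lists instead of named fields.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        # Skip fully empty lines, which csv.reader yields as [].
        rows = [row for row in reader if row]
    return header, rows


if __name__ == "__main__":
    # Path produced by the sparse-checkout commands shown above.
    path = Path("zoengjyutgaai_saamgwokjinji/wav/metadata.csv")
    if path.exists():
        header, rows = load_metadata(path)
        print("columns:", header)
        print("rows:", len(rows))
    else:
        print("metadata.csv not found; run the clone commands first")
```

Returning raw lists keeps the sketch independent of the real column names; once the header is known, `csv.DictReader` would be the more convenient choice.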

説明 Info

-

+

所有源字幕 SRT 文件都存放喺 Hugging Face 倉庫嘅 srt/ 路徑下。所有源音頻都以 .webm 格式放喺 .webm/ 路徑下。

-

+

All source subtitle SRT files are stored in the srt/ directory of the Hugging Face repository. All source audio files are stored in .webm format in the .webm/ directory.

-