Generate a Cantonese instruction dataset with Gemini Pro, using Stanford Alpaca's prompts, for fine-tuning LLMs.

hon9kon9ize/yue-alpaca

廣東話草泥馬 Cantonese Alpaca

This repo uses Gemini Pro to generate a Cantonese instruction dataset from Stanford Alpaca's prompts, for fine-tuning LLMs. It contains a script that generates the dataset, together with the seed prompts from the Alpaca repo, manually translated into Cantonese.
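A minimal sketch of the prompt-building step, modelled on Stanford Alpaca's self-instruct format. It assumes the Cantonese seed prompts are stored like Alpaca's `seed_tasks.jsonl` (one JSON object per line with `instruction` and `instances` fields); that this repo uses the same layout is an assumption. `PROMPT_HEADER`, `load_seed_tasks`, `sample_tasks` and `encode_prompt` are hypothetical names, not functions from `generate.py`.

```python
import json
import random
import re

# Hypothetical header text: Stanford Alpaca keeps its own in prompt.txt and
# this repo's Cantonese prompt may differ.
PROMPT_HEADER = "You are asked to come up with a set of diverse task instructions.\n"


def load_seed_tasks(lines):
    """Flatten Alpaca-style seed tasks (one JSON object per line) into
    {"instruction", "input", "output"} dicts, using the first instance only."""
    tasks = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        task = json.loads(line)
        instance = task["instances"][0]
        tasks.append({
            "instruction": task["instruction"],
            "input": instance["input"],
            "output": instance["output"],
        })
    return tasks


def sample_tasks(tasks, k, rng):
    """Pick k distinct seed tasks to show the model as examples."""
    return rng.sample(tasks, k)


def encode_prompt(tasks, header=PROMPT_HEADER):
    """Number the example tasks, separate them with '###' and leave the next
    instruction slot open for the model to fill in."""
    prompt = header + "\n"
    for idx, task in enumerate(tasks, start=1):
        # Collapse whitespace and drop a trailing colon, as Alpaca does.
        instruction = re.sub(r"\s+", " ", task["instruction"]).strip().rstrip(":")
        task_input = task["input"] or "<noinput>"  # mark empty inputs explicitly
        prompt += "###\n"
        prompt += f"{idx}. Instruction: {instruction}\n"
        prompt += f"{idx}. Input:\n{task_input}\n"
        prompt += f"{idx}. Output:\n{task['output']}\n"
    prompt += "###\n"
    prompt += f"{len(tasks) + 1}. Instruction:"
    return prompt
```

Numbering the examples and leaving the `N+1. Instruction:` slot open nudges the model to continue the list in the same shape, which makes its reply easy to split back into records.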

The generated dataset is available on Hugging Face.
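Each entry in an Alpaca-style dataset is an instruction, an optional input and an output. Assuming the prompt ends with an open `N+1. Instruction:` slot and Gemini Pro continues in the same numbered, `###`-separated format (as in Stanford Alpaca's post-processing), a reply could be split back into records roughly as below. `parse_response` and `write_dataset` are hypothetical helpers, and whether the published dataset uses exactly these field names is an assumption.

```python
import json
import re


def parse_response(response_text, num_examples):
    """Split a model continuation into Alpaca-style records.

    Assumes the prompt showed `num_examples` numbered tasks and ended with
    f"{num_examples + 1}. Instruction:", so the reply continues that slot.
    """
    raw = f"{num_examples + 1}. Instruction:" + response_text
    records = []
    for offset, chunk in enumerate(raw.split("###")):
        idx = num_examples + 1 + offset
        # The capturing group keeps field names: [prefix, name, text, name, text, name, text]
        parts = re.split(rf"{idx}\.\s+(Instruction|Input|Output):", chunk)
        if len(parts) != 7 or parts[1::2] != ["Instruction", "Input", "Output"]:
            continue  # malformed or truncated chunk (e.g. cut off at the token limit)
        instruction = parts[2].strip()
        task_input = parts[4].strip()
        if task_input.lower() == "<noinput>":
            task_input = ""
        output = parts[6].strip()
        if instruction and output:
            records.append({"instruction": instruction, "input": task_input, "output": output})
    return records


def write_dataset(records, path):
    """Save records as a JSON list, keeping Cantonese characters unescaped."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
```

`ensure_ascii=False` keeps the Cantonese text readable in the output file instead of escaping every character as `\uXXXX`.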

Prerequisites

pip install -r requirements.txt

Usage

export GOOGLE_AISTUDIO_API_KEY=YOUR_API_KEY

python generate.py
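A minimal sketch of what the usage step relies on: reading the key exported as `GOOGLE_AISTUDIO_API_KEY` and sending one prompt to Gemini Pro. It assumes the `google-generativeai` Python SDK (`genai.configure`, `genai.GenerativeModel`, `generate_content`) and the model name `gemini-pro`; the packages pinned in `requirements.txt` may differ, and `load_api_key` and `complete` are hypothetical names, not part of `generate.py`.

```python
import os


def load_api_key(env=None):
    """Read the Google AI Studio key exported as GOOGLE_AISTUDIO_API_KEY."""
    env = os.environ if env is None else env
    key = env.get("GOOGLE_AISTUDIO_API_KEY")
    if not key:
        raise RuntimeError("GOOGLE_AISTUDIO_API_KEY is not set; export it before running generate.py")
    return key


def complete(prompt, api_key, model_name="gemini-pro"):
    """Send one prompt to Gemini and return the reply text.

    Assumes the google-generativeai SDK; it is imported lazily so the rest of
    this module still loads when the package is not installed.
    """
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(model_name)
    response = model.generate_content(prompt)
    return response.text
```

Calling `complete(prompt, load_api_key())` needs network access and a valid key. Failing early with a clear message when the variable is missing is friendlier than an authentication error raised from deep inside the SDK.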

Citation Information

@misc{alpaca,
  author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto},
  title = {Stanford Alpaca: An Instruction-following LLaMA model},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}
