Note: 29.2.2024
- Download some datasets or subsets, e.g. from Wikipedia or other corpora: https://huggingface.co/datasets/wikipedia
- Build a new corpus or corpora for Bulgarian
- Continue development of BgGPT etc.
- If time allows, convert the GPT2-Medium model from h5 to ggml for fast CPU inference (in progress: the vocabulary size mismatch, 50255 vs. 50257 tokens, is fixed on the fly, but some other TF/PyTorch incompatibilities remain, e.g. transposes...)
- Continue working with Whisper; integrate it with AutoClap and Toshko 2
- ...
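
The two conversion issues in the GPT2-Medium item (the 50255 vs. 50257 vocabulary size and the transposes) can be sketched roughly as below. This is only an illustration with NumPy on toy arrays, not the actual conversion script. The helper names `pad_vocab` and `to_out_in` are hypothetical. It is also an assumption that the checkpoint has fewer embedding rows than the target and that zero rows are an acceptable filler. Which tensors actually need transposing depends on the converter and on the layout ggml expects.

```python
import numpy as np


def pad_vocab(wte: np.ndarray, target_vocab: int = 50257) -> np.ndarray:
    """Pad a (n_vocab, n_embd) embedding matrix with zero rows up to target_vocab.

    Assumption: the missing token rows can be filled with zeros.
    """
    n_vocab, n_embd = wte.shape
    if n_vocab > target_vocab:
        raise ValueError(f"embedding has {n_vocab} rows, more than {target_vocab}")
    pad = np.zeros((target_vocab - n_vocab, n_embd), dtype=wte.dtype)
    return np.concatenate([wte, pad], axis=0)


def to_out_in(kernel: np.ndarray) -> np.ndarray:
    """Turn an (in, out) kernel into a contiguous (out, in) matrix."""
    return np.ascontiguousarray(kernel.T)


# Toy shapes only: a real GPT2-Medium embedding is much wider than 8.
wte = np.ones((50255, 8), dtype=np.float32)
padded = pad_vocab(wte)

kernel = np.arange(6, dtype=np.float32).reshape(2, 3)
flipped = to_out_in(kernel)

print(padded.shape, flipped.shape)
```

Padding once at conversion time keeps the inference code unaware of the mismatch. If the extra token IDs can actually be produced by the tokenizer, zero rows may not be good enough and the rows would have to come from a matching checkpoint.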