Releases: common-voice/CorporaCreator
release-v1.0.0
What's Changed
- Fixed #5, now dev + test have more non-power users and train more power users by @kdavis-mozilla in #6
- Fixed #7 (There is no language independent preprocessor) by @kdavis-mozilla in #8
- Fixed #9 (Preprocessors do not have access to the user_id and sentence) by @kdavis-mozilla in #10
- Fixed #12 (Newly exported clips.tsv has new languages) by @kdavis-mozilla in #13
- Export has client_id not user_id, so changed code (See #4) by @kdavis-mozilla in #15
- Fixed #18 (Some sentences contain HTML). by @kdavis-mozilla in #19
- Fixed #14 (Some sentences contain URL encoded text) by @kdavis-mozilla in #16
- Fixed #17 (Allow language specific preprocessors to reject sentences) by @kdavis-mozilla in #22
- Fixed #20 (Allow the common preprocessor to reject sentences) by @kdavis-mozilla in #21
- Fixed #23 (Preprocessors don't include docs on how to reject sentences) by @kdavis-mozilla in #24
- Formatted with black code formatter by @kdavis-mozilla in #26
- Add et to init by @Gregoor in #28
- Fixed #31 (et preprocessor contains invalid python) by @kdavis-mozilla in #32
- Readme and logging by @JRMeyer in #30
- Fixed #27 (Parallelize create-corpora) by @kdavis-mozilla in #34
- Fixed #38 (common.py should remove control codes) and Fixed #39 (common.py should remove byte order marks) by @kdavis-mozilla in #41
- Fixed typo by @kdavis-mozilla in #45
- Removed overly long listing by @kdavis-mozilla in #46
- Fixed typos + added clarifications by @kdavis-mozilla in #47
- Added Language Dependent Cleaning Info by @kdavis-mozilla in #48
- Added info on contributing by @kdavis-mozilla in #49
- Added info on obtaining audio + digits by @kdavis-mozilla in #50
- Added table of contents by @kdavis-mozilla in #51
- Added a bit of needed context by @kdavis-mozilla in #52
- Fix small typo in README by @the01 in #53
- Added flag for running only one or more languages by @JRMeyer in #56
- issubset is in set not list by @kdavis-mozilla in #60
- removed unnecessary quotation marks from de-lang sentences by @simnotes in #54
- Issue61 by @kdavis-mozilla in #62
- implemented XOR splitting method of train,dev,test by @simnotes in #58
- Welsh by @DewiBrynJones in #77
- common.py --- collapse whitespace for all langs by @JRMeyer in #80
- Welsh preprocessing + one change to common.py by @JRMeyer in #63
- remove bullet point, note about abbreviations by @JRMeyer in #81
- kyrgyz mis-typed "oe" by @JRMeyer in #82
- Fixed #83 renamed output tsv's by @kdavis-mozilla in #84
- Fixed #89 (Mark as invalid sentences with digits) by @kdavis-mozilla in #90
- Fixed #91 (README.rst and common.py out of sync) by @kdavis-mozilla in #92
- Add Mozilla Code of Conduct by @Mozilla-GitHub-Standards in #95
- Add default preprocessor, add check for existence of valid data by @phirework in #102
- Minor fixes based on flake8 - closes issue #103 by @phirework in #105
- Update validation logic following email discussion w Kelly and Megan by @phirework in #107
- Update column name for accents by @mozgzh in #118
- Update readme for latest commit about "accents" by @HarikalarKutusu in #119
- Update README.rst by @rlneumiller in #116
- Add variant column by @mozgzh in #124
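
Several entries above describe sentence cleaning in the common preprocessor: decoding URL-encoded text, removing HTML, stripping control codes and byte order marks, collapsing whitespace, and rejecting sentences that contain digits. The sketch below shows one way those steps might fit together. It is an illustration only, not CorporaCreator's actual code: the function name `clean_sentence`, the order of the steps, and the convention of returning `None` to reject a sentence are all assumptions.

```python
import html
import re
import unicodedata
from urllib.parse import unquote


def clean_sentence(sentence):
    """Hypothetical cleaner: return the cleaned text, or None to reject it."""
    text = unquote(sentence)             # decode URL-encoded text such as %20
    text = html.unescape(text)           # decode HTML entities such as &amp;
    text = re.sub(r"<[^>]+>", "", text)  # drop simple HTML tags
    text = text.replace("\ufeff", "")    # remove byte order marks
    # Drop control codes, keeping whitespace so it can be collapsed next.
    text = "".join(
        ch for ch in text
        if unicodedata.category(ch) != "Cc" or ch.isspace()
    )
    text = " ".join(text.split())        # collapse runs of whitespace
    # Assumed rejection rule: empty sentences and sentences with digits.
    if not text or any(ch.isdigit() for ch in text):
        return None
    return text
```

Under these assumptions, `clean_sentence("Tom &amp; Jerry")` returns `"Tom & Jerry"`, while `clean_sentence("I have 2 cats")` returns `None`.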
New Contributors
- @kdavis-mozilla made their first contribution in #6
- @Gregoor made their first contribution in #28
- @JRMeyer made their first contribution in #30
- @the01 made their first contribution in #53
- @simnotes made their first contribution in #54
- @DewiBrynJones made their first contribution in #77
- @Mozilla-GitHub-Standards made their first contribution in #95
- @phirework made their first contribution in #102
- @mozgzh made their first contribution in #118
- @HarikalarKutusu made their first contribution in #119
- @rlneumiller made their first contribution in #116
Full Changelog: https://github.com/common-voice/CorporaCreator/commits/release-v1.0.0