update README
kurianbenoy committed Jun 9, 2024
1 parent 6c4f5f5 commit 457d8da
Showing 2 changed files with 40 additions and 23 deletions.
README.md: 32 changes (20 additions, 12 deletions)
@@ -48,11 +48,11 @@ pip install git+https://github.com/kurianbenoy/whisper_normalizer.git
 - I made a video walk through on how to use the `whisper_normalizer`
 python package.
 
-[Colab Notebook
-Link](https://colab.research.google.com/gist/kurianbenoy/7d27d9ec193a4a97ec7821235bddc506/hello-world_whisper_normalizer.ipynb)
+[Colab Notebook Link of walk
+through](https://colab.research.google.com/gist/kurianbenoy/7d27d9ec193a4a97ec7821235bddc506/hello-world_whisper_normalizer.ipynb)
 
-[Github Gist
-Link](https://gist.github.com/kurianbenoy/7d27d9ec193a4a97ec7821235bddc506)
+[Github Gist Link of walk
+through](https://gist.github.com/kurianbenoy/7d27d9ec193a4a97ec7821235bddc506)
 
 [![Hello world to
 whisper_normalizer](https://img.youtube.com/vi/c7trf0zul6g/0.jpg)](https://www.youtube.com/watch?v=c7trf0zul6g)
@@ -100,6 +100,15 @@ and
 
 You can use the same thing in this package as follows:
 
+``` python
+from whisper_normalizer.english import EnglishTextNormalizer
+
+english_normalizer = EnglishTextNormalizer()
+english_normalizer("I'm a little teapot, short and stout. Tip me over and pour me out!")
+```
+
+'i am a little teapot short and stout tip me over and pour me out'
+
 ``` python
 from whisper_normalizer.basic import BasicTextNormalizer
 
@@ -109,16 +118,15 @@ normalizer("I'm a little teapot, short and stout. Tip me over and pour me out!")
 
 'i m a little teapot short and stout tip me over and pour me out '
 
-``` python
-from whisper_normalizer.english import EnglishTextNormalizer
-
-english_normalizer = EnglishTextNormalizer()
-english_normalizer("I'm a little teapot, short and stout. Tip me over and pour me out!")
-```
+## Using BasicTextNormalizer in your mother tongue might be a bad idea
 
-'i am a little teapot short and stout tip me over and pour me out'
+Whisper Text Normalizer is not always recommended to be used. [Dr Kavya
+Manohar](https://www.linkedin.com/in/kavya-manohar/) has written a
+blogpost on why it might be a bad idea on her [blopost titled Indian
+Languages and Text Normalization: Part
+1](https://kavyamanohar.com/post/indic-normalizer/).
 
-### This model extends Whisper_normalizer to support Indic languages as well.
+## This model extends Whisper_normalizer to support Indic languages as well.
 
 The logic for normalization in Indic languages is derived from
 [indic-nlp-library](https://github.com/anoopkunchukuttan/indic_nlp_library).
nbs/index.ipynb: 31 changes (20 additions, 11 deletions)
@@ -61,9 +61,9 @@
 "\n",
 "- I made a video walk through on how to use the `whisper_normalizer` python package.\n",
 "\n",
-"[Colab Notebook Link](https://colab.research.google.com/gist/kurianbenoy/7d27d9ec193a4a97ec7821235bddc506/hello-world_whisper_normalizer.ipynb)\n",
+"[Colab Notebook Link of walk through](https://colab.research.google.com/gist/kurianbenoy/7d27d9ec193a4a97ec7821235bddc506/hello-world_whisper_normalizer.ipynb)\n",
 "\n",
-"[Github Gist Link](https://gist.github.com/kurianbenoy/7d27d9ec193a4a97ec7821235bddc506)\n",
+"[Github Gist Link of walk through](https://gist.github.com/kurianbenoy/7d27d9ec193a4a97ec7821235bddc506)\n",
 "\n",
 "[![Hello world to whisper_normalizer](https://img.youtube.com/vi/c7trf0zul6g/0.jpg)](https://www.youtube.com/watch?v=c7trf0zul6g)\n"
 ]
@@ -114,7 +114,7 @@
 {
 "data": {
 "text/plain": [
-"'i m a little teapot short and stout tip me over and pour me out '"
+"'i am a little teapot short and stout tip me over and pour me out'"
 ]
 },
 "execution_count": null,
@@ -124,10 +124,10 @@
 ],
 "source": [
 "#|eval: false\n",
-"from whisper_normalizer.basic import BasicTextNormalizer\n",
+"from whisper_normalizer.english import EnglishTextNormalizer\n",
 "\n",
-"normalizer = BasicTextNormalizer()\n",
-"normalizer(\"I'm a little teapot, short and stout. Tip me over and pour me out!\")"
+"english_normalizer = EnglishTextNormalizer()\n",
+"english_normalizer(\"I'm a little teapot, short and stout. Tip me over and pour me out!\")"
 ]
 },
 {
@@ -138,7 +138,7 @@
 {
 "data": {
 "text/plain": [
-"'i am a little teapot short and stout tip me over and pour me out'"
+"'i m a little teapot short and stout tip me over and pour me out '"
 ]
 },
 "execution_count": null,
@@ -148,17 +148,26 @@
 ],
 "source": [
 "#|eval: false\n",
-"from whisper_normalizer.english import EnglishTextNormalizer\n",
+"from whisper_normalizer.basic import BasicTextNormalizer\n",
 "\n",
-"english_normalizer = EnglishTextNormalizer()\n",
-"english_normalizer(\"I'm a little teapot, short and stout. Tip me over and pour me out!\")"
+"normalizer = BasicTextNormalizer()\n",
+"normalizer(\"I'm a little teapot, short and stout. Tip me over and pour me out!\")"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Using BasicTextNormalizer in your mother tongue might be a bad idea\n",
+"\n",
+"Whisper Text Normalizer is not always recommended to be used. [Dr Kavya Manohar](https://www.linkedin.com/in/kavya-manohar/) has written a blogpost on why it might be a bad idea on her [blopost titled Indian Languages and Text Normalization: Part 1](https://kavyamanohar.com/post/indic-normalizer/)."
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### This model extends Whisper_normalizer to support Indic languages as well. \n",
+"## This model extends Whisper_normalizer to support Indic languages as well. \n",
 "\n",
 "The logic for normalization in Indic languages is derived from [indic-nlp-library](https://github.com/anoopkunchukuttan/indic_nlp_library). The logic for Malayalam normalization is expanded beyond the Indic NLP library by `MalayalamNormalizer`."
 ]
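In this commit, `EnglishTextNormalizer` returns `'i am ... pour me out'` and `BasicTextNormalizer` returns `'i m ... pour me out '`, which splits the contraction and leaves a trailing space. The sketch below is a standard-library reimplementation of the cleaning steps in OpenAI Whisper's basic normalizer with default settings. The README presents this package as offering the same normalizers as Whisper. The function name `basic_normalize` is hypothetical, and this is not the package's own code. Any exact behavioural match with `whisper_normalizer.basic.BasicTextNormalizer` is an assumption.

```python
import re
import unicodedata


def basic_normalize(text: str) -> str:
    """Hypothetical sketch of Whisper-style basic normalization
    (assumed defaults: no diacritic stripping, no letter splitting)."""
    s = text.lower()
    s = re.sub(r"[<\[][^>\]]*[>\]]", "", s)  # drop spans like [noise] or <unk>
    s = re.sub(r"\(([^)]+?)\)", "", s)  # drop parenthesised spans
    # Replace every mark (M*), symbol (S*) or punctuation (P*) character with a space.
    s = "".join(
        " " if unicodedata.category(c)[0] in "MSP" else c
        for c in unicodedata.normalize("NFKC", s)
    ).lower()
    return re.sub(r"\s+", " ", s)  # collapse whitespace runs; note: no strip()


print(repr(basic_normalize("I'm a little teapot, short and stout. Tip me over and pour me out!")))
# 'i m a little teapot short and stout tip me over and pour me out '
```

The apostrophe, the commas, the period and the final `!` are all Unicode punctuation (category `P*`), so each one becomes a space. The trailing space survives because whitespace runs are collapsed but never stripped.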

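The README links to Dr Kavya Manohar's post for the full argument against using `BasicTextNormalizer` on Indian languages. You can check one mechanism directly with the standard library. This assumes, as in OpenAI Whisper's basic normalizer, that the cleaning step replaces every Unicode mark, symbol and punctuation character with a space. In Malayalam script, vowel signs and the anusvara are combining marks. The example word is illustrative and does not come from the post.

```python
import unicodedata

word = "മലയാളം"  # "Malayalam" written in Malayalam script (illustrative input)

# Print each code point with its Unicode general category and name.
for ch in word:
    print(f"U+{ord(ch):04X}  {unicodedata.category(ch)}  {unicodedata.name(ch)}")

# Assumed Whisper-style cleaning rule: any mark (M*), symbol (S*) or
# punctuation (P*) character becomes a space.
cleaned = "".join(" " if unicodedata.category(ch)[0] in "MSP" else ch for ch in word)
print(repr(cleaned))
```

The vowel sign AA and the anusvara both fall in category `Mc`. Under this rule, the single word therefore breaks into fragments separated by spaces, and its vowel and nasal information is lost.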