update README

kurianbenoy · Jun 9, 2024 · 457d8da
1 parent 6c4f5f5
commit 457d8da
Showing 2 changed files with 40 additions and 23 deletions.
diff --git a/README.md b/README.md
@@ -48,11 +48,11 @@ pip install git+https://github.com/kurianbenoy/whisper_normalizer.git
 - I made a video walk through on how to use the `whisper_normalizer`
   python package.
 
-[Colab Notebook
-Link](https://colab.research.google.com/gist/kurianbenoy/7d27d9ec193a4a97ec7821235bddc506/hello-world_whisper_normalizer.ipynb)
+[Colab Notebook Link of walk
+through](https://colab.research.google.com/gist/kurianbenoy/7d27d9ec193a4a97ec7821235bddc506/hello-world_whisper_normalizer.ipynb)
 
-[Github Gist
-Link](https://gist.github.com/kurianbenoy/7d27d9ec193a4a97ec7821235bddc506)
+[Github Gist Link of walk
+through](https://gist.github.com/kurianbenoy/7d27d9ec193a4a97ec7821235bddc506)
 
 [![Hello world to
 whisper_normalizer](https://img.youtube.com/vi/c7trf0zul6g/0.jpg)](https://www.youtube.com/watch?v=c7trf0zul6g)
@@ -100,6 +100,15 @@ and
 
 You can use the same thing in this package as follows:
 
+``` python
+from whisper_normalizer.english import EnglishTextNormalizer
+
+english_normalizer = EnglishTextNormalizer()
+english_normalizer("I'm a little teapot, short and stout. Tip me over and pour me out!")
+```
+
+    'i am a little teapot short and stout tip me over and pour me out'
+
 ``` python
 from whisper_normalizer.basic import BasicTextNormalizer
 
@@ -109,16 +118,15 @@ normalizer("I'm a little teapot, short and stout. Tip me over and pour me out!")
 
     'i m a little teapot short and stout tip me over and pour me out '
 
-``` python
-from whisper_normalizer.english import EnglishTextNormalizer
-
-english_normalizer = EnglishTextNormalizer()
-english_normalizer("I'm a little teapot, short and stout. Tip me over and pour me out!")
-```
+## Using BasicTextNormalizer in your mother tongue might be a bad idea
 
-    'i am a little teapot short and stout tip me over and pour me out'
+The Whisper text normalizer is not always recommended. [Dr Kavya
+Manohar](https://www.linkedin.com/in/kavya-manohar/) explains why it
+might be a bad idea in her [blog post titled Indian Languages and
+Text Normalization: Part
+1](https://kavyamanohar.com/post/indic-normalizer/).
 
-### This model extends Whisper_normalizer to support Indic languages as well.
+## This model extends Whisper_normalizer to support Indic languages as well.
 
 The logic for normalization in Indic languages is derived from
 [indic-nlp-library](https://github.com/anoopkunchukuttan/indic_nlp_library).
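
Review note on the new "mother tongue" section: the README says `BasicTextNormalizer` can be a bad idea for some languages but does not show why. The sketch below is a stdlib-only illustration, not the package's code. It assumes, for illustration, that a basic normalizer replaces every character whose Unicode general category starts with `M` (marks), `S` (symbols), or `P` (punctuation) with a space. The helper names `strip_marks_symbols_punct` and `count_marks` are hypothetical. Under that assumption, English text only loses punctuation, while a Malayalam word loses its vowel signs, which belong to the mark categories.

``` python
import unicodedata


def strip_marks_symbols_punct(text: str) -> str:
    """Hypothetical helper: replace marks, symbols and punctuation with spaces.

    This is an assumed simplification for illustration, not the
    whisper_normalizer implementation.
    """
    normalized = unicodedata.normalize("NFKC", text)
    return "".join(
        " " if unicodedata.category(ch)[0] in "MSP" else ch
        for ch in normalized
    )


def count_marks(text: str) -> int:
    """Count characters in any Unicode mark category (Mn, Mc, Me)."""
    return sum(1 for ch in text if unicodedata.category(ch).startswith("M"))


english = "teapot, short and stout"
malayalam = "മലയാളം"  # the word "Malayalam", which contains vowel signs

# English only loses the comma.
print(repr(strip_marks_symbols_punct(english)))

# The Malayalam word contains mark characters that the helper blanks out,
# so the result is no longer the same word.
print(count_marks(malayalam))
print(repr(strip_marks_symbols_punct(malayalam)))
```

If the real normalizer behaves along these lines, it would support the README's advice to prefer the language-specific normalizers for Indic text. Dr Kavya Manohar's blog post linked in the diff covers the actual behaviour in detail.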

diff --git a/nbs/index.ipynb b/nbs/index.ipynb
@@ -61,9 +61,9 @@
     "\n",
     "- I made a video walk through on how to use the `whisper_normalizer` python package.\n",
     "\n",
-    "[Colab Notebook Link](https://colab.research.google.com/gist/kurianbenoy/7d27d9ec193a4a97ec7821235bddc506/hello-world_whisper_normalizer.ipynb)\n",
+    "[Colab Notebook Link of walk through](https://colab.research.google.com/gist/kurianbenoy/7d27d9ec193a4a97ec7821235bddc506/hello-world_whisper_normalizer.ipynb)\n",
     "\n",
-    "[Github Gist Link](https://gist.github.com/kurianbenoy/7d27d9ec193a4a97ec7821235bddc506)\n",
+    "[Github Gist Link of walk through](https://gist.github.com/kurianbenoy/7d27d9ec193a4a97ec7821235bddc506)\n",
     "\n",
     "[![Hello world to whisper_normalizer](https://img.youtube.com/vi/c7trf0zul6g/0.jpg)](https://www.youtube.com/watch?v=c7trf0zul6g)\n"
    ]
@@ -114,7 +114,7 @@
     {
      "data": {
       "text/plain": [
-       "'i m a little teapot short and stout tip me over and pour me out '"
+       "'i am a little teapot short and stout tip me over and pour me out'"
       ]
      },
      "execution_count": null,
@@ -124,10 +124,10 @@
    ],
    "source": [
     "#|eval: false\n",
-    "from whisper_normalizer.basic import BasicTextNormalizer\n",
+    "from whisper_normalizer.english import EnglishTextNormalizer\n",
     "\n",
-    "normalizer = BasicTextNormalizer()\n",
-    "normalizer(\"I'm a little teapot, short and stout. Tip me over and pour me out!\")"
+    "english_normalizer = EnglishTextNormalizer()\n",
+    "english_normalizer(\"I'm a little teapot, short and stout. Tip me over and pour me out!\")"
    ]
   },
   {
@@ -138,7 +138,7 @@
     {
      "data": {
       "text/plain": [
-       "'i am a little teapot short and stout tip me over and pour me out'"
+       "'i m a little teapot short and stout tip me over and pour me out '"
       ]
      },
      "execution_count": null,
@@ -148,17 +148,26 @@
    ],
    "source": [
     "#|eval: false\n",
-    "from whisper_normalizer.english import EnglishTextNormalizer\n",
+    "from whisper_normalizer.basic import BasicTextNormalizer\n",
     "\n",
-    "english_normalizer = EnglishTextNormalizer()\n",
-    "english_normalizer(\"I'm a little teapot, short and stout. Tip me over and pour me out!\")"
+    "normalizer = BasicTextNormalizer()\n",
+    "normalizer(\"I'm a little teapot, short and stout. Tip me over and pour me out!\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Using BasicTextNormalizer in your mother tongue might be a bad idea\n",
+    "\n",
+    "The Whisper text normalizer is not always recommended. [Dr Kavya Manohar](https://www.linkedin.com/in/kavya-manohar/) explains why it might be a bad idea in her [blog post titled Indian Languages and Text Normalization: Part 1](https://kavyamanohar.com/post/indic-normalizer/)."
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### This model extends Whisper_normalizer to support Indic languages as well. \n",
+    "## This model extends Whisper_normalizer to support Indic languages as well. \n",
     "\n",
     "The logic for normalization in Indic languages is derived from [indic-nlp-library](https://github.com/anoopkunchukuttan/indic_nlp_library). The logic for Malayalam normalization is expanded beyond the Indic NLP library by `MalayalamNormalizer`."
    ]