John Snow Labs Spark-NLP 3.3.1: New EntityRuler annotator, better integration with TokenClassification annotators, new state-of-the-art XLM-RoBERTa models in African Languages, and bug fixes! #6317

maziyarpanahi · 2021-10-18T11:18:16Z

Overview

We are pleased to release Spark NLP 🚀 3.3.1! This release comes with a new EntityRuler annotator, better compatibility between the TokenClassification annotators and other annotators in a Spark NLP pipeline, new state-of-the-art XLM-RoBERTa models for African languages, and bug fixes!

As always, we would like to thank our community for their feedback, questions, and feature requests.

New Features

Introducing the EntityRuler annotator, which takes either a JSON or CSV ontology file that maps entities to patterns. With EntityRuler you can implement a purely rule-based entity recognition system. Once fitted, it can be saved as a Model and reused in other pipelines to annotate your documents against your knowledge base.
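
As a rough illustration of the ontology-file idea, the sketch below writes a small JSON file that maps entity labels to patterns and then wires it into a pipeline. The entries in `PATTERNS`, the helper names, and the column names are hypothetical. The `id`/`label`/`patterns` layout and the `setPatternsResource` call reflect the EntityRuler documentation as best understood here, so please check them against the documentation linked below before relying on them.

```python
import json
from pathlib import Path

# Hypothetical ontology: each entry maps one entity label to its patterns.
# The "id" / "label" / "patterns" keys are an assumption based on the docs.
PATTERNS = [
    {"id": "locations", "label": "LOCATION", "patterns": ["Winterfell"]},
    {"id": "people", "label": "PERSON", "patterns": ["Jon", "John"]},
]


def write_patterns(path):
    """Write the ontology as a JSON file and return the path as a string."""
    target = Path(path)
    target.write_text(json.dumps(PATTERNS, indent=2), encoding="utf-8")
    return str(target)


def build_entity_ruler_pipeline(patterns_path):
    """Assemble document -> token -> EntityRuler stages.

    Imports are local so the file-writing helper above works even when
    spark-nlp is not installed. Call this only after a Spark session with
    Spark NLP on the classpath has been started.
    """
    from pyspark.ml import Pipeline
    from sparknlp.annotator import EntityRulerApproach, Tokenizer
    from sparknlp.base import DocumentAssembler
    from sparknlp.common import ReadAs

    document = DocumentAssembler().setInputCol("text").setOutputCol("document")
    token = Tokenizer().setInputCols(["document"]).setOutputCol("token")
    ruler = (
        EntityRulerApproach()
        .setInputCols(["document", "token"])
        .setOutputCol("entities")
        .setPatternsResource(patterns_path, ReadAs.TEXT, {"format": "json"})
    )
    return Pipeline(stages=[document, token, ruler])
```

Fitting the returned pipeline on a DataFrame with a `text` column should produce a model whose EntityRuler stage can then be saved and reused, as described above.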

Access EntityRuler Documentation

Bug Fixes

Fix compatibility issue between NerOverwriter and AlbertForTokenClassification, BertForTokenClassification, DistilBertForTokenClassification, LongformerForTokenClassification, RoBertaForTokenClassification, XlmRoBertaForTokenClassification, XlnetForTokenClassification annotators
Fix a bug in ContextSpellCheckerApproach annotator failing to find an appropriate TF graph
Fix a bug in ContextSpellCheckerModel not being able to load a trained model
Fix token alignment with token pieces in BertEmbeddings resulting in missing vectors with Unicode characters
Add the missing pretrained NER models for the XlmRoBertaForTokenClassification annotator
Add the missing pretrained NER models for the LongformerForTokenClassification annotator

Backward compatibility

YakeModel has been renamed to YakeKeywordExtraction to represent the actual purpose of this annotator more clearly.
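
For code that has to run against releases both before and after this rename, one possible approach is to resolve the class by name at import time. This is only a sketch: `resolve_yake_class` and `load_yake_class` are hypothetical helpers, not part of Spark NLP, and the only assumption about the library is that the annotator is exposed from `sparknlp.annotator` under one of the two names.

```python
import importlib


def resolve_yake_class(module):
    """Return the Yake annotator class from `module`, preferring the new name."""
    for name in ("YakeKeywordExtraction", "YakeModel"):
        cls = getattr(module, name, None)
        if cls is not None:
            return cls
    raise ImportError("no Yake annotator found in the given module")


def load_yake_class():
    """Resolve the class from the installed spark-nlp package (requires spark-nlp)."""
    return resolve_yake_class(importlib.import_module("sparknlp.annotator"))
```

Projects that only target 3.3.1 and later can skip the helper and import `YakeKeywordExtraction` directly.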

Models and Pipelines

New state-of-the-art XLM-RoBERTa models in Luganda, Naija, Yoruba, Hausa, Kinyarwanda, Wolof, Igbo, Amharic, Swahili, and Luo.

New Transformer Models

Model	Name	Build	Lang
XlmRoBertaSentenceEmbeddings	sent_xlm_roberta_base_finetuned_yoruba	`3.3.1`	`yo`
XlmRoBertaSentenceEmbeddings	sent_xlm_roberta_base_finetuned_wolof	`3.3.1`	`wo`
XlmRoBertaSentenceEmbeddings	sent_xlm_roberta_base_finetuned_naija	`3.3.1`	`pcm`
XlmRoBertaSentenceEmbeddings	sent_xlm_roberta_base_finetuned_swahili	`3.3.1`	`sw`
XlmRoBertaSentenceEmbeddings	sent_xlm_roberta_base_finetuned_luganda	`3.3.1`	`lg`
XlmRoBertaSentenceEmbeddings	sent_xlm_roberta_base_finetuned_kinyarwanda	`3.3.1`	`rw`
XlmRoBertaSentenceEmbeddings	sent_xlm_roberta_base_finetuned_hausa	`3.3.1`	`ha`
XlmRoBertaSentenceEmbeddings	sent_xlm_roberta_base_finetuned_igbo	`3.3.1`	`ig`
XlmRoBertaSentenceEmbeddings	sent_xlm_roberta_base_finetuned_amharic	`3.3.1`	`am`
XlmRoBertaEmbeddings	xlm_roberta_base_finetuned_yoruba	`3.3.1`	`yo`
XlmRoBertaEmbeddings	xlm_roberta_base_finetuned_wolof	`3.3.1`	`wo`
XlmRoBertaEmbeddings	xlm_roberta_base_finetuned_swahili	`3.3.1`	`sw`
XlmRoBertaEmbeddings	xlm_roberta_base_finetuned_naija	`3.3.1`	`pcm`
XlmRoBertaEmbeddings	xlm_roberta_base_finetuned_luo	`3.3.1`	`lou`
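
The names and language codes in the table above are the two arguments that `pretrained` expects. The sketch below copies the `XlmRoBertaEmbeddings` rows into a lookup table and loads one of them. `NEW_WORD_EMBEDDINGS`, `load_word_embeddings`, and the column names are illustrative choices, not part of the release.

```python
# Name -> language code, copied from the XlmRoBertaEmbeddings rows of the table.
NEW_WORD_EMBEDDINGS = {
    "xlm_roberta_base_finetuned_yoruba": "yo",
    "xlm_roberta_base_finetuned_wolof": "wo",
    "xlm_roberta_base_finetuned_swahili": "sw",
    "xlm_roberta_base_finetuned_naija": "pcm",
    "xlm_roberta_base_finetuned_luo": "lou",
}


def load_word_embeddings(name):
    """Download one of the new models by name (requires a running Spark NLP session).

    The input and output column names are assumptions; match them to the
    upstream stages of your own pipeline.
    """
    from sparknlp.annotator import XlmRoBertaEmbeddings

    lang = NEW_WORD_EMBEDDINGS[name]  # raises KeyError for names not in the table
    return (
        XlmRoBertaEmbeddings.pretrained(name, lang)
        .setInputCols(["document", "token"])
        .setOutputCol("embeddings")
    )
```

The `XlmRoBertaSentenceEmbeddings` rows can presumably be loaded the same way through that annotator's own `pretrained` method.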

The complete list of all 4000+ models & pipelines in 200+ languages is available on Models Hub.

New Notebooks

Spark NLP	Jupyter Notebooks
EntityRuler	EntityRuler
EntityRuler	EntityRuler_LightPipeline
EntityRuler	EntityRuler_Whitout_Storage

Documentation

TF Hub & HuggingFace to Spark NLP
Models Hub with new models
Spark NLP documentation
Spark NLP Scala APIs
Spark NLP Python APIs
Spark NLP Workshop notebooks
Spark NLP publications
Spark NLP in Action
Spark NLP training certification notebooks for Google Colab and Databricks
Spark NLP Display for visualization of different types of annotations
Discussions: engage with other community members, share ideas, and show off how you use Spark NLP!

Installation

Python

#PyPI

pip install spark-nlp==3.3.1

Spark Packages

spark-nlp on Apache Spark 3.0.x and 3.1.x (Scala 2.12 only):

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:3.3.1

pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:3.3.1

GPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:3.3.1

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:3.3.1

spark-nlp on Apache Spark 2.4.x (Scala 2.11 only):

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-spark24_2.11:3.3.1

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-spark24_2.11:3.3.1

GPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark24_2.11:3.3.1

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark24_2.11:3.3.1

spark-nlp on Apache Spark 2.3.x (Scala 2.11 only):

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:3.3.1

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:3.3.1

GPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark23_2.11:3.3.1

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark23_2.11:3.3.1
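
The package names above follow a regular pattern: an optional `-gpu` part, an optional `-spark24` or `-spark23` part, and then the Scala version. As a convenience, that pattern can be captured in a small helper. `spark_nlp_coordinate` is a hypothetical function written for this illustration and is not shipped with Spark NLP.

```python
def spark_nlp_coordinate(spark_version, gpu=False, release="3.3.1"):
    """Build the --packages coordinate for a given Apache Spark version."""
    major_minor = ".".join(spark_version.split(".")[:2])
    if major_minor in ("3.0", "3.1"):
        suffix, scala = "", "2.12"
    elif major_minor == "2.4":
        suffix, scala = "-spark24", "2.11"
    elif major_minor == "2.3":
        suffix, scala = "-spark23", "2.11"
    else:
        raise ValueError(f"no Spark NLP {release} package listed for Spark {spark_version}")
    artifact = "spark-nlp" + ("-gpu" if gpu else "") + suffix
    return f"com.johnsnowlabs.nlp:{artifact}_{scala}:{release}"


if __name__ == "__main__":
    # Prints the CPU coordinate for Spark 3.1.x and the GPU one for Spark 2.4.x.
    print(spark_nlp_coordinate("3.1.2"))
    print(spark_nlp_coordinate("2.4.8", gpu=True))
```

The resulting string can be passed to `--packages` on the command line, or set as the `spark.jars.packages` option when building a SparkSession from Python.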

Maven

spark-nlp on Apache Spark 3.0.x and 3.1.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.12</artifactId>
    <version>3.3.1</version>
</dependency>

spark-nlp-gpu:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu_2.12</artifactId>
    <version>3.3.1</version>
</dependency>

spark-nlp on Apache Spark 2.4.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-spark24_2.11</artifactId>
    <version>3.3.1</version>
</dependency>

spark-nlp-gpu:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu-spark24_2.11</artifactId>
    <version>3.3.1</version>
</dependency>

spark-nlp on Apache Spark 2.3.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-spark23_2.11</artifactId>
    <version>3.3.1</version>
</dependency>

spark-nlp-gpu:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu-spark23_2.11</artifactId>
    <version>3.3.1</version>
</dependency>

FAT JARs

CPU on Apache Spark 3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-3.3.1.jar
GPU on Apache Spark 3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-3.3.1.jar
CPU on Apache Spark 2.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-spark24-assembly-3.3.1.jar
GPU on Apache Spark 2.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-spark24-assembly-3.3.1.jar
CPU on Apache Spark 2.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-spark23-assembly-3.3.1.jar
GPU on Apache Spark 2.3.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-spark23-assembly-3.3.1.jar

This discussion was created from the release John Snow Labs Spark-NLP 3.3.1: New EntityRuler annotator, better integration with TokenClassification annotators, new state-of-the-art XLM-RoBERTa models in African Languages, and bug fixes!.
