How can I know on which exact word a model had results? #6253
Replies: 1 comment
Hi. If you skip past the training part of this notebook to the Prediction Pipeline section, you can see an example of both outputs: https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Public/4.NERDL_Training.ipynb. Keep in mind that if you care about the O tags, you get the labels with the B- and I- prefixes, and if you care about entities such as PERSON instead of B-PERSON, then you don’t want to see the O. This is the intended behavior, but you can always manipulate the output of both annotators to get the desired result when the defaults are not suitable.
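As a rough illustration of that kind of post-processing, here is a minimal plain-Python sketch. It assumes the token texts and the per-token B-/I-/O tags have already been pulled out of the pipeline output into two lists of equal length (for example from the `result` fields of the token and NER annotations). The helper names `strip_iob` and `per_token_labels` and the sample tags are made up for this example and are not part of Spark NLP.

```python
def strip_iob(tag):
    """Map 'B-PERSON' or 'I-PERSON' to 'PERSON'; leave 'O' unchanged."""
    if tag[:2] in ("B-", "I-"):
        return tag[2:]
    return tag


def per_token_labels(tokens, iob_tags):
    """Pair every token with its entity label, keeping 'O' for unmatched tokens."""
    if len(tokens) != len(iob_tags):
        raise ValueError("tokens and tags must have the same length")
    return [(token, strip_iob(tag)) for token, tag in zip(tokens, iob_tags)]


# Hypothetical values for the sentence from the question, not real model output.
tokens = ["Joe", "and", "John", "like", "to", "walk"]
iob_tags = ["B-PERSON", "O", "B-PERSON", "O", "O", "O"]

rows = per_token_labels(tokens, iob_tags)
for token, label in rows:
    print(token, label)
```

Because the result has exactly one entry per token, it can be iterated a fixed number of times, and other per-token outputs such as lemmas could be zipped in the same way if they are also aligned one-to-one with the tokens.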
When I use the CoreNLP library in Java and run an operation such as tokenization or NER on a sentence, the output looks like this, for example: [PERSON, O, O, O, O, O, O, O]. When I use Spark NLP for the same operations, I get no value for the "unmatched" words, which makes the work pretty hard if you want to run multiple models on the same sentence and collect metrics for each word. For each word of a sentence, I want to know which NER label matched, its tokenized form, and its lemma. The outputs I currently get are not [PERSON, O, O, O, O, O, O, O] as in CoreNLP, but things like [PERSON], i.e. only the "matched" result.
Is there any way to fix the position of each word (split on spaces, for instance) so that every model returns a comma-separated result with the same length as my input text? That would let me iterate a fixed number of times and read each model's output for each word. For example, for the sentence "Joe and John like to walk", I would like to iterate over each word and get, for the word "John", the NER label PERSON along with its corresponding token.