Hi, all. Has there been any follow-up on this issue? I am seeing it as well. Out of a dataset of 320,000 names, probablepeople failed to parse about 19,000, and 11,000 of those failures were caused by this exact issue.
I tried following parserator's instructions for training the model with additional examples: I used parserator's label utility to create 11 examples and then trained my model with them. The training step reports that it wrote out an updated .crfsuite file, but I cannot find an updated copy of this file anywhere, and the model's behavior has not changed. (The only .crfsuite files I see are the three that were installed with probablepeople, and they have retained their original last-modified timestamps.)
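For anyone trying to reproduce the search for the updated model file, here is a minimal sketch that lists every .crfsuite file under one or more directories along with its last-modified time. It uses only the standard library; the helper name `find_crfsuite_files` and the choice of directories to scan are my own assumptions, not anything provided by parserator or probablepeople.

```python
from datetime import datetime
from pathlib import Path


def find_crfsuite_files(*roots):
    """Return (path, last_modified) pairs for .crfsuite files under roots, newest first.

    Hypothetical helper for illustration; not part of parserator or probablepeople.
    """
    found = []
    for root in roots:
        # rglob walks the directory tree recursively
        for path in Path(root).rglob("*.crfsuite"):
            modified = datetime.fromtimestamp(path.stat().st_mtime)
            found.append((path, modified))
    return sorted(found, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Assumption: the training command was run from the current directory.
    for path, modified in find_crfsuite_files(Path.cwd()):
        print(modified.isoformat(timespec="seconds"), path)
```

Passing both the working directory used for training and the installed package directory should show whether a newer file was written somewhere other than where the installed package loads its model from.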
ORIGINAL STRING: Bianchette, Michael David
PARSED TOKENS: [('Bianchette,', 'Surname'), ('Michael', 'GivenName'), ('David', 'Surname')]
UNCERTAIN LABEL: Surname
When this error is raised, it's likely that either (1) the string is not a valid person/corporation name or (2) some tokens were labeled incorrectly
To report an error in labeling a valid name, open an issue at https://github.com/datamade/probablepeople/issues/new - it'll help us continue to improve probablepeople!
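To make the failure mode above concrete, here is a small sketch that flags labels which reappear after a different label has intervened, which is what the parsed tokens in this error show for `Surname`. The function `repeated_labels` is a hypothetical helper written for illustration; it reflects my understanding of the condition behind the error and is not probablepeople's actual code.

```python
def repeated_labels(parsed_tokens):
    """Return labels that occur in more than one non-adjacent run.

    parsed_tokens is a list of (token, label) pairs, in the same shape
    as the PARSED TOKENS line in the error message.
    """
    seen = set()
    repeated = []
    previous = None
    for _token, label in parsed_tokens:
        # A label that was seen earlier, but not on the previous token,
        # has started a second separate run.
        if label != previous and label in seen and label not in repeated:
            repeated.append(label)
        seen.add(label)
        previous = label
    return repeated


parsed = [("Bianchette,", "Surname"), ("Michael", "GivenName"), ("David", "Surname")]
print(repeated_labels(parsed))  # ['Surname']
```

Running a check like this over the token lists from failed names is one way to count how many failures share the same repeated label.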