namaco

namaco is a library for character-based Named Entity Recognition (NER). It focuses especially on Japanese and Chinese named entity recognition.

Demo

The following demo shows Chinese Named Entity Recognition:

Feature Support

namaco provides the following features:

Training a model on your own data.
Tagging sentences with a trained model.

Install

To install namaco, simply run:

$ pip install namaco

Data format

The data must be in the following tab-separated (TSV) format, with one character and its tag per line and a blank line between sentences:

安	B-PERSON
倍	E-PERSON
首	O
相	O
が	O
訪	O
米	S-LOC
し	O
た	O
 
本	B-DATE
日	E-DATE

Get Started

Import

First, import the necessary modules:

import os
import namaco
from namaco.data.reader import load_data_and_labels
from namaco.data.preprocess import prepare_preprocessor
from namaco.config import ModelConfig, TrainingConfig
from namaco.models import CharNER

These modules provide the data loader, the preprocessor, the configuration classes and the model.

Then, set the parameters used later:

DATA_ROOT = 'data/ja/ner'
SAVE_ROOT = './models'  # trained model
LOG_ROOT = './logs'     # checkpoint, tensorboard
model_file = os.path.join(SAVE_ROOT, 'model.h5')
model_config = ModelConfig()
training_config = TrainingConfig()

Loading data

After importing the modules, read the training and validation data:

train_path = os.path.join(DATA_ROOT, 'train.txt')
valid_path = os.path.join(DATA_ROOT, 'valid.txt')
x_train, y_train = load_data_and_labels(train_path)
x_valid, y_valid = load_data_and_labels(valid_path)

After reading the data, prepare the preprocessor and the model:

p = prepare_preprocessor(x_train, y_train)
model = CharNER(model_config, p.vocab_size(), p.tag_size())
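
To give an intuition for the vocabulary and tag sizes passed to the model, here is a minimal sketch of how a character-level preprocessor could map characters and tags to integer ids. It is not namaco's implementation: build_vocab, encode, and the <PAD> and <UNK> entries are assumptions made for this example.

```python
from typing import Dict, Iterable, List, Sequence

PAD, UNK = '<PAD>', '<UNK>'  # assumed special tokens, not taken from namaco


def build_vocab(sequences: Iterable[Sequence[str]],
                specials: Sequence[str] = (PAD, UNK)) -> Dict[str, int]:
    """Assign an integer id to every distinct item, in order of first use."""
    vocab = {tok: i for i, tok in enumerate(specials)}
    for seq in sequences:
        for item in seq:
            vocab.setdefault(item, len(vocab))
    return vocab


def encode(seq: Sequence[str], vocab: Dict[str, int]) -> List[int]:
    """Map characters to ids, falling back to the UNK id for unseen ones."""
    unk = vocab[UNK]
    return [vocab.get(ch, unk) for ch in seq]


toy_x = [['安', '倍', '首', '相'], ['本', '日']]
toy_y = [['B-PERSON', 'E-PERSON', 'O', 'O'], ['B-DATE', 'E-DATE']]

char_vocab = build_vocab(toy_x)
tag_vocab = build_vocab(toy_y, specials=(PAD,))

print(len(char_vocab), len(tag_vocab))
print(encode(['安', '米'], char_vocab))  # '米' is unseen, so it maps to UNK
```

In this sketch the two dictionary sizes play the role that p.vocab_size() and p.tag_size() play above: they determine the input and output dimensions of the model.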

Now we are ready for training :)

Training a model

Let's train a model. To do so, we can use Trainer, which manages the whole training process. Create an instance of the Trainer class and pass the training and validation data to its train method:

trainer = namaco.Trainer(model,
                         model.loss,
                         training_config,
                         log_dir=LOG_ROOT,
                         save_path=model_file,
                         preprocessor=p)
trainer.train(x_train, y_train, x_valid, y_valid)

If training is progressing normally, a progress bar is displayed as follows:

...
Epoch 3/15
702/703 [============================>.] - ETA: 0s - loss: 60.0129 - f1: 89.70
703/703 [==============================] - 319s - loss: 59.9278   
Epoch 4/15
702/703 [============================>.] - ETA: 0s - loss: 59.9268 - f1: 90.03
703/703 [==============================] - 324s - loss: 59.8417   
Epoch 5/15
702/703 [============================>.] - ETA: 0s - loss: 58.9831 - f1: 90.67
703/703 [==============================] - 297s - loss: 58.8993   
...

Tagging a sentence

We can use Tagger to tag text. Create an instance of the Tagger class and pass it the text to tag:

tagger = namaco.Tagger(model_file, preprocessor=p, tokenizer=list)

Let's try tagging the sentence 安倍首相が訪米した. We can do it as follows:

>>> sent = '安倍首相が訪米した'
>>> tagger.analyze(sent)
{
  "language": "jp",
  "text": "安倍首相が訪米した",
  "entities": [
    {
      "text": "安倍",
      "type": "Person",
      "score": 0.972231,
      "beginOffset": 0,
      "endOffset": 2
    },
    {
      "text": "米",
      "type": "Location",
      "score": 0.941431,
      "beginOffset": 6,
      "endOffset": 7
    }
  ]
}
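
The offsets in this output correspond to character spans in the input sentence. As a sketch of how per-character tags like those in the Data format section could be turned into such spans, the hypothetical function below decodes B-/E-/S- tags into (type, begin, end) triples with an exclusive end offset. It is not namaco's decoder, and the handling of an I- prefix for characters inside longer entities is an assumption, since the sample data does not show one.

```python
from typing import List, Tuple


def bioes_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Decode per-character tags into (type, begin, end) spans.

    Hypothetical helper for illustration; `end` is exclusive.
    """
    spans = []
    start, current = None, None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition('-')
        if prefix == 'S':
            # Single-character entity.
            spans.append((label, i, i + 1))
            start, current = None, None
        elif prefix == 'B':
            # Open a new multi-character entity.
            start, current = i, label
        elif prefix == 'E' and start is not None and label == current:
            # Close the open entity.
            spans.append((label, start, i + 1))
            start, current = None, None
        elif prefix == 'I' and start is not None and label == current:
            # Assumed "inside" tag: the open entity continues.
            continue
        else:
            # 'O' or an inconsistent tag: drop any open entity.
            start, current = None, None
    return spans


sent = '安倍首相が訪米した'
tags = ['B-PERSON', 'E-PERSON', 'O', 'O', 'O', 'O', 'S-LOC', 'O', 'O']
spans = bioes_to_spans(tags)
entities = [
    {'text': sent[b:e], 'type': t, 'beginOffset': b, 'endOffset': e}
    for t, b, e in spans
]
print(entities)
```

The resulting offsets (0 to 2 for 安倍, 6 to 7 for 米) match those in the output above. The sketch keeps the raw tag names such as PERSON and LOC and does not map them to display names or compute scores.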
