NNSegmentation is a word segmentation package that uses neural networks and is built on the LibN3L package. It combines several neural network architectures (TNN, RNN, GatedNN, LSTM, and GRNN) with several objective functions (Softmax, CRF Max-Margin, and CRF Maximum Likelihood). Sparse features can be combined with any of these models. In addition, the package can support user-defined neural network structures.
Please see Table 4 in "LibN3L: A Lightweight Package for Neural NLP".
- Download the LibN3L library and compile it.
- Open CMakeLists.txt and change "../LibN3L/" to the path of your LibN3L directory.
- Run "cmake ." and then "make" to build the package.
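The path edit in CMakeLists.txt can also be scripted. The following is a minimal sketch, not part of the package: the helper name `patch_libn3l_path` and the `/path/to/LibN3L` location are hypothetical, and it assumes that CMakeLists.txt refers to LibN3L exactly as "../LibN3L/".

```shell
#!/bin/sh
# Hypothetical helper: replace the default "../LibN3L/" reference in a
# CMakeLists.txt with a user-supplied LibN3L directory.
#   $1: path to CMakeLists.txt
#   $2: LibN3L directory (must not contain '#' or '&', which sed would
#       interpret specially in this substitution)
patch_libn3l_path() {
    cmake_file=$1
    # Drop one trailing slash, if present, so exactly one is written back.
    libn3l_dir=${2%/}
    tmp_file="${cmake_file}.tmp"
    # '#' is the sed delimiter because the paths themselves contain '/'.
    sed "s#\.\./LibN3L/#${libn3l_dir}/#g" "$cmake_file" > "$tmp_file" &&
        mv "$tmp_file" "$cmake_file"
}

# Usage, from the NNSegmentation directory (the path is a placeholder):
#   patch_libn3l_path CMakeLists.txt /path/to/LibN3L && cmake . && make
```

Writing to a temporary file and moving it into place avoids `sed -i`, whose syntax differs between GNU and BSD sed.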
This example shows how to train three Chinese word segmentation models on the PKU corpus of the SIGHAN Bakeoff 2005 dataset.
The models are:
- SparseCRFMMLabeler, which considers only sparse features and works like a CRF model.
- LSTMCRFMMLabeler, which uses only neural embeddings as input and employs CRF Max-Margin as the training objective.
- SparseLSTMCRFMMLabeler, which supports both neural embeddings and sparse features and also employs CRF Max-Margin as the training objective.
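For reference, the two CRF objectives the package offers, Max-Margin and Maximum Likelihood, are commonly written as below. This is a textbook sketch with assumed notation, and the exact form implemented in LibN3L may differ: s(x, y) is the model score of label sequence y for input x, and Δ(y, y') is a cost such as the Hamming distance between the gold sequence y and a candidate y'.

```latex
% Maximum Likelihood: negative log-likelihood of the gold sequence y
\[
\mathcal{L}_{\mathrm{ML}}(x, y) = \log \sum_{y'} \exp s(x, y') - s(x, y)
\]
% Max-Margin: cost-augmented best candidate score minus the gold score
\[
\mathcal{L}_{\mathrm{MM}}(x, y) = \max_{y'} \bigl[ s(x, y') + \Delta(y, y') \bigr] - s(x, y)
\]
```

The sum in the first loss and the max in the second both range over all candidate label sequences y'.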
The example data contains:
- Sparse features: "train.feats", "dev.feats", and "test.feats". The training and dev features are extracted from only a subset of the PKU corpus.
- Character unigram embeddings: "char.vec"
- Character bigram embeddings: "bi.vec"
- Character trigram embeddings: "tri.vec"
- Parameter setting files: "sparse" for SparseCRFMMLabeler, "lstm" for LSTMCRFMMLabeler, and "sparselstm" for SparseLSTMCRFMMLabeler.
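Before training, it can help to confirm that all of the files listed above are present. The following is a minimal sketch: the function name `missing_example_files` is hypothetical, and the directory that holds the example data is an assumption you pass in yourself.

```shell
#!/bin/sh
# Hypothetical helper: print the name of each example file listed above
# that is missing from a directory, one per line. Prints nothing if all
# nine files are present.
#   $1: directory holding the example data (defaults to the current one)
missing_example_files() {
    data_dir=${1:-.}
    for f in train.feats dev.feats test.feats \
             char.vec bi.vec tri.vec \
             sparse lstm sparselstm; do
        [ -f "$data_dir/$f" ] || printf '%s\n' "$f"
    done
}

# Usage (the directory name is a placeholder):
#   missing_example_files path/to/example
```

An empty result means every feature, embedding, and parameter file named in the list was found.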
For more details, see the example's "readme.md".