Article
Saedi, Chakaveh, António Branco, João António Rodrigues and João Ricardo Silva, 2018, "WordNet Embeddings", In Proceedings of the 3rd Workshop on Representation Learning for Natural Language Processing (RepL4NLP), 56th Annual Meeting of the Association for Computational Linguistics, 15-20 July 2018, Melbourne, Australia.
WordNet used in the above paper
Test sets used in the above paper
Please note that the semantic-network-to-semantic-space method presented in the above paper includes randomized subprocedures (e.g. selecting one word from a set of words with an identical number of outgoing edges). The test scores may therefore fluctuate slightly across different runs of the code.
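If you need more stable scores across runs, one option is to pin the random seeds before starting the pipeline. The sketch below assumes the randomized subprocedures draw on Python's random module and NumPy's numpy.random, which is an assumption about the code rather than something documented here:

# Minimal sketch: pin the random seeds for more reproducible runs.
# Assumes the randomized subprocedures use Python's random and/or
# NumPy's numpy.random; adjust if another RNG is involved.
import random
import numpy as np

random.seed(42)
np.random.seed(42)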
Models
The best wnet2vec model we have obtained, run with 60,000 words from Princeton WordNet 3.0 and referred to in the article, is available for download here.
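Once downloaded, the embeddings can typically be inspected with gensim (one of the dependencies listed below). The sketch assumes the model file is in the plain-text word2vec format and uses wnet2vec.txt as a placeholder file name; both are assumptions, not a description of the actual distribution format.

# Sketch: load the downloaded wnet2vec embeddings with gensim.
# "wnet2vec.txt" is a placeholder name and the word2vec text format
# is an assumption; pass binary=True instead for a binary file.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("wnet2vec.txt", binary=False)
print(vectors.most_similar("dog", topn=5))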
How to run wn2vec software
To provide input files to the software, the following directory structure must exist:
|-- main.py
|-- data
| |-- input
| | |-- language_wnet
| | | |-- *wnet_files
| | |-- language_testset
| | | |-- *testset_files
| |-- output
|-- modules
| |-- input_output.py
| |-- sort_rank_remove.py
| |-- vector_accuracy_checker.py
| |-- vector_distance.py
| |-- vector_generator.py
Here, language is the language you are using, which must be indicated in main.py in the variable lang. If the language isn't supported by the current path routing in the code, which was mainly used for our experiments, you may add the path to the directory in the files input_output.py, vector_generator.py and vector_accuracy_checker.py.
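As an illustration, the expected input and output directories for one language could be created along these lines; "english" is only a hypothetical value for lang, and the actual WordNet and test set files still have to be copied into the *_wnet and *_testset folders by hand.

# Sketch: create the expected data layout for one language.
# Run from the repository root, so data/ sits next to main.py.
# "english" is only an illustrative value for the lang variable in main.py;
# the wnet and testset files themselves must still be added manually.
from pathlib import Path

lang = "english"
for sub in (lang + "_wnet", lang + "_testset"):
    Path("data", "input", sub).mkdir(parents=True, exist_ok=True)
Path("data", "output").mkdir(parents=True, exist_ok=True)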
Various variables controlling the output of the model, such as the embedding dimension, can be set in main.py.
To run the software, you will need the following packages:
- numpy
- progressbar
- keras
- sklearn
- scipy
- gensim
Python 3.5 was used for the experiments.
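Before running main.py, it may help to check that each dependency is importable in the active environment (note that sklearn is installed via pip as scikit-learn). A minimal check:

# Sketch: verify that the required packages are importable.
# The names below are import names, not necessarily pip package names
# (e.g. sklearn is installed as scikit-learn).
import importlib

for pkg in ("numpy", "progressbar", "keras", "sklearn", "scipy", "gensim"):
    try:
        importlib.import_module(pkg)
        print("{}: OK".format(pkg))
    except ImportError:
        print("{}: missing".format(pkg))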