SO_word2vec

A word2vec model trained on Stack Overflow posts

This repository contains information related to the word2vec model presented in the paper 'Word Embeddings for the Software Engineering Domain', published in the Data Showcase track of MSR '18.

Model file

The pre-trained model is stored in a .bin file (approximately 1.5 GB), which is available at: http://doi.org/10.5281/zenodo.1199620

Instructions on how to use the model

Prerequisites

To load the model you will need Python 3.5 and the gensim library.

Loading the model

from gensim.models.keyedvectors import KeyedVectors
word_vect = KeyedVectors.load_word2vec_format("SO_vectors_200.bin", binary=True)
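
Because the model file is large, it can help to fail early when the file has not been downloaded yet, and to load only part of the vocabulary while experimenting. The following is a minimal sketch, not part of the repository: `load_so_vectors` is a hypothetical helper name, and it assumes the file has been saved locally as SO_vectors_200.bin. The `limit` argument is passed through to gensim's `load_word2vec_format`, which then reads only the first `limit` vectors from the file.

```python
import os


def load_so_vectors(path="SO_vectors_200.bin", limit=None):
    """Load the pre-trained vectors, failing early if the file is missing.

    Hypothetical helper for illustration; not part of the repository.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError("Model file not found: {}".format(path))
    # Imported lazily so the path check works even before gensim is installed.
    from gensim.models.keyedvectors import KeyedVectors
    # With limit=N, only the first N vectors in the file are read,
    # which reduces memory use and loading time.
    return KeyedVectors.load_word2vec_format(path, binary=True, limit=limit)
```

For example, `word_vect = load_so_vectors(limit=100000)` would load only the first 100,000 vectors in the file, at the cost of KeyError for any word beyond that cut-off.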

Querying the model

Examples of semantic similarity queries

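words = ['virus', 'java', 'mysql']
for w in words:
    try:
        # Print the nearest neighbours of w in the embedding space
        print(word_vect.most_similar(w))
    except KeyError as e:
        # Raised when w is not in the model's vocabulary
        print(e)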

print(word_vect.doesnt_match("java c++ python bash".split()))
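
To make the ranking behind these queries concrete: most_similar orders candidate words by the cosine similarity between their vectors and the query vector. The sketch below reproduces that idea in plain Python on made-up 3-dimensional vectors. The vectors, the word list, and the names `cosine_similarity` and `most_similar_toy` are all assumptions for illustration and are not taken from the model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors with invented values; the real model uses 200 dimensions.
toy_vectors = {
    "java": [0.9, 0.1, 0.0],
    "kotlin": [0.8, 0.2, 0.1],
    "mysql": [0.0, 0.1, 0.9],
}


def most_similar_toy(word, vectors, topn=2):
    """Rank every other word by cosine similarity to `word`."""
    query = vectors[word]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


print(most_similar_toy("java", toy_vectors))
```

In this toy vocabulary "kotlin" ranks above "mysql" for the query "java", because its vector points in nearly the same direction.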

Examples of analogy queries

print(word_vect.most_similar(positive=['python', 'eclipse'], negative=['java']))
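
The analogy query above can be read as "java is to eclipse as python is to ?". Conceptually, the positive vectors are added, the negative vectors are subtracted, and the remaining words are ranked by cosine similarity to the result, with the query words excluded. The sketch below shows this on made-up vectors. It is a simplification (gensim also normalises the vectors to unit length before combining them), and the vectors and the name `analogy_toy` are assumptions for illustration, not output from the model.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Invented axes: [java-related, python-related, editor/IDE-related]
toy = {
    "java": [1.0, 0.0, 0.0],
    "python": [0.0, 1.0, 0.0],
    "eclipse": [1.0, 0.0, 1.0],
    "pycharm": [0.0, 1.0, 1.0],
    "vim": [0.5, 0.5, 1.0],
}


def analogy_toy(positive, negative, vectors, topn=1):
    """Add positive vectors, subtract negative ones, rank the rest by cosine."""
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    for w in positive:
        target = [t + x for t, x in zip(target, vectors[w])]
    for w in negative:
        target = [t - x for t, x in zip(target, vectors[w])]
    exclude = set(positive) | set(negative)  # query words are never returned
    ranked = sorted(
        ((w, cosine(target, vec)) for w, vec in vectors.items() if w not in exclude),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:topn]


print(analogy_toy(["python", "eclipse"], ["java"], toy))
```

With these toy values, the combined vector coincides with the one assigned to "pycharm", so that word ranks first. Results from the actual model depend on the trained vectors.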

References

The official gensim docs provide further details and comprehensive documentation on how a word2vec model can be used for various NLP tasks.
If you use this model, please cite: Efstathiou, V., Chatzilenas, C., Spinellis, D., 2018. "Word Embeddings for the Software Engineering Domain". In Proceedings of the 15th International Conference on Mining Software Repositories. ACM.
