A sentence similarity prediction tool based on the combination of pre-trained Transformer-based models.
!pip install https://github.com/anasampa/metaembedding/archive/plusmetatool.zip
from metaembedding.metaembedding import MetaEmbedding
List of possible models: https://www.sbert.net/docs/pretrained_models.html
model1 = 'paraphrase-multilingual-mpnet-base-v2'
model2 = 'paraphrase-multilingual-MiniLM-L12-v2'
model3 = 'distiluse-base-multilingual-cased-v1'
embedding_models = [model1, model2, model3]
model = MetaEmbedding(embedding_models)
model.summary()
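Before training, prepare the training data. As a minimal sketch (the exact format expected by train_run is an assumption; here X_train is a list of sentence pairs and Y_train their gold similarity scores):

# Hypothetical training data: sentence pairs with gold similarity scores
X_train = [['A man is playing a guitar', 'A person plays an instrument'],
           ['The cat sleeps on the sofa', 'Stock prices fell on Monday']]
Y_train = [4.5, 0.2]  # assumed STS-style scores; the actual scale may differ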
model.train_run(X_train, Y_train, epochs=2)
model.predict_model([['I am a sentence', 'I am another sentence']])
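Several pairs can be scored in one call by passing a list of pairs (the return format, assumed here to be one similarity score per pair, may vary):

# Hypothetical batch prediction over multiple sentence pairs
pairs = [['I am a sentence', 'I am another sentence'],
         ['The weather is nice', 'It is sunny today']]
scores = model.predict_model(pairs)  # assumed: one similarity score per pair
print(scores)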
model.save_weights('name_of_file')
weight = 'name_of_file'
embedding_models = [model1, model2, model3]
model = MetaEmbedding(embedding_models)
model.load_weights(weight)
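Once the weights are loaded, the model can be used for prediction directly, without retraining:

model.predict_model([['I am a sentence', 'I am another sentence']])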
Generally, sentence similarity in Natural Language Processing is a multi-text regression task.
The original LIME text module only supports classification tasks with single-text inputs, which prevents its direct use for similarity comparison.
We extended the original LIME so it can be applied to models that take multiple texts as input, such as pairs of sentences, and to regression models over text inputs.
More about the sentence similarity task and this LIME extension can be found in "Sentence Similarity Recognition in Portuguese from Multiple Embedding Models" (citation at the end of this README).
Original LIME (without the extension): https://github.com/marcotcr/lime
s1 = 'I am a sentence'
s2 = 'I am another sentence'
pair = [s1, s2]
model.lime.explain_in_notebook(pair, num_features=30, num_samples=50)
PS: The token [SEP] is displayed to indicate sentence separation. It is not considered in the predictions of the model.
model.lime.explain_as_list(pair, num_features=30, num_samples=50)
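For illustration, assuming explain_as_list follows the convention of the original LIME as_list output and returns (token, weight) pairs, the explanation can be post-processed like this:

# Hypothetical post-processing, assuming a list of (token, weight) tuples is returned
explanation = model.lime.explain_as_list(pair, num_features=30, num_samples=50)
top = sorted(explanation, key=lambda tw: abs(tw[1]), reverse=True)[:5]  # 5 most influential tokens
for token, weight in top:
    print(f'{token}: {weight:.3f}')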
@inproceedings{rodrigues2022sentence,
title={Sentence Similarity Recognition in Portuguese from Multiple Embedding Models},
author={Rodrigues, Ana Carolina and Marcacini, Ricardo M.},
booktitle={2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)},
pages={154--159},
year={2022},
organization={IEEE}
}