This project studies the assessment protocols for measures of semantic distance between words and concepts. If you find it helpful, please consider citing:
Hadj Taieb, M.A., Zesch, T. & Ben Aouicha, M. A survey of semantic relatedness evaluation datasets and procedures. Artif Intell Rev (2019). https://doi.org/10.1007/s10462-019-09796-3
This folder contains the datasets used to assess semantic similarity/relatedness measures for different languages.

| Dataset | # pairs | Year | Type | Ref | Link |
| --- | --- | --- | --- | --- | --- |
| **English (EN)** | | | | | |
| MayoSRS | 101 | 2011 | Rel | (Pakhomov et al., 2011) | link1 link2 link3 |
| UMNSRS | 587 | 2010 | Rel | (Pakhomov et al., 2010) | link1 link2 |
| **Chinese (CN)** | | | | | |
| Words | 240 | 2012 | Rel | (Wang et al., 2012) | link1 link2 |

| Dataset | # pairs | Year | Type | Ref | Link |
| --- | --- | --- | --- | --- | --- |
| **English (EN)** | | | | | |
| GTRD | 66 | 2018 | Rel | (Chen et al., 2018) | link1 link2 |
| RG65 | 65 | | Sim | | |
| RG65, English (EN) | | | | | link1 link2 |
| RG65, French (FR) | | 2011 | Sim | (Joubarne and Inkpen, 2011) | link1 link2 |
| RG65, French (FR) | | 2018 | Sim | (Barzegar et al., 2018) | link1 link2 |
| RG65, Persian (FA) | | | | | link1 link2 |
| RG65, Portuguese (PT) | | 2014 | Sim | (Granada et al., 2014) | link1 link2 |
| RG65, Spanish (ES) | | | | | link1 link2 |
| RG65, Swedish (SE) | | | | | link1 link2 |
| WordSim353 | 353 | | Rel | | link1 link2 |
| SimLex999 | 999 | | Sim | | link1 link2 |

| Dataset | Language 1 | Language 2 | # pairs | Year | Type | Ref | Link |
| --- | --- | --- | --- | --- | --- | --- | --- |
| WordSim353_DE_IT | German | Italian | 589 | 2015 | Rel | (Camacho-Collados et al., 2015) | link1 link2 |
| RG65_DE_ES | German | Spanish | 125 | 2015 | Sim | (Camacho-Collados et al., 2015) | link1 link2 |
| RG65_PT_FA | Portuguese | Persian | 122 | 2015 | Sim | (Camacho-Collados et al., 2015) | link1 link2 |
| RG65_FR_PT | French | Portuguese | 92 | 2015 | Sim | (Camacho-Collados et al., 2015) | link1 link2 |
| RG65_FR_FA | French | Persian | 100 | 2015 | Sim | (Camacho-Collados et al., 2015) | link1 link2 |
| RG65_FR_ES | French | Spanish | 103 | 2015 | Sim | (Camacho-Collados et al., 2015) | link1 link2 |
| RG65_FR_DE | French | German | 96 | 2015 | Sim | (Camacho-Collados et al., 2015) | link1 link2 |
| RG65_ES_PT | Spanish | Portuguese | 113 | 2015 | Sim | (Camacho-Collados et al., 2015) | link1 link2 |
| RG65_ES_FA | Spanish | Persian | 122 | 2015 | Sim | (Camacho-Collados et al., 2015) | link1 link2 |
| RG65_EN_PT | English | Portuguese | 120 | 2015 | Sim | (Camacho-Collados et al., 2015) | link1 link2 |
| RG65_EN_FR | English | French | 100 | 2015 | Sim | (Camacho-Collados et al., 2015) | link1 link2 |
| RG65_EN_FA | English | Persian | 120 | 2015 | Sim | (Camacho-Collados et al., 2015) | link1 link2 |
| RG65_EN_ES | English | Spanish | 126 | 2015 | Sim | (Camacho-Collados et al., 2015) | link1 link2 |
| RG65_EN_DE | English | German | 125 | 2015 | Sim | (Camacho-Collados et al., 2015) | link1 link2 |
| RG65_DE_PT | German | Portuguese | 118 | 2015 | Sim | (Camacho-Collados et al., 2015) | link1 link2 |
| RG65_DE_FA | German | Persian | 122 | 2015 | Sim | (Camacho-Collados et al., 2015) | link1 link2 |
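
Datasets of this kind pair two words with a human rating, and a measure is commonly assessed by how well its scores correlate with those ratings. The sketch below illustrates one such check using Spearman rank correlation. It is only an illustration under stated assumptions: the in-memory `(word1, word2, rating)` layout, the function names, the toy pairs and ratings, and the naive `char_overlap` measure are all hypothetical and are not taken from this repository or from any listed dataset.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical layout: (word1, word2, human_rating). Real files may differ.
Pair = Tuple[str, str, float]


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(gold: Sequence[Pair], measure: Callable[[str, str], float]) -> float:
    """Correlate a measure's scores with the human ratings in `gold`."""
    human = [rating for _, _, rating in gold]
    predicted = [measure(w1, w2) for w1, w2, _ in gold]
    return spearman(human, predicted)


def char_overlap(w1: str, w2: str) -> float:
    """Deliberately naive stand-in measure: Jaccard overlap of character sets."""
    a, b = set(w1), set(w2)
    return len(a & b) / len(a | b)


# Toy pairs with made-up ratings, for illustration only.
TOY_GOLD: List[Pair] = [
    ("cat", "feline", 4.0),
    ("cat", "dog", 3.0),
    ("cat", "car", 1.0),
    ("cat", "idea", 0.0),
]

if __name__ == "__main__":
    print(f"Spearman correlation: {evaluate(TOY_GOLD, char_overlap):.3f}")
```

To try a real measure, replace `char_overlap` with any function that maps two words to a score and load the gold pairs from the dataset file of interest. Rank correlation is used here because it depends only on the ordering of the scores, not on their scale.
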
- Almarsoomi, F.A., O’Shea, J., Bandar, Z., Crockett, K.A., 2013. AWSS: An Algorithm for Measuring Arabic Word Semantic Similarity
- Saif, A., Aziz, M.J.A., Omar, N., 2014. Evaluating knowledge-based semantic measures on Arabic. International Journal on Communications Antenna and Propagation 4, 180–194.
- Rubenstein, H., Goodenough, J.B., 1965. Contextual Correlates of Synonymy. Commun. ACM 8, 627–633.
- Miller, G.A., Charles, W.G., 1991. Contextual correlates of semantic similarity. Language & Cognitive Processes 6, 1–28.
- Wu, Y., Li, W., 2016. Overview of the NLPCC-ICCPOL 2016 Shared Task: Chinese Word Similarity Measurement, in: Natural Language Understanding Intelligent Applications - 5th CCF Conference Natural Language Processing Chinese Computing, NLPCC 2016, 24th International Conference Computer Processing Oriental Languages, ICCPOL 2016, Kunming, China, December 26, 2016, Proceedings. pp. 828–839.
- Camacho-Collados, J., Pilehvar, M.T., Navigli, R., 2015. A Framework for the Construction of Monolingual and Cross-lingual Word Similarity Datasets, in: ACL(2). The Association for Computer Linguistics, pp. 1–7.
- Gurevych, I., 2005. Using the Structure of a Conceptual Network in Computing Semantic Relatedness, in: Natural Language Processing IJCNLP 2005, Second International Joint Conference, Jeju Island, Korea, October 11-13, 2005, Proceedings. pp. 767–778.
- Joubarne, C., Inkpen, D., 2011. Comparison of Semantic Similarity for Different Languages Using the Google n-gram Corpus and Second-Order Co-occurrence Measures, in: Advances Artificial Intelligence - 24th Canadian Conference Artificial Intelligence, Canadian AI 2011, St. John’s, Canada, May 25-27, 2011. Proceedings. pp. 216–221.
- S.V.S. Pakhomov, T. Pedersen, B. McInnes, G.B. Melton, A. Ruggieri, C.G. Chute: Towards a framework for developing semantic relatedness reference standards. J Biomed Inform, 44 (2011), pp. 251-265.
- Pakhomov, S., McInnes, B., Adam, T., Liu, Y., Pedersen, T., Melton, G.B., 2010. Semantic Similarity and Relatedness between Clinical Terms: An Experimental Study. AMIA Annual Symposium proceedings / AMIA Symposium. AMIA Symposium 2010, 572–576.
- Zugang Chen, Jia Song and Yaping Yang: An Approach to Measuring Semantic Relatedness of Geographic Terminologies Using a Thesaurus and Lexical Database Sources, International Journal of Geo-Information, 2018.
- X. Wang, Y. Jia, B. Zhou, Z. Ding, Z. Liang: Computing semantic relatedness using Chinese Wikipedia links and taxonomy, J. Chinese Comput. Syst., 32 (11) (2012), pp. 2237-2242.
- Cramer, I., Finthammer, M., 2008. An Evaluation Procedure for Word Net Based Lexical Chaining: Methods and Issues, in: Proceedings Fourth Global WordNet Conference (GWC 2008). University of Szeged, Department of Informatics, Szeged, Hungary.
- Felix Hill, Roi Reichart, and Anna Korhonen. 2015. SimLex-999: Evaluating semantic models with (genuine) similarity estimation. Computational Linguistics, 41(4):665–695.
- Ira Leviant and Roi Reichart. 2015. Separated by an un-common language: Towards judgment language informed vector space modeling. arXiv preprint arXiv:1508.00106.
- Radinsky, K., Agichtein, E., Gabrilovich, E., Markovitch, S., 2011. A Word at a Time: Computing Word Relatedness Using Temporal Semantic Analysis, in: Proceedings 20th International Conference World Wide Web, WWW’11. ACM, Hyderabad, India, pp. 337–346.
- Halawi, G., Dror, G., Gabrilovich, E., Koren, Y., 2012. Large-scale Learning of Word Relatedness with Constraints, in: Proceedings 18th ACM SIGKDD International Conference Knowledge Discovery Data Mining, KDD’12. ACM, Beijing, China, pp. 1406–1414.
- Szumlanski, S.R., Gomez, F., Sims, V.K., 2013. A New Set of Norms for Semantic Relatedness Measures., in: ACL(2). The Association for Computer Linguistics, pp. 890–895.
- Siamak Barzegar, Brian Davis, Manel Zarrouk, Siegfried Handschuh, André Freitas: SemR-11: A Multi-Lingual Gold-Standard for Semantic Similarity and Relatedness for Eleven Languages. LREC 2018
- José Camacho-Collados, Mohammad Taher Pilehvar and Roberto Navigli. A Framework for the Construction of Monolingual and Cross-lingual Word Similarity Datasets. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL), Short Papers, Beijing, China, July 27-29, 2015.
- Granada, R., Santos, C.T. dos, Vieira, R., 2014. Comparing Semantic Relatedness between Word Pairs in Portuguese Using Wikipedia, in: Computational Processing Portuguese Language - 11th International Conference, PROPOR 2014, São Carlos/SP, Brazil, October 6-8, 2014. Proceedings. pp. 170–175.
- Ziegler, C.-N., Simon, K., Lausen, G., 2006. Automatic Computation of Semantic Proximity Using Taxonomic Knowledge, in: Proceedings 15th ACM International Conference Information Knowledge Management, CIKM’06. ACM, Arlington, Virginia, USA, pp. 465–474.
- Gracia, J., Mena, E., 2008. Web-based Measure of Semantic Relatedness, in: Proc. 9th International Conference Web Information Systems Engineering (WISE 2008), Auckland (New Zealand). Springer, pp. 136–150.
- Luong, T., Socher, R., Manning, C., 2013. Better Word Representations with Recursive Neural Networks for Morphology, in: Proceedings Seventeenth Conference Computational Natural Language Learning. Association for Computational Linguistics, Sofia, Bulgaria, pp. 104–113.
- Hassan, S., Mihalcea, R., 2009. Cross-lingual Semantic Relatedness Using Encyclopedic Knowledge, in: Proceedings 2009 Conference Empirical Methods Natural Language Processing: Volume 3 - Volume 3, EMNLP’09. Association for Computational Linguistics, Singapore, pp. 1192–1201.
- Kim Anh Nguyen, Sabine Schulte im Walde and Ngoc Thang Vu. Introducing two Vietnamese Datasets for Evaluating Semantic Models of (Dis-)Similarity and Relatedness. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT). New Orleans, Louisiana, June 2018.
- Bui Van Tan, Nguyen Phuong Thai, Pham Van Lam: Construction of a word similarity dataset and evaluation of word similarity techniques for Vietnamese. KSE 2017: Hue, Vietnam, 65-70.
- Ugur Sopaoglu and Gonenc Ercan. Evaluation of Semantic Relatedness Measures for Turkish Language. In Proceedings of the 17th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2016), Konya, Turkey, 2016.
- Gökhan Ercan, Olcay Taner Yildiz: AnlamVer: Semantic Model Evaluation Dataset for Turkish - Word Similarity and Relatedness. COLING 2018: 3819-3836
- Zesch, T., Gurevych, I., 2006. Automatically creating datasets for measures of semantic relatedness, in: COLING/ACL 2006 Workshop Linguistic Distances. Sydney, Australia, pp. 16–24.
- Akhtar, S.S., Gupta, A., Vajpayee, A., Srivastava, A., Shrivastava, M., 2017. Word Similarity Datasets for Indian Languages: Annotation and Baseline Systems, in: LAW@ACL. Association for Computational Linguistics, pp. 91–94.
- Ágoston Tóth: How Similar: Word Similarity Judgments in English and Hungarian, 2013.
- Sakaizawa, Y., Komachi, M., 2017. Construction of a Japanese Word Similarity Dataset. CoRR abs/1703.05916.
- Li, P., Wang, H., Zhu, K.Q., Wang, Z., Wu, X., 2013. Computing Term Similarity by Large Probabilistic isA Knowledge, in: Proceedings 22nd ACM International Conference on Information & Knowledge Management, CIKM’13. ACM, San Francisco, California, USA, pp. 1401–1410.
- Yang, D., Powers, D.M.W., 2006. Verb Similarity on the Taxonomy of Wordnet, in: 3rd International WordNet Conference (GWC-06), Jeju Island, Korea.
- Gerz, D., Vulic, I., Hill, F., Reichart, R., Korhonen, A., 2016. SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity, in: Proceedings 2016 Conference Empirical Methods Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016. pp. 2173–2182.
- Resnik, P., Diab, M.: Measuring verb similarity. In: Proceedings of the Twenty-second Annual Conference of the Cognitive Science Society: August 13-15, 2000, Institute for Research in Cognitive Science, University of Pennsylvania, Philadelphia, PA (2000).
- Martinez-Gil, J., Aldana-Montes, J.F. Semantic similarity measurement using historical google search patterns. Information Systems Frontiers 15(3): 399-410 (2013).
- Baker, S., Reichart, R., Korhonen, A., 2014. An Unsupervised Model for Instance Level Subcategorization Acquisition, in: Proceedings 2014 Conference Empirical Methods Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting SIGDAT, Special Interest Group ACL. pp. 278–289.
- Carina Silberer and Mirella Lapata. 2014. Learning Grounded Meaning Representations with Autoencoders. In Proceedings of ACL 2014, Baltimore, MD.
- Cinková, S., 2016. WordSim353 for Czech, in: Sojka, P., Horák, A., Kopecek, I., Pala, K. (Eds.), Text, Speech, and Dialogue: 19th International Conference, TSD 2016, Brno, Czech Republic, September 12-16, 2016, Proceedings. Springer International Publishing, Cham, pp. 190–197.