MULTILEXICON - Multilingual Lexicon: English, French, Portuguese
MULTILEXICON
The MULTILEXICON is a multilingual word-based lexicon covering English, French, and Brazilian Portuguese.
In this first version, all composed translations from Google Translate were excluded.
The focus was on orthographic information, especially the orthographic neighborhood. Four neighborhood measures were used: Coltheart's N, Levenshtein Distance, OLD20, and Uniqueness Point.
First, these categories were derived for mono-lexicons. Second, they were derived for bi-lexicons and all-lexicons.
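As a rough illustration of the four measures named above, the following Python sketch uses their standard textbook definitions. It is not the project's R code: the function names, the toy lexicon, and the handling of edge cases (for example, returning None when a word never becomes unique) are assumptions made for this example.

```python
from typing import List, Optional


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-letter insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def coltheart_n(word: str, lexicon: List[str]) -> int:
    """Count same-length words that differ from `word` by exactly one substituted letter."""
    return sum(
        1
        for other in lexicon
        if len(other) == len(word)
        and sum(x != y for x, y in zip(word, other)) == 1
    )


def old_k(word: str, lexicon: List[str], k: int = 20) -> float:
    """Mean Levenshtein distance from `word` to its k closest other words (OLD20 when k=20)."""
    distances = sorted(levenshtein(word, other) for other in lexicon if other != word)
    closest = distances[:k]
    if not closest:
        raise ValueError("lexicon contains no other words")
    return sum(closest) / len(closest)


def uniqueness_point(word: str, lexicon: List[str]) -> Optional[int]:
    """1-based position of the first letter at which no other word shares the prefix.

    Assumption for this sketch: returns None when the word is a prefix of
    another word and therefore never becomes unique.
    """
    others = [w for w in lexicon if w != word]
    for i in range(1, len(word) + 1):
        if not any(w.startswith(word[:i]) for w in others):
            return i
    return None


# Toy lexicon invented for illustration; it is not taken from the MULTILEXICON.
toy_lexicon = ["cat", "cot", "cut", "car", "cart", "dog"]
print(coltheart_n("cat", toy_lexicon))       # 3 (cot, cut, car)
print(old_k("cat", toy_lexicon, k=3))        # 1.0
print(uniqueness_point("cot", toy_lexicon))  # 2
```

With a real lexicon, OLD20 needs at least 20 other words to be meaningful; on smaller lists this sketch simply averages over whatever neighbors exist.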
Third, word-language-pairs were derived for Levenshtein Distance, Relative Levenshtein Distance, and Uniqueness Point.
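The word-language-pair idea can be sketched as follows for one set of translation equivalents. The definition of Relative Levenshtein Distance used here (distance divided by the length of the longer word) is an assumption, since this page does not define it; the function names and the example entry are likewise hypothetical and not taken from the dataset.

```python
from itertools import combinations
from typing import Dict, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-letter insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def relative_levenshtein(a: str, b: str) -> float:
    """Assumed normalization: distance divided by the length of the longer word."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def pairwise_distances(translations: Dict[str, str]) -> Dict[Tuple[str, str], Tuple[int, float]]:
    """For every language pair, return (Levenshtein, relative Levenshtein)."""
    result = {}
    for (lang_a, word_a), (lang_b, word_b) in combinations(translations.items(), 2):
        result[(lang_a, lang_b)] = (
            levenshtein(word_a, word_b),
            relative_levenshtein(word_a, word_b),
        )
    return result


# Illustrative entry only; language codes and words are chosen for this example.
entry = {"en": "family", "fr": "famille", "pt": "família"}
for pair, (dist, rel) in pairwise_distances(entry).items():
    print(pair, dist, round(rel, 3))
```

Dividing by the longer word keeps the relative score between 0 (identical forms) and 1 (no shared alignment), which makes pairs of different lengths comparable.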
Finally, complementary categories were derived, including frequency, word length, cvcv structure, and reverse word, among others.
We hope the MULTILEXICON can be a useful tool for stimulus selection and control in psycholinguistic experiments, as a translation resource, and as a language modeling database. Enjoy!
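Some of the complementary categories are simple string transformations. The sketch below shows one possible way to compute word length, cvcv structure, and reverse word. The vowel set (a, e, i, o, u, with accents stripped and y treated as a consonant) and all function names are assumptions for illustration; the project's own R script may use different rules.

```python
import unicodedata
from typing import Dict, Union

# Assumption: only these base letters count as vowels; "y" is a consonant.
# Ligatures such as "œ" do not decompose and are treated as consonants here.
VOWELS = set("aeiou")


def cv_structure(word: str) -> str:
    """Map each letter to C (consonant) or V (vowel), ignoring accents."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    # Keeping letters only drops combining accents, hyphens, and apostrophes.
    letters = [ch for ch in decomposed if ch.isalpha()]
    return "".join("V" if ch in VOWELS else "C" for ch in letters)


def reverse_word(word: str) -> str:
    """Return the word spelled backwards, keeping accented letters intact."""
    return unicodedata.normalize("NFC", word)[::-1]


def complementary_features(word: str) -> Dict[str, Union[int, str]]:
    """Bundle the three features for one word."""
    composed = unicodedata.normalize("NFC", word)
    return {
        "length": len(composed),
        "cv": cv_structure(composed),
        "reverse": reverse_word(composed),
    }


print(complementary_features("casa"))
# {'length': 4, 'cv': 'CVCV', 'reverse': 'asac'}
```

Normalizing to NFC before measuring length or reversing keeps an accented letter as a single character, so the accent does not detach from its base letter.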
Downloads
MULTILEXICON - Manual
MULTILEXICON - clean
MULTILEXICON - raw
MULTILEXICON - base
Subtlex-UK
Subtlex-FR
Subtlex-BP
Subtlex raw: UK-FR-BP
Subtlex clean: UK-FR-BP
Hunspell clean: UK-FR-BP
Google Translations: UK-FR-BP
R script - subtlex cleaning
R script - categories
Hunspell dictionaries
Credits
Authors: Gustavo Estivalet, Maylton Fernandes, Márcio Leitão
Affiliation: Federal University of Paraíba, Laboratory of Language Processing, Brazil
Contact: contato@lexicodoportugues.com
Future
Deliver simplified translations from Google Translate
Extend to more languages: Dutch, German, Italian, Polish, Spanish
Derive phonological categories
