MULTILEXICON

The MULTILEXICON is a multilingual, word-based lexicon of English, French, and Brazilian Portuguese.

In this first version, all composed translations from Google Translate were excluded.

This version focuses on orthographic information, especially orthographic neighborhood. Four neighborhood measures were used: Coltheart's N, Levenshtein Distance, OLD20, and Uniqueness Point.
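The four measures can be illustrated with a minimal Python sketch. It assumes the usual textbook definitions: Coltheart's N is the number of same-length words that differ by exactly one substituted letter; Levenshtein Distance is the minimum number of single-letter insertions, deletions, and substitutions; OLD20 is the mean Levenshtein Distance to the 20 closest words; and Uniqueness Point is the first letter position at which no other word in the lexicon shares the prefix. The function names and the toy lexicon are hypothetical, and this is not the R script that produced the MULTILEXICON.

```python
from __future__ import annotations

from statistics import mean


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-letter insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when letters match)
            ))
        prev = curr
    return prev[-1]


def coltheart_n(word: str, lexicon: set[str]) -> int:
    """Count same-length words that differ from `word` by exactly one letter substitution."""
    return sum(
        1
        for other in lexicon
        if len(other) == len(word)
        and other != word
        and sum(x != y for x, y in zip(word, other)) == 1
    )


def old20(word: str, lexicon: set[str], k: int = 20) -> float:
    """Mean Levenshtein distance from `word` to its k closest other words.

    If the lexicon has fewer than k other words, all of them are averaged.
    """
    distances = sorted(levenshtein(word, other) for other in lexicon if other != word)
    return mean(distances[:k])


def uniqueness_point(word: str, lexicon: set[str]) -> int | None:
    """1-based position of the first letter after which no other word shares the prefix.

    Assumption: returns None when `word` is itself a prefix of another word (e.g. 'car' in 'cart').
    """
    others = [w for w in lexicon if w != word]
    for i in range(1, len(word) + 1):
        if not any(w.startswith(word[:i]) for w in others):
            return i
    return None
```

With the toy lexicon `{"cat", "cot", "cut", "bat", "cart", "care", "dog"}`, `coltheart_n("cat", lexicon)` returns 3 (cot, cut, bat), while `uniqueness_point("care", lexicon)` returns 4 because the prefix "car" is still shared with "cart". OLD20 only behaves as intended on a lexicon with at least 20 other words; on a list this small, the example uses `k=3`.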

First, these categories were derived for the mono-lexicons (one language at a time). Second, they were derived for the bi-lexicons (pairs of languages) and the all-lexicons (all three languages together).
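Moving from mono- to bi- and all-lexicons changes only the reference word list against which a measure is computed. The sketch below pools language lexicons by set union and recomputes Coltheart's N. The word lists are made-up toy examples, not MULTILEXICON data, and the names `lexicons` and `pooled` are hypothetical.

```python
from __future__ import annotations


def coltheart_n(word: str, lexicon: set[str]) -> int:
    """Count same-length words that differ from `word` by exactly one letter substitution."""
    return sum(
        1
        for other in lexicon
        if len(other) == len(word)
        and other != word
        and sum(x != y for x, y in zip(word, other)) == 1
    )


# Hypothetical toy word lists, for illustration only.
lexicons = {
    "UK": {"part", "port", "cart", "dart"},
    "FR": {"pari", "parc", "porte", "part"},
    "BP": {"parto", "porta", "pare", "pato"},
}


def pooled(*languages: str) -> set[str]:
    """Union of the chosen lexicons: one language = mono, two = bi, all three = all."""
    return set().union(*(lexicons[lang] for lang in languages))
```

Here `part` has 3 neighbors in the English mono-lexicon, 5 in the English-French bi-lexicon (adding pari and parc), and 6 in the all-lexicon (adding pare). Because the pooled lexicon is a set, a spelling shared by two languages, such as the interlingual homograph `part`, appears only once; this handling is an assumption of the sketch and not necessarily what the MULTILEXICON does.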

Third, Levenshtein Distance, Relative Levenshtein Distance, and Uniqueness Point were derived for word-language pairs, comparing words across languages.
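For word-language pairs, the distance measures compare a word in one language with a word in another. A minimal sketch follows, assuming that Relative Levenshtein Distance is the Levenshtein Distance divided by the length of the longer word; this is a common normalization but an assumption here. Only the two distance measures are shown, and the helper names and example pairs are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-letter insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def relative_levenshtein(a: str, b: str) -> float:
    """Levenshtein distance scaled to [0, 1] by the longer word's length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


# Hypothetical English-French translation pairs, for illustration only.
pairs = [("table", "table"), ("music", "musique")]
scores = {(en, fr): (levenshtein(en, fr), relative_levenshtein(en, fr)) for en, fr in pairs}
```

For example, `music` and `musique` are 3 edits apart, giving a relative distance of 3/7 ≈ 0.43, while identical spellings such as English and French `table` score 0. In this sketch, accented letters count as distinct from their unaccented forms (e.g. `í` vs `i`); the real data may handle this differently.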

Finally, complementary categories were derived, such as frequency, word length, CVCV structure, and reversed word, among others.
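Several complementary categories are simple string transformations. The sketch below assumes that the CVCV structure marks each letter as consonant (C) or vowel (V). It counts accented vowels such as é, ã, and ô as vowels and treats y as a consonant. Both choices are assumptions, as are the function names.

```python
import unicodedata

# Assumption: only a, e, i, o, u (with or without diacritics) are vowels; y and w count as consonants.
VOWELS = frozenset("aeiou")


def base_letter(ch: str) -> str:
    """Lower-case base letter without diacritics, e.g. 'É' -> 'e', 'ç' -> 'c'."""
    return unicodedata.normalize("NFD", ch)[0].lower()


def word_length(word: str) -> int:
    """Number of characters after NFC normalization, so an accented letter counts once."""
    return len(unicodedata.normalize("NFC", word))


def cv_structure(word: str) -> str:
    """Consonant-vowel pattern of the letters in `word`, e.g. 'casa' -> 'CVCV'."""
    letters = (ch for ch in unicodedata.normalize("NFC", word) if ch.isalpha())
    return "".join("V" if base_letter(ch) in VOWELS else "C" for ch in letters)


def reverse_word(word: str) -> str:
    """Word spelled backwards; NFC keeps each accent attached to its letter."""
    return unicodedata.normalize("NFC", word)[::-1]
```

For example, `cv_structure("casa")` gives `"CVCV"` and `reverse_word("casa")` gives `"asac"`. Sorting a word list by its reversed forms groups words by their endings.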

We hope the MULTILEXICON will be useful for stimulus selection and control in psycholinguistic experiments, as a translation resource, and as a database for language modeling. Enjoy!

Downloads

* MULTILEXICON - Manual *

* MULTILEXICON - clean *
MULTILEXICON - raw
MULTILEXICON - base

Subtlex-UK
Subtlex-FR
Subtlex-BP
Subtlex raw: UK-FR-BP
Subtlex clean: UK-FR-BP
Hunspell clean: UK-FR-BP
Google Translations: UK-FR-BP

R script - subtlex cleaning
R script - categories
Hunspell dictionaries

Credits

Authors: Gustavo Estivalet, Maylton Fernandes, Márcio Leitão

Affiliation: Federal University of Paraiba, Laboratory of Language Processing, Brazil

Contact: contato@lexicodoportugues.com

Future

Deliver simplified translations from Google Translate

Extend languages: Dutch, German, Italian, Polish, Spanish

Derive phonological categories