12/24/2019 - migration and implementation of all pages of the Brazilian Portuguese Lexicon - Alpha:
www.lexicodoportugues.com. Updates and tests to verify that all pages work correctly.
03/01/2019 - due to the large number of accesses, the overload on the free hosting server http://www.biz.nf, and its limits of a) 250 MB of disk space, b) 1 MySQL database, c) a database size of up to 100 MB, and d) 5000 MB of data transfer, a Host Server Website Plan S was purchased from
HostGator. Maintenance period of the Brazilian Portuguese Lexicon - Alpha:
09/20/2015 - writing and release of the Léxico do Português Brasileiro - Alfa 2 Manual in Brazilian Portuguese and of the Brazilian Portuguese Lexicon - Alpha 2 Manual in English. Translation and implementation of all webpages of the Brazilian Portuguese Lexicon in English. Implementation of Google Translate on all Brazilian Portuguese Lexicon pages, so that the site can be translated into the languages available on Google Translate. We suggest using the English version of the Brazilian Portuguese Lexicon with Google Translate: in the Portuguese version the search results are also in Portuguese, so Google Translate would translate the search results as well.
03/21/2015 - development of the Linguistic Statistics page with several free, open online tools and resources in HTML/PHP for linguistic and statistical analysis: F1, F2, F, minf, Hartley Test, Normalization between 0-1, Reverse Word, Hamming Distance, Levenshtein Distance, Orthographic Neighbors (Coltheart's N), Average Levenshtein Distance, Relative Entropy, Word Frequency, Zipf's Distribution, etc. Writing of a manuscript on the Brazilian Portuguese Lexicon - Alpha for submission to the scientific journal PLOS ONE.
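Three of the tools listed above (Hamming Distance, Levenshtein Distance, and Normalization between 0-1) can be illustrated with a minimal Python sketch. It is not the HTML/PHP code of the page; the function names are hypothetical and the definitions used are the standard textbook ones.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))


def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def normalize_01(values):
    """Rescale numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant series is mapped to all zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For example, "casa" and "cama" differ in one position, so both distances are 1, while "casa" and "casas" have different lengths and only the Levenshtein distance applies.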
03/25/2014 - acquisition of the domain for the Brazilian Portuguese Lexicon:
www.lexicodoportuguês.com at
HostGator. DNS configuration of the domain to redirect to
http://portugueselexicon.co.nf/. Inauguration of the new domain of the Brazilian Portuguese Lexicon:
03/23/2014 - completion of the other webpages of the Brazilian Portuguese Lexicon - Alpha: 1) availability of downloadable files such as corpus, lists, conventions, scripts, etc., 2) organization of the tools, programs, corpora, and literature in linguistics, statistics, and psycholinguistics, 3) description of the Brazilian Portuguese Lexicon development, 4) description of the credits, information, authors, corpus origin, programs used, etc.
03/21/2014 - insertion of four columns with data in the pseudoword results: 1) grammatical category, as defined by the user, 2) pseudoword frequency, calculated as the sum of the frequencies of the bigrams or trigrams that make up the pseudoword, 3) log10 of the calculated pseudoword frequency, and 4) number of letters, calculated from the final pseudoword form generated by the engine. Translation into English of the main page of the Brazilian Portuguese Lexicon (index.php), which contains the simple and complex search engines.
03/20/2014 - the random pseudoword engine generates pseudowords at random. The user fills in four fields: 1) number of letters of the pseudowords to be generated, 2) number of pseudowords to be generated, 3) grammatical category to which the pseudowords must belong (all, adj, adv, gram, nom, num, ver), and 4) unit used to generate the pseudowords (bigrams or trigrams). The engine builds pseudowords in both directions, from left to right and from right to left. It begins with a bigram or trigram of type "#xx", which marks a pseudoword frontier, and then, until the requested number of letters is reached, concatenates bigrams or trigrams that share the maximum orthographic information with the previous bigram or trigram (1 letter for bigrams and 2 letters for trigrams). The pseudoword results are presented in two columns, one with the pseudowords generated from left to right and another with those generated from right to left.
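The chaining step described in this entry can be sketched in Python for the bigram, left-to-right case. This is only an illustration under stated assumptions, not the engine's PHP code: the function name and the tiny bigram inventory are hypothetical, frontier bigrams are assumed to look like "#c" and "a#", and the handling of dead ends and of the final frontier check is a guess, since the entry does not describe it.

```python
import random


def generate_pseudoword(bigrams, n_letters, rng=None):
    """Chain overlapping bigrams left to right into a string of n_letters letters.

    `bigrams` is a hypothetical inventory such as {"#c", "ca", "as", "sa", "a#"},
    where "#" marks a word frontier. Returns None when the chain hits a dead end.
    """
    rng = rng or random.Random()
    inventory = sorted(bigrams)  # fixed order so a seeded rng is reproducible
    starts = [b for b in inventory if b[0] == "#" and b[1] != "#"]
    if not starts or n_letters < 1:
        return None
    current = rng.choice(starts)
    letters = [current[1]]
    while len(letters) < n_letters:
        # Candidates share one letter with the previous bigram and stay inside the word.
        options = [b for b in inventory if b[0] == current[1] and "#" not in b]
        if not options:
            return None
        current = rng.choice(options)
        letters.append(current[1])
    # Assumption: keep the form only if its last letter is attested at the right frontier.
    if letters[-1] + "#" not in bigrams:
        return None
    return "".join(letters)
```

The right-to-left direction would mirror this loop, starting from a right-frontier bigram and prepending letters; trigrams would overlap by two letters instead of one.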
03/18/2014 - development and implementation of two Brazilian Portuguese pseudoword generation engines: 1) random and 2) occurrences. Using the bigram and trigram data, we computed the total number of bigrams and trigrams in the different grammatical categories, their overall frequency, and the frequency of each bigram and trigram according to its position in the word. This gave us two new databases, one with the bigrams and another with the trigrams.
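A rough Python sketch of such a tally is shown here for bigrams. It is an assumption-laden illustration, not the original computation: the function name and sample lexicon are hypothetical, each bigram is weighted by the frequency of the word it occurs in, and positions are counted from 0 on the frontier-marked form.

```python
from collections import Counter


def bigram_position_counts(lexicon):
    """Tally bigram frequencies overall and by position within the word.

    `lexicon` maps each word to its corpus frequency (hypothetical sample data).
    """
    overall = Counter()
    by_position = Counter()
    for word, freq in lexicon.items():
        marked = f"#{word}#"  # "#" marks the left and right frontiers
        for pos in range(len(marked) - 1):
            bigram = marked[pos:pos + 2]
            overall[bigram] += freq
            by_position[(bigram, pos)] += freq
    return overall, by_position
```

With the lexicon {"casa": 10, "asa": 5}, the bigram "as" has an overall frequency of 15, split into 10 at position 2 and 5 at position 1.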
03/15/2014 - development and implementation of another module in the results area that offers basic statistics of the search: a) mean, b) maximum value, and c) minimum value of the following columns: a) ortho_freq, b) log10_ortho_freq, c) nb_letters, d) ortho_neig, and e) old20. More statistical data will be included later. With this, we consider the development of the Alpha version of the Brazilian Portuguese Lexicon concluded on the site
03/13/2014 - development and implementation of a limiter and navigation algorithm for the results data. Users can choose the number of words presented per page (50, 100, 200, 500) and have two buttons (next and previous) to navigate between the result pages, which reduces the load of the MySQL search and of the result presentation. Next to this result navigation module, we developed an area that presents four pieces of information about the search: 1) total number of words found in the search, 2) total number of pages that make up the search, 3) interval of words presented, and 4) current result page. Creation of a button (Export .csv) that exports the entire search to a .csv file that the user can download.
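The arithmetic behind such a limiter can be sketched in Python as follows. This is a hypothetical helper, not the site's PHP code; it only assumes that the page size is one of the four values listed above and that the offset and limit would feed a MySQL `LIMIT ... OFFSET ...` clause.

```python
import math


def paginate(total_rows, page, page_size=50):
    """Return the offset, page count, and 1-based word interval for one result page."""
    if page_size not in (50, 100, 200, 500):
        raise ValueError("page_size must be one of 50, 100, 200, 500")
    total_pages = max(1, math.ceil(total_rows / page_size))
    page = min(max(page, 1), total_pages)  # clamp so next/previous never leave the range
    offset = (page - 1) * page_size
    first = offset + 1 if total_rows else 0
    last = min(offset + page_size, total_rows)
    return {"offset": offset, "limit": page_size, "total_pages": total_pages,
            "page": page, "first": first, "last": last}
```

The returned dictionary carries the four pieces of information shown next to the navigation buttons: the total comes from the search itself, and `total_pages`, `first`/`last`, and `page` are derived from it.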
03/08/2014 - release of the first version of the Brazilian Portuguese Lexicon - Alpha with 215 175 rows of words and 21 columns of information: 1) orthography, 2) gram_cat, 3) gram_inf, 4) ortho_freq, 5) ortho_/M, 6) log10_ortho_freq, 7) nb_letters, 8) nb_homogr, 9) homographs, 10) pu_ortho, 11) ortho_neig, 12) old20, 13) CVCV_ortho, 14) bigrams, 15) trigrams, 16) rev_ortho, 17) rev_CVCV_ortho, 18) rev_bigrams, 19) rev_trigrams, 20) random, and 21) id. In .csv format, this table has a size of 45 MB. Writing and release of the Brazilian Portuguese Lexicon - Alpha 1 Manual in Brazilian Portuguese.
03/06/2014 - development of the algorithm and column for the orthographic uniqueness point (pu_ortho), and columns with the number of orthographic neighbors (Coltheart's N) (viz_orto) [coltheart.N(orthographyX, orthography, distance = 1, method = "hamming", parallel = FALSE)] and the Average Levenshtein Distance for the 20 closest words (old20) [old20(orthographyX, orthography, method = "levenshtein", parallel = FALSE)], computed with the
vwr package developed by
Emmanuel Keuleers for the
R software.
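To make the two measures concrete, here is a minimal Python sketch of Coltheart's N and of the mean Levenshtein distance to the n closest words. It is not the vwr implementation: the function names and the toy lexicon are hypothetical, and excluding the target word itself from its own OLD20 set is an assumption of this sketch.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (ch_a != ch_b)))
        previous = current
    return previous[-1]


def coltheart_n(word, lexicon):
    """Count same-length words that differ from `word` by exactly one letter."""
    return sum(1 for other in lexicon
               if len(other) == len(word)
               and sum(x != y for x, y in zip(word, other)) == 1)


def old_n(word, lexicon, n=20):
    """Mean Levenshtein distance from `word` to its n closest words in the lexicon."""
    # Assumption: the word itself is left out of the comparison set.
    distances = sorted(levenshtein(word, other) for other in lexicon if other != word)
    closest = distances[:n]
    return sum(closest) / len(closest) if closest else float("nan")
```

In the toy lexicon ["casa", "cama", "capa", "caso", "asa", "casas", "mesa"], "casa" has three neighbors at Hamming distance 1 (cama, capa, caso), while "asa" and "casas" count only for the Levenshtein-based measure.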
03/03/2014 - columns containing each letter of each word [substr(orthography, n, n)], transformation of the letters of the Brazilian Portuguese alphabet into the CVCV structure (vowel: V, consonant: C, punctuation: P, number: N, symbol: S, accent: A), column with the bigrams of the words, concatenating the letters two by two, column with the trigrams of the words, concatenating the letters three by three, concatenation of the bigrams and trigrams separated by an underscore "_" and delimited on the left and right frontiers by hash marks "#", column with a random number between 0 and 1 with eight digits of precision [runif(nrow, 0, 1)], development of the algorithm and columns with the reversed orthography (rev_ortho), CVCV_ortho (rev_CVCV_ortho), bigrams (rev_bigrams), and trigrams (rev_trigrams).
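The column transformations in this entry can be approximated with a short Python sketch (the original work was done in R). Two points are assumptions, because the entry does not spell them out: every letter carrying a diacritic is coded as A, and the frontier marks are attached to the first and last n-gram. The function names are hypothetical.

```python
import unicodedata


def char_class(ch):
    """Classify one character as V, C, A, N, P, or S."""
    if ch.isdigit():
        return "N"
    if ch.isalpha():
        if len(unicodedata.normalize("NFD", ch)) > 1:
            return "A"  # assumption: any letter with a diacritic (á, ã, ç, ...)
        return "V" if ch.lower() in "aeiou" else "C"
    if unicodedata.category(ch).startswith("P"):
        return "P"
    return "S"


def cvcv_structure(word):
    """Replace every character of the word by its class code."""
    return "".join(char_class(ch) for ch in unicodedata.normalize("NFC", word))


def ngrams(word, n):
    """Frontier-marked n-grams joined by underscores."""
    marked = f"#{word}#"
    return "_".join(marked[i:i + n] for i in range(len(marked) - n + 1))


def reverse(text):
    """Reversed orthography; the same slicing reverses a CVCV string."""
    return text[::-1]
```

For "casa" this gives the structure "CVCV", the bigram string "#c_ca_as_sa_a#", and the trigram string "#ca_cas_asa_sa#"; the rev_ columns follow by applying the same functions to the reversed form.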
02/27/2014 - standardization of the words in lowercase [tolower(orthography)]; sum of repeated forms [aggregate(orthography, list(ortho_freq), sum)]; column in each file with the grammatical category (gram_cat) (adj, adv, gram, nom, num, and ver); concatenation of all files into one file [merge(file1, file2, ...)]; ordering of the words by frequency (from most frequent to least frequent) and alphabetically (a-z) [order(ortho_freq, orthography)]; computation of the total number of forms [nrow(orthography)] and of the total frequency [sum(ortho_freq)]; column with an identification number (id) in ascending order [c(1:nrow)], which automatically becomes the position of the word in the lexicon and consequently follows Zipf's distribution; column with the word frequency per million words (ortho_freq/M) [1000000*ortho_freq/total_freq]; column with the log10 of the word frequency [log10(ortho_freq)]; column with the number of letters of the word [nchar(orthography)]; exclusion of the forms with more than 30 letters; column with the number of homographic forms [aggregate(orthography, list(ortho_freq), sum)]; column with the different grammatical categories of the homographic forms.
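As an illustration of how these steps fit together, the following Python sketch rebuilds a few of the columns from raw (form, frequency) pairs. It is a simplified stand-in for the R workflow, with a hypothetical function name and sample data; grammatical categories and homographs are left out, and frequencies are assumed to be positive.

```python
import math
from collections import Counter


def build_rows(raw_counts, max_letters=30):
    """Lowercase, merge repeated forms, rank by frequency, and derive frequency columns."""
    merged = Counter()
    for form, freq in raw_counts:
        merged[form.lower()] += freq           # sum of repeated forms
    merged = {f: n for f, n in merged.items() if len(f) <= max_letters}
    total = sum(merged.values())               # total frequency
    # Most frequent first, ties broken alphabetically.
    ordered = sorted(merged.items(), key=lambda item: (-item[1], item[0]))
    rows = []
    for rank, (form, freq) in enumerate(ordered, start=1):
        rows.append({
            "id": rank,                        # doubles as the frequency rank
            "orthography": form,
            "ortho_freq": freq,
            "ortho_freq_per_million": 1_000_000 * freq / total,
            "log10_ortho_freq": math.log10(freq),
            "nb_letters": len(form),
        })
    return rows
```

Because the id is assigned after sorting by frequency, it is the frequency rank of the word, which is why plotting frequency against id gives the Zipf-like curve mentioned above.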
02/24/2014 - download of the 13 files in .txt format from the
NILC/São Carlos corpus on the
Linguateca website, separated by grammatical categories (6 form files: adjectives, adverbs, grammatical, nouns, numerals, and verbs; 7 lemma files: adjectives, adverbs, grammatical, nouns, proper names, numerals, and verbs). Opening and checking of all files in the
R software. Computation of the total number of words and the total number of forms in all the files and comparison with the data provided in
Linguateca, Corpus NILC/São Carlos.
02/18/2014 - implementation and testing of the Brazilian Portuguese Lexicon on the internet, accessible at the
http://portugueselexicon.co.nf domain. Import of the pilot corpus database of the Brazilian Portuguese Lexicon in .csv format into a MySQL database. Proper functioning of every page. Implementation of wildcards. MySQL itself recognizes the underscore "_" as a replacement for any single letter and the percent sign "%" as a replacement for a chain of letters. For numeric fields, development of a PHP algorithm that recognizes the symbols "<" and ">" (less than and greater than) and searches for the sets with the corresponding numerical characteristics. Insertion of a block on the right of the main webpage body containing search tips: 1) wildcard symbols that can be used, "_", "%", "<", and ">", and 2) grammatical categories "adj, adv, gram, nom, and ver".
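The handling of the search symbols can be sketched in Python as follows. This is not the site's PHP algorithm: the function names and the column allowlist are hypothetical, and the `%s` placeholder assumes a MySQL driver that uses the "format" parameter style, so that user input is passed as a parameter rather than pasted into the SQL string.

```python
# Hypothetical allowlist of numeric columns that may appear in a query.
NUMERIC_COLUMNS = {"ortho_freq", "log10_ortho_freq", "nb_letters", "ortho_neig", "old20"}


def numeric_condition(column, criterion):
    """Turn a criterion such as '<5', '>3', or '4' into an SQL fragment plus its parameter."""
    if column not in NUMERIC_COLUMNS:
        raise ValueError(f"unknown numeric column: {column}")
    criterion = criterion.strip()
    operator = "="
    if criterion[:1] in ("<", ">"):
        operator, criterion = criterion[0], criterion[1:].strip()
    value = float(criterion)  # raises ValueError on non-numeric input
    return f"{column} {operator} %s", value


def text_condition(pattern):
    """Build a LIKE condition; '_' and '%' pass through because MySQL interprets them."""
    return "orthography LIKE %s", pattern
```

A search for words with fewer than five letters that start with "ca" would then combine `numeric_condition("nb_letters", "<5")` with `text_condition("ca%")`.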
02/15/2014 - search for a free hosting site to host the Brazilian Portuguese Lexicon as an open, free, and public access website. Evaluation of the sites found according to our needs: a) space of at least 100 MB, b) MySQL database, c) support for the PHP language, and d) free of charge. Selection of the host site
http://www.biz.nf, which offers the following advantages: a) space of 250 MB, b) MySQL 5 database, c) support for PHP 4/5, d) free of charge, and also e) 5000 MB of data transfer, f) a free domain of the type
portugueselexicon.co.nf, g) POP3/SMTP webmail, and h) file management via FTP.
02/12/2014 - programming of Java algorithms to keep the information entered in the search fields after the HTML forms are submitted. Integration of a sorting control for the results with two criteria: 1) selection of the column used to sort the results and 2) selection of ascending or descending order for the presentation of the results. Insertion of the "Clear" button in the search engines to clear the data present in the forms. Establishment of a complex search engine with four search fields. Insertion of the "+ Fields" button in the complex search engine, which leads to a page (index2.php) with a complex search engine with eight search fields.
02/11/2014 - development of the pages of the Brazilian Portuguese Lexicon: 1) Lexicon - homepage with the search in Brazilian Portuguese (index.php), 2) Pseudowords - page with the pseudoword generation engines for Brazilian Portuguese, 3) Downloads - page with downloadable files of the Brazilian Portuguese Lexicon, 4) Tools - page with several corpus tools, statistics, psycholinguistics, programs, and literature, 5) Updates - page describing the development and implementation of the Brazilian Portuguese Lexicon, 6) Credits - page with information, references, author, source, license, and acknowledgments of the Brazilian Portuguese Lexicon, 7) Statistical Linguistics - page with several tools and resources for linguistic, psycholinguistic, and statistical analysis, 8) Linguateca - link to the Linguateca website, and 9) NILC - link to the NILC/São Carlos website.
02/10/2014 - visual configuration of the Brazilian Portuguese Lexicon on the localhost, programmed in CSS. Establishment of the header, sidebar with links, body and results, and footer. Definition of the pages of the Brazilian Portuguese Lexicon: 1) Lexicon, 2) Pseudowords, 3) Downloads, 4) Tools, 5) Updates, 6) Credits, 7) Statistical Linguistics, 8) Linguateca, and 9) NILC. Language settings for the use of the Latin alphabet and the Brazilian Portuguese language. UTF-8 character encoding of the HTML pages, phpMyAdmin, and the MySQL database. Brazilian Portuguese text is handled correctly, avoiding orthographic problems with accents, symbols, and special characters.
02/08/2014 - first version on localhost of the pilot of the Brazilian Portuguese Lexicon with two search engines: 1) simple search and 2) complex search. The simple search consists of a text area to search for multiple words. The complex search has two criteria fields for word search. Each of them consists of the criterion to be searched, followed by the selection of "YES" or "NO" to define whether or not the criterion should be considered, followed by the field in which the value to search for is entered. Each search engine has a "Search" button to start the search and present the results.
01/20/2014 - discussion with
Prof. Dr. Sandra M. Aluísio and
Prof. Dr. Maria das Graças Volpe Nunes about the
NILC/São Carlos corpus, with over 32 million words and 49 MB, and with
Prof. Dr. Tony Berber Sardinha about the
Corpus Brasileiro, with more than 1 billion words and 3.2 GB. We concluded and agreed that the NILC/São Carlos corpus would be the best corpus for the development of the Brazilian Portuguese Lexicon according to the following criteria: a) number of words (about 32 million) consistent with other corpora, b) frequencies already computed, c) number of files (13), file sizes, and total size of the corpus (49 MB), d) ease of processing for the development of the Brazilian Portuguese Lexicon, e) organization of the corpus in individual .txt files by grammatical categories and by forms and lemmas, and f) support from the resources and publications already developed by NILC/São Carlos.
12/21/2013 - construction of the Brazilian Portuguese Lexicon pilot page. Use of a localhost with the
XAMPP program, which comes with the Apache, MySQL, PHP, and Perl modules preinstalled. Configuration of
phpMyAdmin to import the previously computed corpus in .csv format into the MySQL database. Use of the
Notepad++ software for programming the HTML/PHP interface page between the users and the MySQL database.
10/22/2013 - construction of the pilot corpus of the Brazilian Portuguese Lexicon from the .txt file of verbal forms of the NILC/São Carlos corpus available on Linguateca. Use of the R software for the development of 10 experimental columns of information: 1) orthography, 2) ortho_freq, 3) ortho_freq/M, 4) log10_ortho_freq, 5) nb_letters, 6) gram_cat, 7) gram_inf, 8) rev_orth, 9) CVCV_ortho, 10) rev_CVCV_ortho.
04/08/2013 - pre-selection, on the Linguateca site, of the two largest corpora of Brazilian Portuguese: 1) Corpus Brasileiro and 2) NILC/São Carlos. The Corpus Brasileiro has about 1 billion words and several files totaling 3.2 GB; the NILC/São Carlos corpus has approximately 32 million words and its 13 files have a total size of 49 MB. From the NILC/São Carlos corpus, we built a small pilot corpus with only verbs (the grammatical category that I am researching in my Ph.D.), accounting for about 80 000 forms.
06/16/2013 - conception and consolidation of the idea of the Brazilian Portuguese Lexicon - LexPorBR. Need for a programming background in
HTML webpage development,
MySQL database knowledge, and the programming of the interface between the user and the database in the
PHP language. Need for basic knowledge of the Java and CSS programming languages to complement the webpages.
01/04/2013 - while looking for a psycholinguistic corpus of Brazilian Portuguese from which to select words for psycholinguistic experiments in Brazilian Portuguese, we found the
Linguateca website, which hosts several corpora of Portuguese but no psycholinguistic corpus of Brazilian Portuguese. I wrote on a Post-it note "do the Brazilian Portuguese Lexicon". We identified what is necessary for the construction and development of a psycholinguistic corpus of Brazilian Portuguese: access to a large and varied Brazilian Portuguese corpus with form and lemma frequencies already computed, processing of these data in the statistical software
R, and availability of this corpus as a free, open internet resource.