Stopwords Corpus
This corpus contains lists of stop words for several languages. These
are high-frequency grammatical words that are usually ignored in text
retrieval applications.
The lists were obtained from:
http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/snowball/stopwords/
The stop words for the Romanian language were obtained from:
http://arlc.ro/resources/
The English list has been augmented:
https://github.com/nltk/nltk_data/issues/22
The German list has been corrected:
https://github.com/nltk/nltk_data/pull/49
A Kazakh list has been added:
https://github.com/nltk/nltk_data/pull/52
A Nepali list has been added:
https://github.com/nltk/nltk_data/pull/83
An Azerbaijani list has been added:
https://github.com/nltk/nltk_data/pull/100
A Greek list has been added:
https://github.com/nltk/nltk_data/pull/103
An Indonesian list has been added:
https://github.com/nltk/nltk_data/pull/112
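
Example usage
The following is a minimal sketch of how a stop word list is typically
applied in a text retrieval setting: tokens that appear in the list are
dropped before indexing or matching. It assumes each list is a UTF-8 text
file with one word per line, which this README does not state. The function
names load_stopwords and remove_stopwords are hypothetical and are not part
of this package.

```python
from pathlib import Path


def load_stopwords(path):
    """Read a stop word list. Assumes one word per line, UTF-8 encoded."""
    text = Path(path).read_text(encoding="utf-8")
    # Skip blank lines and normalise to lower case for matching.
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def remove_stopwords(tokens, stopwords):
    """Return the tokens that are not in the stop word set (case-insensitive)."""
    return [tok for tok in tokens if tok.lower() not in stopwords]


if __name__ == "__main__":
    # Tiny in-memory set for illustration only; not the real English list.
    sample = {"the", "is", "on", "a"}
    tokens = "The cat is on a mat".split()
    print(remove_stopwords(tokens, sample))  # ['cat', 'mat']
```

Holding the list in a set keeps each membership check constant-time on
average, which matters when filtering large documents.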