Stopwords Corpus

This corpus contains lists of stop words for several languages. These
are high-frequency grammatical words which are usually ignored in text
retrieval applications.

They were obtained from:
http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/snowball/stopwords/

The stop words for the Romanian language were obtained from:
http://arlc.ro/resources/

The English list has been augmented:
https://github.com/nltk/nltk_data/issues/22

The German list has been corrected:
https://github.com/nltk/nltk_data/pull/49

A Kazakh list has been added:
https://github.com/nltk/nltk_data/pull/52

A Nepali list has been added:
https://github.com/nltk/nltk_data/pull/83

An Azerbaijani list has been added:
https://github.com/nltk/nltk_data/pull/100

A Greek list has been added:
https://github.com/nltk/nltk_data/pull/103

An Indonesian list has been added:
https://github.com/nltk/nltk_data/pull/112
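
Example usage (illustrative sketch only):
The description at the top of this file says stop words are usually
ignored in text retrieval applications. The Python sketch below shows
one way that filtering might look. It assumes each list is a plain-text
file with one word per line; the file name "english", the helper names
load_stopwords and remove_stopwords, and the four sample words are all
hypothetical and are not taken from the corpus itself.

```python
from pathlib import Path
import tempfile


def load_stopwords(path):
    """Read one stop word per line, skipping blank lines (assumed layout)."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def remove_stopwords(tokens, stopwords):
    """Keep only the tokens whose lowercase form is not a stop word."""
    return [tok for tok in tokens if tok.lower() not in stopwords]


# Tiny illustrative list, NOT the contents of the real English file.
sample_words = "the\nis\na\nof\n"

with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "english"  # hypothetical file name
    sample_path.write_text(sample_words, encoding="utf-8")
    stopwords = load_stopwords(sample_path)

tokens = "The history of Nepal is a long one".split()
filtered = remove_stopwords(tokens, stopwords)
print(filtered)  # ['history', 'Nepal', 'long', 'one']
```

With NLTK installed and this corpus downloaded, the same kind of list
is available through nltk.corpus.stopwords.words("english"), so the
temporary file above would not be needed.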