CofeehousePy/mods/stopwords/coffeehousemod_stopwords/data
Netkas 3294bc4ea6 Added NSFW classification 2021-01-14 02:07:24 -05:00
README
arabic
azerbaijani
danish
dutch
english
finnish
french
german
greek
hungarian
indonesian
italian
kazakh
nepali
norwegian
portuguese
romanian
russian
slovene
spanish
swedish
tajik
turkish

README

Stopwords Corpus

This corpus contains lists of stop words for several languages.  These
are high-frequency grammatical words which are usually ignored in text
retrieval applications.
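
As a rough illustration of how such a list is typically used, the sketch
below loads one list and drops its words from a token sequence. It assumes
each file in this directory is UTF-8 text with one stop word per line; the
helper names load_stopwords and remove_stopwords are hypothetical and are
not part of this package. A temporary stand-in file is used so the example
runs without the data directory.

```python
import tempfile
from pathlib import Path


def load_stopwords(path):
    """Read a stop word file (assumed: UTF-8, one word per line)."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def remove_stopwords(tokens, stopwords):
    """Keep only the tokens that are not in the stop word set."""
    return [t for t in tokens if t.lower() not in stopwords]


# Demo with a tiny stand-in list; a real run would point at a file
# such as data/english instead (path is an assumption).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "english"
    sample.write_text("the\nis\non\na\n", encoding="utf-8")
    stopwords = load_stopwords(sample)

tokens = "The cat is on a mat".split()
filtered = remove_stopwords(tokens, stopwords)
print(filtered)  # ['cat', 'mat']
```

Holding the words in a set keeps each membership check cheap, which
matters when filtering long documents.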

They were obtained from:
http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/snowball/stopwords/

The stop words for the Romanian language were obtained from:
http://arlc.ro/resources/

The English list has been augmented:
https://github.com/nltk/nltk_data/issues/22

The German list has been corrected:
https://github.com/nltk/nltk_data/pull/49

A Kazakh list has been added:
https://github.com/nltk/nltk_data/pull/52

A Nepali list has been added:
https://github.com/nltk/nltk_data/pull/83

An Azerbaijani list has been added:
https://github.com/nltk/nltk_data/pull/100

A Greek list has been added:
https://github.com/nltk/nltk_data/pull/103

An Indonesian list has been added:
https://github.com/nltk/nltk_data/pull/112