
Stanford POS Tagger, v4.2.0 - 2020-11-17
Copyright (c) 2002-2020 The Board of Trustees of
The Leland Stanford Junior University. All Rights Reserved.

This document contains some information about the models included in
this release, which can also be downloaded from the POS tagger website at
http://nlp.stanford.edu/software/tagger.html . All of the models mentioned
in this document are in the downloaded package, in the same directory as
this readme. All taggers are accompanied by the props files used to create
them; please examine these files for more detailed information about how
the taggers were created.
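Assuming the props files follow standard Java properties syntax ("key = value" lines, with "#" comment lines), they can be inspected with java.util.Properties. The sketch below is illustrative only: the excerpt it parses is made up, and the keys model, arch and trainFile are assumptions about what a props file contains, so check the shipped files for the real keys and values.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class TaggerProps {
    /** Loads a tagger props file, assuming it uses java.util.Properties syntax. */
    static Properties load(Reader in) {
        Properties props = new Properties();
        try {
            // '#' lines are comments; whitespace around the key and '=' is skipped.
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical excerpt: illustrative values, not copied from a real props file.
        String excerpt = String.join("\n",
            "## tagger training settings",
            "        model = english-left3words-distsim.tagger",
            "         arch = left3words,wordshapes(-1,1),distsim(clusters.txt,-1,1)",
            "    trainFile = wsj-0-18");
        Properties props = load(new StringReader(excerpt));
        System.out.println("model: " + props.getProperty("model"));
        System.out.println("arch:  " + props.getProperty("arch"));
    }
}
```

For a real model, the same load method could be given a FileReader opened on the props file that sits next to the .tagger file.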

For English, the bidirectional tagger is slightly more accurate than the
left3words taggers, but tags much more slowly; choose the appropriate
tagger based on your speed/accuracy needs.
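As a rough sketch of that choice, the helper below maps two preferences onto the English model filenames listed in this README. The method name chooseModel and its boolean parameters are hypothetical. Caseless input takes priority over accuracy only because no caseless bidirectional model is listed.

```java
public class EnglishTaggerChooser {
    /**
     * Hypothetical helper that picks one of the English models listed in this README.
     * preferAccuracy: use the bidirectional model (slightly more accurate, much slower).
     * caseless: use the model that ignores case.
     */
    static String chooseModel(boolean preferAccuracy, boolean caseless) {
        if (caseless) {
            // Only a left3words caseless model is listed, so this wins over accuracy.
            return "english-caseless-left3words-distsim.tagger";
        }
        return preferAccuracy
            ? "english-bidirectional-distsim.tagger"
            : "english-left3words-distsim.tagger";
    }

    public static void main(String[] args) {
        System.out.println(chooseModel(false, false)); // english-left3words-distsim.tagger
        System.out.println(chooseModel(true, false));  // english-bidirectional-distsim.tagger
        System.out.println(chooseModel(true, true));   // english-caseless-left3words-distsim.tagger
    }
}
```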

English taggers
---------------------------
english-left3words-distsim.tagger
Trained on WSJ sections 0-18 and extra parser training data using the
left3words architecture, with word shape and distributional similarity
features. Penn Treebank tagset. UDv2.0 tokenization standard.

english-bidirectional-distsim.tagger
Trained on WSJ sections 0-18 using a bidirectional architecture, with
word shape and distributional similarity features. Penn Treebank
tagset. UDv2.0 tokenization standard.

english-caseless-left3words-distsim.tagger
Trained on WSJ sections 0-18 and extra parser training data using the
left3words architecture, with word shape and distributional similarity
features. Penn Treebank tagset. Ignores case. UDv2.0 tokenization
standard.
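The architecture and feature choices described for these models are recorded in each model's props file. Assuming the arch setting is a comma-separated list in which some entries carry parenthesized arguments (for example wordshapes(-1,1)), this hypothetical sketch splits such a string into its components and reports each feature family. The example string is illustrative and not copied from a shipped props file.

```java
import java.util.ArrayList;
import java.util.List;

public class ArchSpec {
    /** Splits an arch string on commas that are not inside parentheses (assumed format). */
    static List<String> components(String arch) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // parenthesis nesting level
        for (char c : arch.toCharArray()) {
            if (c == '(') depth++;
            else if (c == ')') depth--;
            if (c == ',' && depth == 0) {
                parts.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) parts.add(current.toString().trim());
        return parts;
    }

    /** Feature family name: the text before any '(' argument list. */
    static String family(String component) {
        int paren = component.indexOf('(');
        return paren < 0 ? component : component.substring(0, paren);
    }

    public static void main(String[] args) {
        // Illustrative arch string, not taken from a real props file.
        String arch = "left3words,wordshapes(-1,1),distsim(clusters.txt,-1,1)";
        for (String c : components(arch)) {
            System.out.println(family(c) + "  <-  " + c);
        }
    }
}
```

Splitting only at depth zero keeps the commas inside argument lists such as (-1,1) attached to their feature.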


Chinese taggers
---------------------------
chinese-nodistsim.tagger
Trained on a combination of CTB7 texts from Chinese and Hong Kong
sources.
LDC Chinese Treebank POS tag set.

chinese-distsim.tagger
Trained on a combination of CTB7 texts from Chinese and Hong Kong
sources with distributional similarity clusters.
LDC Chinese Treebank POS tag set.

Arabic tagger
---------------------------
arabic.tagger
Trained on the *entire* ATB p1-3.
When trained on the train part of the ATB p1-3 split done for the 2005
JHU Summer Workshop (Diab split), using (augmented) Bies tags, it gets

French tagger
---------------------------
french-ud.tagger
Trained on the French GSD (UDv2.2) data set.

German tagger
---------------------------
german-ud.tagger
Trained on the German GSD (UDv2.2) data set.

Spanish tagger
--------------------------
spanish-ud.tagger
Trained on the Spanish AnCora (UDv2.0) data set
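For scripts that need to locate the model file for a given language, a lookup table over the filenames listed in this document is enough. This is a hypothetical sketch: the ISO 639-1 language codes used as keys are an assumption, and where a language has several models the caller still has to choose among them.

```java
import java.util.List;
import java.util.Map;

public class ModelCatalog {
    // Filenames are the ones listed in this README; the language-code keys are assumed.
    static final Map<String, List<String>> MODELS = Map.of(
        "en", List.of("english-left3words-distsim.tagger",
                      "english-bidirectional-distsim.tagger",
                      "english-caseless-left3words-distsim.tagger"),
        "zh", List.of("chinese-nodistsim.tagger", "chinese-distsim.tagger"),
        "ar", List.of("arabic.tagger"),
        "fr", List.of("french-ud.tagger"),
        "de", List.of("german-ud.tagger"),
        "es", List.of("spanish-ud.tagger"));

    /** Returns the listed models for a language, or an empty list if none is listed. */
    static List<String> modelsFor(String lang) {
        return MODELS.getOrDefault(lang, List.of());
    }

    public static void main(String[] args) {
        System.out.println(modelsFor("zh")); // [chinese-nodistsim.tagger, chinese-distsim.tagger]
        System.out.println(modelsFor("ja")); // [] since no Japanese model is listed
    }
}
```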