The Stanford segmenter distribution includes tools for segmenting
Chinese and Arabic text. See README-Chinese.txt and README-Arabic.txt
for more details.
------------------------------------
LICENSE
------------------------------------
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/ .

For more information, bug reports, fixes, contact:
    Christopher Manning
    Dept of Computer Science, Gates 2A
    Stanford CA 94305-9020
    USA
    manning@cs.stanford.edu
------------------------------------
CHANGES
------------------------------------
2020-11-17 4.2.0 Update for compatibility
2020-05-10 4.0.0 New Chinese segmenter trained off of CTB 9.0
2018-10-16 3.9.2 Update for compatibility
2018-02-27 3.9.1 Updated for compatibility
2016-10-31 3.7.0 Update for compatibility
2015-12-09 3.6.0 Update for compatibility
2015-04-20 3.5.2 Update for compatibility
2015-01-29 3.5.1 Update for compatibility
2014-10-26 3.5.0 Upgrade to Java 1.8
2014-08-27 3.4.1 Update for compatibility
2014-06-16 3.4 Update Arabic segmenter
2014-01-04 3.3.1 Bugfix release
2013-11-12 3.3.0 Update for compatibility
2013-06-19 3.2.0 Improve handling of line-by-line input
2013-04-04 1.6.8 ctb7 model, -nthreads option
2012-11-11 1.6.7 Bugfixes to both Arabic and Chinese
                 segmenters; Chinese segmenter can now load
                 files from jar
2012-07-09 1.6.6 Improved Arabic model
2012-05-22 1.6.5 Supports stdin.
2012-03-09 1.6.4 Arabic segmenter now included.
2011-12-16 1.6.3 Updated code to maintain compatibility
2011-09-14 1.6.2 Updated code to maintain compatibility
2011-06-15 1.6.1 Updated code to maintain compatibility
2011-05-15 1.6 Updated models, code to be compatible with
               other current releases
2008-05-21 1.5 The models in distribution incorporate
               training lexicon features. In addition, the
               segmenter now supports k-best output.
2006-05-12 1.0 This distribution includes models of two
               segmentation standards -- CTB and PK
               (Beijing Univ.)
2006-04-10 0.9 Add normalization for punctuation/symbol
               characters from the "ASCII" (U+0021-U+007E)
               code point range.