The Stanford segmenter distribution includes tools for segmenting
Chinese and Arabic text.  See README-Chinese.txt and
README-Arabic.txt for more details.

------------------------------------
LICENSE
------------------------------------

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see http://www.gnu.org/licenses/ .

For more information, bug reports, fixes, contact:
    Christopher Manning
    Dept of Computer Science, Gates 2A
    Stanford CA 94305-9020
    USA
    manning@cs.stanford.edu

------------------------------------
CHANGES
------------------------------------

2020-11-17    4.2.0    Update for compatibility
2020-05-10    4.0.0    New Chinese segmenter trained off of CTB 9.0
2018-10-16    3.9.2    Update for compatibility
2018-02-27    3.9.1    Updated for compatibility
2016-10-31    3.7.0    Update for compatibility
2015-12-09    3.6.0    Update for compatibility
2015-04-20    3.5.2    Update for compatibility
2015-01-29    3.5.1    Update for compatibility
2014-10-26    3.5.0    Upgrade to Java 1.8
2014-08-27    3.4.1    Update for compatibility
2014-06-16    3.4      Update Arabic segmenter
2014-01-04    3.3.1    Bugfix release
2013-11-12    3.3.0    Update for compatibility
2013-06-19    3.2.0    Improve handling of line-by-line input
2013-04-04    1.6.8    ctb7 model, -nthreads option
2012-11-11    1.6.7    Bugfixes to both Arabic and Chinese segmenters;
                       Chinese segmenter can now load files from jar
2012-07-09    1.6.6    Improved Arabic model
2012-05-22    1.6.5    Supports stdin.
2012-03-09    1.6.4    Arabic segmenter now included.
2011-12-16    1.6.3    Updated code to maintain compatibility
2011-09-14    1.6.2    Updated code to maintain compatibility
2011-06-15    1.6.1    Updated code to maintain compatibility
2011-05-15    1.6      Updated models, code to be compatible with
                       other current releases
2008-05-21    1.5      The models in distribution incorporate training
                       lexicon features. In addition, the segmenter
                       now supports k-best output.
2006-05-12    1.0      This distribution includes models of two
                       segmentation standards -- CTB and PK (Beijing
                       Univ.)
2006-04-10    0.9      Add normalization for punctuation/symbol
                       characters from the "ASCII" (U+0021-U+0075)
                       code point range.