Stanford Classifier v4.2.0 - 2020-11-17
-------------------------------------------------

Copyright (c) 2003-2012 The Board of Trustees of
The Leland Stanford Junior University. All Rights Reserved.

Original core classifier code and command line interface by Dan Klein
and Chris Manning. Support code, additional features, etc. by
Kristina Toutanova, Jenny Finkel, Galen Andrew, Joseph Smarr, Chris
Cox, Roger Levy, Rajat Raina, Pi-Chuan Chang, Marie-Catherine de
Marneffe, Eric Yeh, Anna Rafferty, and John Bauer. This release
prepared by John Bauer.

This package contains a maximum entropy classifier.

For more information about the classifier, point a web browser at the included javadoc directory, starting at the Package page for the edu.stanford.nlp.classify package, and looking also at the ColumnDataClassifier class documentation therein.

This software requires Java 8 (JDK 1.8.0+). You must install it
separately; run "java -version" to check which version you have.


QUICKSTART

COMMAND LINE INTERFACE
To classify the included example dataset cheeseDisease (in the examples directory), type the following at the command line while in the main classifier directory:

java -cp "*:." edu.stanford.nlp.classify.ColumnDataClassifier -prop examples/cheese2007.prop

(On Windows, replace the ":" in the classpath with ";".)

This will classify the included test data, cheeseDisease.test, based on the probability that each example is a cheese or a disease, as calculated by a linear classifier trained on cheeseDisease.train.
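
As a rough illustration of how a linear (maximum entropy) classifier turns feature weights into class probabilities, the sketch below sums per-class weights for the active features and normalizes the scores with a softmax. The class name MaxEntSketch, the feature strings, and any weights passed in are hypothetical; nothing here is taken from the trained cheeseDisease model or from the library's code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: not the Stanford Classifier implementation.
final class MaxEntSketch {

  /** Sums the weights of the active features for one class; unknown features count as 0. */
  static double score(Map<String, Double> classWeights, List<String> activeFeatures) {
    double sum = 0.0;
    for (String f : activeFeatures) {
      sum += classWeights.getOrDefault(f, 0.0);
    }
    return sum;
  }

  /** Converts per-class linear scores into probabilities with a numerically stable softmax. */
  static Map<String, Double> probabilities(Map<String, Map<String, Double>> weights,
                                           List<String> activeFeatures) {
    Map<String, Double> scores = new LinkedHashMap<>();
    double max = Double.NEGATIVE_INFINITY;
    for (Map.Entry<String, Map<String, Double>> e : weights.entrySet()) {
      double s = score(e.getValue(), activeFeatures);
      scores.put(e.getKey(), s);
      max = Math.max(max, s);
    }
    // Subtracting the maximum score before exponentiating avoids overflow.
    double total = 0.0;
    for (double s : scores.values()) {
      total += Math.exp(s - max);
    }
    Map<String, Double> probs = new LinkedHashMap<>();
    for (Map.Entry<String, Double> e : scores.entrySet()) {
      probs.put(e.getKey(), Math.exp(e.getValue() - max) / total);
    }
    return probs;
  }
}
```

The predicted class is the one with the highest probability; the probabilities over all classes sum to 1.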

The cheese2007.prop file demonstrates how features are specified. The first feature in the file, useClassFeature,
indicates that a feature should be used based on class frequency in the training set. Most other features are
calculated on specific columns of data in your tab-delimited text file. For example, "1.useNGrams=true" indicates
that n-gram features should be created for the values in column 1 (numbering begins at 0!). Note that you must
give the value explicitly, as in "1.useNGrams=true"; "1.useNGrams" alone will not cause n-gram features to be created.
N-gram features are contiguous character substrings of the string in the column, for example, "t", "h", "e", "th", "he",
"the" from the word "the". You can also specify various other kinds of features, such as using the string value
as a categorical feature (1.useString=true) or splitting a longer string into bag-of-words features
(1.splitWordsRegexp=[ ] 1.useSplitWords=true). The prop file also lets you choose printing and optimization
options and specify training and test files (e.g., in cheese2007.prop under the "Training input"
comment). See the javadoc for ColumnDataClassifier within the edu.stanford.nlp.classify package for more information
on these and other options.
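
To make the n-gram and split-words descriptions above concrete, here is a minimal sketch of both ideas using only the Java standard library. The class and method names (TextFeatureSketch, charNGrams, splitWords) are hypothetical, and the library's own feature extraction has more options and may bound or name its features differently.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: not the library's feature extraction code.
final class TextFeatureSketch {

  /** Returns every contiguous character substring of s, shortest first. */
  static List<String> charNGrams(String s) {
    List<String> grams = new ArrayList<>();
    for (int len = 1; len <= s.length(); len++) {
      for (int start = 0; start + len <= s.length(); start++) {
        grams.add(s.substring(start, start + len));
      }
    }
    return grams;
  }

  /** Splits s on the given regular expression and drops empty tokens. */
  static List<String> splitWords(String s, String regexp) {
    List<String> words = new ArrayList<>();
    for (String w : s.split(regexp)) {
      if (!w.isEmpty()) {
        words.add(w);
      }
    }
    return words;
  }
}
```

For example, TextFeatureSketch.charNGrams("the") returns "t", "h", "e", "th", "he", "the", and TextFeatureSketch.splitWords(text, "[ ]") splits on single spaces, as the regular expression in the splitWordsRegexp example does.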

Another included dataset is the iris dataset, which uses numerical features to separate types of irises. To use a real-valued rather than categorical feature, specify one or more of "realValued", "logTransform", or "logitTransform" for a given column. "realValued" adds the number in the given column as a feature value, while the transform options first apply a log or a logit transform to the value. The format of these feature options is the same as for categorical features; for instance, iris2007.prop shows the use of real-valued features such as "2.realValued=true".
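
The sketch below shows the standard mathematical definitions of a log and a logit transform applied to one numeric column of a tab-delimited line. It is an illustration under assumptions: the names (RealValuedSketch, columnValue) are hypothetical, and rejecting out-of-range inputs is a choice made here, not documented library behavior.

```java
// Illustrative sketch only: standard definitions, not code from the library.
final class RealValuedSketch {

  /** Parses the given zero-based column of a tab-delimited line as a number. */
  static double columnValue(String line, int column) {
    return Double.parseDouble(line.split("\t", -1)[column].trim());
  }

  /** Natural logarithm; only defined for x > 0. */
  static double logTransform(double x) {
    if (x <= 0.0) {
      throw new IllegalArgumentException("log transform needs x > 0, got " + x);
    }
    return Math.log(x);
  }

  /** Logit (log-odds), log(x / (1 - x)); only defined for 0 < x < 1. */
  static double logitTransform(double x) {
    if (x <= 0.0 || x >= 1.0) {
      throw new IllegalArgumentException("logit transform needs 0 < x < 1, got " + x);
    }
    return Math.log(x / (1.0 - x));
  }
}
```

A log transform suits positive values that span several orders of magnitude, while a logit transform suits values that are proportions or probabilities strictly between 0 and 1.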

CLASSIFYING YOUR OWN DATA FILES
To classify your own data files, put them in one of three formats: tab-delimited text from which to make features as shown above, SVMLight format, or tab-delimited text with the exact feature values you would like. Then specify the train and test files on the command line or in a .prop file with "trainFile=/myPath/myTrainFile.train" and "testFile=/myPath/myTestFile.test". You can also create a serialized classifier using the serializeTo option followed by a file path.
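
If your data starts out in another form, a small helper can write it as tab-delimited text. The sketch below is one possible way to do that with the Java standard library; the class name TsvSketch is hypothetical, and putting the class label in column 0 and the text in column 1 is an assumption made for illustration, so check your .prop file for the column layout it actually expects.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a generic tab-delimited writer, not part of the library.
final class TsvSketch {

  /** Joins fields with tabs, replacing embedded tabs and line breaks with a space. */
  static String toLine(String... fields) {
    List<String> clean = new ArrayList<>();
    for (String f : fields) {
      clean.add(f.replaceAll("[\\t\\r\\n]+", " "));
    }
    return String.join("\t", clean);
  }

  /** Writes one line per row to the given file as UTF-8. */
  static void write(Path file, List<String[]> rows) throws IOException {
    List<String> lines = new ArrayList<>();
    for (String[] row : rows) {
      lines.add(toLine(row));
    }
    Files.write(file, lines, StandardCharsets.UTF_8);
  }
}
```

Replacing stray tabs and line breaks inside a field keeps every example on one line with a fixed number of columns, which is what column-numbered feature options rely on.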

CODE EXAMPLES
You can also use the classes in this package directly to train classifiers within other programs. An example is shown in ClassifierExample, in the package edu.stanford.nlp.classify. This class demonstrates how to build a classifier factory, create a classifier and set various parameters in it, train the classifier, and finally test it on a different data set.

NO GUI
This package does not provide a graphical user interface. The
classifier is accessible only via the command line or programmatically.


LICENSE

// Stanford Classifier
// Copyright (c) 2003-2007 The Board of Trustees of
// The Leland Stanford Junior University. All Rights Reserved.
//
// This program is free software; you can redistribute it and/or
// modify it under the terms of the GNU General Public License
// as published by the Free Software Foundation; either version 2
// of the License, or (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with this program. If not, see http://www.gnu.org/licenses/ .
//
// For more information, bug reports, fixes, contact:
// Christopher Manning
// Dept of Computer Science, Gates 2A
// Stanford CA 94305-9020
// USA
// java-nlp-support@lists.stanford.edu
// https://nlp.stanford.edu/software/classifier.html


-------------------------
CHANGES
-------------------------

2020-11-17    4.2.0     Update for compatibility

2020-05-10    4.0.0     Update for compatibility

2018-10-16    3.9.2     Update for compatibility

2018-02-27    3.9.1     Updated for compatibility

2016-10-31    3.7.0     Update for compatibility

2015-12-09    3.6.0     Update for compatibility

2015-04-20    3.5.2     Update for compatibility

2015-01-29    3.5.1     New input/output options, support for GloVe
                        word vectors

2014-10-26    3.5.0     Upgrade to Java 1.8

2014-08-27    3.4.1     Update for compatibility

2014-06-16    3.4       Update for compatibility

2014-01-04    3.3.1     Bugfix release

2013-11-12    3.3.0     Update for compatibility

2013-06-19    3.2.0     Update for compatibility

2013-04-04    2.1.8     Update to maintain compatibility

2012-11-11    2.1.7     new pair-of-words features

2012-07-09    2.1.6     Minor bug fixes

2012-05-22    2.1.5     Re-release to maintain compatibility
                        with other releases

2012-03-09    2.1.4     Bugfix for svmlight format

2011-12-16    2.1.3     Re-release to maintain compatibility
                        with other releases

2011-09-14    2.1.2     Change ColumnDataClassifier to be an object
                        with API rather than static methods;
                        ColumnDataClassifier thread safe

2011-06-15    2.1.1     Re-release to maintain compatibility
                        with other releases

2011-05-15    2.1       Updated with more documentation

2007-08-15    2.0       New command line interface, substantial
                        increase in options and features
                        (updated on 2007-09-28 with a bug fix)

2003-05-26    1.0       Initial release