java.lang.Object
  edu.mit.jwi.morph.SimpleStemmer
public class SimpleStemmer
Provides a simple pattern-based stemming facility based on the "Rules
of Detachment" described in the morphy man page in the Wordnet
distribution, which can be found at
http://wordnet.princeton.edu/man/morphy.7WN.html. It also attempts to
strip "ful" endings. It does not search Wordnet to see if stems actually
exist. In particular, quoting from that man page:
The following table shows the rules of detachment used by Morphy. If a word ends with one of the suffixes, it is stripped from the word and the corresponding ending is added. ... No rules are applicable to adverbs.
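| POS | Suffix | Ending |
|---|---|---|
| NOUN | "s" | "" |
| NOUN | "ses" | "s" |
| NOUN | "xes" | "x" |
| NOUN | "zes" | "z" |
| NOUN | "ches" | "ch" |
| NOUN | "shes" | "sh" |
| NOUN | "men" | "man" |
| NOUN | "ies" | "y" |
| VERB | "s" | "" |
| VERB | "ies" | "y" |
| VERB | "es" | "e" |
| VERB | "es" | "" |
| VERB | "ed" | "e" |
| VERB | "ed" | "" |
| VERB | "ing" | "e" |
| VERB | "ing" | "" |
| ADJ | "er" | "" |
| ADJ | "est" | "" |
| ADJ | "er" | "e" |
| ADJ | "est" | "e" |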
Morphy contains code that searches for nouns ending with ful and performs a transformation on the substring preceding it. It then appends 'ful' back onto the resulting string and returns it. For example, if passed the nouns "boxesful", it will return "boxful".
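The noun rules and the "ful" handling quoted above can be illustrated with a minimal sketch. This is not the SimpleStemmer source: the class name `DetachmentSketch` and both method names are hypothetical, and the rule pairs are assumed from the noun rows of the morphy "Rules of Detachment". Like the class documented here, the sketch does not consult Wordnet, so it returns every candidate a rule produces, including non-words.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the JWI implementation.
final class DetachmentSketch {

    // {suffix, ending} pairs for nouns, assumed from the morphy man page.
    private static final String[][] NOUN_RULES = {
        {"s", ""}, {"ses", "s"}, {"xes", "x"}, {"zes", "z"},
        {"ches", "ch"}, {"shes", "sh"}, {"men", "man"}, {"ies", "y"}
    };

    // Applies every matching noun rule: strip the suffix, append the ending.
    static List<String> detachNoun(String noun) {
        Set<String> stems = new LinkedHashSet<>(); // keeps order, drops duplicates
        for (String[] rule : NOUN_RULES) {
            String suffix = rule[0];
            String ending = rule[1];
            // Require something to remain once the suffix is removed.
            if (noun.endsWith(suffix) && noun.length() > suffix.length()) {
                String base = noun.substring(0, noun.length() - suffix.length());
                stems.add(base + ending);
            }
        }
        return new ArrayList<>(stems);
    }

    // Strips a trailing "ful", stems the remainder, then re-appends "ful".
    static List<String> detachNounWithFul(String noun) {
        if (noun.endsWith("ful") && noun.length() > 3) {
            String head = noun.substring(0, noun.length() - 3);
            List<String> out = new ArrayList<>();
            for (String stem : detachNoun(head)) {
                out.add(stem + "ful");
            }
            return out;
        }
        return detachNoun(noun);
    }

    public static void main(String[] args) {
        System.out.println(detachNoun("boxes"));           // [boxe, box]
        System.out.println(detachNounWithFul("boxesful")); // [boxeful, boxful]
    }
}
```

Because no dictionary lookup filters the candidates, "boxesful" yields both "boxeful" and "boxful"; a caller that needs only real words would check each candidate against Wordnet afterwards.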
| Field Summary | |
|---|---|
| static String | ENDING_ch |
| static String | ENDING_e |
| static String | ENDING_man |
| static String | ENDING_null |
| static String | ENDING_s |
| static String | ENDING_sh |
| static String | ENDING_x |
| static String | ENDING_y |
| static String | ENDING_z |
| static String | SUFFIX_ches |
| static String | SUFFIX_ed |
| static String | SUFFIX_er |
| static String | SUFFIX_es |
| static String | SUFFIX_est |
| static String | SUFFIX_ful |
| static String | SUFFIX_ies |
| static String | SUFFIX_ing |
| static String | SUFFIX_men |
| static String | SUFFIX_s |
| static String | SUFFIX_ses |
| static String | SUFFIX_shes |
| static String | SUFFIX_xes |
| static String | SUFFIX_zes |
| static String | underscore |
| Constructor Summary | |
|---|---|
| SimpleStemmer() | |
| Method Summary | |
|---|---|
| List<String> | findStems(String word, POS pos): Takes the surface form of a word, as it appears in the text, and the assigned Wordnet part of speech. |
| protected List<String> | getNounCollocationRoots(String composite): Handles stemming noun collocations. |
| protected List<String> | getVerbCollocationRoots(String composite): Handles stemming verb collocations. |
| protected String | normalize(String word): Converts all whitespace runs to single underscores. |
| protected List<String> | stripAdjectiveSuffix(String adj): Strips suffixes from the specified word according to the adjective rules. |
| protected List<String> | stripNounSuffix(String noun): Strips suffixes from the specified word according to the noun rules. |
| protected List<String> | stripVerbSuffix(String verb): Strips suffixes from the specified word according to the verb rules. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final String underscore
public static final String SUFFIX_ches
public static final String SUFFIX_ed
public static final String SUFFIX_es
public static final String SUFFIX_est
public static final String SUFFIX_er
public static final String SUFFIX_ful
public static final String SUFFIX_ies
public static final String SUFFIX_ing
public static final String SUFFIX_men
public static final String SUFFIX_s
public static final String SUFFIX_ses
public static final String SUFFIX_shes
public static final String SUFFIX_xes
public static final String SUFFIX_zes
public static final String ENDING_null
public static final String ENDING_ch
public static final String ENDING_e
public static final String ENDING_man
public static final String ENDING_s
public static final String ENDING_sh
public static final String ENDING_x
public static final String ENDING_y
public static final String ENDING_z
| Constructor Detail |
|---|
public SimpleStemmer()
| Method Detail |
|---|
public List<String> findStems(String word, POS pos)
Takes the surface form of a word, as it appears in the text, and the
assigned Wordnet part of speech. The part of speech may be null, which
means that all parts of speech should be considered. Returns a list of
stems, in preferred order. No stem should be repeated in the list. If no
stems are found, this call returns an empty list. It will never return null.
Specified by: findStems in interface IStemmer
Parameters:
word - the surface form of which to find the stem
pos - the part of speech to find stems for; if null, find stems for all parts of speech
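The return contract described for findStems (a null part of speech means all parts of speech, no repeated stems, an empty list instead of null) can be sketched as follows. Everything here is hypothetical: the `Pos` enum stands in for the JWI `POS` type, and `candidates` is a toy rule that only strips a trailing "s" from nouns and verbs. It shows one way to meet the contract, not how SimpleStemmer does it.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the findStems contract; not the JWI source.
final class FindStemsContractSketch {

    /** Hypothetical stand-in for the JWI POS type. */
    enum Pos { NOUN, VERB, ADJECTIVE, ADVERB }

    // Toy per-POS rule: strip a trailing "s" for nouns and verbs only.
    static List<String> candidates(String word, Pos pos) {
        List<String> out = new ArrayList<>();
        boolean strippable = pos == Pos.NOUN || pos == Pos.VERB;
        if (strippable && word.endsWith("s") && word.length() > 1) {
            out.add(word.substring(0, word.length() - 1));
        }
        return out;
    }

    static List<String> findStems(String word, Pos pos) {
        // LinkedHashSet preserves first-seen order and removes repeats.
        Set<String> stems = new LinkedHashSet<>();
        if (pos == null) {
            // A null part of speech means "consider every part of speech".
            for (Pos p : Pos.values()) {
                stems.addAll(candidates(word, p));
            }
        } else {
            stems.addAll(candidates(word, pos));
        }
        // Always a list, possibly empty, never null.
        return new ArrayList<>(stems);
    }

    public static void main(String[] args) {
        System.out.println(findStems("dogs", null));       // [dog]
        System.out.println(findStems("dogs", Pos.ADVERB)); // []
    }
}
```

With a null part of speech, both the noun and the verb rule produce "dog", but the stem appears only once in the result.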
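
protected String normalize(String word)
Converts all whitespace runs to single underscores.
Parameters:
word - the string to be normalized
Throws:
NullPointerException - if the specified string is null
IllegalArgumentException - if the specified string is empty or all whitespace

protected List<String> stripNounSuffix(String noun)
Strips suffixes from the specified word according to the noun rules.
Parameters:
noun - the word to be modified
Throws:
NullPointerException - if the specified word is null

protected List<String> getNounCollocationRoots(String composite)
Handles stemming noun collocations.
Parameters:
composite - the word to be modified
Throws:
NullPointerException - if the specified word is null

protected List<String> stripVerbSuffix(String verb)
Strips suffixes from the specified word according to the verb rules.
Parameters:
verb - the word to be modified
Throws:
NullPointerException - if the specified word is null

protected List<String> getVerbCollocationRoots(String composite)
Handles stemming verb collocations.
Parameters:
composite - the word to be modified
Throws:
NullPointerException - if the specified word is null

protected List<String> stripAdjectiveSuffix(String adj)
Strips suffixes from the specified word according to the adjective rules.
Parameters:
adj - the word to be modified
Throws:
NullPointerException - if the specified word is null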
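
The behavior documented for normalize (whitespace runs become single underscores, null and blank input are rejected) can be approximated with the sketch below. The class name `NormalizeSketch` is hypothetical, and trimming leading and trailing whitespace before substitution is an assumption; the actual method may apply further steps that this page does not describe.

```java
// Hypothetical illustration of the documented normalize behavior; not the JWI source.
final class NormalizeSketch {

    static String normalize(String word) {
        if (word == null) {
            throw new NullPointerException("word is null");
        }
        // Assumption: leading and trailing whitespace is dropped first.
        String trimmed = word.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("word is empty or all whitespace");
        }
        // Collapse each run of whitespace into a single underscore.
        return trimmed.replaceAll("\\s+", "_");
    }

    public static void main(String[] args) {
        System.out.println(normalize("  ice \t cream ")); // ice_cream
    }
}
```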