public class SimpleStemmer extends Object implements IStemmer
Provides a simple pattern-based stemming facility based on the "Rules of Detachment" described in the morphy man page in the Wordnet distribution, which can be found at http://wordnet.princeton.edu/man/morphy.7WN.html. It also attempts to strip "ful" endings. It does not search Wordnet to see if stems actually exist. In particular, quoting from that man page:
The following table shows the rules of detachment used by Morphy. If a word ends with one of the suffixes, it is stripped from the word and the corresponding ending is added. ... No rules are applicable to adverbs.
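| POS | Suffix | Ending |
|---|---|---|
| NOUN | "s" | "" |
| NOUN | "ses" | "s" |
| NOUN | "xes" | "x" |
| NOUN | "zes" | "z" |
| NOUN | "ches" | "ch" |
| NOUN | "shes" | "sh" |
| NOUN | "men" | "man" |
| NOUN | "ies" | "y" |
| VERB | "s" | "" |
| VERB | "ies" | "y" |
| VERB | "es" | "e" |
| VERB | "es" | "" |
| VERB | "ed" | "e" |
| VERB | "ed" | "" |
| VERB | "ing" | "e" |
| VERB | "ing" | "" |
| ADJ | "er" | "" |
| ADJ | "est" | "" |
| ADJ | "er" | "e" |
| ADJ | "est" | "e" |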
Morphy contains code that searches for nouns ending with ful and performs a transformation on the substring preceding it. It then appends 'ful' back onto the resulting string and returns it. For example, if passed the noun "boxesful", it will return "boxful".
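The suffix-stripping and "ful" handling described above can be sketched as follows. This is a minimal illustration, not the SimpleStemmer implementation: the class name `DetachmentSketch`, the method `stripNoun`, and the `NOUN_RULES` array are hypothetical, and only the noun rules from the Morphy rules of detachment are modeled. Like the stemmer described here, the sketch does not check whether a candidate stem actually exists, so it also returns non-words.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of noun detachment rules; not the SimpleStemmer source.
final class DetachmentSketch {

    // Each pair is {suffix to strip, ending to append}, per the Morphy noun rules.
    static final String[][] NOUN_RULES = {
        {"s", ""}, {"ses", "s"}, {"xes", "x"}, {"zes", "z"},
        {"ches", "ch"}, {"shes", "sh"}, {"men", "man"}, {"ies", "y"}
    };

    static List<String> stripNoun(String noun) {
        String word = noun;
        String tail = "";
        // For nouns ending in "ful", transform the part before it and re-append "ful".
        if (word.endsWith("ful")) {
            tail = "ful";
            word = word.substring(0, word.length() - tail.length());
        }
        // LinkedHashSet keeps rule order and drops repeated candidates.
        Set<String> stems = new LinkedHashSet<>();
        for (String[] rule : NOUN_RULES) {
            String suffix = rule[0];
            String ending = rule[1];
            if (word.length() > suffix.length() && word.endsWith(suffix)) {
                String root = word.substring(0, word.length() - suffix.length());
                stems.add(root + ending + tail);
            }
        }
        return new ArrayList<>(stems);
    }

    public static void main(String[] args) {
        System.out.println(stripNoun("boxesful")); // [boxeful, boxful]
        System.out.println(stripNoun("ponies"));   // [ponie, pony]
    }
}
```

Because no dictionary lookup is performed, a caller that needs only real words would have to filter candidates such as "boxeful" against Wordnet separately.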
| Modifier and Type | Field and Description |
|---|---|
| static String | ENDING_ch |
| static String | ENDING_e |
| static String | ENDING_man |
| static String | ENDING_null |
| static String | ENDING_s |
| static String | ENDING_sh |
| static String | ENDING_x |
| static String | ENDING_y |
| static String | ENDING_z |
| static Map<POS,List<StemmingRule>> | ruleMap |
| static String | SUFFIX_ches |
| static String | SUFFIX_ed |
| static String | SUFFIX_er |
| static String | SUFFIX_es |
| static String | SUFFIX_est |
| static String | SUFFIX_ful |
| static String | SUFFIX_ies |
| static String | SUFFIX_ing |
| static String | SUFFIX_men |
| static String | SUFFIX_s |
| static String | SUFFIX_ses |
| static String | SUFFIX_shes |
| static String | SUFFIX_ss |
| static String | SUFFIX_xes |
| static String | SUFFIX_zes |
| static String | underscore |
| Constructor and Description |
|---|
| SimpleStemmer() |
| Modifier and Type | Method and Description |
|---|---|
| List<String> | findStems(String word, POS pos): Takes the surface form of a word, as it appears in the text, and the assigned Wordnet part of speech. |
| protected List<String> | getNounCollocationRoots(String composite): Handles stemming noun collocations. |
| Map<POS,List<StemmingRule>> | getRuleMap(): Returns a set of stemming rules used by this stemmer. |
| protected List<String> | getVerbCollocationRoots(String composite): Handles stemming verb collocations. |
| protected String | normalize(String word): Converts all whitespace runs to single underscores. |
| protected List<String> | stripAdjectiveSuffix(String adj): Strips suffixes from the specified word according to the adjective rules. |
| protected List<String> | stripNounSuffix(String noun): Strips suffixes from the specified word according to the noun rules. |
| protected List<String> | stripVerbSuffix(String verb): Strips suffixes from the specified word according to the verb rules. |
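As an illustration of the whitespace handling that normalize is documented to perform, here is a minimal sketch. It is not the library's code: the class name `NormalizeSketch` is hypothetical, and trimming leading and trailing whitespace before conversion is an assumption. The exceptions mirror the documented behavior for null input and for input that is empty or all whitespace.

```java
import java.util.Objects;

// Hypothetical sketch of whitespace normalization; not the SimpleStemmer source.
final class NormalizeSketch {

    static String normalize(String word) {
        // A null argument is rejected with NullPointerException.
        Objects.requireNonNull(word, "word");
        // Assumption: surrounding whitespace is dropped before conversion.
        String trimmed = word.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("word is empty or all whitespace");
        }
        // Collapse every run of whitespace into a single underscore.
        return trimmed.replaceAll("\\s+", "_");
    }

    public static void main(String[] args) {
        System.out.println(normalize("hot   dog")); // hot_dog
    }
}
```

Underscore-joined forms such as "hot_dog" are how Wordnet writes collocations, which is presumably why the stemmer normalizes input this way before handling noun and verb collocations.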
public static final String underscore
public static final String SUFFIX_ches
public static final String SUFFIX_ed
public static final String SUFFIX_es
public static final String SUFFIX_est
public static final String SUFFIX_er
public static final String SUFFIX_ful
public static final String SUFFIX_ies
public static final String SUFFIX_ing
public static final String SUFFIX_men
public static final String SUFFIX_s
public static final String SUFFIX_ss
public static final String SUFFIX_ses
public static final String SUFFIX_shes
public static final String SUFFIX_xes
public static final String SUFFIX_zes
public static final String ENDING_null
public static final String ENDING_ch
public static final String ENDING_e
public static final String ENDING_man
public static final String ENDING_s
public static final String ENDING_sh
public static final String ENDING_x
public static final String ENDING_y
public static final String ENDING_z
public static final Map<POS,List<StemmingRule>> ruleMap
public Map<POS,List<StemmingRule>> getRuleMap()
public List<String> findStems(String word, POS pos)
Specified by: findStems in interface IStemmer
Takes the surface form of a word, as it appears in the text, and the assigned Wordnet part of speech. The part of speech may be null, which means that all parts of speech should be considered. Returns a list of stems, in preferred order. No stem should be repeated in the list. If no stems are found, this call returns an empty list. It will never return null.

protected String normalize(String word)
Parameters: word - the string to be normalized
Throws: NullPointerException - if the specified string is null
Throws: IllegalArgumentException - if the specified string is empty or all whitespace

protected List<String> stripNounSuffix(String noun)
Parameters: noun - the word to be modified
Throws: NullPointerException - if the specified word is null

protected List<String> getNounCollocationRoots(String composite)
Parameters: composite - the word to be modified
Throws: NullPointerException - if the specified word is null

protected List<String> stripVerbSuffix(String verb)
Parameters: verb - the word to be modified
Throws: NullPointerException - if the specified word is null

protected List<String> getVerbCollocationRoots(String composite)
Parameters: composite - the word to be modified
Throws: NullPointerException - if the specified word is null

protected List<String> stripAdjectiveSuffix(String adj)
Parameters: adj - the word to be modified
Throws: NullPointerException - if the specified word is null

Copyright © 2018. All rights reserved.