Class CustomPatternCsv
java.lang.Object
net.sansa_stack.hadoop.core.pattern.CustomPatternCsv
- All Implemented Interfaces:
CustomPattern
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
class
static class
static class
static class
static class
-
Field Summary
Fields
Modifier and Type | Field | Description
protected int | cellMaxLength
protected org.aksw.commons.model.csvw.domain.api.Dialect | dialect
protected CustomPattern | fieldSeparatorAndNewlinePattern
protected int | multilineFieldMaxLines
-
Constructor Summary
Constructors
Modifier | Constructor | Description
protected | CustomPatternCsv(org.aksw.commons.model.csvw.domain.api.Dialect dialect, int multilineFieldMaxLines, int cellMaxLength)
-
Method Summary
Modifier and Type | Method | Description
static void | autoDetectStartInQuotedField(CustomPatternCsv.CustomMatcherCsv2 matcher, int rowProbeCount) | Attempt to auto-detect whether the csv row matcher is positioned inside of a quoted field.
static CustomPattern | create(int multilineFieldMaxLines)
static CustomPattern | create(org.aksw.commons.model.csvw.domain.api.Dialect dialect, int multilineFieldMaxLines, int cellMaxLength)
static Pattern | createPattern(org.aksw.commons.model.csvw.domain.api.Dialect dialect)
static boolean | isEffectiveQuoteBwd(CharSequence cs, int offset, int minOffset, char quoteChar, char escapeChar)
static boolean | isEffectiveQuoteFwd(CharSequence cs, int offset, char quoteChar, char escapeChar)
static boolean | isFollowedByEffectiveQuote(CharSequence cs, int offset, char quoteChar, char escapeChar) | Determine whether the next character is an effective quote.
static boolean | isPrecededByEffectiveQuote(CharSequence cs, int offset, int minOffset, char quoteChar, char escapeChar) | Checks whether the previous position has a quote that is not escaped.
static void | matcher(CharSequence charSequence)
-
Field Details
-
dialect
protected org.aksw.commons.model.csvw.domain.api.Dialect dialect -
fieldSeparatorAndNewlinePattern
-
multilineFieldMaxLines
protected int multilineFieldMaxLines -
cellMaxLength
protected int cellMaxLength
-
-
Constructor Details
-
CustomPatternCsv
protected CustomPatternCsv(org.aksw.commons.model.csvw.domain.api.Dialect dialect, int multilineFieldMaxLines, int cellMaxLength)
-
-
Method Details
-
createPattern
-
create
-
create
public static CustomPattern create(org.aksw.commons.model.csvw.domain.api.Dialect dialect, int multilineFieldMaxLines, int cellMaxLength) -
matcher
- Specified by:
matcher in interface CustomPattern
-
isPrecededByEffectiveQuote
public static boolean isPrecededByEffectiveQuote(CharSequence cs, int offset, int minOffset, char quoteChar, char escapeChar)
Checks whether the previous position has a quote that is not escaped.
-
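The idea behind isPrecededByEffectiveQuote can be illustrated with a small standalone sketch. It is not the library's implementation: the class and method names are hypothetical, and it assumes that the escape character differs from the quote character (for example a backslash), so that a quote counts as unescaped when it is preceded by an even number of escape characters. It also assumes that minOffset is the lowest index the backward scan may inspect.

```java
final class PrecedingQuoteSketch {

    /**
     * Hypothetical helper: returns true if the character just before offset
     * is a quote that is not escaped. Characters below minOffset are never
     * inspected. Assumes escapeChar != quoteChar and offset <= cs.length().
     */
    static boolean isPrecededByUnescapedQuote(CharSequence cs, int offset, int minOffset,
                                              char quoteChar, char escapeChar) {
        int quotePos = offset - 1;
        if (quotePos < minOffset || cs.charAt(quotePos) != quoteChar) {
            return false; // no quote at the previous position
        }
        // Count the escape characters directly in front of the quote.
        int escapes = 0;
        for (int i = quotePos - 1; i >= minOffset && cs.charAt(i) == escapeChar; i--) {
            escapes++;
        }
        // An even number of escapes cancel each other out, leaving the quote unescaped.
        return escapes % 2 == 0;
    }

    public static void main(String[] args) {
        String plain = "a\"b";       // characters: a " b
        String escaped = "a\\\"b";   // characters: a \ " b
        System.out.println(isPrecededByUnescapedQuote(plain, 2, 0, '"', '\\'));
        System.out.println(isPrecededByUnescapedQuote(escaped, 3, 0, '"', '\\'));
    }
}
```

Restricting the scan with minOffset matters when only a window of the input is available: an escape character outside the window cannot be seen, so the result may differ from a scan over the full input.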
isFollowedByEffectiveQuote
public static boolean isFollowedByEffectiveQuote(CharSequence cs, int offset, char quoteChar, char escapeChar)
Determine whether the next character is an effective quote.
."" -> not an effective quote, because it is an escaped quote symbol.
Any odd number of consecutive quote characters implies an effective quote.
-
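The odd/even rule described for isFollowedByEffectiveQuote can be illustrated with a small standalone sketch. It assumes the common CSV convention in which the quote character escapes itself by doubling, and it inspects the run of quotes beginning at offset itself. The class and method names are hypothetical; this is not the library's code, whose exact offset semantics and handling of escapeChar may differ.

```java
final class EffectiveQuoteSketch {

    /** Counts how many consecutive quoteChar characters begin at offset. */
    static int quoteRunLength(CharSequence cs, int offset, char quoteChar) {
        int n = 0;
        while (offset + n < cs.length() && cs.charAt(offset + n) == quoteChar) {
            n++;
        }
        return n;
    }

    /**
     * Hypothetical helper: a run of quotes contains an effective quote
     * exactly when its length is odd, because every pair of quotes forms
     * one escaped quote symbol.
     */
    static boolean startsWithEffectiveQuote(CharSequence cs, int offset, char quoteChar) {
        return quoteRunLength(cs, offset, quoteChar) % 2 == 1;
    }

    public static void main(String[] args) {
        System.out.println(startsWithEffectiveQuote("\"\"", 0, '"'));     // run of two quotes
        System.out.println(startsWithEffectiveQuote("\"\"\"", 0, '"'));   // run of three quotes
    }
}
```

A run of two quotes is a single escaped quote and therefore not effective, while a run of three is one escaped quote plus one effective quote.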
isEffectiveQuoteFwd
public static boolean isEffectiveQuoteFwd(CharSequence cs, int offset, char quoteChar, char escapeChar) -
isEffectiveQuoteBwd
public static boolean isEffectiveQuoteBwd(CharSequence cs, int offset, int minOffset, char quoteChar, char escapeChar) -
mainX
-
autoDetectStartInQuotedField
public static void autoDetectStartInQuotedField(CustomPatternCsv.CustomMatcherCsv2 matcher, int rowProbeCount)
Attempt to auto-detect whether the csv row matcher is positioned inside of a quoted field. The matcher is reset and invoked with either assumption (inside of a quoted field and outside of it). The assumption that leads to a consistent sample of rows will take effect. Consistent means that all rows have the same length and that no quote errors are encountered.
- Parameters:
matcher-
-
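The detection strategy described for autoDetectStartInQuotedField (try both assumptions and keep the one that yields a consistent sample of rows) can be sketched in a standalone form. This is a simplified illustration, not the library's implementation: it uses a hand-written scanner instead of CustomMatcherCsv2, hard-codes a comma separator, a double quote that escapes itself by doubling, and "\n" line endings, and it interprets "same length" as "same number of fields". It also skips the first probed row, since the probe may start in the middle of it, and reports no decision when both or neither assumption is consistent. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class QuotedStartProbeSketch {

    /**
     * Scans up to maxRows rows and returns the field count of each row,
     * or null if a quote error is encountered.
     */
    static List<Integer> probe(String text, boolean startInQuotes, int maxRows) {
        List<Integer> fieldCounts = new ArrayList<>();
        boolean inQuotes = startInQuotes;
        boolean fieldStart = !startInQuotes; // true directly after a separator or newline
        int fields = 1;
        for (int i = 0; i < text.length() && fieldCounts.size() < maxRows; i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    continue; // ordinary content of the quoted field
                }
                if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                    i++; // doubled quote: an escaped quote symbol
                    continue;
                }
                inQuotes = false; // closing quote
                if (i + 1 < text.length()) {
                    char next = text.charAt(i + 1);
                    if (next != ',' && next != '\n') {
                        return null; // quote error: text after a closing quote
                    }
                }
            } else if (c == '"') {
                if (!fieldStart) {
                    return null; // quote error: quote inside an unquoted field
                }
                inQuotes = true;
                fieldStart = false;
            } else if (c == ',') {
                fields++;
                fieldStart = true;
            } else if (c == '\n') {
                fieldCounts.add(fields);
                fields = 1;
                fieldStart = true;
            } else {
                fieldStart = false;
            }
        }
        return fieldCounts;
    }

    /** Consistent: no quote error and all rows after the first have equal field counts. */
    static boolean isConsistent(List<Integer> counts) {
        if (counts == null || counts.size() < 2) {
            return false;
        }
        int expected = counts.get(1);
        for (int i = 2; i < counts.size(); i++) {
            if (counts.get(i) != expected) {
                return false;
            }
        }
        return true;
    }

    /** Returns true/false if exactly one assumption is consistent, otherwise empty. */
    static Optional<Boolean> detectStartInQuotedField(String text, int rowProbeCount) {
        boolean inside = isConsistent(probe(text, true, rowProbeCount + 1));
        boolean outside = isConsistent(probe(text, false, rowProbeCount + 1));
        return inside == outside ? Optional.empty() : Optional.of(inside);
    }

    public static void main(String[] args) {
        // Tail of a file whose first visible characters belong to a quoted multiline field.
        String split = "y\",c\nd,\"e,f\",g\nh,i,j\nk,l,m\n";
        System.out.println(detectStartInQuotedField(split, 3));
    }
}
```

In the usage example, assuming a start outside of a quoted field hits a quote in the middle of an unquoted field, so only the "inside" assumption produces a consistent sample. A larger rowProbeCount makes an accidental match of the wrong assumption less likely, at the cost of scanning more input.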