Class CustomPatternCsv

java.lang.Object
net.sansa_stack.hadoop.core.pattern.CustomPatternCsv
All Implemented Interfaces:
CustomPattern

public class CustomPatternCsv extends Object implements CustomPattern
  • Field Details

    • dialect

      protected org.aksw.commons.model.csvw.domain.api.Dialect dialect
    • fieldSeparatorAndNewlinePattern

      protected CustomPattern fieldSeparatorAndNewlinePattern
    • multilineFieldMaxLines

      protected int multilineFieldMaxLines
    • cellMaxLength

      protected int cellMaxLength
  • Constructor Details

    • CustomPatternCsv

      protected CustomPatternCsv(org.aksw.commons.model.csvw.domain.api.Dialect dialect, int multilineFieldMaxLines, int cellMaxLength)
  • Method Details

    • createPattern

      public static Pattern createPattern(org.aksw.commons.model.csvw.domain.api.Dialect dialect)
    • create

      public static CustomPattern create(int multilineFieldMaxLines)
    • create

      public static CustomPattern create(org.aksw.commons.model.csvw.domain.api.Dialect dialect, int multilineFieldMaxLines, int cellMaxLength)
    • matcher

      public CustomPatternCsv.CustomMatcherCsv2 matcher(CharSequence charSequence)
      Specified by:
      matcher in interface CustomPattern
    • isPrecededByEffectiveQuote

      public static boolean isPrecededByEffectiveQuote(CharSequence cs, int offset, int minOffset, char quoteChar, char escapeChar)
      Checks whether the previous position holds a quote character that is not escaped.
    • isFollowedByEffectiveQuote

      public static boolean isFollowedByEffectiveQuote(CharSequence cs, int offset, char quoteChar, char escapeChar)
      Determines whether the next character is an effective quote. A doubled quote ("") is not an effective quote because it is an escaped quote symbol. Any odd number of consecutive quote characters implies an effective quote.
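The odd/even rule described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the class and method names are hypothetical, and it assumes the escape character equals the quote character (doubled-quote escaping), so no separate escapeChar is handled.

```java
// Hypothetical illustration only; not part of net.sansa_stack.hadoop.core.pattern.
final class EffectiveQuoteSketch {

    /** Counts consecutive quoteChar occurrences starting at offset, scanning forward. */
    static int countQuotesFwd(CharSequence cs, int offset, char quoteChar) {
        int n = 0;
        for (int i = offset; i < cs.length() && cs.charAt(i) == quoteChar; i++) {
            n++;
        }
        return n;
    }

    /**
     * Assumption: escapeChar == quoteChar. Each pair of quotes is one escaped
     * quote symbol, so only an odd run leaves a quote that delimits a field.
     */
    static boolean isEffectiveQuoteFwdSketch(CharSequence cs, int offset, char quoteChar) {
        return countQuotesFwd(cs, offset, quoteChar) % 2 == 1;
    }
}
```

Under these assumptions a run of one or three quote characters counts as an effective quote, while a run of two (an escaped quote symbol) or zero does not.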
    • isEffectiveQuoteFwd

      public static boolean isEffectiveQuoteFwd(CharSequence cs, int offset, char quoteChar, char escapeChar)
    • isEffectiveQuoteBwd

      public static boolean isEffectiveQuoteBwd(CharSequence cs, int offset, int minOffset, char quoteChar, char escapeChar)
    • mainX

      public static void mainX(String[] args)
    • autoDetectStartInQuotedField

      public static void autoDetectStartInQuotedField(CustomPatternCsv.CustomMatcherCsv2 matcher, int rowProbeCount)
      Attempts to auto-detect whether the CSV row matcher is positioned inside a quoted field. The matcher is reset and invoked under each assumption (inside a quoted field and outside of one). The assumption that yields a consistent sample of rows takes effect. Consistent means that all rows have the same length and that no quote errors are encountered.
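The detection idea can be sketched independently of the matcher class. The following is a rough illustration under stated assumptions, not the library's code: all names are hypothetical, fields are separated by ',', rows end with '\n', quotes are escaped by doubling, the first probed row is ignored because it may be partial, and an empty result is returned when both or neither assumption is consistent (the tie-break is a choice made for this sketch).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; not part of net.sansa_stack.hadoop.core.pattern.
final class QuotedStartDetectionSketch {

    /** Returns the field count of each probed row, or null on a quote error. */
    static List<Integer> probeFieldCounts(String text, boolean startInQuotes, int rowProbeCount) {
        List<Integer> counts = new ArrayList<>();
        boolean inQuotes = startInQuotes;
        boolean fieldStart = !startInQuotes; // true right after a separator or row end
        int fields = 1;
        // Probe one extra row because the first (possibly partial) row is skipped later.
        for (int i = 0; i < text.length() && counts.size() < rowProbeCount + 1; i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    continue; // ordinary character inside a quoted field
                }
                if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                    i++; // doubled quote: escaped quote symbol, stay inside the field
                    continue;
                }
                char next = i + 1 < text.length() ? text.charAt(i + 1) : '\n';
                if (next != ',' && next != '\n') {
                    return null; // closing quote not followed by a separator or row end
                }
                inQuotes = false;
            } else if (c == '"') {
                if (!fieldStart) {
                    return null; // quote in the middle of an unquoted field
                }
                inQuotes = true;
                fieldStart = false;
            } else if (c == ',') {
                fields++;
                fieldStart = true;
            } else if (c == '\n') {
                counts.add(fields);
                fields = 1;
                fieldStart = true;
            } else {
                fieldStart = false;
            }
        }
        return counts;
    }

    /** Consistent: no quote error and all rows after the first have equal length. */
    static boolean isConsistent(List<Integer> counts) {
        if (counts == null || counts.size() < 2) {
            return false;
        }
        int expected = counts.get(1);
        for (int i = 2; i < counts.size(); i++) {
            if (counts.get(i) != expected) {
                return false;
            }
        }
        return true;
    }

    /** Tries both assumptions; empty if the sample does not single one out. */
    static Optional<Boolean> detectStartInQuotes(String text, int rowProbeCount) {
        boolean outsideOk = isConsistent(probeFieldCounts(text, false, rowProbeCount));
        boolean insideOk = isConsistent(probeFieldCounts(text, true, rowProbeCount));
        if (outsideOk == insideOk) {
            return Optional.empty();
        }
        return Optional.of(insideOk);
    }
}
```

In this sketch, a chunk that begins in the middle of a multi-line quoted field fails the "outside" probe with a quote error and passes the "inside" probe, so the "inside" assumption is selected.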
      Parameters:
      matcher -