com.ibm.sparktc.sparkbench.datageneration
Runs the workload. Takes an optional DataFrame as input, present only when the user supplies an inputDir, and returns the generated results DataFrame.
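The optional-input contract described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `WorkloadSketch`, `doWorkload`, and `run` are hypothetical, and reading the input as Parquet is an assumption made only for the example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical trait illustrating the contract; names are assumptions.
trait WorkloadSketch {
  // Set only when the user supplies an input directory.
  def inputDir: Option[String]

  // Runs the workload: receives the input data if there is any,
  // and returns the DataFrame of generated results.
  def doWorkload(df: Option[DataFrame], spark: SparkSession): DataFrame

  // Loads the input only when inputDir is defined, then runs the workload.
  // Parquet is assumed here purely for illustration.
  def run(spark: SparkSession): DataFrame = {
    val input: Option[DataFrame] = inputDir.map(dir => spark.read.parquet(dir))
    doWorkload(input, spark)
  }
}
```

Modelling the input as `Option[DataFrame]` lets workloads that generate their own data and workloads that consume existing data share one signature.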
Validates that the data set has the correct schema and fixes it if necessary. This addresses issues such as the KMeans load-from-disk pathway returning a DataFrame in which every column is typed as StringType instead of DoubleType.
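One way such a schema fix might look is sketched below. The object `SchemaFix` and the method `castStringsToDouble` are hypothetical names, and the sketch assumes simple column names (no dots or backticks) and that every StringType column should become DoubleType.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DoubleType, StringType}

// Hypothetical helper, not the project's actual implementation.
object SchemaFix {
  // Casts every StringType column to DoubleType and leaves other columns as-is.
  // Values that cannot be parsed as doubles become null after the cast.
  def castStringsToDouble(df: DataFrame): DataFrame = {
    val columns = df.schema.fields.map { field =>
      if (field.dataType == StringType) col(field.name).cast(DoubleType).as(field.name)
      else col(field.name)
    }
    df.select(columns: _*)
  }
}
```

A DataFrame that already has the expected types passes through unchanged, so applying the check unconditionally after loading is safe.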