0 votes
in Spark Preliminaries by
In Apache Spark RDD, what does SchemaRDD mean?

1 Answer

0 votes
by

SchemaRDD is an RDD made up of row objects, which are just wrappers for basic arrays of strings or integers, and schema information about the data type in each column.

SchemaRDD made it easier for developers to debug code and do unit tests on the SparkSQL core module in their daily work. The idea can be summed up by saying that the data structures inside RDD should be described formally, like a relational database schema. SchemaRDD gives you some simple relational query interface functions that you can use with SparkSQL on top of the essential functions that most RDD APIs offer.

...