Shuffling is the process by which Spark redistributes data across partitions, often across different nodes in the cluster. It occurs when an operation needs to group records that share a key across partitions, such as reduceByKey, groupBy, and join. Shuffling is costly: map outputs must be serialized, written to local disk, transferred over the network, and deserialized on the receiving side, so it consumes network I/O, disk I/O, and CPU.
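To make this concrete, here is a minimal Scala sketch of a shuffle-inducing job; the local master, app name, and sample data are illustrative assumptions, not part of the original text. Because records with the same key start out in different partitions, reduceByKey must move them together, which forces a shuffle:

```scala
import org.apache.spark.sql.SparkSession

object ShuffleExample {
  def main(args: Array[String]): Unit = {
    // Illustrative local session; in a real cluster the master would differ.
    val spark = SparkSession.builder()
      .appName("shuffle-example")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // A pair RDD spread across 4 partitions; the same key deliberately
    // appears in different partitions, so aggregation cannot stay local.
    val sales = sc.parallelize(
      Seq(("apple", 2), ("banana", 5), ("apple", 3), ("banana", 1)),
      numSlices = 4
    )

    // reduceByKey must bring all values for a given key to one partition,
    // triggering a shuffle (with a map-side combine that pre-aggregates
    // values within each partition before any data crosses the network).
    val totals = sales.reduceByKey(_ + _)

    totals.collect().foreach(println) // e.g. (apple,5), (banana,6)

    // The shuffle appears as a ShuffledRDD / stage boundary in the lineage.
    println(totals.toDebugString)

    spark.stop()
  }
}
```

Inspecting the `toDebugString` output (or the Spark UI) shows the job split into two stages separated by the shuffle; this stage boundary is where the network and disk cost described above is paid.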