Post by Prateek Jalgaonkar
Lead Analytics Engineer @ Cigna Evernorth | Building Scalable Healthcare Analytics Systems
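Spark Beginner Trap!

I thought Spark was slow. Turns out my mental model was wrong.

Partition = unit of parallelism. Each task processes one partition, so the number of partitions caps how many cores can work at the same time.

Everything clicked after that.

Many Spark jobs are slow not because of bad code, but because of bad partitioning: too few partitions leave cores idle, and skewed partitions leave one task doing most of the work.

Think in partitions first. Spark handles the scheduling from there.

#PySpark #ApacheSpark #DataEngineering #BigData #LearningInPublic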
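
To make "partition = unit of parallelism" concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark itself: `stage_time` is a hypothetical helper, and it assumes one task per partition, one task per core at a time, and task time proportional to partition size. Real Spark scheduling has more moving parts.

```python
import heapq


def stage_time(partition_sizes, cores):
    """Toy model of one stage: returns when the last task finishes.

    Assumptions (not Spark's real scheduler): one task per partition,
    each core runs one task at a time, and a task takes time equal to
    the size of its partition.
    """
    # Min-heap of the times at which each core becomes free.
    free_at = [0.0] * cores
    heapq.heapify(free_at)
    finish = 0.0
    for size in partition_sizes:
        start = heapq.heappop(free_at)  # earliest available core
        end = start + size
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


cores = 8
total_rows = 800

# One big partition: a single task, so seven of the eight cores sit idle.
one_partition = stage_time([total_rows], cores)

# Eight even partitions: every core gets one task of equal size.
even = stage_time([total_rows / 8] * 8, cores)

# Eight skewed partitions: the stage waits on the one oversized task.
skewed = stage_time([730] + [10] * 7, cores)

print(one_partition, even, skewed)
```

Same data and same cores in all three cases; only the partitioning changes. In PySpark, `df.rdd.getNumPartitions()` reports the current partition count, and `df.repartition(n)` redistributes the data into `n` partitions at the cost of a shuffle.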