Post by Prateek Jalgaonkar

Lead Analytics Engineer @ Cigna Evernorth | Building Scalable Healthcare Analytics Systems

❌ → 💡 → ✅ A small Spark lesson I brushed up on today

❌ Mistake
I tried to calculate the total amount and order count per customer by chaining aggregations after groupBy(). The idea was right, but Spark didn't accept it.

💡 Learning
In Spark:
groupBy() only defines how the data is grouped. It returns a GroupedData object, not a DataFrame.
Aggregations should be defined together in a single agg() call.
You can't chain .sum() and .count() the way you might expect coming from Pandas: .sum() already returns a new DataFrame, so the next call no longer sees the groups.

✅ Fix

# Importing the module as F avoids shadowing Python's built-in sum()
from pyspark.sql import functions as F

# orders_filter is the already-filtered orders DataFrame
orders_summary = orders_filter.groupBy("customer_id").agg(
    F.sum("amount").alias("total_amount"),
    F.count("amount").alias("order_count"),  # counts non-null amounts per customer
)

Small mistake, big clarity. This made Spark aggregations finally click for me.

#ApacheSpark #LearningInPublic #SparkBasics #DataEngineering #BigData