Abhik Das

Data Engineer II at SatSure | AWS Certified Data Engineer | Building Scalable Data Platforms

Bengaluru, Karnataka, India

About

Learning something new has always been fun for me. I like to experiment, take on challenges, and solve problems! I'm a data engineer who is always looking for more effective ways to get the work done. The world of AI fascinates me. I love the way AI is making daily life easier, and I want to contribute my part to the industry.

Experience

Data Engineer II at SatSure
May 2026 - Present · 2 mos
Turbolab Technologies (Full-time · 4 yrs 1 mo)
- Python Developer
  Feb 2023 - Jun 2025 · 2 yrs 5 mos
  • Modernized a customer-facing, SLA-critical data pipeline by migrating it from pandas to Spark, enabling weekly ingestion of 3M+ records into a customer-managed S3 data lake and reducing end-to-end latency by 90%.
  • Built distributed PySpark transformations that process 4M+ records into 40k+ nested JSON files, joining and validating the data and uploading the files in parallel to a GCP data lake while ensuring correctness under immutable storage constraints.
  • Optimized memory-intensive pandas workflows using lazy evaluation and Modin on Ray, cutting monthly cloud compute costs by 60% and improving processing efficiency.
  • Achieved 99% job reliability by automating distributed workloads on legacy Ubuntu clusters through a custom SSH-based orchestration framework, eliminating manual intervention and recurring failures.
  • Improved team productivity through code reviews and mentored junior engineers in data engineering and modular design.
- Junior Python Developer
  Jun 2021 - Feb 2023 · 1 yr 9 mos
  • Designed scalable, fault-tolerant batch data ingestion systems handling 1M+ data points per day from APIs and unstructured web sources, ensuring timely data availability for external consumers.
  • Partnered with the R&D team to reduce ingestion latency by 40% through adaptive IP rotation and distributed execution.
  • Increased platform reliability and reduced delivery errors by 30% by implementing data quality checks, retries, and alerts that detect job failures and schema drift.