Seattle, Washington, United States
Tirtha is a highly motivated, versatile data engineering lead with extensive experience across the data space, spanning projects in big data engineering, data warehousing, data modeling, data governance, and data management, and in building and maintaining enterprise-grade, highly reliable, mission-critical big data systems. Experienced in leading and incubating teams on-shore and off-shore, as well as managing cross-functional relationships with product managers & data scientists. Experienced in technical mentorship and participating in recruitment.
• A True Team Player: Strong team player with a track record of cross-functional collaboration. Willing to jump in and help when needed and to learn and teach new skills, with the experience and professionalism required to meet objectives.
• Domain: A track record of delivering high-quality code in complex codebases of key applications and services.
• Cross-Cultural Leadership: Collaboration with Data Center SMEs, Data Scientists, and Program Managers.
• Problem Solver: Excellent decision-making, analytical, and problem-solving skills. Design and maintain technical and project documentation.
• Leading & management: Exceptional project management skills. Lead small teams. Recruit & mentor team members.
• Communicator: Outstanding interpersonal and communication skills.
Technical Skillset
• Experience with Big Data technologies such as Spark, MapReduce, Cascading, Hadoop, Hive, Presto, Sqoop.
• Experience with AWS services such as EMR, EC2, Lambda, SNS, DynamoDB, Glue, CloudFormation templates (CFT), and other services.
• Extensive experience with Data Modeling and Data Architecture around both batch and streaming datasets.
• Streaming technologies: Kafka, Flink
• Programming: Python. Beginner-level Scala and Swift. Prior experience coding in Java and Bash scripting.
• Databases: Trino, Snowflake, Oracle, Teradata.
• Orchestration: Airflow / ControlM
• Observability: Datadog, Splunk
• CICD using Jenkins, TeamCity
• Infrastructure as code using CloudFormation templates, Terraform, Pulumi.
• Version control: Git / Stash
• Familiarity with Data Science & ML packages.
I process and analyze large datasets from Siri & Apple Intelligence to understand how asset delivery impacts these features, from availability to usage. I do so in a privacy-preserving manner, running analytics on device and uploading the results to the server for post-processing. I typically work with PySpark, Flink, Trino, Iceberg, Postgres, Elasticsearch, some AWS services, Kubernetes, Docker, and Airflow. I also dabble in Swift for writing on-device analytics pipelines that run on billions of Apple devices. I am currently a team lead and the DRI for major data efforts in our org.
Building and supporting complex, petabyte-scale data pipelines on different data platforms. These pipelines power critical dashboards that surface key insights about Salesforce's product adoption & utilization, enabling Product Managers and other decision-makers across the company to bring insights together and inform our product strategy.
Currently working on a data platform for paid search engine marketing partners and building in-house applications to optimize spend analysis by incorporating a feedback cycle of spend and performance for each channel and using real-time bidding intelligence. All applications are built in AWS with a focus on performance and delivery. The pipeline also manages feature engineering for data science teams.
• Expertise building data pipelines on large, complex datasets using Spark & other big data SaaS services.
• Programming languages: Python, Java, Bash
• Strong SQL skills (Presto, Spark SQL, Hive, Snowflake)
• Cloud platforms: Amazon Web Services, Qubole, Google Cloud Platform
• Airflow, ControlM
• Version control: Perforce / Git
• CICD using Jenkins
• Well versed with Google AdWords & Google Ads APIs (SEM & META channels)
• Excellent communication skills to collaborate with cross-functional partners and independently drive projects and decisions
• Currently managing a team of 4 offshore developers & QA.
• Reduced AWS EMR cluster cost by 20% by creating an application that spins up a dynamically sized cluster based on the volume of data being processed. The application bids for the best price for AWS spot instances.
• Designed and built Hotwire’s entire financial metrics data, which is consumed by almost every business unit at Hotwire. Built the logic for the main KPI metrics of Gross profit, Net revenue, and Cost.
• Pivotal in hiring and ramping up the entire Operational team in India, and set up processes to reduce operational cost by 60%. Hired and built a new team of 6 in Gurgaon, India from scratch and provided knowledge transfer and mentoring to the team.
• Worked with the Online marketing team to support its marketing data needs. Worked on attribution projects.
• Designed and developed ETL processes by leveraging the old tech stack as well as new big data technologies on AWS to build robust, scalable systems.
Technologies:
• Big Data / Hadoop tech stack (Hive, Spark, Cascading) on Amazon Web Services
• ETL / Informatica PowerCenter
• Oracle Exadata, DB2, Teradata (MPP)
• UNIX, Python, Java
• ControlM
• Version control: Stash / Git
• Misc: JIRA, Splunk
Worked on data integration pipelines for UBS, the Swiss bank. Static reference data is one of the most critical datasets required for any trading system.