Atish Lahiri

Database Architect at TxGov Project

Austin, Texas, United States

About

• 20+ years of experience in the IT industry across Data Engineering: Application Development, Database Design, Performance Tuning, ETL/ELT and Data Pipeline Architectures
• Python evangelist (successfully popularized its usage in what was a Java-only VLDB project) and open-source enthusiast (keep decision-makers updated on free, practical, permissive-license alternatives to expensive closed-source products)
• Seasoned Information Solutions Architect covering VLDB design, VLDB performance optimization, Data Warehouse Architecture, ETL Design & Development and Big Data Solutions
• Re-architected and heavily optimized an Oracle VLDB (> 10 TB originally, now > 55 TB), resulting in an average performance improvement of over 30 times, a dramatic SLA improvement (monthly processing time dropped from 4 days to 20 hours) and scalability to handle 6 times more load
• Explored cost reduction initiatives by leveraging commodity hardware and open-source / free enterprise-grade software stacks covering Big Data, Analytics and NoSQL products
• Familiar with Snowflake (Snowpark, Snowflake SQL, Snowflake Scripting) and Informatica (Transformations and Mappings), with working knowledge of Data Orchestration/Workflow Management frameworks like Dagster
• Adept at handling high-risk projects with high visibility to executive management
• Highly accomplished technical problem solver, educator and communicator

Specialties: Data Engineering, Oracle VLDB Batch Performance Optimization, Data Integration, Data Analysis, Data Warehousing
• Prefect (like Airflow), Spark, Kafka, Python, Redis, JavaScript, MQ Series, Perl, XSLT, ZeroMQ
• Data Analytics and Big Data: Polars, Pandas, NoSQL, Map-Reduce, Hadoop, Pig, MongoDB, Hive

Experience

  • Database Architect at Allied Consultants, Inc.
    Apr 2024 - Aug 2025 · 1 yr 5 mos

    Rewriting complex Business Rules which are applied to billions of records of different types every month. Using Python, Jupyter Notebook, Pandas, Polars and Awk for Exploratory Data Analysis of the historical records in order to:
    • identify Business Rule patterns
    • prune Legacy Rules which are no longer used
    • transform the text specifications of New Business Rules into individual categories and value ranges
    • design an advanced method of defining the New Business Rules that helps with both understanding their intent and performance
    • extract data from Prod to Dev under extreme space and other constraints
    • test iteratively
    Constructed Polars queries with a mix of SQL and Python constructs, applied to DataFrames, to identify Rule patterns.

  • Data Engineer at Midsize Fintech Company
    Oct 2022 - Feb 2024 · 1 yr 5 mos

    Working in the Data Services team with Python, Pandas, JSON, Boto3, Redshift, MySQL, Git, PyCharm, DataGrip, S3 and Lambda in AWS. Creating and enhancing data pipelines using a custom in-house legacy Python framework. Optimizing a complex Pandas pipeline. Adding features to the in-house Python framework. Suggesting improvements using new Python features and modules. Using a mix of shell scripting, Awk and Python/Pandas for code and configuration generation and checking, and for data exploration. Rewriting SAS scripts/procedures in Redshift SQL.

  • Data Engineer/DBA/Oracle Expert at Technosoft Corporation
    Mar 2013 - Oct 2022 · 9 yrs 8 mos

    Writing very advanced programs to speed up huge batch processes in a $400 million, multi-terabyte Java and Oracle 10gR2/11gR2/19c Government Benefits project. Recent highlights:
    • Designed a Parallel Data Pipeline using Python and the Prefect workflow framework (like Airflow).
    • Designed a solution for Privileged Identity/Access Management that allows creation and use of PL/SQL objects without having to give PIM users the very dangerous EXECUTE ANY PROCEDURE privilege.
    • Rewrote a Java-based update process in SQL/Analytic SQL and cut its run time from 48+ hours to 3 minutes. The extreme speed of the new solution made very rapid testing possible, which unearthed 8 additional business rules that were not implemented (or even known about) when the Java solution was written. After adding the 8 extra business rules, the final solution still ran within 3 minutes.
    • Optimized demanding processes in a huge data loading activity involving very complex business rules and massive amounts of data (millions of records). Implemented stable, scalable and maintainable solutions using SQL, Analytic SQL and innovative partitioning. Successfully used Python multiprocessing to launch and monitor concurrent sessions loading data from individual partitions.
    • Used Python with the XlsxWriter module to extract information from the database and create advanced Excel reports.
    • Used Python with the Pyparsing module to parse and transform LoadRunner VuGen scripts according to end-user rules.
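The general idea behind replacing row-by-row application code with Analytic SQL can be sketched as follows. This is only an illustration under stated assumptions, not the project's actual code: it uses SQLite (via Python's standard `sqlite3` module) instead of Oracle, and the `claims` table, its columns and the function names are all hypothetical. A single set-based query with window functions computes a per-member running total and picks the latest row per member, work that procedural code would typically do in nested loops.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, hypothetical claims table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE claims ("
        "claim_id INTEGER PRIMARY KEY, member_id INTEGER, "
        "claim_month TEXT, amount REAL)"
    )
    rows = [
        (1, 1, "2022-01", 100.0),
        (2, 1, "2022-02", 50.0),
        (3, 2, "2022-01", 75.0),
        (4, 1, "2022-03", 25.0),
        (5, 2, "2022-02", 10.0),
    ]
    conn.executemany("INSERT INTO claims VALUES (?, ?, ?, ?)", rows)
    return conn


def latest_claim_per_member(conn):
    """Return each member's latest claim with a running total, in one pass.

    SUM(...) OVER accumulates amounts per member in month order, and
    ROW_NUMBER() OVER ranks rows newest-first so rn = 1 is the latest claim.
    """
    sql = """
        SELECT member_id, claim_month, amount, running_total
        FROM (
            SELECT member_id,
                   claim_month,
                   amount,
                   SUM(amount) OVER (
                       PARTITION BY member_id ORDER BY claim_month
                   ) AS running_total,
                   ROW_NUMBER() OVER (
                       PARTITION BY member_id ORDER BY claim_month DESC
                   ) AS rn
            FROM claims
        )
        WHERE rn = 1
        ORDER BY member_id
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in latest_claim_per_member(build_demo_db()):
        print(row)
```

Because the whole computation is one declarative statement, the database can choose the execution plan, and each added rule usually becomes another expression in the same query rather than another loop.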

  • Data Engineer/DBA/Oracle Expert at Waiting for Visa paperwork
    Dec 2012 - Feb 2013 · 3 mos

    While waiting for visa paperwork to come through, spent this time learning about practical Big Data/NoSQL problems, solutions and architectures used in successful operations.

  • Data Engineer/DBA/Oracle Expert at RGM Advisors, LLC
    Sep 2012 - Nov 2012 · 3 mos

    • Developing complex trading compliance and profit/loss reports
    • Extending underlying advanced Oracle-based infrastructure using SQL, Analytic SQL, PL/SQL and Python