Belo Horizonte, Minas Gerais, Brazil
Data Engineer
I have 3+ years of experience across the full lifecycle of Data and AI projects, including Continuous Training techniques and CI/CD practices, Data Mining, Cloud Deployment, RAG, and Scalable Data Pipelines. Proficient in designing and optimizing ETL workflows and integrating LLM/ML models into production environments. Fluent in English (Cambridge C1 certified) and experienced in working with international teams.
Technical Abilities: Python | Golang | SQL | JavaScript | Shell Script | PyTorch | TensorFlow | Scikit-Learn | FastAPI | Scrapy | LLMs | NLP | Machine Learning | AWS | Apache Kafka | Apache Airflow | Grafana | RabbitMQ | Docker | Redis | MongoDB | PostgreSQL | Elasticsearch | CI/CD | Big Data Architectures
Recognized by Forbes as one of the Top 50 AI Companies in 2024.
- One of the main developers on the Data Gathering team, responsible for building large-scale data extraction pipelines to support ML initiatives.
- Designed an end-to-end architecture for data extraction, ingestion, enhancement, and deployment to feed ML models (using Kafka, Redis, Docker, PostgreSQL, AWS, Grafana, Python, HTTPX, Playwright, and LLMs).
- Created APIs, PostgreSQL/Redis databases, and AWS S3 data lakes.
- Built data enhancement pipelines with AI agents and NLP tools to ensure data quality.
- Developed a scraper infrastructure that handles diverse websites, including JavaScript-rendered pages, and bypasses Cloudflare, CAPTCHA, and browser-fingerprinting blocks.
- Volunteered as a project developer on GenAI initiatives led by professors from the University of São Paulo (USP).
- Designed and enhanced a Text-to-Speech (TTS) model using XTTS-v2 and Tacotron 2, enabling expressive emotion synthesis.
- Used OpenAI Whisper and CTranslate2 to optimize inference speed and accuracy in speech-processing tasks.
- Built and deployed a full data pipeline for collection, preprocessing, and training using Docker, AWS, Shell Script, Go, and Python.
- Developed and deployed AI models using TensorFlow and Scikit-learn to improve medical equipment calibration processes with previously collected data, significantly increasing accuracy and reliability.
- Independently developed a chatbot using a pre-trained AI model integrated with AWS RDS, PostgreSQL, Redis, D3.js, and Grafana, enabling an intelligent agent to generate custom dashboards that meet client requirements.
- Built and deployed complete ETL pipelines for processing and analyzing healthcare data from Brazil's largest hospitals, using Apache Kafka, Apache Spark, Apache Airflow, Docker, and AWS.
- Collaborated with clients from Austria and Portugal to design and implement custom data automation workflows and pipelines, streamlining the preparation of medical data for analysis and AI model training.
- Designed machine learning models to detect and analyze anomalous log behavior using AWS CloudWatch data, strengthening system security and mitigating DDoS attack risks.
Stack used: Python | SQL | Shell Script | AWS | GCP | Apache Kafka | Apache Spark | Apache Airflow | RabbitMQ | LLM | NLP | Machine Learning | Web Scraping | Scrapy | Playwright | Flask | PostgreSQL | Redis | Elasticsearch | Grafana | Docker | n8n | Git
- Created data mining systems using Python and NLP models to extract and analyze insights from large, unstructured datasets.
- Managed and optimized SQL and NoSQL databases, including PostgreSQL, Redis, and MongoDB, improving query performance, ensuring data integrity, and automating scheduled backups.
- Built and maintained log analysis systems using Elasticsearch, Kibana, Apache Druid, and AWS CloudWatch to optimize microservice performance and monitor user activity.
- Deployed and orchestrated data pipelines using Docker, AWS EC2, RabbitMQ, Apache Kafka, and Shell Script to ensure scalable and efficient data processing.
- Designed and automated weekly data analysis reports using Python, Apache Airflow, AWS CloudWatch, and AWS Lambda, surfacing performance insights and identifying log errors.
Stack used: Python | SQL | Shell Script | AWS | Apache Airflow | RabbitMQ | Machine Learning | Web Scraping | Scrapy | Playwright | Flask | PostgreSQL | Redis | Elasticsearch | Grafana | Docker | Git