Pedro Soares

Data Engineer | AI Engineer | Python | Golang | Machine Learning | LLM | MLOps | LLMOps | GenAI | AWS | ETL Pipelines | Web Scraping

Belo Horizonte, Minas Gerais, Brazil

Experience

Data Engineer at TrustScale
Apr 2026 - Present · 4 mos
Machine Learning Engineer at TRACTIAN
May 2025 - Feb 2026 · 10 mos
Recognized by Forbes as one of the Top 50 AI Companies in 2024.
- One of the main developers on the Data Gathering team, responsible for developing large data extraction pipelines to support ML initiatives.
- Designed an end-to-end architecture for data extraction, ingestion, enhancement and deployment to feed ML models (using Kafka, Redis, Docker, PostgreSQL, AWS, Grafana, Python, HTTPX, Playwright and LLMs).
- Responsible for creating APIs, PostgreSQL/Redis databases and AWS S3 data lakes.
- Created data enhancement pipelines with AI agents and NLP tools to ensure data quality.
- Developed a scraper infrastructure to handle diverse websites, coping with JavaScript-rendered pages and bypassing Cloudflare, CAPTCHAs, browser fingerprint blocks, etc.
GenAI Project Developer at RAIA - Rede de Avanço em Inteligência Artificial
Mar 2025 - Sep 2025 · 7 mos
- Volunteered as a project developer on GenAI initiatives led by professors from the Federal University of São Paulo (USP).
- Designed and enhanced a Text-to-Audio AI model using XTTS-v2 and Tacotron 2, enabling expressive emotion synthesis.
- Used OpenAI Whisper and CTranslate2 to optimize inference speed and accuracy in speech processing tasks.
- Built and deployed a full data pipeline for collection, preprocessing, and training using Docker, AWS, Shell Script, Go, and Python.
Data Engineer at Engineer Access
Sep 2024 - May 2025 · 9 mos
Arkmeds (1 yr 3 mos)
- Junior Data Scientist
  Dec 2023 - Sep 2024 · 10 mos
  - Developed and deployed AI models using TensorFlow and Scikit-learn to enhance medical equipment calibration processes with previously collected data, significantly improving accuracy and reliability.
  - Personally developed a chatbot using a pre-trained AI model integrated with AWS RDS, PostgreSQL, Redis, D3.js, and Grafana, enabling an intelligent agent to generate custom dashboards that meet client requirements.
  - Built and deployed complete ETL pipelines for processing and analyzing healthcare data from Brazil’s largest hospitals, using tools such as Apache Kafka, Apache Spark, Apache Airflow, Docker, and AWS.
  - Collaborated with clients from Austria and Portugal to design and implement custom data automation workflows and pipelines, streamlining the preparation of medical data for analysis and AI model training.
  - Designed machine learning models to detect and analyze anomalous log behavior using AWS CloudWatch data, significantly enhancing system security and mitigating DDoS attack risks.
  Stack used: Python | SQL | Shell Script | AWS | GCP | Apache Kafka | Apache Spark | Apache Airflow | RabbitMQ | LLM | NLP | Machine Learning | Web Scraping | Scrapy | Playwright | Flask | PostgreSQL | Redis | Elasticsearch | Grafana | Docker | n8n | Git
- Software Developer Intern
  Jul 2023 - Dec 2023 · 6 mos
  - Created data mining systems using Python and NLP models to extract and analyze valuable insights from large, unstructured datasets.
  - Managed and optimized SQL and NoSQL databases, including PostgreSQL, Redis, and MongoDB, enhancing query performance, ensuring data integrity, and automating scheduled backups.
  - Built and maintained robust log analysis systems using Elasticsearch, Kibana, Apache Druid, and AWS CloudWatch to optimize microservices performance and monitor user activity.
  - Deployed and orchestrated data pipelines using Docker, AWS EC2, RabbitMQ, Apache Kafka, and Shell Script to ensure scalable and efficient data processing.
  - Designed and automated weekly data analysis reports using Python, Apache Airflow, AWS CloudWatch and AWS Lambda, surfacing performance insights and identifying log errors.
  Stack used: Python | SQL | Shell Script | AWS | Apache Airflow | RabbitMQ | Machine Learning | Web Scraping | Scrapy | Playwright | Flask | PostgreSQL | Redis | Elasticsearch | Grafana | Docker | Git