Pedro Soares

Data Engineer | AI Engineer | Python | Golang | Machine Learning | LLM | MLOps | LLMOps | GenAI | AWS | ETL Pipelines | Web Scraping

Belo Horizonte, Minas Gerais, Brazil

About

Data Engineer with 3+ years of experience developing the full lifecycle of Data and AI projects, implementing Continuous Training techniques along with CI/CD practices, Data Mining, Cloud Deployment, RAG and Scalable Data Pipelines. Proficient in designing and optimizing ETL workflows and integrating LLM/ML models into production environments. Fluent in English (Cambridge C1 certified) and experienced in working with international teams.

Technical Abilities: Python | Golang | SQL | JavaScript | Shell Script | PyTorch | TensorFlow | Scikit-Learn | FastAPI | Scrapy | LLMs | NLP | Machine Learning | AWS | Apache Kafka | Apache Airflow | Grafana | RabbitMQ | Docker | Redis | MongoDB | PostgreSQL | Elasticsearch | CI/CD | Big Data Architectures

Experience

  • Data Engineer at TrustScale
    Apr 2026 - Present · 4 mos

  • Machine Learning Engineer at TRACTIAN
    May 2025 - Feb 2026 · 10 mos

    Recognized by Forbes as one of the Top 50 AI Companies in 2024.
    - One of the main developers on the Data Gathering team, responsible for developing large data extraction pipelines to support ML initiatives.
    - Designed an end-to-end architecture for data extraction, ingestion, enhancement and deployment to feed ML models (using Kafka, Redis, Docker, PostgreSQL, AWS, Grafana, Python, HTTPX, Playwright and LLMs).
    - Responsible for creating APIs, PostgreSQL/Redis databases and AWS S3 Data Lakes.
    - Created data enhancement pipelines with AI Agents and NLP tools to ensure data quality.
    - Developed a scraper infrastructure to handle diverse websites, including JavaScript-rendered pages and sites protected by Cloudflare, CAPTCHAs and browser fingerprint blocks.

  • GenAI Project Developer at RAIA - Rede de Avanço em Inteligência Artificial
    Mar 2025 - Sep 2025 · 7 mos

    - Volunteered as a project developer on GenAI initiatives led by professors from the Federal University of São Paulo (USP).
    - Designed and enhanced a Text-to-Audio AI model using XTTS-v2 and Tacotron 2, enabling expressive emotion synthesis.
    - Used OpenAI Whisper and CTranslate2 to optimize inference speed and accuracy in speech processing tasks.
    - Built and deployed a full data pipeline for collection, preprocessing, and training using Docker, AWS, Shell Script, Go, and Python.

  • Data Engineer at Engineer Access
    Sep 2024 - May 2025 · 9 mos

  • Arkmeds (1 yr 3 mos)
    • Junior Data Scientist
      Dec 2023 - Sep 2024 · 10 mos

      - Developed and deployed AI models using TensorFlow and Scikit-learn to enhance medical equipment calibration processes with previously collected data, significantly improving accuracy and reliability.
      - Personally developed a chatbot using a pre-trained AI model integrated with AWS RDS, PostgreSQL, Redis, D3.js, and Grafana, enabling an intelligent agent to generate custom dashboards that meet client requirements.
      - Built and deployed complete ETL pipelines for processing and analyzing healthcare data from Brazil’s largest hospitals, utilizing tools such as Apache Kafka, Apache Spark, Apache Airflow, Docker, and AWS.
      - Collaborated with clients from Austria and Portugal to design and implement custom data automation workflows and pipelines, streamlining the preparation of medical data for analysis and AI model training.
      - Designed machine learning models to detect and analyze anomalous log behavior using AWS CloudWatch data, significantly enhancing system security and mitigating DDoS attack risks.

      Stack used: Python | SQL | Shell Script | AWS | GCP | Apache Kafka | Apache Spark | Apache Airflow | RabbitMQ | LLM | NLP | Machine Learning | Web Scraping | Scrapy | Playwright | Flask | PostgreSQL | Redis | Elasticsearch | Grafana | Docker | n8n | Git

    • Software Developer Intern
      Jul 2023 - Dec 2023 · 6 mos

      - Created data mining systems using Python and NLP models to extract and analyze valuable insights from large, unstructured datasets.
      - Managed and optimized SQL and NoSQL databases, including PostgreSQL, Redis, and MongoDB, enhancing query performance, ensuring data integrity, and automating scheduled backups.
      - Built and maintained robust log analysis systems using Elasticsearch, Kibana, Apache Druid, and AWS CloudWatch to optimize microservices performance and monitor user activity.
      - Deployed and orchestrated data pipelines using Docker, AWS EC2, RabbitMQ, Apache Kafka, and Shell Script to ensure scalable and efficient data processing.
      - Designed and automated weekly data analysis reports using Python, Apache Airflow, AWS CloudWatch and AWS Lambda, leveraging performance insights and identifying log errors.

      Stack used: Python | SQL | Shell Script | AWS | Apache Airflow | RabbitMQ | Machine Learning | Web Scraping | Scrapy | Playwright | Flask | PostgreSQL | Redis | Elasticsearch | Grafana | Docker | Git