San Francisco, California, United States
Cooking.

Social:
📄 Website - https://www.aleksagordic.com/ (old one: https://gordicaleksa.com/)
📺 YouTube - https://www.youtube.com/c/TheAiEpiphany
🐦 Twitter - https://twitter.com/gordic_aleksa
👨👩👧👦 Discord - https://discord.gg/peBrCpheKE
📚 Medium - https://gordicaleksa.medium.com/
💻 GitHub - https://github.com/gordicaleksa
📢 AI Newsletter - https://aiepiphany.substack.com/
💰 Patreon - https://www.patreon.com/theaiepiphany

NOTE: I get a lot of DMs, so if I don't answer it's because I, unfortunately, can't possibly answer every DM I receive.

--- about me in O(1) ---

Ex-software/ML engineer at Microsoft & DeepMind with a broad background: electronics & embedded programming on uCs (bachelor), software engineering, algorithms, deep learning (computer vision, natural language processing (NLP), geometric DL, reinforcement learning (RL)...), web, mobile, Chrome extensions, etc. (I had a lot of personal projects :)).

I try to spend every waking hour learning, improving, creating, pushing myself out of my comfort zone, and sharing the stuff I learn in public. I'm constantly revising and improving my productivity, health & learning processes (at the same time making sure it's not procrastination in disguise).

Before I started working for Microsoft I got my bachelor's in EE and did 2 internships:
1) In a German startup - Telocate - where I worked as an Android dev (autodidact) developing precise localization using ultrasound.
2) In Brazil (UFOP university), where I researched various heuristics for the multi-pile vehicle routing problem. It ended up being more of a life experience: I lived with 11 Brazilians and learned Portuguese.

At Microsoft, I got to work directly with talented engineers & researchers from Microsoft Research Cambridge, and from Seattle on the HoloLens family of devices. While I was working at MSFT I was also one of the organizers of, and a lecturer at, PSI:ML (ML camp).
I worked as a research engineer at Google DeepMind on visual language models (VLMs) and multimodal learning in general, where I led a bulk inference project that brought VLMs (Flamingo) to various Google/YouTube products. Since I left DeepMind I tried a few things, failed many times, still cooking though. Some non-tech hobbies of mine: * Learning (human) languages. I learned 5 in my free time (mostly in my high-school days, it gets really easy once you learn the first two) * Powerlifting/calisthenics (my max was 120 kg bench press, 5 sets 3 reps ;)).
Tech lead on the STEM {pre,mid,post}-training team, working directly with Ashish Vaswani (1st author of the Transformer paper) and a small group of cracked technical people on shipping SOTA open-source LLMs with a focus on code, math, and STEM.
We released the best USA open-source LLM (dense transformer) in the 8B category - fully {pre,mid,post}-trained from scratch on zettaflops of AMD and TPU compute. Rnj-1: https://essential.ai/research/rnj-1
We also released a follow-up long-context (32k -> 160k) model. Rnj 1.5: https://huggingface.co/EssentialAI/rnj-1.5-instruct
I led and co-led end-to-end efforts across data (including building an internal PDF OCR pipeline, and data mixing), long context, training experiments, evaluations, and infra/compute efficiency.
If you are looking for an angel and you are building a consequential company, DM me. :) I care about AI -> model layer, (ML) infra, AI accelerators, energy + robotics / physical AI. Bonus: you are based in SF / Bay Area.
I previously co-founded the company with an ex-CTO of Airbus. The goal was to build an AI system that would help us design physical systems in the world. We raised $23M from Radical Ventures and a few great angels (Google's Jeff Dean and Zak Stone, OpenAI's Peter Welinder, etc.).
Built Ortus AI (a YouTube assistant), replicated Meta's NLLB in the open (for low-resource Balkan languages), founded Runa AI (trained SOTA large language models (LLMs) for low-resource languages and offered them to enterprise & gov in the Balkan region), built Jarvis for Images, Cracked Engineers, and llm.c together with Andrej Karpathy, Arun, and Erik. Learned a lot from some failures. :)
Led the integration of vision language models (VLMs, Flamingo 🦩) into Google’s products (YouTube ads, YouTube Shorts, internal demos, etc.) through the development of a bulk inference tool. This tool facilitated the deployment of Flamingo across a massive number of TPUs. Contributed to the Flamingo project and paper by assisting with data integration, addressing memory issues, and providing critical feedback (unfortunately I joined the project at a later stage, so I only got an acknowledgement in the paper as opposed to a co-author role). Through my external education work in the AI space I was responsible for the hiring of 15-20 DeepMinders (who explicitly reached out to me when they joined to thank me). I was never internally awarded for this, but it is something I am very proud of. :) Flamingo: https://arxiv.org/abs/2204.14198