a little about me

I work where LLMs meet product, and refuse to pick a side.

I'm Vanshu — currently in Bengaluru, currently at EMA. I spend most of my days reading LLM outputs and asking why they went wrong, then helping the team turn those answers into shipped fixes.

📍 Bengaluru, India

💼 AI Eval Analyst @ EMA

🌐 Remote-first, open to onsites

profile.json

{
  "name": "vanshu saini",
  "role": "ai eval analyst",
  "focus": "llm × product",
  "location": "delhi / bengaluru, india",
  "open_to": "remote + onsites",
  "stack": [
    "evals",
    "prompt eng",
    "sql",
    "product sense"
  ],
  "currently": "@ema"
}

how i got here

2018

The foundation

I started in computer applications at Ambedkar Institute of Technology in Delhi. Halfway through, I realised I was less interested in writing software and more interested in why people use it (and stop using it). That curiosity got me into the Plaksha Tech Leaders Fellowship — 60 students, scholarship, a year of intense exposure to AI, design, and product alongside UC Berkeley and Purdue.

2022

The PM detour

From there it was the NextLeap PM Fellowship (top 4% of 500+), an internship at Blink X where I owned a stock-education app end to end, and then Phenom — where I learned what it actually feels like to defend a churn metric to a CS team that needs the answer yesterday.

now

The current chapter

Now I'm at EMA, building the evaluation layer for a fleet of AI agents that real enterprises are putting real money behind. There's no playbook for QA-ing a 50-agent fleet that hallucinates differently each day. We're writing it.

🎯

What I'm good at

Looking at messy data — LLM outputs, user funnels, behavioural signals — and finding the one pattern that explains most of the noise. Building tools nobody asked for that quietly become the thing the team can't live without. Writing things down clearly enough that an engineer, a sales lead, and a VP can all agree on what we're doing.

🔬

What I'm still figuring out

How to balance speed and rigor when the LLM ecosystem moves faster than the evaluation literature. How to design eval frameworks that survive a model upgrade. How to explain to non-technical stakeholders that "the AI is wrong sometimes" isn't a bug — it's the entire product surface.

outside of work

I read a lot about how products and people fail — startup post-mortems, behavioural psychology, the occasional Karpathy lecture. I built Freese in a weekend because a friend's PCOS conversation wouldn't leave my head. I will, given any opportunity, talk about why product teams underweight evaluation. Then I'll talk about it some more.

the facts

based: Bengaluru, India
role: AI Evaluation Analyst at EMA · since Oct 2024
edu: Plaksha Tech Leaders Fellowship · UC Berkeley + Purdue collab · 6% selection rate
edu: Bachelor of Computer Applications · Ambedkar Institute of Technology, Delhi · 2018–2021
won: 1st Runner Up · Masters' Union Startup Weekend · ₹3,00,000 grant for Freese
won: Top Fellow · NextLeap PM Fellowship · top 4% of 500+ applicants
won: 1st Runner Up · Masters' Union PM Bootcamp · 50% scholarship · ₹2.45L
stack: Claude Code, SQL, Mixpanel, Figma, LLM eval frameworks · and a healthy distrust of mocked tests
currently: Open to senior AI Product / Eval roles · especially in enterprise AI, agents, evaluation tooling

Want to talk AI, product, or evaluation?

Drop me a line — I'm always up for a conversation about hard product problems.

vanshu.bu@gmail.com → LinkedIn ↗