Blogs
Spans, traces and sessions: the three zoom levels of an AI app
The moment you start tracing an AI app, three words turn up everywhere: span, trace, session. People …
Evals aren't a step at the end. They run the whole way through
There’s a version of building an AI app that goes like this. You build the thing, you get it …
Four ways to run an eval, from a cheap unit test to a full-blown agent
Someone asked me last week how you actually run an eval on an AI app. I gave the honest answer, …
I built a little Claude that dances when it needs me
I’ve got into a bad habit lately. I set Claude Code off on some task - refactor this, write …
How to build an eval you can actually trust
Here’s how most people build an eval. They open a file, write an LLM judge prompt that says …
Two labs started dreaming, and they built two different architectures
Originally published on the Arize AI blog: Two labs started dreaming, and they built two different …
Twenty-seven years ago I nearly put six people out of work with an Excel spreadsheet
Twenty-seven years ago I nearly put six people out of work with an Excel spreadsheet. Every AI …
Memory is still a missing primitive: Cataloguing what the field is actually shipping
Originally published on the Arize AI blog: Memory is still a missing primitive: Cataloguing what the …
Strong opinions, strongly held - and why I don't care about your tooling debate
I was in a meeting last week where the team was debating which of two tools to use for a job. Both …
3 strikes and you're an AI skill
Back in the day when we wrote actual code instead of poking at an AI, I had a general rule for when …
Enhance GitHub Copilot CLI with skills
Coding agents like GitHub Copilot are pretty cool. You can ask them to do pretty much anything …
Build a Star Wars Copilot in C# - Lesson 8: Agents and Orchestration
Final lesson, and probably my favorite. We move from a copilot with tools to a system that also uses …