Posts

GPU Infrastructure: The Five Calculations That Actually Matter

I was building a GPU recommendation engine — one that maps workload descriptions to specific configurations, primary recommendations, and cost ranges — and kept hitting the same wall: getting the recommendations right meant going deep on every constraint that determines whether a deployment actually works. Not whether it’s affordable. Whether it works at all. VRAM has to fit the full training state, not just the model weights. Training data has to be where the GPUs are. The interconnect has to support the parallelism strategy. None of that shows up in a $/hr comparison. Here are the five calculations that come before it. ...

The Trust Layer: What Separates Good RAG from Enterprise RAG

I was stress-testing a RAG system built for regulated industries — financial services and life sciences. The grounding was fine. No hallucinations. What I found were subtler failures — the kind that only surface when analysts run the same query twice, compare citations across sessions, and need to explain to a regulator exactly which document an answer came from. In regulated environments, that’s the standard. And the system wasn’t meeting it. ...

The AI PC Buying Problem Every Enterprise Needs to Solve

Over the last 18 months I have been in a lot of conversations about AI PCs — with enterprises evaluating fleet upgrades, with device vendors making the case for their hardware, and with IT leaders trying to figure out what their employees actually need. The consistent signal: everybody agrees AI PCs matter. Purchases are happening — Windows 10 end-of-support has accelerated that — but the buying is cautious and uneven. Two reasons come up every time: the AI landscape is moving fast enough that enterprises are not confident their requirements will look the same in 12 months, and they do not have a reliable way to evaluate what they are being sold. Most are not even sure what the right criteria should be. ...

MCP in Production, Part 1: Persistent Sessions, Pooling, and Fault Tolerance

Most MCP client examples open a session, call a tool, and close the session. That pattern is fine for demos. It breaks in production in ways that aren’t obvious until you’re staring at a hung process or a spike in latency. This is Part 1 of a two-part series on what it takes to run an MCP client reliably. I’ll cover the transport layer: sessions, pooling, dead connection recovery, timeouts, and the heartbeat. Part 2 covers the system layer: authentication, observability, and operational design. ...

MCP in Production, Part 2: Authentication, Observability, and Operational Design

Part 1 covered the transport layer: keeping sessions alive, recovering from failures, and a few edge cases that only surface when you’re running a real pool under real failure conditions. This part covers what I’d call system readiness: the things that separate a working prototype from something I could hand to a client and say “deploy this.” ...

Designing a Professional Digital Twin: The Architecture

Over the last year, I’ve been building production-grade agentic AI systems — LangGraph state machines, multi-agent orchestration, deterministic validation pipelines designed for regulated environments. And somewhere in that work, I noticed something: the architecture I was using to build reliable AI agents was a pretty accurate model of how I actually operate professionally. So I mapped it out. Not as a second brain or a structured resume. As an agent specification — a design exercise in making professional expertise explicit, structured, and transferable. ...

I Used MCP as a Service-to-Service Protocol. Here's What I Learned.

When I designed the architecture for my KYC onboarding orchestrator, I made a deliberate choice: use MCP not as an LLM-to-tool protocol (the way it was originally designed) but as a service-to-service protocol between a LangGraph orchestrator and a set of independently deployable integration servers. It worked. But it came with real tradeoffs I want to document, because I don’t think this pattern is well understood yet. Background: What I Built. The system onboards corporate clients through a fixed sequence of checks: entity profile retrieval, credit rating, sanctions screening, PEP check, CRM update, document generation. Each of those integrations runs as a separate MCP server. A LangGraph graph orchestrates the sequence by calling MCP tools directly from its nodes. ...

Why Your AI Agent Demo Looks Great and Your Production System Doesn't

I’ve spent the last several months building agentic AI systems — not demoing them, building them. And I want to share something that took me longer than I’d like to admit to fully internalize. The hype is real. The gap is also real. And the gap is closing — but not in the way most people think. This reflects where I am in March 2026, building on roughly 18 months of hands-on agentic work. The field is moving fast and I expect some of this to age. ...