reading
AI Agent Harness Internals
How the runtime behind AI agents actually works, end to end.
Every AI agent you use runs inside a harness. The harness is the loop that calls the LLM, parses the response, executes tools, and decides what happens next. Most people treat it as a black box. I want to understand every layer of it. How does the agent loop manage execution control flow? How do you stream tool calls over SSE and parse partial JSON mid-stream without waiting for the full response? What detects when an agent is stuck in a doom loop, and how do you break it? What happens when the context window overflows, and how does compaction decide what to keep? How do you retry against rate limits without losing work? How does one harness talk to multiple LLM providers and normalize the differences? How are tools defined, validated against JSON Schema, and gated by permissions per agent role? How do subagents get spawned, tracked, and cleaned up? How does MCP let agents discover tools at runtime instead of hardcoding them? These are the questions. The goal is to understand the full runtime, not just use it.
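The core of that loop can be sketched in a few lines of Go. This is a hedged illustration, not any real harness: `step`, `runLoop`, and the `echo` tool are all made-up names, and the model is a scripted stub rather than a streaming LLM call.

```go
package main

import "fmt"

// step is a stubbed model response: either a tool request or a final answer.
// In a real harness this would be parsed out of a streaming LLM response.
type step struct {
	tool string // tool to call; empty means the model is done
	arg  string
	text string // final answer when tool is empty
}

// runLoop drives the core harness cycle: ask the model, execute any tool it
// requested, feed the result back, and stop when the model returns plain text.
func runLoop(model func(history []string) step, tools map[string]func(string) string, maxTurns int) string {
	var history []string
	for i := 0; i < maxTurns; i++ { // a hard turn cap is one crude doom-loop guard
		s := model(history)
		if s.tool == "" {
			return s.text
		}
		fn, ok := tools[s.tool]
		if !ok {
			history = append(history, "error: unknown tool "+s.tool)
			continue
		}
		history = append(history, s.tool+" -> "+fn(s.arg))
	}
	return "stopped: turn limit reached"
}

func main() {
	tools := map[string]func(string) string{
		"echo": func(arg string) string { return arg },
	}
	// Scripted model: first turn requests a tool, second turn answers.
	model := func(history []string) step {
		if len(history) == 0 {
			return step{tool: "echo", arg: "hello"}
		}
		return step{text: "done after: " + history[len(history)-1]}
	}
	fmt.Println(runLoop(model, tools, 8)) // prints: done after: echo -> hello
}
```

Everything interesting hides inside the parts this sketch stubs out: the model call is a stream, the tool results need truncation and compaction, and "turn limit reached" is the bluntest possible stuck-detector.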
building
GoSFU: Realtime Voice AI over WebRTC
What actually happens between someone speaking and hearing an AI respond.
You speak into a mic. An AI responds in your ear. The gap between those two moments is where every interesting systems problem lives. The audio travels as RTP packets over WebRTC into a Go SFU built on Pion. The SFU hands it to a speech-to-text service, which streams a transcript to an LLM, which streams tokens to a text-to-speech service, which streams audio back through the SFU to the client. Every stage adds latency. How much? Where exactly? Does streaming STT actually beat batching, or does it just feel like it should? What happens when the LLM is slow and audio chunks pile up? When do you drop stale audio vs queue it? When ICE negotiation fails or the STT service times out mid-sentence, what does graceful degradation actually look like? The goal is not just to make it work. It is to measure every stage, find the bottleneck, and understand why the system behaves the way it does under real conditions.
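The drop-vs-queue question has a concrete shape: a bounded buffer with a drop-oldest eviction policy, so latency stays capped at the cost of losing frames. A minimal Go sketch, with the caveat that `pushAudio` and the frame format are invented for illustration and it assumes a single producer:

```go
package main

import "fmt"

// pushAudio enqueues a frame into a bounded buffer, evicting the oldest frame
// when the consumer (say, a slow TTS-to-client leg) has fallen behind. This is
// the "drop stale audio" policy: old speech is worthless once the moment has
// passed, so we bound latency instead of letting the queue grow.
// Assumes a single producer; with multiple producers the evict-then-send
// sequence below would need a lock.
func pushAudio(buf chan []byte, frame []byte) (dropped bool) {
	select {
	case buf <- frame:
		return false
	default:
		// Buffer full: drop the oldest frame, then enqueue the new one.
		select {
		case <-buf:
			dropped = true
		default:
		}
		buf <- frame
		return dropped
	}
}

func main() {
	buf := make(chan []byte, 2) // room for two frames of audio
	drops := 0
	for i := 0; i < 5; i++ {
		if pushAudio(buf, []byte{byte(i)}) {
			drops++
		}
	}
	fmt.Println("dropped:", drops)         // prints: dropped: 3
	fmt.Println("oldest now:", (<-buf)[0]) // only the newest frames survive: prints 3
}
```

Queueing instead would be the same buffer with blocking sends: nothing lost, but latency grows without bound whenever the downstream stage is slow, which is exactly the failure mode worth measuring.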
Ex-YC
Who I am
I build real-time AI systems. So far that has meant voice agents, retrieval pipelines, products that shipped and scaled. But I am going deeper, into WebRTC and audio/video ML, trying to understand how packets move and how models hear. That intersection is where I am headed.
VOICE AI · RETRIEVAL · REAL‑TIME
GET IN TOUCH ↗
Senior Software Engineer
San Francisco • Full-time • Remote
Rime was a great place to build real-time systems with a strong team and a lot of ownership. I owned the full voice pipeline, from frontend UX to WebRTC transport with LiveKit and Pipecat. Shipped integrations so customers could get low-latency voice that actually worked. When audio broke in production, I was the one tracing it through the stack and pushing fixes. Got comfortable making tradeoffs between speed, reliability, and cost. The kind of work where you hear the difference when it's done right.

Lead Engineer
New York City, NY, USA • Full-time • Remote
Architected high-performance web apps with Node.js and React. Built a full design system from scratch, dramatically speeding up development. Refactored backend logic and SQL to optimize API response times. Developed LLM-powered AI agents using LangGraph: built a role-aware conversational interface and a multi-agent system using RAG and fine-tuned models. Created a OneOnOne Agent that aggregates data for performance reviews and real-time collaboration.

Founding Engineer
San Francisco, California • Full-time • Remote
Found my people here. This team showed me you truly become who you surround yourself with. Wore all the hats as a founding engineer: coded frontend to backend, built our JS SDK from scratch, and geeked out deploying LLaMA and Stable Diffusion models. The best part? Talking directly with users, turning their "what if" moments into actual features. Felt like building my own startup, except with brilliant teammates who pushed me daily. Still miss our 3AM debugging sessions and spontaneous whiteboard jams.

Software Engineer
Bangalore, India • Full-time
Full-stack developer at TIFIN Wealth specializing in modern web architecture. Spearheaded frontend development using Next.js and React.js while driving backend integrations to enhance platform performance. Implemented Redux Toolkit for streamlined state management and created an ecosystem of reusable components to improve development efficiency. Integrated GraphQL and React Query for optimized data fetching while modernizing the platform through strategic codebase migrations. Enhanced user experience through responsive design implementation and systematic bug resolution.
