Tracing, evaluation, and debugging for LLM apps. Built by the LangChain team.
LangSmith is a framework-agnostic platform built by the LangChain team for observing, evaluating, and deploying AI agents and large language model (LLM) applications. It captures every step your agent takes at runtime (tool calls, model responses, and intermediate reasoning) and turns that data into something you can inspect, measure, and act on. For developers building production AI systems, it closes the gap between “it works in my notebook” and “it works reliably for real users.”
When an LLM or AI agent runs, it does not leave a stack trace the way traditional code does. Inputs come in, decisions happen inside a model, tools get called in unpredictable order, and outputs come out. When something goes wrong, you often have no record of why. LangSmith solves this by wrapping your application in a tracing layer that captures the full execution path as a structured timeline.
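To make the idea of a tracing layer concrete, here is a minimal, standard-library-only sketch of how nested calls can be recorded as a structured timeline. It is an illustration of the concept, not LangSmith's implementation or API: the `traced` decorator, the `TIMELINE` list, and the stand-in `agent`, `lookup_weather`, and `fake_model` functions are all hypothetical names chosen for this example.

```python
import contextvars
import functools
import time

# Hypothetical in-memory store; a real tracing platform would export these records.
TIMELINE = []
# Tracks which traced function is currently running, so children can record their parent.
_current_run = contextvars.ContextVar("current_run", default=None)


def traced(run_type):
    """Record each call to the wrapped function as one step in TIMELINE."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            record = {
                "name": fn.__name__,
                "run_type": run_type,
                "parent": _current_run.get(),
                "inputs": {"args": args, "kwargs": kwargs},
                "outputs": None,
                "error": None,
            }
            TIMELINE.append(record)  # appended at call start, so order follows call order
            token = _current_run.set(fn.__name__)
            start = time.perf_counter()
            try:
                record["outputs"] = fn(*args, **kwargs)
                return record["outputs"]
            except Exception as exc:
                record["error"] = repr(exc)  # keep a record of why the step failed
                raise
            finally:
                record["duration_s"] = time.perf_counter() - start
                _current_run.reset(token)
        return wrapper
    return decorator


@traced("tool")
def lookup_weather(city):
    # Stand-in for a real tool call.
    return {"city": city, "forecast": "sunny"}


@traced("llm")
def fake_model(prompt):
    # Stand-in for a model call; no network access.
    return f"Answer based on: {prompt}"


@traced("chain")
def agent(question):
    weather = lookup_weather("Paris")
    return fake_model(f"{question} | {weather['forecast']}")


result = agent("What should I wear today?")
for step in TIMELINE:
    print(step["run_type"], step["name"], "parent:", step["parent"])
```

Each record carries the step's inputs, outputs, duration, any error, and its parent, which is roughly the information needed to reconstruct an execution path after the fact.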
Here is what that looks like in practice:
LLM observability is the broader practice of making AI application behavior visible and measurable. LangSmith is one of the most widely used tools in this category because it is purpose-built for agent workflows, not adapted from general-purpose logging infrastructure.
LangSmith is for developers who are past the prototype stage and need systematic control over agent quality. Here is what they actually build with it:
You do not need LangChain to use LangSmith. The two are often confused: LangChain is a framework for building agents, while LangSmith is the platform for observing and evaluating them. LangSmith works with any stack, including the OpenAI SDK, the Anthropic SDK, LlamaIndex, the Vercel AI SDK, or a fully custom implementation.
Tracing does not block your application or make it depend on LangSmith being available. The LangSmith SDK sends trace data through an async callback handler running in the background. Your agent keeps executing at full speed, and if LangSmith experiences an outage, your application continues normally. The tracing layer is fully decoupled from your application runtime.
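The decoupling described above can be sketched with a background worker and a bounded queue. This is a simplified illustration using only the Python standard library, not the LangSmith SDK's actual handler: `BackgroundTraceSender`, `failing_backend`, and `run_agent` are hypothetical names, and the failing backend simulates an outage.

```python
import queue
import threading


class BackgroundTraceSender:
    """Hypothetical sketch: buffer trace records and send them off the main thread."""

    def __init__(self, send_fn, max_buffer=1000):
        self._send_fn = send_fn
        self._queue = queue.Queue(maxsize=max_buffer)
        self.dropped = 0  # simple counter; a production version would guard it with a lock
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, record):
        """Never blocks the caller; drops the record if the buffer is full."""
        try:
            self._queue.put_nowait(record)
        except queue.Full:
            self.dropped += 1

    def _run(self):
        while True:
            record = self._queue.get()
            try:
                self._send_fn(record)
            except Exception:
                # Backend failure (for example an outage): swallow it so the app is unaffected.
                self.dropped += 1
            finally:
                self._queue.task_done()

    def flush(self):
        """Wait until all buffered records have been processed (useful in tests)."""
        self._queue.join()


def failing_backend(record):
    # Simulates the tracing service being unavailable.
    raise ConnectionError("tracing backend unreachable")


sender = BackgroundTraceSender(failing_backend)


def run_agent(question):
    sender.submit({"event": "agent_start", "question": question})
    answer = question.upper()  # stand-in for the real agent work
    sender.submit({"event": "agent_end", "answer": answer})
    return answer


answer = run_agent("hello")
sender.flush()
print(answer, "| records dropped:", sender.dropped)
```

Even though every send fails here, `run_agent` still returns its answer; the only effect of the simulated outage is that the trace records are counted as dropped.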
There is a free Developer tier that includes one seat and 5,000 base traces per month. Paid plans (Plus and Enterprise) scale with trace volume and team size. Base traces have a 14-day retention period; extended traces have 400-day retention at a higher cost per trace. LangSmith does not use your data to train models.