
LangSmith

Tracing, evaluation, and debugging for LLM apps. Built by the LangChain team.

What is LangSmith?

LangSmith is a framework-agnostic platform built by the LangChain team for observing, evaluating, and deploying AI agents and large language model (LLM) applications. It captures every step your agent takes at runtime (tool calls, model responses, and intermediate reasoning) and turns that data into something you can inspect, measure, and act on. For developers building production AI systems, it closes the gap between “it works in my notebook” and “it works reliably for real users.”

How LangSmith works

When an LLM or AI agent runs, it does not leave a stack trace the way traditional code does. Inputs come in, decisions happen inside a model, tools get called in unpredictable order, and outputs come out. When something goes wrong, you often have no record of why. LangSmith solves this by wrapping your application in a tracing layer that captures the full execution path as a structured timeline.

Here is what that looks like in practice:

  • Traces: Every time your application runs, LangSmith records a trace: a complete, step-by-step log of what your agent did, in what order, and with what inputs and outputs.
  • Evaluation (evals): You define what “good” looks like, either through code-based rules, an LLM acting as a judge, or human reviewers. LangSmith runs those evaluators against your traces so you can score agent quality systematically rather than eyeballing outputs.
  • Monitoring dashboards: Once your agent is live, LangSmith tracks cost, latency (P50 and P99), error rates, token usage, and feedback scores in real time. You can set alerts when any metric crosses a threshold.
  • Deployment: LangSmith includes an agent deployment runtime built on durable execution, which means agents can handle long-running tasks, human-in-the-loop approval steps, and multi-agent coordination without losing state.
  • Prompt management: Teams can version, test, and compare prompts directly in LangSmith, so prompt changes are tracked and reviewable rather than buried in code diffs.
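
The trace concept above can be illustrated with a minimal, dependency-free sketch. This is a toy stand-in, not the LangSmith SDK (the real SDK exposes a decorator-based API and ships data to the platform); it only shows how wrapping each step in a tracing layer yields a structured, ordered timeline:

```python
import functools
import time

# Global in-memory trace log: each entry records one step of the run.
TRACE: list[dict] = []

def traced(fn):
    """Toy tracing decorator: records step name, inputs, output, and duration."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        TRACE.append({
            "step": fn.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "output": result,
            "duration_s": time.perf_counter() - start,
        })
        return result
    return wrapper

@traced
def retrieve(query):
    return ["doc-1", "doc-2"]  # stand-in for a retrieval tool call

@traced
def generate(query, docs):
    return f"answer to {query!r} using {len(docs)} docs"  # stand-in for an LLM call

docs = retrieve("What is tracing?")
answer = generate("What is tracing?", docs)

# TRACE is now a step-by-step record of what ran, in what order, with what I/O.
for entry in TRACE:
    print(entry["step"], "->", entry["output"])
```

In LangSmith the equivalent timeline is captured automatically and rendered in the UI, nested by sub-call, rather than collected into a flat list.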

LLM observability is the broader practice of making AI application behavior visible and measurable. LangSmith is one of the most widely used tools in this category because it is purpose-built for agent workflows, not adapted from general-purpose logging infrastructure.

What you can build with LangSmith

LangSmith is for developers who are past the prototype stage and need systematic control over agent quality. Here is what they actually build with it:

  • RAG pipeline debugger: A retrieval-augmented generation (RAG) system that pulls documents before generating answers. LangSmith traces each retrieval call and LLM response so you can see exactly where hallucinations or irrelevant results enter the pipeline and fix them at the source.
  • Prompt regression test suite: A set of example inputs and expected outputs stored as a dataset in LangSmith. Every time you change a prompt or swap a model, you run the suite and compare results side-by-side to catch quality regressions before they reach production.
  • Multi-agent monitoring dashboard: A system where multiple AI agents hand off tasks to each other. LangSmith tracks every sub-agent call, every tool invocation, and every intermediate output so you can diagnose failures in complex, branching workflows.
  • Human review queue: An annotation pipeline where domain experts review flagged agent outputs, rate them against a rubric, and feed that signal back into your evaluation framework. LangSmith’s annotation queues support both single-run review and pairwise A/B comparisons.
  • Cost and latency optimization workflow: A process for identifying which prompts, models, or tool calls are driving up cost or slowing response times. LangSmith’s dashboards surface per-trace cost breakdowns so you can optimize with real data, not guesses.
  • Automated eval pipeline for customer support agents: A continuous evaluation loop where every live conversation is scored by an LLM judge against criteria like accuracy, tone, and task completion. Results feed into a dashboard that shows quality trends over time.
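
At its core, a prompt regression suite like the one described above is a dataset of input/expected pairs plus a code-based evaluator scored against each run. The following is a framework-free sketch of that loop (the names `dataset`, `agent_v2`, and `exact_match` are illustrative, not LangSmith APIs; in LangSmith the dataset lives on the platform and evaluators run against traces):

```python
# Toy regression suite: a dataset of examples, a code-based evaluator,
# and a pass rate you can gate a deploy on.

dataset = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

def agent_v2(prompt: str) -> str:
    # Stand-in for your agent after a prompt or model change.
    answers = {"2 + 2": "4", "capital of France": "Paris"}
    return answers.get(prompt, "I don't know")

def exact_match(output: str, expected: str) -> bool:
    """Code-based evaluator: strict string equality after trimming."""
    return output.strip() == expected.strip()

results = []
for ex in dataset:
    output = agent_v2(ex["input"])
    results.append({
        "input": ex["input"],
        "output": output,
        "passed": exact_match(output, ex["expected"]),
    })

pass_rate = sum(r["passed"] for r in results) / len(results)
print(f"pass rate: {pass_rate:.0%}")
```

Swapping `exact_match` for an LLM-as-judge evaluator changes only the scoring function; the dataset-and-compare structure stays the same.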

Key Features

  • Framework-agnostic tracing via Python, TypeScript, Go, and Java SDKs, plus native OpenTelemetry support
  • Full trace capture including LLM calls, tool calls, retrieval steps, memory reads, and sub-agent delegation
  • LLM-as-judge, code-based, and human-in-the-loop evaluators that run against real production trace data
  • Dataset management for building test suites from production traces and running regression tests before deployment
  • Real-time monitoring dashboards for cost, latency, error rates, token usage, and custom quality metrics
  • Polly, a built-in AI assistant that reads long traces and helps you pinpoint where things went wrong
  • Managed cloud, bring-your-own-cloud (BYOC), and self-hosted deployment options for teams with data residency requirements
  • SOC 2 Type 2, HIPAA, and GDPR compliance
  • Free developer tier with 5,000 base traces per month; paid plans scale with trace volume

FAQ

Is LangSmith only for LangChain users?

No. LangSmith vs LangChain is a common point of confusion: LangChain is a framework for building agents, while LangSmith is the platform for observing and evaluating them. LangSmith works with any stack, including OpenAI SDK, Anthropic SDK, LlamaIndex, Vercel AI SDK, or a fully custom implementation.

Does LangSmith slow down my application?

No. The LangSmith SDK sends trace data through an async callback handler running in the background. Your agent keeps executing at full speed, and if LangSmith experiences an outage, your application continues normally. The tracing layer is fully decoupled from your application runtime.
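
The decoupling described here is the classic background-exporter pattern: the hot path drops trace events onto an in-process queue and returns immediately, while a worker thread ships them over the network. A dependency-free sketch of the idea (not LangSmith's actual implementation):

```python
import queue
import threading

events: queue.Queue = queue.Queue()
shipped = []  # stand-in for "events received by the backend"

def exporter():
    """Background worker: drains the queue and ships events."""
    while True:
        event = events.get()
        if event is None:      # sentinel: shut down
            break
        shipped.append(event)  # stand-in for an HTTP POST to the tracing backend

worker = threading.Thread(target=exporter, daemon=True)
worker.start()

def record(event: dict):
    """Hot path: enqueue and return; never blocks on network I/O."""
    events.put(event)

# Application code keeps running at full speed while traces ship behind it.
record({"step": "llm_call", "latency_ms": 420})
record({"step": "tool_call", "latency_ms": 35})

events.put(None)  # flush remaining events and stop the worker
worker.join()
print(f"shipped {len(shipped)} events")
```

If the exporter's destination is down, only the background worker is affected; the `record` call on the hot path still returns instantly.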

Is LangSmith free to use?

There is a free Developer tier that includes one seat and 5,000 base traces per month. Paid plans (Plus and Enterprise) scale with trace volume and team size. Base traces have a 14-day retention period; extended traces have 400-day retention at a higher cost per trace. LangSmith does not use your data to train models.
