Event schemas belong in your codebase, not your dashboard
A team's product analytics events are usually documented in three places: a wiki nobody reads, a comment from 2022, and the head of whoever wrote the track() call. The major vendors all noticed and built tracking-plan tools to fix it. Mixpanel describes Lexicon as "a data dictionary that stores descriptions of events and their properties." Amplitude rebranded its 2021 Iteratively acquisition into Amplitude Data, positioned as "a central hub for data management." Segment's Protocols Tracking Plan uses JSON Schema to validate events, with the plan itself "living in Segment." Avo built a whole product around the workflow — its current homepage headline is "Govern your data. Your agents depend on it." (We'll come back to that one.)
Every one of these is useful inside the vendor's own dashboard. None of them ship with your code.
The nuance matters. Mixpanel's data dictionary governs events tracked into Mixpanel. Amplitude Data is the schema of record for events flowing into Amplitude. Segment's Tracking Plan validates events going through Segment. The "single source of truth" in each pitch is doing a lot of work — it's the single source within that vendor's product. Move to a different analytics tool, send events to a second destination, or refactor your tracking calls in a fresh editor session, and the source of truth doesn't move with you. It can't. It's database state in someone else's product.
The closest thing to an open standard is Snowplow's Iglu, which uses JSON Schema files to describe event payloads and validates them at ingest. Iglu does ship with code (it's just files in a registry), but it's tightly coupled to Snowplow's pipeline, uses the JSON Schema dialect for what's fundamentally a small declarative format, and presupposes the rest of Snowplow's stack. As a portable analytics-event spec, it isn't really portable.
Why this matters more now
The cost of vendor-locked tracking plans was small in 2018. You picked one analytics tool, set up Lexicon or Data, mostly stayed there. The spec lived in the dashboard because the dashboard was where the work happened.
Two things have changed since.
First, multi-destination tracking is normal. Segment's whole pitch is fan-out: one track() call, N destinations. The minute events flow to Mixpanel and GA4 and your warehouse simultaneously, the proprietary tracking plan inside any one of those tools stops being authoritative for the others.
Second, your refactors are increasingly being done by AI agents. This is what Avo's headline is acknowledging: the audience that needs the tracking plan is no longer just the data engineer logging into a dashboard — it's the agent in your editor, working from the file tree. An agent asked to rename cta_click.location to cta_click.placement across the codebase will grep for call sites. If the schema lives in a vendor dashboard, the agent never sees it, never updates it, never even knows it exists. The agent ships your refactor; the dashboard's tracking plan goes stale; your data fragments.
The vendor response, going by Avo's framing, is to make the existing dashboard agent-aware. That's a coherent product move, but the schema still lives in the dashboard. The agent still has to leave the codebase to get to the truth.
Both problems have the same shape. The source of truth needs to be where the work happens. For multi-destination, that means upstream of any specific vendor. For agentic refactors, that means in the repo.
What that looks like
The fix isn't a new tool. It's a file. event-schema.yaml at the repo root, declarative, parseable by anyone. Here's the smallest useful version:
```yaml
version: "0.1"
events:
  signup_completed:
    intent: Account creation succeeded. Numerator of every funnel that ends at "real user".
    properties:
      plan:
        type: enum
        values: [free, pro, growth]
        required: true
      method:
        type: enum
        values: [email, github]
        required: true
```
Five property types: string, number, boolean, enum, money. Each property can be required, can carry examples, can carry a description. The optional intent field on each event is one sentence on what the event is for — the decision it informs, the funnel it belongs to. Without intent, a schema is just a typed dictionary; with it, the schema is documentation that survives past the original author.
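To make the other four property types concrete, here is a sketch of how a fuller event might be declared. The event, its property names, and the exact spelling of the description and examples keys are illustrative assumptions, not quoted from the reference spec:

```yaml
# Illustrative only — this event and the description/examples key names
# are assumptions about the format, not copied from the spec.
checkout_completed:
  intent: Payment captured. The denominator for refund-rate reporting.
  properties:
    order_id:
      type: string
      required: true
      examples: [ord_8f3k2]
    item_count:
      type: number
      required: true
    is_gift:
      type: boolean
    total:
      type: money
      description: Order total after discounts, before tax.
```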
That's the whole format. The reference is on GitHub.
What you build on top of a file
The reason a file matters is that other tools can read it. Putting the schema upstream of any specific vendor unlocks four things essentially for free.
Codegen. A CLI reads the YAML and emits a TypeScript declaration with one type per event. Pass it to your tracking SDK as a generic and track("signup_compleated", { ... }) becomes a compile-time error before it ships. The pattern mirrors what Prisma, Drizzle, and GraphQL Code Generator already do for databases and APIs — write the schema once, generate the types, get a build-time error when call sites drift.
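The generated declaration might look something like the following sketch — the interface name and the track() wrapper are illustrative, not the CLI's actual output:

```typescript
// Hypothetical shape of the generated declaration for the schema above.
// Names are illustrative assumptions, not the CLI's actual output.
interface EventMap {
  signup_completed: {
    plan: "free" | "pro" | "growth";
    method: "email" | "github";
  };
}

// Captured calls, standing in for a real vendor SDK.
const sent: Array<{ event: string; props: unknown }> = [];

// Generic wrapper: event names and payload shapes are checked at compile time.
function track<E extends keyof EventMap>(event: E, props: EventMap[E]): void {
  sent.push({ event, props }); // a real wrapper would forward to the SDK here
}

track("signup_completed", { plan: "pro", method: "github" });
// track("signup_compleated", { plan: "pro" }); // compile-time error: no such event
```

The commented-out line is the whole point: the misspelled event name fails type-checking instead of silently creating a new event in production.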
Schema-aware tooling. Funnel definitions can reference declared property values directly: cta_click[location=hero_primary]. Because the schema knows location is a string and not a number, the predicate routes to the right storage column without auto-detection guesswork. This isn't theoretical; it's how Clamp's funnel tool reads typed predicates.
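The routing idea can be sketched in a few lines. The column names below are hypothetical; the point is only that the declared type, not runtime auto-detection, picks the storage column:

```typescript
// Sketch of type-directed predicate routing. Column names are hypothetical.
type PropertyType = "string" | "number" | "boolean" | "enum" | "money";

function predicateColumn(propType: PropertyType): string {
  switch (propType) {
    case "number":
    case "money":
      return "prop_value_numeric"; // numeric comparisons hit the numeric column
    default:
      return "prop_value_text"; // string, boolean, and enum values live as text
  }
}

// cta_click[location=hero_primary]: location is declared a string, so the
// predicate routes to the text column without guessing.
const column = predicateColumn("string");
```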
Discoverability for agents. Anything that reads code can read the schema. The agent about to refactor your tracking calls now has a deterministic answer to "what does cta_click carry?" without grepping. Sidecar agent skills can scan the codebase for missing call sites and propose updates.
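A sketch of that deterministic lookup, assuming the YAML has already been parsed (any YAML library will do; the inline cta_click definition is illustrative):

```typescript
// Minimal in-memory shape of a parsed event-schema.yaml. The cta_click
// definition is an illustrative example, not taken from a real schema.
type PropertyDef = {
  type: "string" | "number" | "boolean" | "enum" | "money";
  required?: boolean;
  values?: string[];
};
type Schema = {
  version: string;
  events: Record<string, { intent?: string; properties: Record<string, PropertyDef> }>;
};

const schema: Schema = {
  version: "0.1",
  events: {
    cta_click: {
      intent: "A call-to-action was clicked.",
      properties: {
        location: { type: "string", required: true },
      },
    },
  },
};

// The lookup an agent (or agent skill) can run instead of grepping call sites.
function describeEvent(s: Schema, name: string): string[] {
  const event = s.events[name];
  if (!event) return [];
  return Object.entries(event.properties).map(
    ([prop, def]) => `${prop}: ${def.type}${def.required ? " (required)" : ""}`
  );
}
```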
Vendor neutrality. The same schema describes events you fire to Clamp, GA4, Mixpanel, PostHog, Segment, or any combination. The spec doesn't know about destinations. The CLI knows nothing about analytics products. They're orthogonal concerns by construction.
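That orthogonality can be sketched as a fan-out layer sitting between schema-typed calls and destination SDKs. The destination functions here are stand-ins for real SDK calls (mixpanel.track, a GA4 client, a warehouse writer):

```typescript
// Sketch of schema-upstream fan-out. Destination functions are stand-ins
// for real vendor SDKs; their names and behavior are illustrative.
type SignupCompleted = { plan: "free" | "pro" | "growth"; method: "email" | "github" };

const deliveries: string[] = [];

type Destination = (event: string, props: unknown) => void;

const destinations: Destination[] = [
  (e, _p) => deliveries.push(`mixpanel:${e}`),   // stand-in for mixpanel.track(e, p)
  (e, _p) => deliveries.push(`warehouse:${e}`),  // stand-in for a warehouse writer
];

// One schema-typed call, N destinations — none of which the schema knows about.
function trackSignupCompleted(props: SignupCompleted): void {
  for (const send of destinations) send("signup_completed", props);
}

trackSignupCompleted({ plan: "free", method: "email" });
```

Swapping a destination means editing this list, not the schema and not the call sites.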
What we shipped
@clamp-sh/event-schema is a 35 KB CLI with two commands: validate against the spec's meta-schema, and generate a TypeScript declaration. It auto-discovers event-schema.yaml from any subdirectory and writes the generated .d.ts alongside the source. Run it with no arguments:
```shell
npx @clamp-sh/event-schema generate
```
The format is at version 0.1; we'll move to 1.0 after a few more weeks of dogfooding. We use it for Clamp's own events. Spec lives at github.com/clamp-sh/event-schema, MIT-licensed. Issues and proposals welcome, especially around new property types — that's the part of the format we're still actively shaping.
The next time someone, or something, asks "what does cta_click carry?", the answer is in the repo.