Confidence your AI's changes actually work.
Validity captures your acceptance criteria as a versioned spec, then has your agent verify against it in your real app — running the hard criteria as deterministic checks (real clicks, real assertions), not an LLM grading screenshots. The same spec compiles into a Playwright or Maestro suite you own.
Access opens as we add invitees. We're letting people in by hand for now.
Know what actually works before you merge.
Capture the spec.
Your agent turns the request into a versioned, reviewable spec — the contract for building, verifying, and CI. Edit it anytime.
Build, then prove.
The agent builds, then verifies against the spec in your real app. Hard criteria run as deterministic checks; soft ones are LLM-scored — kept clearly apart.
Keep the proof.
A verdict per criterion — proven vs scored — and the spec compiles to a Playwright or Maestro test you run in CI forever.
Two kinds of "pass." We keep them apart.
Most tools that "check your UI" hand you one number from an LLM staring at a screenshot — an opinion wearing a verdict's clothes. Validity splits the two and labels every result, so you always know which is which.
A real check ran.
It filled the email, clicked Send, watched POST /api/contact come back 200, and counted zero console errors. Deterministic. When it regresses you get the exact failing step — not a shrug — and it re-runs in CI with no LLM in the loop.
An LLM gave its read.
"The link looks on-brand." Worth having — some things only a trained eye catches — and still an opinion. We mark it Scored so a vibe never gets mistaken for a guarantee, and never gets counted as one.
Your spec is the test suite. For real.
Every hard criterion is already a concrete assertion. One command turns the spec into a Playwright spec — or a Maestro flow for native — that lives in your repo and runs in your CI. No Validity in the loop, no key, no LLM. Walk away tomorrow and you keep the tests.
// generated from spec-7f3a@v3 — regenerate, don't hand-edit
import { test, expect } from '@playwright/test';
test('AC-1: user can submit the contact form', async ({ page }) => {
const res = page.waitForResponse(
(r) => r.url().includes('/api/contact') && r.request().method() === 'POST',
);
await page.getByRole('textbox', { name: 'Email' }).fill('a@b.com');
await page.getByRole('button', { name: 'Send' }).click();
expect((await res).status()).toBe(200);
});Generated from the same spec your agent verified against. Soft criteria come through as test.fixme() stubs — visible, never silently dropped.
For developers who don't trust "looks good to me" from the agent.
“Done” should mean actually done.
Validity is in a free, invite-only alpha. Sign in to claim your spot. We're letting people in by hand and will send the install command when yours opens.