Build a static web app that estimates clothing fit from a person's body photo — and gets smarter
every time a customer returns an item. You have ONE attempt: a single agentic run, no follow-up
questions, no second chances. You are judged on whether it actually works under an automated
harness, on the accuracy of your estimates, on whether it genuinely learns from returns, on
respecting the user's privacy absolutely, and on the craft of communicating an uncertain,
probabilistic result to a real human. Treat the requirements as the floor. Exceed them.
THE PRODUCT:
From a user-uploaded BODY PHOTO plus a HEIGHT input (for real-world scale), estimate five
fashion-relevant garment metrics — chest, waist, hip (circumferences, cm), inseam and shoulder
(linear, cm) — and recommend a SIZE from a provided brand size chart. Show the user how confident
you are. This is a measuring tape that runs in a browser tab; treat it with the seriousness of
one.
ABSOLUTE PRIVACY (non-negotiable, and verified):
- The photo MUST be processed fully ON-DEVICE and MUST NEVER leave the browser — no uploads, no
analytics beacons, no "just the embedding". The harness records all network traffic; ANY image
egress, or any request to a host outside the allowlist, is both a privacy failure and a
contract violation that zeroes the run.
- The no-external-libraries rule is RELAXED for this task ONLY so you can run a real model: you
MAY download ONE pinned pose/segmentation model + its runtime from the allowlisted hosts
(`cdn.jsdelivr.net`, `storage.googleapis.com`) and you MUST record exactly what you pinned in
`fixtures/sources.lock` (URL + version). After the model has loaded, the app MUST work fully
OFFLINE. WebGPU is the best tier; a WASM/CPU fallback is a fully valid tier (no GPU is
assumed). Detect what is available and degrade gracefully — never crash, never hang.
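One way the tier detection above might look, as a minimal sketch rather than a required implementation. `detectBackend`, its `env` parameter and the 3-second timeout are hypothetical choices for illustration; `navigator.gpu.requestAdapter()` and the `WebAssembly` global are standard browser APIs.

```javascript
// Hypothetical helper: pick the best available runtime tier without crashing or hanging.
// `env` defaults to the real globals; passing a fake object makes the logic testable.
async function detectBackend(env = globalThis, timeoutMs = 3000) {
  // Tier 1: WebGPU. requestAdapter() can resolve to null, reject, or stall,
  // so it is raced against a timeout (the timeout value is an assumption).
  const gpu = env.navigator && env.navigator.gpu;
  if (gpu && typeof gpu.requestAdapter === "function") {
    let timer;
    const timeout = new Promise((resolve) => {
      timer = setTimeout(() => resolve(null), timeoutMs);
    });
    try {
      const adapter = await Promise.race([gpu.requestAdapter(), timeout]);
      if (adapter) return "webgpu";
    } catch (_) {
      // Ignore and fall through to the next tier.
    } finally {
      clearTimeout(timer);
    }
  }
  // Tier 2: WebAssembly.
  if (typeof env.WebAssembly === "object" && typeof env.WebAssembly.instantiate === "function") {
    return "wasm";
  }
  // Tier 3: plain JavaScript on the CPU.
  return "cpu";
}
```

The returned string matches the `backend` values that `window.__modelInfo` is allowed to declare, so the same result can feed both the model loader and the hook.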
LEARN FROM RETURNS (the hard part — this is the core of the task):
A size chart is a guess; real fit is learned from outcomes. You will be fed a stream of labeled
return events ("ordered M, returned too small"). Use them to update your size recommender so that
your error on a held-out set of customers measurably DROPS. The brand's true fit may differ
systematically from its label sizing — discover and correct that offset from the data. This must
be a real, measured improvement, not a cosmetic "we learn!" badge.
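A minimal sketch of one possible way to discover a systematic offset from the return stream. It assumes the offset is a single brand-wide shift measured in steps along the chart's `order` array, and that the caller knows which size the raw chart predicted for each customer. `createOffsetLearner`, `priorStrength` and the shrinkage rule are hypothetical; a real solution may need per-size or per-measurement corrections.

```javascript
// Hypothetical sketch: learn a global size offset (in chart steps) from return events.
// `order` is the chart's list of size strings, smallest to largest.
function createOffsetLearner(order, priorStrength = 4) {
  let sumError = 0; // running sum of (true index - chart-predicted index)
  let count = 0;    // number of usable events seen so far

  // Shrunk mean: with few events the offset stays near 0 (assumed prior of "no offset").
  const offset = () => sumError / (count + priorStrength);

  return {
    // `chartSize` is what the unadjusted size chart predicted for this customer;
    // `trueSize` comes from the return event.
    observe(chartSize, trueSize) {
      const predicted = order.indexOf(chartSize);
      const actual = order.indexOf(trueSize);
      if (predicted < 0 || actual < 0) return; // skip events with unknown sizes
      sumError += actual - predicted;
      count += 1;
    },
    offset,
    // Shift the chart's answer by the learned offset, clamped to valid sizes.
    adjust(chartSize) {
      const base = order.indexOf(chartSize);
      if (base < 0) return chartSize;
      const idx = Math.round(base + offset());
      return order[Math.min(order.length - 1, Math.max(0, idx))];
    },
  };
}
```

Because the state is just a running sum and a count, calling `observe` across several batches behaves the same as one large batch, which fits the stream-like use described above. The learned `offset()` value is also something the "why this size" explanation can show directly.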
ROBUST INPUT HANDLING:
Handle a photo with no detectable person, a corrupt/garbage image, and a file that is not an
image at all — each gracefully and distinctly, with a clear message and zero crashes. Never
fabricate a measurement for an input you cannot actually read.
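A sketch of how the three failure cases could be kept distinct, under stated assumptions: a file whose leading bytes match no common image signature is treated as "not an image", a file that claims to be an image but fails to decode is treated as corrupt, and a decoded image with no detected person is the third case. `sniffImageType`, `classifyInput` and `failure` are hypothetical names, and the supported-format list is an assumption; only the error strings and the result shape come from the harness contract.

```javascript
// Hypothetical helper: check whether raw bytes even claim to be an image,
// using well-known file signatures (JPEG, PNG, GIF, WebP).
function sniffImageType(bytes) {
  const startsWith = (sig, offset = 0) =>
    bytes.length >= offset + sig.length && sig.every((b, i) => bytes[offset + i] === b);
  if (startsWith([0xff, 0xd8, 0xff])) return "jpeg";
  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "png";
  if (startsWith([0x47, 0x49, 0x46, 0x38])) return "gif";
  // WebP is a RIFF container: "RIFF", 4 size bytes, then "WEBP".
  if (startsWith([0x52, 0x49, 0x46, 0x46]) && startsWith([0x57, 0x45, 0x42, 0x50], 8)) return "webp";
  return null;
}

// Map pipeline outcomes to one of the three contract error codes, or null if readable.
// `decodedOk` and `personFound` are supplied by the caller's decoder and pose model.
function classifyInput(bytes, decodedOk, personFound) {
  if (sniffImageType(bytes) === null) return "non_image";
  if (!decodedOk) return "bad_image";
  if (!personFound) return "no_person";
  return null;
}

// The resolved (not rejected) failure shape required by the harness contract.
function failure(error) {
  return { measurements: null, size: null, confidence: 0, error };
}
```

The checks run in order of cost: the signature test needs only a few bytes, decoding comes next, and the model runs last, so no measurement code is reached for an input that could not be read.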
PROBABILISTIC-OUTPUT UX (judged):
- Communicate uncertainty honestly: a confidence indicator and plausible ranges (not just a hard
single number), so a user understands an estimate is an estimate.
- Explain "why this size" — make the recommendation legible (which measurements drove it, how a
return-informed adjustment changed it).
- Earn trust: explicit, up-front consent / privacy messaging that the photo stays on the device.
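As an illustration of turning a point estimate plus a confidence value into a range a person can read, here is a minimal sketch. The mapping from confidence to range width (2% of the value at full confidence, 10% at zero), the 0.5 cm rounding and the label thresholds are all invented for this example and are not calibrated; `describeEstimate` is a hypothetical name.

```javascript
// Hypothetical sketch: present a measurement as a range with a plain-language confidence label.
// The percentages and thresholds below are placeholder assumptions, not calibrated values.
function describeEstimate(valueCm, confidence, { minPct = 0.02, maxPct = 0.1 } = {}) {
  const c = Math.min(1, Math.max(0, confidence)); // clamp to [0, 1]
  // Higher confidence gives a narrower range, interpolating linearly from maxPct to minPct.
  const halfWidth = valueCm * (maxPct - (maxPct - minPct) * c);
  const roundHalf = (x) => Math.round(x * 2) / 2; // nearest 0.5 cm
  const low = roundHalf(valueCm - halfWidth);
  const high = roundHalf(valueCm + halfWidth);
  const label = c >= 0.75 ? "high" : c >= 0.4 ? "medium" : "low";
  return {
    low,
    high,
    label,
    text: `about ${roundHalf(valueCm)} cm (likely ${low} to ${high} cm, ${label} confidence)`,
  };
}
```

In a real app the range width would ideally come from measured error on validation data rather than a fixed percentage; the point of the sketch is only that the UI shows a range and a label alongside the single number.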
MANDATORY HARNESS HOOKS (exact contract — the evaluator depends on these verbatim):
At eval time the harness serves your app at `/` and the task fixtures at `/fixtures/` on the SAME
origin. It will NOT click your UI; it calls these global hooks on `window`:
1) `window.__estimate(imageId)` → `Promise<{ measurements: { chest, waist, hip, inseam, shoulder },
size, confidence }>`
- `imageId` is an entry in `/fixtures/bodies/manifest.json` (each entry:
`{ id, file, type, heightCm }`). Resolve it, load `/fixtures/bodies/<file>` ON-DEVICE, and use
that entry's `heightCm` as the height/scale input (in the live UI a human types this).
- `measurements` are numbers in cm; `size` is one of the strings in the chart's `order`;
`confidence` is a number in `[0,1]`.
- For a no-person / unreadable / non-image input, DO NOT reject. RESOLVE with
`{ measurements: null, size: null, confidence: 0, error: "no_person" | "bad_image" |
"non_image" }`.
2) `window.__applyReturns(batch)` → `Promise<void>`
- `batch` is an array of labeled return events: `{ imageId, orderedSize, outcome:
"too_small" | "too_large" | "fit", trueSize }`. Ingest them and update your recommender so
subsequent `__estimate` calls return better-fitting sizes. May be called more than once
(treat it as a stream).
3) `window.__modelInfo` (optional but recommended) → `{ backend: "webgpu"|"wasm"|"cpu"|"stub",
model: <pinned url or name> }`. Declare your runtime tier and model honestly; it informs tiering.
4) Signal readiness: expose `window.__ready` as a Promise that resolves once the model is loaded
and the hooks are callable (or set `window.__fitReady = true`). The harness awaits this before
calling `__estimate`.
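The four hooks above could be wired up roughly as follows. This is a sketch under assumptions: `installHooks` and every field of `deps` (`backend`, `modelName`, `loadModel`, `estimate`, `learn`) are hypothetical stand-ins for the app's own pipeline, and mapping an unrecognised failure to `"bad_image"` is one possible choice, not something the contract specifies.

```javascript
// Hypothetical wiring of the harness hooks onto a target object (window in the browser).
// `deps` supplies the app's own pipeline; none of its field names come from the contract.
function installHooks(target, deps) {
  const KNOWN_ERRORS = ["no_person", "bad_image", "non_image"];
  const fail = (error) => ({ measurements: null, size: null, confidence: 0, error });

  // Hook 3: declare the runtime tier and pinned model.
  target.__modelInfo = { backend: deps.backend, model: deps.modelName };

  // Hook 1: always RESOLVE, even for unreadable inputs.
  target.__estimate = async (imageId) => {
    try {
      // Expected to resolve { measurements, size, confidence } or throw an error with `.code`.
      return await deps.estimate(imageId);
    } catch (err) {
      const code = err && err.code;
      // Assumption: unexpected failures are reported as "bad_image" rather than rejected.
      return fail(KNOWN_ERRORS.includes(code) ? code : "bad_image");
    }
  };

  // Hook 2: ingest events one by one so repeated calls behave like a stream.
  target.__applyReturns = async (batch) => {
    for (const event of batch) deps.learn(event);
  };

  // Hook 4: resolve once the model is loaded; also set the boolean flag.
  target.__ready = Promise.resolve()
    .then(() => deps.loadModel())
    .then(() => {
      target.__fitReady = true;
    });
}
```

In the browser this would be called once at startup as `installHooks(window, deps)`. Taking the target as a parameter keeps the wiring testable outside a page.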
The provided brand size chart is at `/fixtures/size-chart.json` (its `order` array lists the valid
size strings). Base recommendations on it.
OUTPUT CONTRACT (mandatory):
- One runnable, static web app in the current working directory; entry point is `index.html` in
the root. Your own CSS/JS (separate files fine, relatively linked). No build step — it must run
by serving the folder statically.
- A REAL, accessible UI for humans (photo upload + height input + results with confidence/ranges
and the "why this size" explanation), in addition to the harness hooks.
- Besides the ONE pinned model/runtime from the allowlisted hosts, NO other external libraries,
CSS frameworks or CDN dependencies. After model load, fully offline.
There are no follow-up questions and only a single run. Start implementing immediately.