Task A · Frontend & Commerce Craft · version v3 · weight 20 · static-app

Premium Storefront

View the frozen task prompt
Build the most advanced commerce experience you believe could realistically define premium
eCommerce in 2030. You have ONE attempt — a single agentic run, no follow-up questions, no second
chances. Your work will be judged not only on correctness, but on originality, engineering
quality, commercial viability, accessibility, resilience, adaptability, AI-readiness, and your
ability to anticipate requirements that were never explicitly stated. Treat everything below as
the floor, not the ceiling. Exceed it. Surprise us.

Pick a fictional premium brand and product, and a small but believable catalog (one hero product
+ several related products with real data: prices, ratings, stock, attributes, copy). Give it a
point of view — a brand world a real customer would remember and buy into.

FLOOR — INTERACTIVE 3D PRODUCT CONFIGURATOR (WEBGL):
- Render the hero product in an interactive 3D configurator using the raw WebGL (or WebGL2) API.
  Write your own shaders. NO 3D/graphics libraries (no three.js, babylon, regl, etc.).
- It must be animated (subtle motion / rotate / shader effect) AND interactive (drag to orbit,
  move or change the light).
- A real configurator: at least three option groups (e.g. color/material, a component/part, and a
  finish or add-on such as engraving). Each choice updates the WebGL scene/material in real time.
- The configuration is the source of truth: it drives the displayed price (per-option deltas),
  availability, gallery and what gets added to the cart (reflected in the cart line item).
- Respect prefers-reduced-motion (calm static fallback) and keep the canvas accessible (labelled,
  not a keyboard trap, non-WebGL fallback image and equivalent non-3D controls for every option).

FLOOR — REAL COMMERCE LOGIC (not just static markup):
- A working cart (add/remove, quantities, line items, live subtotal, tax, shipping rule with a
  free-shipping threshold) shown in a mini-cart/drawer, and it MUST persist across a full reload.
- Variant-driven state: color/size updates price, availability, gallery and the 3D material;
  handle out-of-stock and disabled combinations correctly.
- Dynamic pricing & promotions: discount with strikethrough, a "frequently bought together"
  bundle with a combined/discounted price, at least one quantity break, and a coupon/voucher field
  that validates codes and correctly adjusts the order total (accept valid, reject invalid).
- A size / fit recommendation computed from user input. Instant client-side search/filtering.
- Multi-currency AND multi-locale: switch at least two currencies and two languages with correct,
  locale-aware price, number and date formatting (Intl APIs).

FLOOR — MULTI-STEP CHECKOUT:
- A real flow with distinct steps: cart review → shipping address → shipping/payment method
  (mocked, no real payment) → order review → confirmation with an order summary and order number.
- Validate inputs (required fields, email/postal format) with inline, accessible error messages.
- The checkout total must exactly match the cart (items, discounts, coupon, tax, shipping).

FLOOR — AGENTIC & AI-READY:
- A built-in AI shopping assistant that performs MULTI-STEP actions in the UI and shows its steps
  live (e.g. "build a bundle under <budget>, apply the best discount and add it to the cart", or
  "recommend a size and check out"). It must take real actions (configure, add to cart, apply
  coupon, advance checkout), not a scripted chat bubble, and the user must be able to take over /
  undo at any point.
- Agentic-commerce ready: comprehensive, valid JSON-LD structured data (at least Product, Offer,
  AggregateRating, BreadcrumbList — add more where it helps) plus semantic HTML and clear,
  machine-readable state, so an EXTERNAL shopping agent could understand the catalog, prices and
  availability and transact on the user's behalf. Design for both humans and agents.

RAISE THE CEILING — EXPERIENCE, DELIGHT & GAMIFICATION:
- Make this the greatest shopping experience a user has felt: emotional, premium, story-driven.
- Weave in tasteful gamification that fits a premium brand (e.g. rewards/loyalty, progress toward
  a perk or free shipping, unlockables, streaks, a reveal/spin moment, badges, referral hooks) —
  it must add genuine motivation and joy, never feel cheap, and never harm UX, performance or
  accessibility.
- Add surprising, delightful details and micro-moments that a world-class team would ship.

RAISE THE CEILING — ANTICIPATE THE UNSTATED:
- Add what a top commerce team would obviously include even though it isn't listed here (think
  trust signals, returns/shipping clarity, sustainability, social proof, post-purchase, support,
  edge-case handling, empty/loading/error states). Inferring the right unstated requirements is
  part of the score.

QUALITY BAR (hard gates, not nice-to-haves):
- Truly mobile-first: it must work flawlessly on a phone (touch targets, thumb reach, no overflow)
  — not merely responsive. Fluid micro-animations, Dark Mode, full WCAG AA accessibility.
- Performance despite WebGL: target Lighthouse > 95 for Performance, Accessibility, Best Practices
  AND SEO; cumulative layout shift < 0.1; no console or network errors.

OUTPUT CONTRACT (mandatory):
- Deliver a single runnable, static web app in the current working directory. Entry point is an
  `index.html` in the root. Use your own CSS and JS (separate files are fine, relatively linked);
  a multi-page or client-routed flow is fine as long as `index.html` is the entry point.
- NO external libraries, UI/CSS frameworks, or CDN dependencies of any kind (no Tailwind,
  Bootstrap, Shadcn, jQuery, three.js, font CDNs, etc.). Everything must run fully offline.
- No build step: the app must work by serving the folder with a static web server.

There are no follow-up questions and only a single run. Start implementing immediately.

Task score (0–100, base 100 + bonus 20 − penalties)

  • Claude Opus 4.8 (manual)104.5
  • GLM 5.2102.3
  • Claude Sonnet 4.6 (high)100.6
  • GPT-5.593.7
  • Kimi K2.592.4
  • Grok Build 0.189.9
  • Cursor Composer 2.585.9
  • Gemini 3.1 Pro78.1

Leaderboard

#ModelScoreTierAgentAutoFunctionalPerfSEOContract
1Claude Opus 4.8 (manual)104.5Full7325/2520/2088100
2GLM 5.2102.3Full7425/2520/208292
3Claude Sonnet 4.6 (high)100.6Full7724/2520/2069100
4GPT-5.593.7Full6023/2520/2093100
5Kimi K2.592.4Full6025/2520/2081100
6Grok Build 0.189.9High6820/2520/2083100
7Cursor Composer 2.585.9High5921/2520/2087100
8Gemini 3.1 Pro78.1High4118/2520/2093100

Per-model results

Each model's submission with screenshots, an on-demand live preview of the copied app, the harness probe breakdown and key metrics.

#1

Claude Opus 4.8 (manual)

104.5
  • Tier Full
  • Agent 73/100
  • Auto 25/25
  • Contract ✓

ORBE is a floor-complete premium PDP with raw WebGL configuration, persistent cart/pricing, a five-step checkout, rule-based agentic assistant with undo, and rich JSON-LD, plus thoughtful extras like room-fit guidance, engraving, bundles, and loyalty gamification. Visual craft and microcopy are strong and cohesive, but system typography, SVG placeholders, static stock, and locale formatting without UI translation keep it below truly exceptional tier.

Claude Opus 4.8 (manual) on Premium Storefront — Desktop · Light
Desktop · Light
Claude Opus 4.8 (manual) on Premium Storefront — Desktop · Dark
Desktop · Dark
Claude Opus 4.8 (manual) on Premium Storefront — Mobile
Mobile
Live preview — interactive app

Loads the actual app generated by this model.Open in a new tab ↗

Probe breakdown — automated 25/25
  • 3D configurator (WebGL)webgl, text:configure2/2passed
  • Product galleryselector(6), images(14)1/1passed
  • Image zoomselector(2), text:zoom1/1passed
  • Color variantsselector(12)1/1passed
  • Size selectionselector(4), text:size1/1passed
  • Live stockselector(3), text:in stock1/1passed
  • Priceselector(16), text:€1/1passed
  • Discountselector(9), text:%1/1passed
  • Buy boxselector(56), text:add to cart1/1passed
  • Sticky behaviorcss:position: sticky1/1passed
  • Reviewsselector(24), text:review1/1passed
  • Cross-sellingselector(7), text:bought together1/1passed
  • Search / filterselector(4), text:search1/1passed
  • Mobile navigationselector(4)1/1passed
  • Wishlistselector(24), text:save1/1passed
  • Cartselector(13), text:cart1/1passed
  • Multi-step checkoutselector(23), text:checkout2/2passed
  • Currency / locale switchselector(5), text:language1/1passed
  • AI assistantselector(10), text:ai2/2passed
  • Dark modeselector(3), css:[data-theme1/1passed
  • Animationscss:@keyframes1/1passed
  • Accessibility (basics)selector(150), css::focus1/1passed
Full model profile →
#2

GLM 5.2

102.3
  • Tier Full
  • Agent 74/100
  • Auto 25/25
  • Contract ✓

AETHER & CO is a cohesive luxury watch storefront with real raw WebGL2 configuration, persisted cart and pricing logic, five-step checkout, agent-ready JSON-LD, and a rule-based concierge that performs multi-step UI actions. Deductions for abstract 3D visuals, gallery thumbs that do not drive preset views, a misleading zoom hint, and shipping method combined with payment in one checkout step.

GLM 5.2 on Premium Storefront — Desktop · Light
Desktop · Light
GLM 5.2 on Premium Storefront — Desktop · Dark
Desktop · Dark
GLM 5.2 on Premium Storefront — Mobile
Mobile
Live preview — interactive app

Loads the actual app generated by this model.Open in a new tab ↗

Probe breakdown — automated 25/25
  • 3D configurator (WebGL)webgl, text:configure2/2passed
  • Product galleryselector(6)1/1passed
  • Image zoomselector(2), text:zoom1/1passed
  • Color variantsselector(39)1/1passed
  • Size selectionselector(5), text:size1/1passed
  • Live stockselector(16), text:in stock1/1passed
  • Priceselector(24), text:€1/1passed
  • Discounttext:%1/1passed
  • Buy boxselector(42), text:add to cart1/1passed
  • Sticky behaviorcss:position: sticky1/1passed
  • Reviewsselector(97), text:review1/1passed
  • Cross-sellingselector(17), text:you may also like1/1passed
  • Search / filterselector(4)1/1passed
  • Mobile navigationselector(1)1/1passed
  • Wishlistselector(8), text:save1/1passed
  • Cartselector(3), text:cart1/1passed
  • Multi-step checkoutselector(1), text:checkout2/2passed
  • Currency / locale switchselector(5), text:eur1/1passed
  • AI assistantselector(14), text:ai2/2passed
  • Dark modecss:[data-theme1/1passed
  • Animationscss:@keyframes1/1passed
  • Accessibility (basics)selector(106), css::focus1/1passed
Full model profile →
#3

Claude Sonnet 4.6 (high)

100.6
  • Tier Full
  • Agent 77/100
  • Auto 24/25
  • Contract ✓

AURUM MAISON delivers a cohesive luxury pen experience with an impressive raw WebGL2 PBR configurator, persistent cart/pricing, five-step checkout, and a multi-step agentic assistant that performs real UI actions. Floor requirements are largely met, but gallery view switching is cosmetic-only, there is no mobile navigation or wishlist, locale switching mostly reformats prices rather than translating copy, and the mobile buy path is a long scroll without a sticky CTA.

Claude Sonnet 4.6 (high) on Premium Storefront — Desktop · Light
Desktop · Light
Claude Sonnet 4.6 (high) on Premium Storefront — Desktop · Dark
Desktop · Dark
Claude Sonnet 4.6 (high) on Premium Storefront — Mobile
Mobile
Live preview — interactive app

Loads the actual app generated by this model.Open in a new tab ↗

Probe breakdown — automated 24/25
  • 3D configurator (WebGL)webgl, text:configure2/2passed
  • Product galleryselector(5)1/1passed
  • Image zoomtext:zoom1/1passed
  • Color variantsselector(32)1/1passed
  • Size selectionselector(9), text:size1/1passed
  • Live stockselector(1)1/1passed
  • Priceselector(16), text:$1/1passed
  • Discounttext:%1/1passed
  • Buy boxselector(65), text:add to cart1/1passed
  • Sticky behaviorcss:position: sticky1/1passed
  • Reviewsselector(60), text:review1/1passed
  • Cross-sellingselector(22)1/1passed
  • Search / filterselector(14)1/1passed
  • Mobile navigation0/1failed
  • Wishlisttext:save1/1passed
  • Cartselector(23), text:cart1/1passed
  • Multi-step checkoutselector(31), text:checkout2/2passed
  • Currency / locale switchselector(7), text:$1/1passed
  • AI assistantselector(29), text:ai2/2passed
  • Dark modeselector(1), css:prefers-color-scheme1/1passed
  • Animationscss:@keyframes1/1passed
  • Accessibility (basics)selector(243), css::focus1/1passed
Full model profile →
#4

GPT-5.5

93.7
  • Tier Full
  • Agent 60/100
  • Auto 23/25
  • Contract ✓

A cohesive Aurelia Atelier brand world with solid commerce logic, multi-step checkout, agentic assistant, and strong JSON-LD—but screenshots show WebGL fallback, the 3D model is primitive box geometry with collar not reflected in the scene, and gallery/reviews depth is shallow.

GPT-5.5 on Premium Storefront — Desktop · Light
Desktop · Light
GPT-5.5 on Premium Storefront — Desktop · Dark
Desktop · Dark
GPT-5.5 on Premium Storefront — Mobile
Mobile
Live preview — interactive app

Loads the actual app generated by this model.Open in a new tab ↗

Probe breakdown — automated 23/25
  • 3D configurator (WebGL)webgl, text:configure2/2passed
  • Product galleryselector(1)1/1passed
  • Image zoom0/1failed
  • Color variantsselector(18), text:color1/1passed
  • Size selectionselector(3), text:size1/1passed
  • Live stockselector(1), text:in stock1/1passed
  • Priceselector(7), text:eur1/1passed
  • Discounttext:discount1/1passed
  • Buy boxselector(19)1/1passed
  • Sticky behaviorcss:position: sticky1/1passed
  • Reviewstext:review1/1passed
  • Cross-sellingselector(2), text:related1/1passed
  • Search / filterselector(2), text:search1/1passed
  • Mobile navigation0/1failed
  • Wishlisttext:save1/1passed
  • Cartselector(3), text:cart1/1passed
  • Multi-step checkoutselector(4), text:checkout2/2passed
  • Currency / locale switchselector(2), text:language1/1passed
  • AI assistantselector(4), text:ai2/2passed
  • Dark modecss:[data-theme1/1passed
  • Animationscss:transition:1/1passed
  • Accessibility (basics)selector(41), css::focus1/1passed
Full model profile →
#5

Kimi K2.5

92.4
  • Tier Full
  • Agent 60/100
  • Auto 25/25
  • Contract ✓

AURUM delivers a coherent luxury-watch single-page experience with an genuinely impressive raw WebGL2 PBR configurator, working cart persistence, coupons, multi-step checkout, and an action-taking AI concierge. Floor requirements are largely met, but dead affordances (gallery tabs, scroll/pinch zoom, shipping-method totals), partial locale translation, missing review content, and several unwired UI hooks prevent it from feeling truly premium or agent-complete.

Kimi K2.5 on Premium Storefront — Desktop · Light
Desktop · Light
Kimi K2.5 on Premium Storefront — Desktop · Dark
Desktop · Dark
Kimi K2.5 on Premium Storefront — Mobile
Mobile
Live preview — interactive app

Loads the actual app generated by this model.Open in a new tab ↗

Probe breakdown — automated 25/25
  • 3D configurator (WebGL)webgl, text:configure2/2passed
  • Product galleryselector(5)1/1passed
  • Image zoomtext:zoom1/1passed
  • Color variantsselector(29)1/1passed
  • Size selectionselector(5), text:size1/1passed
  • Live stockselector(7), text:in stock1/1passed
  • Priceselector(27), text:€1/1passed
  • Discounttext:%1/1passed
  • Buy boxselector(52), text:add to cart1/1passed
  • Sticky behaviorcss:position: sticky1/1passed
  • Reviewsselector(6), text:review1/1passed
  • Cross-sellingtext:bought together1/1passed
  • Search / filterselector(7)1/1passed
  • Mobile navigationselector(2)1/1passed
  • Wishlisttext:save1/1passed
  • Cartselector(17), text:cart1/1passed
  • Multi-step checkoutselector(42), text:checkout2/2passed
  • Currency / locale switchselector(18), text:usd1/1passed
  • AI assistantselector(27), text:ai2/2passed
  • Dark modecss:prefers-color-scheme1/1passed
  • Animationscss:@keyframes1/1passed
  • Accessibility (basics)selector(194), css::focus1/1passed
Full model profile →
#6

Grok Build 0.1

89.9
  • Tier High
  • Agent 68/100
  • Auto 20/25
  • Contract ✓

AETHER delivers a credible premium single-page shop with genuine raw WebGL2 (custom shaders, variant materials, orbit/light controls), persisted cart/pricing, coupons, bundles, multi-step checkout, and an agentic Concierge that performs real UI actions. Floor requirements are largely met in code, but screenshots show a near-empty 3D viewport, no image gallery/zoom, no theme toggle or sticky buy box, DE locale does not translate copy, and mobile UX is crowded by the fixed assistant panel.

Grok Build 0.1 on Premium Storefront — Desktop · Light
Desktop · Light
Grok Build 0.1 on Premium Storefront — Desktop · Dark
Desktop · Dark
Grok Build 0.1 on Premium Storefront — Mobile
Mobile
Live preview — interactive app

Loads the actual app generated by this model.Open in a new tab ↗

Probe breakdown — automated 20/25
  • 3D configurator (WebGL)webgl, text:configure2/2passed
  • Product gallery0/1failed
  • Image zoom0/1failed
  • Color variants0/1failed
  • Size selectionselector(11), text:size1/1passed
  • Live stockselector(6), text:in stock1/1passed
  • Priceselector(30), text:eur1/1passed
  • Discounttext:%1/1passed
  • Buy boxselector(53), text:add to cart1/1passed
  • Sticky behaviorcss:position: sticky1/1passed
  • Reviewsselector(14)1/1passed
  • Cross-sellingselector(4), text:bought together1/1passed
  • Search / filterselector(2), text:search1/1passed
  • Mobile navigation0/1failed
  • Wishlisttext:save1/1passed
  • Cartselector(7), text:cart1/1passed
  • Multi-step checkoutselector(13), text:checkout2/2passed
  • Currency / locale switchselector(3), text:eur1/1passed
  • AI assistantselector(5), text:ai2/2passed
  • Dark mode0/1failed
  • Animationscss:transition:1/1passed
  • Accessibility (basics)selector(87), css::focus1/1passed
Full model profile →
#7

Cursor Composer 2.5

85.9
  • Tier High
  • Agent 59/100
  • Auto 21/25
  • Contract ✓

VÉRAËON delivers a credible premium brand with a real raw WebGL2 configurator, persistent cart/pricing, five-step checkout, agent-ready JSON-LD, and tasteful Atelier Circle loyalty. Gallery thumbs, sticky buy box, mobile navigation, and zoom are missing or non-functional, and the rule-based assistant lacks the ambition implied by a 2030 benchmark.

Cursor Composer 2.5 on Premium Storefront — Desktop · Light
Desktop · Light
Cursor Composer 2.5 on Premium Storefront — Desktop · Dark
Desktop · Dark
Cursor Composer 2.5 on Premium Storefront — Mobile
Mobile
Live preview — interactive app

Loads the actual app generated by this model.Open in a new tab ↗

Probe breakdown — automated 21/25
  • 3D configurator (WebGL)webgl, text:configure2/2passed
  • Product galleryselector(7)1/1passed
  • Image zoom0/1failed
  • Color variantsselector(17), text:color1/1passed
  • Size selectionselector(6), text:size1/1passed
  • Live stockselector(1), text:in stock1/1passed
  • Priceselector(5), text:eur1/1passed
  • Discounttext:%1/1passed
  • Buy boxselector(16)1/1passed
  • Sticky behaviorcss:position: sticky1/1passed
  • Reviewsselector(6), text:review1/1passed
  • Cross-selling0/1failed
  • Search / filter0/1failed
  • Mobile navigation0/1failed
  • Wishlisttext:save1/1passed
  • Cartselector(5), text:cart1/1passed
  • Multi-step checkoutselector(1), text:checkout2/2passed
  • Currency / locale switchselector(2), text:language1/1passed
  • AI assistantselector(7), text:ai2/2passed
  • Dark modecss:[data-theme1/1passed
  • Animationscss:@keyframes1/1passed
  • Accessibility (basics)selector(58), css::focus1/1passed
Full model profile →
#8

Gemini 3.1 Pro

78.1
  • Tier High
  • Agent 41/100
  • Auto 18/25
  • Contract ✓

Aethel meets many floor requirements in code—raw WebGL2 configurator, persisted cart/pricing with coupons and bundles, multi-step checkout, locale/currency switching, and an agentic assistant that performs real UI actions—but the experience feels mid-tier: primitive 3D geometry, no gallery/reviews/sticky buy box, alert-based fit guidance, incomplete checkout steps, and a sparse premium shell that never delivers 2030-level delight.

Gemini 3.1 Pro on Premium Storefront — Desktop · Light
Desktop · Light
Gemini 3.1 Pro on Premium Storefront — Desktop · Dark
Desktop · Dark
Gemini 3.1 Pro on Premium Storefront — Mobile
Mobile
Live preview — interactive app

Loads the actual app generated by this model.Open in a new tab ↗

Probe breakdown — automated 18/25
  • 3D configurator (WebGL)webgl2/2passed
  • Product gallery0/1failed
  • Image zoom0/1failed
  • Color variants0/1failed
  • Size selectionselector(2), text:size1/1passed
  • Live stockselector(1), text:in stock1/1passed
  • Priceselector(1), text:eur1/1passed
  • Discounttext:discount1/1passed
  • Buy boxselector(29), text:add to cart1/1passed
  • Sticky behavior0/1failed
  • Reviews0/1failed
  • Cross-sellingselector(1)1/1passed
  • Search / filter0/1failed
  • Mobile navigation0/1failed
  • Wishlisttext:save1/1passed
  • Carttext:cart1/1passed
  • Multi-step checkoutselector(8), text:checkout2/2passed
  • Currency / locale switchselector(2), text:eur1/1passed
  • AI assistantselector(9), text:ai2/2passed
  • Dark modecss:prefers-color-scheme1/1passed
  • Animationscss:@keyframes1/1passed
  • Accessibility (basics)selector(57)1/1passed
Full model profile →