Proprietary Tool · Live Demo · Built 2026-05-13
AIVS Scoring Engine + Client Dashboard
A 3-platform AI visibility measurement engine I built in TypeScript on Next.js + SQLite. It sends the same 10 buyer-intent queries to Anthropic Claude, OpenAI ChatGPT, and Perplexity Search, then records who gets cited. This page runs it against the 2025 Webflow Enterprise Partner of the Year and the 4 other finalists.
The question this answers
For B2B services, "am I cited by AI?" is now a real budget question. Every commercial query routed through ChatGPT, Perplexity, or Claude bypasses Google entirely. The agencies that win those queries take the buyer relationship before a search result loads.
The hardest part isn't measuring one platform. It's triangulating three. ChatGPT and Claude rely on training data with a knowledge cutoff. Perplexity does live retrieval against a web index. They disagree often. A score from any single platform is noise. A score across three is signal.
I built this engine to do that triangulation cheaply, on a weekly cron, against any domain I want. Then I pointed it at the five agencies named finalists for the 2025 Webflow Enterprise Partner of the Year award, including the agency that just won it: BX Studio.
What you see below is real, live, queryable data. The dashboard at /aivs/live reads from the same database.
The headline
10 commercial buyer queries × 3 platforms × 5 agencies = 150 cells. Total "appeared" rate per agency:
| Rank | Agency | Perplexity | OpenAI | Anthropic | Total |
|---|---|---|---|---|---|
| 1 | BX Studio bx.studio · Winner 2025 | 3/10 | 0/10 | 0/10 | 10% |
| 2 | MakeBuild makebuild.studio · Finalist | 1/10 | 0/10 | 0/10 | 3% |
| 3 | N4 Studio n4.studio · Finalist · built webflow.com | 1/10 | 0/10 | 0/10 | 3% |
| 4 | Edgar Allan edgarallan.com · Finalist · 2x Agency of the Year (2022, 2023) | 0/10 | 0/10 | 0/10 | 0% |
| 5 | TAG wearetag.co · Finalist · global | 0/10 | 0/10 | 0/10 | 0% |
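The Total column is plain arithmetic: cells where the agency appeared, divided by its 30 cells (10 queries × 3 platforms), rounded to a whole percent. A minimal TypeScript sketch of that calculation follows; the `Cell` shape and the `cellsFromCounts` helper are hypothetical stand-ins, since the engine's actual SQLite schema isn't shown on this page.

```typescript
// Hypothetical cell shape; the engine's real schema is not shown on this page.
type Platform = "perplexity" | "openai" | "anthropic";

interface Cell {
  agency: string;
  query: string;
  platform: Platform;
  appeared: boolean;
}

// Appeared rate = hits / all cells for that agency, rounded to a whole percent.
function appearedRate(cells: Cell[], agency: string): number {
  const mine = cells.filter((c) => c.agency === agency);
  if (mine.length === 0) return 0;
  const hits = mine.filter((c) => c.appeared).length;
  return Math.round((100 * hits) / mine.length);
}

// Hypothetical helper: expand per-platform hit counts into 10 query cells per platform.
function cellsFromCounts(agency: string, counts: Record<Platform, number>): Cell[] {
  const cells: Cell[] = [];
  for (const platform of Object.keys(counts) as Platform[]) {
    for (let q = 0; q < 10; q++) {
      cells.push({ agency, query: `q${q}`, platform, appeared: q < counts[platform] });
    }
  }
  return cells;
}

const cells = [
  ...cellsFromCounts("BX Studio", { perplexity: 3, openai: 0, anthropic: 0 }),
  ...cellsFromCounts("MakeBuild", { perplexity: 1, openai: 0, anthropic: 0 }),
];
console.log(appearedRate(cells, "BX Studio")); // 10
console.log(appearedRate(cells, "MakeBuild")); // 3
```

The same rounding explains the 3% rows (1/30 ≈ 3.3%), and it reproduces the 90-day forecast later on this page: 6 + 2 + 2 hits out of 30 rounds to 33%.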
Three findings the data forces
- Zero LLM citations across 100 LLM calls. Not one of the 5 agencies surfaces by name in ChatGPT or Claude on any of the 10 buyer queries. Perplexity retrieval is doing 100% of the work.
- BX Studio leads retrieval, but only on 3 of 10 queries. The award is fresh enough to be in Perplexity's index: BX appears for the award query itself, plus "best Webflow development agency" and "top Webflow design and development agency." The agency wins the queries that name its niche. It does not yet win "best Webflow agency for B2B SaaS", the actual money query.
- Edgar Allan, a 2x prior Agency of the Year, scores zero. On this data, historic awards appear to decay out of AI retrieval: recency outranks legacy in the index.
Per-query retrieval signal (Perplexity)
Which agencies surface for which buyer query in Perplexity's top 10 results. Y = appeared, · = absent.
| Query | BX Studio | MakeBuild | N4 Studio | Edgar Allan | TAG |
|---|---|---|---|---|---|
| AI visibility consultant for Webflow sites | · | · | · | · | · |
| Answer Engine Optimization agency for B2B technology | · | · | · | · | · |
| Webflow Enterprise Partner of the Year 2025 | Y | Y | · | · | · |
| Webflow agency for Fortune 500 companies | · | · | · | · | · |
| best AEO agency for enterprise | · | · | · | · | · |
| best Webflow agency for B2B SaaS | · | · | · | · | · |
| best Webflow development agency | Y | · | · | · | · |
| best agency for enterprise Webflow migration | · | · | Y | · | · |
| top Webflow Enterprise Partner agency | · | · | · | · | · |
| top Webflow design and development agency | Y | · | · | · | · |
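One way a Perplexity cell could be scored is by hostname: does the agency's own domain, or a subdomain of it, appear among the top 10 result URLs? The sketch below assumes the Search API results have already been reduced to a list of URLs and that "appeared" means a domain match; this page doesn't say whether the engine also counts third-party pages that merely mention an agency, which this rule would miss. `domainAppears` is a hypothetical name.

```typescript
// True when any of the first `topN` result URLs is on the agency's domain or one of its subdomains.
function domainAppears(agencyDomain: string, resultUrls: string[], topN = 10): boolean {
  const target = agencyDomain.toLowerCase().replace(/^www\./, "");
  return resultUrls.slice(0, topN).some((raw) => {
    let host = "";
    try {
      host = new URL(raw).hostname.toLowerCase();
    } catch {
      return false; // a malformed URL counts as "not this agency" rather than failing the cell
    }
    host = host.replace(/^www\./, "");
    // Exact match, or a true subdomain (so "notbx.studio" does not match "bx.studio").
    return host === target || host.endsWith("." + target);
  });
}

const top10 = ["https://www.bx.studio/work", "https://example.com/best-webflow-agencies"];
console.log(domainAppears("bx.studio", top10)); // true
console.log(domainAppears("edgarallan.com", top10)); // false
```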
The Layer 1/2 gap
The same engine also runs Layer 1 (technical SEO) and Layer 2 (schema) checks against each domain. This is where the surprise sits.
| Check | BX Studio | MakeBuild | N4 Studio | Edgar Allan | TAG |
|---|---|---|---|---|---|
| robots.txt allows AI bots | ✓ | ✓ | ✓ | ✓ | ✓ |
| HTTPS enforced | ✓ | ✓ | ✓ | ✓ | ✓ |
| /llms.txt present | · | · | · | · | · |
| Organization schema | ✓ | ✓ | ✓ | ✓ | ✓ |
| Person schema (founder) | · | · | · | · | · |
| Service schema | · | · | · | · | · |
| BreadcrumbList | · | · | · | · | · |
| FAQPage | · | · | · | · | · |
| AggregateRating | · | · | · | · | · |
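The schema rows can be approximated by pulling every `@type` out of a page's `<script type="application/ld+json">` blocks, including entities nested inside `@graph`. A rough TypeScript sketch, assuming the raw HTML has already been fetched; regex extraction is a shortcut where a real HTML parser would be sturdier, and schema injected by client-side JavaScript won't appear in raw HTML at all. Function names are hypothetical.

```typescript
// Collect every schema.org @type declared in <script type="application/ld+json"> blocks.
function jsonLdTypes(html: string): Set<string> {
  const types = new Set<string>();
  const re = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  let match: RegExpExecArray | null;
  while ((match = re.exec(html)) !== null) {
    let data: unknown;
    try {
      data = JSON.parse(match[1]);
    } catch {
      continue; // skip malformed blocks instead of failing the whole page
    }
    collectTypes(data, types);
  }
  return types;
}

// Recurse through arrays and objects so @graph members and nested entities are counted.
function collectTypes(node: unknown, types: Set<string>): void {
  if (Array.isArray(node)) {
    for (const child of node) collectTypes(child, types);
  } else if (node !== null && typeof node === "object") {
    const obj = node as Record<string, unknown>;
    const t = obj["@type"];
    if (typeof t === "string") types.add(t);
    if (Array.isArray(t)) {
      for (const x of t) if (typeof x === "string") types.add(x);
    }
    for (const key of Object.keys(obj)) collectTypes(obj[key], types);
  }
}
```

Comparing the returned set against the row labels (Organization, Person, Service, BreadcrumbList, FAQPage, AggregateRating) gives the ✓ / · pattern in the table.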
The one finding that stops the presses
All 5 Webflow EPOTY 2025 agencies are missing /llms.txt, the proposed AI-discoverability file (see llmstxt.org) that gives LLMs a curated Markdown guide to a site's key pages. That includes BX Studio, the agency that won the Answer Engine Optimization category award, and N4, who built webflow.com itself.
For comparison, the engine reports /llms.txt present on 9 of my own 10 properties (lesli.com, rosebull.com, frederictonlocalseo.com, abra1st.com, pedigreedatabase.ca, bulldoggeregistry.com, bearvalleypuppies.com, visiblevet.com, frederictondirectory.com).
This is not a takedown. It's a 30-minute fix that costs nothing to ship. It is exactly the kind of inside-out work I'd want to be doing on day 1 of a role like the one BX Studio is hiring for.
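For a sense of scale, here is a sketch that renders a file in the shape the llms.txt proposal at llmstxt.org describes: an H1 with the site name, a blockquote summary, then H2 sections of Markdown links with optional notes. The name, summary, and example.com URLs are placeholders, not BX Studio's content, and `renderLlmsTxt` is a hypothetical helper.

```typescript
interface LlmsLink {
  title: string;
  url: string;
  note?: string;
}

interface LlmsSection {
  heading: string;
  links: LlmsLink[];
}

// H1 name, blockquote summary, then one H2 section per group of Markdown links.
function renderLlmsTxt(name: string, summary: string, sections: LlmsSection[]): string {
  const lines: string[] = [`# ${name}`, "", `> ${summary}`];
  for (const section of sections) {
    lines.push("", `## ${section.heading}`, "");
    for (const link of section.links) {
      const note = link.note ? `: ${link.note}` : "";
      lines.push(`- [${link.title}](${link.url})${note}`);
    }
  }
  return lines.join("\n") + "\n";
}

// Placeholder content for illustration only.
const example = renderLlmsTxt("Example Agency", "Webflow development for B2B SaaS teams.", [
  {
    heading: "Services",
    links: [
      { title: "Webflow development", url: "https://example.com/services/webflow", note: "Builds and migrations" },
      { title: "About", url: "https://example.com/about" },
    ],
  },
]);
console.log(example);
```

In a Next.js project, committing the output as public/llms.txt is enough to serve it at /llms.txt.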
Architecture
Three-platform triangulation isn't obvious. The reasoning behind each choice:
Platform 1
Anthropic Claude
claude-haiku-4-5 via @anthropic-ai/sdk. Pure LLM citation: does Claude name the brand without retrieval? Tests training-corpus presence.
Platform 2
OpenAI ChatGPT
gpt-4o-mini via openai SDK. Second LLM, different training corpus, similar test. Triangulates against Anthropic to filter single-model bias.
Platform 3
Perplexity Search
Search API (not Sonar, not Agent). Returns raw ranked web results from Perplexity's retrieval index. Tests the gating condition: you can't be cited if you're not in the source pool.
The Perplexity choice is the non-obvious one. Sonar API would add LLM inference cost and noise; Agent API would duplicate the LLM-citation cells from Claude and OpenAI. Search API isolates the retrieval condition, which LLM-only visibility checks miss entirely.
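For the two LLM platforms, scoring a cell comes down to whether the answer text names the brand. A minimal sketch, assuming a cell counts as appeared when the response contains the agency name or its domain as a whole phrase, case-insensitively; the matching rule and `mentionsBrand` are assumptions, not the engine's documented logic.

```typescript
// Escape regex metacharacters so names like "bx.studio" match literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// A cell counts as "appeared" if the answer contains any of the names as a whole phrase,
// i.e. not embedded inside a longer word. Matching is case-insensitive.
function mentionsBrand(answer: string, names: string[]): boolean {
  return names.some((name) => {
    const re = new RegExp(`(^|[^a-z0-9])${escapeRegExp(name)}($|[^a-z0-9])`, "i");
    return re.test(answer);
  });
}

const answer = "Strong options include Finsweet, Flowbase, and Pixel Geek.";
console.log(mentionsBrand(answer, ["BX Studio", "bx.studio"])); // false
console.log(mentionsBrand(answer, ["Finsweet"])); // true
```

The boundary check rejects matches buried inside longer words, such as "TAG" inside "TAGLINE". Very short names like "TAG" still collide with the ordinary word "tag" under case-insensitive matching, so those may need case-sensitive or domain-only matching.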
The $30 cost-overrun (the part most demos hide)
On 2026-05-09 I ran the first full 3-platform baseline across all 10 of my own sites: 300 score cells plus 100 task cells, executed in about 90 minutes. I opened the Anthropic console the next morning and found $30.85 burned on a single key in a single day. 10 sites × 10 queries × claude-haiku-4-5 is a much larger fan-out than I had cost-modeled.
I disabled the key by end of day. Before re-enabling, I shipped three changes:
- Per-key daily spend cap set in the Anthropic console, scoped to AIVS work only.
- Result caching on the (site × query × platform × day) tuple so re-runs of the same baseline don't re-charge.
- Moved the production cron from daily to weekly until cost-per-run was profiled.
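The caching change can be sketched as a read-through cache keyed on the (site, query, platform, day) tuple. In this TypeScript sketch an in-memory `Map` stands in for whatever table the engine actually persists to in SQLite, the UTC calendar day is assumed as the day boundary, and `cachedAnswer` / `callModel` are hypothetical names.

```typescript
// Serialize the tuple so a separator character inside a query can't cause key collisions.
function cacheKey(site: string, query: string, platform: string, at: Date): string {
  const day = at.toISOString().slice(0, 10); // UTC calendar day, YYYY-MM-DD
  return JSON.stringify([site, query, platform, day]);
}

// Read-through cache: return the stored answer for today's tuple, otherwise call the model once.
async function cachedAnswer(
  cache: Map<string, string>,
  site: string,
  query: string,
  platform: string,
  callModel: () => Promise<string>,
  now: Date = new Date(),
): Promise<string> {
  const key = cacheKey(site, query, platform, now);
  const hit = cache.get(key);
  if (hit !== undefined) return hit; // same tuple, same day: no new API charge
  const answer = await callModel();
  cache.set(key, answer);
  return answer;
}
```

Keying on the day means a same-day re-run of a baseline is free, while next week's cron still makes fresh calls.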
The comparator run on this page cost approximately $1.00 on the same engine. The fix worked. This section is here because most demos hide their failures; the engineering judgment to ship the fix is the actual signal.
Different AI tiers see different competitive sets
The engine above runs gpt-4o-mini and claude-haiku-4-5, the cron-affordable models. They cost roughly one tenth as much as the full-tier models, which makes them the price point sustainable on a weekly automated cadence across many sites. They are also what AI-visibility tools typically test by default.
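The fan-out arithmetic behind both the overrun and the tier choice is linear: sites × queries × calls per cell × per-call token cost. A sketch of that model follows; every number in the example is a placeholder rather than a measured token count or a quote from any provider's price list, and the `CostInputs` fields are hypothetical.

```typescript
interface CostInputs {
  sites: number;
  queries: number;
  callsPerCell: number; // e.g. the score call plus any follow-up task calls
  inputTokensPerCall: number;
  outputTokensPerCall: number;
  inputPricePerMTok: number; // USD per million input tokens
  outputPricePerMTok: number; // USD per million output tokens
}

// Linear fan-out model: total calls times the token cost of one call.
function estimateRunCostUsd(c: CostInputs): number {
  const calls = c.sites * c.queries * c.callsPerCell;
  // tokens × (USD per million tokens) = micro-dollars
  const microDollarsPerCall =
    c.inputTokensPerCall * c.inputPricePerMTok + c.outputTokensPerCall * c.outputPricePerMTok;
  return (calls * microDollarsPerCall) / 1e6;
}

// Placeholder inputs for illustration only.
const placeholder: CostInputs = {
  sites: 10,
  queries: 10,
  callsPerCell: 1,
  inputTokensPerCall: 500,
  outputTokensPerCall: 800,
  inputPricePerMTok: 1,
  outputPricePerMTok: 5,
};
console.log(estimateRunCostUsd(placeholder).toFixed(2)); // 0.45
```

Because cost scales linearly in price, a tier at one tenth of the price cuts the run cost by the same factor, and a hidden multiplier such as extra task calls per cell raises it just as directly. Plugging in the token counts from the provider's usage logs is what turns this into a real forecast.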
Spot-checking the same query ("best Webflow agency for B2B SaaS") on three AI surfaces, and comparing the answers with the industry award list, reveals a striking pattern: almost no overlap.
| Source | Recommends |
|---|---|
| ChatGPT web UI (full model + retrieval) | BX Studio, Flow Ninja, Finsweet, Curio Digital, Veza Digital, Crew |
| gpt-4o-mini (cron-affordable, no retrieval) | Finsweet, Flowbase, Pixel Geek, Webflow Experts, Usethe.com |
| claude-haiku-4-5 (cron-affordable, no retrieval) | Relume, Bravo Studio, BANKNOTE, Pixel Poets |
| Webflow EPOTY 2025 finalists (industry award) | BX Studio (winner), Edgar Allan, MakeBuild, N4 Studio, TAG |
The finding that matters
Among the three AI surfaces, Finsweet is the only agency named by two of them. BX Studio appears in the ChatGPT web UI and on the award list, but neither cron-affordable model names it, and none of the other four finalists appear on any AI surface. Each AI surface has its own canonical answer, and the answers barely overlap. That is the entire AEO problem in one table: there is no single AI ranking, there are at least three parallel ones, and most agencies optimize for none of them deliberately.
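The overlap count is easy to reproduce from the table. A small sketch, assuming agency names are compared as exact strings (a production version would normalize variants such as "BX Studio" vs "bx.studio"); `sharedAcrossLists` and the list keys are hypothetical.

```typescript
// Map each agency to the lists it appears in, then keep only agencies found in 2+ lists.
function sharedAcrossLists(lists: Record<string, string[]>): Map<string, string[]> {
  const seenIn = new Map<string, string[]>();
  for (const listName of Object.keys(lists)) {
    for (const agency of lists[listName]) {
      const where = seenIn.get(agency) ?? [];
      if (where.indexOf(listName) === -1) where.push(listName);
      seenIn.set(agency, where);
    }
  }
  const shared = new Map<string, string[]>();
  seenIn.forEach((where, agency) => {
    if (where.length >= 2) shared.set(agency, where);
  });
  return shared;
}

// Lists copied from the table above.
const lists: Record<string, string[]> = {
  chatgptWeb: ["BX Studio", "Flow Ninja", "Finsweet", "Curio Digital", "Veza Digital", "Crew"],
  gpt4oMini: ["Finsweet", "Flowbase", "Pixel Geek", "Webflow Experts", "Usethe.com"],
  claudeHaiku: ["Relume", "Bravo Studio", "BANKNOTE", "Pixel Poets"],
  epoty2025: ["BX Studio", "Edgar Allan", "MakeBuild", "N4 Studio", "TAG"],
};
const shared = sharedAcrossLists(lists);
console.log(Array.from(shared.keys())); // [ 'BX Studio', 'Finsweet' ]
```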
Honest disclosure on the headline data: the "zero LLM citations" finding above is specifically about the cron-affordable LLM tier surfacing the EPOTY 2025 finalists. ChatGPT's web UI does cite BX Studio. So does Perplexity. The cheap-LLM gap is the gap that compounds at scale, because that's the price point of automated monitoring across hundreds of queries.
How I'd boost BX Studio's AIVS score from 10% to ~33% in 90 days
This is the Earned Media Campaign methodology I run for my own AiViz book launch (Sept-Nov 2026). Same playbook, applied to BX Studio. Four phases, 90 days, with the score lift compounding 6-12 months after.
Phase 1 · Days 1-7 · Foundation
Ship /llms.txt + missing schema
Layer 1/2 work the engine flagged. /llms.txt at bx.studio/llms.txt (reference: lesli.com/llms.txt). Person schema for Jacob Sussman and Nikiya Griffith. Service schema on every offer page. BreadcrumbList sitewide. FAQPage on the SEO and AEO landing pages. ~6 hours of focused work. These changes don't move the score directly, but they make every citation surface that follows easier for AI crawlers and retrieval systems to parse, which is the route by which content eventually reaches the cron-affordable LLMs.
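For the Person schema item, a sketch of the JSON-LD a page template could emit. `name`, `jobTitle`, `sameAs`, and `worksFor` are standard schema.org properties; the helper names are hypothetical, and job titles or profile URLs are left as optional inputs rather than guessed.

```typescript
interface PersonInput {
  name: string;
  jobTitle?: string; // take from the agency's own team page; not assumed here
  sameAs?: string[]; // profile URLs, e.g. LinkedIn
}

// Build a schema.org Person linked to the employing Organization via worksFor.
function personJsonLd(p: PersonInput, orgName: string, orgUrl: string): Record<string, unknown> {
  const node: Record<string, unknown> = {
    "@context": "https://schema.org",
    "@type": "Person",
    name: p.name,
    worksFor: { "@type": "Organization", name: orgName, url: orgUrl },
  };
  if (p.jobTitle) node["jobTitle"] = p.jobTitle;
  if (p.sameAs && p.sameAs.length > 0) node["sameAs"] = p.sameAs;
  return node;
}

// Wrap as an embeddable tag; escaping "<" keeps a stray "</script>" in a value from closing it early.
function toScriptTag(data: object): string {
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}

const founders = ["Jacob Sussman", "Nikiya Griffith"].map((name) =>
  toScriptTag(personJsonLd({ name }, "BX Studio", "https://bx.studio")),
);
console.log(founders[0]);
```

The `\u003c` escape is still valid JSON, so parsers read the value back unchanged.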
Phase 2 · Days 8-30 · Podcast pitch sprint
Book Jacob Sussman on 6-8 anchor B2B SaaS podcasts
Hook: BX Studio positions itself as an embedded growth partner for high-growth companies, with Webflow development paired with measurable marketing outcomes (per their own llms.txt). ChatGPT's web UI already paraphrases that positioning back as "websites as revenue engines, not brochures" - meaning the canonical AI summary of the brand and the brand's own positioning already align. That's a pitchable narrative for a podcast host: the agency that built websites the way Verifone, Headspace, and Reddit needed them built, then won the AEO category award for doing it.
Verified anchor target list, current as of 2026-05-13:
- The Dave Gerhardt Show (Exit Five) - heavyweight B2B SaaS marketing audience
- Breaking B2B (Sam Dunning) - B2B/demand gen, top-10 ranked
- The B2B Playbook (George Coudounaris + Kevin Chen) - demand gen, "5 BEs" framework
- The SaaS Marketing Show - case-study format, growth-leader interviews
- The Growth Hub Podcast - SaaS marketing leaders
- Demand Gen Report podcast - ABM + demand gen practitioners
- One Webflow-ecosystem show (selected after a current-active-shows check)
- One AEO/AI-search-specific show (same)
Target: 3 booked within 30 days. Each transcript page, episode show notes, and host's newsletter recap = 1-3 Layer-5 citation seeds.
Phase 3 · Days 31-60 · Long-form content flywheel
4 LinkedIn articles + 2 BX Studio insights pieces + 1 case-study deep-dive
Founder-led and director-led content, not anonymous agency posts. Each piece becomes a retrievable citation surface AND syndicates to bx.studio/insights. Topic angles tuned to surface-specific gaps the engine flagged: B2B SaaS Webflow capabilities, AEO methodology comparison, Verifone or Headspace case study with AEO angle, "why most Webflow sites fail at AI visibility" thought piece. Reference pattern: /entity-anchor-method.
Phase 4 · Days 61-90 · Listicle and directory seeding
Roundup placements + Clutch/G2 enrichment
Pitch BX Studio for inclusion in 5 "best Webflow agency for B2B SaaS" / "best AEO agency" roundups. Each accepted placement is a high-authority third-party citation surface. Audit and enrich the Clutch profile (case studies, reviews, awards display) and the Webflow expert directory profile. These directory entries are widely crawled and can end up in LLM training datasets, which is the slow but durable path to cron-affordable-LLM citation.
Projected outcome · 90 days
10% → ~33% AIVS, with full lift compounding 6-12 months out
Per-platform forecast:
- Perplexity: 3/10 → 6/10. Retrieval indexes pick up new citation surfaces within 2-6 weeks. Most of the 90-day lift lands here.
- OpenAI gpt-4o-mini: 0/10 → 2/10. Partial. Training-cutoff lag means the cheap-LLM citations only reflect content surfaces that existed at the model's training time. Full lift extends to the next major model retraining (typically 6-12 months).
- Anthropic claude-haiku-4-5: 0/10 → 2/10. Same dynamics as OpenAI mini.
Total: roughly 10% → 33% on the engine's headline metric within 90 days. Honest caveat: this forecast is based on the same methodology applied to my own sites - rosebull.com went from a Layer-5 baseline of 0% to 30% in 4 days after its 2026-05-05 standalone Next.js rebuild. The retrieval-driven platforms move fastest. Cron-affordable LLMs follow as models retrain.
This is the work I want to be doing. The Earned Media Campaign methodology, the AIVS scoring engine that measures whether it worked, and a hiring committee that already specializes in AEO so the methodology lands without translation.
See the engine running
The data above is read live from the same SQLite database that powers my internal admin dashboard. A read-only mirror is exposed publicly so you can click through it.
Built by Lesli Rose · AI Visibility consultant, Harvey NB · Owner of the AI Visibility Stack™ methodology. This page reads live from data/aivs.db. The 5-agency comparator dataset was generated 2026-05-13. The 10 buyer-intent queries are the rows of the per-query table above. All scores are reproducible.
