Technical

The 50-point technical GEO audit

A complete technical checklist to make a site under 500 pages fully legible to Google, Bing and the AI answer engines - crawlability, rendering, schema, Core Web Vitals and the AI-specific items most audits miss.

Ritik Namdev · Published May 28, 2026 · Updated Jul 5, 2026 · 16 min read

The short answer

AI legibility is built on top of a technically sound site, not instead of one. This is a 50-point audit across seven categories - crawlability, indexability, architecture, rendering, schema, Core Web Vitals, and AI-specific GEO items. The one rule that separates a GEO audit from an ordinary SEO audit: the major AI crawlers don't render JavaScript, so every load-bearing fact must exist in server-rendered HTML. Work top to bottom; each item takes minutes and closes a gap that can quietly cost you rankings or citations.

Most "SEO audits" optimise for one reader: Googlebot. But your content now has a second audience - ChatGPT, Perplexity, Gemini, Copilot and Google AI Overviews - that reads differently, renders differently, and rewards different things. This checklist covers both. It's built for sites under ~500 pages, where you can realistically fix all 50 items, and every threshold and statistic is sourced to 2025–2026 primary data.

Why a GEO audit is different

Three things change when you audit for AI, not just Google. First, rendering is the dividing line - Google renders JavaScript (with a ~5-second median delay), but GPTBot, ClaudeBot and PerplexityBot do not, so client-side content is invisible to them. Second, the goal shifts from ranking to citation - you're competing to be the extracted, quoted source, which rewards answer-first structure and discrete facts. Third, entity and authority signals dominate - sameAs, author schema and off-site brand mentions weigh more heavily in AI source selection than in classic ranking. Keep those three in mind as you work the list.

A. Crawlability (points 1–8)

If a crawler can't reach a page, nothing else matters. Start here - a single misconfigured line can wall off your whole site.

  1. robots.txt is valid and non-blocking - a stray Disallow: / can hide your entire site; test it in Search Console and Bing Webmaster Tools.
  2. Critical CSS/JS is crawlable - Google renders with headless Chromium; blocking resources makes the rendered page look broken.
  3. XML sitemap submitted and clean - keep it current (≤50,000 URLs / 50MB) so engines know which URLs matter.
  4. No orphan pages - every indexable URL needs at least one internal link, or it's effectively invisible.
  5. Server returns fast, stable 200s - 5xx errors and slow TTFB throttle crawl rate.
  6. Redirect chains and loops eliminated - use single-hop 301s; chains waste crawl budget and dilute signals.
  7. Status-code hygiene - removed pages return 404/410, not soft-404s that keep engines re-crawling dead URLs.
  8. Faceted / parameter URLs controlled - canonicalise or block infinite filter/sort/session-ID combinations.
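
Point 1 can be checked offline before it ever reaches Search Console. The sketch below uses Python's standard-library robots.txt parser to list which crawlers a given file blocks from a URL. The two robots.txt bodies, the example.com URL and the helper name `blocked_agents` are illustrative assumptions, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser


def blocked_agents(robots_txt: str, url: str, agents: list[str]) -> list[str]:
    """Return the user-agents that may NOT fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Hypothetical file with a stray site-wide block for every crawler.
BROKEN = """\
User-agent: *
Disallow: /
"""

# Hypothetical corrected file: only /admin/ is off limits.
FIXED = """\
User-agent: *
Disallow: /admin/
"""

AGENTS = ["Googlebot", "bingbot"]
URL = "https://example.com/pricing"

print("Blocked by broken file:", blocked_agents(BROKEN, URL, AGENTS))
print("Blocked by fixed file:", blocked_agents(FIXED, URL, AGENTS))
```

The standard-library parser is not guaranteed to match every engine's interpretation of edge cases, so treat it as a first-pass check and confirm in the engines' own testing tools.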

B. Indexability (points 9–15)

Crawlable isn't the same as indexable. These seven catch the silent index-killers.

  9. No accidental noindex - a stray meta tag or X-Robots-Tag header is the #1 cause of pages vanishing from the index.
  10. Canonical tags self-referential and correct - consolidate duplicate and parameter versions onto one URL.
  11. HTTPS everywhere with a valid cert - a confirmed ranking signal and a trust prerequisite for AI engines.
  12. One resolving hostname - enforce www-vs-non-www and trailing-slash consistency via 301 to avoid splitting signals.
  13. Meta robots directives audited - check for unintended nofollow, nosnippet or max-snippet limits that suppress AI snippets.
  14. Hreflang correct (if multilingual) - missing reciprocals cause the wrong language/region version to index.
  15. Thin / duplicate content consolidated - near-duplicates compete with each other and waste the crawl.
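
The noindex and snippet checks in this category are easy to script. The following is a minimal sketch, assuming you already have a page's raw HTML and its response headers in hand; the helper names and the sample page are hypothetical. It ignores user-agent-scoped header values such as "googlebot: noindex", so it is a first pass rather than a full parser.

```python
from html.parser import HTMLParser

# Directives that remove a page from the index or suppress its snippet.
RISKY = {"noindex", "none", "nosnippet"}


def split_directives(value: str) -> list[str]:
    """Split a comma-separated robots value into lowercase directives."""
    return [part.strip().lower() for part in value.split(",") if part.strip()]


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots|googlebot|bingbot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in {"robots", "googlebot", "bingbot"}:
            self.directives += split_directives(attr.get("content", ""))


def index_blockers(html: str, headers: dict[str, str]) -> list[str]:
    """Return risky directives found in the meta tags or X-Robots-Tag header."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header = next((v for k, v in headers.items() if k.lower() == "x-robots-tag"), "")
    found = parser.directives + split_directives(header)
    return sorted({
        d for d in found
        if d in RISKY or (d.startswith("max-snippet:") and d != "max-snippet:-1")
    })


# Hypothetical page that is both noindexed and snippet-limited.
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head><body>Hi</body></html>'
HEADERS = {"X-Robots-Tag": "max-snippet:0"}

print(index_blockers(PAGE, HEADERS))
```

An empty list means nothing on the page obviously suppresses indexing or snippets; anything else deserves a manual look.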

C. Site architecture & internal linking (points 16–22)

Structure is how crawlers and AI models understand what your site is about.

  16. Flat hierarchy (≤3 clicks to any page) - shallow depth ensures every page is reached and prioritised on a small site.
  17. Descriptive, static URL slugs - readable, keyword-bearing URLs help ranking and AI entity extraction.
  18. Descriptive internal anchor text - anchors are a primary relevance signal; "click here" wastes it.
  19. Breadcrumbs with BreadcrumbList schema - clarify hierarchy for users, search and answer engines.
  20. Contextual internal links between related pages - these pass authority and help AI map your topical clusters.
  21. Hub pages / HTML sitemap - topical anchors that improve discovery and demonstrate authority.
  22. Pagination handled cleanly - use crawlable <a href> links, not JS-only, so deep items are discoverable.
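
Click depth and orphan pages both fall out of one breadth-first walk over the internal link graph. Here is a rough sketch, assuming a crawler export has already been turned into a page-to-links mapping; the `SITE` dictionary and its paths are made up for illustration.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """Breadth-first search from the homepage; returns clicks needed per page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical site: each key is a page, each value the internal links on it.
SITE = {
    "/": ["/guides/", "/pricing/"],
    "/guides/": ["/guides/geo-audit/"],
    "/guides/geo-audit/": ["/guides/geo-audit/schema/"],
    "/guides/geo-audit/schema/": ["/guides/geo-audit/schema/faq/"],
    "/guides/geo-audit/schema/faq/": [],
    "/pricing/": [],
    "/old-landing/": ["/"],  # links out, but nothing links to it
}

depths = click_depths(SITE, "/")
orphans = sorted(set(SITE) - set(depths))
too_deep = sorted(page for page, depth in depths.items() if depth > 3)

print("Orphan pages:", orphans)
print("Deeper than 3 clicks:", too_deep)
```

Pages that never show up in `depths` are unreachable from the homepage, which is the practical definition of an orphan for this audit.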

D. Rendering & JavaScript (points 23–29)

This is the category most SEO audits skip and most AI-visibility problems hide in. Because the major AI crawlers don't execute JavaScript, anything injected client-side simply doesn't exist to them.

  23. Core content in the initial HTML (SSR/SSG) - server-rendered or static HTML is the only content most AI crawlers can read.
  24. Client-side-rendered content verified - use URL Inspection's "view rendered HTML" to confirm JS content appears to Googlebot.
  25. Links use real <a href> tags - crawlers follow standard anchors, not onclick navigation.
  26. No content locked behind interaction - tabs/accordions requiring JS events may never be seen by crawlers.
  27. Lazy-loaded content is crawlable - use native loading="lazy" and confirm the content appears in the rendered HTML without a scroll or click, since crawlers don't interact with the page.
  28. Reasonable JS payload - heavy bundles delay rendering and hurt INP; keep them lean.
  29. Graceful no-JS fallback for key facts - GPTBot, ClaudeBot, PerplexityBot and others read only raw HTML, so primary facts must live there.

The single highest-impact fix

If your site is a client-rendered SPA, moving to server-side rendering or static generation is the biggest GEO win available. View any page's source (Ctrl/Cmd+U): if your headline and key facts aren't in the raw HTML, the AI engines can't read them - and can't cite them.
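
The view-source test can be automated as a rough heuristic. The sketch below extracts the text a non-rendering crawler would see (everything outside script, style and template tags) and reports which key facts are missing. Both sample pages and the function names are hypothetical, and because it skips all script content it also ignores JSON-LD.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text outside <script>, <style> and <template> elements."""

    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def missing_from_raw_html(html: str, facts: list[str]) -> list[str]:
    """Return the facts that do not appear in the page's raw, unrendered text."""
    parser = VisibleText()
    parser.feed(html)
    text = " ".join("".join(parser.chunks).split()).lower()
    return [fact for fact in facts if fact.lower() not in text]


# Hypothetical client-rendered shell: the headline only exists inside a script.
SPA_SHELL = ('<html><body><div id="root"></div>'
             '<script>render({"headline": "The 50-point technical GEO audit"})</script>'
             '</body></html>')

# Hypothetical server-rendered version of the same page.
SSR_PAGE = ('<html><body><h1>The 50-point technical GEO audit</h1>'
            '<p>Only <strong>48%</strong> of sites pass all three Core Web Vitals.</p>'
            '</body></html>')

FACTS = ["The 50-point technical GEO audit", "48% of sites pass"]

print("Missing from SPA shell:", missing_from_raw_html(SPA_SHELL, FACTS))
print("Missing from SSR page:", missing_from_raw_html(SSR_PAGE, FACTS))
```

Feed it the response body from a plain HTTP fetch, not a browser's rendered DOM, or the check proves nothing.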

E. Structured data / schema (points 30–36)

Schema won't single-handedly win you AI citations - but it clarifies your entity and earns Google rich results. Use JSON-LD, validate everything, and prioritise the types below.

  30. Valid JSON-LD (no errors) - check with Google's Rich Results Test and the Schema Markup Validator.
  31. Organization + sameAs - establishes your brand as a recognised entity and links it to authoritative profiles.
  32. Article / BlogPosting with author + datePublished - signals freshness, authorship and E-E-A-T.
  33. Person schema for authors - ties content to a credentialed entity, reinforcing expertise.
  34. FAQPage / QAPage where genuinely applicable - structures Q&A in an extractable, answer-engine-friendly format.
  35. Product / Offer / Review (e-commerce) - powers price, availability and rating rich results.
  36. BreadcrumbList + WebSite / WebPage - reinforces site structure and identity. Google retired the sitelinks search box in November 2024, so don't count on that rich result.

You can generate all of these with the free schema markup generator. One honest caveat: the best controlled study to date found schema doesn't move AI citations on its own - so justify it by rich results and entity clarity, not citation promises.

Schema type | What it signals to search + AI engines
Organization + sameAs | Brand identity as a recognised entity; disambiguation via authoritative profiles
Article / BlogPosting | Content type, headline, publish/modified dates (freshness), publisher
Person (author) | Author identity and credentials → E-E-A-T / expertise
FAQPage / QAPage | Structured Q&A in an extractable format
BreadcrumbList | Site hierarchy and the page's place in it
Product / Offer / Review | Price, availability, ratings → rich results
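
To make the first three rows concrete, here is a minimal sketch that builds Organization and Article JSON-LD and wraps each in a script tag for the server-rendered HTML. The headline, author and dates come from this article; the organisation name, URLs and sameAs profiles are placeholders, and the property set is a starting point rather than a complete markup spec.

```python
import json


def json_ld_script(data: dict) -> str:
    """Serialise a schema.org object into an embeddable JSON-LD script tag."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a stray "</script>" in a value cannot close the tag early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Placeholder organisation: swap in your real name, URL and profiles.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://example.com",
    "sameAs": [
        "https://www.linkedin.com/company/example-co",
        "https://github.com/example-co",
    ],
}

article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "The 50-point technical GEO audit",
    "datePublished": "2026-05-28",
    "dateModified": "2026-07-05",
    "author": {"@type": "Person", "name": "Ritik Namdev"},
    "publisher": {"@type": "Organization", "name": "Example Co"},
}

print(json_ld_script(organization))
print(json_ld_script(article))
```

Generating the markup from structured data instead of hand-editing it keeps the JSON syntactically valid; whether it qualifies for a rich result still needs the validators from point 30.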

F. Performance / Core Web Vitals (points 37–43)

Speed is a ranking signal on Google and a crawl-rate factor everywhere. The three Core Web Vitals, measured at the 75th percentile of real-user loads:

Metric | Measures | Good | Needs work | Poor
LCP (Largest Contentful Paint) | Load speed | ≤ 2.5s | 2.5–4.0s | > 4.0s
INP (Interaction to Next Paint) | Responsiveness | ≤ 200ms | 200–500ms | > 500ms
CLS (Cumulative Layout Shift) | Visual stability | ≤ 0.1 | 0.1–0.25 | > 0.25

  37. LCP ≤ 2.5s - your largest above-the-fold element loads fast.
  38. INP ≤ 200ms - the metric that replaced FID in March 2024; the page responds quickly to interaction.
  39. CLS ≤ 0.1 - nothing jumps around as the page loads.
  40. Optimised, correctly-sized images (WebP/AVIF) - images are the most common LCP element.
  41. Explicit width/height or aspect-ratio on media - prevents layout shift.
  42. Fast TTFB / caching / CDN - server response underlies both LCP and crawl rate.
  43. Mobile-first verified - Google indexes the mobile version; it must contain all content and pass CWV.
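
The thresholds above translate directly into a small rating function. This sketch assumes you already have 75th-percentile field values (for example from CrUX or PageSpeed Insights) and treats the boundaries as inclusive, following web.dev's "or less" wording; the sample numbers are invented.

```python
# (good upper bound, needs-work upper bound) per metric.
# LCP in seconds, INP in milliseconds, CLS unitless.
THRESHOLDS = {
    "LCP": (2.5, 4.0),
    "INP": (200, 500),
    "CLS": (0.1, 0.25),
}


def rate(metric: str, p75: float) -> str:
    """Rate a 75th-percentile value as good, needs work or poor."""
    good, needs_work = THRESHOLDS[metric]
    if p75 <= good:
        return "good"
    if p75 <= needs_work:
        return "needs work"
    return "poor"


def passes_cwv(p75s: dict[str, float]) -> bool:
    """A page passes only if all three metrics are present and rated good."""
    return all(rate(metric, p75s[metric]) == "good" for metric in THRESHOLDS)


# Hypothetical field data for one page template.
sample = {"LCP": 3.1, "INP": 180, "CLS": 0.05}

for metric, value in sample.items():
    print(f"{metric}: {value} -> {rate(metric, value)}")
print("Passes Core Web Vitals:", passes_cwv(sample))
```

In this sample the page fails on LCP alone, the most common failure in the data below.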

How hard is this in practice? Here's where the web actually stands on mobile in 2025 - note that LCP, not responsiveness, is the main thing holding sites back:

Share of sites with 'good' Core Web Vitals on mobile (2025)

Measure | Share of sites
All CWV passing | 48%
LCP good | 62%
INP good | 77%
CLS good | 81%

Source: HTTP Archive Web Almanac 2025. Only 48% of sites pass all three on mobile; LCP (62%) is the biggest bottleneck.

G. AI-crawler & GEO specifics (points 44–50)

These seven are what turn a technically-sound site into a citable one. This is the layer ordinary audits don't have.

  44. AI crawlers explicitly configured in robots.txt - decide access for OAI-SearchBot, PerplexityBot, Claude-SearchBot, etc.; blocking the search bots removes you from AI answers.
  45. IndexNow enabled - instantly notifies Bing (and thus ChatGPT/Copilot) of new URLs; Bing typically crawls within 1–3 days.
  46. Answer-first, extractable content - lead with concise, self-contained answers and clear headings so LLMs can lift and cite passages.
  47. Factual, citable formatting - stats, dates, lists and tables with sources are far more quotable for answer engines.
  48. Strong entity / authority signals - E-E-A-T, sameAs, and visible bylines help models trust you enough to cite you.
  49. llms.txt considered (optional, low-cost) - an emerging convention Google has said it won't support; treat it as experimental, not foundational.
  50. Content freshness with clear timestamps - visible "updated" dates and genuine refreshes influence both ranking and AI retrieval.
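
For the IndexNow item, a submission is a single JSON POST. The sketch below only builds the request, following the published IndexNow protocol (host, key, keyLocation, urlList); it does not send anything. The key, the example.com URLs and the assumption that the key file sits at the site root are placeholders to replace with your own setup.

```python
import json
from urllib.parse import urlparse
from urllib.request import Request

# Shared IndexNow endpoint; participating engines relay submissions to each other.
INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_request(urls: list[str], key: str) -> Request:
    """Build (but do not send) an IndexNow submission for one host's URLs."""
    hosts = {urlparse(url).netloc for url in urls}
    if len(hosts) != 1:
        raise ValueError("An IndexNow batch must contain URLs from exactly one host")
    host = hosts.pop()
    body = {
        "host": host,
        "key": key,
        # Assumes the key file is hosted at the site root as <key>.txt.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": list(urls),
    }
    return Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


# Hypothetical key and URLs for illustration only.
request = build_indexnow_request(
    ["https://example.com/guides/geo-audit/", "https://example.com/pricing/"],
    key="0123456789abcdef0123456789abcdef",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

To submit for real, pass the request to `urllib.request.urlopen` once the key file is live; most CMS plugins do the same thing automatically on publish.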

The tool stack to run it with

You don't need expensive software to run all 50 points - most of it is covered by free, first-party tools. Here's the minimum kit and what each covers:

  • Google Search Console (free) - indexation status, coverage errors, URL Inspection with rendered-HTML view, Core Web Vitals field data, mobile usability. Your primary lens for categories A, B and F.
  • Bing Webmaster Tools (free) - Bing indexation, IndexNow submission, and the crawl/Copilot data that Search Console can't give you. Essential for the GEO layer (category G).
  • A crawler - Screaming Frog (free up to 500 URLs, which is exactly this guide's ceiling) or Sitebulb crawls your whole site to surface orphan pages, redirect chains, missing canonicals, broken status codes, thin content and duplicate titles. Covers most of categories A, B and C in one pass.
  • Rich Results Test + Schema Markup Validator (free) - validates every JSON-LD block (category E).
  • PageSpeed Insights / CrUX (free) - LCP, INP and CLS against the real-user thresholds (category F).
  • Your raw server logs - the only way to confirm which AI crawlers actually visit, and how often. Filter by user-agent and verify by IP (category G) - the same technique detailed in the AI crawler guide.
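
The log check in the last bullet can start as a short script. This sketch assumes the common "combined" access-log format and matches on user-agent substrings only; the log lines are fabricated examples using documentation IP ranges. User-agents are easy to spoof, so the counts still need the IP verification mentioned above.

```python
import re
from collections import Counter

# AI crawlers named in this guide; extend the tuple as new bots appear.
AI_AGENTS = ("GPTBot", "OAI-SearchBot", "ClaudeBot", "Claude-SearchBot", "PerplexityBot")

# Combined log format: ip ident user [time] "request" status bytes "referrer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) .*?"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)


def ai_crawler_hits(lines: list[str]) -> Counter:
    """Count requests per AI crawler, matched by user-agent substring."""
    hits: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines that are not in combined format
        agent = match.group("agent").lower()
        for bot in AI_AGENTS:
            if bot.lower() in agent:
                hits[bot] += 1
                break
    return hits


# Fabricated sample lines for illustration.
SAMPLE_LOG = [
    '192.0.2.10 - - [05/Jul/2026:10:00:00 +0000] "GET /guides/geo-audit/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    '192.0.2.10 - - [05/Jul/2026:10:00:05 +0000] "GET /pricing/ HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    '198.51.100.7 - - [05/Jul/2026:10:01:00 +0000] "GET / HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)"',
    '203.0.113.5 - - [05/Jul/2026:10:02:00 +0000] "GET / HTTP/1.1" 200 4096 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

hits = ai_crawler_hits(SAMPLE_LOG)
for bot, count in hits.most_common():
    print(f"{bot}: {count} requests")
```

A bot that is allowed in robots.txt but never appears in the logs is worth investigating: a firewall or CDN rule may be blocking it before it reaches the server.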

The one thing none of these do well is tell you whether an AI engine can read your content. For that, the simplest test is free and manual: open a page, view source (Ctrl/Cmd+U), and search for your headline and key facts. If they're in the raw HTML, the AI crawlers can see them. If they only appear in the rendered DOM, you have a rendering problem no other tool will flag as urgent.

How to prioritise the 50 points

Not all 50 points are equal - fix them in dependency order, because early failures make later work pointless. A page an engine can't crawl can't be indexed; a page it can't render can't be read; a page it can't read can't be cited no matter how perfect its schema. So work outside-in:

  1. First, the blockers (categories A & B, points 1–15). A single Disallow: / or stray noindex can zero out everything downstream. Confirm nothing is walling off the site before touching anything else.
  2. Second, rendering (category D, points 23–29). This is the highest-leverage GEO fix and the one most audits skip. If your content isn't in the raw HTML, fixing schema and speed is rearranging deck chairs.
  3. Third, architecture and the GEO layer (C & G). Internal linking, IndexNow, answer-first structure and freshness - the things that turn a legible site into a cited one.
  4. Last, the enhancements (E & F). Schema and Core Web Vitals matter, but they're multipliers on a foundation that already works, not substitutes for it.

On a sub-500-page site, this whole sequence is a focused day or two of work - and unlike content or link-building, it's largely one-and-done. Fix it once, then re-audit on a light quarterly cadence to catch regressions (a redesign that reintroduces client-side rendering, a plugin that adds a rogue noindex, a CMS update that breaks schema).

Why this foundation compounds

The reason to do all of this now is that technical legibility is the one SEO investment that doesn't decay - and it's becoming more valuable, not less, as search fragments across engines. Content gets stale and needs refreshing; links can be lost; rankings fluctuate with every algorithm update. But a site that's crawlable, renderable in raw HTML, well-structured and fast stays legible to every new answer engine that appears - because they all read the same fundamentals. When the next AI search product launches, a technically sound site is eligible for it on day one, with zero extra work.

That's the quiet advantage of a GEO audit over chasing individual tactics. You're not optimising for one engine's current behaviour; you're making your site readable by any machine that shows up to read it. Get the 50 points right, and you've built a foundation that pays out across Google, Bing, ChatGPT, Perplexity, Gemini, Copilot - and whatever comes next.

Fifty points sounds like a lot, but on a site under 500 pages you can work through the whole list in a focused day or two. The order matters: get crawlability and rendering right first (points 1–29), because a page an AI engine can't read or render can never be cited - no amount of schema or freshness will save it. Then layer schema, speed and the GEO specifics on top. That's how you build a site that's legible to every reader that matters: human, Googlebot, and answer engine alike.

§ References

Sources

  • web.dev - Defining the Core Web Vitals metrics thresholds: web.dev/articles/defining-core-web-vitals-thresholds
  • web.dev - INP becomes a Core Web Vital on March 12: web.dev/blog/inp-cwv-march-12
  • Google Search Central - Core Web Vitals & Search: developers.google.com/search/docs/appearance/core-web-vitals
  • HTTP Archive - 2025 Web Almanac: Performance: almanac.httparchive.org/en/2025/performance
  • Google Search Central - Large site owner's guide to crawl budget: developers.google.com/search/docs/crawling-indexing/large-site-managing-crawl-budget
  • Google Search Central - Build and submit a sitemap: developers.google.com/search/docs/crawling-indexing/sitemaps/build-sitemap
  • Onely - Google's rendering delay is 5 seconds: www.onely.com/blog/googles-rendering-delay-5-seconds
  • Onely - Google needs 9× more time to crawl JS than HTML: www.onely.com/blog/google-needs-9x-more-time-to-crawl-js-than-html
  • Ahrefs - We tracked 1,885 pages adding schema: ahrefs.com/blog/schema-ai-citations
  • Cloudflare - From Googlebot to GPTBot: who's crawling in 2025: blog.cloudflare.com/from-googlebot-to-gptbot-whos-crawling-your-site-in-2025
  • Semrush - AI Overviews study (prevalence): www.semrush.com/blog/semrush-ai-overviews-study
  • Bing Webmaster Blog - Keeping content discoverable with sitemaps in AI search: blogs.bing.com/webmaster/July-2025/Keeping-Content-Discoverable-with-Sitemaps-in-AI-Powered-Search
  • Momentic - List of top AI search crawlers + user agents: momenticmarketing.com/blog/ai-search-crawlers-bots
  • Search Engine Land - Crawl budget: what you need to know in 2025: searchengineland.com/crawl-budget-what-you-need-to-know-in-2025-448961

FAQ

Frequently asked questions

How is a technical GEO audit different from a normal SEO audit?
A classic SEO audit optimises for Google's crawl-render-index-rank pipeline. A GEO audit adds a second audience - AI answer engines - most of which don't render JavaScript and choose citations rather than ranking blue links. That shifts priority toward raw-HTML content, extractable facts, and entity/authority signals.
Do I need to worry about crawl budget on a 500-page site?
No. Google's own docs say crawl budget only matters for sites over ~1 million weekly-changing pages, or >10,000 daily-changing pages. Sub-500-page sites should focus on content quality, internal linking and rendering instead.
What are the current Core Web Vitals thresholds?
LCP ≤ 2.5s, INP ≤ 200ms, CLS ≤ 0.1, all measured at the 75th percentile of real-user loads. INP replaced FID on March 12, 2024.
Does adding schema markup increase AI citations?
The largest controlled study (Ahrefs, 1,885 pages vs 4,000 controls) found no meaningful uplift - AI Overviews actually dipped slightly. Schema still earns Google rich results and reinforces your entity, so implement it for those reasons, not as a citation lever.
Should I block or allow AI crawlers?
If you want to appear in AI answers, allow the retrieval/search bots (OAI-SearchBot, Claude-SearchBot, PerplexityBot). Blocking them removes you from those engines - see the AI crawler guide for exactly which to allow vs block.
My content loads via JavaScript - is that a problem?
For Google, usually not (it renders JS with a ~5-second median delay). For AI crawlers, yes - GPTBot, ClaudeBot and PerplexityBot don't execute JavaScript, so anything injected client-side is invisible to them. Use server-side rendering (SSR) or static generation (SSG).
Is llms.txt worth implementing?
It's a low-cost experiment at best. Google has said it won't support it and measured AI-crawler usage is negligible - see the llms.txt log test. Prioritise robots.txt, clean HTML and schema first.
How do I get Bing and ChatGPT to see new pages faster?
Enable IndexNow - Bing validates within 24 hours and typically crawls within 1–3 days. Because ChatGPT and Copilot draw on Bing's index, it's the fastest path into those answer engines.

Written by Ritik Namdev

Growth · SEO · GEO

Growth marketer documenting a brand-new site's climb into Google and the AI engines - in public, with real numbers. Every tactic here is tested on real sites before it's published.
