What the scan checks
Six signals that actually move AI citations.
Six checks with a direct causal line to whether ChatGPT, Claude, and Perplexity mention you in their answers. No bloated checklist, just the signals that move citations.
Signal 01
AI Crawler Access
What: Your robots.txt plus a live request with GPTBot's user-agent to confirm your server actually lets the bot through.
Why: Cloudflare has blocked AI crawlers by default since July 2025. Plenty of sites are invisible to ChatGPT and don't know it.
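A minimal sketch of the robots.txt half of this check, using Python's standard robot-file parser. The function name and sample rules are illustrative; the live-request half (not shown, since it needs network access) simply replays the fetch with GPTBot's user-agent and treats a 403 as a block, because a WAF like Cloudflare can refuse the bot even when robots.txt says Allow.

```python
import urllib.robotparser

def gptbot_allowed(robots_txt: str, url: str) -> bool:
    """Parse a site's robots.txt text and ask whether GPTBot may fetch url.
    Note: a permissive robots.txt is necessary but not sufficient; the
    scanner also does a live request with GPTBot's UA to catch WAF blocks."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch("GPTBot", url)
```

For example, `User-agent: GPTBot` followed by `Disallow: /` makes the check fail, while a blanket `Allow: /` passes it.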
Signal 02
Organization Schema & sameAs
What: Organization schema with a sameAs array linking to Wikipedia, LinkedIn, or similar authoritative profiles. If the scanned page does not have it, we also check the homepage (where it usually lives).
Why: LLMs disambiguate your brand by cross-referencing sameAs links. Without them, you are one of many companies with your name.
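A sketch of how this check can extract sameAs links from a page's JSON-LD blocks. The helper names are hypothetical; real pages often nest nodes inside `@graph`, which the walker below handles.

```python
import json

def _nodes(data):
    """Yield every dict node in a JSON-LD payload, descending into @graph."""
    if isinstance(data, list):
        for item in data:
            yield from _nodes(item)
    elif isinstance(data, dict):
        yield data
        yield from _nodes(data.get("@graph", []))

def find_sameas(jsonld_blocks):
    """Return all sameAs URLs attached to Organization nodes."""
    links = []
    for block in jsonld_blocks:
        for node in _nodes(json.loads(block)):
            if node.get("@type") == "Organization":
                links.extend(node.get("sameAs", []))
    return links
```

An empty result on both the scanned page and the homepage is what the scan flags.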
Signal 03
Opening Quotability
What: The page's H1 length, hero paragraph length, brand presence, boilerplate patterns, and — via TextRazor — entity density and subjectivity.
Why: LLMs quote the first 40–60 words of a page more than any other section. This signal checks whether those words earn the quote.
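The local part of this check (the entity-density and subjectivity pieces go through TextRazor's API and are omitted here) can be sketched as a few heuristics over the H1 plus hero text. The boilerplate phrases and the 60-word window are illustrative assumptions, not the scanner's exact thresholds:

```python
BOILERPLATE = ("welcome to", "we are passionate", "leading provider")

def opening_report(h1: str, hero: str, brand: str) -> dict:
    """Heuristic report on a page opening: length, the window LLMs tend to
    quote, brand mention, and stock boilerplate phrases."""
    opening = f"{h1} {hero}"
    words = opening.split()
    lowered = opening.lower()
    return {
        "word_count": len(words),
        "quote_window": " ".join(words[:60]),   # the first ~40-60 words
        "brand_present": brand.lower() in lowered,
        "boilerplate": [p for p in BOILERPLATE if p in lowered],
    }
```

A strong opening keeps the brand name and a concrete claim inside that quote window instead of a generic welcome line.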
Signal 04
Structured Data Coverage
What: Every JSON-LD @type on the scanned page. We flag whether you have the citation-driving types (FAQPage, Article, Product, HowTo).
Why: FAQPage markup alone lifts citation rates measurably. Baseline Organization schema is not enough on its own.
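A sketch of the coverage check: collect every `@type` from the page's JSON-LD blocks and intersect with the citation-driving set named above. Function names are illustrative.

```python
import json

CITATION_TYPES = {"FAQPage", "Article", "Product", "HowTo"}

def type_coverage(jsonld_blocks):
    """Return all @type values found and which citation-driving ones they hit."""
    found = set()
    for block in jsonld_blocks:
        data = json.loads(block)
        for node in (data if isinstance(data, list) else [data]):
            t = node.get("@type")
            if t:
                # @type may be a single string or a list of strings
                found.update(t if isinstance(t, list) else [t])
    return {"types": found, "citation_driving": found & CITATION_TYPES}
```

A page that only reports Organization in `types` and an empty `citation_driving` set is exactly the "baseline is not enough" case.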
Signal 05
Technical Integrity
What: HTTPS, Strict-Transport-Security header, canonical tag, meta robots (checking for accidental noindex), and whether sitemap.xml exists at the site root.
Why: One accidental noindex or a cross-domain canonical silently kills discoverability. These are the "check the basics before debugging anything fancy" signals.
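Most of these basics can be flagged from the response URL, headers, and HTML alone (the sitemap.xml probe needs a network request and is omitted). The regexes below assume typical attribute order and are a sketch, not a full HTML parser:

```python
import re
from urllib.parse import urlparse

def integrity_flags(url: str, headers: dict, html: str) -> list:
    """Flag the basics: HTTPS, HSTS, accidental noindex, cross-domain canonical."""
    flags = []
    if not url.startswith("https://"):
        flags.append("no-https")
    if "strict-transport-security" not in {k.lower() for k in headers}:
        flags.append("missing-hsts")
    m = re.search(r'<meta[^>]+name=["\']robots["\'][^>]*content=["\']([^"\']*)', html, re.I)
    if m and "noindex" in m.group(1).lower():
        flags.append("noindex")
    c = re.search(r'<link[^>]+rel=["\']canonical["\'][^>]*href=["\']([^"\']*)', html, re.I)
    if c and urlparse(c.group(1)).netloc not in ("", urlparse(url).netloc):
        flags.append("cross-domain-canonical")
    return flags
```

Either of the last two flags is the silent killer: the page loads fine in a browser while telling crawlers to ignore it or credit another domain.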
Signal 06
Core Web Vitals
What: LCP, INP, and CLS for the scanned URL on mobile, pulled from Google's CrUX field data via PageSpeed Insights (real users, not synthetic lab tests).
Why: Slow pages get fewer citations. Google's threshold is not marketing — AI Overviews filter by it.
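A sketch of reading the three vitals out of a PageSpeed Insights v5 response. The metric keys follow the documented `loadingExperience` payload (CrUX field data); the function name is illustrative, and the network call itself is omitted.

```python
def crux_summary(psi_response: dict) -> dict:
    """Pull LCP, INP, and CLS p75 values and ratings from a PSI v5 response.
    These are real-user percentiles from CrUX, not lab numbers."""
    metrics = psi_response.get("loadingExperience", {}).get("metrics", {})
    wanted = {
        "LCP": "LARGEST_CONTENTFUL_PAINT_MS",
        "INP": "INTERACTION_TO_NEXT_PAINT",
        "CLS": "CUMULATIVE_LAYOUT_SHIFT_SCORE",
    }
    return {
        name: {"p75": metrics.get(key, {}).get("percentile"),
               "rating": metrics.get(key, {}).get("category")}
        for name, key in wanted.items()
    }
```

A URL with too little traffic has no CrUX entry at all, in which case every `p75` comes back `None` and the scan falls back accordingly.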