
Research · 2026-05-14

LLM Ranking Factors — what actually moves AI visibility.

The largest public dataset on what drives ChatGPT citations: 145 industries, 1,595 buyer personas, 13 signals, read against four years of our own field work. The headline finding is that the dominant signal swings hard by vertical, so generic ranking-factor advice misses where the work actually is.

TL;DR

  • No single signal dominates. Even the strongest pooled correlation explains under 6% of variance. Anyone selling a universal AI-visibility lever is selling theatre.
  • SERP visibility is the foundation. Search Engine Appearances, Best Rank, and SE Outbound Links cluster strongly in most verticals. The LLM reads what Google has already surfaced.
  • Wikidata is category-dependent. +0.706 in furniture, −0.800 in senior home care. Generic entity advice actively harms 20+ industries.
  • Niche specialists beat generalists. Across 100+ verticals, big-brand sites underperform what their backlink profile would predict. Topical authority does more work than raw size.
  • Homepage keyword matching is noise. ρ=+0.072. Old on-page keyword work does nothing for AI citations.

The dataset covers ChatGPT only. Behaviour across Claude, Perplexity, Gemini, and AI Overviews is measured separately and covered under How we use this, below.

The thirteen signals

OppAlerts correlated ChatGPT recommendations against 13 web signals using Spearman rank correlation, across 1.1B web pages, 5B Reddit posts, and 4B backlinks. The pooled baseline looks like this.

Signal | Spearman ρ | ρ² (variance explained)
---|---|---
Search Engine Appearances | +0.241 | 5.8%
Best Search Engine Rank | +0.238 | 5.7%
SE Outbound Links | +0.230 | 5.3%
Backlink Count | +0.204 | 4.2%
BL Authority | +0.200 | 4.0%
PageRank | +0.194 | 3.7%
Harmonic Centrality | +0.169 | 2.8%
Wikidata (pooled) | +0.120 | 1.4%
Wikipedia Citations (pooled) | +0.077 | 0.6%
Homepage Keywords | +0.072 | 0.5%
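The percentages in the table match ρ², the square of each correlation, which is where the "under 6% of variance" figure comes from. A minimal sketch of computing Spearman's ρ and ρ², assuming the usual convention that tied values receive averaged ranks. The domain data is invented for illustration, and this is not a description of the OppAlerts pipeline.

```python
# Minimal Spearman rank correlation (ties get averaged ranks), stdlib only.
# The domain data at the bottom is invented for illustration; it is not OppAlerts data.


def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n, giving each run of tied values the mean of its positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i..j are 0-based sorted positions
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-domain data for a single vertical.
serp_appearances = [120, 45, 300, 12, 80, 45]
chatgpt_citations = [4, 1, 9, 0, 2, 3]

rho = spearman(serp_appearances, chatgpt_citations)
print(f"rho = {rho:+.3f}, variance explained = {rho ** 2:.1%}")

# The pooled figure for Search Engine Appearances, squared:
print(f"{0.241 ** 2:.1%}")  # 5.8%
```

Working on ranks rather than raw values is what makes the measure robust to the heavy skew in counts like backlinks or SERP appearances.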

Two things are missing from the test list: schema/JSON-LD and llms.txt. They aren't in this dataset because the people who built it didn't expect them to matter. The independent causal evidence agrees. Ahrefs ran 1,885 pages that added JSON-LD against 4,000 matched controls from Aug 2025 to Mar 2026: null effect on ChatGPT and AI Mode, and a small negative effect on Google AI Overviews. Five major LLMs ignored JSON-LD entirely during retrieval. Schema still earns its keep for classic Google rich results, but it does not move AI citations.

Three patterns to calibrate by

Across the 145 industries, three configurations keep showing up. Each one points to a different place to spend.

Pattern A

SERP-dominant

Top signals: Search Engine Appearances, Best SE Rank, SE Outbound Links. Health insurance, wireless carriers, streaming services, food delivery, mortgage lenders — most consumer comparison categories.

Posture: classic SEO is the lever. Internal linking, SERP-feature placement, ranking depth in Google. The LLM reads what Google already surfaces, so the job is to get Google to surface you better.

Pattern B

Entity-dependent

Top signals: Wikidata, Wikipedia Citations, SE Outbound Links. Furniture stores (+0.706 Wikidata), hotels (+0.650), accounting software (+0.515), ERP, retail. The LLM weights structured entity recognition heavily here.

Posture: Wikipedia notability, structured entity disambiguation, sameAs links. Check the sign for the vertical first. Wikidata correlates negatively in 20+ verticals, and recommending it in one of them actively hurts the citation profile.

Pattern C

Authority-dependent

Top signals: PageRank, Backlink Count, Harmonic Centrality. Auto insurance (HC +0.638), hospitals, finance, the regulated trust-critical sectors.

Posture: digital PR, asset-driven link earning, original-research campaigns. In these verticals Harmonic Centrality outperforms raw backlink count (the reverse of the pooled ranking), which suggests position in the link graph carries more weight than volume. The work is earning placement alongside category leaders rather than collecting more domains.
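The difference between link count and link-graph position is easier to see on a toy graph. Harmonic centrality of a site v is the sum of 1/d(u, v) over every other site u, where d(u, v) is the shortest link path from u to v and unreachable sites contribute 0. The sketch below assumes that standard definition; the report doesn't say whether OppAlerts computes it on a host-level or page-level graph, and every site name here is hypothetical.

```python
from collections import deque


def harmonic_centrality(links: dict[str, list[str]]) -> dict[str, float]:
    """H(v) = sum of 1 / d(u, v) over all u != v, where d(u, v) is the number of
    link hops from u to v. Sites that cannot reach v contribute 0."""
    nodes = set(links) | {dst for targets in links.values() for dst in targets}
    # Reverse the edges: a BFS from v over inbound links yields d(u, v) for every u.
    inbound: dict[str, list[str]] = {n: [] for n in nodes}
    for src, targets in links.items():
        for dst in targets:
            inbound[dst].append(src)

    scores: dict[str, float] = {}
    for v in nodes:
        dist = {v: 0}
        queue = deque([v])
        while queue:
            cur = queue.popleft()
            for prev in inbound[cur]:
                if prev not in dist:
                    dist[prev] = dist[cur] + 1
                    queue.append(prev)
        scores[v] = sum(1 / d for u, d in dist.items() if u != v)
    return scores


def linking_sites(links: dict[str, list[str]], site: str) -> int:
    """Raw backlink count, as the number of distinct sites linking to `site`."""
    return sum(1 for targets in links.values() if site in targets)


# Hypothetical toy graph: "niche" is linked by two category leaders that are
# themselves well linked; "spammy" is linked by three isolated blogs.
links = {
    "hub1": ["leader-a"], "hub2": ["leader-a"],
    "hub3": ["leader-b"], "hub4": ["leader-b"],
    "leader-a": ["niche"], "leader-b": ["niche"],
    "blog1": ["spammy"], "blog2": ["spammy"], "blog3": ["spammy"],
}

hc = harmonic_centrality(links)
for site in ("niche", "spammy"):
    print(site, linking_sites(links, site), hc[site])
# niche 2 4.0
# spammy 3 3.0
```

"niche" has fewer linking sites than "spammy" but a higher harmonic centrality, because its two links come from sites that are themselves close to the rest of the graph.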

Where generic advice does the most damage

Wikidata is the most variable signal in the dataset. The same advice produces opposite outcomes depending on the vertical, and that is where most generic AI-visibility recommendations fall apart.

Wikidata positive — invest

  • Furniture stores: +0.706
  • ERP software: +0.655
  • Hotels & resorts: +0.650
  • Accounting software: +0.515
  • Skincare brands: +0.428

Wikidata negative — do not invest

  • Senior home care: −0.800
  • Garage-door services: −0.632
  • Payroll / PEO: −0.587
  • Office furniture: −0.452
  • Homebuilders: −0.442

The working hypothesis on why: in negative-sign verticals, Wikidata entries cluster around aggregators and generalists that LLMs actively deprioritise in favour of vertical specialists. Recommending Wikidata work there does worse than nothing. It pulls the client toward a citation profile the LLM penalises.

How we use this

Five moves we run on every opportunity review where AI visibility is on the scorecard.

  1. Look up the vertical first.

    Before any recommendation, we match the client's industry to the 145 OppAlerts profiles and pull the top three signals for that vertical. Skipping this step is the failure mode.

  2. Lead with SERP work when in doubt.

    Search Engine Appearances is the most consistent signal across the 145 verticals. When a new client has no clean vertical match, this is the default opening move.

  3. Verify the Wikidata sign before recommending entity work.

    This is the most expensive mistake the dataset reveals. We check the sign for the client's industry every time before any entity recommendation goes into the plan.

  4. Treat Harmonic Centrality as its own signal.

    Where you sit in the link graph carries information that raw link count misses, especially in trust-critical verticals like insurance, finance, and healthcare. The work is earning placement alongside category leaders.

  5. Measure across every LLM that matters.

    The OppAlerts dataset only covers ChatGPT. Our own multi-LLM citation test runs across ChatGPT, Claude, and Perplexity at engagement start, and again every quarter, so the trajectory we report is the trajectory the client is actually on.
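The first three moves amount to a small routing rule. A minimal sketch, assuming a hypothetical `PROFILES` lookup of per-vertical correlations; only the three figures quoted in this report are filled in, where a real profile would hold all 13 signals. `plan`, `PROFILES`, and the output fields are illustrative names, not an OppAlerts API, and gating Wikipedia work on the Wikidata sign is a simplifying assumption.

```python
# Hypothetical routing sketch for moves 1-3; names are illustrative, not an OppAlerts API.

SERP_DEFAULT = ["Search Engine Appearances", "Best Search Engine Rank", "SE Outbound Links"]
# Assumption: Wikipedia work is gated on the Wikidata sign too.
ENTITY_SIGNALS = {"Wikidata", "Wikipedia Citations"}

# vertical -> {signal: Spearman rho}. Only figures quoted in this report are filled in;
# a real profile would carry all 13 signals.
PROFILES: dict[str, dict[str, float]] = {
    "furniture stores": {"Wikidata": 0.706},
    "senior home care": {"Wikidata": -0.800},
    "auto insurance": {"Harmonic Centrality": 0.638},
}


def plan(vertical: str) -> dict[str, object]:
    profile = PROFILES.get(vertical.strip().lower())
    if profile is None:
        # Move 2: no clean vertical match, so open with SERP work.
        return {"focus": SERP_DEFAULT, "entity_work": None}

    # Move 1: the vertical's top three signals by correlation.
    top = sorted(profile, key=lambda s: profile[s], reverse=True)[:3]

    # Move 3: entity work only where the Wikidata sign is known and positive.
    wikidata = profile.get("Wikidata")
    entity_work = None if wikidata is None else wikidata > 0

    # Keep positively correlated signals, dropping entity signals unless cleared.
    focus = [s for s in top if profile[s] > 0 and (s not in ENTITY_SIGNALS or entity_work)]
    return {"focus": focus or SERP_DEFAULT, "entity_work": entity_work}


for v in ("Furniture stores", "Senior home care", "Auto insurance", "Pet grooming"):
    print(v, plan(v))
```

Returning `None` for `entity_work` when the Wikidata sign is unknown is deliberate: the sketch never clears entity work it hasn't been able to check, and it falls back to SERP work whenever nothing positive survives the filter.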

What we won't sell you

Three popular AI-visibility tactics fail under evidence. We leave them off every proposal.

Schema / JSON-LD as an AI-visibility lever

Ahrefs ran a causal study from Aug 2025 to Mar 2026: 1,885 pages adding JSON-LD against 4,000 matched controls. Null effect on ChatGPT, null on Google AI Mode, and a small negative effect on AI Overviews. Five major LLMs extracted only visible HTML during retrieval. Schema still earns its keep for classic Google rich results, but it does nothing for AI citations.
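The retrieval finding is easier to picture with a toy extractor. JSON-LD sits inside a `<script type="application/ld+json">` element, which browsers don't render, so any pipeline that keeps only visible text discards it. The sketch below uses Python's standard `html.parser` purely as an illustration; it is not how any particular LLM's retriever is implemented, and the page is made up.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects rendered text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: the JSON-LD block names the organisation; the body does not.
page = """<html><head>
<script type="application/ld+json">{"@type": "Organization", "name": "Acme Furniture"}</script>
</head><body><h1>Solid oak dining tables</h1><p>Handmade in Leeds.</p></body></html>"""

print(visible_text(page))  # Solid oak dining tables Handmade in Leeds.
```

Everything the markup says about the organisation is gone from the extracted text; only what the body shows survives.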

llms.txt

No major AI crawler honours it. There is no measurable effect on citations, indexing, or training inclusion, and it is not an emerging standard. The time it takes to implement is better spent on visible HTML and topical authority.

Homepage keyword matching as a primary tactic

Cross-industry ρ=+0.072 makes it the least-correlated signal in the dataset. Old on-page keyword work does nothing for AI visibility.

Want to see where your domain actually stands?

Every opportunity review includes a multi-LLM citation test across ChatGPT, Claude, and Perplexity, scored against the OppAlerts baseline for your industry. You see where you are getting cited, where competitors are getting cited in your place, and what to change first.
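How the review scores this isn't spelled out here. As a rough illustration of the two outputs described (where you're cited, and who is cited in your place), a hypothetical tally, assuming each LLM answer has already been reduced to the list of domains it cites. The function name, domains, and answers are all invented.

```python
from collections import Counter


def citation_report(runs: dict[str, list[list[str]]], client: str) -> dict[str, dict]:
    """For each LLM, the share of prompts whose answer cites `client`, plus the
    domains most often cited on prompts where `client` was missing."""
    report: dict[str, dict] = {}
    for llm, answers in runs.items():
        hits = sum(1 for cited in answers if client in cited)
        displaced_by = Counter(
            domain for cited in answers if client not in cited for domain in cited
        )
        report[llm] = {
            "share": hits / len(answers) if answers else 0.0,
            "displaced_by": displaced_by.most_common(3),
        }
    return report


# Hypothetical test run: one list of cited domains per buyer prompt, per LLM.
runs = {
    "ChatGPT": [["acme.example", "rival.example"], ["rival.example"],
                ["big-box.example", "rival.example"], ["acme.example"]],
    "Claude": [["rival.example"], ["acme.example"],
               ["rival.example", "big-box.example"], ["rival.example"]],
    "Perplexity": [["acme.example"], ["acme.example", "big-box.example"],
                   ["big-box.example"], ["acme.example"]],
}

for llm, row in citation_report(runs, "acme.example").items():
    print(f"{llm}: cited on {row['share']:.0%} of prompts, displaced by {row['displaced_by']}")
```

Re-running the same prompt set each quarter and comparing the shares is one simple way to turn this into a trajectory.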

Request an opportunity review →