Gist GEO

Where Does ChatGPT Get Its Information? The 2026 Citation Stack Explained

THE GIST

ChatGPT pulls answers from three retrieval layers, and each one rewards different authority signals. Parametric memory holds the patterns frozen during training, and rewards cross-citation density and topical specialization. The OAI-SearchBot index is what ChatGPT Search queries when retrieval is needed, and rewards structural clarity and indexable content. ChatGPT-User is the on-demand fetch, and rewards recency and crawlable HTML. Authority for AI search is emergent across signals, never granted by any single one. Run a Gist GEO audit to see your Share of voice, Share of citations, Earned media score, and Citation rate across each of the three layers.

Down the Rabbit Hole

When Alice fell into Wonderland, she wondered where everything came from. The same curiosity applies here. You ask ChatGPT a question, an answer arrives, and somewhere behind a mysterious curtain three very different rooms are pulling threads and weaving them together to make something magical.

One room is quite dim and old. A Caterpillar lives there, smoking a hookah, repeating marvelous patterns it learned long ago. Another is brightly lit and tidy, the kind of striking place that runs on schema and well-set tables. The third has an astonishing mirror you can step through to fetch a single page, right now, just for you.

This is the journey to finding (and getting found in) all three.

How ChatGPT Actually Sources Its Wondrous Answers (The Three-Layer Stack)

ChatGPT answers from three retrieval layers, each documented in OpenAI's own crawler docs. Layer 1 is parametric memory: the patterns frozen during training (fed historically by GPTBot). Layer 2 is OpenAI's search index, queried by ChatGPT Search and built by OAI-SearchBot. Layer 3 is on-demand fetch via ChatGPT-User, fired when a user references a URL or ChatGPT decides to retrieve a specific page. Three rooms, then. The Caterpillar's mysterious haze (parametric memory), the well-set and delightful tea table (the OAI-SearchBot index), and the looking-glass fetch (ChatGPT-User). One perfectly blended brew, three striking pots. Per Contently, April 2026, about 60% of ChatGPT responses pour from parametric memory and the remaining 40% involve real-time retrieval (Layers 2 and 3 combined). The point of generative engine optimization (GEO) is to be safely at home in all three rooms, not pleading at a single door.

The Spellbinding Sources of Authority ChatGPT Actually Cares About

ChatGPT does not apply a single "authority score" the way Google's PageRank does. Authority emerges from a handful of signals AI engines weigh during training and retrieval: cross-citation density, topical specialization, recency, structural clarity, and named human authorship (E-E-A-T). Each of these signals maps directly to a Gist GEO Reports metric: Earned media score, Share of voice, Citation rate, Share of found links, and Sentiment.

Wonderland never runs on one ranking, of course. The Queen's Court awards favor by five different rituals, and so does the model.

What does NOT make a source authoritative in this fantastical realm: domain age alone, raw backlink counts alone, or self-promotional brand pages. Alas, Wonderland rules differ from the boring old kingdom of SEO. Is ChatGPT a reliable source? Sometimes, depending on which room answered. For the deeper measurement view, see Goldilocks and the Three Measurement Frameworks.

How to Conjure Authority for Each Layer (4 Tactics)

Match the authority signal to the layer. Earn cross-citations on the domains AI already trusts to land in parametric memory. Allow OAI-SearchBot and structure for retrieval to land in the search index. Refresh and allow ChatGPT-User to land in on-demand fetch. Track Share of citations to verify which layer is moving.

1. Earn cross-citations on authoritative domains (parametric layer)

The clever Caterpillar listens for who else has spoken about you. Ahrefs' May 2025 study of 75,000 brands found that brand web mentions correlate 0.664 with AI Overview brand visibility, while backlinks correlate 0.218. The correlation for mentions is roughly 3x stronger.
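For readers who like to check the Caterpillar's sums, here is the arithmetic behind "roughly 3x" as a tiny sketch. The two constants are the figures quoted above; the function name is only illustrative.

```python
# Correlation figures quoted above (Ahrefs, May 2025 study of 75,000 brands).
MENTIONS_R = 0.664
BACKLINKS_R = 0.218


def strength_ratio(a: float, b: float) -> float:
    """Return how many times larger correlation `a` is than correlation `b`."""
    return a / b


ratio = strength_ratio(MENTIONS_R, BACKLINKS_R)
print(f"mentions vs backlinks: {ratio:.2f}x")  # prints "mentions vs backlinks: 3.05x"
```

A ratio of correlations is a rough comparison of signal strength, not a statement about cause.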

Tactic: identify the third-party domains AI cites in your category, then aim PR and guest content at those exact addresses. The list of domains AI already trusts matters more than DR alone. (Tracked via Earned media score and Share of voice.) For a sister angle, see How to Improve Brand Visibility in AI Search Engines.

2. Allow OAI-SearchBot and structure for retrieval (search-index layer)

Per OpenAI's crawler docs: "Sites that are opted out of OAI-SearchBot will not be shown in ChatGPT search answers." Skip this and poof! Layer 2 forgets you exist.

Tactic: confirm OAI-SearchBot is allowed in robots.txt, and Bingbot too (OpenAI's index has historically pulled signal from Bing). Use clear H1, H2, H3 hierarchy. Add FAQ schema. Submit your sitemap. The tea table seats only those who came in tidy. (Tracked via Share of citations and Share of found links.) For more on the extraction craft, see Answer Engine Optimization (the Full Picture).
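"Add FAQ schema" can be sketched in a few lines. The example below builds a schema.org FAQPage object and wraps it in a JSON-LD script tag; the helper names `faq_jsonld` and `as_script_tag` are made up for illustration, and the sample question is borrowed from this article's own FAQ. Treat it as a minimal starting point, not a guarantee of how any engine reads the markup.

```python
import json


def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def as_script_tag(data):
    """Wrap the object in the script tag that carries JSON-LD in a page."""
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


faq = faq_jsonld([
    (
        "Does ChatGPT use Bing or Google?",
        "ChatGPT primarily uses OpenAI's own search index, built by OAI-SearchBot.",
    ),
])
print(as_script_tag(faq))
```

The questions and answers in the markup should match the visible text on the page word for word.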

3. Refresh and allow ChatGPT-User (on-demand fetch layer)

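Layer 3 only steps through the looking-glass if the door is open. Per The Register, December 2025, 5.6 million sites have added GPTBot to their disallow list, up from 3.3 million in July 2025. A GPTBot rule on its own only closes Layer 1, since OpenAI's crawler docs treat each user agent's setting independently. But if yours arrived as part of a blanket AI-bot block, ChatGPT-User is likely shut out too, and Layer 3 cannot see you either. How awful!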

Bots to allow: GPTBot, OAI-SearchBot, ChatGPT-User, OAI-AdsBot, PerplexityBot, Google-Extended, ClaudeBot, and ProRata-AI-Crawler (Gist's crawler docs).
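A quick way to see which of these bots your robots.txt turns away is Python's standard-library robots parser. This is a sketch under assumptions: `blocked_agents` is a hypothetical helper, the sample rules and the example.com URL are invented, and Python's parser only approximates how each real crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The user agents named in the list above.
AI_AGENTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "OAI-AdsBot",
    "PerplexityBot", "Google-Extended", "ClaudeBot", "ProRata-AI-Crawler",
]


def blocked_agents(robots_txt: str, url: str, agents=AI_AGENTS):
    """Return the agents that this robots.txt text forbids from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Invented sample: GPTBot is disallowed, everyone else is allowed.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_agents(SAMPLE, "https://example.com/pricing"))  # prints ['GPTBot']
```

To check a live site, paste the contents of your own robots.txt into the first argument and pass one of your real page URLs as the second.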

But wait. A second trapdoor: Layer 3 stumbles on JS-heavy pages. If your answer renders client-side, the crawler reaches the room and finds it empty. (Tracked via Citation rate.)
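One rough test for this trapdoor: look for your key sentence in the raw HTML, with scripts stripped out and nothing executed. The sketch below does that with the standard library. `phrase_in_static_html` is a hypothetical helper and both sample pages are invented; it also takes the claim above at face value, namely that the fetcher sees only what the server sends.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text that is present in the markup without running any JavaScript."""

    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def phrase_in_static_html(html: str, phrase: str) -> bool:
    """True if `phrase` appears in the page text before any script has run."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split()).lower()
    return " ".join(phrase.split()).lower() in text


server_rendered = "<html><body><h1>Pricing</h1><p>Plans start at $10 per month.</p></body></html>"
client_rendered = (
    '<html><body><div id="root"></div>'
    '<script>render("Plans start at $10 per month.")</script></body></html>'
)

print(phrase_in_static_html(server_rendered, "Plans start at $10"))  # prints True
print(phrase_in_static_html(client_rendered, "Plans start at $10"))  # prints False
```

If the second case describes your page, the room is empty when the crawler arrives, and server-side rendering or prerendering is the usual remedy.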

4. Measure with Share of citations (Gist GEO)

Rising Share of citations means your site is landing in Layers 2 and 3, and earning the measly breadcrumb the model leaves behind, not relying solely on Layer 1's frozen memories.

The broader product family in one line: Gist GEO for measurement, Gist Answers for the publisher-side AI answer surface, Gist Ads for AI-native ad placements.

Run a Gist GEO audit and discover your baseline across the three layers.

How to Amazingly Audit Your Citation Stack

Audit each layer separately, against its matching Gist GEO metric. Layer 1 (parametric): track Share of voice with Search disabled. Layer 2 (search index): track Share of citations and Share of found links on prompts that trigger Search. Layer 3 (on-demand): track Citation rate on pages ChatGPT-User has fetched. Together they show which layer is moving and which is stuck.

Audit Layer 1 (parametric memory)

Take your top 20 category prompts and run them in ChatGPT with Search disabled. Log whether your brand appears, whether it appears favorably, and which competing brands appear with you. That is your Share of voice baseline for the Caterpillar's room.

Then run the same prompts with Search enabled. The delta between the two is what Layers 2 and 3 are adding (or failing to add).
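The log and the delta can be kept in a spreadsheet, or in a sketch like the one below. It assumes a simple working definition, the fraction of prompts whose answer names your brand, which may differ from how Gist GEO computes Share of voice. The `Run` record, the brand names, and the prompts are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Run:
    prompt: str
    search_enabled: bool
    brands: list[str]  # brands named in the answer, logged by hand


def share_of_voice(runs, brand, search_enabled):
    """Fraction of runs in the given mode whose answer names `brand` (assumed definition)."""
    subset = [r for r in runs if r.search_enabled == search_enabled]
    if not subset:
        return 0.0
    return sum(1 for r in subset if brand in r.brands) / len(subset)


def retrieval_delta(runs, brand):
    """Share of voice with Search enabled minus share of voice with Search disabled."""
    return share_of_voice(runs, brand, True) - share_of_voice(runs, brand, False)


runs = [
    Run("best tea sets", False, ["Acme", "Beta"]),
    Run("tea set for six", False, ["Beta"]),
    Run("best tea sets", True, ["Acme"]),
    Run("tea set for six", True, ["Acme", "Beta"]),
]

print(share_of_voice(runs, "Acme", False))  # prints 0.5
print(share_of_voice(runs, "Acme", True))   # prints 1.0
print(retrieval_delta(runs, "Acme"))        # prints 0.5
```

A positive delta suggests Layers 2 and 3 are adding visibility that the Caterpillar's room lacks, and a negative one suggests the reverse.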

Audit Layer 2 (OAI-SearchBot index)

Confirm OAI-SearchBot is allowed in robots.txt. Confirm Bingbot is allowed. Run your top prompts in Bing and check whether your domain sits in the top 3.

A common pitfall: the site allows OAI-SearchBot, but the page lacks schema, clean H1/H2/H3 structure, or a submitted sitemap. Rising Share of found links without rising Share of citations means AI is finding your pages and choosing not to link back yet. The room is set; the dish needs work.

Audit Layer 3 (ChatGPT-User on-demand fetch)

Test the looking-glass directly: paste your URL into ChatGPT and ask for a summary. If the answer is "I cannot access that URL" or generic text, ChatGPT-User is being blocked, or the page cannot be crawled. Repeat for your top 10 pages by traffic. The pages that fail are the ones dragging down Citation rate.

Run a Gist GEO audit to track all 9 metrics across the three layers automatically.

Where Most Brands Get This Frightfully Wrong

Three common mistakes keep brands out of one layer or another. They optimize for backlinks instead of cross-citations (the parametric layer weighs mentions far more heavily than backlinks). They block GPTBot or ChatGPT-User accidentally and never realize it (Layers 1 and 3 go silently invisible). They leave pages unrefreshed for more than 90 days (the on-demand layer skips stale pages). All three are fixable in a sprint.

1. Optimizing for backlinks instead of mentions. Old-kingdom thinking. The Caterpillar listens for the chorus that talks about you across Wonderland, and Earned media score stays flat when you stockpile backlinks alone.

2. Accidentally blocking GPTBot or ChatGPT-User. A line you never noticed in robots.txt, and Layers 1 and 3 stop seeing you altogether. You're invisible again! Share of citations and Share of found links flatline, and most teams find the line in the first hour of an audit. The fix is a few lines of robots.txt.

3. Publishing without a refresh cadence. The on-demand layer prefers fresh portraits. Pages older than 90 days fade off the wall like a painting in the Queen's court, and Citation rate drifts down for months before anyone notices. For the deeper measurement view, see Goldilocks and the Three Measurement Frameworks.
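For the third mistake, a sitemap can double as a refresh calendar. The sketch below lists URLs whose lastmod is more than 90 days old. `stale_urls` is a hypothetical helper, the sitemap and dates are invented, and it assumes each lastmod begins with a full YYYY-MM-DD date. Entries with no lastmod come back with `None`, so they get a manual look.

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def stale_urls(sitemap_xml: str, today: date, max_age_days: int = 90):
    """Return (url, last_modified) pairs older than the cutoff, or with no lastmod at all."""
    root = ET.fromstring(sitemap_xml)
    cutoff = today - timedelta(days=max_age_days)
    stale = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS).strip()
        if not lastmod:
            stale.append((loc, None))  # unknown age: flag for review
            continue
        modified = date.fromisoformat(lastmod[:10])  # assumes a leading YYYY-MM-DD
        if modified < cutoff:
            stale.append((loc, modified))
    return stale


SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a</loc><lastmod>2026-05-20</lastmod></url>
  <url><loc>https://example.com/b</loc><lastmod>2026-01-10T08:00:00+00:00</lastmod></url>
  <url><loc>https://example.com/c</loc></url>
</urlset>
"""

for loc, modified in stale_urls(SITEMAP, today=date(2026, 6, 1)):
    print(loc, modified)
```

The list is only as honest as the lastmod values, so update them when the content changes, not on every deploy.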

FAQ

Is ChatGPT a reliable source?

Depends on the layer. Parametric answers are reliable for general knowledge but inconsistent for brand recommendations: SparkToro 2026 found a less than 1-in-100 chance of getting identical brand lists across 100 runs of the same prompt. Search-index and on-demand answers are as reliable as the source ChatGPT cites, and far less prone to fabricated citations than parametric-only mode.

Can ChatGPT cite its sources?

Yes, when in Search or browse mode. ChatGPT Search and the "Search the web" toggle force the OAI-SearchBot path, which surfaces real URLs. Pure parametric answers can't cite reliably, and they sometimes invent a citation that sounds correct but does not exist. If sourcing matters, force a search-enabled mode and verify each link before using it.

Does ChatGPT make up sources?

In parametric mode, sometimes. The model is pattern-completing rather than retrieving, so a plausible-looking citation may be fabricated. Layers 2 and 3 hallucinate far less, since the answer is anchored to a real fetch. To reduce invented citations, force ChatGPT into a search-enabled mode, or use Gist GEO to audit which prompts trigger which layer.

Does ChatGPT use Bing or Google?

ChatGPT primarily uses OpenAI's own search index, built by OAI-SearchBot. The index has historically pulled signal from Bing's results, meaning a strong Bing ranking gives you a better shot at appearing in ChatGPT Search answers. Google's index is not directly used. Allow OAI-SearchBot and Bingbot in your robots.txt to maximize Layer 2 retrieval coverage.

How can I get my brand mentioned more often in ChatGPT?

Build authority signals across all three retrieval layers. Earn cross-citations on the third-party domains AI already trusts (moves Earned media score). Allow OAI-SearchBot and structure content for retrieval (moves Share of citations and Share of found links). Refresh your top pages and confirm ChatGPT-User can crawl them (moves Citation rate). Run a Gist GEO audit to see which signal is moving and which is flat.
