A few months ago I audited how AI tools describe businesses. I picked companies at random, searched for them in ChatGPT and Perplexity, and wrote down what came back. The results were bad. Wrong pricing. Phantom features. One SaaS product described as a meal kit subscription. These weren't obscure companies. They had real sites, real products, real customers. The AI tools just had no accurate picture of them.
If you're launching something, this is worth paying attention to. More and more people are using AI tools to research products before buying. And if AI can't find you, or gets you wrong, that's a problem with no obvious fix.
This article explains what actually determines whether AI tools cite your site, and what you can do about it from day one.
AI tools don't work like Google. Google crawls your site, indexes your pages, and ranks you based on backlinks and content quality. AI tools pull from a mix of sources: indexed web content, structured data, third-party directories, and in some cases direct API crawls.
The overlap between what Google ranks and what AI cites is only partial. A 16-month BrightEdge study found that AI Overview citations overlapped with organic Google rankings for only 54% of results. [1] A separate Ahrefs analysis found that 80% of LLM citations don't rank in Google's top 100 for the original query. [2]
This cuts both ways. You don't need to dominate Google to get cited by AI. But you do need to give AI systems something to work with.
Research from Princeton University published at KDD 2024 studied 10,000 queries across 9 source types and identified factors that affect whether AI systems cite a page. [3] The most actionable findings:
Statistics and quotations matter. Pages that include data-backed statistics are cited by AI at measurably higher rates than pages without them. Original quotations increase citation probability further. [3]
Brand search volume is the strongest predictor. A 2025 analysis of 680 million citations found that brand search volume correlated with LLM citations more strongly than traditional backlinks. [3] If people are searching for your brand name, AI is more likely to know about you.
Being on multiple platforms multiplies your chances. Brands mentioned on 4 or more platforms are 2.8 times more likely to appear in ChatGPT responses. [3] Product directories, startup listings, GitHub, LinkedIn, Medium posts all count.
Different AI tools cite different sources. Only 11% of domains are cited by both ChatGPT and Perplexity. [3] What works for one doesn't automatically work for the other.
In March 2025, a Principal Product Manager at Microsoft stated at SMX Munich that "schema markup helps Microsoft's LLMs understand your content." [4] This followed years of Google recommending structured data for AI Overviews. Both major platforms have now said so publicly.
Structured data is code you add to your HTML that tells machines what your content means. It is usually written in JSON-LD, the format Google recommends and the AI ecosystem has aligned around, using the schema.org vocabulary.
An AccuraCast study analyzed over 2,000 prompts across ChatGPT, Google AI Overviews, and Perplexity and found that 81% of pages that received AI citations included schema markup. [5] BrightEdge analysis found pages with structured data are up to 40% more likely to appear in AI summary positions. [5]
The schema types that matter most for a new startup:
Organization schema tells AI who you are. Your business name, what you do, and links to your profiles on LinkedIn, GitHub, Crunchbase. This builds entity recognition across platforms.
FAQPage schema is the single highest-ROI schema type for AI citations. FAQPage-tagged content is 3.2 times more likely to appear in Google AI Overviews. [4] AI tools are designed to extract Q&A content, and structured FAQ markup makes that trivial.
Article schema on every blog post. Include author, publish date, and headline at minimum.
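As a rough illustration of the three schema types above, here is a minimal Python sketch that builds each one as JSON-LD and wraps it in the script tag that goes in your page's HTML. The function names, the example company, and the example.com URLs are all hypothetical. The property names (name, url, sameAs, mainEntity, acceptedAnswer, headline, author, datePublished) come from the schema.org vocabulary, but treat this as a starting point rather than a complete markup for your site.

```python
import json


def organization_schema(name, url, description, profile_urls):
    """Organization: who you are, plus links to your profiles elsewhere."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "description": description,
        "sameAs": list(profile_urls),  # LinkedIn, GitHub, Crunchbase, ...
    }


def faq_schema(pairs):
    """FAQPage: one Question/Answer entry per (question, answer) pair."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def article_schema(headline, author_name, date_published):
    """Article: the minimum fields named above (headline, author, date)."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,  # ISO 8601 date, e.g. "2026-01-15"
    }


def to_script_tag(schema):
    """Serialize a schema dict into a JSON-LD script tag for the page head."""
    # Escape "</" so text inside the JSON cannot close the script tag early.
    payload = json.dumps(schema, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    # Hypothetical company used purely for illustration.
    org = organization_schema(
        "Example Co",
        "https://example.com",
        "Invoicing software for freelancers.",
        ["https://www.linkedin.com/company/example-co"],
    )
    print(to_script_tag(org))
```

Generating the markup from one set of values, instead of hand-editing it per page, also makes it easier to keep your name and description identical everywhere they appear.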
You can validate your schema with Google's free Rich Results Test at search.google.com/test/rich-results.
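Before pasting a page into the Rich Results Test, a quick local check can catch the most basic mistakes. The sketch below uses only Python's standard library to pull every JSON-LD block out of an HTML string and confirm that it parses and declares an @type. The class and function names are hypothetical, and this is not a substitute for Google's validator: it does not check required properties and it ignores nested @graph structures.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def check_jsonld(html):
    """Return (types found, list of problems) for the JSON-LD in a page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()

    found_types, problems = [], []
    for index, raw in enumerate(parser.blocks):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append(f"block {index}: invalid JSON ({exc.msg})")
            continue
        # A block may hold a single object or a list of objects.
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and "@type" in item:
                found_types.append(item["@type"])
            else:
                problems.append(f"block {index}: missing @type")
    return found_types, problems
```

Calling check_jsonld on a page's HTML returns the schema types it found, which makes it easy to confirm that Organization, FAQPage, and Article are each present where you expect them.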
llms.txt is a Markdown-formatted text file you place at the root of your domain (at /llms.txt) that describes your site to AI tools, much as robots.txt gives instructions to search crawlers. It was proposed by Jeremy Howard of Answer.AI in September 2024. [6]
Here's what you need to know before implementing it: as of early 2026, no major AI platform has officially confirmed that it reads llms.txt files. One server log analysis found that GPTBot, ClaudeBot, PerplexityBot, and Google's AI crawler made zero visits to the llms.txt file across a three-month observation period in 2025. [7]
That said, major companies including Anthropic, Cloudflare, and Stripe have implemented it. Google included it in their Agent2Agent (A2A) protocol. [6]
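You can run the same kind of check against your own server logs. The following is a minimal sketch that assumes your web server writes the common "combined" access log format. It counts requests for /llms.txt whose user-agent string contains one of three crawler names mentioned above. The function name and the token list are illustrative assumptions, and user-agent strings can be spoofed, so treat the counts as a rough signal.

```python
import re
from collections import Counter

# Substrings looked for in the user-agent header (an illustrative list).
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Combined log format, from the request onward:
# "METHOD path protocol" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_llms_txt_hits(lines, path="/llms.txt"):
    """Count requests for `path` per AI crawler token, including zeros."""
    hits = Counter({token: 0 for token in AI_CRAWLER_TOKENS})
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # line is not in the expected format
        # Ignore any query string when comparing the requested path.
        if match.group("path").split("?")[0] != path:
            continue
        agent = match.group("agent").lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in agent:
                hits[token] += 1
    return dict(hits)
```

To use it, pass an open log file (for example, `count_llms_txt_hits(open("access.log"))`, where the file name is a placeholder for wherever your server writes its logs).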
The honest case for implementing it: it takes under an hour, there's no downside if platforms eventually adopt the standard, and it signals to anyone checking your site that you're thinking about AI readiness. If you have limited time, structured data and third-party presence matter more right now.
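If you do decide to add the file, the layout in the llms.txt proposal is simple: a top-level heading with the site name, a short blockquote summary, then sections of Markdown links with optional notes. Here is a small sketch that renders that layout from plain Python data. The function name, the section titles, and the example.com URLs are hypothetical placeholders.

```python
def build_llms_txt(name, summary, sections):
    """Render an llms.txt body.

    `sections` maps a section title to a list of (label, url, note) tuples;
    `note` may be an empty string.
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for title, links in sections.items():
        lines.append(f"## {title}")
        lines.append("")
        for label, url, note in links:
            entry = f"- [{label}]({url})"
            if note:
                entry += f": {note}"
            lines.append(entry)
        lines.append("")
    # End the file with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Hypothetical company and pages, for illustration only.
    text = build_llms_txt(
        "Example Co",
        "Invoicing software for freelancers.",
        {
            "Docs": [
                ("Pricing", "https://example.com/pricing", "Current plans"),
                ("FAQ", "https://example.com/faq", ""),
            ]
        },
    )
    print(text)
```

Save the output as llms.txt and serve it from the root of your domain.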
The Surfer AI Tracker analyzed 36 million AI Overviews between March and August 2025 and found a consistent pattern: AI trusts institutional authority and community content. [8] Wikipedia, Reddit, YouTube, LinkedIn, and niche directories dominate citations across categories.
For a new startup, you can't get a Wikipedia page on day one. But you can:
Get listed on startup directories. Product Hunt, Indie Hackers, BetaList, AlternativeTo, There's An AI For That, G2. Each listing is a named reference to your brand from a domain that AI systems already cite regularly.
Publish on platforms AI already cites. Medium and Substack both appear in AI citation data. [9] Posts there link back to your domain and create a consistent trail of information about your product.
Answer questions in your niche on Reddit. Reddit is among the most cited sources across ChatGPT, Perplexity, and Google AI Overviews. [8] A genuine, helpful answer that mentions what you're building creates a citation-worthy reference.
Keep your information consistent across platforms. If your pricing is $49/month on your site, it should be $49/month everywhere. Inconsistency is one of the main reasons AI tools get product descriptions wrong.
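Consistency is easy to check mechanically once you write down what each platform currently says about you. The sketch below assumes you have collected those facts by hand into a dictionary. It treats your own website as the source of truth and reports every field where another listing differs or is missing. The platform names and fields are made up for illustration.

```python
def normalize(value):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(str(value).lower().split())


def find_inconsistencies(listings, canonical="website"):
    """Compare every listing against the canonical one.

    Returns a list of (platform, field, expected, actual) tuples.
    """
    reference = listings[canonical]
    mismatches = []
    for platform, facts in listings.items():
        if platform == canonical:
            continue
        for field, expected in reference.items():
            actual = facts.get(field)
            if actual is None:
                mismatches.append((platform, field, expected, "(missing)"))
            elif normalize(actual) != normalize(expected):
                mismatches.append((platform, field, expected, actual))
    return mismatches


if __name__ == "__main__":
    # Hypothetical listings, collected manually.
    listings = {
        "website": {"price": "$49/month", "tagline": "Invoicing for freelancers"},
        "directory_a": {"price": "$49/Month", "tagline": "Invoicing for freelancers"},
        "directory_b": {"price": "$39/month"},
    }
    for platform, field, expected, actual in find_inconsistencies(listings):
        print(f"{platform}: {field} is {actual!r}, expected {expected!r}")
```

Running a check like this whenever you change pricing or positioning gives you a short list of listings to update.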
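If you're launching something and want AI tools to find you accurately, here's what to do from day one:
Add Organization, FAQPage, and Article schema to your site. Validate it with Google's Rich Results Test.
Get listed on startup directories such as Product Hunt, Indie Hackers, and BetaList.
Publish on platforms AI already cites, such as Medium and Substack, and answer questions in your niche on Reddit.
Keep your name, description, and pricing consistent everywhere you appear.
Add an llms.txt file if you have a spare hour.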
None of these are technically hard. Most founders skip them because they're focused on building. The result is a product that exists but that AI doesn't know about.
Does Google ranking affect whether AI cites me?
For Google AI Overviews, ranking helps. For ChatGPT and Perplexity, much less so. Ahrefs found that about 90% of the pages ChatGPT cites rank at position 21 or lower. [2] You don't need to be on page one.
How long does it take to show up in AI responses?
It varies. Perplexity crawls fresh content aggressively. ChatGPT tends to cite older, more established content. Google AI Overviews correlate more with traditional ranking signals, which take months to build.
Does llms.txt actually work right now?
No major AI platform has confirmed they use it, and server logs show AI crawlers aren't fetching the file. It's a reasonable future-proofing step but not a current traffic driver.
Does schema markup affect my Google search ranking?
No. Google's John Mueller confirmed in 2025 that structured data doesn't directly influence rankings. [10] It affects rich snippet display and AI citation probability.
If you'd rather have all of this set up automatically against your existing site, Bonai generates your AEO infrastructure (Organization schema, FAQPage schema, llms.txt) and your initial directory listing content from your URL, then runs a weekly audit across ChatGPT, Perplexity, and Gemini to catch what AI gets wrong and correct it with calibrated content.