For its part, Perplexity said in an updated FAQ that its web crawler, PerplexityBot, will not index the full or partial text content of any site that disallows it via robots.txt. A robots.txt file is a simple text file stored at the root of a web server that tells web crawlers which pages or sections of a site they are allowed to crawl and index.
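Blocking the crawler this way takes only a couple of lines in that file. The following is a minimal sketch, assuming a site owner wants to keep PerplexityBot off the entire site while leaving other crawlers unaffected:

    # Block Perplexity's crawler from all pages on this site
    User-agent: PerplexityBot
    Disallow: /

Directives for other crawlers can be added as separate User-agent groups; any bot that honors the Robots Exclusion Protocol will skip the paths it is disallowed from fetching.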
“PerplexityBot only crawls content in compliance with robots.txt,” the FAQ explained. Perplexity also said it does not build “foundation models” (also known as large language models), “so your content will not be used for AI model pre-training.”
The bottom line, Yamin said, is that search engines are in a “tricky position” as genAI evolves. “They want to provide the best results to users, which increasingly involves AI-generated or AI-enhanced content. At the same time, they need to protect original creators and maintain the integrity of search results. We’re seeing efforts to strike this balance, but it’s a complex issue that will take time to fully address.”