Cloudflare and AI Changed the Economics of the Internet

Artificial intelligence tools are incredible at searching, retrieving, and synthesizing whatever information they can tap into for the task you want to accomplish. The two key assumptions in this workflow are that the AI tools have access to the information and that the information is valuable. If search engines have had success scraping information from content providers for decades, then surely AI tools will be okay, right? Wrong. Last week, one of the largest IT infrastructure companies, Cloudflare, announced the following:
“…it is now the first Internet infrastructure provider to block AI crawlers accessing content without permission or compensation, by default. Starting today, website owners can choose if they want AI crawlers to access their content, and decide how AI companies can use it. AI companies can also now clearly state their purpose – if their crawlers are used for training, inference, or search – to help website owners decide which crawlers to allow. Cloudflare’s new default setting is the first step toward a more sustainable future for both content creators and AI innovators.”
While search engines (e.g., Google, Bing) would refer a user to a website about once for every ten times they scraped it, the new AI tools (e.g., ChatGPT, Gemini, Claude) scrape websites hundreds or thousands of times for every user they refer to your website. This means that they steal content on the internet and rarely encourage users to visit the websites it came from. Without users visiting those websites, the content providers lose out on new subscribers, consumers, and/or advertising income. Cloudflare made a bold statement that it will protect content providers, like myself, by blocking access in this new AI scraping world.
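To make the gap concrete, here is a minimal sketch of the crawl-to-referral arithmetic. The traffic counts are hypothetical, chosen only to mirror the rough ratios described above (about ten scrapes per referral for search engines, on the order of a thousand for AI tools). They are not measured data, and the function name is my own.

```python
def crawls_per_referral(crawl_count: int, referral_count: int) -> float:
    """Return how many crawler requests occur for each referred visitor."""
    if referral_count <= 0:
        # No referrals at all: the site gives content and gets nothing back.
        return float("inf")
    return crawl_count / referral_count


# Hypothetical monthly counts for one website (illustrative only).
traffic = {
    "search_engine": {"crawls": 10_000, "referrals": 1_000},
    "ai_tool": {"crawls": 10_000, "referrals": 10},
}

ratios = {
    name: crawls_per_referral(counts["crawls"], counts["referrals"])
    for name, counts in traffic.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:,.0f} crawls per referral")
```

With the same crawl volume, the hypothetical AI tool sends back one hundredth of the visitors the search engine does, which is the imbalance content providers are reacting to.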
This new economic agreement for the internet will bifurcate the AI tools into two distinct archetypes:
- AI tools that pay for access to valuable information
- AI tools that own valuable information, which means that the organization already owns the data required for the AI tool to be useful
I keep referencing valuable information. Before I get into the archetypes, let’s quickly define what can be considered valuable information. Valuable information is information that contributes to an organization’s ability to make informed decisions, optimize operations, drive innovation, and achieve strategic goals. It must meet high standards of data quality, which means that it is:
- Accurate: Free from errors, reflecting the real-world conditions it represents.
- Complete: Contains all required fields and values necessary for analysis and decision-making.
- Timely (Recent): Up to date, reflecting the most current and relevant information available.
- Consistent: Uniform across systems, with standardized formats and definitions.
- Trusted: Collected, stored, and managed with integrity and traceability, ensuring it comes from reliable sources.
- Accessible: Easily retrievable and usable by the right people at the right time.
- Relevant: Directly aligned with organizational goals, KPIs, and strategic initiatives.
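Two of these standards, completeness and timeliness, are easy to measure mechanically. The sketch below shows one way to score a dataset against them. The field names, the 30-day freshness threshold, and the sample records are all hypothetical assumptions for illustration. Standards such as trust or relevance need human judgment and are not covered here.

```python
from datetime import date, timedelta

# Hypothetical schema and freshness threshold, assumed for illustration.
REQUIRED_FIELDS = ("field_id", "measured_on", "health_score")
MAX_AGE = timedelta(days=30)


def is_complete(record: dict) -> bool:
    """Complete: every required field is present and not None."""
    return all(record.get(name) is not None for name in REQUIRED_FIELDS)


def is_timely(record: dict, today: date) -> bool:
    """Timely: measured within MAX_AGE of today, and not in the future."""
    measured_on = record.get("measured_on")
    if measured_on is None:
        return False
    return timedelta(0) <= today - measured_on <= MAX_AGE


def quality_report(records: list, today: date) -> dict:
    """Return the share of records that pass each check."""
    total = len(records)
    if total == 0:
        return {"complete": 0.0, "timely": 0.0}
    return {
        "complete": sum(is_complete(r) for r in records) / total,
        "timely": sum(is_timely(r, today) for r in records) / total,
    }


# Hypothetical sample records (not real data).
records = [
    {"field_id": "A1", "measured_on": date(2025, 7, 1), "health_score": 0.82},
    {"field_id": "A2", "measured_on": date(2022, 7, 1), "health_score": 0.75},
    {"field_id": "A3", "measured_on": date(2025, 6, 25), "health_score": None},
]

report = quality_report(records, today=date(2025, 7, 10))
print(report)
```

In this sample, one record is stale and another is missing a value, so each score comes out at two thirds. A dataset that stops being refreshed would see its timeliness score fall toward zero as the records age.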
While many of the AI tools have already scraped the internet, blocking AI scrapers will negatively impact most (or all?) of the above data quality standards. One could easily imagine that the timeliness of the information will rapidly deteriorate once content providers start denying access to AI scrapers. For example, a farmer cares more about their crops’ health over the past few weeks than about what occurred several years ago. The farmer would be willing to pay significantly more for crop health data that is timely and high quality than for data that is outdated.
From an investing perspective, the second archetype (AI tools that own valuable information) is far more interesting than the first (AI tools that pay for access) because the data owners have the leverage while the AI models (and possibly tools?) are becoming commoditized in this value chain. The data owner could easily take their valuable information and move it to another AI tool and/or build their own AI tool on top of this information.
Spend time talking with the average investment analyst and you will quickly grasp that most of them have yet to understand this relationship in the AI tool value chain. The market has not priced in the value of organizations that own proprietary, valuable data, even though the new, commoditized AI models will accelerate the extraction of insights from that data and shorten the payback period.
~ The Data Generalist
Data Science Career Advisor
Disclaimers: Long $PL, $EKG. This article is not a recommendation to buy or sell a particular investment as noted in the website disclosure.
Note: Image generated by ChatGPT.