OpenAI and Reddit have inked a partnership agreement enabling OpenAI to leverage Reddit’s data for training its AI models.
In a recent blog post on OpenAI’s press relations site, the company highlighted that this collaboration with Reddit would grant them access to real-time, structured, and unique content such as posts and replies from Reddit. This access will empower OpenAI’s tools and models to better comprehend and showcase Reddit’s content.
The partnership aims to integrate Reddit’s content into ChatGPT, OpenAI’s renowned conversational AI, and develop new AI-powered features for Reddit users and moderators, although the specifics of these features were not disclosed.
As part of the agreement, OpenAI will also serve as a Reddit advertising partner.
The partnership underscores Reddit’s commitment to using OpenAI’s AI models to enhance user experiences on its platform. The collaboration is in line with OpenAI’s strategy of forging similar licensing deals with various content providers, albeit with a twist in this case: Sam Altman, OpenAI’s CEO, holds a significant stake in Reddit and previously served on its board of directors.
In an attempt to discourage scrutiny, OpenAI says in its press release that, while Altman remains a Reddit shareholder, the partnership “was led by OpenAI’s COO [Brad Lightcap]” and “approved by [OpenAI’s] independent board of directors.” (I’ll note here that Altman is a member of OpenAI’s board; he recused himself for this decision, however, an OpenAI spokesperson tells TechCrunch.)
Reddit has made data licensing agreements an increasingly central part of its growth strategy as it navigates the market as a public company.
In its IPO prospectus, Reddit revealed that it has contractual agreements to license its data to customers, including Google, worth a combined total of more than $200 million. And, in its first earnings report as a public company, Reddit reported a 450% year-over-year increase in non-ad revenue, attributable mainly to those agreements.
Reddit stock was up 11% in extended trading following the announcement of the OpenAI deal.
“The paradox I see is that, as more content on the internet is written by machines, there’s an increasing premium on content that comes from real people,” Reddit CEO Steve Huffman said during the company’s earnings call in March. “And we have nearly two decades of authentic conversation.”
Reddit’s platform, which has over 1 billion posts and more than 16 billion comments (figures that grow every day thanks to its hundreds of millions of active users), is a gold mine for generative AI companies, whose models learn from examples of content, like text and images, to generate new, similar content.
But the company could face pushback from users concerned about how it’s monetizing their data.
It’s instructive to look at Stack Overflow, the Q&A forum for software developers, which recently inked an agreement with OpenAI to supply data for the latter’s model training. In protest, some users deleted their top-rated answers to questions on the community. But Stack Overflow restored the deleted posts and banned those users, claiming that they weren’t in compliance with its terms of service.
Reddit has already voiced its displeasure with one attempt to afford Reddit users greater control over their own data.
Vana, a startup built on the blockchain, is attempting to launch a data “DAO” (Decentralized Autonomous Organization) to let Reddit users pool their data and decide together how that combined data is used (or sold). Reddit banned Vana’s subreddit dedicated to discussion about the DAO and, in a statement to TechCrunch, accused the company of “exploiting” its data export controls.