Platform lets creators monetize their content for use in LLM training – Computerworld



Wong added that Avail’s Corpus tool “flies in the face” of recent comments made by Mustafa Suleyman, the CEO of Microsoft AI, in an interview at the Aspen Ideas Festival. “While attempting to define what kind of content is protected by publishers, he proceeded to say: ‘With respect to content already on the open web, the social contract of that content since the 1990s has been that it is fair use. Anyone can copy it, recreate it, or reproduce it. That has been freeware, if you like; that’s been the understanding.’”

If a tool like Corpus had existed in the 1990s, Wong said, “I am sure content creators would have been properly acknowledged and compensated for their content. Today, the jury is still assessing whether copyright data for LLM training should fall under ‘fair use,’ but accessing data in real-time should be recognized as of value to both users and vendors, and this content should not be considered freeware.”

At present, he said, the US Copyright Office has not prevented “LLM vendors from using copyrighted data to train their models. The vendors typically state that the use of the copyrighted data falls under the legal concept of ‘fair use,’ which allows people/companies to use limited portions of the work for non-commercial, educational, or transformative uses.”
