AI’s Unpaid Debt: How LLM Scrapers Destroy the Social Contract of Open Source

yoasif@fedia.io · 26 days ago

illusionist@lemmy.zip · 26 days ago

I may not be up to date.

What damage did the big tech AI companies do to open source?
How do they take advantage of us?

ooterness@lemmy.world · 26 days ago

A lot of open-source software uses copyleft licenses like the GPL. If a company distributes products built on that code, it may have to release some or all of its own new code under the same license. This is an important part of how open-source projects stay open. Organizations like the FSF have taken big companies to court over this and secured compliance.

AI companies trained their slop-generators on that open-source code. In some cases, the models will reproduce it line for line. But so far, courts have not held that the generated code is still subject to the original license restrictions. And it’s nearly impossible to publish open-source software without it being scraped for AI training.

N.E.P.T.R@lemmy.blahaj.zone · 26 days ago

Most “open source” LLMs are really just open weights, which are of little use for reproducing the model without the training data. This dilutes the definition of OSS. There is no way for a normal person (i.e., anyone who isn’t Google, Meta, etc.) to train the model themselves.
LLM producers don’t credit or attribute the OSS they trained on. Most models arguably violate the licenses of much of their training data (e.g., the GPL).
LLM scraper bots put heavy load on server infrastructure, which can amount to a DDoS attack.