TL;DR: Big tech AI companies' LLMs have gobbled up all of our data, but the damage they have done to open source and free culture communities is particularly insidious. By taking advantage of those who share freely, they destroy the bargain that made free software spread like wildfire.
I may not be up to date
- What damage did the big tech AI companies do to open source?
- How do they take advantage of us?
- Most “open source” LLMs are really just open weights, which are useless without the training data. This dilutes the definition of OSS. There is no way for a normal person (aka not Google or Meta, etc.) to train the model.
- LLM producers don’t credit the OSS they trained on: no attribution. Most models violate the licenses of all their training data (e.g. GPL).
- LLM scraper bots put heavy stress on server infrastructure, effectively amounting to a DDoS attack.
A lot of open-source software uses copyleft licenses like the GPL. If a company uses that code to build and distribute its own products, then some or all of its new code may also have to be released as open source. This is an important part of how open-source projects stay open. Organizations like the FSF have taken big companies to court over this and won.
AI companies trained their slop-generators on that open-source code. In many cases, the models will reproduce it line-for-line. But courts currently hold that the generated code is no longer subject to the original copyright restrictions. It’s nearly impossible to publish open-source software without being scraped for AI training.
Ok, but could you elaborate on how the scraping destroys that bargain?
It turns the contribution into something that is no longer “share alike”.
Several explanations exist. First, ShareAlike and the GPL require downstream works to be re-shared under the same terms. But if the work is used for training, and the model later produces new code, that output isn’t covered by copyright law… And we could discuss whether that’s ethical or moral, of course, and there are various opinions…
I feel like this is similar to the weakness of BSD licenses, Public Domain releases, and CC BY licensing. Someone can come along and take all of your work, polish it a little better, and sell their new service. Then you don’t get recognition or support, they get some contracts from their friends’ companies for a few years, and you feel sad.
The other commonly-remarked angle is the death of the Web. Because so many websites are script-generated copy/paste, and they are tweaked to fit SEO, and Google doesn’t give a fuck, it’s hard for real website authors to get seen, and without any visibility, their work ends up being personal or pointless. This isn’t limited to open source, but it’s closely connected.
Hmm. I think the main damage is done by other factors. I mean even before AI, everything turned into subscriptions and services. We use Office365 these days and the documents are in the cloud. There isn’t much need for Free Software Office Suites or mail clients anymore. Operating systems have less impact because honestly only old people use computers. Everyone else does their stuff on a phone. And then we finally crossed the barrier into a post-privacy world and people don’t care. And on top of that large companies take the nice database projects, libraries etc and monetize products with that. Without caring too much if that’s sustainable. And AI is one negative factor amongst many.
> only old people use computers.
I feel assaulted.
M365 has done so much damage to computing. Same deal with Google’s equivalent.




