hhmx.de

Timnit Gebru (she/her)

Föderation EN Mi 29.01.2025 06:00:40

Friends, for something to be open source, we need to see

1. The data it was trained and evaluated on

2. The code

3. The model architecture

4. The model weights.

DeepSeek only gives 3, 4. And I'll see the day that anyone gives us #1 without being forced to do so, because all of them are stealing data.

PhDog

Föderation EN Mi 29.01.2025 06:05:06

They didn't publish the code? Lame.

And sus.

@timnitGebru

nonlinear

Föderation EN Mi 29.01.2025 06:06:33

@timnitGebru Openwashing is everywhere. If we don’t protect our concepts, they’ll be taken away from us for profit.

snowyfox

Föderation EN Mi 29.01.2025 06:07:14

@timnitGebru W-Why would they call it open source if 2. isn't met...

Ben Royce 🇺🇦

Föderation EN Mi 29.01.2025 07:18:38

@snowyfox @timnitGebru

people make claims

the rebuttal is muted and late

the crowd has moved on believing the claim

in this way lies permeate everything

the grifters and con artists have figured this out. the rubes and marks don't know, don't care

and the rest of us can only watch the shit show

Steve Holden

Föderation EN Mi 29.01.2025 21:52:52

@benroyce @snowyfox @timnitGebru this isn’t news. How old is the saying “a lie can be halfway round the world before the truth has got its boots on”?

Nowadays: before you can pick up your ‘phone!

LisPi

Föderation · Mi 29.01.2025 08:36:06

@snowyfox @timnitGebru As a joke. In very poor taste.

Le Néandertal sous benzo

Föderation EN Mi 29.01.2025 08:59:36

@snowyfox @timnitGebru I thought the optimisation algorithms for training any network architecture were well known and widely available? Is there a possibility that deepseek made an innovation on this point?

Andreas K

Föderation EN Mi 29.01.2025 14:14:43

@HydrePrever @snowyfox @timnitGebru
Yes and no.

Yes, algorithms to train are known, and actually, you can study them at uni.

No, how to train these beasts exactly, and how to exactly design them, how to tweak them, how to choose the hyperparameters, all these things are what produce hundreds of papers in the field per day.

Le Néandertal sous benzo

Föderation EN Mi 29.01.2025 14:34:08

@yacc143 @snowyfox @timnitGebru so that part might hide some crucial information... Thank you

Andreas K

Föderation EN Mi 29.01.2025 14:56:18

@HydrePrever @snowyfox @timnitGebru
Not might, it does.

Basically the stuff is not reproducible without all 4 parts.

(And reproducibility is a topic in itself with these algorithms: in a typical setup you literally start from random starting points. In teaching setups, requiring a fixed random seed to make the numbers somewhat reproducible is the usual way, and still it's not always possible -> different software versions might produce different numerical results.)
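A minimal sketch of the seed point above, using only Python's standard library rather than any real training framework. The helper `init_weights` is hypothetical and merely stands in for network initialisation. The floating-point part shows one assumed reason why different software versions can disagree (a reordered sum); it is not a claim about any specific library.

```python
import random


def init_weights(n, seed=None):
    """Draw n pseudo-random starting weights (a stand-in for network initialisation)."""
    rng = random.Random(seed)  # seed=None gives a different start on every run
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


run_a = init_weights(5, seed=42)
run_b = init_weights(5, seed=42)
run_c = init_weights(5, seed=43)

same_seed_matches = run_a == run_b    # fixed seed: identical starting points
other_seed_matches = run_a == run_c   # different seed: different starting points

# Floating-point addition is not associative, so software that merely
# reorders a sum can change the low bits of a result.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
order_matters = left_to_right != right_to_left

print(same_seed_matches, other_seed_matches, order_matters)
```

So a fixed seed pins down the starting point, but it cannot by itself guarantee bit-identical numbers once the arithmetic underneath changes.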

Le Néandertal sous benzo

Föderation EN Mi 29.01.2025 18:12:46

@yacc143 @snowyfox @timnitGebru mmh, RNGs *shouldn't* have this problem, although I agree it can unfortunately happen

Steve Holden

Föderation EN Mi 29.01.2025 21:56:25

@yacc143 @HydrePrever @snowyfox @timnitGebru let’s not forget this was all funded by a hedge fund mogul: the huge (and predictable) stock movements following the announcement doubtless yielded huge profits.

The fragility of Western economies continues to expose itself to the light of day.

Ölbaum

Föderation EN Mi 29.01.2025 13:12:01

@snowyfox @timnitGebru Your regular reminder that the other one is called *Open*AI!

Cassandrich

Föderation EN Mi 29.01.2025 14:40:43

@oscherler @snowyfox @timnitGebru Open[A-Z]{2} is a classic shit naming pattern for stuff that's anything but open.
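For readers unfamiliar with the notation, `Open[A-Z]{2}` is a regular expression: the literal text "Open" followed by exactly two uppercase letters. A small sketch of what it matches; the list of names is made up for illustration.

```python
import re

# "Open" followed by exactly two uppercase ASCII letters
OPEN_XX = re.compile(r"Open[A-Z]{2}")

# Illustrative inputs only; fullmatch requires the whole name to fit the pattern
names = ["OpenAI", "OpenStreetMap", "openai", "OpenA1"]
matching = [name for name in names if OPEN_XX.fullmatch(name)]

print(matching)
```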

Michel Lind :fedora: :debian:

Föderation EN Mi 29.01.2025 06:21:04

@timnitGebru also, the model is definitely not open source.

Brett Kosinski

Föderation EN Mi 29.01.2025 06:31:29

@timnitGebru It's truly amazing that so many in the tech community have come to interpret "I can download this opaque binary blob and run it myself but cannot modify it" as "open source", when that used to quite literally be the definition of proprietary, closed code.

At some point it seems "open source" just became code for "downloadable for free". I have no idea when or how that happened but it's a damn shame. Personally I blame an entire generation of technologists who grew up with SaaS.

Osma A 🇫🇮🇺🇦

Föderation EN Mi 29.01.2025 06:51:01

Oh, I can tell exactly when that happened. Llama release.
@brettk @timnitGebru

IDe

Föderation EN Mi 29.01.2025 10:35:29

@brettk @timnitGebru
I wish there was a term for permissively-licensed-binary-blobs that could be popularised instead.

Florens Verschelde

Föderation EN Mi 29.01.2025 11:35:12

@ide @brettk @timnitGebru Shareware. (Apologies if I’m just explaining the joke.)

Grey the earthling

Föderation EN Mi 29.01.2025 13:07:59

@ide We used to call it “shareware”. (Notice how your spell-checker remembers the word.)

You're allowed to redistribute the binary, and use it (perhaps with some restrictions until you register a licence and maybe pay), but you have no permission to see or modify the source code.

#💾

Bredroll

Föderation EN Mi 29.01.2025 14:12:20

@greytheearthling @ide sounds more like freeware

Alex@rtnVFRmedia Suffolk UK

Föderation EN Mi 29.01.2025 13:57:45

@ide @brettk @timnitGebru isn't that just freeware? I've encountered plenty of Windows binaries over the years that are free but where the developers refuse to release the source code..

Tom "spot" Callaway :fedora:

Föderation EN Mi 29.01.2025 12:35:06

@brettk @timnitGebru I'm fighting a near-endless fight to ensure that my employer never calls those things "open source". Sometimes it slips past me, but I catch a lot of them.

SlightlyCyberpunk

Föderation EN Mi 29.01.2025 14:00:01

@brettk @timnitGebru I feel like that was kinda the whole point of Open Source being created in the first place -- in opposition to the Free Software movement. Open Source was always about being pro-corporate and capitalist and at this point the term means whatever the billionaires want it to mean! The ability for you and me to see the source was never the point to those people, it was only a byproduct.

aspragg

Föderation EN Mi 29.01.2025 21:32:28

@admin @brettk @timnitGebru I don't think that's entirely fair. While "Open Source" was meant to be more approachable than "Free Software" to corporate interests, I think that's more accurately due to a genuine (if naïve?) belief that corporate involvement in F/OSS could provide contributions that would benefit everyone - FSF hackers and corporations alike. And the founders of the OSI certainly weren't billionaires. See OSI history for more info, esp. "Further Reading"

opensource.org/history

SlightlyCyberpunk

Föderation EN Mi 29.01.2025 22:01:52

@aspragg @brettk @timnitGebru I never said the founders were billionaires; I said the billionaires have significant influence over, and benefit from, the movement in our present day.

Because how exactly did those founders think that making it more friendly to corporations would help? By getting them more money to develop the code! That's my whole point, they sacrificed the principles of free, democratic, user-controlled software for the promise of cold, hard cash. And now here we are a few decades later seeing that all the remaining principles have *also* been sacrificed in exchange for that cash. Because that is the logical outcome when you build a movement around prioritizing commercial success.

And that was *absolutely* the reason. Look at essays and interviews from people like ESR. I've seen multiple interviews where he starts literally screaming that it's not just about sharing the code, that they aren't "communists" who just want to give things away for free, and insisting that he supports the capitalists. Maybe not *everyone* involved in the movement was doing it to benefit corporate interests but a lot of people who were very influential were extremely clear about that being their motivation.

econads

Föderation EN Mi 29.01.2025 18:24:38

@brettk @timnitGebru
The free as in freedom and not as in beer is a pretty old trope, so apparently it's always been a problem.

Brett Kosinski

Föderation EN Mi 29.01.2025 21:28:12

@econads @timnitGebru That particular trope centered around copyleft (e.g. GPL) vs other OSS licenses, both more or less permissive (MIT, BSD, Apache, etc).

But prior to the AI bubble no one would've suggested "opaque binary blob" as anything other than closed source/proprietary.

Eric Schultz

Föderation EN Mi 29.01.2025 07:23:35

@timnitGebru yes! of course, the rights, which none of them give.

David G. Gándara

Föderation EN Mi 29.01.2025 07:33:05

@timnitGebru What about the European ALIA? I couldn't figure out if it was wholly trained with licensed stuff. Some people claim it was.

Bart Janssens 🇧🇪

Föderation EN Mi 29.01.2025 07:39:18

@timnitGebru Eugh, that’s disappointing, but it was to be expected of course. MSM completely misrepresented this again.

Ilkka Tengvall

Föderation EN Mi 29.01.2025 07:43:25

@timnitGebru IBM Granite models are said to be open source. They even cover customers in court against accusations that the data was stolen. ibm.com/granite

Ingo Wichmann

Föderation DE Mi 29.01.2025 07:50:48

@timnitGebru the folks at OSI did a good job defining that just in time:
opensource.org/ai


Nicolás Alvarez

Föderation EN Mi 29.01.2025 18:43:25

@ingo_wichmann @timnitGebru OSI's definition doesn't require publishing the actual training data, only giving information on how it was obtained. Describing exactly how you crawled the web for data you had no permission to use would fit OSI's "open" definition.

POSUTtRUmp

Föderation EN Mi 29.01.2025 08:09:23

@timnitGebru All Ai enter the octagon. Only one will emerge victorious.

Adam Wulf

Föderation EN Mi 29.01.2025 08:20:36

@timnitGebru I think the Pleias 1.0 model gets close? unless I've misunderstood what they've open sourced. huggingface.co/blog/Pclanglais

Another imaginary sandwich bar (not as funny)

Föderation EN Mi 29.01.2025 08:34:15

@adamwulf @timnitGebru

I doubt that the media understand the concept of OpenSource - i.e. the code is available for inspection and using. As has been said, they confuse it with free to download the app. ('Free as in beer').

bureauxx

Föderation EN Mi 29.01.2025 09:13:39

@timnitGebru based

@festal

i have never understood what people mean when they talk about the "stolen data" of AI. every search engine crawls the net, the internet data base backs versions of webpages up, the idea of open source is based on re-using and modifying "data" of others, artists make collages from other images, the whole concept of "knowledge" is based on the use of other knowledge... so, what exactly does AI do differently?

Pixdigit

Föderation EN Mi 29.01.2025 09:29:30

@timnitGebru Nextcloud has an "ethical ai rating" which takes the availability of the training data into account. Admittedly there are very few getting the "green" rating but it's not nothing:
docs.nextcloud.com/server/late

netskaven

Föderation EN Mi 29.01.2025 10:02:13

@timnitGebru I don't need open source stuff, we wanna FREE stuff ;)

Lien Rag

Föderation EN Mi 29.01.2025 10:19:36

@timnitGebru

Sky-T1 is supposed to fill all these criteria, from what I read...

Cai

Föderation EN Mi 29.01.2025 10:33:43

Ellyse

Föderation · Mi 29.01.2025 11:52:22

@timnitGebru@dair-community.social good point! i guess to call the trained model open source then you need the data, but you can consider the model itself open source with just the code (i think). still, i agree with the sentiment, that it's not really open source either way you look at it, it's free software at best.

Efi (nap pet) 🦊💤

Föderation EN Mi 29.01.2025 12:08:57

@timnitGebru who called it open source without having the source code? wth?

Angeles

Föderation ES Mi 29.01.2025 12:22:52

@timnitGebru For me, the only valuable thing about DeepSeek is that it does not require an infinitely growing amount of resources, reducing environmental cost. Every other ethical issue is still there, so I still have 0 interest in using it.
Also, watching a lot of unethical people panic always improves my day.

Manish

Föderation EN Mi 29.01.2025 12:58:46

@timnitGebru Most people in various tech communities on the internet don't even know that they can't view the source code of 'opensource' DeepSeek. Makes me wonder if anyone is even reviewing the code of actual opensource projects nowadays to verify their claims.

Woke Leftist Trash

Föderation EN Mi 29.01.2025 14:12:52

@timnitGebru “Please give us the evidence of your crimes”
🤣
“Can’t, it’s uh… proprietary.”

Ramin Honary

Föderation EN Mi 29.01.2025 14:14:20

Friends, for something to be open source, we need to see 1. the data it was trained and evaluated on, 2. the code, 3. the model architecture, 4. the model weights. DeepSeek only gives [the model architecture and the model weights].

@timnitGebru yes, this was very disappointing. I was hoping to try my hand at training a model, but it turns out I have to code my own by reading their paper and recreating the software from their model architecture by myself.

I hope someone more skilled at AI than me does it first and releases their code as actual free/libre open source software.

#tech #software #AI #DeepSeek #LLM #MachineLearning #FLOSS #FreeSoftware

jeremiah

Föderation EN Mi 29.01.2025 15:28:48

@timnitGebru In terms of knowing the costs of a model, would we also want to know the hardware used?

@BjornW@mastodon.social

Föderation EN Mi 29.01.2025 15:55:12

@timnitGebru thank you for sharing.

Quick question: have you seen this initiative, the "European Open-Source AI index"?

osai-index.eu

By @dingemansemark & @andreasliesenfeld from @Radboud_uni

Looks good to me, to help people determine how open an AI model actually is. Are you aware of other initiatives like this?

I'd like to gather these initiatives and share it with @publicspaces so more people learn what to look for to determine if an AI is truly opensource.

Mark Dingemanse

Föderation EN Mi 29.01.2025 17:58:14

@BjornW @timnitGebru 🤫 we're still working out some wrinkles but soft-launching this soon; it is the successor to our better known opening-up-chatgpt.github.io

Philip Kaludercic

Föderation EN Mi 29.01.2025 17:17:26

@timnitGebru What about the reasonable means to reproduce the results?

lordjeff

Föderation EN Mi 29.01.2025 18:05:31

@timnitGebru i believe china doesn't recognise copyright of non-chinese sources

sn 🐦‍⬛

Föderation DE Do 30.01.2025 00:01:36

@timnitGebru have you seen TEUken? It's not SOTA but improving … and they published their training data