Federated EN Wed 29.01.2025 08:59:36 @snowyfox @timnitGebru I thought the optimisation algorithms for training any network architecture were well known and widely available? Is it possible that DeepSeek made an innovation on this point?
Federated EN Wed 29.01.2025 14:14:43 @HydrePrever @snowyfox @timnitGebru Yes, the training algorithms are known, and you can in fact study them at university. But how exactly to train these beasts, how exactly to design them, how to tweak them, how to choose the hyperparameters: all of that is what produces hundreds of papers in the field per day.
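[Editor's note: a toy sketch of the point above, that the algorithm being well known doesn't make the tuning easy. Plain gradient descent on f(x) = x² is the textbook algorithm, yet a single hyperparameter (the learning rate) decides whether it converges, crawls, or diverges. The function and values here are purely illustrative.]

```python
# Plain gradient descent on f(x) = x^2 -- the algorithm itself is
# textbook material; the behaviour hinges entirely on the learning rate.

def gradient_descent(lr, steps=50, x0=5.0):
    x = x0
    for _ in range(steps):
        grad = 2 * x          # derivative of x^2
        x = x - lr * grad     # the standard update rule
    return x

for lr in (0.01, 0.1, 1.1):
    # lr=0.01 crawls, lr=0.1 converges quickly, lr=1.1 blows up
    print(f"lr={lr}: final x = {gradient_descent(lr):.4g}")
```

The same tension, multiplied across dozens of interacting hyperparameters and architecture choices, is what the reply describes as filling hundreds of papers a day.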
Federated EN Wed 29.01.2025 14:34:08 @yacc143 @snowyfox @timnitGebru So that part might hide some crucial information... thank you
Federated EN Wed 29.01.2025 14:56:18 @HydrePrever @snowyfox @timnitGebru Basically, the stuff is not reproducible without all 4 parts. (And reproducibility is itself a topic with these algorithms: in a typical setup you literally start from random starting points. In teaching setups, requiring a fixed random seed is the usual way to make the numbers somewhat reproducible, and even then it's not always possible -> different software versions might produce different numerical results.)
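[Editor's note: a minimal sketch of the fixed-seed approach mentioned in the post above, using only the Python standard library. With the same seed, the "random" starting point is bit-identical across runs; the function name and values are illustrative only.]

```python
import random

def random_start(seed=None, dim=3):
    """Draw a random starting point, optionally from a fixed seed."""
    rng = random.Random(seed)   # independent RNG instance; seed fixes the stream
    return [rng.uniform(-1, 1) for _ in range(dim)]

a = random_start(seed=42)
b = random_start(seed=42)
print(a == b)   # same seed -> identical start point, run after run
```

This is exactly the teaching-setup trick: fixing the seed pins down the random initialisation, though, as the post notes, it does not pin down everything.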
Federated EN Wed 29.01.2025 18:12:46 @yacc143 @snowyfox @timnitGebru Mmh, RNGs *shouldn't* have this problem, although I agree it can unfortunately happen.
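[Editor's note: the post above is right that a seeded RNG itself is deterministic; the drift across software versions usually comes from floating-point arithmetic instead. A sketch: addition of doubles is not associative, so if a new library version or a parallel backend sums the same values in a different order, the result changes. The values below are chosen to make the effect obvious.]

```python
# Floating-point addition is not associative: summing the same values
# in a different order gives a different result.
vals = [1e16, 1.0, -1e16, 1.0]

left_to_right = sum(vals)          # 1e16 + 1.0 rounds away the 1.0
reordered     = sum(sorted(vals))  # a different order, a different answer

print(left_to_right, reordered)   # the two sums disagree
```

So even with identical seeds, a change in summation order (from a version upgrade, a different BLAS, or GPU parallelism) can shift the low bits, and in a long training run those shifts compound.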
Federated EN Wed 29.01.2025 21:56:25 @yacc143 @HydrePrever @snowyfox @timnitGebru Let's not forget this was all funded by a hedge fund mogul: the huge (and predictable) stock movements following the announcement doubtless yielded huge profits. The fragility of Western economies continues to expose itself to the light of day.