Federation EN Mon 18.12.2023 14:09:07
How bad are the thousands of new stochastically generated websites?
Last night I wanted to roast some hazelnuts, and I could not remember the temperature I used last time. So I searched on DuckDuckGo. Every website that I could find was machine-generated with different temps listed. One site had three separate methods listed that were essentially differently worded versions of the same thing. With different temperatures.
So I pulled my copy of Rodale’s Basic Natural Foods Cookbook off the shelf and looked it up there.
I think it may be time to download an archive copy of the 2022 Wikipedia before we lose all of our reference material. It was nice having all the world’s knowledge at my fingertips for a couple of decades, but that time seems to be past.
[ Edit: Since others have mentioned the possibility, I should note that some of these sites may have been SEO-generated/altered rather than generated by an LLM. However, even if that is the case, the fact that these sites are as bad as, and indistinguishable from, LLM-generated sites means to me that they are just as bad and just as likely to have only a loose resemblance to reality. There are many ways to be a fancy stochastic parrot. ]
Federation EN Tue 19.12.2023 07:19:49
@wikipedia do you currently detect whether some of your pages include AI-generated data? I share the fear of @bhawthorne (see above) that true data is becoming more and more diluted in an ocean of hallucinated AI text.
Federation EN Thu 21.12.2023 03:52:10
@lowrankjack @bhawthorne @ploum I'm not aware of any automated checking. Part of the problem is that most models were trained on Wikipedia data, so they spit out text that looks like our articles, which confuses supposed detectors.
However, editors are tracking this, e.g. https://en.wikipedia.org/wiki/Category:Articles_containing_suspected_AI-generated_texts on the English Wikipedia.