hhmx.de

Matasoft

Federation HR Thu 23.01.2025 22:28:05

@clive @jasonkoebler @404mediaco I have difficulty understanding why someone would want to protect their own website from scraping. Isn't the primary reason for having a website the desire to be shown to the world and to spread particular information? I don't understand the fuss.

Frank Heijkamp

Federation NL Fri 24.01.2025 00:03:23

@matasoft @clive @jasonkoebler @404mediaco If all LLM platforms always provided direct links, it would indeed bring people to your site. But the fact is that most LLMs just steal your content without giving any credit. That's what this is for.

tellyworth

Federation EN Fri 24.01.2025 00:37:20

@matasoft @clive @jasonkoebler @404mediaco there are masses of AI crawler bots that can easily overwhelm a web site with traffic. They don’t throttle or follow limits or respect robots.txt or other conventions. They’ll easily overwhelm available bandwidth and take down a small web site.

It’s 100% reasonable (and necessary) to consider them hostile.
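
For context on the conventions mentioned above, here is a minimal sketch of what a well-behaved crawler would check before fetching a page. It uses Python's standard `urllib.robotparser`. The robots.txt content, the bot name `ExampleAIBot`, and the helper names `build_parser`, `may_fetch`, and `delay_for` are all hypothetical and chosen only for illustration. Nothing here forces a crawler to comply, which is the complaint in this post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


def delay_for(parser: RobotFileParser, user_agent: str, default: float = 1.0) -> float:
    """Return the requested Crawl-delay in seconds, or a default if none is set."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


if __name__ == "__main__":
    rules = build_parser(EXAMPLE_ROBOTS_TXT)
    for agent in ("ExampleAIBot", "SomeOtherBot"):
        for url in ("https://example.org/post/1", "https://example.org/private/x"):
            print(agent, url, may_fetch(rules, agent, url))
    print("delay for SomeOtherBot:", delay_for(rules, "SomeOtherBot"))
```

A polite crawler would call `may_fetch` before every request and sleep for `delay_for(...)` seconds between requests to the same host. A crawler that skips both steps is the kind described here.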

kobajo

Federation EN Fri 24.01.2025 07:49:36

@matasoft disrespectful scraping for LLMs removes attribution, is non-consensual, has no methods for rectification (misinfo), mixes your data with others' (sometimes criminally sourced) and is often implemented so badly that it causes load/stability/cost issues for the sites. And this is not "one shot", but happens all the time. No one wants this except the grifters.

econads

Federation HR Fri 24.01.2025 08:55:38

@matasoft
Guess people want the world to know it's their work. Also there was a post from an admin here complaining that AI scrapers aren't as smart as normal scrapers (e.g. search engine crawlers) and come back every 10 minutes to scrape the exact same data, costing his small instance ridiculous money. And stuff like this here: no, I don't need my random musings distributed elsewhere.

Assuming you were asking in good faith.

@clive @jasonkoebler @404mediaco

Clive Thompson

Federation HR Fri 24.01.2025 17:47:37

@econads @matasoft @jasonkoebler @404mediaco

Yep

and some of the objection to mass-scraping-by-AI-firms is that the AI firms are not helping to provide new audiences for one's online writing

quite the opposite, possibly ...

... given that as more people start "chatting with an LLM AI" instead of searching the web ...

... they become happy with the good-enough answer they get from the LLM, and never bother to go to the sites the answers are based upon

Matasoft

Federation HR Fri 24.01.2025 17:58:39

@clive @econads @jasonkoebler @404mediaco well, Perplexity AI, for example, provides citation links by default.
So, I am not sure that it is wise for a website owner to block it. More and more, LLMs will replace classical search engines.
On the other hand, if you have a secret to hide, why publish it on a website in the first place?