
Federation EN Thu 23.01.2025 23:28:17

Does anyone know whether these web crawlers also scrape the content of Kindle e-books? Can they get into these kinds of products?
Is anything safe from this kind of scraping?
How can we protect internet content against it in general? A standard website is easy to scrape, isn't it?

@jasonkoebler @404mediaco


Frank Heijkamp
@alterelefant@mastodontech.de

Federation NL Fri 24.01.2025 00:06:41

@piperef @clive @jasonkoebler @404mediaco Bezos probably already sold everything to those AI houses.


@StarkRG@myside-yourside.net

Federation EN Fri 24.01.2025 00:23:59

@piperef @clive @jasonkoebler @404mediaco If it can be read, it can be scraped. You can mitigate the issue (often by putting it behind an account wall), but not eliminate it entirely. The film industry has been desperately trying to stop piracy and I have yet to see a situation where a movie was released but wasn't available on piracy sites.

But also, yeah, if it's Kindle, it's probably already part of Amazon's AI dataset.
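
A minimal sketch of the account-wall idea mentioned above, in Python, to show why it mitigates scraping without eliminating it. Everything here is an assumption made for illustration: the `render_page` function, the `VALID_SESSIONS` store, and the token name are hypothetical and do not describe how Kindle or any real site works.

```python
from typing import Optional

# Hypothetical session store: tokens of logged-in accounts (illustration only).
VALID_SESSIONS = {"token-alice"}


def render_page(full_text: str, session_token: Optional[str], teaser_chars: int = 20) -> str:
    """Return the full text to logged-in readers and only a teaser to everyone else."""
    if session_token in VALID_SESSIONS:
        return full_text
    # Anonymous visitors, including anonymous crawlers, get a truncated preview.
    return full_text[:teaser_chars] + " [log in to read more]"


article = "If it can be read, it can be scraped."

# An anonymous crawler only sees the teaser.
anonymous_view = render_page(article, session_token=None)

# Any client holding a valid account, human or bot, still gets everything.
logged_in_view = render_page(article, session_token="token-alice")
```

The wall only raises the cost: a scraper that signs up for an account receives exactly what a human reader does.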


Clive Thompson
@clive@saturation.social

Federation EN Fri 24.01.2025 00:37:06

@StarkRG @piperef @jasonkoebler @404mediaco

yeah, I am sure this is true

I read recently, though I can't find the source (still looking, will update if I can find it), that US AI firms used corpora of cracked Western ebooks that circulate in Russia etc. for training


piperef
@piperef@mastodon.social

Federation EN Fri 24.01.2025 09:57:05

@clive

This is so ironic. Definitely plausible, but also scary.

@StarkRG @jasonkoebler @404mediaco


Clive Thompson
@clive@saturation.social

Federation EN Fri 24.01.2025 17:39:36

@piperef @StarkRG @jasonkoebler @404mediaco

yeah, alas
