Federation EN Thu 23.01.2025 19:06:27 A hacker developed an "infinite maze" to trap web crawlers/scrapers from AI companies. Basically, if the server code detects that a web crawler from an AI firm is trying to scrape the site ... ... the code begins spinning up an infinite, nesting warren of new sham pages filled with random text, so the crawler gets stuck crawling and scraping endless and meaningless pages. Fun @jasonkoebler piece at @404mediaco https://www.404media.co/email/7a39d947-4a4a-42bc-bbcf-3379f112c999/?ref=daily-stories-newsletter
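The post only describes the mechanism loosely. One common way to build such a maze without storing any pages is to derive each sham page from its own URL, so the same link always returns the same page and every page links deeper. The following is a minimal Python sketch of that idea under stated assumptions. It is not Nepenthes' actual code, and `maze_page`, the word list and the `/maze/` prefix are all hypothetical.

```python
import hashlib
import random

# Hypothetical vocabulary; a real tarpit would use a much larger word list.
WORDS = ["pitcher", "nectar", "lattice", "ember", "quarry", "velvet",
         "harbor", "cipher", "meadow", "glacier", "lantern", "orbit"]

def maze_page(path: str, n_words: int = 60, n_links: int = 5) -> str:
    """Return a sham HTML page derived deterministically from `path`.

    Seeding the RNG from a hash of the path means the same URL always
    yields the same page (so the maze looks stable to a crawler), while
    every page links to `n_links` deeper pages that exist nowhere on disk.
    """
    seed = int.from_bytes(hashlib.sha256(path.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    text = " ".join(rng.choice(WORDS) for _ in range(n_words))
    base = path.rstrip("/")
    links = "".join(
        f'<li><a href="{base}/{rng.getrandbits(32):08x}/">{rng.choice(WORDS)}</a></li>'
        for _ in range(n_links)
    )
    return f"<html><body><p>{text}</p><ul>{links}</ul></body></html>"

if __name__ == "__main__":
    print(maze_page("/maze/"))
```

Because nothing is stored, the maze is effectively infinite at the cost of one hash and a few random draws per request.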
Federation EN Thu 23.01.2025 19:08:29 @clive what a waste from both sides
Federation EN Thu 23.01.2025 19:10:11 yep, I think that's basically the point of it
Federation EN Fri 24.01.2025 08:38:14 @clive @gagliardi_vale
Federation EN Fri 24.01.2025 17:48:48 yes indeed
Federation EN Fri 24.01.2025 00:49:24 @gagliardi_vale @clive this is what we're doing, instead of scrambling to salvage our odds of surviving this century as a species
Federation EN Fri 24.01.2025 07:19:10
Federation EN Thu 23.01.2025 19:16:02 @clive @jasonkoebler @404mediaco
Federation EN Thu 23.01.2025 19:19:40 @clive @jasonkoebler @404mediaco Was just on a thread a week or so ago about what to do with aggressive AI web scrapers that won't self-limit or respect robots.txt. This is evolution in action. Nature is healing.
Federation EN Thu 23.01.2025 21:55:27 @tbortels @jasonkoebler @404mediaco it's pretty wild
Federation EN Thu 23.01.2025 22:47:01 @clive @tbortels @jasonkoebler @404mediaco It's a practical application of "GIGO" (garbage in, garbage out). Ahh, there's a place for everything, and GIGO has finally found its place!
Federation EN Thu 23.01.2025 23:47:20 @tbortels Is there even such a thing as "non-aggressive AI web scrapers" that self-limit and respect robots.txt? At least Google's and micro$hit's ignore robots.txt. They downloaded photos from my gallery, up to 6000 requests a day, more than once… I bet not even 10 of those were legit users. I've only got 38 photos… stupid bots download the same photos over and over again… I've blocked 4 IP ranges. That probably includes indexing bots' IPs, but I don't give an F.
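Range blocking like this is usually done in a firewall or web-server config, but the membership check itself is simple. A minimal sketch with Python's standard `ipaddress` module; the ranges below are placeholder documentation blocks (RFC 5737 / RFC 3849), not the four ranges the poster actually blocked.

```python
import ipaddress

# Placeholder ranges (documentation-only blocks), NOT the poster's ranges.
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]

def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip lies inside any blocked range."""
    addr = ipaddress.ip_address(client_ip)
    # Testing an IPv4 address against an IPv6 network (or vice versa)
    # simply returns False, so a mixed list is safe to scan.
    return any(addr in net for net in BLOCKED_RANGES)
```

As the poster notes, whole-range blocks are blunt: any legitimate client inside a blocked range is cut off too.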
Federation EN Thu 23.01.2025 23:50:25 @devnull @clive @jasonkoebler @404mediaco I felt obligated to disclaim my fantasy well-behaved AI scrapers, just in case. The actual headcount there may well be zero.
Federation EN Fri 24.01.2025 08:44:49 @tbortels @devnull @clive @jasonkoebler @404mediaco
Federation EN Fri 24.01.2025 09:23:41 @bornach @devnull @clive @jasonkoebler @404mediaco Alas, those scrapers are out of scope because they're not the ones causing problems and driving this conversation. Indeed, if someone licensed content legitimately, the need to scrape the web would be absent: there are far more efficient ways to say "here are all of the new posts in the last N hours". You can safely assume any automation ignoring your robots.txt is a pest to be ruthlessly crushed in whatever manner amuses you most.
Federation EN Fri 24.01.2025 17:43:35 @tbortels @bornach @devnull @jasonkoebler @404mediaco yep, licensing would obviate the hassles of scraping: "here's our API, enjoy"
Federation EN Fri 24.01.2025 17:42:38
Federation EN Fri 24.01.2025 17:42:23 @devnull @tbortels @jasonkoebler @404mediaco bleah, what a mess!
Federation EN Thu 23.01.2025 19:59:38 @404mediaco @clive @jasonkoebler Love this! I built a simple #WordPress plugin that garbles your web content so scrapers get served garbage:
Federation EN Thu 23.01.2025 21:55:49 @KevinFreitas @404mediaco @jasonkoebler oh damn, that is cool
Federation EN Thu 23.01.2025 20:09:30 @clive if LLMs are half as good as they're promised to be, then millions of websites will serve endless LLM-generated content like that, creating a really expensive infinite loop. Way to spend $500B.
Federation EN Thu 23.01.2025 21:56:51 truly
Federation EN Fri 24.01.2025 08:52:46 @koos @clive
Federation EN Fri 24.01.2025 17:48:36
Federation EN Thu 23.01.2025 20:19:51 @clive @jasonkoebler @404mediaco I love that it's called Nepenthes. One of the coolest plant genera!
Federation EN Thu 23.01.2025 21:01:26 @lcwheeler @clive @jasonkoebler @404mediaco https://starwars.fandom.com/wiki/Nepenth%C3%A9 - Nepenthé is programming fluid for robots
Federation EN Thu 23.01.2025 21:08:23 @risottobias @clive @jasonkoebler @404mediaco In the article it says "The program, called Nepenthes after the genus of carnivorous pitcher plants which trap and consume their prey, can be deployed by webpage owners to protect their own content from being scraped or can be deployed “offensively” as a honeypot trap to waste AI companies’ resources." It's a fitting name.
Federation EN Thu 23.01.2025 21:57:26
Federation EN Thu 23.01.2025 21:57:14 @risottobias @lcwheeler @jasonkoebler @404mediaco did not know!
Federation EN Fri 24.01.2025 00:43:08 @risottobias I'm not much of a Star Wars fan, and was unaware of that reference when I named it.
Federation EN Thu 23.01.2025 21:57:01 @lcwheeler @jasonkoebler @404mediaco yessss
Federation EN Thu 23.01.2025 20:23:31 @clive @jasonkoebler @404mediaco I've seen stories about people hosting sites that got hit by robots and had to pay a bunch of money in data costs. I wonder how this works, and whether it can help in that regard when the whole point is to keep them pointed at your site. I'm all for wasting their time, I just wonder how much it costs.
Federation EN Thu 23.01.2025 20:52:59 @RnDanger @clive @jasonkoebler @404mediaco yeah, you’d have to host this on a service that doesn’t charge by network traffic
Federation · Thu 23.01.2025 20:58:44 @RnDanger@infosec.exchange @clive@saturation.social @jasonkoebler@mastodon.social @404mediaco@mastodon.social
Federation EN Thu 23.01.2025 21:57:52
Federation EN Thu 23.01.2025 21:57:44 @RnDanger @jasonkoebler @404mediaco yeah, good question!
Federation EN Thu 23.01.2025 23:36:56 @RnDanger @clive @jasonkoebler @404mediaco This would need to be deployed to a server with a fixed cost, not one with extra costs for execution time or bandwidth.
Federation EN Thu 23.01.2025 20:26:44
Federation EN Thu 23.01.2025 20:38:05 @clive @jasonkoebler @404mediaco Tip of the Cub cap to the hacker!
Federation EN Thu 23.01.2025 20:39:21 @clive @jasonkoebler @404mediaco good idea👍
Federation EN Thu 23.01.2025 20:51:32 @clive @jasonkoebler @404mediaco Are we really getting Barrier Mazes from Ghost in the Shell??
Federation EN Thu 23.01.2025 21:58:58 @Kamikaze @jasonkoebler @404mediaco it would appear so
Federation EN Thu 23.01.2025 20:54:03 @clive @jasonkoebler @404mediaco There are a number of "infinite maze" generators like #Nepenthes (https://zadzmo.org/code/nepenthes/) or #Iocaine (https://pages.madhouse-project.org/algernon/infrastructure.org/eru_services_iocaine) that help #poisonthewell for AI companies training their LLMs on your content, complete with guides on integration with #Caddy (https://pages.madhouse-project.org/algernon/infrastructure.org/common_services_caddy_snippets_poison_ai)
Federation EN Thu 23.01.2025 21:59:16 @mhartle @jasonkoebler @404mediaco aha, damn interesting
Federation EN Thu 23.01.2025 20:55:08 @clive @jasonkoebler @404mediaco I think this stuff has been catching archiving sites too, though. So you try to archive a site for accountability purposes and it can't be archived, because the archiver just churns forever.
Federation EN Thu 23.01.2025 21:59:31 @meganL @jasonkoebler @404mediaco that danger leapt out at me too
Federation NL Thu 23.01.2025 23:52:25 @clive @meganL @jasonkoebler @404mediaco It depends on the way the maze gets triggered. If robots.txt explicitly excludes a certain URL that is not directly linked from anywhere, and that URL sees a GET request, you can be 100% sure you have trapped a bogus crawler. Definitely go to town with it. Most crawlers are not that stupid, though, so the triggers you have to use get slightly less reliable, and there is a chance you trap a legitimate crawler, like an archiving site.
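The robots.txt trigger described above can be sketched in a few lines. Assumptions: the trap path `/do-not-crawl-7f3a/` and the helper names are hypothetical. The second function uses Python's standard `urllib.robotparser` to confirm that a crawler honoring robots.txt would never request the trap, which is what makes a hit on it such a strong signal.

```python
import urllib.robotparser

# Hypothetical trap path: disallowed in robots.txt and never linked from
# any page, so neither humans nor compliant crawlers should request it.
TRAP_PREFIX = "/do-not-crawl-7f3a/"
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PREFIX}\n"

def is_trap_hit(path: str) -> bool:
    """True if a request path lands in the robots.txt-excluded trap."""
    return path.startswith(TRAP_PREFIX)

def compliant_crawler_may_fetch(url: str, agent: str = "ExampleBot") -> bool:
    """What a robots.txt-respecting crawler would decide for `url`."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)
```

Serve `ROBOTS_TXT` at `/robots.txt` and treat any client for which `is_trap_hit` fires as a bogus crawler.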
Federation EN Thu 23.01.2025 21:00:04
Federation EN Thu 23.01.2025 21:03:02 @clive @jasonkoebler @404mediaco What a great idea.
Federation EN Thu 23.01.2025 21:03:04 @clive @jasonkoebler @404mediaco
Federation EN Thu 23.01.2025 21:08:47
Federation EN Thu 23.01.2025 21:20:01 @clive I once made a webpage that would continually, slowly send random words and links to itself, never quite closing the connection. It's honestly not worth the trouble. It'd be nice if it interfered with AI training, though.
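A slow-drip page like the one described can be sketched as a Python generator meant to feed a streaming (chunked) HTTP response: it emits a word or a self-link, sleeps, and never sends a closing tag. The `/drip` URL, the word list and `drip` itself are hypothetical, and the web-server wiring is left out.

```python
import random
import time
from typing import Iterator, Optional

WORDS = ["moss", "signal", "copper", "fable", "tide", "quartz", "hollow", "ridge"]

def drip(delay_s: float = 2.0, max_chunks: Optional[int] = None,
         rng: Optional[random.Random] = None) -> Iterator[bytes]:
    """Yield small HTML fragments slowly, never emitting a closing tag.

    Each chunk is a random word or (about 10% of the time) a link back
    into the same page; the sleep between chunks keeps the client's
    connection open. max_chunks=None means the stream never ends.
    """
    rng = rng or random.Random()
    yield b"<html><body><p>"
    sent = 0
    while max_chunks is None or sent < max_chunks:
        if rng.random() < 0.1:
            chunk = f'<a href="/drip?{rng.getrandbits(32):08x}">{rng.choice(WORDS)}</a> '
        else:
            chunk = rng.choice(WORDS) + " "
        yield chunk.encode()
        sent += 1
        time.sleep(delay_s)
```

One cost of this design is that every trapped client also holds a connection open on your own server, which may be part of why the poster found it not worth the trouble.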
Federation NL Thu 23.01.2025 23:57:19
Federation NL Fri 24.01.2025 00:50:26 @alterelefant That's tricky, since the crawlers have vast numbers of IP addresses. I just set traps to detect web spiders automatically, if traffic gets to be a problem.
Federation NL Fri 24.01.2025 07:20:17 @skybrook Don't filter by IP address; filter by behavior. I know, that's sometimes easier said than done. The following one is straightforward: a GET request to a bogus link in the infinite labyrinth qualifies for a labyrinth response, whether the IP address is known or new. With a labyrinth response I would throw in a random delay between 100 ms and 5 s, and a one-in-fifty chance of a 30 s delay before responding with an HTTP 503. That should usually be enough to slow down crawlers.
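The delay policy above can be written as a small function. This reads the post as "normally wait 0.1 to 5 s and serve the labyrinth page; one time in fifty, stall 30 s and then answer 503". Returning 200 in the normal case is an assumption, as is the name `labyrinth_delay`.

```python
import random
from typing import Optional, Tuple

def labyrinth_delay(rng: Optional[random.Random] = None) -> Tuple[float, int]:
    """Return (delay_seconds, http_status) for one labyrinth request.

    With probability 1/50: a 30 s stall followed by HTTP 503.
    Otherwise: a uniform random delay in [0.1 s, 5 s], then the page (200).
    """
    rng = rng or random.Random()
    if rng.randrange(50) == 0:            # one-in-fifty chance
        return 30.0, 503                  # stall, then "Service Unavailable"
    return rng.uniform(0.1, 5.0), 200     # normal labyrinth page after a short wait
```

In an asynchronous server the caller would `await asyncio.sleep(delay)` before responding, so a stalled crawler does not tie up a whole worker thread.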
Federation NL Fri 24.01.2025 17:53:01 @alterelefant Well, right, that's what I meant by "traps to detect." I didn't think of setting it so every URL for any detected IP address would become a labyrinth response... not a bad idea, really.
Federation EN Fri 24.01.2025 20:43:22 @skybrook Crawlers that use multiple endpoints to distribute the crawl load will hand out URLs to be crawled to those endpoints. The freshly acquired labyrinth links will make a new endpoint immediately identifiable.
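Combining the last two posts: once any client requests a labyrinth URL, every later request from that address gets the labyrinth, and a brand-new endpoint reveals itself the moment it fetches a maze link it was handed. The sketch below is one possible design, not something the posters specified. It signs maze links with an HMAC (hypothetical `SECRET` and `/maze/` prefix) so they can be recognized without storing every generated URL.

```python
import hashlib
import hmac

# Hypothetical server-side secret and prefix for labyrinth links.
SECRET = b"replace-me"
PREFIX = "/maze/"

def _sign(token: str) -> str:
    return hmac.new(SECRET, token.encode(), hashlib.sha256).hexdigest()[:16]

def make_maze_url(token: str) -> str:
    """Build a labyrinth link that the server can later recognize."""
    return f"{PREFIX}{token}-{_sign(token)}"

def is_maze_url(path: str) -> bool:
    """True only for links this server generated itself."""
    if not path.startswith(PREFIX):
        return False
    token, _, sig = path[len(PREFIX):].rpartition("-")
    return bool(token) and hmac.compare_digest(sig.encode(), _sign(token).encode())

flagged = set()  # client addresses that have requested a maze URL

def route(client_ip: str, path: str) -> str:
    """Return 'labyrinth' or 'real' for an incoming request."""
    if is_maze_url(path):
        # Only a crawler that followed (or was handed) a maze link asks
        # for one, so flag this endpoint, even if the IP is brand new.
        flagged.add(client_ip)
    return "labyrinth" if client_ip in flagged else "real"
```

One caveat: many users can share one address behind NAT, so flagging per IP can also catch people who happen to share that address.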
Federation EN Thu 23.01.2025 21:39:16 @clive @jasonkoebler @404mediaco Might be nice to add something to poison the data: contradictory statements, things that break the tokenizer, maybe subtle statistical tricks to inject gnarly statements.
Federation EN Thu 23.01.2025 22:29:04 @ThePowerNap @clive @jasonkoebler @404mediaco Like a little Markov chain text generator? Cheaper than an LLM, yet maybe good enough to pass for 'real' text in a training set.
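A "little Markov chain text generator" really is little. Below is a minimal order-2 word-level sketch: it records which word follows each pair of words in a source corpus, then random-walks those transitions. Which corpus to feed it is up to you. The function names are illustrative, and this is not taken from any of the tools mentioned in the thread.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Chain = Dict[Tuple[str, ...], List[str]]

def build_chain(text: str, order: int = 2) -> Chain:
    """Map each `order`-word prefix to the words observed right after it."""
    words = text.split()
    chain: Chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return dict(chain)

def generate(chain: Chain, n_words: int = 50,
             rng: Optional[random.Random] = None) -> str:
    """Random-walk the chain; restart from a random prefix at dead ends."""
    rng = rng or random.Random()
    prefixes = list(chain)             # requires a corpus longer than `order`
    state = rng.choice(prefixes)
    out = list(state)
    while len(out) < n_words:
        followers = chain.get(state)
        if not followers:              # dead end: jump to a fresh prefix
            state = rng.choice(prefixes)
            out.extend(state)
            continue
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out[:n_words])
```

Generation is just dictionary lookups and random choices, which is why it is far cheaper per page than running an LLM.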
Federation EN Thu 23.01.2025 23:07:45 @libroraptor @clive @jasonkoebler @404mediaco I like where your head is at
Federation EN Fri 24.01.2025 02:53:11 @ThePowerNap @clive @jasonkoebler @404mediaco my head's a lot like that pink labyrinth in the picture, to be honest, but with more dimensions and no indication of whether an entrance or exit exists or makes sense. I wish that I still had the brain capacity and physical energy to implement even half of the ideas that I come up with.
Federation EN Fri 24.01.2025 03:31:28 @libroraptor @ThePowerNap @jasonkoebler @404mediaco Markov mazes
Federation EN Fri 24.01.2025 09:03:26 @libroraptor @ThePowerNap @clive @jasonkoebler @404mediaco [f4mi] used wiki pages to which simplistic synonym substitution had been applied with a Python script
Federation EN Fri 24.01.2025 09:56:33 @bornach @ThePowerNap @clive @jasonkoebler @404mediaco That's very funny! Turning classic perverted SEO techniques to the greater good. Also a good presenter. I rarely manage to listen to YouTube talks – too much irrelevant babbling and metatalk, but this person has a clear narrative and stays on track.
Federation EN Thu 23.01.2025 21:42:35 @clive @jasonkoebler @404mediaco On one hand, if you want to protect your art or something similar from being scraped because you sell it, or just don't want your style to be stolen, it's nice to have such tools. But on the other hand, if you value human rights and let your values influence your texts and pictures, then you can influence AIs with your input. I hate that AI has a bad influence on the environment because it needs so many computing resources, but we cannot stop AI, so this can be a small influence from ourselves.
Federation EN Fri 24.01.2025 01:03:51 @chikl There is nothing stopping us, as a species, from deciding this isn't a good technological path and just unplugging it all. We made this thing and we can unmake it.
Federation EN Thu 23.01.2025 21:52:39 @clive @jasonkoebler @404mediaco the House of Leaves, but you can't leave
Federation EN Thu 23.01.2025 23:38:21 @bluecaller @clive @jasonkoebler @404mediaco The Hotel California
Federation EN Thu 23.01.2025 22:06:16 @404mediaco @clive @jasonkoebler Love this for the bots and scrapers. Choke on it.
Federation EN Thu 23.01.2025 22:13:09 @clive @jasonkoebler @404mediaco Brilliant idea! 😂 I can just imagine AI scrapers struggling to process an endless stream of random pages. It's like trolling at level 80. Mad respect to the hacker for the creativity!
Federation EN Thu 23.01.2025 22:14:22 @clive @jasonkoebler @404mediaco This makes me wonder if it would be possible to insert garbage into rendered HTML (to confuse bots) and something like Nightshade into the rendered page (to poison image downloading and screenshot OCR), both in ways that aren't distracting to human readers.
Federation HR Thu 23.01.2025 22:28:05 @clive @jasonkoebler @404mediaco I have difficulty understanding why someone would want to protect their own website from scraping. Isn't the primary reason for having a website the desire to be seen by the world and to spread particular information? I don't understand the fuss.
Federation NL Fri 24.01.2025 00:03:23 @matasoft @clive @jasonkoebler @404mediaco If all LLM platforms always provided direct links, it would indeed bring people to your site. But the fact is that most LLMs just steal your content without giving any credit. That's what this is for.
Federation EN Fri 24.01.2025 00:37:20 @matasoft @clive @jasonkoebler @404mediaco there are masses of AI crawler bots that can easily overwhelm a web site with traffic. They don’t throttle or follow limits or respect robots.txt or other conventions. They’ll easily overwhelm available bandwidth and take down a small web site. It’s 100% reasonable (and necessary) to consider them hostile.
Federation EN Fri 24.01.2025 07:49:36 @matasoft disrespectful scraping for LLMs removes attribution, is non-consensual, has no methods for rectification (misinfo), mixes your data with others' (sometimes criminally sourced), and is often implemented so badly that it causes load/stability/cost issues for the sites. And this is not "one shot", but happens all the time. No one wants this except the grifters.
Federation HR Fri 24.01.2025 08:55:38 @matasoft Assuming you were asking in good faith.
Federation HR Fri 24.01.2025 17:47:37 @econads @matasoft @jasonkoebler @404mediaco Yep, and some of the objection to mass-scraping-by-AI-firms is that the AI firms are not helping to provide new audiences for one's online writing. Quite the opposite, possibly ... ... given that as more people start "chatting with an LLM AI" instead of searching the web ... ... they become happy with the good-enough answer they get from the LLM, and never bother to go to the sites the answers are based upon
Federation HR Fri 24.01.2025 17:58:39 @clive @econads @jasonkoebler @404mediaco well, Perplexity AI, for example, provides citation links by default.
Federation EN Thu 23.01.2025 22:29:04 @clive @jasonkoebler @404mediaco not all heroes wear capes
Federation EN Thu 23.01.2025 22:38:14 @clive @jasonkoebler @404mediaco
Federation EN Thu 23.01.2025 22:41:19
Federation EN Thu 23.01.2025 22:56:50 @clive @jasonkoebler @404mediaco there was a time when people wanted their pages to be scraped and indexed. Balkanization of the Web. The battle for hegemony of information. Now we're injecting poison into the process. It's like chemotherapy.
Federation NL Fri 24.01.2025 00:05:14 @Qbitzerre @clive @jasonkoebler @404mediaco Indeed a good analogy: to get rid of the cancer that LLM training sets are to copyright.
Federation EN Thu 23.01.2025 23:02:10 @clive @jasonkoebler @404mediaco Daisy Daisy give me your ans w e r d o o
Federation EN Thu 23.01.2025 23:28:17 Does anyone know if these web crawlers also scrape the content of Kindle e-books? Can they get into these kinds of products?
Federation NL Fri 24.01.2025 00:06:41 @piperef @clive @jasonkoebler @404mediaco Bezos probably already sold everything to those AI houses.
Federation EN Fri 24.01.2025 00:23:59 @piperef @clive @jasonkoebler @404mediaco If it can be read, it can be scraped. You can mitigate the issue (often by putting it behind an account wall), but not eliminate it entirely. The film industry has been desperately trying to stop piracy and I have yet to see a situation where a movie was released but wasn't available on piracy sites. But also, yeah, if it's Kindle, it's probably already part of Amazon's AI dataset.
Federation EN Fri 24.01.2025 00:37:06 @StarkRG @piperef @jasonkoebler @404mediaco yeah, I am sure this is true. I read recently, though I can't find the source (still looking, will update if I can find it), that US AI firms used corpora of cracked Western ebooks that circulate in Russia etc. for training
Federation EN Fri 24.01.2025 09:57:05
Federation EN Fri 24.01.2025 17:39:36 @piperef @StarkRG @jasonkoebler @404mediaco yeah, alas
Federation EN Thu 23.01.2025 23:28:59 love it
Federation EN Thu 23.01.2025 23:47:09
Federation EN Thu 23.01.2025 23:59:58 @clive @jasonkoebler @404mediaco @piperef Maybe cross this tech with the Mandelbrot set to keep the AI web crawlers from detecting that they're being trapped. Is that possible, and would it work? https://duckduckgo.com/?t=ffab&q=The+Mandelbrot+Set&iax=images&ia=images Also, maybe what's needed is a way to make this technology more user-friendly and safe for average website owners/developers to use. What if there were a single "container" website used for this purpose, which all the links would point to and which would host the AI trap? What if that site were programmed to point the AI web crawlers back at the sites of the companies that deployed them, serving up infinite content from the very sites the crawlers were sent from in the first place?
Federation EN Fri 24.01.2025 00:09:33 @clive finally some good news
Federation EN Fri 24.01.2025 00:18:47 @clive @jasonkoebler @404mediaco And Google? Why do some people think that offering their server data to Google is OK, so they will be found, but competing search engines are evil?
Federation EN Fri 24.01.2025 00:22:59 @clive @jasonkoebler @404mediaco A curated list of strategies, offensive methods, and tactics for (algorithmic) sabotage, disruption, and deliberate poisoning.
Federation EN Fri 24.01.2025 00:24:04 @peterfr @jasonkoebler @404mediaco damn, I hadn't seen that, super fascinating! Thank you for pointing it out
Federation EN Fri 24.01.2025 00:23:59 @clive @jasonkoebler @404mediaco I recall a similar device I came across 30+ years ago that made dynamic lists of fake email addresses on a page with lots of internal links, designed to ensnare spam-harvesting bots. This looks very similar... 👍👍
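The spam-trap pages described here (and wpoison, linked further down) worked roughly like this: each page lists freshly generated fake addresses plus links to more such pages. A minimal sketch follows. The `/list/` path and function names are hypothetical. Using the reserved `.invalid` top-level domain is a deliberate choice here so no real mailbox ever receives spam; the historical tools did not necessarily do this.

```python
import random
import string

def fake_address(rng: random.Random) -> str:
    """A random, syntactically valid but undeliverable email address."""
    user = "".join(rng.choices(string.ascii_lowercase, k=rng.randint(5, 10)))
    host = "".join(rng.choices(string.ascii_lowercase, k=rng.randint(4, 8)))
    # .invalid is reserved (RFC 2606), so no real mailbox is ever hit.
    return f"{user}@{host}.invalid"

def trap_page(seed: int, n_addresses: int = 20, n_links: int = 5) -> str:
    """Deterministic page of fake mailto: links plus links to more pages."""
    rng = random.Random(seed)
    addrs = "".join(f'<li><a href="mailto:{a}">{a}</a></li>'
                    for a in (fake_address(rng) for _ in range(n_addresses)))
    links = "".join(f'<li><a href="/list/{rng.getrandbits(32)}">more</a></li>'
                    for _ in range(n_links))
    return f"<html><body><ul>{addrs}</ul><ul>{links}</ul></body></html>"
```

A harvester that follows the "more" links keeps collecting addresses that bounce, which is the same trap-by-abundance idea as the AI maze.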
Federation EN Fri 24.01.2025 00:37:44 @HybridElephant @jasonkoebler @404mediaco yes!! Some folks mentioned this elsewhere in the thread. I'd not heard about this back in the day, but it makes sense someone did this
Federation EN Fri 24.01.2025 00:28:12 @clive "so the crawler gets stuck crawling and scraping endless and meaningless pages" So, like if it landed on the Daily Mail site on any normal day...
Federation EN Fri 24.01.2025 00:35:16 lol yes. Or, really, TikTok or Facebook or any algorithmically-juked social media feed
Federation EN Fri 24.01.2025 00:48:32 @clive @jasonkoebler @404mediaco So, let me get this straight: he created a possible way to stop AIs, that would also stop search engines from indexing his site, even though being indexed is usually what a publicly accessible site wants. Somebody who, by their claim, is an AI CEO says that it'd be easy to avoid, and he counters that Google did not avoid it. So, basically, he succeeded in stopping search engines, but not AI bots. Nice work.
Federation EN Fri 24.01.2025 00:55:39 @clive @jasonkoebler @404mediaco That's a great idea 👍
Federation EN Fri 24.01.2025 00:58:13 @clive @jasonkoebler @404mediaco I just serve them a 20 GB text file with some sort of Lorem Ipsum.
Federation · Fri 24.01.2025 01:21:13 @clive @jasonkoebler @404mediaco what a waste of CPU and bandwidth and electricity
Federation EN Fri 24.01.2025 01:33:05
Federation EN Fri 24.01.2025 01:37:39 @clive @jasonkoebler @404mediaco I love this from the perspective of tanking AI, but hate it from the perspective of the massive amounts of water a process like this must use.
Federation EN Fri 24.01.2025 01:53:42 @clive @maximum_mew @jasonkoebler @404mediaco I'd like to be able to install this on every website I run.
Federation EN Fri 24.01.2025 01:58:28 @mlanger @maximum_mew @jasonkoebler @404mediaco it'd be pretty funny to see it offered as a one-click option on a hosting provider
Federation EN Fri 24.01.2025 03:07:37 @clive @maximum_mew @jasonkoebler @404mediaco Sure would. I'd pay to click.
Federation EN Fri 24.01.2025 03:31:06
Federation EN Fri 24.01.2025 02:07:10 I did something like this 20+ years ago: a simple Perl CGI to generate completely random text. It was just a tiny personal site, but occasionally something would walk into it. I'm recreating the site now and just have a small loop of pages. There is a period at the end of one sentence that links into a slightly different loop. Occasionally I see what looks like a real user going through the loop. Other times the hidden URL gets hit, likely by a web crawler. I have slightly different loops for my default page versus my named websites. Most scanning seems to come via IP address. A tiny bit of traffic comes in using the name in a self-signed cert, which was briefly used as the default cert.
Federation EN Fri 24.01.2025 03:33:40 @sab38 @jasonkoebler @404mediaco right on!
Federation EN Fri 24.01.2025 02:36:54 @clive @jasonkoebler @404mediaco How do I get one? I love the mere concept!
Federation EN Fri 24.01.2025 03:32:13 @NoctisEqui @jasonkoebler @404mediaco I think it requires some knowledge of server-side coding to really implement these, but a hosting provider could, if it wanted to, make it one-click installable
Federation EN Mon 27.01.2025 23:23:06 @clive @jasonkoebler @404mediaco I figured that. It is a wonderful notion tho! Here’s to our heroes, fighting on the Tech Front!
Federation EN Fri 24.01.2025 03:23:45 @clive @jasonkoebler @404mediaco won't be long before AI tech bros pay the governments to get laws passed making it illegal to stop them scraping sites or for anyone to poison their data sets etc., except if we try scraping their platforms. Similar to how it is illegal for us to use copyrighted data, but for them it is a free pass, even without any exemptions.
Federation EN Fri 24.01.2025 03:30:57 @htpcnz @jasonkoebler @404mediaco yeah, I can see that happening
Federation EN Fri 24.01.2025 04:35:18 @clive @jasonkoebler @404mediaco So it's wpoison for AI training bots? Truly there's nothing new under the sun. https://web.archive.org/web/20160821195248/http://www.monkeys.com:80/wpoison/
Federation EN Fri 24.01.2025 17:51:45
Federation EN Fri 24.01.2025 05:20:10 @clive How cool is that!
Federation EN Fri 24.01.2025 17:51:39 trippy, eh?
Federation EN Fri 24.01.2025 06:00:25 @clive @jasonkoebler @404mediaco If I could love this post one million times, I would. It is pretty clear now that beyond the pathetic and socially awkward filthy rich human oligarchs you saw on TV this week, AI has become the real enemy. Overfeed it and confuse it. Sounds like we're going to have to recreate Jorge Luis Borges' "The Library of Babel": an infinite word labyrinth where the machine loses its mind, alone and vanquished at last. How poetic. And what a trip.
Federation EN Fri 24.01.2025 17:51:10 @maniandthenonos @jasonkoebler @404mediaco this really is poetry, right here: digital-age poetry, autogenerated as a defense mechanism. Borges would be giving this stuff a double-shooty-fingers
Federation EN Fri 24.01.2025 20:41:21 @clive @jasonkoebler @404mediaco I would love to see this headline, "Borges gives the double bird to Sam Altman from beyond". My grandfather fought against the Nazis. We'll be fighting against AI.
Federation EN Fri 24.01.2025 06:35:48 @clive @syncros @jasonkoebler @404mediaco This is our future: road cones set on the hoods of Waymos
Federation EN Fri 24.01.2025 08:55:44 @clive @jasonkoebler @404mediaco I can't find the link right now, but there are tools that instead of that, generate garbage for AI scrapers, feeding them nonsense "human" text. That way, they don't see anything wrong (it doesn't slow or block the spiders) but the data they get is bullshit.
Federation EN Fri 24.01.2025 17:44:28 @twit_terrorist @jasonkoebler @404mediaco yes!! There's a cool list of them here: https://saturation.social/@clive/113880275878023908
Federation EN Fri 24.01.2025 18:45:05
Federation EN Fri 24.01.2025 20:36:32 @clive @jasonkoebler @404mediaco Put “No unauthorised access” on the landing page and just sit back as they ignore it.