Federation EN Thu 23.01.2025 20:55:08 @clive @jasonkoebler @404mediaco I think this stuff has been catching archiving sites too, though. So you try to archive a site for accountability purposes, and it can't be archived because the page just churns forever.
Federation EN Thu 23.01.2025 21:59:31 @meganL @jasonkoebler @404mediaco That danger leapt out at me too.
Federation NL Thu 23.01.2025 23:52:25 @clive @meganL @jasonkoebler @404mediaco It depends on how the maze gets triggered. If robots.txt explicitly excludes a URL that is not directly linked from anywhere, and that URL sees a GET request, you can be 100% sure you have trapped a bogus crawler. Definitely go to town with it. Most crawlers are not that stupid, though, so the other triggers are slightly less reliable, and there is a chance you trap a legitimate crawler, like an archiving site.
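A minimal sketch of the honeypot logic described in the post above, with hypothetical path and function names (the trap URL and helper are illustrative assumptions, not from any specific tarpit implementation):

```python
# Sketch of a robots.txt honeypot trap: robots.txt disallows a path
# that no page links to, so any request hitting that path must come
# from a crawler that ignored robots.txt.

TRAP_PATH = "/no-crawl/maze/"  # hypothetical trap URL, linked from nowhere

ROBOTS_TXT = f"""User-agent: *
Disallow: {TRAP_PATH}
"""


def is_bogus_crawler(request_path: str) -> bool:
    """A hit on the trap path is a high-confidence signal:
    the URL is unlinked AND explicitly excluded in robots.txt,
    so a well-behaved crawler has no way to reach it."""
    return request_path.startswith(TRAP_PATH)


# Example: requests to the trap path get flagged; normal pages do not.
print(is_bogus_crawler("/no-crawl/maze/page1"))  # True
print(is_bogus_crawler("/index.html"))           # False
```

The reliability hinges on both conditions holding at once: looser triggers (e.g. request rate or user-agent strings alone) are where legitimate crawlers such as archivers can get caught.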