Federation EN Thu 23.01.2025 20:55:08 @clive @jasonkoebler @404mediaco I think this stuff has been catching archiving sites too, though. So you try to archive a site for accountability purposes, and it can't be archived because the page just churns forever.
Federation EN Thu 23.01.2025 21:59:31 @meganL @jasonkoebler @404mediaco That danger leapt out at me too.
Federation NL Thu 23.01.2025 23:52:25 @clive @meganL @jasonkoebler @404mediaco It depends on how the maze gets triggered. If robots.txt explicitly excludes a URL that is not directly linked from anywhere, and that URL sees a GET request, you can be 100% sure you have trapped a bogus crawler. Definitely go to town with it. Most crawlers are not that stupid, though, so the other triggers are slightly less reliable, and there is a chance you trap a legitimate crawler, like an archiving site.
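A minimal sketch of the honeypot logic described in the post above, with hypothetical path and function names (the trap URL and helper are illustrative assumptions, not from any specific tarpit implementation):

```python
# Sketch of a robots.txt honeypot trap: robots.txt disallows a path
# that no page links to, so any request hitting that path must come
# from a crawler that ignored robots.txt.

TRAP_PATH = "/no-crawl/maze/"  # hypothetical trap URL, linked from nowhere

ROBOTS_TXT = f"""User-agent: *
Disallow: {TRAP_PATH}
"""


def is_bogus_crawler(request_path: str) -> bool:
    """A hit on the trap path is a high-confidence signal:
    the URL is unlinked AND explicitly excluded in robots.txt,
    so a well-behaved crawler has no way to reach it."""
    return request_path.startswith(TRAP_PATH)


# Example: requests to the trap path get flagged; normal pages do not.
print(is_bogus_crawler("/no-crawl/maze/page1"))  # True
print(is_bogus_crawler("/index.html"))           # False
```

The reliability hinges on both conditions holding at once: looser triggers (e.g. request rate or user-agent strings alone) are where legitimate crawlers such as archivers can get caught.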