Föderation NL Fr 24.01.2025 00:50:26 @alterelefant That's tricky, since the crawlers have vast numbers of IP addresses. I just set traps to detect web spiders automatically, in case traffic gets to be a problem.
Föderation NL Fr 24.01.2025 07:20:17 @skybrook Don't filter by IP address, filter by behavior. I know, that's sometimes easier said than done, but this one is straightforward: a GET request to a bogus link in the infinite labyrinth qualifies for a labyrinth response, whether the IP address is known or a new one. With a labyrinth response I would throw in a random delay between 100 ms and 5 s, and a one-in-fifty chance of a 30 s delay before responding with an HTTP 503. That should usually be enough to slow down crawlers.
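The delay scheme above could be sketched roughly like this (a minimal illustration, not anyone's actual server code; the interpretation that the ordinary case serves a normal labyrinth page and only the one-in-fifty case ends in a 503 is an assumption):

```python
import random

def plan_labyrinth_response():
    """Pick (delay_seconds, http_status) for a request caught in the labyrinth.

    Sketch of the scheme described above: most requests get a random
    delay between 100 ms and 5 s before a normal labyrinth page (200);
    one in fifty gets a 30 s stall followed by an HTTP 503.
    """
    if random.randint(1, 50) == 1:
        return 30.0, 503
    return random.uniform(0.1, 5.0), 200
```

The actual sleep and response would then be handled by whatever web framework serves the labyrinth pages.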
Föderation NL Fr 24.01.2025 17:53:01 @alterelefant Well right, that's what I meant by "traps to detect." I hadn't thought of making every URL for a detected IP address return a labyrinth response... not a bad idea, really.
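The idea of flagging an address on its first trap hit and then serving it nothing but labyrinth from that point on could look roughly like this (a hypothetical sketch; the `/labyrinth/` path prefix and the in-memory set are assumptions, a real server would persist and expire the flags):

```python
# Assumed path prefix under which the bogus trap links live.
TRAP_PREFIX = "/labyrinth/"

# Addresses that have ever followed a bogus link. In-memory only;
# a real deployment would want expiry so flags don't grow forever.
trapped_ips: set[str] = set()

def should_serve_labyrinth(ip: str, path: str) -> bool:
    """Flag the client on any trap hit; from then on, every URL it
    requests gets a labyrinth response, detected by behavior rather
    than by a pre-built IP blocklist."""
    if path.startswith(TRAP_PREFIX):
        trapped_ips.add(ip)
    return ip in trapped_ips
```

A request handler would call this first and branch to the labyrinth generator whenever it returns True.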
Föderation EN Fr 24.01.2025 20:43:22 @skybrook Crawlers that use multiple endpoints to distribute the crawl load will hand out URLs to be crawled to those endpoints. Their freshly acquired labyrinth links will make a new endpoint immediately identifiable.