

A friend (works in IT, but asks me about server related things) of a friend (not in tech at all) has an incredibility low traffic niche forum. It was running really slow (on shared hosting) due to bots. The forum software counts unique visitors per 15 mins and it was about 15k/15 mins for over a week. I told him to add Cloudflare. It dropped to about 6k/15 mins. We excitemented turning Cloudflare off/on and it was pretty consistent. So then I put Anubis on a server I have and they pointed the domain to my server. Traffic drops to less than 10/15 mins. I’ve been experimenting with toggling on/off Anubis/Cloudflare for a couple months now with this forum. I have no idea how the bots haven’t scrapped all of the content by now.
TLDR: in my single isolated test, Cloudflare blocks 60% of crawlers. Anubis blocks presumably all of them.
Also if anyone active on Lemmy runs a low traffic personal site and doesn’t know how or can’t run Anubis (eg shared hosting), I have plenty of excess resources I can run Anubis for you off one of my servers (in a data center) at no charge (probably should have some language about it not being perpetual, I have the right to terminate without cause for any reason and without notice, no SLA, etc). Be aware that it does mean HTTPS is terminated at my Anubis instance, so I could log/monitor your traffic if I wanted as well, so that’s a risk you should be aware of.
It could be, but they seem to get through Cloudflare’s JS. I don’t know if that’s because Cloudflare is failing to flag them for JS verification or if they specifically implement support for Cloudflare’s JS verification since it’s so prevalent. I think it’s probably due to an effective CPU time budget. For example, Google Bot (for search indexing) runs JS for a few seconds and then snapshots the page and indexes it in that snapshot state, so if your JS doesn’t load and run fast enough, you can get broken pages / missing data indexed. At least that’s how it used to work. Anyway, it could be that rather than a time cap, the crawlers have a CPU time cap and Anubis exceeds it whereas Cloudflare’s JS doesn’t – if they did use a cap, they probably set it high enough to bypass Cloudflare given Cloudflare’s popularity.