Now that we know AI bots will ignore robots.txt and churn residential IP addresses to scrape websites, does anyone know of a method to block them that doesn’t entail handing over your website to Cloudflare?
Now that we know AI bots will ignore robots.txt and churn residential IP addresses to scrape websites, does anyone know of a method to block them that doesn’t entail handing over your website to Cloudflare?
If I’m reading your link right, they are using user agents. Granted there’s a lot. Maybe you could whitelist user agents you approve of? Or one of the commenters had a list that you could block. Nginx would be able to handle that.
Thank you for the reply, but at least one commenter claims they’ll impersonate Chrome UAs.