Kurt McKee

lessons learned in production

Friends with robots

Posted 5 March 2020 in website

I installed a static log analyzer, goaccess, and instantly realized that bots and spiders have been trying to access robots.txt. That file doesn't exist on my site, so they're being served a pretty 404 page.

To prevent a bunch of unnecessary HTTP 404 responses and wasted bandwidth, I've created a blank robots.txt file and used Pelican's support for static content to put the file at the root of the site. Here's an excerpt from my Pelican configuration:

STATIC_PATHS = ['static/robots.txt']
EXTRA_PATH_METADATA = {
    'static/robots.txt': {'path': 'robots.txt'},
}

With this in place, I should see a drop in 404 errors and a tiny reduction in bandwidth!
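As a minimal sketch of why a blank file is safe to serve, the snippet below uses Python's standard urllib.robotparser module to check that an empty robots.txt places no restrictions on crawlers. The helper name is_allowed and the example.com URL are illustrative assumptions, not part of the Pelican configuration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical URL, used only for illustration.
page = "https://example.com/any/page.html"

# A blank robots.txt contains no rules, so every path is allowed.
blank_allows = is_allowed("", page)

# For contrast, a file that disallows everything for all user agents.
strict_allows = is_allowed("User-agent: *\nDisallow: /", page)

print(blank_allows, strict_allows)
```

The blank file behaves the same as having no rules at all, so the only change crawlers should notice is an HTTP 200 response in place of the 404.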

Further reading: goaccess, Robots exclusion standard