Kurt McKee

lessons learned in production


Hey there! This article was written in 2020.

It might not have aged well for any number of reasons, so keep that in mind when reading (or clicking outgoing links!).

Friends with robots

Posted 5 March 2020 in website

I installed goaccess, a static log analyzer, and immediately noticed that bots and spiders have been requesting robots.txt...except that file doesn't exist on my site, so they're being served a pretty 404 page.
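goaccess reports this for you. As a rough illustration of what it is counting, here is a minimal Python sketch that tallies 404 responses for /robots.txt and the bytes spent serving them. It assumes the server writes the Common or Combined Log Format (the default for many nginx and Apache setups). The function name `robots_404_stats` and the sample log lines are hypothetical, made up for illustration.

```python
import re
from typing import Iterable, Tuple

# Captures the request line, status code, and response size from a
# Common/Combined Log Format entry (format assumed, not taken from the post).
LOG_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def robots_404_stats(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of 404s for /robots.txt, total bytes sent for them)."""
    hits = 0
    total_bytes = 0
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # skip lines that don't look like access-log entries
        path = match.group("path").split("?", 1)[0]  # ignore any query string
        if path == "/robots.txt" and match.group("status") == "404":
            hits += 1
            size = match.group("size")
            total_bytes += 0 if size == "-" else int(size)  # "-" means no body
    return hits, total_bytes


if __name__ == "__main__":
    # Hypothetical sample lines (documentation IPs, made-up sizes and agents).
    sample = [
        '203.0.113.5 - - [05/Mar/2020:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 404 4096 "-" "ExampleBot/1.0"',
        '198.51.100.7 - - [05/Mar/2020:10:05:00 +0000] "GET /index.html HTTP/1.1" 200 8192 "-" "Mozilla/5.0"',
        '192.0.2.9 - - [05/Mar/2020:10:10:00 +0000] "GET /robots.txt HTTP/1.1" 404 4096 "-" "OtherSpider/2.0"',
    ]
    hits, total = robots_404_stats(sample)
    print(f"{hits} robots.txt 404s, {total} bytes")  # prints "2 robots.txt 404s, 8192 bytes"
```

Every one of those bytes is the styled 404 page, which is the bandwidth a blank robots.txt avoids.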

To prevent a bunch of unnecessary HTTP 404 responses and wasted bandwidth, I've created a blank robots.txt file and used Pelican's support for static content to put the file in the right place. Here's an excerpt from my Pelican configuration:

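# Copy static/robots.txt from the content directory into the output...
STATIC_PATHS = ['static/robots.txt']
# ...and publish it at the site root, so it is served as /robots.txt.
EXTRA_PATH_METADATA = {
    'static/robots.txt': {'path': 'robots.txt'},
}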

With this in place, I should see a drop in 404 errors and a tiny reduction in bandwidth!
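Under the Robots exclusion standard, a robots.txt with no rules disallows nothing, so the blank file shouldn't change what crawlers are allowed to fetch; it only swaps the 404 page for an empty response. Python's standard `urllib.robotparser` module can confirm this. The sketch below is a minimal check; the helper name `allows` and the example.com URLs are hypothetical placeholders.

```python
from urllib.robotparser import RobotFileParser


def allows(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # A blank file contains no rules, so nothing is disallowed.
    print(allows("", "https://example.com/any/page.html"))  # True
    # For contrast: a file that blocks every crawler from the whole site.
    print(allows("User-agent: *\nDisallow: /", "https://example.com/any/page.html"))  # False
```

For what it's worth, `RobotFileParser.read()` also treats most 4xx responses for robots.txt as "allow everything", which matches how crawlers generally handle the missing file; the blank file just gets there without the 404.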

Further reading: goaccess, Robots exclusion standard
