Friends with robots
Posted 5 March 2020 in website

I installed a static log analyzer, goaccess, and instantly realized that bots and spiders have been trying to access robots.txt... except that file doesn't exist on my site, so they're being served a pretty 404 page.
To prevent a bunch of unnecessary HTTP 404 responses and wasted bandwidth, I've created a blank robots.txt file and used Pelican's support for static content to put the file in the right place. Here's an excerpt from my Pelican configuration:
STATIC_PATHS = ['static/robots.txt']
EXTRA_PATH_METADATA = {
    'static/robots.txt': {'path': 'robots.txt'},
}
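These two settings tell Pelican to copy static/robots.txt into the build output and to publish it at the site root as /robots.txt instead of under the static/ prefix. A blank robots.txt places no restrictions on crawlers; if you'd rather be explicit, the equivalent spelled-out file under the robots exclusion standard looks like this (an empty Disallow value means nothing is off-limits):

User-agent: *
Disallow: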
With this in place, I should see a drop in 404 errors and a tiny reduction in bandwidth!
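Once the change is deployed, a quick sanity check confirms the file is actually being served (a minimal sketch; example.com stands in for my real domain):

import urllib.request

# Fetch /robots.txt from the live site. urlopen raises HTTPError on a 404,
# so reaching the print at all means the blank file is in place.
with urllib.request.urlopen('https://example.com/robots.txt') as resp:
    print(resp.status)  # expect 200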
Further reading: goaccess, Robots exclusion standard