Normalizing URLs
Posted 1 June 2010 in urlnorm
Last weekend I started coding up a URL normalizer. The idea is that it will take a URL and convert it to its canonical form, so that http://KURTMCKEE.ORG:80 becomes http://kurtmckee.org/. Most of that first weekend went into coding up some proof-of-concept functions. This weekend I added IP address conversion math to accommodate all of the degenerate forms that IP addresses can take. I also kicked out some unit tests and figured it was time to push the code to GitHub. Check it out.
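To make the idea concrete, here is a rough sketch of the kind of canonicalization described above. It is not the code from the repository; the names `normalize`, `parse_ipv4`, and `DEFAULT_PORTS` are made up for illustration, and the IP handling assumes the classic `inet_aton`-style rules (decimal, octal, or hex parts, with the last part filling any remaining bytes).

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical table of default ports; only two schemes are covered here.
DEFAULT_PORTS = {"http": 80, "https": 443}

# One part of a degenerate IPv4 address: hex, octal, or decimal.
_PART = re.compile(r"0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*")


def parse_ipv4(host):
    """Return the dotted-quad form of an inet_aton-style address, else None."""
    parts = host.split(".")
    if not 1 <= len(parts) <= 4:
        return None
    nums = []
    for part in parts:
        if not _PART.fullmatch(part):
            return None
        if part[:2].lower() == "0x":
            nums.append(int(part, 16))
        elif part.startswith("0"):
            nums.append(int(part, 8))
        else:
            nums.append(int(part, 10))
    *leading, last = nums
    # Every leading part is a single byte; the last part fills the rest.
    tail_bytes = 4 - len(leading)
    if any(n > 255 for n in leading) or last >= 256 ** tail_bytes:
        return None
    value = 0
    for n in leading:
        value = (value << 8) | n
    value = (value << (8 * tail_bytes)) | last
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def normalize(url):
    """Lowercase scheme and host, drop a default port, ensure a path.

    Simplification: userinfo and IPv6 literals are not handled.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit already lowercases hostname
    host = parse_ipv4(host) or host
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    return urlunsplit((scheme, netloc, parts.path or "/", parts.query, parts.fragment))
```

With this sketch, `normalize("http://KURTMCKEE.ORG:80")` gives `http://kurtmckee.org/`, and oddball hosts such as `0x7f.1` or `2130706433` both come out as `127.0.0.1`.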
This isn't a release announcement; the software is sketchy at best right now, and won't function properly. The only thing it can do competently is manipulate IP addresses! However, my vision is ultimately to have a tool that can also strip out query variables that add visit-tracking cruft to the URL, such as those added by Feedburner.
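The tracking-cruft idea might look something like the sketch below. It assumes the cruft arrives as `utm_`-prefixed query variables (for example `utm_source=feedburner`); that prefix list and the name `strip_tracking` are my own placeholders, not anything the project defines yet.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: tracking variables can be recognized by these prefixes.
TRACKING_PREFIXES = ("utm_",)


def strip_tracking(url):
    """Return url with tracking query variables removed, others kept in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Note that `urlencode` re-encodes the surviving variables, so a real implementation would need to decide whether that counts as part of normalization or as an unwanted side effect.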