Regularly when we launch a new site we also put up a static scrape of the old site so that users can access content that has been removed from the new site. Commonly this is old news content that the client wants to keep available for a transition period, but doesn’t want cluttering up the new site.
This is fairly straightforward – feed an application such as SiteSucker a URL, tweak a few settings and let it loose. The resulting files can be dropped into a
/archive/ folder and a link put on the site to direct users there for old content.
The most common hitch we hit is when the old site was hosted on a Windows server, as IIS is, largely, case insensitive. This means that differently capitalised versions of a path such as
/FolderOne/image.jpg will all resolve to the same file. Apache, however, is case sensitive and will treat each version as a different path, resulting in broken links in our archive. This wouldn’t necessarily be a problem, but the fact that IIS doesn’t mind means that developers and site maintainers can be a little sloppy with their capitalisation, and there are often multiple versions in use.
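To get a feel for how widespread the problem is before changing any server settings, the links in the scrape can be grouped by their lowercased form. The following is only a rough sketch: the function name mixed_case_paths is made up for illustration, and it assumes one link path per line on standard input.

```shell
#!/bin/sh
# Hypothetical helper (not part of SiteSucker or Apache): reads one link
# path per line on stdin and prints, in lowercase, every path that was
# referenced with more than one capitalisation.
# Steps: drop exact duplicates, fold to lowercase, then keep only the
# lowercased paths that still appear more than once.
mixed_case_paths() {
    LC_ALL=C sort -u | tr '[:upper:]' '[:lower:]' | LC_ALL=C sort | uniq -d
}

# Example use, assuming the scrape lives in ./archive and links are written
# as href="..." or src="...":
#   grep -rhoE '(href|src)="[^"]*"' archive/ \
#       | sed -E 's/^(href|src)="//; s/"$//' \
#       | mixed_case_paths
```

Any path it prints is one that the old site linked to with inconsistent capitalisation, and so is a likely broken link under Apache.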
Configuring Apache to be case insensitive
I personally consider case sensitivity to be appropriate normally, but it is possible to allow Apache to match case insensitively using
mod_speling (and no, I haven’t mistyped ‘spelling’…).
Mod_speling is included as part of the standard Apache module bundle, but may not be activated. You can check with
apache2ctl -M and if you don’t see it, enable it with
a2enmod speling and then restart Apache with
service apache2 restart (these commands might vary depending on which OS you are running).
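Strung together, the check and the enable step might look something like the sketch below. It assumes a Debian or Ubuntu style setup where apache2ctl, a2enmod and service are available, and that it is run as root; the speling_loaded helper is just an illustrative name.

```shell
#!/bin/sh
# Illustrative helper: succeeds if the module list on stdin mentions
# mod_speling, which `apache2ctl -M` reports as "speling_module".
speling_loaded() {
    grep -q 'speling_module'
}

# Do nothing on machines without apache2ctl (Debian/Ubuntu tooling assumed).
if command -v apache2ctl >/dev/null 2>&1; then
    if apache2ctl -M 2>/dev/null | speling_loaded; then
        echo "mod_speling is already enabled"
    else
        # Enable the module, then restart Apache so it gets loaded.
        a2enmod speling && service apache2 restart
    fi
fi
```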
Now it’s available for use, it needs to be enabled within your vhost configuration or in .htaccess. Since I only want it to affect
/archive/ I created a
.htaccess file there and added:
<IfModule mod_speling.c>
    CheckCaseOnly on
    CheckSpelling on
</IfModule>
Counterintuitively, both directives are needed: CheckSpelling on switches mod_speling on, and CheckCaseOnly on limits it to fixing case mismatches rather than attempting to correct misspellings – there’s a full explanation of how to use mod_speling in the official documentation for it.
Problems with rewrites
This works fine, unless the root site is using rewrites – as almost any installation of a CMS will be. This is because
mod_rewrite stops mod_speling from working; but as long as you don’t need rewrites in
/archive/ the solution is simple – turn off the rewrite engine in that directory. Just add
RewriteEngine off to the start of
/archive/.htaccess before the mod_speling block, so that the whole file reads:
RewriteEngine off
<IfModule mod_speling.c>
    CheckCaseOnly on
    CheckSpelling on
</IfModule>