Thursday, June 23, 2005

Mod_rewrite makes linking easy

Abbreviate URLs with mod_rewrite

Called the "Swiss Army knife" of Apache modules, mod_rewrite can be used for everything from URL rewriting to load balancing. Where mod_rewrite and similar modules really shine is in abbreviating and rewriting URLs. One of the most effective optimization techniques available to web developers, URL abbreviation substitutes short URLs like "/r/pg" for longer ones like "/programming" to save space. Apache, IIS, Manila, and Zope all support this technique. Yahoo.com, WebReference.com, and other popular sites use URL abbreviation to shave 20% to 30% off HTML file size. The more links a page has, the more effective this technique becomes.

How mod_rewrite Works

As its name implies, mod_rewrite rewrites URLs using regular expression pattern matching. If a URL matches a pattern that you specify, mod_rewrite rewrites it according to the rules and conditions that you set. In effect, mod_rewrite works as a smart abbreviation expander. Take the WebReference.com example above. To expand "/r/pg" into "/programming", Apache requires two directives: one turns on the rewriting engine (RewriteEngine On), and the other specifies the pattern-matching rule (RewriteRule). The RewriteRule syntax looks like this:
RewriteRule Pattern Substitution [Flags]

For our example, this becomes:
RewriteEngine On
RewriteRule ^/r/pg(.*) /programming$1

This regular expression matches a URL that begins with /r/ (we chose this prefix to signal an abbreviation to expand), followed immediately by "pg". The pattern (.*) matches zero or more characters after the "pg" and captures them, and $1 in the substitution inserts that captured text. So when a request comes in for /r/pg/perl/, the rewrite rule expands this abbreviated URI into /programming/perl/.
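
To see what the rule does without touching a server, here's a minimal Python sketch that applies the same pattern with the standard re module. It's an emulation under simplifying assumptions: it only looks at the URL path (no query strings or other Apache context), and the function name expand is made up for illustration. Note that Python writes the backreference as \1 where mod_rewrite uses $1.

```python
import re

# Emulates:  RewriteRule ^/r/pg(.*) /programming$1
# Python's re module writes the backreference as \1 instead of $1.
PATTERN = re.compile(r"^/r/pg(.*)")
SUBSTITUTION = r"/programming\1"


def expand(path: str) -> str:
    """Return the rewritten URL path, or the path unchanged if the rule doesn't match."""
    # The ^ anchor means the pattern can match at most once per path.
    return PATTERN.sub(SUBSTITUTION, path, count=1)


if __name__ == "__main__":
    for p in ("/r/pg/perl/", "/r/pg", "/dhtml/index.html"):
        print(p, "->", expand(p))
```

The second case shows that (.*) also matches an empty string, so /r/pg on its own expands to /programming, while paths that don't start with /r/pg pass through untouched.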

RewriteMap for Multiple Abbreviations
That works well for a few abbreviations, but what if you have lots of links? That's where the RewriteMap directive comes in. A RewriteMap groups multiple lookup keys (abbreviations) and their expanded values into one file; in the plain-text format, each line holds a key and its value separated by whitespace, such as a tab. Here's a snippet of the map file from WebReference.com:
d dhtml/
dc dhtml/column
pg programming
h html/
ht html/tools/
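
If you want to check a map file before Apache loads it, a few lines of Python can read it the same way: one key and one value per line, separated by spaces or tabs. This is a hypothetical helper (parse_txt_map is our name, not part of Apache). It skips blank lines and lines starting with '#', and it keeps the first occurrence of a duplicate key, assuming a top-to-bottom lookup stops at the first match.

```python
from __future__ import annotations

# The snippet from the article, as it might appear in the text map file.
SAMPLE = """\
d  dhtml/
dc dhtml/column
pg programming
h  html/
ht html/tools/
"""


def parse_txt_map(text: str) -> dict[str, str]:
    """Parse 'key value' lines into a dict (hypothetical helper)."""
    mapping: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '#' comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()  # any run of spaces or tabs separates fields
        if len(fields) < 2:
            continue  # a key with no value is ignored in this sketch
        # Keep the first occurrence of a key, as a top-to-bottom scan would.
        mapping.setdefault(fields[0], fields[1])
    return mapping


if __name__ == "__main__":
    for key, value in parse_txt_map(SAMPLE).items():
        print(f"{key:>3} -> {value}")
```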

Inside a RewriteRule, you look up a key in the map with the following syntax:
${ MapName : LookupKey | DefaultValue }

Using a map requires a generalized RewriteRule: instead of a hard-coded value, its substitution references the map by name. If the lookup key matches, the mapping function inserts the expanded value into the substitution string. If there's no match, it inserts the default value, or an empty string if no default is given.
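
The lookup syntax can be mimicked in Python to show when the default kicks in. This sketch is hypothetical (expand_map_refs and the dict-of-dicts layout for maps are our own), and it assumes backreferences such as $1 have already been filled in by the time the ${...} lookup runs.

```python
from __future__ import annotations

import re
from typing import Mapping

# Matches ${ name : key } or ${ name : key | default }, with optional spaces.
MAP_REF = re.compile(r"\$\{\s*(\w+)\s*:\s*([^|}]*?)\s*(?:\|\s*([^}]*?)\s*)?\}")


def expand_map_refs(template: str, maps: Mapping[str, Mapping[str, str]]) -> str:
    """Replace each ${name:key|default} with the mapped value (hypothetical helper)."""

    def replace(m: re.Match) -> str:
        name, key, default = m.group(1), m.group(2), m.group(3)
        value = maps[name].get(key)  # an unknown map name raises KeyError here
        if value is not None:
            return value
        # No match: use the default, or an empty string if none was given.
        return default if default is not None else ""

    return MAP_REF.sub(replace, template)


if __name__ == "__main__":
    maps = {"abbr": {"pg": "programming"}}
    print(expand_map_refs("${abbr:pg}", maps))
    print(expand_map_refs("${ abbr : zz | /index.html }", maps))
```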

To use this map, we need a RewriteMap directive that tells Apache where the map file lives, plus a generalized regular expression for our RewriteRule:
RewriteEngine On
RewriteMap abbr txt:/www/misc/redir/abbr_webref.txt
RewriteRule ^/r/([^/]*)/?(.*) ${abbr:$1}$2 [redirect=permanent,last]

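The RewriteMap directive points mod_rewrite to the text version of our map file under the name abbr. The revamped RewriteRule captures the abbreviation after /r/ as $1, looks it up in the map, and appends the rest of the path ($2) to the expanded value. The redirect=permanent flag answers with a 301 (permanent) redirect rather than the default 302 (temporary), so browsers and proxies may cache it, and the last flag stops mod_rewrite from processing any further rules once this one matches.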
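
Putting the pieces together, here's a Python emulation of the generalized rule. It's a sketch under stated assumptions: rewrite is a hypothetical name, it returns the status code and substituted string rather than a real HTTP response, and it does not model how Apache turns that string into a full Location header. Returning right after the first match plays the role of the last flag.

```python
from __future__ import annotations

import re
from typing import Mapping, Optional, Tuple

# Emulates:
#   RewriteRule ^/r/([^/]*)/?(.*) ${abbr:$1}$2 [redirect=permanent,last]
RULE = re.compile(r"^/r/([^/]*)/?(.*)")


def rewrite(path: str, abbr: Mapping[str, str]) -> Optional[Tuple[int, str]]:
    """Return (301, target) if the rule matches, else None (hypothetical helper)."""
    m = RULE.match(path)
    if m is None:
        return None  # rule doesn't apply; the request is left alone
    key, rest = m.group(1), m.group(2)
    # ${abbr:$1} has no default, so an unknown key expands to "".
    expanded = abbr.get(key, "")
    # Value and $2 are concatenated directly; returning here mimics [last].
    return 301, expanded + rest


if __name__ == "__main__":
    abbr = {"d": "dhtml/", "dc": "dhtml/column"}
    for p in ("/r/d/index.html", "/r/dc/45/", "/programming/"):
        print(p, "->", rewrite(p, abbr))
```

Because $2 is appended directly to the map value, the trailing slash (or lack of one) in each map entry matters: /r/d/index.html expands to dhtml/index.html, while /r/dc/45/ expands to dhtml/column45/.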

Binary Hash RewriteMaps

For maximum speed, convert your text map file into a binary DBM hash file, which is optimized for fast lookups. The RewriteMap line then uses the dbm map type instead of txt:
RewriteMap abbr dbm:/www/misc/redir/abbr_webref

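
Apache's own tooling is the safe way to build the DBM file (recent Apache releases ship a utility called httxt2dbm for this). To illustrate what the conversion does, here's a Python sketch using the standard dbm module. Be warned: dbm picks whichever backend is installed (dbm.gnu, dbm.ndbm, or the pure-Python dbm.dumb), and that format may not be one your Apache build can read, so treat txt_map_to_dbm, a hypothetical helper, as a demonstration rather than a drop-in tool.

```python
import dbm


def txt_map_to_dbm(txt_path: str, dbm_path: str) -> int:
    """Copy 'key value' lines from a text map into a new DBM file.

    Returns the number of pairs written. Hypothetical helper for illustration.
    """
    seen = set()
    # Flag "n" always creates a new, empty database (replacing any old one).
    with open(txt_path, encoding="utf-8") as src, dbm.open(dbm_path, "n") as db:
        for line in src:
            fields = line.split()  # any run of spaces or tabs separates fields
            if len(fields) < 2 or fields[0].startswith("#"):
                continue  # skip blank, comment, and key-only lines
            key, value = fields[0], fields[1]
            if key in seen:
                continue  # keep the first occurrence, as a top-to-bottom scan would
            seen.add(key)
            db[key] = value  # str keys and values are stored as UTF-8 bytes
    return len(seen)
```
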
Automating URL Abbreviation
Hand-maintained abbreviations work well for URLs that don't change very often. But what about news or blog sites where URLs change every hour or every minute? You can create a shell script that automatically scans your pages and abbreviates new URLs as they appear, or use the free open source script available at WebReference.com (http://www.webreference.com/scripts/) that does just that.
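
Here's a minimal Python sketch of the automated approach; it is not the WebReference.com script itself. It rewrites site-relative links into the /r/key/rest form that the RewriteMap rule above expands, tries the longest expansion first, and only abbreviates when the result is shorter. The names ABBR, abbreviate_path, and abbreviate_links are hypothetical, and matching double-quoted href attributes with a regular expression is a simplification; a production tool would want a real HTML parser.

```python
from __future__ import annotations

import re
from typing import Mapping

# Hypothetical map in the same shape as the article's abbr file: key -> expansion.
ABBR = {
    "d": "dhtml/",
    "dc": "dhtml/column",
    "pg": "programming",
    "h": "html/",
    "ht": "html/tools/",
}

# Only double-quoted, site-relative hrefs ("/...") are considered in this sketch.
HREF = re.compile(r'href="(/[^"]*)"')


def abbreviate_path(path: str, abbr: Mapping[str, str]) -> str:
    """Turn '/dhtml/column45/' into '/r/dc/45/' when that is shorter."""
    body = path[1:]  # drop the leading slash; the map values have none
    # Try the longest expansion first so 'dhtml/column' beats 'dhtml/'.
    for key, value in sorted(abbr.items(), key=lambda kv: len(kv[1]), reverse=True):
        if body.startswith(value):
            short = "/r/" + key + "/" + body[len(value):]
            # Abbreviate only when it actually saves bytes.
            return short if len(short) < len(path) else path
    return path


def abbreviate_links(html: str, abbr: Mapping[str, str] = ABBR) -> str:
    """Abbreviate every matching href in an HTML string."""
    return HREF.sub(lambda m: 'href="' + abbreviate_path(m.group(1), abbr) + '"', html)


if __name__ == "__main__":
    page = '<a href="/dhtml/column45/">DHTML Lab</a> <a href="/html/tools/">Tools</a>'
    short = abbreviate_links(page)
    print(short)
    print(len(page) - len(short), "bytes saved")
```

Each abbreviated link round-trips through the rule: for /r/dc/45/, ([^/]*) captures "dc", the optional slash is consumed, and $2 is "45/", giving back dhtml/column45/.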

Conclusion
Abbreviating URLs with mod_rewrite is one of the most effective techniques available for optimizing HTML files. File size savings can reach 20% to 30%, depending on the number of links in your HTML page. You can combine this technique with URL Rewriting with Content Negotiation for maximum savings. Automated URL abbreviation works best on high-traffic pages such as home pages, where it lets server-savvy developers squeeze more bytes out of critical pages.
