Thursday, June 23, 2005

Search Engine Optimization using short URLs

Use Short Relative URLs


Brevity is a good thing. Absolute, verbose URLs are out; relative, short URLs are in. By carefully naming your files and directories, and through judicious use of abbreviation with mod_rewrite and content negotiation, you can speed up your pages while maintaining legibility and search engine positioning. Here's an illustrative example of an overextended URL:

<img src="http://www.domain.com/files/images/single_pixel_transparent_gif.gif" width="1" height="1">

This URL has a number of problems, including its length and missing ALT attribute. Let's see how we can shorten it to a more reasonable size. But first, a brief URL tutorial.


Anatomy of a URL

A uniform resource locator, or URL, is a unique address that points to where a resource is located on the Internet. The URL consists of the document's name preceded by the directories where it resides, preceded by a domain name, preceded by the protocol required to retrieve the document.

protocol://server_domain_name/pathname
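
For example, breaking the spacer-image URL from above into its parts:

protocol: http
server domain name: www.domain.com
pathname: /files/images/single_pixel_transparent_gif.gif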


Relative URLs

When possible, use relative URLs rather than absolute URLs. Absolute URLs include the server and protocol so they are unambiguous, but can cause problems when moving your files to another location. They are also unnecessarily long. Relative URLs base their location on the document's base URL, and browsers fill in the rest. Our too-long-at-the-party single pixel GIF could be shortened to:
/files/images/single_pixel_transparent_gif.gif

Abbreviate File and Directory Names

Even better, abbreviate the filename of this non-functional spacer graphic, like this:

/files/images/t.gif
You can go even further and use content negotiation to omit the ".gif" part entirely. But why stop there? Move the image, your logo, and other frequently used resources up to the root level of your web server, and use the following minimalist URL.

/t.gif
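
If you also want to drop the ".gif" extension via content negotiation, as mentioned above, here is a minimal sketch of one way to do it with Apache, assuming mod_negotiation is available (the .htaccess placement and the markup are illustrative):

# .htaccess sketch: with MultiViews on, a request for /t negotiates to t.gif
Options +MultiViews

With that in place, the markup can drop the extension entirely:

<img src="/t" width="1" height="1" alt="" />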

The Base Element


To make relatively linked pages work offline and to shorten your URLs, you can use the base element. The base tag must appear within the head of your XHTML document. Normally, browsers resolve relative URLs against the URL of the current document. The base element changes that behavior so relative URLs resolve against the base URL you specify instead, like this:
[base href="http://www.domain.com/" /]
Now your relative URLs will resolve to this domain's top-level directory and also work from your hard drive.

Relative Base Element

One little-known technique is to use a relative URL as your base href. By referencing a frequently used directory, you can save space within your XHTML files. For our single pixel GIF example you could use the following base element:
[base href="/files/images/"]

Now when you reference an image, you can just refer to the file name itself, without the directory location. For pages with numerous images that you may not want to move to the top level directory, this can save a substantial amount of space.
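
For instance, a page using that relative base might reference its images by filename alone, along the lines of this sketch (the logo filename and dimensions are illustrative):

<head>
  <base href="/files/images/" />
</head>
...
<img src="t.gif" width="1" height="1" alt="" />
<img src="logo.gif" width="150" height="50" alt="Company logo" />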
What About Search Engine Optimization?

Search engine optimizers and information architects naturally encourage descriptive filenames and directories. Search engine spiders feast on keyword-filled URLs, auto-breadcrumb scripts display directories and files, and logical hierarchies help users navigate. To avoid over-abbreviation, some webmasters choose to abbreviate and relocate only frequently used resources on high traffic pages, or use mod_rewrite or the base element for the best of both worlds.

Conclusion


Using short, relative URLs can make your pages download faster, and ease migration headaches. Using the base element and mod_rewrite can alleviate the need for absolute URLs, and save additional space.

Mod_rewrite makes linking easy

Abbreviate URLs with mod_rewrite

Called the "Swiss Army knife" of Apache modules, mod_rewrite can be used for everything from URL rewriting to load balancing. Where mod_rewrite and its ilk shine is in abbreviating and rewriting URLs. One of the most effective optimization techniques available to web developers, URL abbreviation substitutes short URLs like "r/pg" for longer ones like "/programming" to save space. Apache and IIS, Manilla, and Zope all support this technique. Yahoo.com, WebReference.com, and other popular sites use URL abbreviation to shave anywhere from 20% to 30% off of HTML file size. The more links you have, the more effective this technique.

How mod_rewrite Works

As its name implies, mod_rewrite rewrites URLs using regular expression pattern matching. If a URL matches a pattern that you specify, mod_rewrite rewrites it according to the rule conditions that you set. mod_rewrite essentially works as a smart abbreviation expander. Let's take our example above from WebReference.com. To expand "r/pg" into "/programming", Apache requires two directives: one turns on the rewrite engine (RewriteEngine On) and the other specifies the pattern-matching rule (RewriteRule). The RewriteRule syntax looks like this:
RewriteRule Pattern Substitution [flags]

Becomes:

RewriteEngine On
RewriteRule ^/r/pg(.*) /programming$1

This regular expression matches a URL that begins with /r/ (we chose this sequence to signify a redirect to expand) followed immediately by "pg". The pattern (.*) matches zero or more characters after the "pg". So when a request comes in for the URL /r/pg/perl/, the rewrite rule expands this abbreviated URI into /programming/perl/.

RewriteMap for Multiple Abbreviations
That'll work well for a few abbreviations, but what if you have lots of links? That's where the RewriteMap directive comes in. RewriteMaps group multiple lookup keys (abbreviations) and their corresponding expanded values into one tab-delimited file. Here's an example map file snippet from WebReference.com.
d dhtml/
dc dhtml/column
pg programming
h html/
ht html/tools/

The map file (MapName) maps keys to values; a rewrite rule looks up a key using the following syntax:

${ MapName : LookupKey | DefaultValue }

MapNames require a generalized RewriteRule using regular expressions. The RewriteRule references the MapName instead of a hard-coded value. If there is a key match, the mapping function substitutes the expanded value into the regular expression. If there's no match, the rule substitutes a default value or a blank string.

To use this map we need a RewriteMap directive to tell Apache where the map file lives, and a generalized regular expression for our RewriteRule:
RewriteEngine On
RewriteMap abbr txt:/www/misc/redir/abbr_webref.txt
RewriteRule ^/r/([^/]*)/?(.*) ${abbr:$1}$2 [redirect=permanent,last]

The RewriteMap directive points the rewrite module to the text version of our map file. The revamped RewriteRule looks up the value for the matching key in the map file. The [last] flag stops rule processing once a matching abbreviation is found, and the permanent redirect (301 instead of 302) allows browsers and proxies to cache the expanded location.

Binary Hash RewriteMaps

For maximum speed you should convert your text map file into a binary DBM hash file, which is optimized for fast lookups. The RewriteMap line above would then look like this:

RewriteMap abbr dbm:/www/misc/redir/abbr_webref
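
How you generate the DBM file depends on your Apache version and DBM library; as one option, newer Apache distributions ship an httxt2dbm utility that converts the text map directly (paths simply follow the example above; older installs may instead use the small txt2dbm Perl script from the mod_rewrite documentation):

httxt2dbm -i /www/misc/redir/abbr_webref.txt -o /www/misc/redir/abbr_webref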
Automating URL Abbreviation
The above URL abbreviation technique works well for URLs that don't change very often. But what about news or blog sites where URLs change every hour or every minute? You can write a script that automatically scans your pages and abbreviates the URLs within them, or use the free open source script available at WebReference.com (http://www.webreference.com/scripts/) that does just that.
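
As a rough illustration only (this is not the WebReference script), here is a minimal Python sketch of the idea: load the tab-delimited map file and swap each expanded directory prefix found in href or src attributes for its /r/ abbreviation.

#!/usr/bin/env python
# Hypothetical sketch, not the WebReference script: abbreviate URLs in an HTML
# page using the same tab-delimited map file that mod_rewrite expands.
import re
import sys

def load_map(path):
    # Each line is "key<TAB>expansion", e.g. "pg  programming".
    expansions = {}
    with open(path) as f:
        for line in f:
            parts = line.split(None, 1)
            if len(parts) == 2:
                key, value = parts[0].strip(), parts[1].strip()
                expansions["/" + value.strip("/") + "/"] = "/r/" + key + "/"
    return expansions

def abbreviate(html, expansions):
    # Replace longest expansions first so /dhtml/column/ wins over /dhtml/.
    for full, short in sorted(expansions.items(), key=lambda kv: -len(kv[0])):
        html = re.sub(r'((?:href|src)=")' + re.escape(full), r'\g<1>' + short, html)
    return html

if __name__ == "__main__":
    mapping = load_map(sys.argv[1])                  # e.g. abbr_webref.txt
    sys.stdout.write(abbreviate(sys.stdin.read(), mapping))

Run it as a filter, for example: python abbr.py abbr_webref.txt < index.html > index_abbr.html. A production script would need to handle edge cases (absolute URLs, query strings, single-quoted attributes), but the lookup-and-substitute core is the same.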

Conclusion
Abbreviating URLs with mod_rewrite is one of the most effective techniques available for optimizing HTML files. File size savings can range from 20% to 30%, depending on the number of links in your HTML page. You can combine this technique with content negotiation for maximum savings. Best used on high-traffic pages like home pages, automated URL abbreviation can squeeze more bytes out of critical pages for server-savvy developers.

Wednesday, June 22, 2005

Search Engine Optimization on Google

Google's sweeping changes confirm the search giant has launched a full-out assault against artificial link inflation and declared war against search engine spam in a continuing effort to provide the best search service in the world... and if you thought you had cracked the Google code and had Google all figured out, guess again.

Google has raised the bar against search engine spam and artificial link inflation to unrivaled heights with the filing of United States Patent Application 20050071741 on March 31, 2005.

The filing unquestionably provides SEOs with valuable insight into Google's tightly guarded search intelligence and confirms that Google's information retrieval is based on historical data.

What exactly do these changes mean to you? Your credibility and reputation online are going under the Googlescope! Google's patent abstract reads as follows:


A system identifies a document and obtains one or more types of history data associated with the document. The system may generate a score for the document based, at least in part, on the one or more types of history data.

Google's patent specification reveals a significant amount of information, both old and new, about the possible ways Google can (and likely does) use your web page updates to determine the ranking of your site in the SERPs.

Unfortunately, the patent filing does not prioritize or conclusively confirm any specific method one way or the other.

Here's How Google Scores Your Web Pages


In addition to evaluating and scoring web page content, the ranking of web pages is admittedly still influenced by the frequency of page or site updates. What's new and interesting is what Google takes into account in determining the freshness of a web page.

For example, if a stale page continues to procure incoming links, it will still be considered fresh, even if the page's Last-Modified header (which tells when the file was most recently changed) hasn't changed and the content itself has not been updated.

According to the patent filing, Google records and scores the following web page changes to determine freshness:


  • The frequency of all web page changes.
  • The actual amount of the change itself... whether it is substantial, redundant, or superfluous.
  • Changes in keyword distribution or density.
  • The actual number of new web pages that link to a web page.
  • The change or update of anchor text (the text that is used to link to a web page).
  • The number of new links to low-trust web sites (for example, a domain may be considered low trust for having too many affiliate links on one web page).


Although there is no specific number of links indicated in the patent, it might be advisable to limit affiliate links on new web pages. Caution should also be used in linking to pages with multiple affiliate links.



Developing Your Web Pages for Freshness


I'm not suggesting that it's always beneficial or advisable to change the content of your web pages regularly, but it is very important to keep your pages fresh, and that may not necessarily mean a content change.

Google states that decayed or stale results might be desirable for information that doesn't necessarily need updating, while fresh content is good for results that require it.

How do you unravel that statement and differentiate between the two types of content?

An excellent example of this methodology is the roller coaster ride seasonal results might experience in Google's SERPs based on the actual season of the year.

A page related to winter clothing may rank higher in the winter than the summer... and the geographical area the end user is searching from will likely be considered and factored into the search results.

Likewise, specific vacation destinations might rank higher in the SERPs in certain geographic regions during specific seasons of the year. Google can monitor and score pages by recording click-through rate changes by season.

Google is no stranger to fighting spam and is taking serious new measures to crack down on offenders like never before.

Section 0128 of Google's patent filing claims that you shouldn't change the focus of multiple pages at once.

Here's a quote from their rationale:



"A significant change over time in the set of topics associated with a document may indicate that the document
has changed owners and previous document indicators, such as score, anchor text, etc., are no longer reliable.


Similarly, a spike in the number of topics could indicate sp@m. For example, if a particular document is associated
with a set of one or more topics over what may be considered a 'stable' period of time and then a (sudden) spike
occurs in the number of topics associated with the document, this may be an indication that the document has been
taken over as a 'doorway' document.


Another indication may include the sudden disappearance of the original topics associated with the document. If
one or more of these situations are detected, then [Google] may reduce the relative score of such documents and/or
the links, anchor text, or other data associated the document."


Highlights from an article by Lawrence Deon for SEO News

Search Engine Optimization & Google Bourbon Algorithm Update

Google is undergoing some of the most sweeping changes in its short, seven-year history. As of next week, Google will have finished rolling out what might be its largest algorithm shift ever, as the final points of the 3.5-part Bourbon Update were installed last Monday. This update has been staggered into three and a half sections in order to avoid the massive dislocation of established rankings that was seen in previous major updates. While changes stemming from the Bourbon Update have not yet amounted to a full reordering of Google's search engine results pages (SERPs), many individual webmasters have reported fairly significant losses or gains in ranking over the past few days.



For webmasters and SEOs, an examination of the new Google Webmaster Guidelines is a definite must. Google has recently updated its webmaster guidelines, which are also considered a primer on "ethical SEO" practices in relation to Google placements, to include information on "supplemental listings", crawling frequencies, and prefetching. Google has also posted information on its new Google Sitemaps experiment.



Google Sitemaps is perhaps the most important new feature for SEOs offered by Google in a long time. Said to be an experiment in spidering, Google Sitemaps invites webmasters to feed site data directly to Google through an XML sitemap page. Webmasters and SEOs can now tell Google exactly which sections of their sites to crawl and, provided they keep their XML sitemap current, when and where to look for changes to their sites. This experimental initiative will especially help webmasters working with database-driven sites or large e-commerce sites where documents are subject to frequent change and are often found behind long-string URLs. Google has been kind enough to provide detailed information on establishing an XML feed and setting priorities for Googlebot.
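
To give a feel for the format, here is a minimal sketch of a single sitemap entry, assuming the 0.84 schema Google documented when the experiment launched (the URL and values are purely illustrative):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <loc>http://www.domain.com/programming/</loc>
    <lastmod>2005-06-22</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>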



As it grows, Google appears to be running into the same problem other webmasters with numerous sites or services encounter: the rapid dilution of a domain's unique topic focus. In order to keep themselves accessible, understandable and relevant, Google's teams of engineers, programmers and public relations specialists are involved in what appears to be a massive overhaul of the interface, public documents and the basic sorting algorithm that produces organic results. As in previous years, how this all plays out in the end is entirely up to the searching public. From the SEO/SEM perspective, it is a good thing Google is in the midst of this update. Web workers have been demanding a greater degree of transparency from Google for some time, and perhaps these updates are the beginning of a new commitment to communication from the Googleplex.




Highlights from an article by Jim Hedger for Site Pro News

Welcome to the Search Engine Optimization & PPC Management Blog. It is the goal of this blog to discuss issues concerning search engine optimization and PPC management. For those new to the field of internet marketing, allow me to take a moment to define our two major internet marketing terms:

Search Engine Optimization (a.k.a. SEO) - Search engine optimization is the science, or some say voodoo, of marketing websites through organic search engine listings. Natural or organic search engine listings are the free (non-paid) links that show up in the major search engines. Through research, mathematical analysis, and experience, SEO specialists can dramatically increase natural search engine rankings. Improved search engine rankings lead to more traffic and brand exposure. A search engine optimization campaign can take months to complete, and it also requires constant maintenance to compete for these highly sought search engine placements. As opposed to PPC management, search engine optimization projects typically have a high initial cost and low long-term costs.

Pay Per Click (a.k.a. PPC) Management - PPC management is the creation and management of a pay-per-click campaign. While there are many PPC engines, the two most important are Google AdWords and Yahoo Search Marketing (formerly Overture). Again, through a combination of research, mathematical analysis, and experience, a PPC bid management specialist can vastly improve the return on investment (a.k.a. ROI) seen from a pay-per-click campaign. PPC management can also be used as a weapon against the competition through such black arts as bid jamming and gap bidding, forcing competitors to spend more of their advertising budget for equal or less exposure. Typically, PPC management has a low initial setup cost but far higher daily costs over the length of the campaign.

A good internet marketing campaign will use both search engine optimization and PPC management in concert. Initially, PPC campaigns will carry higher costs, since paid exposure is needed on keywords until natural rankings can be indexed and attained. As natural search engine rankings are attained, the pay-per-click campaign can be scaled back. Ideally, a website would like to rank on page 1 for all its major keywords in Google, Yahoo, and MSN. These three search engines provide 90% or more of natural search engine traffic for most sites, so it is vital to gain natural rankings in each of them. Realistically, a website can rarely rank naturally for all its keywords due to competition, so a PPC campaign is an excellent alternative for capturing traffic on keywords for which your website does not rank well naturally.

Future posts to the SEO-PPC Blog will focus on techniques, tips, and philosophies regarding search engine optimization and PPC bid management. It is my goal, and that of my company Black Ops Marketing, to take the mystery and rumors out of search engine optimization, PPC management, and other types of internet marketing services. We hope that by providing a forum for discussions about SEO & PPC, we can help educate business people and students who are interested in marketing services for the 21st century. There is no debating that internet marketing is the cheapest and most effective form of advertising in the world today. How quickly your business recognizes that fact, and how many of your competitors already know it, will determine the fate of your e-commerce website.