Seo Analytics Blog Archive
Interchange Search Caching with "Permanent More"
Most sites that use Interchange take advantage of Interchange's "more lists". These are built-in tools that support an Interchange "search" (either the search/scan action, or result of direct SQL via [query]) to make it very easy to paginate results. Under the hood, the more list is a drill-in to a cached "search object", so each page brings back a slice from the cache of the original search. There are extensive ways to modify the look and behavior of more lists and, with a bit of effort, they can be configured to meet design requirements.
Where more lists tend to fall short, however, is with respect to SEO. There are two primary SEO deficiencies that get business stakeholders' attention:
- There is little control over the construction of the URLs for more lists. They leverage the scan actionmap and contain a hash key for the search object and numeric data to identify the slice and page location. They possess no intrinsic value in identifying the content they reference.
- The search cache by default is ephemeral and session-specific. This means all those results beyond page 1 the search engine has cataloged will result in dead links for search users who try to land directly on the more-listed pages.
It is the latter issue that I wish to address because there is--and has been for some time now--a simple mechanism called "permanent more" to remedy the default behavior.
You can leverage "permanent more" by adding the boolean mv_more_permanent, or the shorthand pm, to your search conditions. E.g.:
Link:
<a href="[area search="
co=1
sf=category
se=Foo
op=rm
more=1
ml=5
pm=1
"]">All Foos</a>
Loop:
[loop search="
co=1
sf=category
se=Foo
op=rm
more=1
ml=5
pm=1
"]
...loop body with [more-list]...
[/loop]
Query:
[query
list=1
more=1
ml=10
pm=1
sql="SELECT * FROM products WHERE category LIKE '%Foo%'"
]
...same as loop but with 10 matches/page...
[/query]
If the initial search is defined with the "permanent more" setting, it will produce the following adjustments:
- The hash key used to store and identify the search cache is deterministic, based on the search conditions. Many Interchange searches are category driven, so all end users who browse a category click identical links. By default, each of those clicks creates a duplicate search cache that belongs only to that user. With permanent more, they all share the same cache, with the same identifier. As long as the search conditions don't change, neither does the cache identifier. Even as the cache is refreshed with new executions of the search, the object remains in the same location. Thus, links a search engine cataloged this morning remain valid now, tomorrow, or next week, provided they reference the same search conditions.
- The cached search object has no session affinity. Any link referencing the cache with the correct hash key has access to the content.
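To illustrate why a deterministic key keeps links stable, here is a minimal Ruby sketch. It is not Interchange's implementation: the helper name `search_cache_key` and the choice of an MD5 digest over sorted parameters are assumptions made purely for illustration.

```ruby
require 'digest/md5'

# Hypothetical helper: derive a cache key from the search conditions alone.
# Sorting the parameters makes the key independent of the order they arrive in,
# and nothing session-specific is mixed in.
def search_cache_key(conditions)
  canonical = conditions.sort_by { |name, _| name.to_s }
                        .map { |name, value| "#{name}=#{value}" }
                        .join("\n")
  Digest::MD5.hexdigest(canonical)
end

foo_search     = { co: 1, sf: 'category', se: 'Foo', op: 'rm', ml: 5 }
foo_reordered  = { se: 'Foo', ml: 5, sf: 'category', op: 'rm', co: 1 }
bar_search     = foo_search.merge(se: 'Bar')

puts search_cache_key(foo_search) == search_cache_key(foo_reordered) # same conditions, same key
puts search_cache_key(foo_search) == search_cache_key(bar_search)    # different conditions, different key
```

Any two users running the same category search would land on the same key, which is the property that lets a crawled link keep working later.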
Taken together, "permanent more" removes (for the most part, addressed later) dead links from more lists cataloged by search engines. There are, however, other benefits to "permanent more" beyond those intended as described above:
- As stated in passing, standard Interchange search caching produces duplicate search objects for common search conditions. For a busy site, these caches can have an impact on storage. Typically, maintenance is implemented to clean up cache files whose age exceeds the session duration (standard is 48 hours) by some amount. With permanent more, duplicate caches are eliminated. A cache location is reused by all users with the same search requirements, keeping data-storage requirements for caches to the minimum necessary. As searches change, orphaned caches can still easily be cleaned up: once nothing accesses them any longer, they immediately start to age and are picked up by the same maintenance.
- For the same reason that "permanent more" resolves search-engine links, it also resolves content management for individual sites using a reverse proxy for caching. Because most (and certainly the easiest) caching keys are based off of URL, the deterministic nature of the hash keys for "permanent more" allows assurance that the cached content in the proxy accurately reflects the search content over time, and that all users will hit the cached resource and not generate new, unique links with varying hash keys.
One shortcoming of "permanent more" to be aware of is the impact of changing data underneath the search. Even if search conditions do not change, the count and order of matching record sets may. So, e.g., enough products may be removed from a given category to cause the last page of a more list to become empty, which would cause any specific link into that page to become dead. More minor, but still a possibility, is the introduction or removal of products so that a particularly searched-for term has been "bumped" to another page within the search cache since the last time the search engine crawled the more lists. For searches backed by particularly volatile data, "permanent more" may not be sufficient to address search-engine or caching demands.
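The empty-last-page case is simple arithmetic. The sketch below uses hypothetical helper names and made-up product counts to show how a link to page 5 can die even though the search conditions never changed.

```ruby
# Number of more-list pages for a given match count and matches-per-page (ml).
def page_count(match_count, per_page)
  (match_count + per_page - 1) / per_page
end

# A deep link to a page only resolves while that page still holds results.
def page_alive?(page, match_count, per_page)
  page >= 1 && page <= page_count(match_count, per_page)
end

per_page = 5
puts page_count(23, per_page)      # pages available when the crawler visited
puts page_alive?(5, 23, per_page)  # page 5 exists with 23 matching products
puts page_alive?(5, 19, per_page)  # page 5 is gone once 4 products are removed
```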
Finally, "permanent more" should be avoided for any search features that may cache data sensitive to an individual user. This is unlikely to happen as, under most circumstances, the configuration of the search itself will change based on the unique characteristics of the user executing the search (e.g., a username included in a query to review order history). However, it is still possible that context-sensitive information could be stored in the search object and, if so, all other users with access to the more lists would have access to that information.
SEO friendly redirects in Interchange
In the past, I've had a few Interchange clients who wanted their site to issue an SEO-friendly 301 redirect to a new page. The reasons varied: either a product had gone out of stock and wasn't going to return, or they had completely reworked their URL structures to be more SEO friendly and wanted the link juice to transfer to the new URLs. The normal way to handle this kind of request is to set up a bunch of Apache rewrite rules.
There were a few issues with going that route. The main issue is that adding or removing rules would mean restarting or reloading Apache every time a change was made. The clients don't normally have the access to do this, so they would have to contact me to do it. They also don't have the access to modify the Apache virtual host file to add and remove rules, so again they would have to contact me. To avoid the editing issue, we could have put the rules in a .htaccess file and let them modify that, but this presents its own challenges because some text editors and FTP clients don't handle hidden files very well. Finally, even though basic rewrite rules are pretty easy to copy, paste and reuse, they can have nasty side effects if not done properly and can be difficult to troubleshoot. So I devised a way to let clients manage their 301 redirects using a simple database table and Interchange's Autoload directive.
The database table is a very simple table with two fields. I called them old_url and new_url, with old_url as the primary key. The Autoload directive accepts a list of subroutines as its arguments, so this requires us to create two different GlobalSubs: one to actually do the redirect and one to check the database and see if we need to redirect. The redirect sub is really straightforward and looks like this:
sub redirect {
    my ($url, $status) = @_;
    $status ||= 302;
    $Vend::StatusLine = qq|Status: $status moved\nLocation: $url\n|;
    $::Pragma->{download} = 1;
    my $body = '';
    ::response($body);
    $Vend::Sent = 1;
    return 1;
}
The code for the sub that checks to see if we need to redirect looks like this:
sub redirect_old_links {
    my $db = Vend::Data::database_exists_ref('page_redirects');
    my $dbh = $db->dbh();
    my $current_url = $::Tag->env({ arg => "REQUEST_URI" });
    my $normal_server = $::Variable->{NORMAL_SERVER};
    # Load the redirect table into the session scratch space once per session
    if ( ! exists $::Scratch->{redirects} ) {
        my $sth = $dbh->prepare(q{select * from page_redirects});
        my $rc = $sth->execute();
        while ( my ($old,$new) = $sth->fetchrow_array() ) {
            $::Scratch->{redirects}{"$old"} = $new;
        }
        $sth->finish();
    }
    # Issue a 301 if the requested URL has an entry; otherwise do nothing
    if ( exists $::Scratch->{redirects} ) {
        if ( exists $::Scratch->{redirects}{"$current_url"} ) {
            my $path = $normal_server.$::Scratch->{redirects}{"$current_url"};
            my $Sub = Vend::Subs->new;
            $Sub->redirect($path, '301');
            return;
        } else {
            return;
        }
    }
}
We normally create these as two different files and put them into our own directory structure under the Interchange directory, called custom/GlobalSub. We then add the line include custom/GlobalSub/*.sub to the interchange.cfg file to make sure they get loaded when Interchange restarts. After those files are loaded, you'll need to tell the catalog that you want it to Autoload this subroutine. To do that, use the Autoload directive in your catalog.cfg file like this:
Autoload redirect_old_links
After modifying your catalog.cfg file, you will need to reload your catalog to ensure the change takes effect. Because the redirect table is cached per session, once these things are in place you should be able to add data to the page_redirects table, start a new session, and be redirected properly. When I was working on the system, I just created an entry that redirected /cgi-bin/vlink/redirect_test.html to /cgi-bin/vlink/index.html so I could ensure that it was redirecting me properly.
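For readers who want the lookup logic in isolation, here is a rough Ruby model of the decision the Autoload sub makes on each request. It is only a sketch: the hash stands in for the page_redirects table, and the name `redirect_for` and the example server URL are hypothetical.

```ruby
# Hypothetical stand-in for the page_redirects table: old_url => new_url.
PAGE_REDIRECTS = {
  '/cgi-bin/vlink/redirect_test.html' => '/cgi-bin/vlink/index.html'
}.freeze

# Returns [status, location] when the requested path has a redirect entry,
# or nil so the request falls through to normal page handling.
def redirect_for(request_uri, server, redirects = PAGE_REDIRECTS)
  new_url = redirects[request_uri]
  return nil unless new_url
  [301, "#{server}#{new_url}"]
end

p redirect_for('/cgi-bin/vlink/redirect_test.html', 'http://www.example.com')
p redirect_for('/cgi-bin/vlink/index.html', 'http://www.example.com')
```

Keeping old_url as the key mirrors the primary key on the table: each old URL maps to at most one destination.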
More Code and SEO with the Google Analytics API
My latest blog article inspiration came from an SEOmoz pro webinar on Actionable Analytics. This time around, I wrote the article and it was published on SEOmoz's YOUmoz Blog and I thought I'd summarize and extend the article here with some technical details more appealing to our audience. The article is titled Visualizing Keyword Data with the Google Analytics API.
In the article, I discuss and show examples of how the number of unique keywords receiving search traffic has diversified or expanded over time and that our SEO efforts (including writing blog articles) are likely resulting in this diversification of keywords. Some snapshots from the article:
The unique keyword (keywords receiving at least one search visit)
count per month (top) compared to the number of articles available
on our blog at that time (bottom).
I also briefly examined how unique keywords receiving at least one visit overlapped between each month and saw about 10-20% of overlapping keywords (likely the short-tail of SEO).

The keyword overlap per month, where the keywords receiving at least
one visit in consecutive months are shown in the overlap section.
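As a rough sketch of the overlap calculation, the Ruby below compares two months of keywords as sets. The keyword lists are made up, and the choice of the current month's keyword count as the denominator is my assumption; the original analysis may have normalized differently.

```ruby
require 'set'

# Share of this month's keywords that also drew at least one visit last month.
def overlap_ratio(previous_keywords, current_keywords)
  previous = Set.new(previous_keywords)
  current  = Set.new(current_keywords)
  return 0.0 if current.empty?
  (previous & current).size.to_f / current.size
end

# Made-up keyword lists, for illustration only.
october  = ['interchange seo', 'rails hosting', 'postgres replication', 'spree gem']
november = ['interchange seo', 'wordpress omniture', 'google analytics api',
            'rel canonical', 'list google index']

puts overlap_ratio(october, november)
```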
Now, on to things that End Point's audience may find more interesting. Something that might appeal more to our developer-types is the code written to use the Google Analytics API to generate the data used for this article. I researched a bit and tried writing my own Ruby code (gem-less) to pull from the Google API, followed by using the Gattica gem, and finally the garb gem. After wrestling with the former two options, I settled on the garb gem, which had decent documentation here to get me up and running with a Google Analytics report quickly. Here's an example of the code required to create your first Google Analytics API report:
#!/usr/bin/ruby
require 'rubygems'
require 'date'
require 'garb'
# set email, password, profile_id
Garb::Session.login(email, password)
profile = Garb::Profile.first(profile_id)
report = Garb::Report.new(profile,
                          :limit => 100,
                          :start_date => Date.today - 30,
                          :end_date => Date.today)
report.dimensions :keyword
report.metrics :visits
report.results.each do |result|
  puts "#{result.keyword}:#{result.visits}"
end
If you aren't familiar with the Google Analytics API, possible dimensions and metrics are documented here. There are some Google Analytics API limitations on metric and dimension combinations, but I think if you get creative you'd be able to overcome most of those limitations (assuming you won't be exceeding the limit of 1,000 API requests per day).
Why should you care about the Google Analytics API? Well, the API allowed me to programmatically aggregate the keyword counts in monthly increments for the SEOmoz article. One thing I consider to be pretty lame is the inability to select more than 3 custom segments and exclude the "All Visits" segment to allow a better visual comparison of the segments. In the data below, I have 3 defined custom segments. I would prefer to compare about 10 custom segments of End Point's blog keyword groupings (e.g., "Rails Keywords", "Postgres Keywords"), but Google Analytics limits the selected segments and includes "All Visits" when you select more than one custom segment.
Another thing I consider to be lame is the inability to merge Google Analytics profiles. Recently, End Point combined its corporate blog GA profile with its main website GA profile to better track conversion between the sites:
Dead metrics from migrated profile.
With the Google Analytics API, we could compute different aggregates of data, compare more than a few custom data segments, and combine two Google profiles if they have merged. Of course, these things wouldn't necessarily be easy, but working with the gem proved to be simple, so in theory this all could be done and in the meantime we'll keep our dead profile around.
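As one example of such an aggregate, here is a sketch of the monthly unique-keyword count. It works on plain hashes rather than garb result objects; the row shape (:date, :keyword, :visits) and the sample data are assumptions for illustration.

```ruby
require 'date'
require 'set'

# Count distinct keywords with at least one visit, grouped by month ("YYYY-MM").
# rows: array of hashes with :date (Date), :keyword (String), :visits (Integer).
def unique_keywords_per_month(rows)
  by_month = Hash.new { |hash, month| hash[month] = Set.new }
  rows.each do |row|
    next unless row[:visits] > 0
    by_month[row[:date].strftime('%Y-%m')] << row[:keyword]
  end
  by_month.sort.map { |month, keywords| [month, keywords.size] }.to_h
end

# Made-up rows, for illustration only.
rows = [
  { date: Date.new(2009, 11, 3),  keyword: 'interchange seo',      visits: 4 },
  { date: Date.new(2009, 11, 20), keyword: 'interchange seo',      visits: 1 },
  { date: Date.new(2009, 11, 21), keyword: 'rel canonical',        visits: 2 },
  { date: Date.new(2009, 12, 1),  keyword: 'google analytics api', visits: 3 },
  { date: Date.new(2009, 12, 2),  keyword: 'garb gem',             visits: 0 }
]

p unique_keywords_per_month(rows)
```

Rows from two profiles could be concatenated before calling the function, which is one way a merged view of a dead and a live profile might be approximated.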
Again, please read the original article here if you are interested :)
Blog versus Forum, Blogger versus WordPress in Ecommerce
Today, Chris sent me an email with two questions for one of our ecommerce clients:
- For ecommerce client A, should a forum or blog be added?
- For ecommerce client A, should the client use Blogger or WordPress if they add a blog?
These are relevant questions to all of our clients because forums and blogs can provide value to a static site or ecommerce site. I answered Chris' question and thought I'd expand on it a bit for a brief article.
First, a rundown comparing the pros and cons of blog versus forum:
| | Blog | Forum |
| Pros | | |
| Cons | | |
If we assume that it takes the same amount of effort to write articles as it does to manage user generated content, the decision comes down to whether or not you want to utilize user contributions as part of the content. If the effort involved to write content or manage user generated content is different, a decision should be made based on how much effort the site owners want to make. Other opportunities for user generated content include product reviews and user QnA.
Next, a rundown comparing the pros and cons of Blogger versus self-hosted WordPress:
| | Blogger | WordPress (self-hosted) |
| Pros | | |
| Cons | | |
The decision to create a Blogger blog or install a WordPress blog will depend on resources such as engineering or designer involvement. A self-hosted blog solution will likely provide a larger feature set and more flexibility, but it also requires more time to enhance, manage and maintain the software. A hosted blog solution such as Blogger will be easy to set up and maintain, but has disadvantages because it is a less flexible solution. I didn't discuss a WordPress-hosted solution because I'm not very familiar with this type of setup. However, I believe the WordPress-hosted solution limits the use of plugins and themes.
For our ecommerce clients, installing a self-hosted WordPress instance on top of their Spree or Interchange ecommerce site has been relatively simple. For another one of our clients, we developed a Radiant plugin to integrate Blogger article links into their site, which has worked well to fit their needs.
SEO 2010 Trends and Strategies
Yesterday I attended SEOmoz's webinar titled "SEO Strategies for 2010". Some interesting factoids, comments and resources for SEO in 2010 were presented that I thought I'd highlight:
- Mobile browser search
- Mobile search and ecommerce will be a large area of growth in 2010.
- Google Webmaster Tools allows you to submit mobile sitemaps, which can help battle duplicate content between non-mobile and mobile versions of site content. Another way to handle duplicate content would be to write semantic HTML that allows sites to serve non-mobile and mobile CSS.
- Social Media: Real Time Search
- Real time search marked its presence in 2009. The involvement of Twitter in search is evolving.
- Tracking and monitoring on URL shortening services should be set up to measure traffic and benefit from Twitter.
- Dan Zarrella published research on The Science of Retweeting. This is an interesting resource with fascinating statistics on retweets.
- Social Media: Facebook's Dominance
- Recent research by comScore has shown that 5.5% of all time on the web is spent in Facebook.
- Facebook has very affordable advertising. Facebook has a wealth of demographic and psychographic data, which allows sites to deliver very targeted advertisements.
- Facebook shouldn't be ignored as a potential business network, but metrics should be put in place to determine the value it brings.
- Social Media: Shifting LinkGraph
- In the past, sites received links from blog resources, which became a factor in the site's popularity rankings in search. Now, linking has shifted to microblogging such as Twitter or other social media platforms. Some folks are stingier about passing links through their sites than through social media. It's interesting to observe how links and information are passed through the web and consider how this can affect search.
- Bing
- Despite the fact that Google is responsible for a large percentage of search, Bing shouldn't be ignored.
- Bing has shown some differences in ranking such as being less sensitive to TLDs (.info, .cc, .net, etc.), and giving more weight to sites with keywords in the domain than other search engines.
- Other
- Personalized search is on the rise. This is something to pay attention to, but hard to measure.
- QDF (query deserves freshness), a search factor related to the freshness of content, has led to search engines indexing content faster. 2010 search strategies recommend becoming a news source to improve search performance.
- Local search is definitely something to be aware of in 2010. Google's Place Rank algorithm is similar to the PageRank algorithm - it looks at specific location or local attributes as a factor in local search.
A trend I noticed in the discussion was the emphasis on having not just good metrics, but the right metrics, such as conversion and engagement. Any of the recommendations above (improving your mobile browsing, getting involved in social media, optimizing for Bing) should be measured against conversion to determine the value of the efforts. Multivariate or A/B testing was also recommended for testing local search optimization and other efforts.
Content Syndication, SEO, and the rel canonical Tag
End Point Blog Content Syndication
For the past couple of weeks, I've been discussing with Jon whether content syndication of our blog negatively affects our search traffic. Since the blog's inception, full articles have been syndicated by OSNews. During that time, I've been keeping an eye on the effects of content syndication on search to determine what (if any) negative effects we experience.
By my observations, immediately after we publish an article, the article is indexed by Google and is near the top search results for a search with keywords similar to the article's title. The next day, the OSNews syndication of the article shows up in the same keyword search, and our article disappears from the search results. Then, several days later, our article is ahead of OSNews, as if Google's algorithm has determined the original source of the content. I've provided a visual representation of this behavior:
With content syndication of our blog articles, there is a lag of several days during which Google treats our blog article as the duplicate content and returns the OSNews article in search results for a search similar to the blog article's title. After this lag time, the OSNews article is treated as duplicate content and our article is shown in the search results.
During the lag time, a search for "google pages indexed seo" (matching an article I published last Thursday) showed the OSNews article at search position #5.
After the lag time, a search for "google pages indexed seo" returned the original End Point blog article to search position #2.
Several other factors have influenced the lag time, but typically I've seen very similar behavior.
End Point's content syndication has only been an issue with blog articles, since the majority of our new content comes in the form of blog articles. Examples of content syndication in the ecommerce space may include:
- intra-company content syndication of products across sister sites. For example, our client Backcountry.com sells outdoor gear, while their site RealCyclist targets the road biking niche of the outdoor gear industry. Cycling products are sold on both sites and may compete directly for search engine traffic.
- syndication of product information through affiliate programs like Commission Junction and AvantLink. Affiliates are paid a small portion of the sales and may target traffic by building supplementary content or communities around content provided by ecommerce sites through the affiliate program.
Cross-Domain rel=canonical Tag
I've been planning to write this article and with impeccable timing, Google announced support for the rel=canonical tag across different domains this week. I've referenced the use of the rel=canonical tag in two articles (PubCon 2009 Takeaways, Search Engine Thoughts), but I haven't gone too much into depth about its use. Support of the rel=canonical tag was introduced early this year as a method to help decrease duplicate content across a single domain. A non-canonical URL that includes this tag suggests its canonical URL to search engines. Search engines then use this suggestion in their algorithms and results to reduce the effects of duplicate content.
<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />
With the cross-domain rel=canonical support announcement, the rel=canonical tag presents another tool to battle duplicate content from content syndication across domains.
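As a small sketch of how a syndicating page might emit the tag, the Ruby helper below builds the link element for a given canonical URL. The helper name is hypothetical; the only point is that the href names the original source, which may now live on a different domain.

```ruby
require 'cgi'

# Hypothetical helper: build a rel=canonical link element for the page head.
# The URL is HTML-escaped so characters such as & are safe inside the attribute.
def canonical_link_tag(canonical_url)
  %(<link rel="canonical" href="#{CGI.escapeHTML(canonical_url)}" />)
end

# A syndicated copy on another domain would point back at the original URL.
puts canonical_link_tag('http://www.example.com/product.php?item=swedish-fish')
```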
Back to Content Syndication
The point of my investigation was to identify whether or not content syndication to OSNews negatively affects our search traffic. The data above suggests that after the brief lag time, Google's algorithm sorts out the source of the original content. The value of exposure, referral traffic, and link juice from OSNews outweighs lost search traffic during this lag time.
In the example of similar product content across backcountry.com's sites, using the rel=canonical tag across domains would allow backcountry.com to suggest prioritization of same product URLs for search results. This may be a valuable tool for directing search traffic to the desired domain.
In the example of content syndication across sites that are not owned by the same company, the use of the rel=canonical tag is more complicated. If the goals of the site that grabs content are to compete directly for search traffic, they would likely not want to use the canonical tag. However, if the goal of the site that grabs content is to focus on search traffic from aggregate content or by building a community around the valuable content, they may be more willing to implement the cross-domain rel=canonical tag to point to the original source of the content. In the case of affiliate programs, I believe it will be difficult to negotiate the cross-domain rel=canonical tag use into existing or future contracts.
The takeaways:
- Content syndication of our blog does not cause negative long term effects on search. This should be monitored for sites that may have much different behavior than the data I provided above.
- The announcement of support for the cross-domain rel=canonical tag may be helpful for battling duplicate content across sites, especially for sites owned by the same company.
- The use of the cross-domain rel=canonical tag in affiliate programs or through sites owned by different companies will be trickier to negotiate.
List Google Pages Indexed for SEO: Two Step How To
Whenever I work on SEO reports, I often start by looking at pages indexed in Google. I just want a simple list of the URLs indexed by the *GOOG*. I usually use this list to get a general idea of navigation, look for duplicate content, and examine initial counts of different types of pages indexed.
Yesterday, I finally got around to figuring out a command line solution to generate this desired indexation list. Here's how to do it, using http://www.endpoint.com/ as an example:
Step 1
Grab the search results using the "site:" operator and make sure you run an advanced search that shows 100 results. The URL will look something like:
http://www.google.com/search?num=100&as_sitesearch=www.endpoint.com
But it will likely have lots of other query parameters of lesser importance [to us]. Save the search results page as search.html.
Step 2
Run the following command:
sed 's/<h3 class="r">/\n/g; s/class="l"/LINK\n/g' search.html | grep LINK | sed 's/<a href="\|" LINK//g'
There you have it. Interestingly enough, the order of pages can be an indicator of which pages rank well. Typically, pages with higher PageRank will be near the top, although I have seen some strange exceptions. End Point's indexed pages:
http://www.endpoint.com/
http://www.endpoint.com/clients
http://www.endpoint.com/team
http://www.endpoint.com/services
http://www.endpoint.com/sitemap
http://www.endpoint.com/contact
http://www.endpoint.com/team/selena_deckelmann
http://www.endpoint.com/team/josh_tolley
http://www.endpoint.com/team/steph_powell
http://www.endpoint.com/team/ethan_rowe
http://www.endpoint.com/team/greg_sabino_mullane
http://www.endpoint.com/team/mark_johnson
http://www.endpoint.com/team/jeff_boes
http://www.endpoint.com/team/ron_phipps
http://www.endpoint.com/team/david_christensen
http://www.endpoint.com/team/carl_bailey
http://www.endpoint.com/services/spree
...
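The same extraction can be sketched in Ruby for anyone who prefers it to sed. This assumes the result markup implied by the sed command above (an h3 with class "r" wrapping a link with class "l"); the sample HTML is made up and Google's real markup may differ or change.

```ruby
# Matches <h3 class="r"><a href="URL" ... class="l" and captures the URL.
RESULT_LINK = /<h3 class="r"><a href="([^"]+)"[^>]*class="l"/

# Returns the result URLs found in a saved results page, in page order.
def indexed_urls(html)
  html.scan(RESULT_LINK).flatten
end

# Made-up fragment shaped like the saved results page.
sample = '<h3 class="r"><a href="http://www.endpoint.com/" class="l">End Point</a></h3>' \
         '<h3 class="r"><a href="http://www.endpoint.com/team" class="l">Team</a></h3>'

puts indexed_urls(sample)
```

For a saved page, the call would be `indexed_urls(File.read('search.html'))`.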
For the site I examined yesterday, I saved the pages as one.html, two.html, three.html and four.html because the site had about 350 results. I wrote a simple script to concatenate all the results:
#!/bin/bash
# Start fresh; -f avoids an error when results.txt doesn't exist yet
rm -f results.txt
for ARG in "$@"
do
    sed 's/<h3 class="r">/\n/g; s/class="l"/LINK\n/g' "$ARG" | grep LINK | sed 's/<a href="\|" LINK//g' >> results.txt
done
And I called the script above with:
./list_google_index.sh one.html two.html three.html four.html
This solution is neither scalable nor particularly elegant. But it's good for a quick and dirty list of pages indexed by the *GOOG*. I've worked with the WWW::Google::PageRank module before and there are restrictions on API request limits and frequency, so I would highly advise against writing a script that makes requests to Google repeatedly. I'll likely use the script described above for sites with fewer than 1,000 pages indexed. There may be other solutions out there to list pages indexed by Google, but as I said, I was going for a quick and dirty approach.
Remember not to get eaten by the Google Monster
Learn more about End Point's technical SEO services.
WordPress Plugin for Omniture SiteCatalyst
A couple of months ago, I integrated Omniture SiteCatalyst into an Interchange site for one of End Point's clients, CityPass. Shortly after, the client added a blog to their site, which is a standalone WordPress instance that runs separately from the Interchange ecommerce application. I was asked to add SiteCatalyst tracking to the blog.
I've had some experience with WordPress plugin development, and I thought this was a great opportunity to develop a plugin to abstract the SiteCatalyst code from the WordPress theme. I was surprised that there were limited Omniture WordPress plugins available, so I'd like to share my experiences through a brief tutorial for building a WordPress plugin to integrate Omniture SiteCatalyst.
First, I created the base WordPress file to append the code near the footer of the WordPress theme. This file must live in the ~/wp-content/plugins/ directory. I named the file omniture.php.
<?php /*
Plugin Name: SiteCatalyst for WordPress
Plugin URI: http://www.endpoint.com/
Version: 1.0
Author: Steph Powell
*/
function omniture_tag() {
}
add_action('wp_footer', 'omniture_tag');
?>
In the code above, the wp_footer is a specific WordPress hook that runs just before the </body> tag. Next, I added the base Omniture code inside the omniture_tag function:
...
function omniture_tag() {
?>
<script type="text/javascript">
<!--
var s_account = 'omniture_account_id';
-->
</script>
<script type="text/javascript" src="/path/to/s_code.js"></script>
<script type="text/javascript"><!--
s.pageName='' //page name
s.channel='' //channel
s.pageType='' //page type
s.prop1='' //traffic variable 1
s.prop2='' //traffic variable 2
s.prop3='' //traffic variable 3
s.prop4= '' //traffic variable 4
s.prop5= '' //traffic variable 5
s.campaign= '' //campaign variable
s.state= '' //user state
s.zip= '' //user zip
s.events= '' //user events
s.products= '' //user products
s.purchaseID= '' //purchase ID
s.eVar1= '' //conversion variable 1
s.eVar2= '' //conversion variable 2
s.eVar3= '' //conversion variable 3
s.eVar4= '' //conversion variable 4
s.eVar5= '' //conversion variable 5
/************* DO NOT ALTER ANYTHING BELOW THIS LINE ! **************/
var s_code=s.t();if(s_code)document.write(s_code)
--></script>
<?php
}
...
To test the footer hook, I activated the plugin in the WordPress admin. A blog refresh should yield the Omniture code (with no variables defined) near the </body> tag of the source code.
After verifying that the code was correctly appended near the footer in the source code, I determined how to track the WordPress traffic in SiteCatalyst. For our client, the traffic was to be divided into the home page, static pages, articles, tag pages, category pages and archive pages. The Omniture variables pageName, channel, pageType, prop1, prop2, and prop3 were modified to track these pages. The existing WordPress functions is_home, is_page, is_single, is_category, is_tag, is_month, the_title, get_the_category, single_cat_title, single_tag_title, and the_date were used.
...
<script type="text/javascript"><!--
<?php
// Default to empty strings so page types that match nothing emit defined values
$pageName = $channel = $pageType = $prop1 = $prop2 = $prop3 = '';
if (is_home()) { //WordPress functionality to check if page is home page
    $pageName = $channel = $pageType = $prop1 = 'Blog Home';
} elseif (is_page()) { //WordPress functionality to check if page is static page
    $pageName = $channel = the_title('', '', false);
    $pageType = $prop1 = 'Static Page';
} elseif (is_single()) { //WordPress functionality to check if page is article
    $categories = get_the_category();
    $pageName = $prop2 = the_title('', '', false);
    $channel = $categories[0]->name;
    $pageType = $prop1 = 'Article';
} elseif (is_category()) { //WordPress functionality to check if page is category page
    $pageName = $channel = single_cat_title('', false);
    $pageName = 'Category: ' . $pageName;
    $pageType = $prop1 = 'Category';
} elseif (is_tag()) { //WordPress functionality to check if page is tag page
    $pageName = $channel = single_tag_title('', false);
    $pageType = $prop1 = 'Tag';
} elseif (is_month()) { //WordPress functionality to check if page is month page
    // explode() replaces split(), which is deprecated in newer PHP versions
    list($month, $year) = explode(' ', the_date('F Y', '', '', false));
    $pageName = 'Month Archive: ' . $month . ' ' . $year;
    $channel = $pageType = $prop1 = 'Month Archive';
    $prop2 = $year;
    $prop3 = $month;
}
echo "s.pageName = '$pageName' //page name\n";
echo "s.channel = '$channel' //channel\n";
echo "s.pageType = '$pageType' //page type\n";
echo "s.prop1 = '$prop1' //traffic variable 1\n";
echo "s.prop2 = '$prop2' //traffic variable 2\n";
echo "s.prop3 = '$prop3' //traffic variable 3\n";
?>
s.prop4 = '' //traffic variable 4
...
The plugin allows you to freely switch between WordPress themes without having to manage the SiteCatalyst code and to track the basic WordPress page hierarchy. Here are example outputs of the SiteCatalyst variables broken down by page type:
Homepage
s.pageName = 'Blog Home' //page name
s.channel = 'Blog Home' //channel
s.pageType = 'Blog Home' //page type
s.prop1 = 'Blog Home' //traffic variable 1
s.prop2 = '' //traffic variable 2
s.prop3 = '' //traffic variable 3
Tag Page
s.pageName = 'chocolate' //page name
s.channel = 'chocolate' //channel
s.pageType = 'Tag' //page type
s.prop1 = 'Tag' //traffic variable 1
s.prop2 = '' //traffic variable 2
s.prop3 = '' //traffic variable 3
Category Page
s.pageName = 'Category: Food' //page name
s.channel = 'Food' //channel
s.pageType = 'Category' //page type
s.prop1 = 'Category' //traffic variable 1
s.prop2 = '' //traffic variable 2
s.prop3 = '' //traffic variable 3
Static Page
s.pageName = 'About' //page name
s.channel = 'About' //channel
s.pageType = 'Static Page' //page type
s.prop1 = 'Static Page' //traffic variable 1
s.prop2 = '' //traffic variable 2
s.prop3 = '' //traffic variable 3
Archive
s.pageName = 'Month Archive: November 2009' //page name
s.channel = 'Month Archive' //channel
s.pageType = 'Month Archive' //page type
s.prop1 = 'Month Archive' //traffic variable 1
s.prop2 = '2009' //traffic variable 2
s.prop3 = 'November' //traffic variable 3
Article
s.pageName = 'Hello world!' //page name
s.channel = 'Test Category' //channel
s.pageType = 'Article' //page type
s.prop1 = 'Article' //traffic variable 1
s.prop2 = 'Hello world!' //traffic variable 2
s.prop3 = '' //traffic variable 3
A follow-up step for this plugin would be to use the wp_options table in WordPress to manage the Omniture account id, which would allow an administrator to set the Omniture account id through the WordPress admin without editing the plugin code. I've uploaded the plugin to a GitHub repository here.
Learn more about End Point's analytics services.
Update: This plugin is included in the WordPress plugin registry and can be found at http://wordpress.org/extend/plugins/omniture-sitecatalyst-tracking/.
PubCon Vegas: 7 Takeaway Nuggets
I'm back at work after last week's PubCon Vegas. I published several articles about specific sessions, but I wanted to provide some nuggets on recurring themes of the conference.
Google Caffeine Update
This year Google rolled out some changes referred to as the Google Caffeine update. This change increases the speed and size of the index, moves Google search to real-time, and improves search result relevancy and accuracy. It was a popular topic at the conference, but not much light was shed on how the algorithm changes would affect your search results, if at all. I'll have to keep an eye on this to see if there are any significant changes in End Point's search performance.
Bing
Bing is gaining traction. They want to get [at least] 51% of the search market share.
Social media
Social media was a hot topic at the conference. An entire track was allocated to Twitter topics on the first day of the conference. However, it still pales in comparison to search. Of all referrals on the web, search still accounts for 98% and social media referrals account for less than 1% (view referral data here). Dr. Pete from SEOmoz nicely summarized the elephant in the room at PubCon regarding social media: it's important to measure social media response to determine whether it provides business value.
Ecommerce Advice
I asked Rob Snell, author of Starting a Yahoo Business for Dummies, for the most important advice for ecommerce SEO he could provide. He explained the importance of content development and link building to target keywords based on keyword conversion. Basically, SEO efforts shouldn't be wasted on keywords that don't convert well. I typically don't have access to client keyword conversion data, but this is great advice.
Internal SEO Processes
Another recurring topic I observed at PubCon was that often internal SEO processes are a much bigger obstacle than the actual SEO work. It's important to get the entire team on your side. Alex Bennert of Wall Street Journal discussed understanding your audience when presenting SEO. Here are some examples of appropriate topics for a given audience:
- IT Folks: sitemaps, duplicate content (parameter issues, pagination, sorting, crawl allocation, dev servers), canonical link elements, 301 redirects, intuitive link structure
- Biz Dev & Marketing Folks: syndication of content, evaluation of vendor products & integration, assessing SEO value and link equity of partner sites, microsites, leveraging multiple assets
- Content Developers: on page elements best practices, linking, anchor text best practices, keyword research, keyword trends, analytics
- Management: progress, timelines, roadmaps
On the topic of internal processes, I was entertained by the various comments expressing the developer-marketer relationship, for example:
- "Don't ever let a developer control your URL structure."
- "Don't ever let a developer control your site architecture."
- "This site looks like it was designed by a developer."
Apparently developers are the most obvious scapegoat. Back to the point, though: It often requires more effort to get SEO understanding and support than to actually explain what needs to be done.
Search Engine Spam
Search engine spam detection is cool. During a couple of sessions with Matt Cutts, I became interested in writing code to detect search spam. For example:
- Crawling the web to detect links where the anchor text is '.'.
- Crawling the web to identify sites where robots.txt blocks ia_archiver.
- Crawling the web to detect pages with keyword stuffing.
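The first two checks in the list above could be sketched in Ruby roughly as follows. This is only an illustration under simplifying assumptions: the method names dot_anchor_links and blocks_ia_archiver? are made up for this example, the HTML is matched with a regular expression rather than a real parser, and the robots.txt handling only understands plain User-agent groups with a "Disallow: /" rule. The crawling itself is left out.

```ruby
# Hypothetical helpers; names and parsing rules are assumptions for illustration.

# Return the href of every link whose anchor text is just "."
def dot_anchor_links(html)
  html.scan(/<a\s[^>]*href=["']([^"']+)["'][^>]*>\s*\.\s*<\/a>/i).flatten
end

# True if robots.txt has a group naming ia_archiver that disallows the whole site.
def blocks_ia_archiver?(robots_txt)
  current_agents = []
  in_rules = false
  robots_txt.each_line do |line|
    line = line.sub(/#.*/, '').strip          # drop comments and whitespace
    next if line.empty?
    field, value = line.split(':', 2).map(&:strip)
    value = value.to_s
    case field.downcase
    when 'user-agent'
      current_agents = [] if in_rules        # a new group starts after rule lines
      in_rules = false
      current_agents << value.downcase
    when 'disallow'
      in_rules = true
      return true if value == '/' && current_agents.include?('ia_archiver')
    else
      in_rules = true
    end
  end
  false
end

page = '<p><a href="http://spam.example/">.</a> and <a href="/ok">Home</a></p>'
puts dot_anchor_links(page).inspect
puts blocks_ia_archiver?("User-agent: ia_archiver\nDisallow: /\n")
```

Keyword stuffing is harder to pin down and would need some threshold on term frequency, so it is not attempted here.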
I've typically been involved in the technical side of SEO (duplicate content, indexation, crawlability), and haven't been involved in link building or content development, but these discussions provoked me to start looking at search spam from an engineer's perspective.
Google Parameter Masking
Apparently I missed the announcement of parameter masking in Google Webmaster Tools. I've helped battle duplicate content for several clients, and at PubCon I heard about parameter masking provided in Google Webmaster Tools. This functionality was announced in October of 2009 and allows you to provide suggestions to the crawler to ignore specific query parameters.
Parameter masking is yet another solution to managing duplicate content in addition to the rel="canonical" tag, creative uses of robots.txt, and the nofollow tag. The ideal solution for SEO would be to build a site architecture that doesn't require the use of any of these solutions. However, as developers we have all experienced how legacy code persists and sometimes a low effort-high return solution is the best short term option.
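As a rough application-side analogue of parameter masking, a site can normalize its own URLs by dropping parameters that do not change the page content. The sketch below is an assumption-laden example, not anything Google provides: the IGNORED_PARAMS list and the canonical_url method name are invented for illustration, and the remaining parameters are sorted so that equivalent URLs come out identical.

```ruby
require 'uri'

# Assumed list of parameters that do not change page content.
IGNORED_PARAMS = %w[sessionid sort utm_source].freeze

# Return the URL without ignored parameters, remaining parameters sorted by name.
def canonical_url(url)
  uri = URI.parse(url)
  return url if uri.query.nil?

  kept = URI.decode_www_form(uri.query).reject { |name, _| IGNORED_PARAMS.include?(name) }
  uri.query = kept.empty? ? nil : URI.encode_www_form(kept.sort)
  uri.to_s
end

puts canonical_url('http://www.example.com/list?sort=price&page=2&sessionid=abc')
```

The result of a method like this could feed a rel="canonical" link element or a 301 redirect, depending on how aggressive the cleanup should be.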
Learn more about End Point's technical SEO services.
PubCon Vegas Day 3: User Generated Content
On day 3 of PubCon Vegas, a great session I attended was Optimizing Forums For Search & Dealing with User Generated Content with Dustin Woodard, Lawrence Coburn, and Roger Dooley. User generated content is content generated by users in the form of message boards, customizable profiles, forums, reviews, wikis, blogs, article submission, question and answer, video media, or social networks.
Some good statistics were presented about why to tap into user generated content. Recently released Nielsen research showed that 1 out of every 11 minutes spent online is on a social network and 2/3rds of customer "touch points" are user-generated.
Dustin provided some interesting details about long tail traffic. He looked at HitWise's data of the top 10,000 search terms for a 3 month period. The top 100 terms accounted for 5.7% of all traffic, the top 1000 terms accounted for 10.6% of all traffic, and the entire 10,000 data set accounted for just 18.5% of all traffic. To visualize this data, picture the long tail as a lizard with a one inch head and a tail 221 miles long.
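The arithmetic behind those HitWise figures can be spelled out with a few lines of Ruby. The variable names are only for this sketch; the percentages are the ones quoted from the session.

```ruby
# Cumulative share of search traffic (percent) held by the top-N search terms.
cumulative = { 100 => 5.7, 1_000 => 10.6, 10_000 => 18.5 }

previous = 0.0
cumulative.each do |top_n, share|
  added = (share - previous).round(1)
  puts "terms up to rank #{top_n}: +#{added}% (cumulative #{share}%)"
  previous = share
end

# Whatever is left belongs to terms outside the top 10,000: the long tail.
long_tail_share = (100.0 - previous).round(1)
puts "everything beyond rank 10,000: #{long_tail_share}%"
```

In other words, more than four fifths of the traffic in this data set came from terms outside the top 10,000.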
Dustin gave the following steps for developing a user generated content community:
- Seed it with a few editors and really good initial content.
- Give them a voice.
- Make it easy to contribute.
- Make it cool or trendy.
- Provide ownership.
- Create competition with contests, ranking or by highlighting expertise.
- Build a sense of community or a sense of exclusivity.
- Give the community a purpose.
All SEO best practices apply to user generated content, but throughout the session, I learned several specific user generated content tips:
- Predefining keyword rich categories, topics and tags will go a long way with optimization. The better the structure for topics that is created up front, the better the user generated content can be organized in the long run. Users are not inherently good at content organization, so content can be easily buried with poor information architecture.
- Developing automated cross-linking between user generated content helps improve authority, build clusters of content, and enrich the internal link structure. Dustin had experience with building a widget that automatically links to 5 pieces of user generated content and another widget that allows the user to select several pieces of user generated content from a set of related content.
- Examples of battling duplicate content include disallowing duplicate page titles and meta descriptions. Content that is moved, renamed or deleted should be managed well.
- Building a badge or widget to display user involvement helps increase external linking to your site, but this should be carefully managed to avoid appearing spammy. As best practices, widgets should have excellent accessibility, be simple with light branding, and always have fresh content.
- Developing your own tiny URL helps pass and keep intact external links to your site with user generated content. Lawrence suggested to "gently tweet" user generated content that is the highest quality.
Several of End Point's clients are either in the middle of or considering building a community with user generated content. In ecommerce, blogs, forums, reviews, and Q&A are the most prevalent types of user generated content that I've encountered. Many of the things mentioned in this session were good tips to consider throughout the development of user generated content for ecommerce.
Learn more about End Point's technical SEO services.
PubCon Vegas Day 2: International and Mega Site SEO, and Tools for SEO
On the second day of PubCon Vegas, I attended several SEO track sessions including "SEO for Ecommerce", "International and European Site Optimization", "Mega Site SEO", and "SEO/SEM Tools". A mini-summary of several of the sessions is presented below.
Derrick Wheeler from Microsoft.com spoke on Mega Site SEO about "taming the beast". Microsoft has 1.2 billion URLs that are comprised of thousands of web properties. For mega site SEO, Derrick highlighted:
- Content is NOT king. Structure is! Content is like the princess-in-waiting after structure has been mastered.
- Developing an overall SEO approach and organization for getting structure, content, and authority SEO completed can be more valuable than the actual SEO work. This was a common theme among many of the presentations at PubCon.
- Getting metrics set up at the beginning of SEO work is a very important step to measure and justify progress.
- Don't be afraid to say no to low priority items.
Most developers deal with a large amount of legacy code. Derrick discussed primary issues when working with legacy problems:
- Duplicate and undesirable pages. For Microsoft.com, managing and dealing with 1.2 billion pages results in a lot of duplicate and undesirable pages from the past.
- Multiple redirects.
- Improper error handling (error handling on 404s or 500s).
- International URL structure can be a problem for international sites. Having an appropriate TLD (top level domain) is the best solution, but if that's not possible, a process should be implemented to regulate the international urls.
- Low Quality Page Titles and Meta Tags. For large sites with hundreds of thousands of pages, it's really important to have unique page titles and meta descriptions or to have a template that forces uniqueness.
In summary, structure and internal processes are areas to focus on for Mega Site SEO. Legacy problems are something to be aware of when a site is so large that changes can't be implemented as quickly as on a small site.
In International and European Search Management, Michael Bonfils, Nelson James, and Andy Atkins-Krueger discussed international SEO and SEM tactics. Takeaways include:
- In terms of international search marketing, it's important to incorporate culture into search optimization and marketing. What works in one country may not work in another, so don't offend a culture by not understanding it. Some examples of content differences for targeting different cultures include emphasizing price points, focusing on product quality, and asserting authority or trust on a site.
- It's also important to understand how linguistics affects your keyword marketing. Automatic translation should not be used (all the speakers mentioned this). A good example of linguistics and search targeting is the use of the search term "soccer cleats", or "football boots". In England, the term "football boot" has a very small portion of the traffic share, but singular terms in other languages ("scarpe da calcio", "botas de fútbol") have a much larger percentage of the search market share. Andy shared many other examples of how direct translation would not yield the best keywords to target ("car insurance", "healthcare", "30% off", "cheap flights").
- Local hosting is important for metrics, linking, and to develop trust. Nelson James shared research showing that 80% of the top 10 results for the top 30 keywords in China had a '.cn' top level domain, and the remaining top sites, which were '.com' sites, were all hosted in China.
- Other technical areas for international search that were mentioned are using the meta language tag, pinyin, charset, and language set. Duplicate content also will become a problem across sites of the same language.
- It's important to understand the search market share. In Russia, Google holds 35% of the search market and Yandex has 54%. In China, Baidu has 76% and Google has 22%. There are some reasons that explain these market share differences. Yandex was written to manage the large Russian vocabulary that Google does not handle as well. Baidu handles search for media better than Google, and search traffic in China is much more entertainment driven, rather than business driven as in the US.
In the last session of the day, about 100 tools were discussed in SEO/SEM Tools. I'm planning on writing another blog post with a summary of these tools, but here's a short list of the tools mentioned by multiple speakers:
- SEMRush
- Google: Keyword Ad Tool, Webmaster Tools, Adplanner, SocialGraph API, Google Trends, Analytics, Google Insights
- SpyFu: Kombat, Domain Ad History, Smart Search, Keyword Ad History
- SEOBook
- SEOmoz: Linkscape, Mozbar, Top Pages, etc.
- MajesticSEO
- Raven SEO Tools: Website Analytics, Campaign Reports
Stay tuned for a day 3 and wrap up article!
Learn more about End Point's technical SEO services.
PubCon Vegas Day 1: Keyword Research Session
On the first day of PubCon Vegas, I was bombarded by information, sessions, and people. PubCon is an SEO/SEM conference that has a variety of sessions categorized in SEO (Search Engine Optimization), SEM (Search Engine Marketing), Social Media and Affiliates. My primary interest is in SEO, which is why I attended the SEO track yesterday that included sessions about in-house SEO, organic keyword research and selection, and hot topics in SEO.
Because my specific involvement in SEO has focused on technical SEO, I was surprised that my highlight of day one was "Smart Organic Keyword Research and Selection" which included speakers Wil Reynolds, Craig Paddock, Carolyn Shelby, and Mark Jackson.
With good organization and humor, Carolyn first presented the "ABCs of Organic Keyword Research and Selection": A is for analytics and knowing your audience; B is for brainstorm and bonus; and C is for Cookie!, crunch the numbers, cull the lists, and create a final list of keywords.
On the analytics side, Carolyn mentioned good sources of analytics include web server logs (read my article on the value of log or bot parsing), Google Analytics "traffic generating" keyword list, and logs from internal site search.
In regards to knowing your audience, Carolyn shared her personal experience of focus group research: For a project that targeted teenage girls, she invited her daughter and several of her daughter's friends to join her around the table with laptops. She showed them a picture and asked them to search for that image. She recorded the search terms they used and used this information to help understand her target audience's behavior.
On the brainstorm side, she likes to involve core web team members, product managers, marketing, developers, designers, promoters, marketers, and front liners (customer service representatives, tech support). B was also for bonus, which was to get input from the "suits" of a company to get a list of ideal keywords to understand how they measure keyword success.
Craig Paddock spoke on "Organic Keyword Research and Selection" next. He touched on some of the following SEO keyphrase concepts:
- keyphrase research: Keyword research is based on keyword popularity, click through rate, quality (measured by conversion and engagement), keyword competitiveness, and current ranking
- keyphrase expanders and variations: Broad keyword phrases should include variations of keywords that include words like 'best', 'online', 'buy', 'cheap', 'discount', 'wholesale', 'accessories', 'supplies', 'reviews', and abbreviations of words like states. For End Point's ecommerce clients, targeting keyphrases with customer reviews is a great way to generate traffic from user generated content
- keyphrase discovery: It shouldn't be assumed that clients know the industry. Craig shared an example in which his boxing retailer client made the mistake of targeting specific boxing terms that had low traffic. They expanded to include more popular terms like "lose weight" and "burn calories". Another tactic for discovering keyphrases is to ask what kinds of problems the website's service solves and to choose keywords that target these questions and answers.
- keyphrase quality: Keyphrase quality is typically measured by conversion rate (revenue / visitor) or engagement. Engagement is measured by the time on site, pages/visit, and bounce rate, which are commonly included in analytics packages.
- keyphrase selection: Using exact match and broad match on keywords is helpful, as is letting the customers guide the keyword selection. Craig mentioned that data shows that there is a higher conversion rate on more specific keyphrases, which isn't surprising.
- keyphrase targeting: Keyphrase targeting should match competitiveness with link popularity. For example, more competitive words should be targeted higher up in the hierarchy of the site, such as on the home page. For End Point, this would involve targeting competitive phrases like "ecommerce", "ruby on rails development", and "web application development" on our homepage and targeting less competitive phrases such as "interchange development" or "ruby on rails ecommerce" on pages lower in the hierarchy.
- keyphrase analysis: One area of interest was how analytics tools attribute "credit" to keyphrases. In Google Analytics, if a customer searches "interchange consulting" and visits endpoint.com, then a week later searches "end point", the conversion or credit of the keyphrase is attributed to the "end point" keyword rather than "interchange consulting". This is important in ecommerce because this attribution doesn't accurately credit targeted keywords for revenue. Craig did mention that other tools (including Omniture) provide the ability to select last click attribution versus first click attribution to fix this problem. Another solution mentioned was to set a user defined variable in Google Analytics equal to a cookie that holds the first click search term ("interchange consulting" in the example above) and set the cookie to not expire.
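The difference between last click and first click attribution from the keyphrase analysis item can be shown with a small Ruby sketch. The Visit struct, the credited_keyphrase method, and the model names are hypothetical and exist only for this example; real analytics packages track far more than a keyphrase and a day number.

```ruby
# Each visit records the search keyphrase that brought the visitor to the site.
Visit = Struct.new(:keyphrase, :day)

# model is :first_click or :last_click (names assumed for this sketch).
def credited_keyphrase(visits, model)
  ordered = visits.sort_by(&:day)
  model == :first_click ? ordered.first.keyphrase : ordered.last.keyphrase
end

visits = [
  Visit.new('interchange consulting', 1),
  Visit.new('end point', 8) # the same visitor returns a week later and converts
]

puts credited_keyphrase(visits, :last_click)  # => "end point"
puts credited_keyphrase(visits, :first_click) # => "interchange consulting"
```

The non-expiring cookie approach described above amounts to remembering the keyphrase of the earliest visit so that it is still available when the conversion happens.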
Wil Reynolds spoke next on "Keyword Analysis AFTER the rankings". He touched on an important concept: SEO (specifically keyphrase research and targeting) is never done, because keywords constantly evolve as people change the way they search, blended search (video, image) is on the rise, and social or economic influences shift keyword popularity. Some good examples of keyphrase trending include:
- "Shopping" was a good keyword in 1999 because ecommerce was growing on the web and users didn't know what to search for.
- "Handheld device" transitioned to "Smartphone"
- "Eco-Friendly" has grown while "Environmentally Friendly" has declined - view this trend here
- "Netbooks" and "Ultraportables" are popular search terms on the rise that were non-existent two years ago - view netbook trends here
- Brands in the gear industry evolve at a much faster pace than the plumbing or wood floor industry
Wil's examples and advice apply directly to our clients, who should be aware of social and economic influences that may require them to change their keyphrase targeting over time.
Finally, Mark Jackson spoke on focusing your keywords for better results. He discussed the importance of analyzing the keyword competitiveness to determine which keywords to target to get the most value out of keyword SEO work.
In summary, I still don't love keyword and keyphrase research and selection :), but I found that the speakers presented a great overview of keyword research and selection with a good mixture of personal experience, expertise and examples. Some great concepts to keep in mind in regards to keyword research are:
- There are always missed opportunities in keyword targeting.
- There are lotsa tools! Tools are good for measuring keyphrase competitiveness, user engagement, and identifying missed opportunities.
- SEO keyphrase research and selection is an ongoing process.
Now, back to day 2 activities...
Learn more about End Point's technical SEO services.
Performance optimization of icdevgroup.org
Some years ago Davor Ocelić redesigned icdevgroup.org, Interchange's home on the web. Since then, most of the attention paid to it has been on content such as news, documentation, release information, and so on. We haven't looked much at implementation or optimization details. Recently I decided to do just that.
Interchange optimizations
There is currently no separate logged-in user area of icdevgroup.org, so Interchange is primarily used here as a templating system and database interface. The automatic read/write of a server-side user session is thus unneeded overhead, as is periodic culling of the old sessions. So I turned off permanent sessions by making all visitors appear to be search bots. Adding to interchange.cfg:
RobotUA *
That would not work for most Interchange sites, which need a server-side session for storing mv_click action code, scratch variables, logged-in state, shopping cart, etc. But for a read-only content site, it works well.
By default, Interchange writes user page requests to a special tracking log as part of its UserTrack facility. It also outputs an X-Track HTTP response header with some information about the visit which can be used by a (to my knowledge) long defunct analytics package. Since we don't need either of those features, we can save a tiny bit of overhead. Adding to catalog.cfg:
UserTrack No
Very few Interchange sites have any need for UserTrack anymore, so this is commonly a safe optimization to make.
HTTP optimizations
Today I ran the excellent webpagetest.org test, and this was the icdevgroup.org test result. Even though icdevgroup.org is a fairly simple site without much bloat, two obvious areas for improvement stood out.
First, gzip/deflate compression of textual content should be enabled. That cuts down on bandwidth used and page delivery time by a significant amount, and with modern CPUs adds no appreciable extra CPU load on either the client or the server.
We're hosting icdevgroup.org on Debian GNU/Linux with Apache 2.2, which has a reasonable default configuration of mod_deflate that does this, so it's easy to enable:
a2enmod deflate
That sets up symbolic links in /etc/apache2/mods-enabled for deflate.load and deflate.conf to enable mod_deflate. (Use a2dismod to remove them if needed.)
I added two content types for CSS & JavaScript to the default in deflate.conf:
AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css application/x-javascript
That used to be riskier when very old browsers such as Netscape 3 and 4 claimed to support compressed CSS & JavaScript but actually didn't. But those browsers are long gone.
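To get a feel for why compressing textual content pays off, here is a small Ruby sketch using the standard zlib binding, which implements the same DEFLATE algorithm that mod_deflate relies on. The sample markup is invented and far more repetitive than a real page, so real-world savings will be smaller than what this prints.

```ruby
require 'zlib'

# Repetitive markup, similar in spirit to a list-heavy HTML page (made-up sample).
html = "<li class=\"item\"><a href=\"/docs/page.html\">Documentation page</a></li>\n" * 200

compressed = Zlib::Deflate.deflate(html)
ratio = compressed.bytesize.to_f / html.bytesize

puts "original: #{html.bytesize} bytes, deflated: #{compressed.bytesize} bytes"
puts format('compressed size is %.1f%% of the original', ratio * 100)
```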
The next easy optimization is to enable proxy and browser caching of static content: images, CSS, and JavaScript files. By doing this we eliminate repeat HTTP requests for these files while the cached copy is still fresh; the browser won't even check with the server to see if it has the current version of these files once it has loaded them into its cache, making subsequent use of those files blazingly fast.
There is, of course, a tradeoff to this. Once the browser has the file cached, you can't make it fetch a newer version unless you change the filename. So we'll set a cache lifetime of only one hour. That's long enough to easily cover most users' browsing sessions at a site like this, but short enough that if we need to publish a new version of one of these files, it will still propagate fairly quickly.
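In rough terms, the freshness decision a cache makes with a one hour lifetime looks like the Ruby sketch below. It is a simplification with assumed method names (max_age and fresh?): it only reads a max-age value from a Cache-Control header and ignores everything else a real browser or proxy considers, such as Expires, no-cache, or validators.

```ruby
# Extract max-age (in seconds) from a Cache-Control header value, or nil if absent.
def max_age(cache_control)
  match = cache_control.to_s.match(/max-age=(\d+)/i)
  match && match[1].to_i
end

# True while a cached copy may be reused without contacting the server.
def fresh?(cache_control, age_in_seconds)
  limit = max_age(cache_control)
  !limit.nil? && age_in_seconds < limit
end

puts fresh?('max-age=3600', 600)   # ten minutes old: still served from cache
puts fresh?('max-age=3600', 4000)  # older than one hour: must be fetched again
```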
So I added to the Apache configuration file for this virtual host:
ExpiresActive On
ExpiresByType image/gif "access plus 1 hour"
ExpiresByType image/jpeg "access plus 1 hour"
ExpiresByType image/png "access plus 1 hour"
ExpiresByType text/css "access plus 1 hour"
ExpiresByType application/x-javascript "access plus 1 hour"
FileETag None
Header unset ETag
This adds the HTTP response header "Cache-Control: max-age=3600" for those static files. I also have Apache remove the ETag header, which is not needed given this caching and the Last-Modified header.
There are cases where the above configuration would be too broad, for example, if you have:
- images that differ with the same filename, such as CAPTCHAs
- static files that vary based on logged-in state
- dynamically-generated CSS or JavaScript files with the same name
If the website is completely static, including the HTML, or identical for all users at the same time even though dynamically generated, we could also enable caching the HTML pages themselves. But in the case of icdevgroup.org, that would probably cause trouble with the Gitweb repository browser, live documentation searches, etc.
After those changes, we can see the results of a new webpagetest.org run and see that we reduced the bytes transferred, and the delivery time. It's especially dramatic to see how much faster subsequent page views of the Hall of Fame are, since it has many screenshot thumbnail images.
Optimizing a simple non-commerce site such as icdevgroup.org is easy and even fun. With caution and practicing on a non-production system, complex ecommerce sites can be optimized using the same techniques, with even more dramatic benefits.
SEO: External Links and PageRank
I had a flash of inspiration to write an article about external links in the world of search engine optimization. I've created many SEO reports for End Point's clients with an emphasis on technical aspects of search engine optimization. However, at the end of the SEO report, I always like to point out that search engine performance is dependent on having high quality fresh and relevant content and popularity (for example, PageRank). The number of external links to a site is a large factor in popularity of a site, and so the number of external links to a site can positively influence search engine performance.
After wrapping up a report yesterday, I wondered if the external link data that I provide to our clients is meaningful to them. What is the average response when I report, "You should get high quality external links from many diverse domains"?
So, I investigated some data of well known and less well known sites to display a spectrum of external link and PageRank data. Here is the origin of some of the less well known domains referenced in the data below:
- http://www.petfinder.com/: This is where my dogs came from.
- http://www.endpoint.com/: That's Us!
- http://www.d-9.com/: The site for the movie District 9 - I saw it last weekend.
- http://www.gastronomyinc.com/: Market Street Grill is a great seafood restaurant in Salt Lake City.
- http://divascupcakes.com/: This is a great gourmet cupcake place in Salt Lake City.
- http://www.rediguana.com/: A GREAT Mexican food restaurant in Salt Lake City.
And here is the data:
I retrieved the PageRank from a generic PageRank tool. SEOmoz was used to collect external link counts and external linking subdomains. Finally, Yahoo Site Explorer was used to retrieve external link counts to the domain in question. I chose to examine external link counts from both SEOmoz and Yahoo Site Explorer to get a better representation of the data. SEOmoz compiles their data about once a month and does not have as many URLs indexed as Yahoo, which explains why their numbers may be lagging behind the Yahoo Site Explorer external link counts.
Out of curiosity, I went on to plot the PageRank data vs. Log (base 10) of the other data.
PageRank vs Log of SEOmoz external link count
PageRank vs Log of SEOmoz external linking subdomain count
PageRank vs Log of Yahoo SiteExplorer external link count
PageRank is described as a theoretical probability value on a logarithmic scale and it's based on inbound links, PageRank of inbound links, and other factors such as Google visit data, search click-through rates, etc. The true popularity rank is a rank between 1 and X, where X is equal to the total number of webpages crawled by search engine A. After pages are individually ranked between 1 and X, they are scaled logarithmically between 0 and 10.
The takeaway from this data is when an "SEO report" gives advice to "get more external links", it means:
- If your site has a PageRank of < 4, getting external links on the scale of hundreds may impact your existing PageRank or popularity
- If your site has a PageRank of >= 4 and < 6, getting external links on the scale of thousands may impact your existing PageRank or popularity
- If your site has a PageRank of >= 6 and < 8, getting external links on the scale of tens to hundreds of thousands may impact your existing PageRank or popularity
- If your site has a PageRank of >= 8, you probably are already doing something right...
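The rule of thumb in the list above can be restated as a small Ruby lookup. The method names link_scale_for and log_links are invented for this sketch, and the thresholds are simply the ones from my data, not anything derived from a published Google formula.

```ruby
# Rough order of magnitude of new external links that may move PageRank,
# taken from the takeaway list above (an observation, not a Google formula).
def link_scale_for(pagerank)
  if pagerank < 4
    'hundreds'
  elsif pagerank < 6
    'thousands'
  elsif pagerank < 8
    'tens to hundreds of thousands'
  else
    'already doing something right'
  end
end

# Base 10 logarithm of a link count, as used on the x-axis of the plots.
def log_links(count)
  Math.log10(count)
end

puts link_scale_for(5)
puts log_links(25_000).round(2)
```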
Furthermore, even if a site improves external link counts, other factors will play into the PageRank algorithm. Additionally, keyword relevance and popularity play key roles in search engine results.
Learn more about End Point's technical SEO services.
Site Search on Rails
I was recently tasked with implementing site search using a commercially available site search application for one of our clients (Gear.com). The basic implementation requires that a SOAP request be made and the XML data returned be parsed for display. The SOAP request contains basic search information, and additional information such as product pagination and sort by parameters. During the implementation in a Rails application, I applied a few unique solutions worthy of a blog article. :)
The first requirement I tackled was to design the web application in a way that produced search engine friendly canonical URLs. I used Rails routing to implement a basic search:
map.connect ':id', :controller => 'basic', :action => 'search'
Any simple search path would be sent to the basic search query that performed the SOAP request followed by XML data parsing. For example, http://www.gear.com/s/climb is a search for "climb" and http://www.gear.com/s/bike for "bike".
After the initial search, a user can refine the search by brand, merchant, category or price, or choose to sort the items, select a different page, or modify the number of items per page. I chose to force the order of refinement, for example, brand and merchant order were constrained with the following Rails routes:
map.connect ':id/brand/:rbrand', :controller => 'basic', :action => 'search'
map.connect ':id/merch/:rmerch', :controller => 'basic', :action => 'search'
map.connect ':id/brand/:rbrand/merch/:rmerch', :controller => 'basic', :action => 'search'
Rather than allowing refinement parameters to appear in any order in the URL, such as http://www.gear.com/s/climb/brand/Arcteryx/merch/Altrec and http://www.gear.com/s/climb/merch/Altrec/brand/Arcteryx, the order of search refinement is always limited to the Rails routes specified above. In this example, only the former URL is allowed.
For example, http://www.gear.com/s/climb/brand/Arcteryx/merch/Altrec is a valid URL for Arcteryx Altrec climb, http://www.gear.com/s/climb/brand/Arcteryx for Arcteryx climb, and http://www.gear.com/s/climb/merch/Altrec for Altrec climb.
All URLs on any given search result page are built with a single Ruby method to force the refinement and parameter order. The method input requires the existing refinement values, the new refinement key, and the new refinement value. The method builds a URL with all previously existing refinement values and adds the new refinement value. Rather than generating millions of URLs with the various refinement combinations of brand, merchant, category, price, items per page, pagination number, and sort method, this logic minimizes duplicate content. The use of Rails routes and the chosen URL structure also creates search engine friendly URLs that can be targeted for traffic. Below is example pseudocode with the URL-building method:
def build_url(parameters, new_key, new_value)
  # set url to basic search information
  # append brand info to url if parameters[:brand] exists or if new_key is brand
  # append merchant info to url if parameters[:merchant] exists or if new_key is merchant
  # append category info to url if parameters[:cat] exists or if new_key is cat
  # ...
end
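A minimal runnable sketch of the pseudocode above might look like the following. It is an illustration rather than the code used on the site: the REFINEMENT_ORDER constant, the use of parameters[:id] for the search term, and the "/s" prefix are assumptions based on the routes and example URLs shown earlier, and only the brand and merchant refinements are covered. URL escaping of values is left out for brevity.

```ruby
# Fixed order in which refinements may appear in a URL.
# The key/segment pairs are assumptions taken from the example routes.
REFINEMENT_ORDER = [[:brand, "brand"], [:merchant, "merch"]].freeze

# Builds a search URL from the existing refinements plus one new refinement.
# parameters[:id] is assumed to hold the search term, as in the ':id' route.
def build_url(parameters, new_key, new_value)
  refinements = parameters.merge(new_key => new_value)
  segments = ["/s", parameters[:id]]
  # Walk the refinements in the fixed order, not the order the user chose.
  REFINEMENT_ORDER.each do |key, segment|
    value = refinements[key]
    segments << segment << value if value
  end
  segments.join("/")
end

# Refining by merchant first and brand second still yields the brand-first URL.
puts build_url({ :id => "climb", :merchant => "Altrec" }, :brand, "Arcteryx")
# prints "/s/climb/brand/Arcteryx/merch/Altrec"
```

Because the order comes from the constant rather than from the click history, both refinement paths produce the same URL.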
The next requirement I encountered was breadcrumb functionality. Breadcrumbs are an important usability feature that provides the ability to navigate backwards in search and refinement history. Because of the canonical URL solution described above, the URL could not be used to indicate the search refinement history. For example, http://www.gear.com/s/climb/brand/Arcteryx/merch/Altrec does not indicate whether the user had refined by brand then merchant, or by merchant then brand. Having implemented similar breadcrumb functionality for other End Point clients, I investigated a few solutions: appending a '#' (hash, or URL fragment) to the end of the URL with details of the user refinement path, using JavaScript to set a cookie containing the user refinement path whenever a link was clicked, and using a session variable to track the user refinement path. In the end, I found it easiest to use a single session variable to track the user refinement path. The session variable contained all the information needed to display the breadcrumb with a bit of parsing.
For example, for the URL mentioned above, the session variable of 'brand-Arcteryx:merch-Altrec' would yield the breadcrumb: "Your search: climb > Arcteryx > Altrec". And the session variable 'merch-Altrec:brand-Arcteryx' would yield the breadcrumb: "Your search: climb > Altrec > Arcteryx". I could have used more than one session variable, but this solution worked out to be simple and comprised less than 10 lines of code.
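The parsing of that session value can be sketched in a few lines of Ruby. This is an assumed reconstruction from the format shown above ('key-value' pairs joined by ':'), not the original code, and the breadcrumb method name is hypothetical.

```ruby
# Turns a refinement path such as 'brand-Arcteryx:merch-Altrec' into a
# breadcrumb string. The path format is taken from the examples above.
def breadcrumb(search_term, refinement_path)
  # Each step looks like "key-value"; keep only the value for display.
  steps = refinement_path.to_s.split(":").map { |step| step.split("-", 2).last }
  "Your search: " + ([search_term] + steps).join(" > ")
end

puts breadcrumb("climb", "brand-Arcteryx:merch-Altrec")
# prints "Your search: climb > Arcteryx > Altrec"
puts breadcrumb("climb", "merch-Altrec:brand-Arcteryx")
# prints "Your search: climb > Altrec > Arcteryx"
```

In a Rails controller the second argument would come from the session, with the new step appended each time the user follows a refinement link.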
Another interesting necessity was determining the best way to parse the XML data. I researched several XML parsers including XmlSimple, Hpricot, REXML, and libxml. About a year ago, John Nunemaker reported on some benchmark testing of several of these packages (Parsing XML with Ruby). After some investigative work, I chose Hpricot because it was very easy to implement complex selectors that reminded me of jQuery selectors (which are also easy to use). The interesting thing that I noticed throughout the implementation was that the refinement parsing took much more time than the actual product parsing and formatting. For Gear.com, the number of products returned ranges from 20 to 60, and products were quickly parsed. The number of refinements returned ranged from very small for a distinct search such as "Moccasym" (4 refinement options) to large for a general search such as "jacket" (50+ refinement options). If performance is an issue in the future, I can further investigate the use of libxml-ruby or other Ruby XML parsing tools that may improve the performance.
A final point of interest was the decision to tie the Rails application to the same database that drives the product pages (which was easily done). This decision was made to allow access to frontend taxonomy information for the product categorization. For example, if a user chooses to refine a search by a category (jacket in Kids Clothing), the Rails app can retrieve all the taxonomy information for that category such as the display name, the number of products in that category, subcategories, and subsubcategories. This may be important information required for additional features, such as providing the ability to view the subcategories in this category or view other products in this category that aren't shown in the search results.
I was happy to see the success of this project after working through the deliverables. Future work includes integration of additional search features common to many site search packages, such as implementing refinement by color and size, or retrieving recommended products or best sellers.
Learn more about End Point's Ruby on Rails development.
nofollow in PageRank Sculpting
Last week the SEO world reacted to Matt Cutts' article about the use of nofollow in PageRank sculpting.
Google uses the PageRank algorithm to calculate popularity of pages in the web. Popularity is only one factor in determining which pages are returned in search results (relevance to search terms is the other major factor). Other major search engines use similar popularity algorithms. Without describing the algorithm in detail, the important takeaways are:
- PageRank of a single page is influenced by all of its inbound links (links from other pages)
- PageRank of a single page is passed on to all outgoing links after being normalized and divided by the total number of outgoing links
So, given page C with inbound links from pages A and B, where pages A and B have equal PageRank X, page A has 3 total outgoing links, and page B has 5 total outgoing links, page C receives more PageRank from page A than from page B.

From an external link perspective, it's great to get as many links as possible from a variety of sources that rank high and have a low number of outgoing links. From an internal site perspective, it's important to examine how PageRank is passed throughout a site to apply the best site architecture. In addition to designing a site architecture that pleases users and passes link juice throughout a site effectively, the rel="nofollow" attribute was adopted by several major search engines and was used as an additional tool to stop the flow of link juice from one page to another. The nofollow attribute can also be used to identify paid links (its early use) or to avoid passing link juice to external sites completely.
In the example above, rel="nofollow" could be added to 2 links on page B which would result in the same PageRank passed from page B to page C as from page A to page C.

Then, at a recent SEO conference, Matt Cutts (head of Google's webspam team) commented that the PageRank algorithm had changed how it treats nofollow, and just last week it was announced that nofollow can no longer be used for PageRank sculpting. A link with the nofollow attribute no longer reduces the count of outgoing links on a page, so it no longer increases the link juice passed through the page's other links. Link juice is still not passed through the nofollow link itself.
In the ongoing example, the link juice passed from page B to page C will be less than from page A to C because it has more outgoing links, even if they are nofollow links.
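The three scenarios in the running example can be worked through with a small Ruby sketch. It uses a deliberately simplified model in which a page's PageRank is split evenly across its counted outgoing links. The real algorithm includes normalization and other factors not modeled here, and the juice_per_link method and its parameters are hypothetical names used only for illustration.

```ruby
# Link juice passed through each followed link, under a simplified
# even-split model (an assumption; no damping or normalization).
# nofollow_reduces_count: true models the old "sculpting" behavior,
# false models the behavior after the announced change.
def juice_per_link(page_rank, total_links, nofollow_links: 0, nofollow_reduces_count: false)
  counted = nofollow_reduces_count ? total_links - nofollow_links : total_links
  page_rank.to_f / counted
end

x = 1.0 # pages A and B have equal PageRank X

from_a          = juice_per_link(x, 3)
from_b          = juice_per_link(x, 5)
from_b_sculpted = juice_per_link(x, 5, nofollow_links: 2, nofollow_reduces_count: true)
from_b_now      = juice_per_link(x, 5, nofollow_links: 2, nofollow_reduces_count: false)

puts from_a > from_b           # A passes more than B: X/3 versus X/5
puts from_b_sculpted == from_a # old behavior: 2 nofollow links on B make it X/3
puts from_b_now == from_b      # new behavior: B is back to X/5
```

Under this model, adding nofollow to two of page B's links used to raise the share passed to page C from X/5 to X/3, and after the change it stays at X/5.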

One SEOmoz article I read suggests that the SEO best practice will now be to recommend that blog owners disallow comments that may contain external links, to prevent the dilution of link juice. Other potential solutions would be to filter out links from user generated content (comments or Q&A specifically), use iframes to display any user generated content, or embed Flash or Java with external links. The nofollow attribute may still be used to stop the flow of link juice to external pages, but it may no longer be used for internal PageRank sculpting.
Learn more about End Point's technical SEO services.
SEO Ecommerce
I recently read an article that discusses Magento SEO problems and solutions. This got me to think about common search engine optimization issues that I've seen in e-commerce. Below are some highlighted e-commerce search engine optimization issues. The Spree Demo, Interchange Demo, and Magento Demo are used as references.
Duplicate Home Pages (www, non-www, index.html)
Duplicate home pages can come in the form of a homepage with www and without www, a homepage in the form of http://www.domain.com/ and a homepage with some variation of "index" appended to the url, or a combination of the two. In the Interchange demo, http://demo.icdevgroup.org/i/demo1 and http://demo.icdevgroup.org/i/demo1/index.html are duplicates; in the Spree demo, http://demo.spreecommerce.com and http://demo.spreecommerce.com/products/; and in the Magento demo, http://demo.magentocommerce.com/ and http://demo.magentocommerce.com/index.php.
External links positively influence search engine performance more if they are pointing to one index page rather than being divided between two or three home pages. Since the homepage most likely receives the most external links, this issue can be more problematic than other generated duplicate content. I've also seen this happen in several content management systems.
This article provides directions on mod_rewrite use to apply a 301 redirect from the www.domain.com/index.php homepage to www.domain.com. This solution or other redirect solutions can be applied to Spree, Interchange, and other ecommerce platforms.
Irrelevant Product URLs
A search engine optimization best practice is to provide relevant and indicative text in the product urls. In the Interchange demo, the default catalog uses the product sku in the product url (http://demo.icdevgroup.org/i/demo1/os28073.html). In Magento and Spree, product permalinks with relevant text are used in the product url. In WordPress, the author has the ability to set permalinks for articles. I am unsure if Magento gives you the ability to customize product urls. Spree does not currently give you the ability to manage custom product permalinks. However, fixes may be in the works for all of these ecommerce platforms, since it is important for them to implement search engine optimization best practices.
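As an illustration of what "relevant text in the product url" means in practice, here is a small Ruby sketch that derives a permalink from a product name. It is a generic example and not how Spree, Magento, or Interchange generate their urls. The product_permalink method name is hypothetical, and only ASCII letters and digits are kept.

```ruby
# Derives a url-friendly permalink from a product name by lowercasing it
# and collapsing every run of other characters into a single hyphen.
def product_permalink(name)
  name.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-+|-+\z/, "")
end

puts product_permalink("Digger Hand Trencher")
# prints "digger-hand-trencher"
```

A url ending in digger-hand-trencher tells both users and search engines more about the page than one ending in the sku os28076.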
Duplicate product content
I've observed several situations where products divided into multiple taxonomies result in duplicate content creation via different user navigation paths. For example, in the Spree demo, the "Ruby Baseball Jersey" can be reached through the Ruby brand page, the Clothing page, or the homepage. The three generated duplicate content urls are http://demo.spreecommerce.com/products/ruby-on-rails-ringer-t-shirt, http://demo.spreecommerce.com/t/brands/ruby/p/ruby-baseball-jersey, and http://demo.spreecommerce.com/t/categories/clothing/shirts/p/ruby-baseball-jersey.
Another example of this can be found in the Interchange demo. The left navigation taxonomy tree provides links to any product url with "?open=X,Y,Z" appended to the url. The "open" query string indicates how the DHTML tree should be displayed. For example, the "Digger Hand Trencher" has a base url of http://demo.icdevgroup.org/i/demo1/os28076.html. Depending on which tree nodes are exploded, the product can be reached at http://demo.icdevgroup.org/i/demo1/os28076.html?open=0,11,13,19, http://demo.icdevgroup.org/i/demo1/os28076.html?open=0,11,13, etc. This standard demo functionality yields a lot of duplicate content.
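To see that these urls are the same page as far as content goes, the display-only query parameter can be dropped and the results compared. The Ruby sketch below does that with the standard URI library. The canonical_product_url method name and the list of ignored parameters are assumptions made for illustration, and this is not something the Interchange demo does itself.

```ruby
require "uri"

# Removes query parameters that only affect display (here the tree state
# in "open") so that duplicate urls collapse to one canonical url.
def canonical_product_url(url, ignored_params = ["open"])
  uri = URI.parse(url)
  if uri.query
    kept = URI.decode_www_form(uri.query).reject { |key, _| ignored_params.include?(key) }
    uri.query = kept.empty? ? nil : URI.encode_www_form(kept)
  end
  uri.to_s
end

puts canonical_product_url("http://demo.icdevgroup.org/i/demo1/os28076.html?open=0,11,13,19")
# prints "http://demo.icdevgroup.org/i/demo1/os28076.html"
```

Any number of "open" variants map back to the single base url, which is the one that should collect links and be indexed.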
In Magento, product urls are in the form of www.domain.com/product-name, although the article I mentioned above notes that www.domain.com/category/product.html product urls were generated. Perhaps this was a recent fix, or perhaps the demo is configured to avoid generating this type of duplicate content.
Duplicate product page content is often used to indicate which breadcrumb should display or to track user click-through behavior (for example, did a user click on a "featured product"? a "best seller"? a specific "product advertisement"?). In Interchange, session ids are appended to urls which is another source of duplicate content. Instead of using the url to track user navigation or behavior, several other solutions such as using cookies, using a '#' (hash), or using session data can be used to avoid duplicate content generation.
Performance
Performance should not be overlooked in ecommerce for search engine optimization. In March of 2008, Google wrote about how landing page load time will be incorporated into the Quality Score for Google AdWords - which is also believed to apply to regular search results. And GitHub recently released some data on how performance improvements influenced http://www.github.com/ Googlebot visits.
Keeping a high content to text ratio, consolidating, minifying, and gzipping css and javascript, and minimizing the use of javascript based suckerfish menus can all improve search engine performance.
The Interchange default catalog has a simple template with minimal css and javascript includes, so the developer is responsible for sticking to best performance practices. The Magento demo appears to have decent content to text ratio, but still requires 5 css files that should be consolidated and minified if they are included on every page. Finally, Spree has undergone some changes in the last month and is moving in the direction of including one consolidated javascript file plus any javascript required for extensions on every page, and the upcoming release of Spree 0.8.0 will have considerable frontend view improvements.
Ecommerce platforms should have decent performance - YSlow or this book on high performance website essentials are good resources.
Lacking basic CMS management
Basic CMS management such as the ability to manage and update page titles and page meta data is something that has been overlooked by ecommerce platforms in the past, but appears to have been given more attention recently. An ecommerce solution should also have functionality to create and manage static pages.
The Interchange demo does not have meta description and keyword functionality. However, page titles are equal to product names, which is an acceptable default. It's also very simple to add a static content page (as a developer), and it would require just a bit more effort to have this content managed by a database in Interchange. The Spree core is missing some basic CMS management such as page title and meta data management, but this functionality is currently in development. One Spree contributor developed a Spree extension that provides management of simple static pages using a WYSIWYG editor. At the moment, Magento appears to have the most traditional content management system functionality out of the box.
Another area to improve CMS within ecommerce is to determine a solution to integrate a blog. A quick search of "magento add blog" revealed how to set up a WordPress blog in Magento with an extension. One of End Point's clients, CCI Beauty, also has WordPress integrated into their Interchange setup. Finally, there has been discussion about the development of "Spradiant", or mixing Spree and Radiant.
Another missed opportunity in ecommerce platforms is finding a solution to elegantly blend content and product listings to target specific keywords. A "landing page" can have a page title, meta data, and content targeted towards specific terms. http://www.backcountry.com/store/gear/arcteryx-vests.html and http://www.backcountry.com/store/gear/cargo-pant.html are examples of targeted terms with corresponding products. Going one step further, search pages themselves can have managed content to attract keywords, such as a page title and meta data for specific high traffic keywords with the related products. For example, http://www.domain.com/s/ruby_shirt could be a search page for "Ruby Shirt" which contains meaningful content and relevant products.
Mishandled Product Pagination
Finding a search engine optimization solution for pagination can be a difficult problem in ecommerce. When there are fewer than 100 products for a site, this shouldn't be an issue because a simple taxonomy can appropriately group the products with low crawl depth. A website with 10,000 products must balance keeping a low taxonomy depth to minimize crawl depth against ensuring that all products are listed and indexable.
For example, products may be divided and fit into three levels of navigation: category, subcategory, and group. If there are 10,000 products divided into 10 categories, 10 subcategories per category, and 10 product groups per subcategory (10 x 10 x 10 = 1,000 groups), 10 products can be shown on each group page with no pagination. However, product taxonomy is not always so ideal. In some groups there may be 2 products and in others there may be 30. Pagination, or pages with an offset of product listings, is generated to accommodate these product listings (for example, http://www.backcountry.com/store/group/61/Sun-Hats-Rain-Hats-Safari-Hats.html, http://www.backcountry.com/store/group/61/Sun-Hats-Rain-Hats-Safari-Hats-p1.html).
A few problems can arise from the pagination solution. First, by web 2.0 standards, the content should be generated via ajax. An SEO friendly ajax solution must be implemented - where the onclick event refreshes the content, but the links are still crawlable via search engine bots. Second, page 1 with no product offset will have 1 level less of crawl depth, therefore it will receive the most link juice from its parent page (subcategory). As a result, there must be thoughtful analysis of which products to present on that page: Should high traffic pages get the traffic? Should popular items be listed on the first page? Should low traffic products be listed to try to bump the traffic on those pages? Should products with the most "user interaction" (reviews, Q&A, ratings) be shown on that page? Another problem that comes up is that the page meta data and title will most likely be very similar since the content is a list of similar products. These two pages can essentially be competing for traffic and may be counted as duplicate content if the page titles and meta data are equal.
Interchange uses the more list to handle pagination, but this functionality is not search engine friendly as it generates urls such as http://demo.icdevgroup.org/i/demo1/scan/MM=3ffffa066192cba677e1428d7461ddc9:10:19:10.html?mv_more_ip=1&mv_nextpage=results&mv_arg=, http://demo.icdevgroup.org/i/demo1/scan/MM=3ffffa066192cba677e1428d7461ddc9:20:27:10.html?mv_more_ip=1&mv_nextpage=results&mv_arg=, etc. The Spree demo had some pagination implementation, but upon recent frontend changes, it is no longer included in the demo. The Magento demo was carefully arranged so that product group pages have no more than 9 products to avoid showing any pagination functionality. However, when modifying the number of products displayed per group or using the "Sort By" mechanism, ?limit=Y and &order=X&dir=asc is appended to the url - which can produce a large volume of duplicate content (try filters on this page).
It is difficult to determine which of the above problems is the most problematic. From personal experience, I have been involved in tackling all duplicate content issues, and then moving on to "optimization" opportunities such as enhancing the content management system. At the very least, developers and users of any ecommerce platform should be aware of common search engine optimization issues.
Learn more about End Point's technical SEO services, Interchange development, and Rails shopping cart development.
Rails and SEO advantages
In today's climate, search engine optimization is a must to be competitive. Rails routing provides this advantage and much more.
Descriptive, content packed URLs afford your website better search rankings because they provide a clear context as to what the page is about. Using keywords in the filename goes even further. Under normal circumstances, without advanced configuration, a web page filename is rigid and fixed. This isn't a problem in itself, except that it doesn't help with SEO one bit.
Having multiple URLs linking to the same page opens more doors to search engine crawlers. Generally, once indexed correctly, this means more access paths into your site, which in turn results in a greater variety and volume of traffic.
With most other languages and frameworks, you would need to use an Apache rewrite rule to accomplish this. The rule below detects a digit in a file name and passes it along as a parameter to another dynamically generated page.
RewriteRule ^/.*([0-9]+).*$ /index.php?i=$1 [R=301,L]
This rule is almost certainly too greedy of a match; however, it serves to illustrate the point. With that rule in place, any request containing at least one number will be forwarded along to the index.php handler. Then by using a site map or just by modifying existing link structure, you can spell out multiple, descriptive, relevant URLs and increase the number of ways into your site. Not only will the quantity of links improve, but more importantly, the quality will too.
Rails does it a little differently. Apache generally deals with files and for the most part isn't aware of application dynamics. This is where Rails routing comes in. Rails is MVC oriented; each controller is comprised of one or many methods, or actions in Rails terminology. The URI is typically broken down into constituent parts in the following fashion.
map.connect ":controller/:action/:id"
With rails routing, you can specify that the elements in the URL are passed along to the correct controller and corresponding method along with variables that are used in your code.
map.connect "music/:category/:year/:month",
  :controller => "events",
  :action => "show",
  :requirements => {
    :year => /(19|20)\d\d/,
    :month => /[01]\d/
  }
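To illustrate what the :requirements patterns do, the following plain Ruby sketch mimics the matching for this one route outside of Rails. It is not how Rails routing is implemented internally. MUSIC_ROUTE and recognize are hypothetical names, and the result hash simply mirrors the parameters the route above would hand to the events controller.

```ruby
# A hand-written equivalent of "music/:category/:year/:month" with the
# same year and month patterns used in :requirements above.
MUSIC_ROUTE = %r{\A/music/(?<category>[^/]+)/(?<year>(19|20)\d\d)/(?<month>[01]\d)\z}

# Returns the parameters for a matching path, or nil when the path does
# not satisfy the patterns (Rails would try the next route instead).
def recognize(path)
  match = MUSIC_ROUTE.match(path)
  return nil unless match
  {
    :controller => "events",
    :action     => "show",
    :category   => match[:category],
    :year       => match[:year],
    :month      => match[:month]
  }
end

p recognize("/music/jazz/2009/06") # category "jazz", year "2009", month "06"
p recognize("/music/jazz/1809/06") # nil, the year must start with 19 or 20
```

Note that the month pattern only checks the shape of the value: "13" still matches /[01]\d/, so the action itself has to reject months that do not exist.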
As these examples show, Rails is a powerful tool for building websites, well beyond its SEO advantages. The point, though, is that for SEO you can specify as many alternate pathways into your site as you like, using keyword-rich links. You can accomplish a lot with Apache, but with Rails you can accomplish much more. If your application has a dynamic set of categories that you want to make accessible via URI, Rails is ideal for this. Not only can categories be represented, but products and product descriptions can easily be translated into the URI and then propagated out to the search engines' indexes.
Learn more about End Point's Ruby on Rails development and technical SEO services.
End Point: Search Engine Bot Parsing
I've talked to several coworkers before about bot parsing, but I've never gone into much detail about why I support search engine bot parsing. When I say bot parsing, I mean applying regular expressions to access log files to record distinct visits by the bot. Data such as the url visited, exact date-time, http response, bot, and ip address are collected. Here is a visual representation of bot visits (y-axis is hits).
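The parsing step described above can be sketched in Ruby as follows. This is a minimal illustration and not the tool used to produce the data in this post. It assumes the common Apache "combined" log format, and the LOG_LINE and BOTS constants, the parse_bot_visit method, and the user agent substrings used to recognize each bot are all assumptions.

```ruby
# One line of an Apache "combined" format access log (assumed format).
LOG_LINE = %r{\A(?<ip>\S+) \S+ \S+ \[(?<time>[^\]]+)\] "(?<method>\S+) (?<url>\S+) [^"]*" (?<status>\d+) \S+ "[^"]*" "(?<agent>[^"]*)"}

# User agent substrings used to recognize each bot (assumed values).
BOTS = {
  "Googlebot" => /Googlebot/i,
  "Yahoo"     => /Slurp/i,
  "msnbot"    => /msnbot/i,
  "Teoma"     => /Teoma/i
}.freeze

# Returns the url, date-time, http response, bot, and ip address for a
# bot visit, or nil when the line is not a visit by a known bot.
def parse_bot_visit(line)
  match = LOG_LINE.match(line)
  return nil unless match
  bot = BOTS.find { |_name, pattern| match[:agent] =~ pattern }
  return nil unless bot
  {
    :bot    => bot.first,
    :ip     => match[:ip],
    :time   => match[:time],
    :url    => match[:url],
    :status => match[:status].to_i
  }
end

sample = '66.249.65.1 - - [10/Jun/2009:06:25:24 -0400] "GET /about.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
p parse_bot_visit(sample)
```

Running each line of an access log through a method like this and storing the resulting hashes gives the raw data for counts per bot, per url, and per response code.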
And here are the top ten reasons why search engine bot parsing should be included in search engine optimization efforts:
#10: It gives you the ability to study search engine bot behavior. What is bot behavior after 500 error responses to a url? What IP addresses are the bots coming from? Do bots visit certain pages on certain days? Do bots really visit js and css pages?
#9: It can be used as a teaching tool. Already, I have discussed certain issues from data generated by this tool and am happy to teach others about some search engine behavior. After reading this post, you will be much more educated in bot crawling!!
#8: It gives you the ability to compare search engine bot behavior across different search engines. From some of the sites I've examined, the Yahoo bot has been visiting much more frequently than Googlebot, msnbot, and Teoma (Ask.com). I will follow up this observation by investigating which urls are getting crawled by Yahoo so much more frequently.
#7: It can help identify where 302s (temporary redirects) are served when 301s (permanent redirects) should be served. For example, Spree has a couple of old domains that are 302-ing occasionally. We now have the visibility to identify these issues and remediate them.

#6: It gives you the ability to study bot crawling behavior across different domains. Today, I was asked if there was a metric for a "good crawl rate". I'm not aware of a metric, but comparing data across different domains can certainly give you some context to the data to determine where to make search engine optimization efforts if you are divided between several domains.
#5: It gives you the ability to determine how often your entire site is crawled. I previously developed a bot parsing tool that did a comparison to the urls included in the sitemap. It provided metrics of how often 100% of a site was crawled or even how often 50% of a site was crawled. Perhaps only 95% of your site has ever been crawled - this tool can help identify which pages are not getting crawled. This data is also relevant because as the bots deem your content more "fresh", they will visit more. "Freshness" is an important factor in search engine performance.

#4: It gives you the ability to correlate search engine optimization efforts with changes in bot crawling. Again, the goal is to increase your bot visits. If you begin working on a search engine marketing campaign, the bot crawl rate over time will be one KPI (key performance indicator) to measure the success of the campaign.
#3: It gives you the immediate ability to identify crawlability issues such as 500 or 404 responses, or identify the frequency of duplicate content being crawled. Many other tools can provide this information as well, but it can be important to distinguish bot behavior from user behavior.

#2: It provides a benchmark for bot crawling. Whether you are implementing a search engine marketing campaign or are simply making other website changes, this data can serve as a good benchmark. If a website change immediately causes bot crawling problems, you can identify the problem right away rather than finding out a month later as search engine results start to suffer. Or, if a website change causes an immediate increase in bot visibility, keep it up!

And the #1 reason to implement bot parsing is...
"cuz robots are cool", Aleks. No explanation necessary.

Learn more about End Point's technical SEO services.
End Point SEO with Linkscape
Linkscape was released in October of 2008 and is SEOmoz's collection of index data from the web that currently contains 36 billion URLs over 225 million domains. You must have a pro membership to access advanced reporting, but without a pro membership you can access basic data such as mozRank (SEOmoz's own logarithmic metric for page popularity) for the url, number of links to a url, number of domains to a url, and mozRank for the domain.
For example, I ran a basic report on www.google.com and found:
- The mozRank of http://www.google.com/ is 9.36 out of 10
- There are ~96.8 million links to http://www.google.com/
- There are ~1.6 million domains linking to http://www.google.com/
More interesting data on www.facebook.com:
- The mozRank of http://www.facebook.com/ is 7.40 out of 10
- There are 0.9 million links to http://www.facebook.com/
- There are 60,000 domains linking to http://www.facebook.com/
Because I haven't done justice to describing Linkscape, please read more about Linkscape, or see Linkscape Comic for visual enhancements.
Case Study
In an effort to examine and improve End Point's search engine performance, I pulled together some snippets of data from Linkscape for End Point and End Point's blog after getting a pro membership.
www.endpoint.com:
- The mozRank of http://www.endpoint.com/ is 5.24 out of 10
- There are 24,084 links to http://www.endpoint.com/
- There are 189 domains linking to http://www.endpoint.com/
Top 5 most common anchor text phrases to www.endpoint.com:
- "End Point Corporation" from 86 links over 36 domains
- BLANK from 16 links over 4 domains
- "DESIGNED BY END POINT CORPORATION" from 10 links over 1 domain
- "End Point Corp." from 18 links over 5 domains
- "Endpoint" from 19 links over 2 domains
blog.endpoint.com:
- The mozRank of http://blog.endpoint.com/ is 4.88 out of 10
- There are 220 links to http://blog.endpoint.com/
- There are 8 domains linking to http://blog.endpoint.com
Top 5 most common anchor text phrases to blog.endpoint.com:
- "Blog" from 10 links over 1 domain
- "End Point blog" from 31 links over 4 domains
- "Home" from 10 links over 1 domain
- "End Point blog" - http blog.endpoint.com from 1 link over 1 domain
- "Jon Jensen" from 7 links over 1 domain
Here are a few points I'd like to mention as an initial reaction to the data:
#1:
The End Point site was registered in October of 1995 and the blog was registered July of 2008. And the mozRank (~popularity) of the blog has built up considerably in less than a year. Why? Linkscape reports that the top most important links to blog.endpoint.com come from www.endpoint.com, so in the short time that the blog has been in existence, much of the value has passed from www.endpoint.com. The remainder of the links to blog.endpoint.com come from sites like osnews.com, cryptography.mesogunus.com, blogenius.com, and perl.coding-school.com.
On the other hand, www.endpoint.com has many external links passed from client websites spread over 189 domains.
This emphasizes the fact that having high popularity external links can significantly influence page popularity (www.endpoint.com passing to blog.endpoint.com in this case).
When explaining this to a fellow End Pointer (Jon), he also commented, "based on [the data], if nothing external changes, it's pretty much impossible for the blog to overtake www.endpoint.com in ranking" - another good point to realize. We can continue to work on improving www.endpoint.com's popularity and it will continue to pass along value to blog.endpoint.com.
#2:
Google Analytics shows that traffic to www.endpoint.com is lacking in terms related to services that End Point provides, such as "ecommerce", "postgres", and "interchange". The Linkscape reports can help explain why. Linkscape provides a list of the 50 most common anchor text phrases to a url derived from the 3,000 most important links to that url. Out of the 50 most common anchor text phrases, more than half of them contain some variant of "End Point Corporation" and less than 5 of them contain terms related to the services that we offer. Although this is not surprising, it highlights an opportunity for End Point to consider targeting service related keywords. We may not have direct control over all of our external links, but we can request to enhance the existing anchor text or alt text of images.
#3:
Linkscape also points out where anchor text is blank or lacking in relevant keywords. Some examples of these include images without alt text, anchor text like 'work', 'at work', 'my employer', or 'open this site in another window'. Again, we may not have complete control over external link anchor text, but we can try to address the missed opportunity for passing link value through relevant anchor text such as in these examples.
Conclusion:
Ultimately, End Point does not have control over all external links or the anchor text in them. However, we will try to address some of the issues mentioned and revisit the data in a couple of months (Linkscape data is refreshed monthly) in hopes of improving our search engine performance. SEOmoz and Linkscape provide tons (!) of other data related to a wide variety of search engine optimization topics. I'm very excited to have access to this tool and hope to provide more snippets of data in the future.
Learn more about End Point's technical SEO services.
Apache RewriteRule to a destination URL containing a space
Today I needed to do a 301 redirect from an old category page on a client's site to a new category page whose filename contains spaces. The solution seemed like it would be easy and straightforward, and maybe it is to some, but I found it tricky because I had never escaped a space in the destination of an Apache RewriteRule.
The rewrite rule needed to rewrite:
/scan/mp=cat/se=Video Games
to:
/scan/mp=cat/se=DS Video Games
I was able to get the first part of the rewrite rule quickly:
^/scan/mp=cat/se=Video\sGames\.html$
The issue was figuring out how to properly escape the space in the destination URL. A literal space, %20, and \s all failed to work properly. Jon Jensen took a look and suggested the standard Unix escape of '\ ' and that worked. Sometimes a solution is right under your nose and is obvious once you step back or ask another engineer for help. Googling for the issue did not turn up such a simple solution, hence this blog post.
The final rule:
RewriteRule ^/scan/mp=cat/se=Video\sGames\.html$ http://www.site.com/scan/mp=cat/se=DS\ Video\ Games.html [L,R=301]
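Before deploying a rule like this, the match pattern can be sanity-checked outside Apache. The sketch below is only an approximation in Python: it assumes the pattern is matched against the already-decoded URL path (so a requested %20 appears as a space) and that the client ends up seeing the spaces in the destination percent-encoded. The redirect_target function and the constant names are hypothetical, and Python's re module is standing in for Apache's regex engine.

```python
import re
from urllib.parse import quote

# Same pattern as in the RewriteRule; \s matches the space in the decoded path.
PATTERN = re.compile(r"^/scan/mp=cat/se=Video\sGames\.html$")

# Destination written with plain spaces, as it reads after Apache's '\ ' escapes.
DESTINATION = "http://www.site.com/scan/mp=cat/se=DS Video Games.html"


def redirect_target(path):
    """Return the percent-encoded redirect URL for path, or None if no match."""
    if PATTERN.match(path):
        # Keep ':', '/' and '=' literal; spaces become %20.
        return quote(DESTINATION, safe=":/=")
    return None


if __name__ == "__main__":
    print(redirect_target("/scan/mp=cat/se=Video Games.html"))
    print(redirect_target("/scan/mp=cat/se=DS Video Games.html"))
```

The second call returns None, which confirms that the new category URL does not match the old pattern and so would not redirect in a loop.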
Search Engine Optimization Thoughts
Search engine optimization and search engine marketing cover a wide range of opportunities to improve a website's search engine traffic. When performing a search engine site review, here are several questions that I would investigate. Note that the questions are biased towards technical search engine optimization efforts. Some of the questions provide links to help define common search engine optimization terms. Although this is not typical End Point blog fashion, the answers to these questions can potentially lead to search engine optimization improvements.
Technical Topics
- Do the pages indexed by major search engines accurately represent the site's content?
- Are there duplicate index pages indexed?
- Are there old index pages or domains that aren't redirected?
- Are there pages missing from major search engine indexes?
- Are there too many pages in major search engine indexes?
- Are 301 redirects used permanently on the site?
- Can rel="canonical" or the use of 301s be applied as a temporary solution to fix duplicate content issues?
- Is there low hanging fruit to fix duplicate content issues?
- Is there low hanging fruit to fix duplicate content generated by external links?
- Are there on site duplicate content issues that can be remedied with '#' (URL fragment)?
- Can external affiliate link structures be modified to use '#' (URL fragment) instead of query strings?
- Is there a trend that pages deep in a site's structure don't get indexed?
- Is there a trend that a certain group of pages don't get indexed?
- Does any duplicate content originate from links in on site javascript?
- Are there any pages indexed that don't need to be indexed?
- Is there a robots.txt file?
- Can the use of rel="nofollow" on site help direct link juice?
- Do url structures follow standard guidelines?
- Is there pagination on site that produces an ever-increasing crawl depth?
- Can navigation be modified to reduce on site pagination?
- Is there a sitemap?
- Would a sitemap provide an alternate way to index pages on a site?
- Is there a mechanism that causes creation of new urls and deletion of old urls (example: items going out of stock)?
- Is there a mechanism that causes urls to fluctuate in and out of navigation (example: items going in and out of stock)?
- Does the website owner have a way to monitor bot crawl data and trends?
- Are there Google Webmaster Tools errors that need to be addressed (examples: duplicate page title, meta data)?
- Are there any problems with server response (examples: frequent server down time)?
- Are there frequent 404 responses?
- Can relevant, helpful content be added to the 404 page?
- Are 302 (temporary redirects) ever served when 301s should be served?
- Does any page content have excessive content that isn't indexed by major search engines?
- Can pages with excessive content be divided into different pages that target different keywords?
- How does the YSlow score compare to competitor sites?
- Are there low hanging performance improvements that can be made?
- Is there excessive inline javascript or css?
Keyword Topics
- Does the website owner have a set of defined target keywords?
- Is the site architecture built around the target keywords?
- Are on page elements appropriately targeting keywords?
- Do images have alt text that targets relevant keywords?
- Are page titles targeting keywords appropriately (example: Keyword - Category - Website Name)?
- Is there a mechanism that allows for page titles, headers, and content to be refreshed frequently?
- Does the site target short tail and long tail content?
- Does the website owner have a separate site that generates relevant content and produces inbound links?
- Does the website owner have a tool to measure search engine traffic over time?
- Does the website owner have a tool to measure incoming links?
- Does the website owner have a tool to measure incoming link anchor text trends?
User Generated Content Topics
- Does the site have user generated content?
- Would building content around user generated content lead to an increase of long tail search engine traffic?
- Is there a way to rank user generated content by quality?
- Does the site have a social community that encourages production of user generated content?
I hope to address some of these topics in depth in the future as part of this blog.
Learn more about End Point's technical SEO services.




