If you've been following the evolution of search engine
optimization strategies, you know that the game's not getting any easier. Not
only that, but traditional forms of optimizing aren't as effective as they once
were. Am I telling you anything you don't already know?
Michael Campbell, author of the popular e-book Nothing But 'Net and
editor of The Vault, a subscription newsletter pertaining to search engine
positioning strategies, offers some serious answers to these concerns as he
examines a new "wave" of search engines: theme engines.
Why DO The Search Engines Change So Much?
Why do the search engines constantly have to evolve into a
different type of engine? Why can't they stay the same?
To answer
this, let's look at the ultimate goal of a search engine. What do the search
engines want to do? They want to provide relevant results to you, the user. Why
can't they do that under the current system?
There are several reasons why
the current system isn't working. For one thing, the Internet is growing at an
unheard-of rate, and so is the number of spammers. In many ways, the engines
are fighting a losing battle to provide relevant results while combating spam
and duplicate pages.
In essence, the engines need
a way to store more pages, combat spam, and still provide (or attempt to
provide) pertinent results. So, in an effort to provide relevant results, the
engines began sliding in other variables, which is where the 1st, 2nd, and 3rd
generation search engines come in.
1st, 2nd, and 3rd Generation Engines
Understanding the path we've taken to get where we are in
this crazy search engine business might give us some insight into where
we're going.
You may have heard of 1st, 2nd, and 3rd generation
engines, but what exactly does that mean?
Michael Campbell explains,
In the beginning, search results were very basic and largely
depended on what was on the Web page. Important factors included keyword
density, title, and where in the document keywords appeared.
First
generation added relevancy for META tags, keywords in the domain name, and a
few bonus points for having keywords in the URL. Basic spam filters emerged
that got rid of keyword stuffing and same color text. The portals also made
their appearance, and engines started looking like giant billboards and
overstuffed yellow pages.
All of this is quite familiar, isn't it? Almost too
familiar.
But, do META tags hold as much importance as they once did?
No. Does using keywords in various tags help as much? Generally not.
Instead, the engines took it a step further in their quest for
relevant results by bringing in 2nd generation engines.
Campbell
explains,
Second generation, which is in full swing with the themes
thing, added much in the way of off page criteria and link analysis. A few of
the major components they employ are tracking clicks, page reputation, link
popularity, temporal tracking, and link quality. Then they started adding in
term vectors, stats analysis, cache data, and context where two-word keyword
pairs were extracted from a page to better categorize it.
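To make the idea of two-word keyword pairs a little more concrete, here is a minimal Python sketch. It is an illustration only, not how any particular engine works: the function name keyword_pairs and the simple word-splitting rule are assumptions made for this example, and it ignores sentence boundaries.

```python
import re
from collections import Counter

def keyword_pairs(text, top_n=3):
    """Return the most frequent adjacent two-word pairs found in text."""
    # Lowercase the text and keep only runs of letters, digits, and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Pair each word with the word that follows it and count the pairs.
    pairs = Counter(zip(words, words[1:]))
    return [" ".join(pair) for pair, _ in pairs.most_common(top_n)]

sample = "Banana bread recipe. Easy banana bread for banana bread lovers."
top_pair = keyword_pairs(sample, top_n=1)
```

On a focused page, the top pair tends to be the phrase the page is really about, which is the kind of signal Campbell says the engines use to categorize a page.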
We'll cover "term vectors" and other information mentioned in
the above paragraph later in this article. For now, let's continue with 2nd
generation engines.
We all know how important good, solid link
popularity is these days. Does any old link count? Certainly not. The days of
huge link exchange programs with no thought for "related" links are over.
Plus, with Google's PageRank system and DirectHit's method of
tracking clicks and the length of visits, we're seeing more evidence of a 2nd
generation engine.
But what is a 3rd generation engine? It's almost
mind boggling to consider.
Campbell explains,
Third generation is already underway. It adds word stemming
and a thesaurus on top of the term vector database to assist in keeping a
search in context. Auto extraction of keyword pairs also helps automatically
categorize a page, where searches like `shop for' or `find' trigger totally
different search results based on the context or intent of the person doing the
searching.
G3 adds Web maps which, although not searchable, are a
useful filtering tool to get rid of duplicate sites and many stand alone pages
that drive traffic to only a few destinations. This means pages like doorways,
gateways, entry, splash, or whatever you want to call them, will soon get
filtered out.
They will also be extracting as much data as possible about
your individual searching habits. All the major engines plan on building
personal profiles, little robots that `come to know you' over a period of time,
based on past searching habits.
Okay, so we have a good idea of where the search engines are
headed, but how can we keep up? The 2nd and 3rd generation engines are
theme-based, but what does that mean, and how does it translate to what we need
to do with our own sites?
What are "Theme" Engines?
What exactly is a "theme" engine? First, let's hear the
scientific definition. This isn't easy reading, so it might help if you have a
brown paper bag handy in case you hyperventilate.
Computer scientists
working with Campbell define "themes" or "topics" as,
Using a term vector database, they weigh page keyword density
to calculate the page vector, which is compared and stored relative to the term
vector. They then compute a Web page reputation by graphing interconnectivity
and link relevancy, making sure the reputation of the page and the content on
the page actually match. The closest matches get the highest search engine
positioning.
Uh huh. Kinda hurts the brain cells, doesn't it?
Now,
let's look at an easier-to-understand explanation. How does Michael Campbell
define a theme engine?
One. The answer is one. What you say about your Web page, how
the structure of other people's Web pages compares on the same topic, and what
other people say your site is about, must match, be in harmony with each other,
be as one.
Or, in the cold hard world of the search engines, where
everything is weighted and calculated according to mathematical formulas,
whoever is closest to the 1.000000 without going over is the winner, coming up
tops in the search engine.
A theme engine looks at all the information
on a `seed set' or a group of sites and pages that it has already spidered and
has in its index. It assigns each page in the index a number or page vector.
This becomes the `core' of the search engine.
Suppose you just
submitted a Web page, so you are now in competition with everything in the
core. The engine looks at everything on your page, from one and two keyword
phrase densities, to page length, compares it to the seed set and assigns your
page a number, for each keyword phrase. These numbers assigned to the keyword
phrases are known as `term vectors.'
The closer your term vector is to
the page vector, the better chance your page has of being a top ten contender
for any particular keyword phrase. You might even be `folded in' to the core,
bumping off some other page, causing it to fall out of the search engine. (Some
engines will adopt the `pay to stay in the core' model in the near future, so
paid sites won't get bumped out.)
Then, there is what the rest of the
Internet and its users have to say about your page. Link analysis, traffic,
stats, and cache data are all taken into consideration and analyzed.
The next step is to add in and calculate words in incoming links to your page,
making sure they match up to your term vector. So, what the search engine has
determined that your page is about must match what the rest of the Internet
says your page is about in their links to you.
So in review, in
layman's terms, here is what I would define as a theme based engine:
What you say your page is about, what the search engine calculates your page to
be about, and what the rest of the Internet thinks your page is about, must
match, according to their mathematical formulas.
Then, as the whipped
cream topping on top of the theme behavior sundae, are the stats and cache
data. If your site is one of a search engine's top exit pages, it must be good,
because people don't come back and search some more once they've found your
site. You just got a big boost in positioning. And, if your site gets searched
and clicked on so often that you are in the engine's cache for speedy data
retrieval, your site must be very good indeed.
All of these factors,
both on and off page criteria, help define what a theme-based search engine is
looking for. They are looking for unanimous approval that your site is all
about a particular topic. And the more narrow the focus on that topic, the
better your site will do.
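One common way to put a number on how closely two descriptions of a page agree is cosine similarity between word-count vectors. The sketch below is an illustration only. Campbell doesn't say which formula the engines use, so cosine similarity is swapped in here as a stand-in, and the function names (term_counts, cosine_similarity) and sample texts are made up for this example. A score near 1.0 means the two texts use the same words in about the same proportions, and 0.0 means they share no words at all.

```python
import math
import re
from collections import Counter

def term_counts(text):
    """Count how often each lowercase word appears in text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity of two word-count vectors, from 0.0 to 1.0."""
    # Only words present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text matches nothing
    return dot / (norm_a * norm_b)

# Hypothetical inputs: what the page says, and what incoming links say.
page = term_counts("banana bread recipe with ripe bananas and walnuts")
links = term_counts("banana bread recipe")
off_topic = term_counts("cheap cellular phones")

on_topic_score = cosine_similarity(page, links)
off_topic_score = cosine_similarity(page, off_topic)
```

In this toy setup, link text that echoes the page's own words scores higher than unrelated link text, which is the "must match" idea in miniature.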
Take a deep breath. You probably feel like your mind is burning
with information, because this is a lot to digest. Go get a cup of coffee (or a
stiff drink), and let's get back to work.
Which Engines are Theme Engines?
In Campbell's opinion, all search engines are moving toward
being theme-based.
It's just another way of saying they are implementing `second
generation' search engine strategies. Some engines call it `in context'
searching, while others call it `rank and reputation' or `on topic.' These are
all different ways of saying the same thing: adding off-page criteria to help
determine relevancy.
So, with all of the engines gravitating toward being theme
engines, does this mean that we have to scrap our current search engine
optimization strategies? Not necessarily.
Let's look at a few of our
current optimization strategies to see how effective they'll be with theme
engines.
Current Optimization Strategies
1. Cloaking
With the move toward theme engines, will cloaking remain as
effective?
John Heard, producer of
IP-Delivery, a leading cloaking software package, says it will be just as
effective and even allow for more flexibility in page content if implemented
properly.
According to Heard,
There is no difference between a cloaked or non-cloaked site
when it comes to themes for in-bound link popularity in most cases. However, it
should be noted that a cloaked site can choose what links it does or does not
show to the search engines. This is potentially advantageous.
Say that
you want to trade links with someone. You want the advantage of their link
popularity but you don't want to send your popularity back to them. A cloaked
page will help you do this if you set it up right. By placing the links only on
the consumer version page but excluding them from the search engine optimized
(cloaked) page, you can 'hide' them from the engine.
So yes, in that
way, cloaking can affect link popularity. It's entirely in the hands of the SEO
professional in the manner it's used. Cloaking is handy if you want to cross
link sites and show those links to humans but don't want the engines to see the
links.
A good example is if you own a computer hardware site and a
travel site. The two topics are not theme related so you don't want the engines
to see the links between them. On the other hand, you might want your site
visitors to see the links and a cloaking system would give you the best of both
worlds.
2. Keyword Weight
Is keyword weight dying in importance, similar to its
optimization buddies the META tags?
Not at all.
Campbell explains
that keyword density is a very important foundation upon which everything else
is built.
Different types of documents or pages have different
characteristic densities. The seed set of Web pages that the theme engine used
to populate its database will determine what is a normal keyword density for
each keyword, based on the entire collection of pages for any particular
topic.
Since the term vector database (TVD) is an open-ended
application, other applications can be run on top of it. This gives the search
engines the ability to change the target keyword densities from the normal
parameters at will, to give the illusion of fresh search results, without
needing to recompile the database. Smoke and mirrors mostly, but it keeps the
very important keyword density target moving.
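If you want to check the keyword density of your own pages, a rough calculation is easy to script. The sketch below is one possible definition, not the one any engine is known to use: it counts the words that belong to occurrences of the phrase and divides by the total word count. The function name keyword_density is an assumption for this example.

```python
import re

def keyword_density(text, phrase):
    """Fraction of the page's words that belong to occurrences of phrase."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = phrase.lower().split()
    if not words or not target:
        return 0.0
    n = len(target)
    # Slide a window of n words across the page and count exact matches.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits * n / len(words)

density = keyword_density("Wireless phones and more wireless phones", "wireless phones")
```

Since the target density is a moving one, a script like this is most useful for comparing your page against pages that already rank well for the same phrase.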
3. Competitive Keywords
One real problem when working with theme engines is getting
stuck in the wrong vector if there are already many sites on a particular
subject. Being in the wrong vector will mean that your page won't match the
term vector, so your site's ranking will suffer.
But what if you're
working with a highly competitive keyword phrase?
Campbell
suggests,
If there are already 50 documents with 100 percent relevancy
associated with a term vector in the database core, you are not likely to get
in unless you pay for it. If you are really lucky, you might nail some off-page
criteria that makes your site more important and bump some other site off. It
is do-able, but it is a lot of hard work.
If you need instant traffic,
just go after the low hanging fruit. Go after a second or third, yet popular,
way of saying the same thing. For example, the phrase cellular phones is fierce
and mobile phones is tough. Wireless phones is a very popular search phrase but
has relatively little competition. My advice would be to go after the low
hanging fruit first, and then try playing with the professionals at the top of
the tree.
Getting stuck in the wrong vector is nasty. You'll need to
change the content on your page to be sure it cannot be taken out of context.
On your banana bread recipe page, make sure you don't say you're growing fond
of the recipe. Otherwise, the vector might determine your site is about growing
bananas and not banana bread. The good news is that we can expect TVDs to get
more accurate as they add more context intelligence.
4. Stop Words
Another problem in working with theme engines concerns stop
words. If an engine considers a word a stop word, it won't get indexed at all.
So, if your keyword phrase contains a stop word, you need to work around it.
"If the engine is filtering out the word Web in the phrase Web site hosting, it
means focusing your efforts on the phrase site hosting or saying the same thing
but in a different way, like domain hosting," explains Campbell.
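Here is a small Python sketch of what a stop word filter does to a keyword phrase. The stop word list is hypothetical: the engines don't publish theirs, and "web" is included only to mirror Campbell's example. The function name strip_stop_words is also just for illustration.

```python
# Hypothetical stop word list; a real engine's list is unknown.
STOP_WORDS = {"a", "an", "and", "for", "it", "of", "the", "web"}

def strip_stop_words(phrase, stop_words=STOP_WORDS):
    """Return the phrase as an engine might index it, minus stop words."""
    kept = [w for w in phrase.lower().split() if w not in stop_words]
    return " ".join(kept)

indexed = strip_stop_words("Web site hosting")
```

Running your target phrases through a filter like this shows what is left to optimize for once the stop words are gone.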
5. Redundancy Filters
With the theme engines looking for redundant Web pages, how can
you avoid setting off the redundancy filters?
Simply put, the days of
having mirror sites are over.
So, to avoid setting off the redundancy
filter, don't duplicate, mirror, or copy your pages. Don't use "cookie cutter"
templates with the keywords swapped out.
Campbell explains,
The filters are getting even tighter with Web maps. They can
tell if a bunch of pages are doorways, or dupes, even if they are stored on
different domains, because the page length and byte size are similar and they
all point to the same place. They'll all get nuked in the culling process.
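As a rough illustration of the kind of check Campbell describes, the Python sketch below flags groups of pages that link to exactly the same destinations and are nearly the same size in bytes. Everything here is an assumption made for the example: the function name likely_doorways, the 5 percent size tolerance, the minimum group size of three, and the page data itself. Real filters are certainly more sophisticated.

```python
from collections import defaultdict

def likely_doorways(pages, size_tolerance=0.05, min_group=3):
    """Return URLs of pages that share link targets and have similar sizes.

    pages: list of dicts with 'url', 'size' (bytes), and 'links' (target URLs).
    """
    # Group pages by the exact set of destinations they link to.
    by_target = defaultdict(list)
    for page in pages:
        by_target[frozenset(page["links"])].append(page)

    flagged = []
    for group in by_target.values():
        if len(group) < min_group:
            continue
        sizes = [p["size"] for p in group]
        # Flag the group if its largest and smallest pages are close in size.
        if max(sizes) - min(sizes) <= size_tolerance * max(sizes):
            flagged.extend(p["url"] for p in group)
    return sorted(flagged)

# Hypothetical crawl data: three look-alike pages on different domains.
crawl = [
    {"url": "http://a.example/phones", "size": 1000, "links": ["http://shop.example/"]},
    {"url": "http://b.example/mobile", "size": 1010, "links": ["http://shop.example/"]},
    {"url": "http://c.example/cell", "size": 1020, "links": ["http://shop.example/"]},
    {"url": "http://d.example/recipes", "size": 5400, "links": ["http://d.example/bread"]},
]
suspects = likely_doorways(crawl)
```

Note that hosting the look-alike pages on different domains doesn't help here, because the grouping never looks at the domain.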
Campbell suggests sitting down and writing what the page is
about.
Then once the page is complete, look at the target keyword
densities you would like to achieve and start working the keywords into the
title, headlines, links, and body copy of the page.
Try not to go too
crazy with doorway pages for each site. Spread them around on different
domains. Set up completely different Web sites to sell related yet different
product lines, and create your own mini Internet of linked sites.
6. Lengthy Pages
With theme engines, you'll be walking a fine line between giving
the engine what it wants to see (related content) and providing too much
information.
If you provide too much information, it's likely that the
page pertains to more than one topic, which means you'll have a more difficult
time getting a top ranking.
But is there also a problem with TVDs
compressing large pages?
"The TVD doesn't actually store the entire
page," says Campbell.
It looks at the page, tries automatically to determine what it
is about, and reduces it down to only a few words, like a dozen or so possible
keywords and phrases.
The more words there are on a page, the more
likely you are to talk about several topics, which in turn dilutes the dozen or
so possible term vectors that the page can be about. Ideally, you want to focus
the page on a single theme or keyword and describe its context with several two
or three word combos.
If I had to pick a number, I would say to try to
keep pages between 100 and 700 words, unless you really know what you are
doing.
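Checking pages against that 100 to 700 word guideline is simple to automate. The short Python sketch below assumes you already have the visible text of the page, with HTML tags stripped. The function names word_count and within_guideline are made up for this example.

```python
import re

def word_count(text):
    """Count words, treating letters, digits, and apostrophes as word characters."""
    return len(re.findall(r"[A-Za-z0-9']+", text))

def within_guideline(text, low=100, high=700):
    """True if the text falls inside the suggested word-count range."""
    return low <= word_count(text) <= high

short_page = "word " * 50
typical_page = "word " * 300
```

The range is Campbell's rule of thumb rather than a hard limit, so the bounds are left as parameters.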
7. Changes in Ranking
With term vector databases, your pages may have been discovered
by the engine's spider and given a ranking but have not yet been added to the
database. Does this mean that once the pages are actually added to the
database, their ranking could go up or down?
Campbell explains,
Yes, a page may have been discovered by a crawler but not yet
folded in to the TVD. The temporary positioning in search results is based on
the likelihood that your page contains relevant information. It's commonly
called page reputation or what your page is known for. It is largely based on
what incoming links say your site is about.
Once the engine recompiles
its index, the page reputation will be compared to the term vector using a
complicated mathematical formula and weighting scheme. In short, the reputation
of the page and the term vector of the page must match to be a top 10
contender. The further away the numbers are from each other, the less
relevancy, and the poorer the positioning of the page in search results.
How does Inktomi's 3rd Generation Engine Compare with AltaVista's?
Campbell says that he hasn't seen a lot of difference
between the two.
They all seem to be going in the same general direction. But
to be sure, the customization or proprietary experience at one engine over
another will be their big selling point in the future.
They will
definitely want to make their search experience unique -- to give the users a
brand, or reason, why they would rather fight than switch. Otherwise, they may
fade into the same old bland mediocrity and continue to lose traffic because of
it.
Tips on Working with Theme Engines
How can we create Web pages that theme engines will like and
boost our odds at getting top rankings?
- The days of having one single Web site devoted to your entire
operation are virtually over. When working with theme engines, you'll want to
make sure that what your page is about matches what the engines believe your
page is about, which in turn matches what other sites believe your page is
about.
So, set up additional Web sites for different areas of your
company. If you sell sports equipment, set up a site for hunting equipment, one
for fishing, one for baseball, and so forth. Interlink them, carefully
controlling how you describe the links pointing to those other pages.
Make the overall design of the sites similar, so that customers will
understand that they're still on your "turf," but change the content and the
featured theme.
- When using link text, try eliminating punctuation marks and
small, inconsequential words, like "and," "the," "it," and "for." Cut to the
chase with link text by putting JUST the keyword phrase that you're aiming
toward.
- Related link popularity is crucial, so if you have one site
that's listed in several of the large, important directories, be sure to link
it to your other sites, especially new ones.
- FOCUS! Keep each page focused on one topic, and keep each
site focused on one topic. Ideas for topics include your different services,
product names or categories, uses of those products, etc. Make sure that each
interior or information page reinforces the main theme of the entire site.
"Use two, or at the most three, word combos in links to achieve this,"
suggests Campbell.
Use incoming and outgoing related links and
content-rich pages.
- "Pull all the pages out of the database, set them up as
static pages, and put them to work for you in the search engines," Campbell
suggests.
- Check each page carefully. Make sure that everything on the
page points to one central theme or has one focus. Do everything you can to
make sure that the engine understands what your primary theme is.
- Don't use nearly identical pages with slightly different
keyword phrases. Instead, create new and different pages for your keyword
phrases.
More work? Absolutely. Will it pay off? You'll certainly have
a much better chance at top rankings while avoiding the redundancy filters.
- Create your pages as if you're writing an article on your
keyword phrase. Don't be afraid of content, but don't go overboard with it
either.
- As you focus each page on a particular keyword phrase, use
that keyword phrase in your tags and text on the page: title, headline, ALT,
URLs, link text, META tags, etc.
META tags certainly don't hold the
importance that they once did, so don't depend solely on them to achieve a top
ranking.
- In the beginning, go after keyword phrases that aren't as
competitive. Then go after the more competitive phrases.
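The link text tip above (drop punctuation and small, inconsequential words) is easy to apply with a few lines of Python. This is only a sketch: the function name clean_link_text is made up, and the small-word list contains just the four examples named in the tip, so you would probably want to extend it.

```python
import string

# The four small words named in the tip; extend as needed.
SMALL_WORDS = {"and", "the", "it", "for"}

def clean_link_text(text, small_words=SMALL_WORDS):
    """Replace punctuation with spaces, then drop small filler words."""
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = text.translate(table).split()
    return " ".join(w for w in words if w.lower() not in small_words)

cleaned = clean_link_text("Shop for the best fishing rods, reels, and tackle!")
```

The result still needs a human eye, since the goal is to end up with JUST the keyword phrase you're aiming toward, not merely a shorter link.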
What Does the Future Hold?
Campbell answers,
In the future, you might be able to load the engine full of
lists of keywords. Your interests, likes and dislikes, geographical info, and
favorite Web sites can be entered, from which the engine can create a context
engine just for you. Just think, they'll know what your next search is likely
to be, even before you do.
It's almost frightening, isn't it?
For More Information
Michael has already published three very good search
engine reports and is working on another that discusses:
- Offpage criteria: What's on your page is only half the
optimization battle.
- How search engines determine what your page is known for.
- Why you might want to tell people to "link off" of your Web
site.
- Establishing a "reputation" on a topic using leverage from
other sites.
- How to turn Google's PageRank system into an unfair
advantage.
- Are you a hub or an authority? How to use both for maximum
positioning.
- Handy tips for getting listed in Google and staying there.
- Making your site theme proof: A fail-safe strategy to ensure
success.
- How to link different themes together and dominate the search
engines.
If you would like to order any of Michael's reports go to
http://www.searchenginepositioning.com
More Research Sources
Read the WWW9 research papers by visiting
http://www9.org/w9cdrom
Read AltaVista's September 15, 2000 press release, which mentions their 3rd
generation engine.
http://doc.altavista.com/company_info/press/pr091500.html
Read Inktomi's April 11, 2000 press release announcing their 3rd
generation search engine.
http://www1.inktomi.com/new/press/2000/gen3.html
Robin Nobles teaches 2-, 3-, and 5-day hands-on search engine marketing workshops in locations across the globe (SearchEngineWorkshops.com) as well as online SEO training courses (OnlineWebTraining.com). They have recently launched localized SEO training centers through SearchEngineAcademy.com, and they have expanded their workshops to Europe with Search Engine Workshops UK. They have also opened the first networking community for SEOs, the Workshop Resource Center (WRC).

This work is licensed
under a Creative Commons License.