SEO, Latent Semantic Indexing and Natural Language Processing
By Michael Marshall
There has been much discussion lately about content relevancy
and the relationship between Latent Semantic Indexing (LSI)
and search engine optimization.
Some say LSI has nothing to do with how Google scores
pages. Others hail
LSI as a powerful method of optimizing or theming your content
that leads to marked improvement in natural search results.
Before proceeding, it is important to point out that there is
much confusion and misconception in the SEO industry with
regard to LSI. Among
typical SEO professionals, the term LSI
is used quite pervasively but also quite loosely.
Almost any technology that a search engine might use to
aid in the process of determining the meaning of content gets
labeled as LSI. As
most in the world of SEO use it, LSI has really become a catch-all
term for any technology used for the purpose of semantic analysis.
LSI is really a specific implementation of a whole family of
approaches to semantic analysis, all of which might be said
properly to fall under the area of Natural Language Processing
(NLP). Even the descriptions which some critics of LSI
provide, while accurate as far as they go, describe only
one way of going about LSI, which
itself is only one method in the world of NLP. Google
could be using any of these other forms of LSI or even different approaches under NLP,
any number of which still focus on finding meaning in a hidden
or latent fashion.
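To make the idea of latent meaning concrete, here is a minimal sketch of the textbook form of LSI: build a term-document matrix and keep only its strongest singular directions. It is an illustration under stated assumptions, not a description of what Google or any other search engine does. The five-document corpus and the function names (`term_vectors`, `top_eigenpairs`, `latent_doc_vectors`) are invented for this example, and the decomposition is done with plain power iteration on the document-by-document Gram matrix rather than a production SVD routine.

```python
import math

# Toy corpus, invented for illustration only.
DOCS = [
    "car wheel",
    "automobile engine",
    "car automobile wheel engine",
    "flower garden",
    "garden petal",
]

def term_vectors(texts):
    """Return one raw term-count vector per document."""
    vocab = sorted({w for t in texts for w in t.split()})
    return [[t.split().count(w) for w in vocab] for t in texts]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    norm = math.sqrt(dot(a, a) * dot(b, b))
    return dot(a, b) / norm if norm else 0.0

def top_eigenpairs(matrix, k, iterations=200):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration
    with deflation (a simple stand-in for a real SVD routine)."""
    n = len(matrix)
    m = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        v = [1.0] * n
        for _ in range(iterations):
            w = [dot(row, v) for row in m]
            norm = math.sqrt(dot(w, w))
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in m])
        pairs.append((lam, v))
        # Deflate: remove the component just found before the next pass.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs

def latent_doc_vectors(texts, k=2):
    """Coordinates of each document in a k-dimensional latent space."""
    vecs = term_vectors(texts)
    gram = [[dot(a, b) for b in vecs] for a in vecs]  # docs x docs
    pairs = top_eigenpairs(gram, k)
    # Singular value = sqrt(eigenvalue of the Gram matrix).
    return [[math.sqrt(max(lam, 0.0)) * v[j] for lam, v in pairs]
            for j in range(len(texts))]

raw = term_vectors(DOCS)
latent = latent_doc_vectors(DOCS, k=2)
print("raw similarity:   ", round(cosine(raw[0], raw[1]), 3))
print("latent similarity:", round(cosine(latent[0], latent[1]), 3))
```

In this toy corpus, "car wheel" and "automobile engine" share no words, so their raw similarity is zero, yet they end up close together in the latent space because a third document uses all four words together. That hidden association is what "latent" refers to.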
A weakness of the form of LSI some
critics describe is its scalability. They are correct in
that, and for that reason a search engine like Google would
most likely not be using that particular form of LSI.
However, Google could be using some other form of LSI and
most definitely does use one or more of the many other
approaches to semantic analysis within NLP.
Some critics have run experiments with search results to show that Google is not
using LSI (as
popularly described). Off
the top of my head, there are a few substantive reasons why
such examples would not be a good test to determine whether LSI (as popularly described) was being used by Google:
Comparing results for singular vs. plural forms and different verb
tenses has much more to do with linguistic stemming than it
does with LSI.
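The distinction can be sketched in a few lines. Stemming normalizes word forms before any semantic analysis happens, so singular/plural or tense variants collapse to the same token regardless of whether LSI is involved. The suffix list and the `crude_stem` and `same_after_stemming` helpers below are hypothetical and deliberately simplistic; real stemmers such as the Porter stemmer use far more careful rules.

```python
# Hypothetical, deliberately crude suffix list for illustration only.
SUFFIXES = ("ing", "ed", "s")

def crude_stem(word):
    """Strip one common English suffix, keeping at least three letters."""
    word = word.lower()
    if word.endswith("ss"):  # leave words like "class" alone
        return word
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def same_after_stemming(query_a, query_b):
    """True when two queries become identical once each word is stemmed."""
    stems_a = [crude_stem(w) for w in query_a.split()]
    stems_b = [crude_stem(w) for w in query_b.split()]
    return stems_a == stems_b

print(same_after_stemming("search engine", "search engines"))
print(same_after_stemming("linking pages", "linked page"))
print(same_after_stemming("car", "automobile"))
```

Word-form variants match after stemming, but "car" and "automobile" do not: relating those two is a job for semantic methods, which is why singular/plural comparisons say little about LSI either way.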
Some analyses of the examples totally ignore how large a role other factors
play in Google's algorithm. Those factors would obscure
the influence of LSI if you look merely at the number of
results returned and the ordering of those results.
Not seeing the expected differences in the number of results, or the expected
similarity in the top 5 results, only counts against the idea
of Google using LSI if one presupposes that LSI is
or must be the dominant factor in Google's algorithm. Not
even the most ardent proponents of LSI would hold to that.
Despite such problems, I agree with some critics that Google
probably does not use the form of LSI popularly described, primarily because that
particular form does have a serious scalability issue. However,
as I have said, that is not the only form of LSI and LSI is
not the only approach to NLP that a search engine like Google
could use. Google definitely uses NLP technology in determining the meaning of
content, and therefore NLP has a significant relationship to, and
importance for, SEO. Some
awareness of and tools for NLP can greatly help in the
optimization of web page content for relevance and can have a
significant impact on performance in natural search results.
At the Marketing
Pilgrim blog, an interview with Joe Hall and Marie-Claire
Jenkins (C.J.), who is completing a PhD in Natural
Language Processing (NLP) and Artificial Intelligence at the
University of East Anglia, highlights the important
relationship between SEO and NLP.
Is there a relationship between SEO and NLP? If so, what is it?
There is indeed a relationship.
Search engines use words to assess what a web page is about, using
NLP amongst other techniques. The content on a web page
will help determine what the topic of the page is.
Understanding the techniques used in NLP allows us to provide the best
format and patterns for the search engine.
In fact I think that the entire site is affected because
analysing a whole site, each page, helps to determine exactly
what a site is about. Seeing as NLP seeks to mimic human
language understanding, using common sense is a good idea.
This is why search engines always recommend writing good relevant content.
It should be pointed out that SEM S.C.O.U.T., a tool promoted by
Search Engine Workshops, does not use LSI;
it uses another very powerful (well-tested) method among the
many approaches to NLP.
About Michael Marshall:
Michael has over 19 years of experience in information technology covering a
wide range of specialties including: web design, software
engineering, e-commerce solutions, artificial intelligence,
and Internet marketing. He is a member of the World
Association of Internet Marketers and of SEO Professionals. He
has degrees in Linguistics, Philosophy and Theology.
Michael is a contributing author to SEOToday.com,
the premier website for SEM professionals, and a contributor
to “Building Your Business With Google for Dummies” by
Brad Hill (Wiley Publishing). He is a frequent presenter at
Ultra Advanced SEO Symposiums, a meeting of select masters of
the search engine marketing industry, and has been invited
many times to speak at Search
Engine Workshops. Michael is a licensed instructor
at the North
Carolina Search Engine Academy. He has also recently
become an instructor on search engine technologies at the U.S.