Everyone in the online world knows that the most sought-after
traffic to a site comes from a Google search. Folks, 80% of
searches on the internet are done in Google.
In theory, it is simple: if you have something of interest
to someone else, and you build a website with the honest-to-goodness
goal of providing something useful, that someone else will
find you. That is also how the creators of Google describe
their main goal: to build, more or less, a great repository
of information and help the people of our planet find useful
stuff.
In practice, it is not that simple, because there are thousands,
possibly even millions, of sites like yours. You might be
running a very honest online business, selling some very useful
product, and still not have unheard-of, exceptionally grand
'content'. If your site is listed on page 265 of a search
results set, be sure you will never get any visitors that
way.
Unlike Yahoo and others, who rely on human involvement,
Google does everything through automation. Websites are indexed
(or crawled, or spidered - all terms refer to the same process)
by their indexing software called Googlebot. Googlebot looks
at websites daily, and rules programmed into the software
decide which of your pages make it into the main Google index
and which don't. After your site has been indexed, whether it
was submitted for indexing by a human or the robot just stumbled
upon it, your pages are ranked, so Google knows on which
page of a search to put your site, and for which search
phrases your site should be part of the results at all.
The Googlebot is very smart and works really well. Keep
in mind, however, that it is just a piece of software: a very
sophisticated one, but still just a computer program. Consequently,
it has a set of algorithms (rules) it uses to index website
content (information), a set of capabilities (as I said before,
Googlebot is really intelligent) and a set of limitations.
As such, there is an impressive number of ways in which one
can trip up the Googlebot and make it impossible for it to
index your content. Alternatively, the Googlebot can index
your site well, and then people will find it when searching
for words it contains.
This article will try to teach you the basics necessary
to achieve a consistent and persistent presence in Google,
starting with the very first step: getting indexed by Googlebot,
Google's indexing robot.
1. Read Google's own Webmaster Guidelines
The people behind Google seem to have two main things down
to a science: One, most of their algorithms (rules) are so
secret that all we non-Google employees can do is speculate.
Two, their guidelines are very simple, direct and precise.
Following their guidelines will never hurt your site's ranking.
Disregarding their guidelines can and probably will hurt
you in the long run. So go to http://www.google.com/webmasters/guidelines.html
and read what Google has to say about itself.
2. Have text links.
Make every single page on your site accessible via a text-based
link, as opposed to JavaScript, Flash, DHTML (Dynamic HTML),
etc. Googlebot's native language is text. Google says: "Make
a site with a clear hierarchy and text links. Every page
should be reachable from at least one static text link."
This is probably the number one key to your site's existence
in Google. Googlebot is a robotic, browser-like program
that sees your site much as a text browser, such as the
venerable Lynx, would. The reasoning behind this approach
is that the creators are trying to get as close as possible
to emulating human browsing, making sure your website is
actually human friendly. Consequently, by downloading Lynx
(http://lynx.isc.org) and looking at your site through it,
you will see more or less the information Googlebot can read
and index and the links Googlebot can follow. You will also
see HTML errors on your pages and places where a robot would
get stuck and could not reach the rest of your site.
I know it is very unfair to those of us who understand and
love the potential of websites built completely in Flash,
or other engines. However, until the nice folks who run Google
figure out a good way to crawl inside a Flash file and extract
the appropriate information, we are stuck with standard HTML.
This is not to say that you cannot make your site really
pretty and fill it with JavaScript and Flash eye candy.
But you must have regular text and standard text links. Usually
you can achieve the desired effect by having extra navigation
menus based on standard text links.
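To get a rough feel for which links a text-only robot could
follow, here is a minimal sketch in Python. It is not how
Googlebot actually works (that is not public); it simply
assumes a crawler that follows plain "a href" links and
ignores JavaScript. The sample HTML and the function name
static_text_links are made up for illustration.

```python
from html.parser import HTMLParser


class TextLinkCollector(HTMLParser):
    """Collects (href, link text) pairs from plain <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (href, text) pairs
        self._href = None  # href of the <a> tag currently open
        self._text = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # Assumption: a text-only crawler cannot follow javascript: links.
            if href and not href.lower().startswith("javascript:"):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def static_text_links(html):
    """Return the links a simple text-based crawler could follow."""
    collector = TextLinkCollector()
    collector.feed(html)
    return collector.links


# Hypothetical page fragment: only the first link is a static text link.
sample = """
<a href="/mugs.html">Branded coffee mugs</a>
<a href="javascript:openMenu()">Menu</a>
<span onclick="go('/pens.html')">Pens</span>
"""
print(static_text_links(sample))
```

In this sample only the first link is reported. The other
two depend on JavaScript, which is exactly the kind of navigation
that needs a plain text backup.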
3. Avoid frames.
Avoid frames at all cost. If you must use them (for example
to make someone else's page look like it's part of your site),
do not use them on your front page.
Frames are like the plague: they sneak up on you. It is
incredibly easy for Googlebot to lose its way inside a badly
formatted frameset. You might hear that some of the robots,
including Google's Googlebot and Yahoo's Slurp, are quickly
gaining the ability to go inside frames properly. My philosophy
is: until a feature becomes ubiquitous, if you're uncertain,
leave it in the closet.
4. Keep the number of links on a given page less than 100.
This comes straight from Google's Webmaster Guidelines: "Keep
the links on a given page to a reasonable number (fewer than
100)."
This looks more like a suggestion, and I am not 100% sure
whether you get penalized in any way or whether Googlebot just
stops reading your links after 100. I can, however, tell you
from personal experience that I tried a page with 700 links
and it seemed fine. Then one day I tried to view the page from
my Blackberry PDA and I got a strange error message saying
my page was illegally formatted. After I split the page into
several pages with 80 links each, they worked on the PDA
too.
Who cares about the Blackberry? Well, if you're reading
this and your goal is to get visitors, then your main concern
should be not to alienate anyone. Remember, today more than
ever, people use different devices and different software
to access the web. Every visitor is a potential customer.
Every employee at a major US law firm and many other corporate
people use a Blackberry.
Lastly, why would you need that many links on one page anyway?
Let's say, for example, that you specialize in promotional
products: corporate branded gifts, such as pens, caps, mints
and other products (sometimes called 'premiums') imprinted
with one's logo. Your name is John Doe, and you decided to
name your company JDPromos (not very imaginative, but it will
do for our examples). You would want to have every item in
your catalog as a text link, so every item gets indexed as
a link and as a keyword. Likewise, those who run forums, ezines
or blogs might want to have standard links to their articles,
as the software they use might create dynamic links that are
invisible to certain robots.
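If your catalog really does have hundreds of items, splitting
the list across several pages is easy to automate. The sketch
below, in Python, uses the figure of 80 links per page from
my own experiment above; the catalog URLs and the function
name split_into_pages are invented for the example.

```python
def split_into_pages(links, max_per_page=80):
    """Split one long list of links into pages of at most max_per_page."""
    return [links[i:i + max_per_page]
            for i in range(0, len(links), max_per_page)]


# Hypothetical JDPromos catalog with 700 items.
catalog = [f"/catalog/item-{n}.html" for n in range(1, 701)]
pages = split_into_pages(catalog)

for number, page in enumerate(pages, start=1):
    print(f"page {number}: {len(page)} links")
```

With 700 items this gives eight pages of 80 links and a ninth
page with the remaining 60, all comfortably under 100.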
5. Give every page a meaningful title.
Give every single page on the site a complete and meaningful
title. This is also directly from Google's Webmaster Guidelines.
See Rule #1.
Incidentally, for those who are fascinated by the debates
on the death of the Meta Tags: the "title" tag is supported
by every web creation tool out there, and goes in the header
of a web page (between the "head" and the "/head" tags).
Google offers the 'allintitle' syntax, which lets users
search only text that appears in a page title. A lot of people
who integrate a Google bar into their websites allow users
to get results only by title. A search for Untitled Document
returns over 29 million results, each one a page whose title
was apparently never changed from the editor's default.
Most of us - myself included - copy and paste template pages,
out of the convenience of not having to recreate all design
elements from scratch. If you do so, do not forget to change
the title.
Make sure your title is not just a list of keywords and
that it is related to the actual content of the page. Google
can and will check that, before deciding on your page's 'relevance'.
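A forgotten title is easy to catch with a small script before
you publish. Here is a minimal sketch in Python that flags
pages with no title or with a leftover placeholder. The list
of placeholder titles is my own guess at common editor defaults,
and the function name title_problem is made up.

```python
from html.parser import HTMLParser


class TitleReader(HTMLParser):
    """Reads the text between <title> and </title>, if present."""

    def __init__(self):
        super().__init__()
        self.title = None       # stays None when the page has no <title>
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
            self.title = ""

    def handle_data(self, data):
        if self._in_title:
            self.title += data

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False


# Assumption: typical leftovers from templates and web editors.
PLACEHOLDER_TITLES = {"", "untitled", "untitled document", "new page"}


def title_problem(html):
    """Return a short description of the problem, or None if the title looks fine."""
    reader = TitleReader()
    reader.feed(html)
    if reader.title is None:
        return "missing title"
    if reader.title.strip().lower() in PLACEHOLDER_TITLES:
        return "placeholder title"
    return None


print(title_problem("<html><head><title>Untitled Document</title></head></html>"))
print(title_problem("<html><head><title>JDPromos branded coffee mugs</title></head></html>"))
```

The script cannot tell whether a title actually matches the
content of the page; that part still takes a human.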
6. Do not place important text inside images.
Google says: "Try to use text instead of images to display
important names, content, or links. The Google crawler doesn't
recognize text contained in images."
It is very tempting to create images with text inside them,
for the very simple reason that as designers, we are not
limited to the very few font (type) options that basic HTML
allows. Also, different browsers tend to display things differently
nowadays, so it is much easier to create a text image, which
will be shown consistently and not worry about styles, operating
systems, etc.
7. Use descriptive "ALT" tags.
The "ALT" tag (strictly speaking, an attribute of the image
tag) is used as a text alternative (hence the name) for images
and image links. It was designed so that text browsers (such
as Lynx) do not just display a generic 'Image' for every
picture link you might have. If all your links say 'Image',
how would a potential visitor know what they are?
Make sure that the text description is meaningful and accurate.
Take our promotional items company as an example. Let's say
they have a picture of a tradeshow display, as an example
of a service they provide outside the ordinary imprinted
mint boxes, calculators and keychains. If the "ALT" tag only
says "display", that is what Googlebot will see and index.
If the tag says something like "example of a tradeshow display
design", that is certainly more useful and more Googlebot
friendly.
Please note that although the "ALT" tag does count and Google
seems to place a high value on it, it ranks lower than plain
text.
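To find images whose "ALT" text is missing or too thin, something
like the following Python sketch can help. The three-word minimum
is an arbitrary threshold I picked for the example, not a
Google rule, and the file names are invented.

```python
from html.parser import HTMLParser


class AltChecker(HTMLParser):
    """Records images whose alt text is missing or very short."""

    def __init__(self, min_words=3):
        super().__init__()
        self.min_words = min_words  # assumption: fewer words is "too thin"
        self.weak = []              # (src, alt) pairs that need attention

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            alt = (attributes.get("alt") or "").strip()
            if len(alt.split()) < self.min_words:
                self.weak.append((attributes.get("src"), alt))


def weak_alt_texts(html, min_words=3):
    checker = AltChecker(min_words)
    checker.feed(html)
    return checker.weak


# Hypothetical JDPromos page fragment.
sample = """
<img src="display.jpg" alt="display">
<img src="booth.jpg" alt="example of a tradeshow display design">
<img src="logo.gif">
"""
print(weak_alt_texts(sample))
```

Here display.jpg and logo.gif are flagged, while the image
with the full description passes.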
8. Use meaningful descriptions for links
At the risk of sounding like a scratched CD, I'll have
to say this again: Whether you use picture links or text
links, please use meaningful text inside your link tags so
that Googlebot can associate that text with that href link.
In other words, let's pretend again that we are designing
the website for the imaginary promotional items company
we called JDPromos. If you intend to put a link to a set
of sample coffee mug promos, say something like "link to
JDPromos samples of branded coffee mugs", not just "coffee
mugs", or even worse, "click here for pictures". Never use
link text like "read more", "go here", "download it", "click
here" or "don't click here". You get the picture, I hope.
Don't try to fool the Googlebot with hidden links or duplicate
content or irrelevant pages of words like "sex" and "hot
girls." The Googlebot doesn't like being played and you will
be penalized, one way or another, in the long run.
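Generic link text is simple to screen for. The Python sketch
below checks link text against the throwaway phrases listed
above. It is only a crude heuristic of my own, and the function
name is_generic_link_text is made up.

```python
# The phrases come from the examples in this rule.
GENERIC_PHRASES = ("click here", "read more", "go here", "download it")


def is_generic_link_text(text):
    """Return True when link text is empty or contains a throwaway phrase."""
    normalized = " ".join(text.lower().split())
    if not normalized:
        return True
    return any(phrase in normalized for phrase in GENERIC_PHRASES)


for text in ("Click here for pictures",
             "don't click here",
             "JDPromos samples of branded coffee mugs"):
    print(repr(text), is_generic_link_text(text))
```

A substring match like this will also flag a longer link that
merely begins with "read more", so treat the result as a prompt
to take a second look, not as a verdict.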
9. Use a "description" tag for every page
Include a "description" tag in your page header to summarize
the page. Use a meaningful one- or two-sentence description;
do not keyword spam.
Even better, include descriptive text on the site's front
page where users can actually read it. This text will appear
as the description for your site in Google results.
Place more important content higher on the page than less
important content. Google does categorize text on a page
based on its position: text at the bottom of a page is
considered less important, or less 'relevant', to use one
of Google's own terms.
10. Use short query strings
Use URLs with query strings sparingly, if at all. Pages
whose URLs carry query strings are also called dynamic pages.
You can usually recognize them by the presence of the "?"
character in the URL. Be aware that not every search engine
robot can crawl dynamic pages as well as static pages, so
the shorter the parameters and the fewer of them, the better.
11. Never use the "&id=" parameter
If you must use query strings, or dynamic pages, never use
the "&id=" parameter as part of the string.
I know this might sound ridiculous, as it might be hard
or impossible for you not to use the "&id=" parameter,
but if you are a programmer and you can change the variable's
name, replace "id" with something else. Otherwise, Googlebot
will just skip that page.
Google says: "Don't use "&id=" as a parameter in your
URLs, as we don't include these pages in our index."
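Rules 10 and 11 can both be checked mechanically. Here is
a minimal Python sketch that warns about an "id" parameter
and about long query strings. Two assumptions are mine: it
flags "id" wherever it appears in the query string, which
is stricter than the literal "&id=", and the limit of two
parameters is an arbitrary number, since Google does not publish
one. The example.com URLs are placeholders.

```python
from urllib.parse import urlsplit, parse_qsl


def query_string_warnings(url, max_params=2):
    """Return a list of warnings about the URL's query string."""
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    warnings = []
    # Assumption: treat any parameter named "id" as risky, in any position.
    if any(name.lower() == "id" for name, _ in params):
        warnings.append('uses an "id" parameter')
    # Assumption: more than max_params parameters counts as "too long".
    if len(params) > max_params:
        warnings.append(f"has {len(params)} parameters (more than {max_params})")
    return warnings


print(query_string_warnings("http://www.example.com/item.php?cat=mugs&id=42"))
print(query_string_warnings("http://www.example.com/item.php?product=42"))
```

Renaming the parameter, as in the second URL, is enough to
make the first warning go away.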
12. Use robots.txt
Use robots.txt to show the Googlebot around your site. This
ancient and very standard mechanism for directing well-behaved
robots like the Googlebot allows you to specify places
where the robot is not welcome, whether for privacy reasons
or to avoid Google penalties. You might want to keep the
robot away from your cgi-bin directory and other places you
don't want available to the entire searching population of
the globe. Remember, this is a guideline, not a barrier:
robots that are not programmed to comply will disregard it.
Bottom line, use robots.txt to guide Googlebot, but not
to enforce strict security.
Google says: "Make use of the robots.txt file on your web
server. This file tells crawlers which directories can or
cannot be crawled."
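Before uploading a robots.txt file, you can test it with the
robots.txt parser that ships with Python. The sketch below
uses a made-up robots.txt that keeps all robots out of the
cgi-bin directory mentioned above and out of a hypothetical
/private/ directory; the page paths are invented as well.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for the example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def googlebot_may_fetch(path):
    """Ask the parsed rules whether a robot named Googlebot may fetch path."""
    return parser.can_fetch("Googlebot", path)


for path in ("/catalog/mugs.html", "/cgi-bin/order.cgi", "/private/notes.html"):
    print(path, googlebot_may_fetch(path))
```

Only the catalog page comes back as allowed. Keep in mind that
this shows what a well-behaved robot would do; it does nothing
to stop a badly behaved one.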
13. Make a sitemap
A site map is just a page on your website where you guide
your users through the structure of your site. The most basic
form of sitemap is a page that lists all of your pages, with
a brief description and a link - all text, of course. When
you make the sitemap, follow all the rules above and don't
forget that the purpose of the sitemap is to guide your human
visitor.
Google says: "Offer a site map to your users with links
that point to the important parts of your site. If the site
map is larger than 100 or so links, you may want to break
the site map into separate pages."
14. Use the Google Sitemaps project
At the time of this writing, the fastest, best and most
accurate way to make sure your site is properly crawled and
indexed by Googlebot is to participate in the Google Sitemaps
project.
In a nutshell, you make a sitemap as an XML page and submit
it directly to Google. Google then sends Googlebot to index
your site. Besides the speedy free submission, you also get
a good amount of statistics and the opportunity to fix potential
errors in your site.
Please note that the XML sitemap needed for the Google Sitemaps
project is intended specifically for Googlebot, and is different
from the sitemap described in the previous rule, which is
intended solely for human users.
Also, do not be afraid of XML. Google's sitemap is a very
simple text file, and they give you all the necessary information
and directions at: https://www.google.com/webmasters/sitemaps
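To show how simple the file is, here is a minimal Python sketch
that writes an XML sitemap for a list of pages. It assumes
the "urlset", "url" and "loc" elements and the namespace of
the sitemaps.org protocol (version 0.9); the schema Google
asks for may differ, so check their directions before submitting.
The example.com URLs and the function name build_sitemap are
placeholders.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace of the sitemaps.org protocol, version 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML document listing each URL in a <loc> element."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sitemap_xml = build_sitemap([
    "http://www.example.com/",
    "http://www.example.com/mugs.html",
])
print(sitemap_xml)
```

The protocol also allows optional details for each page, such
as the date it was last modified, but a plain list of locations
is enough to get started.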
Good luck!
Andrei co-owns bsleek (http://www.bsleek.com) – a
site that specializes in web hosting, design, promotional
items, printing, CD Presentations and more. Andrei is on
the Board of Consultants for Daterade.com and has amassed
an extensive technical knowledge and experience through his
career as the CIO for a major travel management company and
through his past careers in military research, data acquisition
and aerospace engineering.