tips for Search Engine Optimization (SEO)
There is no magic way to put a site in a good position in the search results of major search engines like Google or Yahoo. A lot of hard work is needed to build a good website, with good, useful content that is easily accessible to people and bots. Here are some tips for search engine optimization:
correct HTTP status response in server header
It is important for URLs to return the correct HTTP status code in the server response header. Search engines like Google and Yahoo only index URLs that return HTTP status 200 (OK). It is good practice for a site to return the relevant status code for errors, such as 404 (Not Found) for a missing page or 500 (Internal Server Error) for a server error, so search engines can tell which URLs have valid content that needs indexing and which correspond to 'not found' or 'server error' pages. For most websites it is also good practice to return 403 (Forbidden) for URLs of folders without an index file, rather than 200 (OK) with a directory listing.
The check URL tool can be used to view the server HTTP header.
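For illustration, a correct response header for a missing page might look like this (the status line is what matters; the other header values are just examples):

```http
HTTP/1.1 404 Not Found
Content-Type: text/html; charset=UTF-8

HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
```

A common mistake is serving a custom "page not found" page with status 200 (OK): the error page then looks like valid content to search engines.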
the robots.txt file
The robots.txt file is the most important file of a website for search engines. It tells bots which URLs of the site must not be accessed. Not all bots respect the rules in the robots.txt file, but all major search engines do, and it is very important for a website to have a correct robots.txt file. It is good practice to block in robots.txt the URLs that can damage the way a site is crawled and indexed, for example URLs that can create an infinite URL space, such as URLs with session IDs or the 'next page' URLs of a calendar.
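A minimal robots.txt along these lines could look as follows; the paths are hypothetical, and note that the `*` wildcard in paths is an extension supported by the major search engines, not part of the original robots.txt standard:

```text
# example robots.txt - hypothetical paths
User-agent: *
Disallow: /search          # URLs generated by a search script
Disallow: /calendar/       # 'next month' pages, an infinite URL space
Disallow: /*?sessionid=    # session IDs in the query string
```

The file must be placed at the root of the host, e.g. at /robots.txt.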
URLs blocked by robots.txt can still appear in search results if other pages link to them, but they appear with only the URL, no title or snippet. To prevent a URL from appearing in search results at all, a noindex meta tag is needed in the <head> element of the page, and the page must not be blocked in the robots.txt file, so search engines are able to read the noindex meta tag. In this case the page must have valid (X)HTML mark-up, so search engines can parse the page correctly and extract the noindex meta tag.
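The noindex meta tag goes in the <head> element of the page, for example:

```html
<head>
  <title>Private page</title>
  <!-- tells search engines not to show this page in search results -->
  <meta name="robots" content="noindex">
</head>
```

Remember that this only works if the page itself is crawlable, i.e. not disallowed in robots.txt.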
It helps search engines if they are presented with URLs that have indexable content or, where necessary, with HTTP redirects in the server header. URLs whose content merely points to other URLs (via frames or meta refresh tags) can create crawling and indexing problems for a website.
valid (X)HTML and CSS mark-up
Help browsers and search engines parse the content of your pages correctly by using valid (X)HTML and CSS mark-up. It is good practice to keep all styling information in an external CSS file, for better use of bandwidth, a more compact HTML source and better accessibility (for people who need to use their own CSS settings in their browsers).
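Linking an external stylesheet is a one-line change in the <head> element; the file path here is just an example:

```html
<link rel="stylesheet" type="text/css" href="/css/style.css">
```

The browser can then cache the CSS file across pages, instead of downloading inline styles with every page.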
HTML elements of high visibility to search engines
Write good, relevant content for the HTML elements of high visibility to search engines, such as the <title> element, the meta description tag, the <h1>, <h2>... headings, the anchor text of links, and the alt attribute of the <img> element for images. Keep things simple: for example, do not use nested <span> elements within <a> or <h1> tags; do the styling by using the class attribute of <a> or <h1>.
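As a small sketch of the point above (class names are hypothetical):

```html
<!-- simple: style the heading itself via its class attribute -->
<h1 class="page-title">Tips for search engine optimization</h1>

<!-- avoid: nested spans used only for styling -->
<h1><span class="big"><span class="blue">Tips for search engine optimization</span></span></h1>
```

Both render the same text, but the first version keeps the heading content easy to parse.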
search-engine friendly URLs
Use friendly URLs, with a word or two describing the content of the page. Google recommends hyphens as word separators in the URLs.
good crawlable site navigation
Have well-structured site navigation. Do not put too many links on an HTML page; more than a few hundred links on a page can cause problems for both bots and people.
submit a good XML sitemap
Indicate to search engines which URLs you want indexed in search results by submitting a good XML sitemap. A good XML sitemap has to respect the XML syntax specified by the search engines (Google, Yahoo, MSN, Ask), and it should not contain URLs that do not need to be indexed. Search engines also collect URLs from hyperlinks, not only from the sitemap, but it is important to have in the sitemap only the URLs that need to be indexed in search results, and they must be URLs that do not redirect.
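A minimal sitemap following the sitemaps.org protocol looks like this (the URL and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/seo-tips</loc>
    <lastmod>2009-03-01</lastmod>
  </url>
</urlset>
```

Each <url> entry should be a canonical, non-redirecting URL that you want indexed.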
avoid duplicate content
Google and other search engines recently introduced support for the new <link rel="canonical"...> tag, which points search engines from URLs with similar content to the canonical URL (the one to be indexed in search results).
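The canonical link tag goes in the <head> element of each duplicate or near-duplicate page; the URL here is a placeholder:

```html
<head>
  <!-- tells search engines which URL to index for this content -->
  <link rel="canonical" href="http://www.example.com/seo-tips">
</head>
```

For example, a page reachable both with and without URL parameters can declare the parameter-free URL as canonical.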
avoid URL infinite spaces
Examples of infinite URL spaces are a calendar with a "next month" link, automatically generated session ID parameters in the URL query string, and URLs generated by a search script. URLs like these have to be disallowed in the robots.txt file, and links to them should carry rel="nofollow".
avoid things that can be perceived as intended to manipulate search results
Avoid link schemes and domain farms; they are against the guidelines published by all important search engines. If in doubt about a link, add rel="nofollow": the link will still be usable by people visiting your site, but it will tell search engines not to follow that specific occurrence of the link for inclusion in search results. Avoid hidden content and keyword stuffing. Keyword stuffing can be any large out-of-context chunk of words important to the website that is not included in the HTML content in a meaningful way, for example too many repetitive phrases in the meta keywords tag or in the alt attribute of <img> elements.
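Adding rel="nofollow" to a link is a one-attribute change (the URL and anchor text are just examples):

```html
<a href="http://www.example.com/partner" rel="nofollow">partner site</a>
```

People can still click the link normally; only search engine crawlers are asked to discount it.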
useful resources
- Google help center for webmasters
- Google webmaster blog
- Google webmaster help forum
- Google Webmaster Tools
- sitemaps.org - the reference website for XML sitemaps
- Google blog article about URL infinite spaces
- Google blog article about OCR - character recognition for images embedded in PDF files
- Google blog article about the canonical link tag