XML: build a sitemap for search engines

A Sitemap is an XML file that contains the information search engines need to crawl a web site and index it more efficiently. The file mainly contains URLs, plus additional information that search engine crawlers can use. If a Sitemap is not present in the root of a web site, crawlers will usually follow the links inside each page and try to "discover" all the pages of the site. With a Sitemap, that process is faster and more efficient, and the web master can be sure that all relevant pages are actually analysed by search engine spiders. Sitemaps do not guarantee that a web site is indexed and ranked, but they greatly help the process.

There are many Sitemap creators on the web, but I would like to explain how to build one yourself. As a side note, you should know that RSS feeds can be used as Sitemaps in tools like Google Webmaster Tools; however, it is better to use the Sitemap protocol to make sure that other web spiders understand the structure of your site.


How to build a Sitemap
The Sitemap is an XML file that follows a specific protocol. The information inside the file's nodes provides essential details for web crawlers. We start building the XML file by stating the version and the charset:
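  <?xml version="1.0" encoding="UTF-8"?>
The Sitemap protocol requires the file to be UTF-8 encoded, which is why that charset is declared here.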
Then we define the protocol standard. The urlset tag will encapsulate all the URLs in our Sitemap:
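  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">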
The xmlns used here is the standard one defined by sitemaps.org. You can use schemas from Google as well:
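  <urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
(0.84 is the schema version Google published before the sitemaps.org standard was adopted; check Google's documentation for the version currently in use.)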
As said, the urlset tag is now open, and inside it we place the actual URLs following this pattern:

  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2011-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
Then we repeat it for every single page we want to include in the Sitemap.

What's the meaning of all those tags?
As you can see, every page included in the Sitemap carries several pieces of information. Let's see them, one by one.
<loc> is the URL of your page. The first URL should be the URL of your web site itself; the other URLs should be inside pages (something like http://www.example.com/products.html or http://www.example.com/products.asp?code=12 etc.). You can point to any specific page, basically anything relevant to your web site. Note that if your web server requires it, you should end the URL with a trailing slash. The URL has to start with the appropriate protocol (such as http://).
<lastmod> stands for "last modified" and is optional. It is the date when the page was last modified, and it must be in W3C datetime format. I usually use the YYYY-MM-DD format.
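For example, both the plain date and the full timestamp form are valid W3C datetimes:

  <lastmod>2011-01-01</lastmod>
  <lastmod>2011-01-01T14:30:00+01:00</lastmod>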
<changefreq> states how frequently the page is likely to change. It is an optional tag. Valid values for the tag are:
  1. always
  2. hourly
  3. daily
  4. weekly
  5. monthly
  6. yearly
  7. never
It should be noted that this tag does not really "push" web crawlers to check the page more or less often. One might think that using "always" will make the spiders come back more often and thus obtain a higher rank. That is not true. It is just a hint to spiders, and it does not control the crawl frequency on its own.
<priority> is again an optional tag, but it's quite interesting. It gives your URL a priority relative to the other URLs in the site. The default value is 0.5, and it can be changed to any value from 0.0 to 1.0. With it, the Sitemap indicates which pages are the most important ones in the web site. Again, using this tag will not give your most important pages a higher rank, but it will help web crawlers better understand the structure of your web site. Do not set every page to a priority of 1.0 hoping to obtain a higher ranking. That definitely does not work!
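For instance, a sensible distribution for a small site could look like this (the pages and values are purely illustrative):

  <priority>1.0</priority> <!-- home page -->
  <priority>0.8</priority> <!-- main product pages -->
  <priority>0.3</priority> <!-- old archive pages -->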

Closing the nodes
After inserting all our relevant pages (you don't have to include every page of the site, only the relevant ones), we close the urlset tag with </urlset>.
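Putting it all together, a minimal complete Sitemap looks like this (URLs and values are placeholders to replace with your own pages):

  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
      <loc>http://www.example.com/</loc>
      <lastmod>2011-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>1.0</priority>
    </url>
    <url>
      <loc>http://www.example.com/products.html</loc>
      <lastmod>2011-01-01</lastmod>
      <changefreq>weekly</changefreq>
      <priority>0.8</priority>
    </url>
  </urlset>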
Now your Sitemap is ready to be published. Upload it to the root of your web site and submit it to your favourite search engine. As said before, you can use Google Webmaster Tools to do it.
I hope that this post will help you. You can visit sitemaps.org for more information.