Just wanted to share some code that I wrote to generate a sitemap.xml file in Google's sitemap format. It's a Django application modeled on django.contrib.syndication. Here's the rundown:

First, create a Sitemap subclass. Like Feed, this will be queried for a list of items, then queried for properties for each item returned. For this site, my BlogSitemap subclass looks like this:

class BlogSitemap(Sitemap):
    def items(self):
        return Entry.objects.filter(is_draft=False)

    def lastmod(self, obj):
        return obj.mod_date

You can specify location, lastmod, changefreq, and priority in a similar fashion. See Google's documentation for more information. The location property will default to the value returned from the object's get_absolute_url method.
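To make the "queried for properties" idea concrete, here is a rough sketch of how that lookup could work. It is not the actual idioteque.sitemap code: the stand-in Sitemap base class, the _get helper, get_urls, and FakeEntry are all hypothetical names, and I'm assuming a property may be either a plain attribute or a method that takes the item.

```python
class Sitemap:
    """Hypothetical stand-in for the real base class (an assumption)."""

    def _get(self, name, obj, default=None):
        # Look the property up on the subclass; if it is a method,
        # call it with the item, otherwise use the value as-is.
        try:
            attr = getattr(self, name)
        except AttributeError:
            return default
        return attr(obj) if callable(attr) else attr

    def location(self, obj):
        # Default location: whatever the object's get_absolute_url returns.
        return obj.get_absolute_url()

    def get_urls(self):
        # One dictionary of sitemap properties per item.
        return [
            {
                "location": self._get("location", item),
                "lastmod": self._get("lastmod", item),
                "changefreq": self._get("changefreq", item),
                "priority": self._get("priority", item),
            }
            for item in self.items()
        ]


class FakeEntry:
    """Hypothetical model stand-in so the sketch runs without a database."""

    def __init__(self, slug, mod_date):
        self.slug = slug
        self.mod_date = mod_date

    def get_absolute_url(self):
        return "/blog/%s/" % self.slug


class BlogSitemap(Sitemap):
    changefreq = "weekly"  # same value for every item

    def items(self):
        return [FakeEntry("hello", "2006-06-01")]

    def lastmod(self, obj):
        return obj.mod_date

    def priority(self, obj):
        return 0.5
```

Calling BlogSitemap().get_urls() here gives one dictionary per entry, with location filled in from get_absolute_url because the subclass doesn't override it.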

Next, point your urlconf to the appropriate sitemap view (or views). The sitemap views expect a dictionary mapping sitemap names to Sitemap subclasses. idioteque.sitemap.views.sitemap will simply compile a global list of URLs from all entries in the passed-in dictionary. My urls.py looks like this:

sitemaps = {'blog': BlogSitemap}
# The dot is escaped so the pattern only matches a literal "sitemap.xml".
(r'^sitemap\.xml$', 'idioteque.sitemap.views.sitemap', {'sitemaps': sitemaps})
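For the curious, compiling "a global list of URLs from all entries in the passed-in dictionary" could look roughly like the sketch below. This is an illustration, not the real view: render_sitemap, StaticSitemap, the get_urls method, and the domain argument are hypothetical, and I'm assuming the sitemaps.org 0.9 namespace for the output.

```python
from xml.etree import ElementTree as ET

# Assumption: the sitemaps.org 0.9 namespace.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def render_sitemap(sitemaps, domain):
    """Build one <urlset> from every sitemap class in the dictionary."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for name in sorted(sitemaps):
        site = sitemaps[name]()  # the dictionary maps names to classes
        for info in site.get_urls():
            url = ET.SubElement(urlset, "url")
            loc = "http://%s%s" % (domain, info["location"])
            ET.SubElement(url, "loc").text = loc
            # lastmod is optional, so only emit it when present.
            if info.get("lastmod"):
                ET.SubElement(url, "lastmod").text = info["lastmod"]
    return ET.tostring(urlset, encoding="unicode")


class StaticSitemap:
    """Hypothetical sitemap that returns a fixed list of URLs."""

    def get_urls(self):
        return [{"location": "/about/", "lastmod": "2006-06-01"}]


xml = render_sitemap({"static": StaticSitemap}, "example.com")
```

A real view would wrap the resulting string in an HTTP response with an XML content type.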

You can also create a sitemap index that references separate files for particular sections. Have a look at my urls.py for an example.
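If you haven't seen a sitemap index before, it is just a small XML file that lists the per-section sitemap files. A minimal sketch of generating one, assuming each section is served at /sitemap-<name>.xml (that URL layout, the render_index name, and the domain argument are my assumptions for illustration, not necessarily how the app does it):

```python
from xml.etree import ElementTree as ET

# Assumption: the sitemaps.org 0.9 namespace.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def render_index(sitemaps, domain):
    """Emit a <sitemapindex> with one <sitemap> entry per section name."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for section in sorted(sitemaps):
        entry = ET.SubElement(index, "sitemap")
        # Hypothetical URL scheme for the per-section files.
        loc = "http://%s/sitemap-%s.xml" % (domain, section)
        ET.SubElement(entry, "loc").text = loc
    return ET.tostring(index, encoding="unicode")


index_xml = render_index({"blog": object, "photos": object}, "example.com")
```

Only the dictionary keys matter here; the values would be the Sitemap subclasses that render each section's file.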

Lastly, there is a convenience function (idioteque.sitemap.ping_google) to ping Google when you want them to re-download your sitemap. For my blog, I overrode Entry's save method to ping Google whenever I post something that isn't a draft:

def save(self):
    super(Entry, self).save()
    if not self.is_draft:
        ping_google("theidioteque.net/sitemap.xml")
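A ping like this boils down to requesting a URL that carries the sitemap address as a query parameter. Here is a minimal sketch of that idea, not the actual ping_google implementation: build_ping_url, ping, the "sitemap" parameter name, and the ping.example endpoint are all assumptions, and the fetch argument exists only so the function can be tried without touching the network.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_ping_url(sitemap_url, endpoint):
    """Append the URL-encoded sitemap address to the ping endpoint."""
    return "%s?%s" % (endpoint, urlencode({"sitemap": sitemap_url}))


def ping(sitemap_url, endpoint, fetch=urlopen):
    """Request the ping URL; `fetch` can be swapped out for testing."""
    return fetch(build_ping_url(sitemap_url, endpoint))


# Record the URL instead of making a real request.
requested = []
ping(
    "http://example.com/sitemap.xml",
    "http://ping.example/ping",  # hypothetical endpoint, not Google's
    fetch=requested.append,
)
```

Since a failed request would raise inside save, you may want to wrap the call in a try/except so a network hiccup doesn't block posting.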

There are still some rough edges, but I'm using this code. Things that still need to be ironed out include:

  • Caching the sitemap views. This should be as simple as adding a django.views.decorators.cache.cache_page decorator, but that currently breaks urlresolvers.reverse. I have filed a ticket.
  • Improving the ping_google function so you don't have to pass in the full URL. This seems like something that should be handled automatically; I just haven't thought of a good way yet.
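On that last point, one possible direction is to assemble the full URL from a site domain plus the sitemap's path. The sketch below only shows the string handling; absolute_sitemap_url and its defaults are hypothetical, and where the domain would come from (a setting, or the current site record) is left open.

```python
def absolute_sitemap_url(domain, path="/sitemap.xml", scheme="http"):
    """Join a bare domain and a path into a full sitemap URL."""
    # Strip stray slashes so "example.com/" plus "/sitemap.xml" does not
    # produce a double slash.
    return "%s://%s/%s" % (scheme, domain.strip("/"), path.lstrip("/"))


full_url = absolute_sitemap_url("example.com")
```

With something like this, ping_google could take no arguments in the common case and fall back to an explicit URL when one is passed.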