What do search engines see in my page?

Search engines are your ultimate blind visitors. They don’t see JavaScript, stumble over framesets, ignore CSS, feel around HTML-tags and leave them alone, choke on Flash. All they really want to see is content. And that content is plain text. Text that can be indexed, weighed, stored, chunked, ranked & retrieved. Or whatever it is they do…

So if you want a glimpse of what Google/Yahoo/MSN can really see in your pages, take a look at the Search Engine Spider Simulator. It’ll take your page and strip off all HTML, media, links, meta-info and commonly used words. What is left is a résumé of all the unique words that are on your page.
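
For the curious, here is a rough sketch in Python of what such a simulator might do. This is not how the Search Engine Spider Simulator actually works (I haven't seen its source); the stop-word list and the names TextOnly and unique_words are made up for illustration.

```python
from html.parser import HTMLParser
import re

# Assumption: a tiny illustrative stop-word list; a real tool would use a much longer one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "on", "for"}


class TextOnly(HTMLParser):
    """Collects text nodes and ignores everything inside <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Tags themselves never reach this method, only the text between them.
        if self._skip_depth == 0:
            self.chunks.append(data)


def unique_words(html):
    """Return the unique non-stop-words of a page, in order of first appearance."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z0-9']+", " ".join(parser.chunks).lower())
    seen = []
    for word in words:
        if word not in STOP_WORDS and word not in seen:
            seen.append(word)
    return seen
```

Feed it the HTML of a page and you get the list of words a text-only visitor would be left with. A page that builds all of its content with JavaScript comes back as an empty list.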

Now if this simulator doesn’t return anything, maybe that’s your explanation for not being indexed.
No content? Why bother.

Google Sitemaps weblog plugins

In the previous post I said that Google Sitemaps will accept your feeds just as well. And it does, no worries. But as I looked further into the dynamic generation of sitemaps I found there were WordPress plugins already available. (Just 3 days after the service went public. How’s that for a user community …)

Currently I have Arne Brachhold’s Google Sitemaps Generator v2 Final running. All I had to do was upload the plugin, activate it, make an empty sitemap.xml writable and I was up and running. You’ll get an extra administration page after activation where you can set a whole lot of options. We’ll see how this one fares.

If that plugin doesn’t work for you, Dirk Zimmermann also has a plugin, although that one didn’t work out for me, presumably because I have my WP in a subdir.

People using Movable Type may want to look at Niall Kennedy’s Weblog.

Update: Arne Brachhold’s Google Sitemap Generator for WordPress just bumped up to version 2.5. Good stuff: 1. You can now add external pages that aren’t generated with WordPress. 2. The plugin pings Google to notify them of an update. 3. The plugin has become multi-lingual.

Google Sitemaps (Beta)

To let webmasters help Google index their site better there is Google Sitemaps. Sounds like a good idea.

So how does it work?
First you need to have a Google account (having a GMail account is probably enough).
Second you need to create a Sitemap file in the root of your site. This is an XML file that lists all your indexable pages. Google even provides a generator for this file.
Third you have to tell Google where your sitemap file can be found.
Last, wait to see what Google does with the sitemap file.
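
To give an idea of what the second step involves, here is a minimal sketch in Python that writes such a file. It is not Google's generator. The names build_sitemap and write_sitemap are made up, the URLs in the usage note are placeholders, and the namespace is the one from the sitemaps.org protocol, which may differ from what the Beta service expects.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemaps.org 0.9 namespace; check what the service you submit to expects.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build the sitemap XML for an iterable of (url, last_modified) tuples.

    last_modified is a date string such as "2005-06-06", or None to leave it out.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes & for us
        if last_modified:
            ET.SubElement(entry, "lastmod").text = last_modified
    return ET.tostring(urlset, encoding="unicode")


def write_sitemap(pages, path="sitemap.xml"):
    """Write the sitemap to a file, by default sitemap.xml in the current directory."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        handle.write(build_sitemap(pages))
```

Calling write_sitemap([("http://example.com/", "2005-06-06"), ("http://example.com/about/", None)]) would produce a two-entry sitemap.xml. The hard part, of course, is getting the list of pages out of your site in the first place.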

I’m still stuck at the second stage. The generator from Google requires Python to run. Unfortunately I can’t run Python on my host. I don’t like to update the file manually, so I’d like this to be automated. If anyone knows of a good solution to generate sitemaps automatically, I’d love to hear about it.

Update: It seems that Google Sitemaps will accept RSS 2.0 and Atom 0.3 feeds as well. So for now I’ve added those. And looking at my logs I see Google visit some links.
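
If you wonder which pages a feed would actually hand over, this small Python sketch lists the item links of an RSS 2.0 document. The function name feed_links is made up, and it only handles RSS 2.0, not Atom.

```python
import xml.etree.ElementTree as ET


def feed_links(rss_text):
    """Return the <link> of every <item> in an RSS 2.0 document, in feed order."""
    root = ET.fromstring(rss_text)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            links.append(link.strip())
    return links
```

Keep in mind that a feed typically lists only your most recent posts, so as a sitemap it probably covers just a slice of the site.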