MediaWiki:Google Sitemaps


!!! The wiki no longer exists. This page remains for archival purposes only.

Currently there is no official automatic [[Google Sitemaps|Sitemaps]] generation for MediaWiki installations, so the ThinkLemon wiki uses a custom script to build a Sitemap file from the wiki database.

== What it does ==

This [[PHP]] script reads the settings of the MediaWiki installation and fetches the page titles, namespaces and timestamps from the database. It then outputs the page collection in the required [[XML]] format.
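As a rough illustration of that output, the sketch below builds a single Sitemap entry from a page URL, a MediaWiki timestamp and a priority. The function names and the example.com URL are made up for this example and are not part of the script on this page. It also assumes the timestamp is the usual 14-digit MediaWiki form (YYYYMMDDHHMMSS, in UTC).

```php
<?php
// Hypothetical helpers for illustration only; not part of sitemap.xml.php.

// Convert a 14-digit MediaWiki timestamp (YYYYMMDDHHMMSS, assumed UTC)
// to the ISO 8601 form used in <lastmod>.
function sketchMwTimestampToIso( $ts ) {
	return substr( $ts, 0, 4 ) . '-' . substr( $ts, 4, 2 ) . '-' . substr( $ts, 6, 2 )
		. 'T' . substr( $ts, 8, 2 ) . ':' . substr( $ts, 10, 2 ) . ':' . substr( $ts, 12, 2 )
		. '+00:00';
}

// Build one <url> entry: location, last modification, change frequency, priority.
function sketchSitemapEntry( $url, $touched, $priority ) {
	return "<url>\n"
		. "\t<loc>" . htmlspecialchars( $url ) . "</loc>\n"
		. "\t<lastmod>" . sketchMwTimestampToIso( $touched ) . "</lastmod>\n"
		. "\t<changefreq>weekly</changefreq>\n"
		. "\t<priority>" . $priority . "</priority>\n"
		. "</url>\n";
}

// Example URL is a placeholder.
echo sketchSitemapEntry( 'http://www.example.com/wiki/Main_Page', '20051125120000', 1 );
```

The real script wraps all entries in one enclosing element and takes the URL parts from LocalSettings.php instead of a hard-coded string.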

See [http://www.thinklemon.com/wiki/sitemap.xml.php sitemap.xml.php] for the results of this wiki.

'''Update''': This version (v0.3) has been updated to work with MediaWiki 1.5.x installations. It does NOT work with 1.3.x and 1.4.x versions of MediaWiki. Nor is it guaranteed to work with future versions.

See the [http://www.thinklemon.com/wiki/index.php?title=MediaWiki:Google_Sitemaps&oldid=1252 November 25, 2005 version of this page] if you are running a 1.4.x version of MediaWiki.

== Instructions ==

# Copy-paste the source code from below to an empty text file.

# Save the file as sitemap.xml.php.

# Upload the script to the directory containing your MediaWiki installation.

# Test the script by calling it from a browser.

If you are certain the output is correct, then submit the sitemap to [[Google Sitemaps]].

Note that Google is picky about the placement of the script. It has to be in the same path as the pages that are to be indexed.
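The placement rule can be checked by hand with a few lines of PHP. The sketch below assumes the rule works as a simple prefix test: a page URL is accepted only if it starts with the directory that holds the Sitemap script. The function name and the example.com URLs are invented for this illustration.

```php
<?php
// Hypothetical helper for illustration only; not part of sitemap.xml.php.

// Return true when $pageUrl lives at or below the directory holding the Sitemap script.
function sketchIsCoveredBySitemap( $sitemapUrl, $pageUrl ) {
	// Directory part of the Sitemap URL, including the trailing slash.
	$base = substr( $sitemapUrl, 0, strrpos( $sitemapUrl, '/' ) + 1 );
	// The page URL must start with that directory.
	return strncmp( $pageUrl, $base, strlen( $base ) ) === 0;
}

// Placeholder URLs.
$sitemap = 'http://www.example.com/wiki/sitemap.xml.php';
var_dump( sketchIsCoveredBySitemap( $sitemap, 'http://www.example.com/wiki/index.php?title=Main_Page' ) );
var_dump( sketchIsCoveredBySitemap( $sitemap, 'http://www.example.com/blog/some-post' ) );
```

With these placeholder URLs the first call reports true and the second false, because the second page lies outside the /wiki/ directory.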

Please put questions and answers on the [[MediaWiki talk:Google Sitemaps|discussion page]].

== Sourcecode ==

Version 0.3

<?php
# -----------------------------------------------------
# MediaWiki - Google Sitemaps generation. v0.3
#
# A page that'll generate valid Google Sitemaps code
# from the current MediaWiki installation.
# v0.3: Small changes to fix other install situations
# v0.2: Updated for MediaWiki 1.5.x
# v0.1: First attempt for MediaWiki 1.4.x
#
# See http://www.thinklemon.com/wiki/MediaWiki:Google_Sitemaps
#
# TODO: Further refinements like caching...
# -----------------------------------------------------

# -----------------------------------------------------
# Includes
# Need to include/require some Mediawiki stuff
# especially LocalSettings.php for definitions.
# -----------------------------------------------------

define( 'MEDIAWIKI', true );

require_once( './LocalSettings.php' );
require_once( 'includes/GlobalFunctions.php' );

# -----------------------------------------------------
# Send XML header, tell agents this is XML.
# -----------------------------------------------------

header("Content-Type: application/xml; charset=UTF-8");

# -----------------------------------------------------
# Send xml-prolog
# -----------------------------------------------------

echo '<'.'?xml version="1.0" encoding="utf-8" ?'.">\n"; 

# -----------------------------------------------------
# Start connection
# -----------------------------------------------------

$connWikiDB = mysql_pconnect($wgDBserver, $wgDBuser, $wgDBpassword)
	or trigger_error(mysql_error(),E_USER_ERROR);
mysql_select_db($wgDBname, $connWikiDB);

# -----------------------------------------------------
# Build query
# Skipping redirects and MediaWiki namespace
# -----------------------------------------------------

$query_rsPages = "SELECT page_namespace, page_title, page_touched ".
	"FROM ".$wgDBprefix."page ".
	"WHERE (page_is_redirect = 0 AND page_namespace NOT IN (8, 9)) ".
	"ORDER BY page_touched DESC";

# -----------------------------------------------------
# Fetch the data from the DB
# -----------------------------------------------------

$rsPages = mysql_query($query_rsPages, $connWikiDB) or die(mysql_error());
# Fetch the array of pages
$row_rsPages = mysql_fetch_assoc($rsPages);
$totalRows_rsPages = mysql_num_rows($rsPages);

# -----------------------------------------------------
# Start output
# -----------------------------------------------------

?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
<?php
// Find Project Namespace
if($wgMetaNamespace === FALSE)
	$wgMetaNamespace = str_replace( ' ', '_', $wgSitename );

do {

	# -----------------------------------------------------
	# 1. Determine the pagetitle using namespace:page_name
	# 2. Set priority of the namespace
	# -----------------------------------------------------

	$nPriority = 0;
	switch ($row_rsPages['page_namespace']) {
		case 1:
			$sPageName = "Talk:".$row_rsPages['page_title'];
			$nPriority = 0.9;
			break;
		case 2:
			$sPageName = "User:".$row_rsPages['page_title'];
			$nPriority = 0.7;
			break;
		case 3:
			$sPageName = "User_talk:".$row_rsPages['page_title'];
			$nPriority = 0.6;
			break;
		case 4:
			$sPageName = $wgMetaNamespace.":".$row_rsPages['page_title'];
			$nPriority = 0.9;
			break;
		case 5:
			$sPageName = $wgMetaNamespace."_talk:".$row_rsPages['page_title'];
			$nPriority = 0.8;
			break;
		case 6:
			$sPageName = "Image:".$row_rsPages['page_title'];
			$nPriority = 0.5;
			break;
		case 7:
			$sPageName = "Image_talk:".$row_rsPages['page_title'];
			$nPriority = 0.4;
			break;
		case 8:
			$sPageName = "MediaWiki:".$row_rsPages['page_title'];
			$nPriority = 0.4;
			break;
		case 9:
			$sPageName = "MediaWiki_talk:".$row_rsPages['page_title'];
			$nPriority = 0.3;
			break;
		case 10:
			$sPageName = "Template:".$row_rsPages['page_title'];
			$nPriority = 0.3;
			break;
		case 11:
			$sPageName = "Template_talk:".$row_rsPages['page_title'];
			$nPriority = 0.2;
			break;
		case 12:
			$sPageName = "Help:".$row_rsPages['page_title'];
			$nPriority = 0.1;
			break;
		case 13:
			$sPageName = "Help_talk:".$row_rsPages['page_title'];
			$nPriority = 0.1;
			break;
		case 14:
			$sPageName = "Category:".$row_rsPages['page_title'];
			$nPriority = 0.6;
			break;
		case 15:
			$sPageName = "Category_talk:".$row_rsPages['page_title'];
			$nPriority = 0.5;
			break;
		default:
			$sPageName = $row_rsPages['page_title'];
			$nPriority = 1;
	}

	# -----------------------------------------------------
	# Start output
	# -----------------------------------------------------
?>
	<url>
		<loc><?php echo fnXmlEncode( "http://" . $wgServerName . eregi_replace('\$1',$sPageName,$wgArticlePath) ) ?></loc>
		<lastmod><?php echo fnTimestampToIso($row_rsPages['page_touched']); ?></lastmod>
		<changefreq>weekly</changefreq>
		<priority><?php echo $nPriority ?></priority>
	</url>
<?php } while ($row_rsPages = mysql_fetch_assoc($rsPages)); ?>
</urlset>
<?php
# -----------------------------------------------------
# Clear Connection
# -----------------------------------------------------

mysql_free_result($rsPages);

# -----------------------------------------------------
# General functions
# -----------------------------------------------------

// Convert timestamp to ISO format
function fnTimestampToIso($ts) {
	# $ts is a MediaWiki Timestamp (TS_MW)
	# ISO-standard timestamp (YYYY-MM-DDTHH:MM:SS+00:00)
	return gmdate( 'Y-m-d\TH:i:s\+00:00', wfTimestamp( TS_UNIX, $ts ) );
}

// Convert string to XML safe encoding
function fnXmlEncode( $string ) {
	$string = str_replace( "\r\n", "\n", $string );
	$string = preg_replace( '/[\x00-\x08\x0b\x0c\x0e-\x1f]/', '', $string );
	return htmlspecialchars( $string );
}
?>
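If you want to adjust the priorities for your own wiki, the namespace handling in the script can also be written as a lookup table. The following is only a sketch of that idea, using the same prefixes and priorities as v0.3. The names sketchNamespaceTable and sketchPageInfo are invented here, and 'ThinkLemon' merely stands in for whatever $wgMetaNamespace holds on your installation.

```php
<?php
// Hypothetical restructuring for illustration; not part of sitemap.xml.php.

// Namespace number => array( title prefix, Sitemap priority ), values taken from v0.3.
function sketchNamespaceTable( $metaNamespace ) {
	return array(
		1  => array( 'Talk', 0.9 ),
		2  => array( 'User', 0.7 ),
		3  => array( 'User_talk', 0.6 ),
		4  => array( $metaNamespace, 0.9 ),
		5  => array( $metaNamespace . '_talk', 0.8 ),
		6  => array( 'Image', 0.5 ),
		7  => array( 'Image_talk', 0.4 ),
		8  => array( 'MediaWiki', 0.4 ),
		9  => array( 'MediaWiki_talk', 0.3 ),
		10 => array( 'Template', 0.3 ),
		11 => array( 'Template_talk', 0.2 ),
		12 => array( 'Help', 0.1 ),
		13 => array( 'Help_talk', 0.1 ),
		14 => array( 'Category', 0.6 ),
		15 => array( 'Category_talk', 0.5 ),
	);
}

// Return array( full page name, priority ) for one database row.
function sketchPageInfo( $table, $namespace, $title ) {
	if ( isset( $table[$namespace] ) ) {
		list( $prefix, $priority ) = $table[$namespace];
		return array( $prefix . ':' . $title, $priority );
	}
	// Anything not listed (the main namespace in particular) gets no prefix and top priority.
	return array( $title, 1 );
}

// 'ThinkLemon' is a placeholder for the wiki's own meta namespace.
$table = sketchNamespaceTable( 'ThinkLemon' );
list( $name, $priority ) = sketchPageInfo( $table, 14, 'Techniques' );
echo $name . ' ' . $priority . "\n";
```

Changing a priority then means editing one line of the table instead of a case block.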

== Version History ==

* V0.3: Script updated for other install situations.

:* Altered SQL Statement to exclude namespaces instead of include.

:* Added Template-namespaces and per-namespace priority.

:* Added XML encoding of pagetitles and correct path.

* V0.2: Script updated for MediaWiki 1.5.x as the database schema changed. The ‘cur’ table has moved to the ‘page’ table.

* V0.1: first attempt at Google Sitemaps for MediaWiki 1.4.x.

== Disclaimer ==

This script is provided as-is. It is not guaranteed to work on webservers other than the ThinkLemon.com domain.

ThinkLemon.com cannot be held liable for loss of data, crashing servers, loss of business, loss of whatever. Take the script, TEST it and change it to make it work for you, again TEST it.

[[Category:MediaWiki|Google Sitemaps]] [[Category:Techniques]]
