Generate Sitemap for Google and other Search Engines using PHP
Google has a Sitemap generator written using Python. Python is a great language, but I prefer PHP. I took the basic idea of the Sitemap generator and converted it into a solution more suitable for my application needs. If you are looking for a way to easily create a Sitemap for submitting to Google and other search engines, you have come to the right place. I call this one cron_gen_sm.php.
You can download cron_gen_sm.php here. Also, you can view partial code below.
<?php
/*******************************************************************************
cron_gen_sm.php
===========================================
Created on August 17, 2008
by Mike Rodarte of Mike's Technical Service
http://www.mts7.com
===========================================
This file is a PHP version of the Python Google Sitemap Generator. It is set to
use currently existing files on the server to help create the Sitemap and send
it to Google.
=======================
Configuration variables
-----------------------
base_dir - similar to DOCUMENT_ROOT: /home/user/www/
base_path - like http://www.example.com/ -- notice the trailing slash
Sitemap - file name for the Sitemap
SEND_TO_GOOGLE - 1 for yes, 0 for no
Also available below are three variations of generating your Sitemap. You can
pull the necessary data from a database, extract the data from a text file, or
manually create a static array with the necessary values. Future versions will
have directory scanning functionality.
=================
File dependencies
-----------------
init.php - includes files required for this script to execute
- defines NL, SEP, and $query_count
inc/funcs.php - has writeLogFile
inc/db_vars.php - creates database handler $dbh
inc/display_funcs.php - print_array
Any of the above mentioned variables, constants, and/or functions are available
upon request.
====================
Database - PDO MySQL
--------------------
This project makes use of PDO (PHP Data Objects) for its database functions.
This is not required for the script to execute properly unless you want to use
it with your database. Please make the necessary adjustments to make this
script execute with your current database. Future versions may have the
standard mysql_query type functions available in an additional get_pages_from_db
function.
The use of this code is permissible as long as this comment block is included in
its entirety in the source code file. Please give credit to whom credit is due.
*******************************************************************************/
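// init.php defines NL and SEP and loads writeLogFile() and print_array();
// the script cannot run without them, so enable this line once init.php exists
// require_once("init.php");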
// configure the four options below for your server configuration
$base_dir = '/home/username/public_html/';
$base_path = 'http://www.example.com/';
$Sitemap = "sitemap.xml";
// set to 1 for yes and 0 for no
define("SEND_TO_GOOGLE", 1);
// choose one of the three sources of page file names below
// 1. database: uncomment the line below to retrieve page file names from a database
// $files = get_pages_from_db();
// 2. text file: uncomment the line below to read page file names from a text file
// each line holds a file name, then an optional change frequency and/or priority,
// separated by single spaces, for example: file.ext daily .67
//$files = get_pages_from_file($base_dir.'page_file_names.txt');
// 3. static array: page file name => array(change frequency, priority)
// comment out the lines below if you use one of the other sources
$files = array(
'index.php' => array('yearly', '1.0'),
'about.php' => array('monthly', '.33')
);
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
///////////////////////// DO NOT EDIT BELOW THIS LINE //////////////////////////
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
$msg = ''; // diagnostic message
// command to execute to get the last modified date
define("CMD_LAST_MOD", "date --iso-8601=seconds -u -r ");
// general file header for use with Google type Sitemaps
$file_header =
'<?xml version="1.0" encoding="UTF-8"?>
<urlset
xmlns="http://www.google.com/schemas/sitemap/0.84"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.google.com/schemas/sitemap/0.84
http://www.google.com/schemas/sitemap/0.84/sitemap.xsd">'.NL;
// general file footer for use with the Sitemap
$file_footer = '</urlset>';
// create array of frequencies with associated priorities
$freqs = array (
'always' => '1',
'hourly' => '.75',
'daily' => '.67',
'weekly' => '.5',
'monthly' => '.33',
'yearly' => '.1',
'never' => '0'
);
// include root directory in sitemap with highest priority
$xml_data = ' <url>'.NL.' <loc>'.$base_path.'</loc>'.NL;
$xml_data .= ' <changefreq>daily</changefreq>'.NL;
$xml_data .= ' <priority>1.0</priority>'.NL;
$xml_data .= ' </url>'.NL;
foreach($files as $name=>$file) {
// get data from array
$freq = isset($file[0]) ? $file[0] : 'monthly';
$priority = isset($file[1]) ? $file[1] : $freqs[$freq];
// $name = key($files);
// get last modified time from system
// exec() returns the last line of output without echoing it, unlike system()
$last_mod = trim(exec(CMD_LAST_MOD.escapeshellarg($base_dir.$name)));
// some versions of date print the UTC offset as +0000, which is not supported
// the line below converts it to the Google-preferred +00:00 format
$last_mod = str_replace("+0000", "+00:00", $last_mod);
// write information to xml_data
// escape characters such as & so the URL is valid inside the XML
$xml_data .= ' <url>'.NL.' <loc>'.htmlspecialchars($base_path.$name).'</loc>'.NL;
$xml_data .= ' <lastmod>'.$last_mod.'</lastmod>'.NL;
$xml_data .= ' <changefreq>'.strtolower($freq).'</changefreq>'.NL;
$xml_data .= ' <priority>'.$priority.'</priority>'.NL;
$xml_data .= ' </url>'.NL;
}
unset($files); // free memory
// create contents of xml file
$xml_file = $file_header.$xml_data.$file_footer;
// create sitemap file at the base directory
writeLogFile($base_dir.$Sitemap, $xml_file, 0 , 1);
if ( SEND_TO_GOOGLE ) {
$msg .= "sending to google...".SEP;
$url = $base_path.$Sitemap;
$query = urlencode($url);
// notify Google of sitemap
$g_url = "http://www.google.com/webmasters/tools/ping?sitemap=$query";
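// use cURL; return the response instead of printing it
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $g_url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
if ( curl_exec($ch) !== false ) {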
$response = curl_getinfo($ch);
if ( is_array($response) ) {
$msg .= print_array($response) . SEP; // print_array is custom function
}
else {
$msg .= $response . SEP;
}
}
else {
$msg .= curl_error($ch) . SEP;
}
curl_close($ch);
}
/*******************************************************************************
array get_pages_from_db ( )
This function assumes the use of PDO MySQL with a database handler called dbh.
It also makes use of the variable query_count defined in the init.php file. If
you do not have PDO MySQL enabled on your server, please create a new function
to use the mysql_query sort of functions to achieve the same purpose.
If the query is successful, the results are stored in an array in the proper
format.
This function can be manipulated to receive field names, table name, where
clause, and order by clause in the parameter list, should this be necessary.
*******************************************************************************/
function get_pages_from_db() {
global $dbh, $query_count, $msg;
$files = array();
$q = 'SELECT * FROM pages WHERE disp = 1 ORDER BY page_file_name;';
$stmt = $dbh->prepare($q);
if ( $stmt->execute() ) {
$query_count++;
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
$num_rows = count($rows);
if ( 0 < $num_rows ) {
foreach($rows as $row) {
$files[$row['page_file_name']][0] = $row['page_change_freq'];
$files[$row['page_file_name']][1] = $row['page_priority'];
}
}
}
else {
$errors = $stmt->errorInfo();
$msg .= $q . SEP . $errors[2] . SEP;
}
$stmt = null;
return $files;
}
/*******************************************************************************
array get_pages_from_file ( string $file_name )
This function receives a file name from the caller and tries to read the file
contents into an array. If successful, it populates a different array with its
space delimited values.
*******************************************************************************/
function get_pages_from_file($file_name) {
global $freqs, $msg;
$files = array();
$priority = '';
$freq = '';
if ( is_file($file_name) && file_exists($file_name) ) {
$file = file($file_name, FILE_IGNORE_NEW_LINES);
if ( is_array($file) ) {
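foreach($file as $line) {
// reset per-line values so one entry does not leak into the next
$priority = '';
$freq = '';
$line = trim($line);
// skip blank lines
if ( $line == '' ) {
continue;
}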
// determine what has been set
$vars = explode(" ", $line);
$page = $vars[0];
if ( isset($vars[1]) ) {
if ( is_numeric($vars[1]) && $vars[1] <= 1 ) {
$priority = $vars[1];
}
else if ( array_key_exists($vars[1], $freqs) ) {
$freq = $vars[1];
}
}
if ( isset($vars[2]) ) {
if ( is_numeric($vars[2]) && $vars[2] <= 1 && $priority == '' ) {
$priority = $vars[2];
}
else if ( array_key_exists($vars[2], $freqs) && $freq == '' ) {
$freq = $vars[2];
}
}
// add newly found values to files array
$files[$page][0] = $freq != '' ? $freq : 'monthly';
$files[$page][1] = $priority != '' ? $priority : '.5';
}
unset($file);
}
else {
$msg .= "$file_name could not be read" . SEP;
} // end if file is array
}
else {
$msg .= "$file_name is not a file" . SEP;
} // end if file name exists
return $files;
} // end get_pages_from_file
?>
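The script relies on NL, SEP, $query_count, writeLogFile, and print_array, which come from files that are not shown here. If you do not have those files, the following is a minimal sketch of stand-ins. The bodies are assumptions and not the original helpers: in particular, the meaning of the third and fourth arguments to writeLogFile is a guess (treated here as an append flag and an unused flag), and both NL and SEP are assumed to be a newline.

```php
<?php
// Hypothetical stand-ins for what init.php and the inc/ files provide.
// The values and function bodies are assumptions, not the original code.
if (!defined('NL')) {
    define('NL', "\n");   // line ending used when building the XML
}
if (!defined('SEP')) {
    define('SEP', "\n");  // separator between diagnostic messages
}
$query_count = 0;         // incremented by get_pages_from_db()

if (!function_exists('writeLogFile')) {
    // Assumption: $append = 1 appends to the file, 0 overwrites it.
    // The fourth argument is accepted but ignored in this sketch.
    function writeLogFile($path, $contents, $append = 0, $flag = 1) {
        $mode = $append ? (FILE_APPEND | LOCK_EX) : LOCK_EX;
        return file_put_contents($path, $contents, $mode) !== false;
    }
}

if (!function_exists('print_array')) {
    // Return a readable dump of an array as a string.
    function print_array($array) {
        return print_r($array, true);
    }
}
```

Saving this as init.php next to cron_gen_sm.php should be enough to run the static array and text file variations; the database variation also needs a PDO handle in $dbh.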
You can upload cron_gen_sm.php to your current cron job script folder or anywhere else on your server. I prefer to send the Sitemap to Google once a week because I am not updating the links very often.
If you have questions or comments about this script, please use the live chat feature on the website here or send an email to the address on the contact page. This is not the only way to generate a Sitemap for Google, but it works well. Feel free to recommend revisions.
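One possible revision is the directory scanning that the header comment lists as a future feature. The following is a minimal sketch of how it might look, not part of cron_gen_sm.php. The function name get_pages_from_dir, its parameters, and the default extensions, frequency, and priority are all hypothetical. It only looks at the top level of the directory and returns an array in the same shape as the static $files array.

```php
<?php
// Hypothetical helper: build the $files array by scanning one directory.
// Sub-directories are not followed; only files with a listed extension are kept.
function get_pages_from_dir($dir, $extensions = array('php', 'html'), $freq = 'monthly', $priority = '.5') {
    $files = array();
    if (!is_dir($dir)) {
        return $files;
    }
    foreach (scandir($dir) as $entry) {
        $path = rtrim($dir, '/') . '/' . $entry;
        if (!is_file($path)) {
            continue; // skips ".", "..", and sub-directories
        }
        $ext = strtolower(pathinfo($entry, PATHINFO_EXTENSION));
        if (in_array($ext, $extensions, true)) {
            // same shape as the static array: name => array(changefreq, priority)
            $files[$entry] = array($freq, $priority);
        }
    }
    return $files;
}
```

With this in place, the configuration section could use $files = get_pages_from_dir($base_dir); instead of the static array. Every matching file is included, so scripts such as cron_gen_sm.php itself would need to be filtered out or kept in a different directory.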
