Sample Code

Generate Sitemap for Google and other Search Engines using PHP

Google provides a Sitemap generator written in Python. Python is a great language, but I prefer PHP, so I took the basic idea of Google's generator and rewrote it to better suit my application's needs. If you are looking for an easy way to create a Sitemap to submit to Google and other search engines, you have come to the right place. I call this script cron_gen_sm.php.
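
A Sitemap is just an XML file with one <url> element per page. The sketch below is not part of cron_gen_sm.php. It shows how a single entry could be built, and the helper names sitemap_url_entry() and sitemap_document() are hypothetical. The sketch entity-escapes the URL, because the Sitemap protocol requires characters such as & inside <loc> to be escaped. It also assumes the sitemaps.org 0.9 namespace rather than the older Google 0.84 namespace that cron_gen_sm.php writes.

```php
<?php
// Hypothetical helper (not part of cron_gen_sm.php): build one <url> entry.
// $loc must be an absolute URL; $lastmod is an optional W3C datetime string.
function sitemap_url_entry($loc, $changefreq, $priority, $lastmod = null)
{
    $xml  = "  <url>\n";
    // Entity-escape the URL so &, <, >, ' and " never appear raw in the XML
    $xml .= '    <loc>' . htmlspecialchars($loc, ENT_QUOTES | ENT_XML1, 'UTF-8') . "</loc>\n";
    if ($lastmod !== null) {
        $xml .= '    <lastmod>' . $lastmod . "</lastmod>\n";
    }
    // Sitemap change frequencies are lowercase (always, hourly, daily, ...)
    $xml .= '    <changefreq>' . strtolower($changefreq) . "</changefreq>\n";
    $xml .= '    <priority>' . $priority . "</priority>\n";
    $xml .= "  </url>\n";
    return $xml;
}

// Hypothetical helper: wrap the entries in a <urlset> (sitemaps.org 0.9 namespace)
function sitemap_document(array $entries)
{
    return '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
        . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n"
        . implode('', $entries)
        . "</urlset>\n";
}

// Usage: the & in the second URL is written as &amp; in the output
echo sitemap_document(array(
    sitemap_url_entry('http://www.example.com/', 'daily', '1.0'),
    sitemap_url_entry('http://www.example.com/about.php?a=1&b=2', 'monthly', '.33'),
));
```

cron_gen_sm.php builds the same structure with string concatenation. The sketch only adds the escaping step and a configurable namespace.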

You can download cron_gen_sm.php here. Also, you can view partial code below.

<?php
/*******************************************************************************
cron_gen_sm.php

===========================================
Created on August 17, 2008 
by Mike Rodarte of Mike's Technical Service
http://www.mts7.com
===========================================

This file is a PHP version of the Python Google Sitemap Generator.  It is set to 
use currently existing files on the server to help create the Sitemap and send 
it to Google.  

=======================
Configuration variables
-----------------------
    base_dir - similar to DOCUMENT_ROOT:  /home/user/www/
    base_path - like http://www.example.com/ -- notice the trailing slash
    Sitemap - file name for the Sitemap
    SEND_TO_GOOGLE - 1 for yes, 0 for no
    
Also available below are three variations of generating your Sitemap.  You can 
pull the necessary data from a database, extract the data from a text file, or 
manually create a static array with the necessary values.  Future versions will 
have directory scanning functionality.  

=================
File dependencies
-----------------
    init.php - includes files required for this script to execute
             - defines NL, SEP, and $query_count
    inc/funcs.php - has writeLogFile
    inc/db_vars.php - creates database handler $dbh
    inc/display_funcs.php - print_array

Any of the above mentioned variables, constants, and/or functions are available 
upon request.

====================
Database - PDO MySQL
--------------------
This project uses PDO (PHP Data Objects) for its database functions.  PDO is 
only required if you want to read the page list from a database; in that case, 
please make the necessary adjustments so this script works with your current 
database.  Future versions may add an alternative get_pages_from_db function 
that uses the standard mysql_query type functions.

The use of this code is permissible as long as this comment block is included in 
its entirety in the source code file.  Please give credit to whom credit is due.
*******************************************************************************/

// init.php defines NL and SEP and loads writeLogFile() and print_array()
// (see "File dependencies" above); the script cannot run without them
require_once("init.php");

// configure the four options below for your server configuration
$base_dir = '/home/username/public_html/';
$base_path = 'http://www.example.com/';
$Sitemap = "sitemap.xml";

// set to 1 for yes and 0 for no
define("SEND_TO_GOOGLE", 1);

// choose ONE of the three page sources below

// 1) get page file names from a database
// uncomment the line below to use a database to retrieve page file names
// $files = get_pages_from_db();

// 2) get page file names from a text file
// uncomment the line below to read page file names from a text file;
// each line holds a file name, optionally followed by a change frequency
// and/or a priority, separated by spaces, e.g.: about.php monthly .33
//$files = get_pages_from_file($base_dir.'page_file_names.txt');

// 3) create static array of page file names to include in the Sitemap
// format: 'file name' => array('change frequency', 'priority')
// comment out these lines if you use one of the sources above
$files = array(
    'index.php' => array('yearly', '1.0'),
    'about.php' => array('monthly', '.33')
);

////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
///////////////////////// DO NOT EDIT BELOW THIS LINE //////////////////////////
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////

$msg = '';     // diagnostic message

// command to execute to get the last modified date
define("CMD_LAST_MOD", "date --iso-8601=seconds -u -r ");

// general file header for use with Google type Sitemaps
$file_header =
'<?xml version="1.0" encoding="UTF-8"?>
<urlset
  xmlns="http://www.google.com/schemas/sitemap/0.84"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.google.com/schemas/sitemap/0.84
                      http://www.google.com/schemas/sitemap/0.84/sitemap.xsd">'
.NL;
// general file footer for use with the Sitemap
$file_footer = '</urlset>';

// create array of frequencies with associated priorities
$freqs = array(
    'always'  => '1',
    'hourly'  => '.75',
    'daily'   => '.67',
    'weekly'  => '.5',
    'monthly' => '.33',
    'yearly'  => '.1',
    'never'   => '0'
);

// include root directory in sitemap with highest priority
$xml_data = ' <url>'.NL.'  <loc>'.htmlspecialchars($base_path, ENT_QUOTES).'</loc>'.NL;
$xml_data .= '  <changefreq>daily</changefreq>'.NL;
$xml_data .= '  <priority>1.0</priority>'.NL;
$xml_data .= ' </url>'.NL;

foreach ($files as $name => $file) {
    // get data from array; fall back to the $freqs table for the priority
    $freq = isset($file[0]) ? strtolower($file[0]) : 'monthly';
    $priority = isset($file[1]) ? $file[1] : (isset($freqs[$freq]) ? $freqs[$freq] : '.5');

    // get last modified time from system
    // shell_exec() returns the command output (system() would also print it),
    // and escapeshellarg() keeps spaces or shell characters in the path safe
    $last_mod = trim((string) shell_exec(CMD_LAST_MOD.escapeshellarg($base_dir.$name)));

    // the CMD_LAST_MOD returns a value in a non-supported format
    // the line below converts it to the Google-preferred format
    $last_mod = str_replace("+0000", "+00:00", $last_mod);

    // write information to xml_data (the URL is escaped for XML)
    $xml_data .= ' <url>'.NL.'  <loc>'.htmlspecialchars($base_path.$name, ENT_QUOTES).'</loc>'.NL;
    // skip <lastmod> if the date command failed, rather than writing an empty tag
    if ( $last_mod !== '' ) {
        $xml_data .= '  <lastmod>'.$last_mod.'</lastmod>'.NL;
    }
    $xml_data .= '  <changefreq>'.$freq.'</changefreq>'.NL;
    $xml_data .= '  <priority>'.$priority.'</priority>'.NL;
    $xml_data .= ' </url>'.NL;
}

unset($files); // free memory

// create contents of xml file
$xml_file = $file_header.$xml_data.$file_footer;

// create sitemap file at the base directory
writeLogFile($base_dir.$Sitemap, $xml_file, 1);

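if ( SEND_TO_GOOGLE ) {
    $msg .= "sending to google...".SEP;
    // build the public URL of the Sitemap from the configured file name
    $url = $base_path.$Sitemap;
    $query = urlencode($url);

    // notify Google of sitemap
    $g_url = "http://www.google.com/webmasters/tools/ping?sitemap=$query";

    // use cURL
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $g_url);
    // return the response instead of printing it to the cron output
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // an empty response body is '' (falsy), so compare against false explicitly
    if ( curl_exec($ch) !== false ) {
        $response = curl_getinfo($ch);
        if ( is_array($response) ) {
            $msg .= print_array($response) . SEP; // print_array is custom function
        }
        else {
            $msg .= $response . SEP;
        }
    }
    else {
        $msg .= curl_error($ch) . SEP;
    }
    curl_close($ch);
}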

/*******************************************************************************
array get_pages_from_db (  ) 

This function assumes the use of PDO MySQL with a database handler called dbh. 
It also makes use of the variable query_count defined in the init.php file.  If 
you do not have PDO MySQL enabled on your server, please create a new function 
to use the mysql_query sort of functions to achieve the same purpose.

If the query is successful, the results are stored in an array in the proper 
format.

This function can be manipulated to receive field names, table name, where 
clause, and order by clause in the parameter list, should this be necessary.
*******************************************************************************/
function get_pages_from_db() {
    global $dbh, $query_count, $msg;

    $files = array();

    $q = 'SELECT * FROM pages WHERE disp = 1 ORDER BY page_file_name;';

    $stmt = $dbh->prepare($q);
    if ( $stmt->execute() ) {
        $query_count++;
        $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
        $num_rows = count($rows);
        if ( $num_rows ) {
            foreach ($rows as $row) {
                $files[$row['page_file_name']][0] = $row['page_change_freq'];
                $files[$row['page_file_name']][1] = $row['page_priority'];
            }
        }
    }
    else {
        $errors = $stmt->errorInfo();
        $msg .= $q . SEP . $errors[2] . SEP;
    }

    $stmt = null;

    return $files;
}

/*******************************************************************************
array get_pages_from_file ( string $file_name ) 

This function receives a file name from the caller and tries to read the file 
contents into an array.  If successful, it populates a different array with its 
space delimited values.
*******************************************************************************/
function get_pages_from_file($file_name) {
    global $msg;

    // valid change frequencies (the same keys as $freqs above); kept local
    // because $freqs is not yet defined when this function is called
    $valid_freqs = array('always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never');
    $files = array();

    if ( is_file($file_name) ) {
        $file = file($file_name, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
        if ( is_array($file) ) {
            foreach ($file as $line) {
                $line = trim($line);
                if ( $line === '' ) {
                    continue; // skip blank lines
                }

                // reset per line so values do not carry over from the previous page
                $priority = '';
                $freq = '';

                // determine what has been set: page [freq|priority] [freq|priority]
                $vars = preg_split('/\s+/', $line);
                $page = $vars[0];
                if ( isset($vars[1]) ) {
                    if ( is_numeric($vars[1]) && $vars[1] >= 0 && $vars[1] <= 1 ) {
                        $priority = $vars[1];
                    }
                    else if ( in_array(strtolower($vars[1]), $valid_freqs) ) {
                        $freq = strtolower($vars[1]);
                    }
                }
                if ( isset($vars[2]) ) {
                    if ( is_numeric($vars[2]) && $vars[2] >= 0 && $vars[2] <= 1 && $priority === '' ) {
                        $priority = $vars[2];
                    }
                    else if ( in_array(strtolower($vars[2]), $valid_freqs) && $freq === '' ) {
                        $freq = strtolower($vars[2]);
                    }
                }

                // add newly found values to files array
                $files[$page][0] = $freq !== '' ? $freq : 'monthly';
                $files[$page][1] = $priority !== '' ? $priority : '.5';
            }
            unset($file);
        }
        else {
            $msg .= "$file_name could not be read" . SEP;
        } // end if file is array
    }
    else {
        $msg .= "$file_name is not a file" . SEP;
    } // end if file name exists

    return $files;
} // end get_pages_from_file
?>
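
The script calls the external command date --iso-8601=seconds -u -r and then patches the +0000 offset. PHP can produce the same kind of W3C datetime directly from the file's modification time, with no shell command. The following is a minimal sketch, assuming the cron user can stat the page files. The function name last_mod_w3c() is hypothetical and not part of cron_gen_sm.php.

```php
<?php
// Hypothetical alternative to the CMD_LAST_MOD shell call (not in the original).
// Returns the file's modification time as a UTC W3C datetime string,
// or null if the modification time cannot be read.
function last_mod_w3c($path)
{
    clearstatcache(true, $path);     // make sure we read a fresh mtime
    $mtime = @filemtime($path);      // Unix timestamp, or false on failure
    if ($mtime === false) {
        return null;
    }
    // gmdate() formats in UTC and DATE_W3C is 'Y-m-d\TH:i:sP',
    // so the offset is already printed as +00:00 (no str_replace needed)
    return gmdate(DATE_W3C, $mtime);
}

// Usage: inside the loop one could write $last_mod = last_mod_w3c($base_dir.$name);
$tmp = tempnam(sys_get_temp_dir(), 'sm');
touch($tmp, 1218981909);            // fixed mtime so the output is predictable
echo last_mod_w3c($tmp), "\n";      // prints 2008-08-17T14:05:09+00:00
unlink($tmp);
```

This also avoids depending on the installed version of GNU date, whose --iso-8601 output format is the reason the original needs the +0000 fix-up.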

You can upload cron_gen_sm.php to your cron job script folder or anywhere else on your server. I send the Sitemap to Google once a week because I do not update the links very often.
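
If cron_gen_sm.php ends up somewhere under public_html, a visitor could also trigger it from a browser. One option is to stop early unless PHP is running under its command-line SAPI. This assumes cron starts the script with the PHP CLI binary, for example with a crontab entry such as 0 3 * * 0 php /home/username/cron/cron_gen_sm.php, which runs weekly on Sundays at 3:00. That path is hypothetical, and some hosts run a CGI build of PHP from cron instead. is_cli_run() and require_cli() are hypothetical helpers, not part of the original script.

```php
<?php
// Hypothetical guard for the top of cron_gen_sm.php (not in the original).
// PHP_SAPI is 'cli' when run as "php script.php" (e.g. from cron) and
// something else (apache2handler, fpm-fcgi, cgi-fcgi, ...) under a web server.
function is_cli_run()
{
    return PHP_SAPI === 'cli';
}

function require_cli()
{
    if (!is_cli_run()) {
        // Refuse web requests: send 403 and stop before any work is done
        header('HTTP/1.1 403 Forbidden');
        exit('This script can only be run from the command line.');
    }
}

require_cli();
echo "Running from the command line; generating the Sitemap...\n";
```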

If you have questions or comments about this script, please use the live chat feature on the website here or send an email to the address on the contact page. This is not the only way to generate a Sitemap for Google, but it works well. Feel free to recommend revisions.


Send Email for HTTP Error Codes 401, 403, 404, 500 using Perl CGI-bin

Have you ever wondered how long one of your websites was down because you didn't check your error_log file? There is an easy way to be alerted when your website returns an error, such as 404 Not Found or 500 Internal Server Error. I made a simple CGI script in Perl that emails the webmaster the page that caused the error, the error code, and the visitor's IP address. I call it sendmail.pl.

You can download sendmail.pl here, or copy the code below and paste it into your own file.

#!/usr/bin/perl -w
use strict;
use CGI qw/:standard/;

# sendEmail function
# used from code found on http://perl.about.com/od/email/a/perlemailsub.htm
# modified on August 8, 2008 by Mike Rodarte
# http://www.mts7.com
#
# This function is used to send email to an administrator of the given 
# website.  

################################################################################
# configure the options below                                                  #
#                                                                              #
my $error_page = "/error.html";                                                #
my $admin_email = "administrator\@yourdomain.com";                             #
#                                                                              #
# the from email address is set to serveradmin@servername.com                  #
# the subject is set to Error on site www.servername.com                       #
#                                                                              #
# do not configure code below this line                                        #
# unless you know what you're doing                                            #
################################################################################

# send the email with the current http status and the admin email address
sendEmail($ENV{REDIRECT_STATUS}, $admin_email);

# redirect the user to the supplied error page
print redirect('http://'.$ENV{SERVER_NAME}.$error_page);

sub sendEmail
{
    # get parameters from function
    my ($error, $to_email) = @_;
    
    # these variables are used in the message
    my $server = $ENV{SERVER_NAME};
    my $uri = $ENV{REQUEST_URI};
    my $path = "http://".$server.$uri;
    my $ip = $ENV{REMOTE_ADDR};
    
    # find the root domain name of the server (no subdomains like www)
    my $last_dot = rindex($server, ".");
    my $previous_dot = rindex($server, ".", $last_dot-1);
    my $small_string = substr($server, $previous_dot + 1);

    # message to send
    my $message = "There was an error ".$error." when ".$ip.
    " tried to access ".$path;
    
    # to email from configured option above
    my $to = $to_email;
    
    # you can change this email address to whatever you want
    # from email is serveradmin @ server root domain name
    my $from = "serveradmin\@".$small_string;
    
    # subject for email
    my $subject = "Error on Site ".$server;

    # path to sendmail
    my $sendmail = '/usr/lib/sendmail';

################################################################################
# the code below sends the email                                               #
# do not edit unless you know what you're doing                                #
################################################################################    
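    # send the email; if sendmail cannot be started, skip it quietly
    # so the visitor is still redirected to the error page
    open(MAIL, "|$sendmail -oi -t") or return;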
    print MAIL "From: $from\n";
    print MAIL "To: $to\n";
    print MAIL "Subject: $subject\n\n";
    print MAIL "$message\n";
    close(MAIL);
}

Upload your copy of sendmail.pl to your cgi-bin directory, then make it executable using FTP, SSH, or a similar tool. For example, in SSH you can type chmod +x sendmail.pl. The script sends an email and then redirects the visitor to the error document you specify. The default is /error.html.

Next, edit the .htaccess file in your HTTP root directory (public_html), or create it if it does not exist. You can download a sample .htaccess here. When saving it, choose "All Files" as the file type so your editor does not add a .txt extension. Add the following lines to .htaccess:

ErrorDocument 401 /cgi-bin/sendmail.pl
ErrorDocument 403 /cgi-bin/sendmail.pl
ErrorDocument 404 /cgi-bin/sendmail.pl
ErrorDocument 500 /cgi-bin/sendmail.pl

You can use whatever error codes you choose, but these are what I am using in this example. Using these codes, all Unauthorized, Forbidden, Not Found, and Internal Server Error requests will execute the script.

Make sure your error.html (or equivalent) document exists on your server, because the script redirects to it after sending the email. You can configure both the error document's name and the webmaster's email address in the Perl script. To see my error.html, click here.

To review, you should now have three files on your web server: cgi-bin/sendmail.pl, .htaccess, and error.html. Together they alert the webmaster and keep visitors from seeing the default error documents.


If you have questions or comments about this script, please use the live chat feature on the website here or send an email to the address on the contact page. This is not the only way to alert the webmaster, but it works well. Feel free to recommend revisions.