This page was last modified: July 28 2006 16:03:27   

The Webalizer

Webalizer is a tool for generating website access statistics from log files. The output is in HTML format for easy viewing with a browser, and it can be integrated seamlessly with the existing design of your website.

Installation:

cd /usr/ports/www/webalizer/
make install clean distclean

If you want to be able to see usage by country, you must compile webalizer with the WITH_GEOIP option:

cd /usr/ports/www/webalizer/
make WITH_GEOIP=yes install clean distclean

I have separate logfiles for each hosted website, and the following shows how I set up webalizer to analyse each of them.

Configuration

First I created a specific folder for statistics on each site:

cd /usr/local/www/domain.tld/
mkdir statistics
chown www:www statistics

Then I made a configuration file for each site:

cd /usr/local/etc/
vim webalizer_www.domain.tld.conf

This is the content of the above file:

LogFile /usr/local/www/logs/domain.tld-access.log
OutputDir /usr/local/www/domain.tld/statistics
HistoryName webalizer.hist
Incremental yes
ReportTitle Statistics for
HostName www.domain.tld
PageType htm*
PageType cgi
PageType php
HTMLPre <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
HTMLHead <META NAME="author" CONTENT="The Webalizer">
HTMLBody <BODY BGCOLOR="#E8E8E8" TEXT="#000000" LINK="#0000FF" VLINK="#FF0000">
HTMLPost <BR CLEAR="all">
HTMLTail <IMG SRC="msfree.png" ALT="100% Micro$oft free!">
HTMLEnd </BODY></HTML>
HideURL *.gif
HideURL *.GIF
HideURL *.jpg
HideURL *.JPG
HideURL *.png
HideURL *.PNG
HideURL *.ra
HideURL *.css
HideURL *.CSS
HideURL *.ico
HideURL *.ICO

I've included a short explanation of each keyword at the bottom of this page.
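
With more than a handful of sites, the per-site files can be generated instead of typed. The following is only a sketch under assumptions: the function name make_webalizer_conf and the WWW_ROOT variable are made up for illustration, and only the site-specific keywords from the file above are written out, so the HTML* and HideURL lines would still have to be appended.

```sh
#!/bin/sh
# Sketch: print a minimal Webalizer configuration for one site.
# WWW_ROOT is a hypothetical variable; it defaults to the layout
# used on this page.
WWW_ROOT=${WWW_ROOT:-/usr/local/www}

# make_webalizer_conf DOMAIN LOGFILE
make_webalizer_conf() {
    domain=$1
    logfile=$2
    cat <<EOF
LogFile $logfile
OutputDir $WWW_ROOT/$domain/statistics
HistoryName webalizer.hist
Incremental yes
ReportTitle Statistics for
HostName www.$domain
EOF
}

# Example use, one file per site:
#   make_webalizer_conf domain.tld /path/to/domain.tld-access-log \
#       > /usr/local/etc/webalizer_www.domain.tld.conf
```

The log file is passed in as an argument rather than derived from the domain, so the sketch does not depend on any particular naming scheme for the logs.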

The first run...

If your website has already been available to the world for some time, you probably have access logs that need to be read by webalizer. If you only have one big logfile, you can just put webalizer to work. But if you rotate your logs, you must initially feed webalizer with data from each log (oldest first).

If you read in multiple logfiles, remember to correct the LogFile value in the configuration file between each run. Also, if you have been using newsyslog, remove the first and last line in each log indicating when the log was turned over.

webalizer -c /usr/local/etc/webalizer_www.domain.tld.conf
Webalizer V2.01-10-glzr (FreeBSD 6.0-STABLE) English
Using logfile /usr/local/www/logs/www.nerdgirl.dk-access.log (clf)
Using default GeoIP database:
GEO-106FREE 20060501 Build 1 Copyright (c) 2006 MaxMind LLC All Rights Reserved
Creating output in /usr/local/www/app_gen_content/www.nerdgirl.dk/statistics
Hostname for reports is 'www.nerdgirl.dk'
History file not found...
Previous run data not found...
Saving current run data... [07/27/2006 21:23:34]
Generating report for July 2006
Generating summary report
Saving history information...
31322 records in 0.75 seconds
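
If there are several rotated logs to read in, the oldest-first order can be scripted. The webalizer man page lists the log file as an optional last argument on the command line, so the LogFile value does not have to be edited between runs. This is a sketch only, and it assumes a few things: newsyslog-style names where .0 is the newest rotated file and higher numbers are older, no gaps in the numbering, and uncompressed logs. The function name and the WEBALIZER variable are made up for illustration, and the sketch does not strip any 'turned over' lines from the logs.

```sh
#!/bin/sh
# Sketch: run Webalizer over rotated logs, oldest first, then the live log.
# WEBALIZER is a hypothetical override so the command can be swapped out.

# feed_rotated_logs CONFFILE LOGFILE
feed_rotated_logs() {
    conf=$1
    log=$2
    n=0
    # Count how many consecutively numbered rotated logs exist.
    while [ -f "$log.$n" ]; do
        n=$((n + 1))
    done
    # Walk back down from the highest number (the oldest log).
    while [ "$n" -gt 0 ]; do
        n=$((n - 1))
        "${WEBALIZER:-webalizer}" -c "$conf" "$log.$n" || return 1
    done
    # Finish with the current, not yet rotated, log.
    "${WEBALIZER:-webalizer}" -c "$conf" "$log"
}

# Example use:
#   feed_rotated_logs /usr/local/etc/webalizer_www.domain.tld.conf \
#       /path/to/current-access-log
```

The function stops at the first failing run, since reading a newer log after skipping an older one would leave a hole in the history.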

Now you can have a look at the newly created HTML files and graphics:

cd /usr/local/www/domain.tld/statistics
ls
ctry_usage_200604.png
ctry_usage_200605.png
ctry_usage_200606.png
ctry_usage_200607.png
daily_usage_200604.png
daily_usage_200605.png
daily_usage_200606.png
daily_usage_200607.png
hourly_usage_200604.png
hourly_usage_200605.png
hourly_usage_200606.png
hourly_usage_200607.png
index.html
usage.png
usage_200604.html
usage_200605.html
usage_200606.html
usage_200607.html
webalizer.hist

The number of files created depends, of course, on the content of the logfile.

Rotating issues...

Since I use the incremental option, I don't need to worry too much about when newsyslog rotates the logfiles, but I need to worry about how the logfiles are rotated.

If you are not rotating your logs, this is not an issue for you, except of course that your logs will grow huge over time.

I don't have that many visitors, so I rotate my logfiles every month at midnight. Webalizer, on the other hand, runs every hour at the 55th minute. This means that I lose 5 minutes of statistics every month, but that is acceptable.

The newsyslog entry for rotating each website's logfile looks like this (all in one line):

/usr/local/www/logs/domain.tld-access.log  644  7  *  $M1D0  B
/var/run/httpd.pid  30

The above rotates the logfile at the start of the first day in each month. The B flag prevents newsyslog from adding 'log file turned over' messages to the log file, since these could cause an error when webalizer reads the log.

Since Apache keeps its logfile open while running, newsyslog must gracefully restart Apache when the logfile is turned over. This is accomplished by adding the location of the httpd.pid file and the signal to send (30, which is SIGUSR1 on FreeBSD). You can read more about it here.

Automating webalizer

There is no reason to run webalizer by hand. Just put cron to work and let it do the job for you. It is entirely up to you to decide how often webalizer should update your statistics. I do it every hour at the 55th minute:

55 * * * * www webalizer -c /usr/local/etc/webalizer_www.domain.tld.conf
>/dev/null 2>>/usr/local/www/logs/webalizer-error_log

The above is all in one line when editing crontab.

The reason for choosing the 55th minute of every hour is that webalizer will then run a few minutes prior to the rotation of the logfiles, which happens once a month.

Any errors are written to webalizer-error_log, which is a file I have created for this purpose.

Tip for handling multiple domains:
Another way of dealing with Webalizer and multiple domains is to make a shell script that loops through all your *.conf files. That way, you only need one line in your crontab.

Create your shell script in the same directory where your *.conf files are located, with the following content:

#!/bin/sh
for i in /usr/local/etc/webalizer*.conf; do
    webalizer -c "$i"
done

Now, put this script into crontab, and in the future when adding a new domain, just create the *.conf file and you're done.
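
A slightly more defensive variant of the loop could look like the sketch below. It is not the script I run myself: the function name and the CONF_DIR and WEBALIZER variables are made up for illustration. It skips the loop body when no configuration file matches, keeps going when one site fails, and returns a non-zero status so that cron has something to report.

```sh
#!/bin/sh
# Sketch: run Webalizer once per configuration file and report failures.
# CONF_DIR and WEBALIZER are hypothetical overrides, not Webalizer settings.

run_all_webalizer() {
    conf_dir=${CONF_DIR:-/usr/local/etc}
    failed=0
    for conf in "$conf_dir"/webalizer*.conf; do
        # If nothing matches, the pattern is left unexpanded; skip it.
        [ -f "$conf" ] || continue
        if ! "${WEBALIZER:-webalizer}" -c "$conf" >/dev/null; then
            echo "webalizer failed for $conf" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Example use, as the last line of the script called from crontab:
#   run_all_webalizer
```

Normal output is discarded and only the failure messages go to standard error, which matches the redirection used in the crontab line earlier on this page.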

In short...

Here is a short summary of my setup:

  1. Each website on the server has its own folder for statistics and its own webalizer configuration file.
  2. Each logfile is rotated 12 times a year, on the first day of each month.
  3. Webalizer runs every hour at the 55th minute.

Statistics for www.nerdgirl.dk

Keywords explained...

LogFile
This is the logfile from which Webalizer should extract data.

OutputDir
This is where Webalizer will place generated graphics and HTML files.

HistoryName
Name of the file where history for each month is saved. This file is placed in the specified OutputDir.

Incremental
This option will allow you to generate statistics from multiple logfiles. This is an advantage if your logfiles are rotated in between the executions of webalizer. If you rotate your logfiles immediately after each execution of webalizer, this option should be set to 'no'.

ReportTitle
This is the title which is written at the top of each HTML report.

HostName
The HostName will be appended to the ReportTitle, and is also used when generating links to the files in the URL tables of the statistics.

PageType
This will tell Webalizer which files to consider as a 'page'.

HTMLPre
This is the line (or lines) to insert at the top of every page in the statistics.

HTMLHead
This will be inserted in the HTML head section of each page in the statistics.

HTMLBody
Here you can define the HTML <body> tag of each page.

HTMLPost
On each page in the statistics, a <hr> tag is inserted to separate the headline from the actual content. Anything defined here will be inserted immediately before this <hr> tag.

HTMLTail
Here you can specify HTML code to be inserted at the bottom of each page.

HTMLEnd
Anything specified here will be inserted immediately after any content in HTMLTail.

HideURL
The HideURL keyword will prevent the specified files from being displayed in the 'Top' tables (tables with lists of the most visited files). They will still be counted in the main totals.

The above is just a small selection of configuration options. If you want to see what else you can do, take a look at /usr/local/share/doc/webalizer/README and man webalizer.