Easy Apache log statistics using Visitors

There are many Apache log analyzers to choose from, but most of them are tedious or confusing to install. I wanted a simple log analyzer that just does its work from a cronjob. Visitors seems to fit the bill!
We’ll also use ip2host to resolve the IP addresses into domain names.
All of this will be run daily by a cronjob.

Screenshot of a report generated by Visitors

Requirements

Here’s what you need:

  • Visitors: homepage
  • ip2host: (I couldn’t find the homepage) it can be downloaded from here
  • cron, apache logs, … all the obvious!

Instructions

First, we need to create a folder to store the ip2host DNS cache file in.

sudo mkdir /var/cache/ip2host/

Then create a new file /etc/cron.daily/visitors, make it executable (chmod +x, otherwise cron will skip it), and put your own variant of the following code in it:

#!/bin/bash

MYIP="99.99.99.99" # i want to exclude my home ip from the logs
SERVERIP="222.222.222.222" # my server's ip
REPORTDIR="/var/www/webstats" # folder where to store reports, this folder must exist
ALOGDIR="/var/log/apache2" # folder containing the logs
VISITORS="/usr/bin/visitors -A --exclude wp-cron.php --exclude robots.txt" # i exclude some files from the reports
IP2H="ip2host --cache=/var/cache/ip2host/cache.db"
GREPOPTIONS="-hv -e ^$MYIP -e ^$SERVERIP" # exclude my home ip and my server's ip from the logs

# we create a tmp file that will hold the logs
TMPFILE=$(mktemp)
if [[ ! -f "$TMPFILE" ]]; then
  echo "tmpfile doesn't exist."
  exit 1
fi

# if you only have one site, or you want all the logs in a single report
# get all the logs into the tmpfile, notice the GREPOPTIONS variable
/bin/grep $GREPOPTIONS $ALOGDIR/access*.log{.1,} 2>/dev/null > "$TMPFILE"
# resolve all the IPs and generate the report; "--trails --prefix http://www.domain.com" is optional, it's only needed for generating trails stats
$IP2H < "$TMPFILE" | $VISITORS --trails --prefix http://www.domain.com - > "$REPORTDIR/stats.html"

# -OR-
# if you have multiple vhosts/prefixes and want separate reports, you can use this:
# replace all the "www prefix1 prefix2 prefix3" by your own prefixes (as in http://PREFIX.domain.com)
for name in www prefix1 prefix2 prefix3; do
  /bin/grep $GREPOPTIONS $ALOGDIR/access-$name.log{.1,} 2>/dev/null > "$TMPFILE"
  $IP2H < "$TMPFILE" | $VISITORS --trails --prefix http://$name.domain.com - > "$REPORTDIR/stats-$name.html"
done

rm -f $TMPFILE
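
If you want to check what the GREPOPTIONS filter lets through before pointing it at real logs, here is a small self-contained sketch. The three log lines and the 10.0.0.5 address are made up for illustration; MYIP and SERVERIP are the placeholder values from the script above.

```shell
#!/bin/bash
MYIP="99.99.99.99"
SERVERIP="222.222.222.222"
GREPOPTIONS="-hv -e ^$MYIP -e ^$SERVERIP"

# made-up sample log: one line per client
SAMPLE=$(mktemp)
cat > "$SAMPLE" <<'EOF'
99.99.99.99 - - [10/Oct/2008:13:55:36 +0200] "GET / HTTP/1.1" 200 2326
10.0.0.5 - - [10/Oct/2008:13:56:01 +0200] "GET /about HTTP/1.1" 200 512
222.222.222.222 - - [10/Oct/2008:13:57:12 +0200] "GET /wp-cron.php HTTP/1.0" 200 0
EOF

# $GREPOPTIONS is left unquoted on purpose so it splits into separate options:
# -h hides file names, -v inverts the match, each -e adds one pattern
FILTERED=$(grep $GREPOPTIONS "$SAMPLE")
echo "$FILTERED"   # only the 10.0.0.5 line is left

rm -f "$SAMPLE"
```

Keep in mind that the patterns are prefix matches: with MYIP="1.2.3.4", a visitor coming from 1.2.3.45 would be dropped too. Adding a trailing space to the pattern would make it stricter.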

If you use logrotate or another tool to rotate your log files, this cron job reads the two most recent files (access*.log and access*.log.1). With weekly rotation, this means the report covers the current week and the previous one together, and it gets refreshed every day.
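
The access*.log{.1,} pattern relies on bash brace expansion, which can look cryptic at first. A quick sketch in a scratch directory shows which files it picks up (the file names here are only examples of what logrotate typically leaves behind):

```shell
#!/bin/bash
DEMO=$(mktemp -d)
touch "$DEMO/access.log" "$DEMO/access.log.1" "$DEMO/access.log.2.gz"

# bash first expands the braces into two patterns, access*.log.1 and access*.log,
# and then globs each of them against the directory
MATCHED=$(cd "$DEMO" && echo access*.log{.1,})
echo "$MATCHED"   # prints: access.log.1 access.log

rm -rf "$DEMO"
```

The older file comes first, so the log entries reach Visitors in chronological order, and older compressed rotations such as access.log.2.gz are ignored.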

The first run might take some time as the ip2host cache needs to be built, but subsequent runs are very quick.

If you point REPORTDIR at a folder served by Apache, the reports become reachable from the internet, e.g. at http://www.domain.com/webstats. Note that you might want to secure this folder, but this is left as an exercise! (hint: htpasswd!)
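
If you'd rather skip the exercise, here is one possible approach, as a rough sketch only. The password file path /etc/apache2/webstats.htpasswd and the user name "stats" are assumptions picked for this example, and it only works if your Apache configuration allows .htaccess files to set authentication for that folder (AllowOverride AuthConfig).

```shell
# create the password file with a single user; -c creates the file,
# and htpasswd will prompt for the password
# (path and user name are hypothetical, pick your own)
sudo htpasswd -c /etc/apache2/webstats.htpasswd stats

# ask for that login on everything under the report folder
sudo tee /var/www/webstats/.htaccess > /dev/null <<'EOF'
AuthType Basic
AuthName "Web statistics"
AuthUserFile /etc/apache2/webstats.htpasswd
Require valid-user
EOF
```

Keeping the password file outside of /var/www means it can't be downloaded through the web server.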
