2007
03.16

Custom Apache Logging.

Apache’s HTTPd is an awesome app, nobody is going to deny that! At U-MYX, I needed a way to log the external URLs that were accessing our mix streaming player (e.g. someone just played mixID 1234 from myspace.com/someband). This is easily accomplished with a bit of Apache config and some clever use of grep and cut on Linux.

First off, I needed to create a custom log file which was just going to store the bare essentials required for this task. That would be:

  • The requested URL (which will contain the mixID)
  • The referrer (e.g. myspace.com/someband)
  • A timestamp (because it’s always useful to know)

Now, you could argue that all this information will be stored in the main /log/access_log anyway – that’s true. But there is extra overhead in picking out just the streaming entries and skipping over everything else that Apache logs by default.

So, to set up this custom “logs/streaming” log, we need to add the following to our httpd.conf:

# The "style" the log file will be in (timestamp, "requested file", "referer")
LogFormat "%t \"%r\" \"%{Referer}i\"" streaming

# Use mod_setenvif (SetEnvIf) to flag requests for "/_stream/l.swf" in the Request_URI
SetEnvIf Request_URI "^/_stream/l[.swf].+$" streamlog

# Create a new log using the "streaming" format and only logging entries
# that match the "streamlog" SetEnvIf condition above.
CustomLog logs/streaming_log streaming env=streamlog

The comments in the code give you an indication of what needs to be done. The SetEnvIf rule sets the streamlog variable, using a regular expression to catch just the requests for “/_stream/l.swf”. If you wanted this to do the opposite, and log everything BUT certain requests, you can throw a ! into your “CustomLog env=” clause (e.g. env=!streamlog would log everything BUT l.swf requests).
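If you want to see what that regular expression really accepts before touching the server, a quick check from the shell helps. This is only a sketch: it uses grep -E as a stand-in for Apache's regex engine (the two agree on patterns this simple, but that is an assumption), the matches helper is made up for the illustration, and the "strict" pattern is a hypothetical tighter alternative that I have not confirmed against Apache itself.

```shell
#!/bin/bash
# Hypothetical helper: prints "yes" if the path ($2) matches the regex ($1).
matches() {
    if printf '%s\n' "$2" | grep -Eq -- "$1"; then echo yes; else echo no; fi
}

loose='^/_stream/l[.swf].+$'   # the pattern used in the config above
strict='^/_stream/l\.swf$'     # assumed alternative with an escaped dot

matches "$loose"  '/_stream/l.swf'   # yes: "[.swf]" matches the ".", ".+" matches "swf"
matches "$loose"  '/_stream/lwxyz'   # yes: "[.swf]" matches "w", so this slips through
matches "$strict" '/_stream/l.swf'   # yes
matches "$strict" '/_stream/lwxyz'   # no
```

The square brackets form a character class, so the original pattern accepts any one of ".", "s", "w" or "f" after the "l", followed by at least one more character. It still catches l.swf, which is all that matters here, but it is looser than it looks.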

When you restart Apache, your custom log file should start filling up. Good work, but the raw data is not exactly usable, so what next? That’s where a cron’d shell script comes in handy.

#!/bin/bash
# Streaming referrer log rotator
# Processes the streaming referrer log, emails it to us and then truncates the log file

# Create the Table Header.
echo "Freq. fileID Referrer" > /tmp/streaming_refferer.tmp
echo "===== ====== ========" >> /tmp/streaming_refferer.tmp

# Parse the streaming log: keep the request path and referrer, then count hits per pair
cat /var/log/httpd/streaming_log | cut -d ' ' -f 4,6 | fgrep -v \"-\" | tr -d \" | sed 's/\/_stream\/l.swf?m=//g' | sort -n | uniq -i -c | sort -nr  >> /tmp/streaming_refferer.tmp

# Transform the temp file into a pretty table layout
cat /tmp/streaming_refferer.tmp | column -t > /tmp/streaming_refferer.log

# Email it.
cat /tmp/streaming_refferer.log | mail -s "Referrer Logs"  you@youremail.com

# Tidy Up the temp file
rm -f /tmp/streaming_refferer.tmp

# Truncate the streaming log so it's fresh for tomorrow.
> /var/log/httpd/streaming_log

(Please note that I have attached this script to this post, as WordPress has no doubt screwed it up somewhere along the way.)

The comments in the above script explain what’s going on. For this to be useful you would add it to your crontab so that it runs at 1am each morning. I am only interested in daily reports of streaming activity, so once the results have been emailed I truncate the log file. Before you do that, you could of course process the individual entries and store them to get a “life record” for each file, etc.
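To see what the parsing step does without waiting for real traffic, here is a rough sketch that runs the same chain of tools over a few made-up log lines. The sample entries, the temp file and the summarise function are all invented for the illustration. It also takes a few liberties: grep -F stands in for fgrep, the sed pattern escapes the dot, and the case-insensitive flag on uniq is left out.

```shell
#!/bin/bash
# Build a small sample log in the "streaming" format (entries are made up).
log=$(mktemp)
cat > "$log" <<'EOF'
[16/Mar/2007:01:00:00 +0000] "GET /_stream/l.swf?m=1234 HTTP/1.1" "http://myspace.com/someband"
[16/Mar/2007:01:05:00 +0000] "GET /_stream/l.swf?m=1234 HTTP/1.1" "http://myspace.com/someband"
[16/Mar/2007:01:06:00 +0000] "GET /_stream/l.swf?m=5678 HTTP/1.1" "-"
[16/Mar/2007:01:07:00 +0000] "GET /_stream/l.swf?m=42 HTTP/1.1" "http://example.com/page"
EOF

# Hypothetical helper: the same steps as the long pipeline, one per line.
summarise() {
    cut -d ' ' -f 4,6 "$1" |             # keep the request path and the referrer
        grep -Fv '"-"' |                  # drop hits that arrived with no referrer
        tr -d '"' |                       # strip the surrounding quotes
        sed 's|^/_stream/l\.swf?m=||' |   # leave just the mix ID
        sort | uniq -c | sort -nr         # count identical pairs, busiest first
}

report=$(summarise "$log")
echo "$report"
rm -f "$log"
```

With the sample above, the mix 1234 / myspace.com pair is counted twice and comes out on top, and the entry with the "-" referrer is dropped. To schedule the real script for 1am, a crontab entry along the lines of "0 1 * * * /path/to/your/script.sh" would do (the path is a placeholder).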

Hope this has helped
Jonny.

5 comments so far

  1. Ahh, the joys of Apache configs…

    I’ve been (slowly) tidying up my vhost configs. I might even get around to setting up separate logs for each domain. That would make webstats a bit more useful…

  2. ^/_stream/l[.swf].+$

    That regex doesn’t look right. Square brackets making a character class out of “.swf”? Plus I don’t think you really need the “.+$” either.

  3. Hey Oktal. No, I am the first to agree that my regex is pretty basic ;) You’re right about the .+$; if I remove the $ it will only be matching the start of the string. The [.swf] bit came out of frustration – I originally tried ^/_stream/l\.swf but it refused to “escape” the full stop, causing the regex to not match. Encapsulating the . (and the swf for that matter) in square brackets did the trick and I had already spent too long on it by then :)

  4. Ahhh, regex…

    ..and split up that nasty long line so it’s readable.

  5. :)

    Thanks for reminding me of xkcd… myspace youtube

    Also: superfluous ‘cat’ statements! Respect for that mad long line… cut|grep|tr|sed|sort|uniq|sort… must be possible to simplify it somehow but then it is a quick hack so it’s all good.

    Am I a nerd if I laugh at this?