###################################################################
# logsplitter v0.2.2 (c)2006-2007 Tim Jackson (tim@timj.co.uk)    #
###################################################################

INTRODUCTION

This is a PHP-based splitter for Squid and Pound logs. It's intended
primarily for situations where Squid is being used as a reverse proxy
(HTTP accelerator) or where Pound is being used as a load balancer.

It's pretty simple really; it takes the normal Squid/Pound access log as
an input and then splits it into a log file that looks like an Apache
"combined" access log, with the addition of Squid hit/miss information on
the end in the case of Squid logs (because this is useful for other
purposes, like producing hit/miss statistics).

For security purposes and so that related logs can be grouped, the list of
valid hostnames which you are expecting to see in the access log must be
defined using a "hostname list". This is a newline-separated text file
with a list of hostnames. If you wish to group multiple hostnames together
into a single output file (for example to group www.example.com and
subdomain.example.com into one output file) you can space-separate
multiple hostnames on a line. You can also use simple wildcards by using
asterisks. Example:

===== example hostname list file
www.example.com
www.example.net
www.example.org *.example.org
*.otherdomain.example.com
=====

It is anticipated that this file will normally be automatically generated
by some external tool.

INSTALLATION

There are only two files to install really:

- logsplitter (which is the file you run from the command line)
- Text/LogSplitter.php (main logic in a class)

Out of the box, it will run from the directory you extracted it into.

You will need to install the PEAR module Console_Getopt in order for the
main logsplitter CLI program to work.

CONFIGURATION

There is a simple configuration file (logsplitter.ini) which simply
defines the input and output file locations.
A commented example config (logsplitter.ini.sample) is provided. By
default, logsplitter reads from the configuration file "logsplitter.ini".

USAGE

Simply run "logsplitter" on the command line. It will read the defined
config file and process the defined access log file, outputting as it
goes.

To get some summary statistics, use the "-v" (verbose) option.
To override which config file is to be used, use the "-c" (config file)
option.

Examples:

$ logsplitter -v
5 lines in 0.00 seconds (12314 lines/sec, 2283 kbytes/sec)

$ logsplitter -v -c /path/to/myconfig.ini
5 lines in 0.00 seconds (12314 lines/sec, 2283 kbytes/sec)

NOTES ABOUT CANONICALISATION

logsplitter canonicalises all hostnames found in input files to lowercase.
It also removes any port number specifications from the hosts logged in
the file. (The assumption is that you are not going to run two completely
different sites which need to be treated separately for logging purposes
on http://www.example.com/ and http://www.example.com:1234/)
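
The canonicalisation rule above, combined with the hostname list described
in the introduction, can be sketched roughly as follows. This is an
illustration in Python, not the PHP code in Text/LogSplitter.php. The
function names are hypothetical, and two details are assumptions: that a
group is identified by the first hostname on its line, and that an
asterisk may match any run of characters, including dots.

```python
import fnmatch


def canonicalise_host(host):
    """Lowercase a logged host and drop a trailing ":port", if any."""
    host = host.strip().lower()
    # Split on the last colon; only treat the tail as a port if numeric.
    name, sep, port = host.rpartition(":")
    if sep and port.isdigit():
        host = name
    return host


def parse_hostname_list(text):
    """Parse a hostname list: one group per line, patterns space-separated."""
    groups = []
    for line in text.splitlines():
        patterns = line.lower().split()
        if patterns:  # skip blank lines
            groups.append(patterns)
    return groups


def find_group(host, groups):
    """Return the first hostname of the matching group, or None.

    Assumption: the group key is the first entry on the line. Hosts that
    match no group are rejected (None), mirroring the "valid hostnames"
    idea in the introduction.
    """
    host = canonicalise_host(host)
    for patterns in groups:
        if any(fnmatch.fnmatchcase(host, p) for p in patterns):
            return patterns[0]
    return None


if __name__ == "__main__":
    sample = "www.example.com\nwww.example.org *.example.org\n"
    groups = parse_hostname_list(sample)
    print(find_group("Sub.Example.ORG:8080", groups))
    print(find_group("unlisted.invalid", groups))
```

Under these assumptions, a request logged for Sub.Example.ORG:8080 is
first reduced to sub.example.org and then falls into the same group as
www.example.org, while a host that matches no line is not assigned to any
output file.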