BRDSTATS read me (v1.50) (2001/11/27)
Description:
This program scans proxy server log files in Common Log Format and creates HTML files
containing the following statistics (*):
- A summary
- The Top 20 users
- The Top 20 web sites (URLs)
- Analysis of each of the Top 5 users (Top 10 URLs for each of the Top 5 users)
- Analysis of each of the Top 5 URLs (Top 10 users for each of the Top 5 URLs)
- The Top 20 IP addresses
- 24-hour traffic analysis
- 7-day/24-hour traffic analysis
- Top 20 proxy return codes (for example 404 = Not Found)
- Top 20 file types (for example .gif, .html, .jpg)
- Top file sizes
(*) The numbers above are the default settings; they are customisable.
Requirements:
- Novell BorderManager proxy server, or any other proxy that supports the Common Log Format.
- Common log files available for analysis. In BorderManager, for example, configure HTTP
Proxy Logging (common format, rollover by date, for example every 7 days).
- HTTP proxy authentication, needed for the 'Top users' statistics.
Note: This utility is free to use; please let the author know if you like it.
In case of problems, please read this entire file.
Installation and usage:
- Copy BRDSTATS.EXE directly into your log directory (for BorderManager, by default
SYS:\ETC\PROXY\LOG\HTTP\COMMON). Note that the program always works on the current
directory. You can copy the executable anywhere you want; just remember to CD to the
directory where your log files reside before running it.
- From any DOS/Win9x/WinNT PC, open a DOS box and go to that directory. Run BRDSTATS or
BRDSTATS [filename]. The logs must be closed; an open log cannot be accessed.
- The program then reads and summarizes the entire log file. Please be patient! The
program can analyse more than 1000 lines/sec, depending on your PC and the number of
statistics. In my case, it takes 10 minutes to analyse 1 week of proxy activity (a log
file of about 20 MB). You can abort the program at any time with ALT-C.
- The output file has the same name as the log file, but with the extension .HTM. If no
filename is specified, BRDSTATS scans ALL log files (*.LOG) in the current directory that
don't already have an equivalent .HTM file. To regenerate an HTM file, just delete it and
rerun BRDSTATS.
- BRDSTATS also creates an INDEX.HTM file containing links to all other .HTM files in the
directory. INDEX.HTM is recreated from scratch every time BRDSTATS runs and analyses at
least 1 log file.
- Configuration is done through BRDSTATS.INI, which is created automatically, with all the
defaults, the first time the program runs. After that, just use Notepad to modify it to
suit your needs. All Top xx numbers can be set from 0 to 1000; to remove a statistic, set
it to "0". If any parameter is missing or misspelled in the INI file, its default is used.
You can delete the INI file and it will be recreated with defaults.
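The file selection and index rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the BRDSTATS code (which is written in Clipper): the function names are hypothetical, file names are compared case-insensitively, clip$err.log is skipped as the v1.50 history describes, and "newest first" in the index (see the v1.40 history) is approximated by file modification time, which is an assumption.

```python
import html
from pathlib import Path

def logs_to_analyse(directory):
    """Return the *.LOG files that have no matching .HTM report yet."""
    d = Path(directory)
    reported = {p.stem.upper() for p in d.iterdir()
                if p.is_file() and p.suffix.upper() == ".HTM"}
    return sorted(
        p for p in d.iterdir()
        if p.is_file()
        and p.suffix.upper() == ".LOG"
        and p.name.lower() != "clip$err.log"  # run-time error file, not a proxy log
        and p.stem.upper() not in reported
    )

def write_index(directory):
    """Rebuild INDEX.HTM from scratch with one link per .HTM report, newest first."""
    d = Path(directory)
    reports = sorted(
        (p for p in d.iterdir()
         if p.is_file() and p.suffix.upper() == ".HTM" and p.name.upper() != "INDEX.HTM"),
        key=lambda p: p.stat().st_mtime,  # assumption: mtime approximates report age
        reverse=True,
    )
    items = [f'<li><a href="{html.escape(p.name)}">{html.escape(p.stem)}</a></li>'
             for p in reports]
    page = "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>\n"
    (d / "INDEX.HTM").write_text(page)
```

A run would loop over `logs_to_analyse(".")`, write one report per log, and call `write_index(".")` only if at least one log was processed, matching the rule that INDEX.HTM is rebuilt only when something was analysed.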
IMPORTANT: If you are upgrading from a previous version, you should delete the .INI file,
or at least rename it, so that it is recreated automatically with the latest settings. The
parameters, and the documentation within the .INI file, change with every major version,
and this is the only way to get that information. Also look at the "History" section at
the end of this document for new features.
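The INI rules described above (missing or misspelled parameters fall back to defaults, Top xx values range from 0 to 1000, and 0 disables a statistic) could be implemented along these lines. Only the "Debug" and "DefaultSort" option names come from this document; the [BRDSTATS] section name, the Top xx key names, and the decision to revert out-of-range values to the default are hypothetical.

```python
import configparser

# Hypothetical key names and defaults, for illustration only.
TOP_DEFAULTS = {"TopUsers": 20, "TopURLs": 20, "TopIPs": 20}

def load_settings(path):
    """Read BRDSTATS-style settings, falling back to defaults for anything unusable."""
    cfg = configparser.ConfigParser(interpolation=None, strict=False)
    cfg.read(path)  # a missing file is silently ignored
    section = cfg["BRDSTATS"] if cfg.has_section("BRDSTATS") else {}
    settings = {}
    for key, default in TOP_DEFAULTS.items():
        try:
            value = int(section.get(key, default))
        except ValueError:
            value = default  # non-numeric value
        # 0 switches the statistic off; values outside 0..1000 revert to the default
        settings[key] = value if 0 <= value <= 1000 else default
    sort = str(section.get("DefaultSort", "Hits")).strip().upper()
    settings["DefaultSort"] = "MB" if sort == "MB" else "Hits"
    settings["Debug"] = str(section.get("Debug", "No")).strip().lower() == "yes"
    return settings
```

Option lookups through `configparser` are case-insensitive by default, so only a genuinely misspelled key falls back to its default.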
Statistics details
- In all statistics, MB refers to the number of megabytes (1024*1024 bytes) sent to the
client by the proxy server. It gives an idea of the proxy's bandwidth utilisation, but not
a precise idea of the bandwidth used on the Internet WAN link (see the last item below).
- Hits refers to the number of single file requests. Each line in the input log file counts
as 1 hit. A web page normally has more than 10 elements (images, logos, buttons, etc.), so
a single page view produces several hits. This statistic gives an idea of the time spent
on the Internet.
- The URL summary is based on the root URL. For example, http://www.123.com/main.htm and
http://www.123.com/images/header.gif are both counted as "http://www.123.com".
- The user statistics use the login name for the Top 20. If your users do not authenticate
to your proxy, you will have only 1 user, named "Unknown".
- The Top URL and Top User analyses give details about the top URLs and users: which users
visited each of the Top 5 URLs, and where each of the Top 5 users went. In the INI file
you can control how many top users or URLs to analyse, and how many items are detailed
for each.
- The Top IP addresses statistic gives figures per machine. This is useful for those who
have no user names because their users are not authenticated.
- Traffic analysis shows traffic in megabytes per hour of the day (24 hours) or per hour of
each weekday (7 days x 24 hours). Each hour starts at 00m00s and ends at 59m59s. For
example, hour 16 starts at 16h00m00s and ends at 16h59m59s.
- The proxy return code is the status code the proxy logs for each request. It is a result
code stating whether the URL request succeeded or not (for example 200 = OK,
404 = Not Found). These codes can be used to dig into specific problems.
- Top file types lists the most downloaded files by extension. Note that "None" (no
extension), ".html" and ".htm" could be added together, as they are all typically HTML
documents. This statistic hints at which types of traffic are the biggest. In my case, I
used it to find which file types I needed to block.
- File size analysis lists the downloaded files ranked by size. I can't think of any use for
this; if you find one, tell me.
- The proxy log does not tell whether the data was served from the cache or from the
Internet. A file accessed 10 times is typically downloaded from the Internet only once,
then read from the cache the other 9 times. The proxy stats still show 10 hits, so you
cannot use them to evaluate your Internet traffic. The proxy return code 304 Not Modified
seems to give some hint of cache "hits", but these do not account for all the cache hits
of the proxy server: they amount to around 20 to 30% of all hits, while BorderManager's
own stats normally show 70% cache hits.
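To make the statistics above concrete, here is a minimal Python sketch of how one Common Log Format line (`host ident authuser [date] "request" status bytes`) can be split into fields and summed into hits and bytes per root URL, user, IP address, hour, weekday/hour, return code and file type. It is not the BRDSTATS engine: the regular expression, the assumption that the request field carries a full URL (as proxy logs do), and the "/ (Local file system)" label for requests without a host are illustrative choices. MB uses 1024*1024 bytes as stated above, and `top()` mirrors the Hits/MB choice of the DefaultSort parameter described in the History section.

```python
import posixpath
import re
from collections import defaultdict
from datetime import datetime
from urllib.parse import urlsplit

MB = 1024 * 1024
# host ident authuser [date zone] "request" status bytes
CLF = re.compile(r'(\S+) (\S+) (\S+) \[([^\]\s]+)[^\]]*\] "([^"]*)" (\d{3}) (\d+|-)')

def parse(line):
    """Split one Common Log Format line into a dict, or return None if malformed."""
    m = CLF.match(line)
    if not m:
        return None
    ip, _ident, user, stamp, request, status, size = m.groups()
    parts = request.split()
    url = parts[1] if len(parts) >= 2 else request  # "GET <url> HTTP/1.0"
    try:
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S")
    except ValueError:
        return None
    return {"ip": ip, "user": "Unknown" if user == "-" else user, "url": url,
            "when": when, "status": status, "bytes": 0 if size == "-" else int(size)}

def _split(url):
    try:
        return urlsplit(url)
    except ValueError:  # e.g. an unbalanced "[" in the host part
        return urlsplit("")

def root_url(url):
    """http://www.123.com/images/header.gif -> http://www.123.com"""
    s = _split(url)
    if s.scheme and s.netloc:
        return f"{s.scheme}://{s.netloc.lower()}"
    return "/ (Local file system)"

def file_type(url):
    """Extension of the requested file, e.g. ".gif", or "None" when there is none."""
    return posixpath.splitext(_split(url).path)[1].lower() or "None"

def summarize(lines):
    """Return ({statistic: {key: [hits, bytes]}}, number_of_rejected_lines)."""
    stats = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    rejected = 0
    for line in lines:
        rec = parse(line)
        if rec is None:
            rejected += 1
            continue
        keys = {
            "url": root_url(rec["url"]),
            "user": rec["user"],
            "ip": rec["ip"],
            "hour": rec["when"].hour,  # hour 16 = 16h00m00s to 16h59m59s
            "day_hour": (rec["when"].strftime("%a"), rec["when"].hour),
            "status": rec["status"],
            "type": file_type(rec["url"]),
        }
        for name, key in keys.items():
            cell = stats[name][key]
            cell[0] += 1
            cell[1] += rec["bytes"]
    return stats, rejected

def top(table, n, sort="Hits"):
    """Top n entries as (key, hits, MB), sorted by hits or by bytes."""
    idx = 0 if sort == "Hits" else 1
    ranked = sorted(table.items(), key=lambda kv: kv[1][idx], reverse=True)
    return [(key, hits, round(nbytes / MB, 2)) for key, (hits, nbytes) in ranked[:n]]
```

Dividing the hit count in `stats["status"]["304"]` by the total number of hits gives the rough cache indicator discussed above, with the same limits.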
Additional info:
- You can create a custom log file to obtain a specific analysis. I use the grep command
to create a specific log file when specific needs arise. For example, if I need an
analysis of the web site "yahoo", I do:
grep -i yahoo (logfile) > yahoo.log
Then rerun BRDSTATS and yahoo.log will be analysed. (grep is a Unix command that is also
available for DOS/Windows.)
- If you wish to automate BRDSTATS, I suggest you use a simple batch file that will CD to
the Logs directory, run BRDSTATS, then copy all .HTM files to the desired web server
directory.
- The speed of BRDSTATS depends on the PC and on the number of statistics to produce.
Normally you should get 500 to 1000 lines/sec on a recent PC. If you need more speed,
disable unused statistics in the INI file, starting with the most demanding ones: Top
users/URL analysis and file type analysis. Note that you must disable a feature (set it to
No or 0) to gain speed: whether 1 or 40 items are selected for a statistic, the time spent
analysing is the same.
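The grep extraction and batch automation described above can also be scripted in Python when grep or a batch file is not at hand. This is a sketch under stated assumptions: `extract()` does a case-insensitive fixed-string match, which behaves like `grep -i` only for patterns without regular-expression metacharacters (such as yahoo); `run_brdstats()` simply starts the executable with the log directory as its current directory, since BRDSTATS always works there; `publish()` copies every .HTM file, including INDEX.HTM. All function names and the web directory are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

def extract(log_path, pattern, out_path):
    """Copy the lines containing `pattern` (case-insensitive) to out_path; return the count."""
    needle = pattern.lower()
    count = 0
    # latin-1 accepts any byte, so stray garbage in the log cannot raise a decode error;
    # newline="" keeps the original line endings unchanged
    with open(log_path, encoding="latin-1", newline="") as src, \
         open(out_path, "w", encoding="latin-1", newline="") as dst:
        for line in src:
            if needle in line.lower():
                dst.write(line)
                count += 1
    return count

def run_brdstats(log_dir, exe_path):
    """Run BRDSTATS with the log directory as its current directory."""
    subprocess.run([str(exe_path)], cwd=str(log_dir), check=True)

def publish(log_dir, web_dir):
    """Copy every .HTM report to the web server directory; return the copied names."""
    web = Path(web_dir)
    web.mkdir(parents=True, exist_ok=True)
    copied = []
    for p in Path(log_dir).iterdir():
        if p.is_file() and p.suffix.upper() == ".HTM":
            shutil.copy2(p, web / p.name)
            copied.append(p.name)
    return sorted(copied)
```

A scheduled job could call `extract()` when a specific analysis is needed, then `run_brdstats()` and `publish()` in sequence, which are the same steps as the suggested batch file.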
Troubleshooting:
- BRDSTATS rejects a log file if more than 50% of its lines are errors. If you need to see
the lines that are rejected, set the "Debug" option to "Yes" in BRDSTATS.INI.
- You may see a URL named "/ (Local file system)" or "http://(your local web server here)".
This means that your users go through your proxy server to reach the local web server.
This is either intended or a web browser configuration problem.
- In some cases, the log file reports a file size of 2GB. This really skews the
statistics! Usually it is a video stream. Of course the user didn't download that much
data, but the transfer had started. At this time, I don't have any answer for this, other
than using a text editor to delete those lines manually from the log file.
- BorderManager (BM) tends to write garbage to the log file when the server isn't shut
down properly. Since v1.50, BRDSTATS can skip over any garbage in the log file. However, if
BRDSTATS stops before the end of a log file, set the "Debug" option to "Yes" in
BRDSTATS.INI and check whether there is any garbage in the log file. The Debug option will
help you pinpoint where the problem is in the log file. Use a good text editor such as
PFE32 (freeware) to correct the log.
- BRDSTATS has been tested with BorderManager 3.5. Some users have tested it on
BorderManager 3.0. Feedback on other proxy servers is welcome, since BRDSTATS uses the
common log format.
- BRDSTATS uses the DBF file format to sum statistics. These files are left in place after
the program runs and can be imported into any database or spreadsheet for more detailed
analysis. There are 4 files, always overwritten for each log analysed, so if BRDSTATS
analyses 2 or more logs, only the results of the last one remain: BRDURL, with a record
for each specific URL; BRDUSR, with a record for each unique user; BRDIP, with a record
per IP address; and BRDBOTH, with a record for each unique user and URL pair. All records
contain summed Hits and MB. You can easily import these files into Excel and make a graph
or anything else you can think of.
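The 50% rejection rule, the Debug listing of bad lines and the 2GB entries described above can be approximated with a pre-check run before BRDSTATS. This hypothetical sketch flags lines that do not look like Common Log Format, rejects the file when more than half of its lines are malformed, and drops entries whose logged size is at least 2147483647 bytes, the largest signed 32-bit value. Treating that value as the "2GB" size is an assumption, as is the regular expression; the real BRDSTATS checks are not documented here. Dropping those lines automates the manual workaround described above.

```python
import re

# Rough Common Log Format shape; the size field is captured. Not BRDSTATS's own rule.
CLF = re.compile(r'\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (\d+|-)\s*$')

# Assumption: "2GB" entries are logged as the largest signed 32-bit value.
HUGE = 2**31 - 1

def check_log(lines, max_error_ratio=0.5):
    """Return (good, bad) lists of (line_number, line); raise if too many are bad."""
    good, bad = [], []
    for number, line in enumerate(lines, start=1):
        if CLF.match(line):
            good.append((number, line))
        else:
            bad.append((number, line))
    if lines and len(bad) / len(lines) > max_error_ratio:
        raise ValueError(f"{len(bad)} of {len(lines)} lines are malformed; log rejected")
    return good, bad

def drop_huge(lines, limit=HUGE):
    """Remove well-formed entries whose logged size is at least `limit` bytes."""
    kept = []
    for line in lines:
        m = CLF.match(line)
        if m and m.group(1) != "-" and int(m.group(1)) >= limit:
            continue  # e.g. a video stream logged with a bogus ~2GB size
        kept.append(line)
    return kept
```

Printing the `bad` list gives the line numbers to inspect in a text editor, much like the Debug option.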
Send any comments or suggestions, or requests for the source code (Clipper 5.3), to:
Simon Begin
History
Version 1.50 (20011127)
- New Top xx IP addresses statistic, enabled at 20 by default.
- The engine can now filter out any garbage in the logs, which occurs when the proxy isn't
shut down properly. The program now reports when there is garbage in the input log file
and how many bytes were skipped.
- The file clip$err.log is now skipped if present. This file contains BRDSTATS run-time errors.
Version 1.40 (20010528)
- Parsing of lines revised and optimised: more speed and precision.
- New Top file types statistics (for example .gif, .html, .jpg)
- New Top file sizes
- New global parameter to select the default sort order for statistics.
"DefaultSort" can be set to Hits or MB, and affects all statistics that contain
MB and Hits data. The default is Hits, which better reflects the time spent on the
Internet. If set to "MB", statistics are sorted by data volume, for those like
me who prefer to check who is using all the bandwidth rather than who spends all their
time on the net...
- There is now only 1 "Top nn URL" and 1 "Top nn User" section, which
are sorted depending on the DefaultSort parameter.
- "Clickable" URLs
- Reversed order in INDEX.HTM, so the newest entries are at the top of the file.
- Some minor bugs fixed.
Version 1.30 (20001213):
- New INI file to set up report output
- 24x7 traffic analysis
- Readme updated and moved into BRDSTATS.HTM. Troubleshooting tips added
Version 1.23 (20001019):
- First published version, translated to English
- Analyses all logs within the current directory that don't already have a .HTM
equivalent
- New index.htm with links to all reports in the directory
Future enhancements
- Filter options. The goal is to include and/or exclude specific strings from the log
files: a URL, a user, or anything else in the log. This was first scheduled for release
in v1.50 but was postponed due to lack of time.
- Support for IPPKTLOG (firewall packet logging). I intend to support it fully, in a way
similar to the BRDSTATS proxy statistics.