1.0 Introduction
Given an nginx web server log file, we might like to know who is visiting our website. We could go through the log file line by line, but that is tedious. It would be nicer to get one line per client, giving the visitor's IP address and the number of times that client has visited. If that output is then sorted in descending order of visits, we get a list of the most frequent visitors to the website. Here is a script which does this.
2.0 Script
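Our script is written in Perl. It uses a hash with the IP address as the key and the number of visits from that address as the value. For each line of the log file, the script takes the first whitespace-delimited field, which is the client address in nginx's default combined log format, and increments the count for that address. Each logged request therefore counts as one visit. The script is: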
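#!/usr/bin/perl
#
# processlog: process the log files passed as arguments.
#
use strict;
use warnings;

if (@ARGV < 1) {
    die ("Usage: processlog logfile1 [logfile2 [...]]\n");
}

my (%count);

while (<>) {
    chomp;
    # Skip lines with no non-whitespace field (for example, blank lines);
    # without this check, $1 would not hold an address from the current line.
    next unless /(\S+)/;
    $count{$1} += 1;
}

foreach my $ipaddress (keys %count) {
    print "$ipaddress: $count{$ipaddress}\n";
}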
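To see what the pattern match picks out, here is a minimal sketch that applies the same capture to a single line. The subroutine name client_address and the sample log line are made up for illustration and are not part of processlog. The sample assumes nginx's default combined log format, in which the client address is the first field.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# client_address: hypothetical helper, not part of processlog.
# Returns the first whitespace-delimited field of a log line,
# or undef if the line has no such field (for example, a blank line).
sub client_address {
    my ($line) = @_;
    return $line =~ /(\S+)/ ? $1 : undef;
}

# Made-up log line in the style of nginx's "combined" format.
my $sample = '203.0.113.0 - - [01/Jan/2024:00:00:00 +0000] '
           . '"GET / HTTP/1.1" 200 612 "-" "curl/8.0"';

my $address = client_address($sample);
print defined $address ? "$address\n" : "no address found\n";   # prints 203.0.113.0
```

Because \S+ stops at the first whitespace character, the same capture works for both IPv4 and IPv6 addresses. If a log uses a custom format that puts something else first, the pattern would need to change.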
We can run the above script with an access.log file.
$ ./processlog access.log
2001:db8:fe17:a000:dacb:8aff:fee1:905b: 14
2001:db8:bdde:6480:6868:bc67:87d:64f7: 12
198.51.100.3: 18
2001:db8:938f:85bf:c309:68fd:ae91:2c82: 14
203.0.113.0: 2
...
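This gives the total for each visitor's IP address, but the lines are in no particular order, because keys returns the hash keys in no particular order. To get the most frequent visitors first, we can pass the output of the script through the sort command. Here -k2,2 sorts on the second field (the count), -n compares numerically, -r reverses the order so that the largest counts come first, and -k1,1 breaks ties using the first field.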
$ ./processlog access.log | sort -nr -k2,2 -k1,1
198.51.100.45: 10083
203.0.113.22: 7365
192.0.2.31: 5927
203.0.113.44: 3972
198.51.100.10: 3857
...
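The sorting could also be done inside Perl, which would avoid the extra sort process. The following is a minimal sketch of that idea rather than part of the original script. The subroutine name sorted_report and the sample counts are made up for illustration. It orders the addresses by descending count and breaks ties by comparing the addresses as plain strings, which may order ties differently from the sort command above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# sorted_report: hypothetical helper, not part of processlog.
# Takes a reference to a hash of address => visit count and returns
# one "address: count" string per address, most frequent first.
# Ties are broken by plain string comparison of the addresses.
sub sorted_report {
    my ($count) = @_;
    my @addresses = sort { $count->{$b} <=> $count->{$a} or $a cmp $b }
                    keys %$count;
    my @lines;
    foreach my $ipaddress (@addresses) {
        push @lines, "$ipaddress: $count->{$ipaddress}";
    }
    return @lines;
}

# Made-up sample counts, standing in for the %count hash built by the script.
my %count = (
    '198.51.100.3' => 18,
    '203.0.113.0'  => 2,
    '192.0.2.31'   => 18,
);

print "$_\n" for sorted_report(\%count);
```

In the script itself, the final foreach loop could print the lines returned by such a helper instead of iterating over keys %count directly.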