Notes techniques


One line of sed is worth 25 lines of python

sed -n '/Feb 23 13:55/,/Feb 23 14:00/p' /var/log/mail.log
Feb 23 13:55:01 messagerie postfix/smtpd[20964]: connect from localhost[127.0.0.1]
Feb 23 13:55:01 messagerie postfix/smtpd[20964]: lost connection after CONNECT from localhost[127.0.0.1]
Feb 23 13:55:01 messagerie postfix/smtpd[20964]: disconnect from localhost[127.0.0.1]
Feb 23 13:55:01 messagerie pop3d: Connection, ip=[::ffff:127.0.0.1]
...

How sed works

the -n switch

tells sed not to print every line it reads, which is what it does by default.

the last p

(right after the last /) tells it to print the line. Which line? That is what the preceding expression is for.

'/pattern/,/pattern/'

selects everything from the first line that matches the first pattern through the next line that matches the second pattern, both included.

A sed program consists of addresses and commands:
sed [options] address command

For example, sed -n /PATTERN/p tells sed to print (the p at the end) only those lines of the input stream that match the address given by PATTERN. In this example, p is the command and /PATTERN/ is the address.

Line numbers and patterns : two ways to specify addresses

If an address is written between two forward slashes it is treated as a regular expression; otherwise it is treated as a line number: 1 for the first line, N for the Nth line and $ for the last line.

So for example, this code

 sed -n 5p /var/log/infomaniak/bottle.log 

will print the 5th line of /var/log/infomaniak/bottle.log. The output of head -5 below confirms that this is indeed the 5th line:

ychaouche@ychaouche-PC ~ $ sed -n 5p /var/log/infomaniak/bottle.log 
2015-03-11 07:56:23,280 [CRITICAL] status file not created for chaine1
ychaouche@ychaouche-PC ~ $ head -5 /var/log/infomaniak/bottle.log
2015-03-11 07:55:06,196 [INFO] Reading config file Ok...
2015-03-11 07:56:05,623 [INFO] Reading config file Ok...
2015-03-11 07:56:05,626 [CRITICAL] status file not created for chaine1
2015-03-11 07:56:23,277 [INFO] Reading config file Ok...
2015-03-11 07:56:23,280 [CRITICAL] status file not created for chaine1
ychaouche@ychaouche-PC ~ $ 

A single pattern is equivalent to multiple line numbers: the numbers of all the lines where that pattern is found.
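To make the two kinds of address concrete, here is a minimal Python sketch of what sed -n with a single address followed by p selects. The function name select and the way addresses are passed to it are made up for this illustration, and Python's re module is not the same regular expression dialect as sed's, so treat it as a model of the idea rather than of sed itself. The sample lines are the first five lines of bottle.log shown above.

```python
import re

def select(lines, address):
    """Return the lines a single sed address would select.

    address is an int (1-based line number), the string "$" (last line),
    or a string of the form "/regex/". Hypothetical helper, not a sed API.
    """
    lines = list(lines)
    if address == "$":
        return lines[-1:]
    if isinstance(address, int):
        # sed numbers lines from 1, so line N lives at index N - 1
        return lines[address - 1:address] if address >= 1 else []
    regex = re.compile(address[1:-1])  # drop the surrounding slashes
    return [line for line in lines if regex.search(line)]

log = [
    "2015-03-11 07:55:06,196 [INFO] Reading config file Ok...",
    "2015-03-11 07:56:05,623 [INFO] Reading config file Ok...",
    "2015-03-11 07:56:05,626 [CRITICAL] status file not created for chaine1",
    "2015-03-11 07:56:23,277 [INFO] Reading config file Ok...",
    "2015-03-11 07:56:23,280 [CRITICAL] status file not created for chaine1",
]

print(select(log, 5))             # like: sed -n 5p
print(select(log, "/CRITICAL/"))  # like: sed -n /CRITICAL/p, i.e. lines 3 and 5
```

With this data, the pattern /CRITICAL/ behaves like the two line numbers 3 and 5 taken together.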

Address ranges

Sed also accepts address ranges, that is, two addresses separated by a comma. Our example used that form of address:

sed -n /PATTERN1/,/PATTERN2/p inputfile

When an address range is specified, sed prints every line in that range. For example, if PATTERN1 first matches at line 10 and the next match of PATTERN2 after that is at line 430, then lines 10, 11, … 430 are all printed.
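The range behaviour can be sketched in a few lines of Python. This is only a model of how a /PATTERN1/,/PATTERN2/ range opens and closes, with a made-up function name (sed_range) and made-up sample lines; it uses Python regular expressions, not sed's.

```python
import re

def sed_range(lines, start, end):
    """Yield lines roughly the way sed -n '/start/,/end/p' prints them."""
    start_re, end_re = re.compile(start), re.compile(end)
    in_range = False
    for line in lines:
        if in_range:
            yield line
            if end_re.search(line):
                in_range = False  # the range closes on the FIRST end match
        elif start_re.search(line):
            # The end pattern is only tested from the next line onwards.
            in_range = True
            yield line

# Hypothetical sample input, not taken from a real log.
lines = ["13:54 a", "13:55 b", "13:55 c", "13:57 d",
         "14:00 e", "14:00 f", "14:02 g"]

print(list(sed_range(lines, "13:55", "14:00")))
# ['13:55 b', '13:55 c', '13:57 d', '14:00 e']
```

Note that "14:00 f" is not printed: the range ends at the first line matching the second pattern, so later lines carrying the same timestamp are left out. If the second pattern never matches, the range runs to the end of the input.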

The equivalent script in python

This little script searches a logfile for all the messages logged between two datetimes that you give on the command line. For example, if you want all the log messages from /var/log/syslog that were logged on Feb 23 between 13:50 and 15:17, you'd call it like this:

ychaouche@ychaouche-PC ~/CODE $ python logextract.py "Feb 23 13:50" "Feb 23 15:17"  /var/log/syslog 
Feb 23 13:50:06 ychaouche-PC kernel: [24455.249826] hub 1-1:1.0: port 1 disabled by hub (EMI?), re-enabling...
Feb 23 13:50:06 ychaouche-PC kernel: [24455.249838] usb 1-1.1: USB disconnect, device number 9
Feb 23 13:50:06 ychaouche-PC acpid: input device has been disconnected, fd 6
Feb 23 13:50:06 ychaouche-PC acpid: input device has been disconnected, fd 7
Feb 23 13:50:06 ychaouche-PC kernel: [24455.520977] usb 1-1.1: new low-speed USB device number 10 using ehci-pci
Feb 23 13:50:06 ychaouche-PC kernel: [24455.624107] usb 1-1.1: New USB device found, idVendor=1c4f, idProduct=0002
Feb 23 13:50:06 ychaouche-PC kernel: [24455.624113] usb 1-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Feb 23 13:50:06 ychaouche-PC kernel: [24455.624115] usb 1-1.1: Product: USB Keykoard
Feb 23 13:50:06 ychaouche-PC kernel: [24455.624118] usb 1-1.1: Manufacturer: USB
Feb 23 13:50:06 ychaouche-PC kernel: [24455.627157] input: USB USB Keykoard as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.11-1.1:1.0/input/input22
Feb 23 13:50:06 ychaouche-PC kernel: [24455.627373] hid-generic 0003:1C4F:0002.000D: input,hidraw0: USB HID v1.10 Keyboard [USB USB Keykoard] on usb-0000:00:1a.0-1.1/input0
Feb 23 13:50:06 ychaouche-PC kernel: [24455.630013] input: USB USB Keykoard as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.1/1-1.1:1.1/input/input23
Feb 23 13:50:06 ychaouche-PC kernel: [24455.630213] hid-generic 0003:1C4F:0002.000E: input,hidraw1: USB HID v1.10 Device [USB USB Keykoard] on usb-0000:00:1a.0-1.1/input1
Feb 23 13:50:06 ychaouche-PC kernel: [24455.630580] usb 1-1.2: USB disconnect, device number 8
Feb 23 13:50:06 ychaouche-PC mtp-probe: checking bus 1, device 10: "/sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.1"
Feb 23 13:50:06 ychaouche-PC mtp-probe: bus: 1, device: 10 was not an MTP device
Feb 23 13:50:11 ychaouche-PC kernel: [24460.568805] usb 1-1.2: new low-speed USB device number 11 using ehci-pci
Feb 23 13:50:11 ychaouche-PC kernel: [24460.665924] usb 1-1.2: New USB device found, idVendor=1bcf, idProduct=0007
Feb 23 13:50:11 ychaouche-PC kernel: [24460.665931] usb 1-1.2: New USB device strings: Mfr=0, Product=2, SerialNumber=0
Feb 23 13:50:11 ychaouche-PC kernel: [24460.665934] usb 1-1.2: Product: USB Optical Mouse
Feb 23 13:50:11 ychaouche-PC kernel: [24460.672912] input: USB Optical Mouse as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/input/input24
Feb 23 13:50:11 ychaouche-PC kernel: [24460.673292] hid-generic 0003:1BCF:0007.000F: input,hiddev0,hidraw2: USB HID v1.10 Mouse [USB Optical Mouse] on usb-0000:00:1a.0-1.2/input0
Feb 23 13:50:11 ychaouche-PC mtp-probe: checking bus 1, device 11: "/sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2"
Feb 23 13:50:11 ychaouche-PC mtp-probe: bus: 1, device: 11 was not an MTP device
Feb 23 14:09:01 ychaouche-PC CRON[7416]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -x /usr/lib/php5/sessionclean ] && [ -d /var/lib/php5 ] && /usr/lib/php5/sessionclean /var/lib/php5 $(/usr/lib/php5/maxlifetime))
Feb 23 14:17:01 ychaouche-PC CRON[7513]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Feb 23 14:39:01 ychaouche-PC CRON[7643]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -x /usr/lib/php5/sessionclean ] && [ -d /var/lib/php5 ] && /usr/lib/php5/sessionclean /var/lib/php5 $(/usr/lib/php5/maxlifetime))
Feb 23 15:09:01 ychaouche-PC CRON[7869]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -x /usr/lib/php5/sessionclean ] && [ -d /var/lib/php5 ] && /usr/lib/php5/sessionclean /var/lib/php5 $(/usr/lib/php5/maxlifetime))
Feb 23 15:17:01 ychaouche-PC CRON[8027]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
ychaouche@ychaouche-PC ~/CODE $ 

The script simply runs grep on the file to locate the first occurrence of the starting datetime given as first argument, seeks to that position in the file, then reads and outputs all subsequent lines until the first occurrence of the ending datetime given as second argument. This means that the starting AND the ending time must actually appear in the logfile. If you ask for 15:20 and there is no 15:20 in the logfile, it will not work: a missing start makes the script fail, and a missing end makes it keep printing until the line limit or the end of the file.

THE SCRIPT

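import optparse
import subprocess
import sys

# usage: logextract.py start-date end-date logfile [maxlines] (default = 1000)
parser = optparse.OptionParser(usage="%prog start-date end-date logfile [maxlines]")
options, args = parser.parse_args()
if len(args) < 3:
    parser.error("start-date, end-date and logfile are required")

pattern = args[0]
end     = args[1]
logfile = args[2]
limit   = 1000
if len(args) > 3:
    limit = int(args[3])  # arguments arrive as strings

# grep -b prefixes each matching line with its byte offset in the file;
# -m 1 stops at the first match.
output = subprocess.run(["grep", "-b", "-m", "1", "-e", pattern, logfile],
                        capture_output=True).stdout
if not output:
    sys.exit("%s not found in %s" % (pattern, logfile))
offset = int(output.split(b":")[0])

# Binary mode, because the offset reported by grep is counted in bytes.
end_bytes = end.encode()
with open(logfile, "rb") as fd:
    fd.seek(offset)
    for i in range(limit):
        line = fd.readline()
        if not line:
            break
        sys.stdout.buffer.write(line)
        # Stop after printing the first line that starts with the end datetime,
        # as in the sample run above.
        if line.startswith(end_bytes):
            break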
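One way around the limitation that both datetimes must literally appear in the logfile is to parse each line's timestamp and compare it with the bounds instead of matching text. The sketch below is a possible variant, not the script above: the names parse and between are made up, it assumes syslog-style lines that begin with a 15-character "Feb 23 13:50:06" stamp with English month names, and since those stamps carry no year it assumes the current year unless told otherwise. The sample lines are taken from the syslog output shown earlier.

```python
from datetime import datetime, timedelta

def parse(stamp, fmt, year):
    # Syslog timestamps carry no year, so one is prepended before parsing.
    return datetime.strptime("%d %s" % (year, stamp), "%Y " + fmt)

def between(lines, start, end, year=None):
    """Yield lines stamped from minute `start` through minute `end`, inclusive.

    start and end look like "Feb 23 13:50". Hypothetical helper.
    """
    if year is None:
        year = datetime.now().year  # assumption: the log is from this year
    lo = parse(start, "%b %d %H:%M", year)
    # Add one minute so the whole ending minute is included.
    hi = parse(end, "%b %d %H:%M", year) + timedelta(minutes=1)
    for line in lines:
        try:
            stamp = parse(line[:15], "%b %d %H:%M:%S", year)
        except ValueError:
            continue  # no parseable timestamp at the start of the line
        if lo <= stamp < hi:
            yield line

log = [
    "Feb 23 13:50:06 ychaouche-PC mtp-probe: bus: 1, device: 10 was not an MTP device",
    "Feb 23 14:17:01 ychaouche-PC CRON[7513]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)",
    "Feb 23 15:17:01 ychaouche-PC CRON[8027]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)",
]

# Neither 14:00 nor 15:20 appears in the sample, yet the query still works.
for line in between(log, "Feb 23 14:00", "Feb 23 15:20"):
    print(line)
```

The trade-off is that this version reads the file from the beginning and parses every line, whereas the script above lets grep find the starting offset and seeks straight to it.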

contact : @ychaouche yacinechaouche at yahoocom