One line of sed is worth 25 lines of python
sed -n '/Feb 23 13:55/,/Feb 23 14:00/p' /var/log/mail.log
Feb 23 13:55:01 messagerie postfix/smtpd[20964]: connect from localhost[127.0.0.1]
Feb 23 13:55:01 messagerie postfix/smtpd[20964]: lost connection after CONNECT from localhost[127.0.0.1]
Feb 23 13:55:01 messagerie postfix/smtpd[20964]: disconnect from localhost[127.0.0.1]
Feb 23 13:55:01 messagerie pop3d: Connection, ip=[::ffff:127.0.0.1]
...
How sed works
the -n switch
tells sed not to print every line it reads, which is what it does by default
the last p
(right after the last /) tells it to print the line. Which line? That's what the preceding expression is for.
'/pattern/,/pattern/'
selects everything from the first line that matches the first pattern up to the next line that matches the second pattern, both included.
A sed program consists of addresses and commands
sed [options] address command.
For example: sed -n /PATTERN/p
will tell sed to print (the p at the end) only those lines of the input stream that match the address given by PATTERN. In this example, p is the command and /PATTERN/ is the address.
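For readers more at home in Python, here is a minimal sketch of what sed -n /PATTERN/p does. The function name print_matching is made up for this illustration, the sample lines are taken from the mail.log output at the top of the page, and Python's re syntax is not the same as sed's basic regular expressions, so this only approximates sed for simple patterns.

```python
import re


def print_matching(lines, pattern):
    """Return the lines that match pattern, roughly like sed -n '/pattern/p'.

    print_matching is a hypothetical helper, not part of sed or of the
    script discussed on this page.
    """
    regex = re.compile(pattern)
    # The address selects the lines, the "p" command prints them.
    return [line for line in lines if regex.search(line)]


sample = [
    "Feb 23 13:55:01 messagerie postfix/smtpd[20964]: connect from localhost[127.0.0.1]",
    "Feb 23 13:55:01 messagerie pop3d: Connection, ip=[::ffff:127.0.0.1]",
]

for line in print_matching(sample, r"postfix/smtpd"):
    print(line)
```

Only the first sample line contains postfix/smtpd, so it is the only one printed.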
Line numbers and patterns : two ways to specify addresses
If addresses are given between two forward slashes they are treated as regular expressions, otherwise they are treated as line numbers: 1 for the first line, N for the Nth line and $ for the last line.
So for example, this code
sed -n 5p /var/log/infomaniak/bottle.log
will print the 5th line of /var/log/infomaniak/bottle.log. The output of head -5 confirms that this is indeed the 5th line:
ychaouche@ychaouche-PC ~ $ sed -n 5p /var/log/infomaniak/bottle.log
2015-03-11 07:56:23,280 [CRITICAL] status file not created for chaine1
ychaouche@ychaouche-PC ~ $ head -5 /var/log/infomaniak/bottle.log
2015-03-11 07:55:06,196 [INFO] Reading config file Ok...
2015-03-11 07:56:05,623 [INFO] Reading config file Ok...
2015-03-11 07:56:05,626 [CRITICAL] status file not created for chaine1
2015-03-11 07:56:23,277 [INFO] Reading config file Ok...
2015-03-11 07:56:23,280 [CRITICAL] status file not created for chaine1
ychaouche@ychaouche-PC ~ $
A single pattern is equivalent to multiple line numbers: the numbers of all the lines where that pattern is found.
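The idea that a pattern stands for a set of line numbers can be sketched in Python. This is only an illustration: matching_line_numbers and select_lines are hypothetical helper names, and the input is the five lines shown by head -5 above.

```python
import re


def matching_line_numbers(lines, pattern):
    """Return the 1-based numbers of the lines that match pattern."""
    regex = re.compile(pattern)
    return [n for n, line in enumerate(lines, start=1) if regex.search(line)]


def select_lines(lines, numbers):
    """Return the lines whose 1-based number is in numbers (like sed -n Np)."""
    wanted = set(numbers)
    return [line for n, line in enumerate(lines, start=1) if n in wanted]


log = [
    "2015-03-11 07:55:06,196 [INFO] Reading config file Ok...",
    "2015-03-11 07:56:05,623 [INFO] Reading config file Ok...",
    "2015-03-11 07:56:05,626 [CRITICAL] status file not created for chaine1",
    "2015-03-11 07:56:23,277 [INFO] Reading config file Ok...",
    "2015-03-11 07:56:23,280 [CRITICAL] status file not created for chaine1",
]

numbers = matching_line_numbers(log, r"CRITICAL")
print(numbers)  # [3, 5]

# Selecting by pattern gives the same lines as selecting by those numbers.
for line in select_lines(log, numbers):
    print(line)
```

Here the pattern CRITICAL plays the same role as the two line numbers 3 and 5 together.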
Address ranges
Sed also accepts address ranges, that is two addresses separated by a comma. In our example, we used that form of address :
sed -n /PATTERN1/,/PATTERN2/p inputfile.
When an address range is specified, sed will print every line that is in that range. So if, for example, PATTERN1 is first found at line 10 and the next match of PATTERN2 is at line 430, then lines 10, 11, …, 430 are all printed.
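The range logic can be sketched in a few lines of Python. This is an approximation under stated assumptions, not how sed is implemented: sed_range is a made-up name, the sample log lines are invented for the illustration, and Python's re module stands in for sed's regular expressions.

```python
import re


def sed_range(lines, start_pattern, end_pattern):
    """Approximate sed -n '/start/,/end/p' on a list of lines.

    sed_range is a hypothetical helper written for this page.
    """
    start = re.compile(start_pattern)
    end = re.compile(end_pattern)
    selected = []
    in_range = False
    for line in lines:
        if not in_range:
            # Outside a range: only a match of the first address opens one.
            if start.search(line):
                in_range = True
                selected.append(line)
        else:
            # Inside a range: print the line, and close the range
            # once the second address matches.
            selected.append(line)
            if end.search(line):
                in_range = False
    return selected


# Invented sample lines, not taken from a real logfile.
sample = [
    "Feb 23 13:54:59 host invented line before the range",
    "Feb 23 13:55:01 host invented first line of the range",
    "Feb 23 13:57:30 host invented line inside the range",
    "Feb 23 14:00:02 host invented last line of the range",
    "Feb 23 14:00:05 host invented line after the range",
]

for line in sed_range(sample, "Feb 23 13:55", "Feb 23 14:00"):
    print(line)
```

On this sample the call returns the three middle lines: the range closes at the first line matching the end pattern, so the second 14:00 line is left out. If the end pattern never matches, the sketch keeps selecting until the end of the input, which is also what sed does.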
The equivalent script in python
This little script searches logfiles for all the messages that happened between two datetimes that you specify on the command line. For example, if you want all the log messages from /var/log/syslog that happened on Feb 23 between 13:50 and 15:17, you'd call it like this :
ychaouche@ychaouche-PC ~/CODE $ python logextract.py "Feb 23 13:50" "Feb 23 15:17" /var/log/syslog
Feb 23 13:50:06 ychaouche-PC kernel: [24455.249826] hub 1-1:1.0: port 1 disabled by hub (EMI?), re-enabling...
Feb 23 13:50:06 ychaouche-PC kernel: [24455.249838] usb 1-1.1: USB disconnect, device number 9
Feb 23 13:50:06 ychaouche-PC acpid: input device has been disconnected, fd 6
Feb 23 13:50:06 ychaouche-PC acpid: input device has been disconnected, fd 7
Feb 23 13:50:06 ychaouche-PC kernel: [24455.520977] usb 1-1.1: new low-speed USB device number 10 using ehci-pci
Feb 23 13:50:06 ychaouche-PC kernel: [24455.624107] usb 1-1.1: New USB device found, idVendor=1c4f, idProduct=0002
Feb 23 13:50:06 ychaouche-PC kernel: [24455.624113] usb 1-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Feb 23 13:50:06 ychaouche-PC kernel: [24455.624115] usb 1-1.1: Product: USB Keykoard
Feb 23 13:50:06 ychaouche-PC kernel: [24455.624118] usb 1-1.1: Manufacturer: USB
Feb 23 13:50:06 ychaouche-PC kernel: [24455.627157] input: USB USB Keykoard as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.11-1.1:1.0/input/input22
Feb 23 13:50:06 ychaouche-PC kernel: [24455.627373] hid-generic 0003:1C4F:0002.000D: input,hidraw0: USB HID v1.10 Keyboard [USB USB Keykoard] on usb-0000:00:1a.0-1.1/input0
Feb 23 13:50:06 ychaouche-PC kernel: [24455.630013] input: USB USB Keykoard as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.1/1-1.1:1.1/input/input23
Feb 23 13:50:06 ychaouche-PC kernel: [24455.630213] hid-generic 0003:1C4F:0002.000E: input,hidraw1: USB HID v1.10 Device [USB USB Keykoard] on usb-0000:00:1a.0-1.1/input1
Feb 23 13:50:06 ychaouche-PC kernel: [24455.630580] usb 1-1.2: USB disconnect, device number 8
Feb 23 13:50:06 ychaouche-PC mtp-probe: checking bus 1, device 10: "/sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.1"
Feb 23 13:50:06 ychaouche-PC mtp-probe: bus: 1, device: 10 was not an MTP device
Feb 23 13:50:11 ychaouche-PC kernel: [24460.568805] usb 1-1.2: new low-speed USB device number 11 using ehci-pci
Feb 23 13:50:11 ychaouche-PC kernel: [24460.665924] usb 1-1.2: New USB device found, idVendor=1bcf, idProduct=0007
Feb 23 13:50:11 ychaouche-PC kernel: [24460.665931] usb 1-1.2: New USB device strings: Mfr=0, Product=2, SerialNumber=0
Feb 23 13:50:11 ychaouche-PC kernel: [24460.665934] usb 1-1.2: Product: USB Optical Mouse
Feb 23 13:50:11 ychaouche-PC kernel: [24460.672912] input: USB Optical Mouse as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2:1.0/input/input24
Feb 23 13:50:11 ychaouche-PC kernel: [24460.673292] hid-generic 0003:1BCF:0007.000F: input,hiddev0,hidraw2: USB HID v1.10 Mouse [USB Optical Mouse] on usb-0000:00:1a.0-1.2/input0
Feb 23 13:50:11 ychaouche-PC mtp-probe: checking bus 1, device 11: "/sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2"
Feb 23 13:50:11 ychaouche-PC mtp-probe: bus: 1, device: 11 was not an MTP device
Feb 23 14:09:01 ychaouche-PC CRON[7416]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -x /usr/lib/php5/sessionclean ] && [ -d /var/lib/php5 ] && /usr/lib/php5/sessionclean /var/lib/php5 $(/usr/lib/php5/maxlifetime))
Feb 23 14:17:01 ychaouche-PC CRON[7513]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Feb 23 14:39:01 ychaouche-PC CRON[7643]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -x /usr/lib/php5/sessionclean ] && [ -d /var/lib/php5 ] && /usr/lib/php5/sessionclean /var/lib/php5 $(/usr/lib/php5/maxlifetime))
Feb 23 15:09:01 ychaouche-PC CRON[7869]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -x /usr/lib/php5/sessionclean ] && [ -d /var/lib/php5 ] && /usr/lib/php5/sessionclean /var/lib/php5 $(/usr/lib/php5/maxlifetime))
Feb 23 15:17:01 ychaouche-PC CRON[8027]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
ychaouche@ychaouche-PC ~/CODE $
The script simply runs grep on the file to locate the first occurrence of the starting datetime given as the first argument, seeks to that position in the file, then reads and outputs all subsequent lines until the first occurrence of the ending datetime given as the second argument. This means that both the starting AND the ending datetime must actually appear in the logfile. If you pass 15:20 and there's no 15:20 in the logfile, it will not work!
THE SCRIPT
import optparse
import commands  # Python 2 only

# usage: logextract.py start-date end-date logfile maxlines (default = 1000)
parser = optparse.OptionParser()
options, args = parser.parse_args()
pattern = args[0]
end = args[1]
logfile = args[2]
limit = 1000
if len(args) > 3:
    limit = int(args[3])  # command-line arguments are strings

# grep -b prefixes each matching line with its byte offset in the file
output = commands.getoutput("grep -b '%s' %s | head -1" % (pattern, logfile))
offset = output.split(":")[0]
fd = open(logfile)
fd.seek(int(offset))
i = 0
lines = []
while i < limit:
    line = fd.readline()
    if line.startswith(end) or not line:
        break
    lines.append(line)
    i += 1
# each line already ends with a newline, so join without adding another
print "".join(lines)
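The script above targets Python 2 (the commands module and the print statement are gone in Python 3). As a rough sketch only, and not the author's script, the same idea might look like this in Python 3 without calling grep. The function name extract and the invented demo lines are assumptions made for this illustration; the start datetime is searched anywhere in the line (as grep does) and the end datetime at the start of the line (as the script's startswith does).

```python
import os
import tempfile


def extract(path, start, end, limit=1000):
    """Return up to limit lines, from the first line containing start
    up to, but not including, the first later line starting with end.

    extract is a hypothetical name chosen for this sketch.
    """
    selected = []
    started = False
    with open(path, errors="replace") as logfile:
        for line in logfile:
            if not started:
                if start not in line:
                    continue  # still looking for the starting datetime
                started = True
            elif line.startswith(end):
                break  # reached the ending datetime
            selected.append(line)
            if len(selected) >= limit:
                break
    return selected


# Demo on a throwaway file filled with invented log lines.
demo_lines = [
    "Feb 23 13:49:58 host invented line before the window\n",
    "Feb 23 13:50:06 host invented first line of the window\n",
    "Feb 23 14:17:01 host invented line inside the window\n",
    "Feb 23 15:17:01 host invented line at the end marker\n",
]
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("".join(demo_lines))

result = extract(tmp.name, "Feb 23 13:50", "Feb 23 15:17")
os.remove(tmp.name)
print("".join(result), end="")
```

Unlike the original, this version reads the file from the top instead of seeking to a byte offset, which is simpler but slower on very large logfiles. If the starting datetime never appears it returns an empty list, and if the ending datetime never appears it keeps reading until the end of the file or the line limit.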
contact : @ychaouche yacinechaouche at yahoocom