How to preserve quoted strings in Python split
If you want to analyse, for example, Apache log files and split each line on spaces with the usual "split" method, you will see that split doesn't respect quoted strings. Take a line like the one below:
192.168.2.1 - - [06/Mar/2012:10:02:22 +0100] "GET /2011/10/19/jncip-sec-exam/ HTTP/1.1" 200 3331 "-" "mm"
You can't get the HTTP request easily with split, because the request itself contains spaces and ends up spread over several fields. There is a very nice module named shlex, which splits strings on whitespace and treats each quoted string as a single field. Below is an example of my code which shows how you can fetch the HTTP request from an Apache log.
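#!/usr/bin/env python3
import shlex
import sys

file_path = sys.argv[1]

# Read the log line by line; the with-block closes the file for us.
with open(file_path, 'r') as log_fh:
    for line in log_fh:
        # In this log format, index 5 is the quoted request field.
        http_request = shlex.split(line)[5]
        print(http_request)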
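To see the difference side by side, here is a small sketch that runs both methods on the sample line from above. It assumes the log uses plain ASCII double quotes and hyphens, as a real Apache log would; the variable names are only for illustration.

```python
import shlex

# Sample line from the post, written with plain ASCII quotes and hyphens
# (assumption: this is how the line looks in the actual log file).
line = ('192.168.2.1 - - [06/Mar/2012:10:02:22 +0100] '
        '"GET /2011/10/19/jncip-sec-exam/ HTTP/1.1" 200 3331 "-" "mm"')

# str.split cuts on every run of whitespace, even inside quotes.
plain_fields = line.split()

# shlex.split keeps each quoted string together and strips the quotes.
quoted_fields = shlex.split(line)

print(len(plain_fields), plain_fields[5])
print(len(quoted_fields), quoted_fields[5])
```

With plain split the request is broken into three pieces and index 5 holds only the first of them, still carrying its opening quote. With shlex.split the whole request sits in index 5. Note that the timestamp in square brackets is still split in two by both methods, since brackets are not quotes.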