How to preserve quoted strings in Python split

If you want to analyse Apache log files, for example, and split each line on spaces with the usual split method, you will find that split doesn’t respect quoted strings. Take a line like this:

192.168.2.1 - - [06/Mar/2012:10:02:22 +0100] "GET /2011/10/19/jncip-sec-exam/ HTTP/1.1" 200 3331 "-" "mm"
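To make the problem concrete, here is a small sketch of what plain split does to that line. The sample is typed with plain ASCII quotes and hyphens, as it would appear in a real log file; the variable names are only for illustration.

```python
# The sample log line, with plain ASCII quotes and hyphens.
line = ('192.168.2.1 - - [06/Mar/2012:10:02:22 +0100] '
        '"GET /2011/10/19/jncip-sec-exam/ HTTP/1.1" 200 3331 "-" "mm"')

# str.split() cuts on every run of whitespace and knows nothing about quotes.
fields = line.split()

# The quoted request ends up spread over three separate tokens.
request_pieces = fields[5:8]
print(request_pieces)
# ['"GET', '/2011/10/19/jncip-sec-exam/', 'HTTP/1.1"']
```

The request is broken up, and the quote characters stay attached to the first and last piece.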

You can’t easily get the HTTP request with split, because the quoted request is broken into several pieces. The standard library has a very nice module named shlex, which splits strings on whitespace and treats each quoted string as a single field. Below is my code, which shows how to fetch the HTTP request from an Apache log.

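#!/usr/bin/env python3
import shlex
import sys

file_path = sys.argv[1]
# Read the log line by line; the with block closes the file when done.
with open(file_path, 'r') as log_fh:
    for line in log_fh:
        # shlex.split keeps each double-quoted field as a single token,
        # so token 5 is the whole request in the log format shown above.
        http_request = shlex.split(line)[5]
        print(http_request)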
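Real log files sometimes contain truncated or odd lines. shlex.split raises ValueError when a line has an unbalanced quote, and a short line may not have a sixth field at all. The sketch below wraps the same idea in a helper that returns None in those cases. The name extract_request and its index parameter are my own for illustration and not part of shlex.

```python
import shlex


def extract_request(line, index=5):
    """Return the quoted request field of a log line, or None if unparsable.

    Hypothetical helper: index=5 assumes the log format of the sample line.
    """
    try:
        fields = shlex.split(line)
    except ValueError:
        # shlex raises ValueError on an unbalanced quote.
        return None
    if len(fields) <= index:
        # The line has fewer fields than expected.
        return None
    return fields[index]


sample = ('192.168.2.1 - - [06/Mar/2012:10:02:22 +0100] '
          '"GET /2011/10/19/jncip-sec-exam/ HTTP/1.1" 200 3331 "-" "mm"')
print(extract_request(sample))
# GET /2011/10/19/jncip-sec-exam/ HTTP/1.1
```

Returning None rather than raising lets a loop over a large log skip bad lines and carry on. Whether that is the right choice depends on how strict you want the analysis to be.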

About: rtoodtoo

Worked for more than 10 years as a Network/Support Engineer and also interested in Python, Linux, Security and SD-WAN // JNCIE-SEC #223 / RHCE / PCNSE

