Parsing Logfiles with Python

13 Aug 2023 - rich

Not everything is covered here, just some of the typical things you might use to parse logfiles (standard as well as customized formats) with Python, plus some of the regular expressions needed to do the parsing.

Regular expressions

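Regular expressions are very much a challenge. The hard part isn't Python; it's keeping the dialects straight, because the world changed when Perl introduced its regex syntax. So you have to keep track of which utility supports which flavor. The classic Unix tools such as sed and grep (and bash's =~ operator) use the older POSIX basic/extended syntax, which predates Perl's, hence the differences. Python's re module follows the Perl style.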
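To make the dialect difference concrete, here's a small sketch of a few Perl-style features that Python's re supports but POSIX sed/grep do not: shorthand classes like \d, lazy quantifiers, and named groups. The log line is a hypothetical, shortened nginx-style sample (the "[bot]" user agent is made up), and the variable names are just for illustration.

```python
import re

# Hypothetical sample line (nginx-style, shortened; the user agent is made up)
line = '[13/Aug/2023:00:06:00 -0400] "GET /.env HTTP/1.1" 404 153 "-" "[bot]"'

# \d is Perl-style shorthand; in POSIX sed/grep you'd write [0-9] or [[:digit:]]
status = re.search(r'" (\d{3}) ', line).group(1)

# Greedy .+ runs to the LAST "]" on the line (here, inside the user agent) ...
greedy = re.search(r'\[.+\]', line).group()
# ... while lazy .+? (not available in POSIX ERE) stops at the FIRST "]"
lazy = re.search(r'\[.+?\]', line).group()

# Named groups (?P<name>...) are another extension POSIX ERE doesn't have
m = re.search(r'"(?P<method>[A-Z]+) (?P<path>\S+)', line)

# re.sub accepts sed-like \1 backreferences, plus \g<name> for named groups
short = re.sub(r'^\[(?P<ts>[^\]]+)\].*$', r'\g<ts>', line)

print(status)                          # 404
print(lazy)                            # [13/Aug/2023:00:06:00 -0400]
print(m.group("method"), m.group("path"))  # GET /.env
print(short)                           # 13/Aug/2023:00:06:00 -0400
```

The greedy case is worth remembering: a pattern like \[.+\] can silently swallow more of the line than intended whenever a later field also contains a "]".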

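As an example, here is a (half-hearted) attempt to parse an nginx access log using sed, printing the timestamp, method, path, and status of each request. Lines the pattern doesn't match (requests from IPv6 clients or over HTTP/2.0, for instance) are printed unchanged, since sed outputs every line by default.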

sed -E 's/^([0-9.]+) .+(\[.+\]) "([A-Z]+) (\/.+)HTTP\/1\.1" ([0-9]+) .+/\2 \3 \4 \5/' /var/log/nginx/access.log

[13/Aug/2023:00:06:00 -0400] GET /.env  404
[13/Aug/2023:00:06:01 -0400] POST /  405
[13/Aug/2023:00:12:02 -0400] GET /  301
[13/Aug/2023:00:23:39 -0400] GET /.env  301
[13/Aug/2023:00:23:39 -0400] POST /  301
[13/Aug/2023:00:34:43 -0400] GET /  301
[13/Aug/2023:00:34:44 -0400] GET /  200
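For comparison, here's a rough Python take on the same sed one-liner. It's a sketch that assumes nginx's default "combined" log format; a customized log_format would need its own pattern. The function names parse and summarize are hypothetical, and the SAMPLE lines are invented (addresses from documentation ranges, made-up sizes and user agents) to mirror the output above.

```python
import re
from collections import Counter

# Assumes nginx's default "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'(?P<addr>\S+) \S+ (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse(lines):
    """Yield a dict of named fields for each matching line; skip the rest."""
    for line in lines:
        m = COMBINED.match(line)
        if m:
            yield m.groupdict()

def summarize(lines):
    """Print '[time] METHOD path status' per request and count status codes."""
    counts = Counter()
    for rec in parse(lines):
        print(f"[{rec['time']}] {rec['method']} {rec['path']} {rec['status']}")
        counts[rec["status"]] += 1
    return counts

# Hypothetical sample lines; the last one is a malformed (TLS-on-HTTP) request
SAMPLE = [
    '203.0.113.5 - - [13/Aug/2023:00:06:00 -0400] "GET /.env HTTP/1.1" 404 153 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [13/Aug/2023:00:06:01 -0400] "POST / HTTP/1.1" 405 157 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [13/Aug/2023:00:07:15 -0400] "\\x16\\x03\\x01" 400 157 "-" "-"',
]

counts = summarize(SAMPLE)
print(counts)
```

Against a real log you'd pass the open file instead, e.g. with open("/var/log/nginx/access.log", errors="replace") as f: print(summarize(f)). Unlike the sed version, lines that don't match are skipped rather than echoed, and the named groups mean each field can be used by name (for counting, filtering, and so on) instead of by position.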