awk for Log Parsing: 5 Patterns You'll Actually Use

By SumGuy 6 min read
awk Is a Mini-Language

Most people treat awk as a line processor. It is, but it’s also a full programming language with variables, loops, and functions. For logs, you barely need any of that: five patterns cover most of the work.

The basic structure:

Terminal window
awk 'pattern { action }' file.log

If the pattern matches, the action runs. If you omit the pattern, the action runs for every line.

1. Filter Lines by Condition

Extract 404 errors from an Apache log:

Terminal window
awk '$9 == 404' access.log

$9 is the 9th field (the HTTP status code). This prints every line where status is 404.

More complex: status code AND a path:

Terminal window
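awk '$9 == 500 && $7 ~ /^\/api\//' access.log

$7 is the path. ~ means “matches regex”, and the ^ anchors the match to the start of the path, so this gets 500 errors from /api/* paths only. A bare /api/ is just the regex api, which would also match a path like /capital.html.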

Need the opposite? Use !~:

Terminal window
awk '$7 !~ /favicon|static/' access.log

Exclude favicon and static requests.
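To try the filters without touching a real log, here is a minimal sketch. The file name sample_access.log and its four lines are made up for illustration; they follow the Apache combined format, so $7 is the path and $9 is the status.

```shell
# Hypothetical sample data in Apache combined log format (invented for this example).
cat > sample_access.log <<'EOF'
192.0.2.10 - - [17/Jun/2025:08:59:12 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "curl/8.5.0"
192.0.2.11 - - [17/Jun/2025:09:15:40 +0000] "GET /api/users HTTP/1.1" 500 312 "-" "curl/8.5.0"
192.0.2.12 - - [17/Jun/2025:12:01:03 +0000] "GET /favicon.ico HTTP/1.1" 404 0 "-" "curl/8.5.0"
192.0.2.10 - - [17/Jun/2025:17:30:55 +0000] "POST /api/orders HTTP/1.1" 500 298 "-" "curl/8.5.0"
EOF

# Keep 500 responses whose path starts with /api/, and print just the path.
api_errors=$(awk '$9 == 500 && $7 ~ /^\/api\// { print $7 }' sample_access.log)
echo "$api_errors"
# /api/users
# /api/orders
```

Adding an action block ({ print $7 }) after the pattern is what trims the output from whole lines down to one field.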

2. Count Occurrences

How many 404s are in your log?

Terminal window
awk '$9 == 404' access.log | wc -l

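awk can also do the counting itself, with no pipe:

Terminal window
awk '$9 == 404 { count++ } END { print count + 0 }' access.log

{ count++ } runs for each matching line. END runs after all lines. print count + 0 outputs the total; the + 0 forces a number, so you get 0 rather than an empty line when nothing matched.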

Count by status code:

Terminal window
awk '{ status[$9]++ } END { for (code in status) print code, status[code] }' access.log

status[$9] is an associative array keyed by HTTP status. After processing all lines, loop through and print counts.

Output (the for-in loop visits keys in no guaranteed order, so pipe through sort if the order matters):

200 15432
404 234
500 12
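Since awk does not promise any order for a for-in loop, one way to get a stable report is to sort the output by count. A small sketch, assuming a hypothetical file sample_status.txt that holds only status codes (so the code is $1 here, where a real access log would use $9):

```shell
# Hypothetical input: one status code per line.
printf '%s\n' 200 404 200 500 200 404 > sample_status.txt

# Tally each code, then sort on column 2 (the count), numerically, highest first.
status_counts=$(awk '{ n[$1]++ } END { for (c in n) print c, n[c] }' sample_status.txt | sort -k2,2nr)
echo "$status_counts"
# 200 3
# 404 2
# 500 1
```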

3. Sum a Field

Your app logs request latency in milliseconds. What’s the total? The average?

Terminal window
awk -F: '{ total += $3; count++ } END { print "Total:", total, "ms | Average:", total/count, "ms" }' latency.log

-F: sets the field separator to : (useful if your log format uses colons). $3 is the latency field. Accumulate in total, count lines, then divide for average at the end.

Input log lines:

request:user123:145
request:user456:89
request:user789:201

Output:

Total: 435 ms | Average: 145 ms
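The average divides by the line count, which is zero for an empty file. A guarded variant might look like the sketch below; it reuses the three sample lines above, and the file name sample_latency.log and the "no data" message are just choices made for this example.

```shell
cat > sample_latency.log <<'EOF'
request:user123:145
request:user456:89
request:user789:201
EOF

latency_summary=$(awk -F: '
  { total += $3; count++ }                     # accumulate latency and line count
  END {
    if (count == 0) { print "no data"; exit }  # avoid dividing by zero
    printf "total=%d ms avg=%.1f ms\n", total, total / count
  }' sample_latency.log)
echo "$latency_summary"
# total=435 ms avg=145.0 ms
```

printf also pins the number of decimals, which plain print leaves to awk's default number formatting.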

4. Extract and Reformat

You have tab-separated logs. Extract name and email, reformat as CSV:

Terminal window
awk -F'\t' '{print $2 "," $3}' users.log

Input (tab-separated):

ID Name Email

Output:

Name,Email

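More advanced: extract a time-of-day range:

Terminal window
awk -F'[: ]' '$5 >= 9 && $5 < 17' access.log

-F'[: ]' uses multiple delimiters (colon or space). In a combined-format line, $4 is then the date part of the timestamp and $5 is the hour, so this gets logs between 9 AM and 5 PM. It assumes the client address contains no colons of its own; an IPv6 address would shift the fields.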
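Field positions under a multi-character separator are easy to miscount, so it helps to check them on a few lines first. A sketch with made-up combined-format lines (sample_hours.log is a hypothetical file): splitting on colon or space puts the hour in $5, the minute in $6, and the path in $10, assuming an IPv4 client address.

```shell
# Hypothetical sample lines (invented for this example).
cat > sample_hours.log <<'EOF'
192.0.2.10 - - [17/Jun/2025:08:59:12 +0000] "GET /index.html HTTP/1.1" 200 5120
192.0.2.11 - - [17/Jun/2025:09:15:40 +0000] "GET /api/users HTTP/1.1" 500 312
192.0.2.12 - - [17/Jun/2025:16:59:59 +0000] "GET /about HTTP/1.1" 200 2048
192.0.2.10 - - [17/Jun/2025:17:30:55 +0000] "POST /api/orders HTTP/1.1" 500 298
EOF

# $5 + 0 forces a numeric comparison, so "09" is treated as 9.
business_hours=$(awk -F'[: ]' '$5 + 0 >= 9 && $5 + 0 < 17 { print $5 ":" $6, $10 }' sample_hours.log)
echo "$business_hours"
# 09:15 /api/users
# 16:59 /about
```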

5. Conditional Formatting

Print lines longer than 1000 characters with line numbers:

Terminal window
awk 'length > 1000 { print NR": " $0 }' largefile.log

length is the line length. NR is the line number. $0 is the entire line.

Another common one: print lines matching a pattern with the 2 lines before them for context:

Terminal window
awk '/ERROR/ { for (i = 2; i >= 1; i--) if ((NR - i) in a) print a[NR - i]; print NR": " $0 } { a[NR] = $0 }' app.log

Actually, that’s getting gnarly, and it still doesn’t print the lines after the match. For context, use grep:

Terminal window
grep -B 2 -A 2 'ERROR' app.log

But within awk? Mark important lines:

Terminal window
awk '/ERROR|FATAL/ { print "*** " $0 " ***" } !/ERROR|FATAL/ { print $0 }' app.log

Real Example: Analyzing a Web Server Log

You have an Apache log. You want:

  1. Count requests per HTTP status
  2. Find the largest responses
  3. Show only non-2xx/3xx responses

Terminal window
# 1. Counts by status
awk '{ status[$9]++ } END { for (s in status) print s ": " status[s] }' access.log | sort -t: -k2 -rn
# 2. Top 10 largest responses ($10 is bytes sent in the common/combined format)
awk '{ print $10, $7 }' access.log | sort -rn | head -10
# 3. Filter to error codes
awk '$9 ~ /^[45]/' access.log

Command 1: count by $9 (status), sort by count descending. Command 2: print response size in bytes ($10), then path ($7), and sort by size descending. The stock combined format does not log response time; if your LogFormat appends one (for example %D), point the same command at that field to get the slowest requests instead. Command 3: regex match; if status starts with 4 or 5, print the line.

When to Stop Using awk

If your log parsing needs more than these patterns (structured JSON logs, for example), then use jq for JSON or switch to Python. awk is fast and scriptable, but it has limits.

For everything else (filtering, summing, reformatting), awk is the decades-old tool (it dates from 1977) that still outperforms the new hotness.

The Gotcha That Will Bite You: Field Separators and Whitespace

Here’s the thing people get wrong with awk at 2 AM: the default field separator is “any whitespace,” which means consecutive spaces get collapsed into a single delimiter. That’s usually fine—until you’re parsing something with intentional empty fields.

Say your app log looks like this:

Terminal window
$ cat app.log
2025-06-17 ERROR api/users 42ms
2025-06-17 INFO api/health 5ms
2025-06-17 ERROR 15ms

See that empty field on line 3? With the default separator, $3 won’t be empty—awk squashes the whitespace and shifts every field left. Your latency ends up in the wrong column.

If you control the log format, write it tab-separated and split on a literal tab:

Terminal window
$ awk -F'\t' '{ print $2, $4 }' app.log
ERROR 42ms
INFO 5ms
ERROR 15ms

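Or if you’re stuck with an existing format where columns are padded with runs of spaces and a value may contain a single space, use a regex separator:

Terminal window
$ awk -F' {2,}' '{ print NF, $0 }' app.log

-F' {2,}' splits on two or more spaces, so a single space inside a field doesn’t accidentally become a delimiter. This is the approach that saves you when logs use aligned columns padded with spaces. It won’t reveal a blank column, though: the padding around it just merges into one long separator. Some older awk builds don’t support the {2,} interval syntax; '  +' (two spaces and a plus) is an equivalent spelling.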
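To actually detect an empty field, one option is a bracketed single space as the separator. Unlike the default, -F'[ ]' treats every space as its own delimiter, so two spaces in a row produce an empty field between them. The sketch below assumes a hypothetical log (sample_app.log) whose fields are separated by exactly one space each.

```shell
# Hypothetical log: the second line is missing its third field,
# which shows up as two consecutive spaces.
printf '%s\n' \
  '2025-06-17 ERROR api/users 42ms' \
  '2025-06-17 ERROR  15ms' > sample_app.log

# Default splitting collapses the gap, so line 2 has only 3 fields.
default_fields=$(awk 'NR == 2 { print NF }' sample_app.log)
# Splitting on every single space keeps the empty field: 4 fields.
strict_fields=$(awk -F'[ ]' 'NR == 2 { print NF }' sample_app.log)
# Report lines whose third field is empty.
missing=$(awk -F'[ ]' '$3 == "" { print "line " NR ": empty field 3" }' sample_app.log)

echo "default NF=$default_fields, strict NF=$strict_fields"
echo "$missing"
# default NF=3, strict NF=4
# line 2: empty field 3
```

The trade-off is that this only works when the writer never pads with extra spaces; on aligned, padded columns every pad character would become its own empty field.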

One more: if you’re piping awk into sort and the numbers aren’t sorting correctly, you’re probably sorting lexicographically. Add -k1,1n (numeric sort) to fix it:

Terminal window
$ awk '{ print $10, $7 }' access.log | sort -k1,1n | tail -10

That’s the difference between “top 10 by bytes” and “top 10 alphabetically by the first digit of bytes.” Without it, a 9-byte response sorts above a 10000-byte one. Fun times.
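The effect is easy to reproduce on three numbers. A quick sketch, with sample_sizes.txt as a made-up file of byte counts:

```shell
# Hypothetical byte counts, one per line.
printf '%s\n' 9 10000 250 > sample_sizes.txt

lexical_top=$(sort sample_sizes.txt | tail -1)     # compares character by character
numeric_top=$(sort -n sample_sizes.txt | tail -1)  # compares as numbers
echo "lexical: $lexical_top  numeric: $numeric_top"
# lexical: 9  numeric: 10000
```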

