awk for Log Parsing: 5 Patterns You'll Actually Use

awk Is a Mini-Language

Most people treat awk as a line processor. It is, but it’s also a full programming language with variables, loops, and functions. For logs, you barely need that: five patterns cover most of what you’ll actually do.

The basic structure:

awk 'pattern { action }' file.log

If the pattern matches, the action runs. Omit the pattern and the action runs for every line; omit the action and awk prints every line the pattern matches.
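Here is a quick sketch of all three forms side by side. The three input lines are made up (not a real log format) and are piped in on stdin, so there is no file to create.

```shell
sample='GET /index.html 200
GET /missing 404
POST /api/login 500'

# Pattern only: no action, so awk prints each matching line.
pattern_only=$(printf '%s\n' "$sample" | awk '$3 == 404')

# Action only: no pattern, so the action runs on every line.
action_only=$(printf '%s\n' "$sample" | awk '{ print $2 }')

# Pattern and action: the action runs only where the pattern is true.
both=$(printf '%s\n' "$sample" | awk '$3 >= 500 { print $1, $2 }')

printf '%s\n---\n%s\n---\n%s\n' "$pattern_only" "$action_only" "$both"
```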

1. Filter Lines by Condition

Extract 404 errors from an Apache log:

awk '$9 == 404' access.log

$9 is the 9th whitespace-separated field, which in Apache’s common and combined log formats is the HTTP status code. There’s no action, so awk prints every line where the status is 404.

More complex: status code AND a path:

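awk '$9 == 500 && $7 ~ /^\/api\//' access.log

$7 is the request path, and ~ means “matches regex”. This gets 500 errors from paths that start with /api/. A bare /api/ would match “api” anywhere in the path, including something like /static/api-docs.css.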

Need the opposite? Use !~:

awk '$7 !~ /favicon|static/' access.log

Exclude favicon and static requests.
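To try these filters without a real server log, here is a sketch that writes four made-up requests in Apache’s common log format to a temp file and runs each filter against it, printing just the path ($7) to keep the output short. The /api/ check is anchored (^\/api\/), so the /static/api-docs.css line, which contains “api” but isn’t an API route, doesn’t match.

```shell
log=$(mktemp)
# Four made-up requests in common log format: $7 is the path, $9 the status.
cat > "$log" <<'EOF'
10.0.0.1 - - [17/Jun/2025:09:15:02 +0000] "GET /api/users HTTP/1.1" 500 512
10.0.0.2 - - [17/Jun/2025:10:01:44 +0000] "GET /missing.html HTTP/1.1" 404 209
10.0.0.3 - - [17/Jun/2025:18:30:00 +0000] "GET /favicon.ico HTTP/1.1" 200 1150
10.0.0.4 - - [17/Jun/2025:12:00:00 +0000] "GET /static/api-docs.css HTTP/1.1" 500 0
EOF

# Status filter: only the 404.
not_found=$(awk '$9 == 404 { print $7 }' "$log")

# Status plus anchored path: /static/api-docs.css contains "api" but is not under /api/.
api_errors=$(awk '$9 == 500 && $7 ~ /^\/api\// { print $7 }' "$log")

# Negated match: drop favicon and static requests.
kept=$(awk '$7 !~ /favicon|static/ { print $7 }' "$log")

rm -f "$log"
printf '%s\n---\n%s\n---\n%s\n' "$not_found" "$api_errors" "$kept"
```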

2. Count Occurrences

How many 404s are in your log?

awk '$9 == 404' access.log | wc -l

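Or let awk do the counting itself, with no second process:

awk '$9 == 404 { count++ } END { print count + 0 }' access.log

{ count++ } runs for each matching line. END runs after all lines. print count + 0 outputs the total; adding 0 makes it print 0 instead of an empty line when nothing matched.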

Count by status code:

awk '{ status[$9]++ } END { for (code in status) print code, status[code] }' access.log

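status[$9] is an associative array keyed by HTTP status. After processing all lines, the END block loops through it and prints each count. for (code in status) visits the keys in no particular order, so pipe the output through sort if you want it ordered.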

Output:

200 15432
404 234
500 12
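Here is a sketch of both counting styles on a made-up five-line log. Two details it handles: print count + 0 shows 0 rather than a blank line when nothing matched, and because for (code in status) has no defined order, the per-status output goes through sort -n.

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [17/Jun/2025:09:00:01 +0000] "GET / HTTP/1.1" 200 612
10.0.0.2 - - [17/Jun/2025:09:00:02 +0000] "GET /about HTTP/1.1" 200 1024
10.0.0.3 - - [17/Jun/2025:09:00:03 +0000] "GET /nope HTTP/1.1" 404 209
10.0.0.4 - - [17/Jun/2025:09:00:04 +0000] "GET /api/users HTTP/1.1" 500 87
10.0.0.5 - - [17/Jun/2025:09:00:05 +0000] "GET /contact HTTP/1.1" 200 733
EOF

# Single count. "+ 0" turns an unset counter into 0.
count_404=$(awk '$9 == 404 { count++ } END { print count + 0 }' "$log")
count_301=$(awk '$9 == 301 { count++ } END { print count + 0 }' "$log")

# Count per status. for-in order is unspecified, so sort by code.
by_status=$(awk '{ status[$9]++ } END { for (code in status) print code, status[code] }' "$log" | sort -n)

rm -f "$log"
printf '404s: %s, 301s: %s\n%s\n' "$count_404" "$count_301" "$by_status"
```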

3. Sum a Field

Your app logs request latency in milliseconds. What’s the total? The average?

awk -F: '{ total += $3; count++ } END { if (count) print "Total:", total, "ms | Average:", total/count, "ms" }' latency.log

-F: sets the field separator to : (useful if your log format uses colons). $3 is the latency field. Accumulate in total, count lines, then divide for the average at the end. The if (count) guard keeps an empty file from triggering a division-by-zero error.

Input log lines:

request:user123:145
request:user456:89
request:user789:201

Output:

Total: 435 ms | Average: 145 ms
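The same accumulate-then-report idea extends to other aggregates. As a sketch using the three sample lines above (the type:user:ms layout is just this example’s assumption), this adds a running maximum and formats the average with printf.

```shell
lines='request:user123:145
request:user456:89
request:user789:201'

stats=$(printf '%s\n' "$lines" | awk -F: '
  {
    ms = $3 + 0                        # force a number
    total += ms; count++
    if (count == 1 || ms > max) max = ms
  }
  END {
    if (count)                         # no division by zero on empty input
      printf "Total: %d ms | Average: %.1f ms | Max: %d ms\n", total, total / count, max
  }')

# Empty input: the END block still runs, but the guard means it prints nothing.
empty=$(printf '' | awk -F: '{ total += $3; count++ } END { if (count) print total / count }')

echo "$stats"
```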

4. Extract and Reformat

You have tab-separated logs. Extract name and email, reformat as CSV:

awk -F'\t' '{print $2 "," $3}' users.log

Input (tab-separated):

ID  Name  Email
1  alice  alice@example.com
2  bob  bob@example.com

Output:

Name,Email
alice,alice@example.com
bob,bob@example.com
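A sketch of the same extraction using OFS (the output field separator) instead of gluing "," by hand, plus NR > 1 to drop the header row. The example.com addresses are placeholders. Neither version quotes fields, so a value containing a comma would break the CSV.

```shell
tsv=$(printf 'ID\tName\tEmail\n1\talice\talice@example.com\n2\tbob\tbob@example.com')

# OFS is what print puts between comma-separated arguments, so this matches $2 "," $3.
csv=$(printf '%s\n' "$tsv" | awk -F'\t' -v OFS=, '{ print $2, $3 }')

# NR > 1 skips the header row.
rows=$(printf '%s\n' "$tsv" | awk -F'\t' -v OFS=, 'NR > 1 { print $2, $3 }')

printf '%s\n---\n%s\n' "$csv" "$rows"
```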

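More advanced: filter by time of day:

awk -F'[: ]' '$5 >= 9 && $5 < 17' access.log

-F'[: ]' treats every colon and every space as a delimiter. A common-log timestamp looks like [10/Oct/2000:13:55:36 -0700], so $4 is [10/Oct/2000 and $5 is the hour. Comparing the hour against numbers keeps the comparison numeric, so this gets requests from 9 AM up to 5 PM on any date in the file. (This assumes IPv4 client addresses; an IPv6 address brings its own colons and shifts every field.)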
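Here is a sketch that checks the boundaries on four made-up common-log lines. With -F'[: ]' the hour lands in $5, and comparing it against plain numbers (not strings like "09:00") keeps the comparison numeric, so 09:00:00 is in and 17:00:00 is out.

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [17/Jun/2025:08:59:59 +0000] "GET /early HTTP/1.1" 200 10
10.0.0.2 - - [17/Jun/2025:09:00:00 +0000] "GET /open HTTP/1.1" 200 10
10.0.0.3 - - [17/Jun/2025:16:59:59 +0000] "GET /late HTTP/1.1" 200 10
10.0.0.4 - - [17/Jun/2025:17:00:00 +0000] "GET /closed HTTP/1.1" 200 10
EOF

# With FS=[: ] the timestamp splits into $4="[17/Jun/2025", $5=hour, $6=minute, $7=second,
# so the request path moves from $7 to $10.
# $5 looks numeric, so comparing it with 9 and 17 is a numeric comparison ("09" is 9).
in_hours=$(awk -F'[: ]' '$5 >= 9 && $5 < 17 { print $5, $10 }' "$log")

rm -f "$log"
echo "$in_hours"
```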

5. Conditional Formatting

Print lines longer than 1000 characters with line numbers:

awk 'length > 1000 { print NR": " $0 }' largefile.log

length is the line length. NR is the line number. $0 is the entire line.
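Passing the threshold in with -v saves editing the program each time. A sketch with a made-up three-line input and a limit of 10 (rather than 1000, to keep the sample short); limit is just a variable name chosen here.

```shell
# Hypothetical input: a 20-character line between two short ones.
flagged=$(printf '%s\n' 'short' '0123456789abcdefghij' 'tiny' |
  awk -v limit=10 'length($0) > limit { print NR ": " length($0) " chars" }')

echo "$flagged"
```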

Another common one: print each line matching a pattern, along with the 2 lines before it:

awk '/ERROR/ { for (i = 2; i >= 1; i--) if ((NR - i) in a) print a[NR - i]; print NR": " $0 } { a[NR] = $0; delete a[NR - 2] }' app.log

Actually, that’s getting gnarly, and it still doesn’t show the lines after the match (or avoid repeating lines when two errors are close together). For context, use grep:

grep -B 2 -A 2 'ERROR' app.log

But within awk? Mark important lines:

awk '/ERROR|FATAL/ { print "*** " $0 " ***" } !/ERROR|FATAL/ { print $0 }' app.log
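If you really want grep-style context in pure awk, here is roughly what it takes. This is a sketch of a hypothetical shell function, context, hard-wired to ERROR with 2 lines on each side. It keeps a rolling buffer of recent lines, counts down after each match, and remembers the last printed line number so nothing prints twice. grep -B 2 -A 2 does the same job (and adds -- between groups that aren’t adjacent).

```shell
# Hypothetical helper: each ERROR line plus up to 2 lines of context on each side,
# every line printed at most once.
context() {
  awk -v before=2 -v after=2 '
    /ERROR/ {
      # Flush buffered lines not printed yet, oldest first.
      for (i = NR - before; i < NR; i++)
        if ((i in buf) && i > last) print buf[i]
      print; last = NR; left = after
    }
    !/ERROR/ && left > 0 { print; last = NR; left-- }
    # Keep only the most recent lines in memory.
    { buf[NR] = $0; delete buf[NR - before] }
  ' "$@"
}

out=$(printf '%s\n' a b c 'ERROR one' d e f g 'ERROR two' h | context)
echo "$out"
```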

Real Example: Analyzing a Web Server Log

You have an Apache log. You want:

Count requests per HTTP status
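Find the largest responses
Show only 4xx and 5xx responses

# 1. Counts by status
awk '{ status[$9]++ } END { for (s in status) print s ": " status[s] }' access.log | sort -t: -k2 -rn

# 2. Top 10 largest responses (bytes sent)
awk '{ print $10, $7 }' access.log | sort -rn | head -10

# 3. Filter to error codes
awk '$9 ~ /^[45]/' access.log

Command 1 counts by $9 (status), then sorts on the count after the colon, highest first. Command 2 prints the response size in bytes ($10), then the path ($7), and sorts by size, largest first. Command 3 is a regex match: if the status starts with 4 or 5, print the line.

The common and combined formats don’t record how long a request took, so they can’t tell you which requests were slowest. If you add %D (microseconds) or %T (seconds) to your LogFormat, print that field instead of $10 to rank requests by time.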
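Each command above reads the whole log, so the three of them read it three times. As a sketch, a single awk pass can collect all three answers; the sample lines are made up and the report layout is just for illustration. It skips a - in the bytes field, which the common log format writes when no body was sent.

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [17/Jun/2025:09:15:02 +0000] "GET /api/users HTTP/1.1" 500 512
10.0.0.2 - - [17/Jun/2025:10:01:44 +0000] "GET /missing.html HTTP/1.1" 404 209
10.0.0.3 - - [17/Jun/2025:11:30:00 +0000] "GET /index.html HTTP/1.1" 200 4096
10.0.0.4 - - [17/Jun/2025:12:00:00 +0000] "HEAD /index.html HTTP/1.1" 200 -
EOF

report=$(awk '
  { status[$9]++ }                             # requests per status
  $9 ~ /^[45]/ { errors++ }                    # 4xx and 5xx
  $10 ~ /^[0-9]+$/ && $10 + 0 > max {          # skip "-" (no body sent)
    max = $10 + 0; biggest = $7
  }
  END {
    for (s in status) print "status", s, status[s]
    print "errors", errors + 0
    print "largest", max + 0, biggest
  }' "$log" | LC_ALL=C sort)

rm -f "$log"
echo "$report"
```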

When to Stop Using awk

If your log parsing needs:

Complex regex with named groups
Multiple passes over the data
JSON parsing

Then use jq (JSON) or switch to Python. awk is fast and scriptable, but it has limits.

For everything else (filtering, summing, reformatting), awk has been doing the job since 1977 and still holds its own against the new hotness.

The Gotcha That Will Bite You: Field Separators and Whitespace

Here’s the thing people get wrong with awk at 2 AM: the default field separator is “any whitespace,” which means consecutive spaces get collapsed into a single delimiter. That’s usually fine—until you’re parsing something with intentional empty fields.

Say your app log looks like this:

$ cat app.log
2025-06-17 ERROR  api/users  42ms
2025-06-17 INFO   api/health 5ms
2025-06-17 ERROR             15ms

See that empty field on line 3? With the default separator, $3 won’t be empty—awk squashes the whitespace and shifts every field left. Your latency ends up in the wrong column.

If you control the log format, write it tab-separated instead. An empty field is then just two adjacent tabs, and -F'\t' keeps it in its place:

$ awk -F'\t' '{ print $2, $4 }' app.log
ERROR 42ms
INFO 5ms
ERROR 15ms
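To see the difference in isolation, here is a sketch that writes two made-up records tab-separated, the second with an empty path, and splits them both ways.

```shell
# Two records, tab-separated; the second has an empty path (two tabs in a row).
tabbed=$(printf '2025-06-17\tERROR\tapi/users\t42ms\n2025-06-17\tERROR\t\t15ms')

# Default FS: runs of spaces and tabs collapse, so record 2 has only 3 fields.
default_split=$(printf '%s\n' "$tabbed" | awk '{ print NF, $3 }')

# Tab FS: each tab is its own separator, so the empty path stays as an empty $3.
tab_split=$(printf '%s\n' "$tabbed" | awk -F'\t' '{ print NF, $4 }')

printf '%s\n---\n%s\n' "$default_split" "$tab_split"
```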

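If you’re stuck with the space-padded format, a regex separator at least shows you where the structure breaks:

$ awk -F' {2,}' '{ print NF, $0 }' app.log

-F' {2,}' splits on runs of two or more spaces, so a single space inside a field doesn’t become a delimiter. Printing NF in front of each line shows how many fields it split into, and a line with a missing column comes out short. It can’t tell you which column is empty, though: in space-padded output an empty column is just a longer run of spaces, and columns separated by a single space (the date and level here) still merge into one field. For truly fixed-width columns, cut by position with substr() instead. If your awk doesn’t understand the {2,} interval (older mawk builds don’t), -F'  +' (two spaces, then +) means the same thing.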
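Here is a sketch of the position-based approach, assuming the columns really are fixed-width. The layout (level in columns 12-17, path in 19-29, latency from 31) is made up for this sample, not something awk or your logger guarantees; gawk users can get a similar effect with the gawk-only FIELDWIDTHS variable.

```shell
# Hypothetical fixed-width layout for this sample only:
# date in columns 1-10, level 12-17, path 19-29, latency from 31.
log=$(printf '%-10s %-6s %-11s %s\n' \
  2025-06-17 ERROR api/users 42ms \
  2025-06-17 INFO api/health 5ms \
  2025-06-17 ERROR '' 15ms)

fields=$(printf '%s\n' "$log" | awk '
  function trim(s) { sub(/ +$/, "", s); return s }
  {
    level = trim(substr($0, 12, 6))
    path  = trim(substr($0, 19, 11))
    if (path == "") path = "(none)"       # an empty column is now visible
    print level "|" path "|" substr($0, 31)
  }')

echo "$fields"
```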

One more: if you’re piping awk into sort and the numbers aren’t sorting correctly, you’re probably sorting lexicographically. Add -k1,1n (numeric sort) to fix it:

$ awk '{ print $10, $7 }' access.log | sort -k1,1n | tail -10

That’s the difference between “top 10 by bytes” and “top 10 by whatever sorts last as text.” Without -n, 10000 sorts before 9 because 1 comes before 9, so your biggest responses can vanish from the tail. Fun times.
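A quick sketch of the failure with three made-up size/path pairs: the plain text sort puts 9 last, while -k1,1n puts 10000 last. LC_ALL=C only makes the text ordering repeatable across locales.

```shell
sizes='9 /small
10000 /big
250 /medium'

# Text sort: "10000" sorts first because "1" < "2" < "9".
text_last=$(printf '%s\n' "$sizes" | LC_ALL=C sort | tail -1)

# Numeric sort on field 1: 9 < 250 < 10000.
numeric_last=$(printf '%s\n' "$sizes" | sort -k1,1n | tail -1)

printf 'text: %s\nnumeric: %s\n' "$text_last" "$numeric_last"
```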
