Identifying main traffic sources with netstat and awk (one-liner explained)

March 27, 2011

This is just a short guide on how to find the main offenders when a web server is being hammered.

Sample of the final output:

netstat -natp | grep :80 | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n | tail

     25 195.150.23.130
     25 67.222.164.140
     28 95.34.20.117
     31 72.45.232.204
     34 209.56.4.6
     36 64.27.200.208
    106 50.17.245.114
    112 209.234.226.230
    247 216.242.75.236
    283 184.106.21.219

This could be applied to any other sort of TCP abuse, of course: just grep for some other port (e.g. instead of “:80” use “:25” for mail).
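Swapping the port is less error-prone if the one-liner is wrapped in a small shell function. This is a sketch, not part of the original setup. The name top_offenders and its optional count argument are hypothetical. The function reads netstat output on standard input, so you can try it on saved output as well as live data. It keeps the same loose grep, so port 25 would also match something like :2525.

```bash
#!/usr/bin/env bash
# Hypothetical helper: top_offenders PORT [COUNT]
# Reads `netstat -natp`-style output on stdin and prints the COUNT busiest
# remote IPs among lines containing ":PORT" (same loose grep as the one-liner).
top_offenders() {
    local port="$1"
    local count="${2:-10}"   # default to 10 lines, like plain `tail`
    grep ":${port}" |
        awk '{print $5}' |
        cut -d: -f1 |
        sort | uniq -c | sort -n |
        tail -n "$count"
}

# Usage on live data:
#   netstat -natp | top_offenders 80
#   netstat -natp | top_offenders 25 5
```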

I will break it down with some basic explanations along the way for those less familiar with the command line.

First of all, how many connections are there to the web server?

netstat -natp | grep :80 | wc -l
459

netstat is a very versatile tool.
In this case, the flags being used state the following:

-n: Numerical representation of hosts and ports, rather than attempting to resolve them to names
-a: All sockets (listening and non-listening)
-t: TCP traffic only (UDP is a whole other ballgame)
-p: The PID and name of the program using each socket. Just force of habit for me, since I usually want to know who is listening on and taking up a port

grep :80

Since this example deals with a web server, port 80 is a pretty good representation of a connection to the web server.
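One caveat with grep :80 is that it matches “:80” anywhere on the line. It therefore also catches ports such as 8080, and outgoing connections from this server to a remote port 80. The 69.4.187.136 lines in the sample output below are exactly that. A stricter alternative is to let awk test only the Local Address column. This is a minimal sketch that assumes netstat's usual layout, with the local address in field 4. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep only lines whose Local Address (field 4) ends in ":PORT".
# Unlike `grep :80`, this skips ":8080" and connections to a *remote* port 80.
local_port_lines() {
    awk -v port="$1" '$4 ~ (":" port "$")'
}

# Usage:
#   netstat -nat | local_port_lines 80 | wc -l
```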

wc -l

Count the lines (just to see what kind of numbers we are dealing with).

A typical output for netstat -natp | grep :80:

tcp        0      0 123.231.146.176:80          92.35.20.117:12205          TIME_WAIT   -                  
tcp        0      0 123.231.146.176:80          92.35.20.117:64428          TIME_WAIT   -                  
tcp        0      0 123.231.146.190:80          92.35.20.117:20645          TIME_WAIT   -                  
tcp        0   2885 123.231.146.176:80          92.35.20.117:57267          ESTABLISHED 10439/nginx: worker
tcp        0      0 123.231.146.190:80          92.35.20.117:50365          TIME_WAIT   -                  
tcp        0      0 123.231.146.176:80          92.35.20.117:52670          TIME_WAIT   -                  
tcp        0      0 123.231.146.176:40214       69.4.187.136:80             TIME_WAIT   -                  
tcp        0      0 123.231.146.176:40199       69.4.187.136:80             TIME_WAIT   -                  
tcp        0      0 123.231.146.176:40248       69.4.187.136:80             TIME_WAIT   -                  
tcp        0      0 123.231.146.176:40229       69.4.187.136:80             TIME_WAIT   -                  
tcp        0      0 123.231.146.176:40151       69.4.187.136:80             TIME_WAIT   -                  
tcp        0      0 123.231.146.176:40185       69.4.187.136:80             TIME_WAIT   -                  
tcp        0      0 123.231.146.176:40169       69.4.187.136:80             TIME_WAIT   -                  
tcp        0      0 123.231.146.176:80          92.35.20.117:10938          FIN_WAIT2   -                  
tcp        0      0 123.231.146.176:80          92.35.20.117:63684          TIME_WAIT   -                  
tcp        0      0 123.231.146.190:80          92.35.20.117:62401          TIME_WAIT   -

All addresses have been obfuscated to protect the innocent…

The column we are mainly interested in is the “Foreign Address” column.
Here we see the origin of the connections to our server.

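Here is a small tip that is a bit beyond the scope of this post but worth a short detour. If you see an overwhelming number of connections lingering in FIN_WAIT2 or TIME_WAIT, you might want to check the sysctl variable “net.ipv4.tcp_fin_timeout”. It defaults to 60 seconds, which can be quite a lot for heavy-traffic web servers. Strictly speaking, that setting only governs how long orphaned connections stay in FIN_WAIT2. On Linux, the TIME_WAIT interval itself is a fixed 60 seconds compiled into the kernel.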
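Before touching any sysctl, it is worth checking which states actually dominate. You can tally the State column (field 6 of netstat -nat output) with the same sort | uniq -c pattern this post builds up. This is a sketch, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count sockets per TCP state from `netstat -nat` output on stdin.
# Only lines whose first field starts with "tcp" (tcp and tcp6) are counted,
# which skips netstat's header lines. Busiest state is printed first.
states_summary() {
    awk '$1 ~ /^tcp/ {print $6}' | sort | uniq -c | sort -rn
}

# Usage:
#   netstat -nat | states_summary
#   sysctl net.ipv4.tcp_fin_timeout   # show the current value
```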

Anyhow, from here we will do a little text manipulation with native *nix tools (awk, cut, sort and uniq) to get a nice representation of the top port 80 TCP offenders.

awk '{print $5}'

This gives us the fifth column, the Foreign Address:

92.35.20.117:12205
92.35.20.117:64428
92.35.20.117:20645
92.35.20.117:57267
92.35.20.117:50365
92.35.20.117:52670
69.4.187.136:80
69.4.187.136:80
69.4.187.136:80
69.4.187.136:80
69.4.187.136:80
69.4.187.136:80
69.4.187.136:80
92.35.20.117:10938
92.35.20.117:63684
92.35.20.117:62401

awk is, of course, a very powerful tool. Here, however, we will only use its most common function: printing a specific column.
We still need to clean the output up a bit, though, since we don’t really care about the remote port.
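By default, awk splits each line on runs of whitespace, so the padding netstat puts between columns does not matter. A quick way to check which field number holds what is to number the fields of one line from the sample output above. Any line would do.

```bash
#!/usr/bin/env bash
# One line of the netstat sample above, kept in a variable for experimenting.
line='tcp        0   2885 123.231.146.176:80          92.35.20.117:57267          ESTABLISHED 10439/nginx: worker'

# Print every field with its number (awk splits on runs of blanks by default).
echo "$line" | awk '{for (i = 1; i <= NF; i++) print i": "$i}'

# Field 5 is the Foreign Address, which is what awk '{print $5}' extracts.
echo "$line" | awk '{print $5}'
```

Note that the program name “nginx: worker” is split into fields 7 and 8. The first six columns stay at fixed positions, so $5 is safe to rely on.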

In this case we can either invoke another awk or use the simpler cut tool. Here are the two options:

awk -F ":" '{print $1}'
cut -d: -f1

These two do essentially the same thing. In awk, the “-F” flag sets the field delimiter (in this case the colon “:”), and '{print $1}' prints the first field.
With cut, the “-d” flag sets the delimiter (again the colon), and “-f1” selects the first field.
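You can check both variants side by side. One caveat: netstat -t also lists IPv6 connections (tcp6), and their addresses contain colons themselves, so splitting on the first colon truncates them. Stripping only the text after the last colon, for example with sed, keeps both address families intact. This is a sketch. The sample addresses below are hypothetical (2001:db8:: is the IPv6 documentation prefix).

```bash
#!/usr/bin/env bash
# Sample Foreign Address values: two IPv4 and one (hypothetical) IPv6 address.
addrs='92.35.20.117:12205
69.4.187.136:80
2001:db8::1:51234'

echo "--- awk -F: (first field) ---"
printf '%s\n' "$addrs" | awk -F ":" '{print $1}'

echo "--- cut -d: -f1 (first field) ---"
printf '%s\n' "$addrs" | cut -d: -f1

echo "--- sed (drop everything from the last colon on) ---"
printf '%s\n' "$addrs" | sed 's/:[^:]*$//'
```

The awk and cut variants agree with each other but reduce the IPv6 address to “2001”. The sed variant returns “2001:db8::1”.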

Now we finally have a clean list of IP addresses, one per connection.

All that is left is to sort them, count how many times each unique IP appears, and sort the result by that count.

sort | uniq -c | sort -n

First we must sort, because uniq only collapses adjacent duplicate lines. On unsorted input, the same IP would be counted in several separate runs.

“-c” tells uniq to prefix each unique line with the number of times it occurs.

In sort, “-n” tells it to sort numerically rather than alphabetically. Otherwise “10” would come before “2”.
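The effect of each step is easy to see on a tiny, made-up list. The addresses below are sample data, not from the server above.

```bash
#!/usr/bin/env bash
ips='10.0.0.2
10.0.0.1
10.0.0.2'

echo "--- uniq -c without sort: 10.0.0.2 is counted in two separate runs ---"
printf '%s\n' "$ips" | uniq -c

echo "--- sort | uniq -c | sort -n: one line per IP, smallest count first ---"
printf '%s\n' "$ips" | sort | uniq -c | sort -n

echo "--- alphabetical vs numerical sort ---"
printf '10\n2\n' | sort      # 10 comes first
printf '10\n2\n' | sort -n   # 2 comes first
```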

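Finally, here is the complete one-liner and its output. The closing tail prints only the last 10 lines. Because sort -n orders the counts in ascending order, those are the ten busiest IPs: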

netstat -natp | grep :80 | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n | tail

     25 195.150.23.130
     25 67.222.164.140
     28 95.34.20.117
     31 72.45.232.204
     34 209.56.4.6
     36 64.27.200.208
    106 50.17.245.114
    112 209.234.226.230
    247 216.242.75.236
    283 184.106.21.219

This means that 184.106.21.219 and 216.242.75.236 alone account for 530 connections, more than the other eight IPs in the list combined (397). That might raise a few red flags…
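Raw counts are easier to judge next to the total. One way to get each IP's share of the connections is to let awk do both the counting and the arithmetic. This is a swapped-in alternative to the sort | uniq -c stage, not the method used above. It is sketched under the same column assumptions, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads one IP per line on stdin (the output of the cut stage)
# and prints "count percent% IP", busiest last, like the original one-liner.
ip_shares() {
    awk '
        { count[$1]++; total++ }    # tally per IP and overall
        END {
            # With empty input the array is empty, so nothing is divided by zero.
            for (ip in count)
                printf "%7d %5.1f%% %s\n", count[ip], 100 * count[ip] / total, ip
        }
    ' | sort -n
}

# Usage:
#   netstat -natp | grep :80 | awk '{print $5}' | cut -d: -f1 | ip_shares | tail
```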

It may seem a bit cumbersome at first, but before you know it, one-liners like these become second nature.

Hope you find it helpful!

Tom.

posted in one-liners by tom
