Linux Technical Track

SYSQL: SysAdmin Query Language

20 min read

Jul 20, 2021

What is SYSQL?

Recently Tanel Poder and I were discussing the use of Linux command pipelines for data analysis, and were wondering what to call that process.

We debated a few names, until we came to SYSQL: SysAdmin Query Language.

We liked the name, and it stuck. So SYSQL is what I’m calling the use of Linux command pipelines for data analysis.

After some time working on Linux and other Unix-based systems, most folks will become familiar with command line pipelines. Sometimes this may also be referred to as “chaining” commands.

A pipeline is nothing more than piping the output of one command to the input of another program, using the pipeline operator: |.

If you’d like to learn how to use the simple concept of command pipelines to perform some surprisingly complex data analysis, this is the post for you.

Let’s start with some simple examples.

For instance, counting the number of processes started by the grid user on a remote server:

$ ssh oracle@orasrvr ps -ugrid | wc -l
59

We can easily extend the pipeline to run through several commands, getting just the data we want.

The following command gets all the unique process names started by the grid user on orasrvr:

$ ssh oracle@orasrvr ps -ugrid | tail -n +2 | awk '{ print $NF }' | sort -u
asm_asmb_+asm1
asm_ckpt_+asm1
asm_clmn_+asm1
...
oracle_29995_+a
oracle_30009_+a
scriptagent.bin
tnslsnr

In the same manner, we can use command pipelines for ad hoc data analysis on log files, trace files and most any other text-based file.

In each of these examples, we could use some quoting to reduce the network traffic, as ssh is sending the data back to the local machine to be processed by tail, awk, etc.

Here are the commands quoted so that all runs remotely. Something to keep in mind is that there may be extra processes appearing in the output when the logon user and the target user are the same, as the ssh login and the command being run will both appear to ps.

In this case, the logon user is oracle, and the target user is grid, so the output doesn’t differ from the previous examples.

$  ssh oracle@ora192rac01 "ps -ugrid | wc -l"
60

$ ssh oracle@ora192rac01 "ps -ugrid | tail -n +2 | awk '{ print \$NF }' | sort -u"
asm_asmb_+asm1
asm_ckpt_+asm1
...
scriptagent.bin
tnslsnr

If the target and logon user are both oracle, the extra processes will appear:

$ ssh oracle@orasrvr ps -uoracle | wc -l
120

$ ssh oracle@orasrvr 'ps -uoracle | wc -l'
122

Tools of the trade

I’m going to assume you have some familiarity with the tools used here.

This won’t be a tutorial on their use, but rather, I’ll provide some examples on how you may expand on the use of these tools (for those of you not already doing so, that is).

Following are some, if not most of the tools you can use for command line analysis of files.

grep

The name of the grep command is an acronym of sorts. It stands for “globally search for regular expressions and print.”

tail

The tail command is used to display only a set number of lines from the end of its input.

head

The head command displays only a set number of lines from the beginning of a file.

sed

The sed command is a stream editor.

sed uses regular expressions to include, exclude and transform data in a stream.

cut

The cut command is used to select columns from a line, based on a delimiting character.

$ echo get,just,the,fourth,word | cut -d, -f4
fourth

$ echo get just the fourth word | cut -d' ' -f4
fourth

awk

The awk command takes its name from the authors: Alfred Aho, Peter Weinberger and Brian Kernighan.

awk is a programming language designed to work on data. Its use has been somewhat supplanted in recent years by Perl, Python and even advanced Bash features.

Even so, awk is still quite useful at times, as it was designed for ease of use in command pipelines.

A common use of awk is to cut columns from a line of text, similar to cut.

There are a few additional things awk can do.

The following example gets just the fourth word, as well as printing the number of words there are in total:

$ echo get just the fourth word | awk '{ print "fourth word: " , $4, " - number of words: ",  NF }'
fourth word:  fourth  - number of words:  5

The built-in variable NF refers to the number of fields in the line.

When prefaced with “$,” it will return the last field in the line:

$  echo get just the fourth word | awk '{ print "fourth word: " , $4, " - the last word: ",  $NF }'
fourth word:  fourth  - the last word:  word

perl

Perl is an interpreted programming language.

$ echo get just the fourth word | perl -e '$i=<STDIN>; @a = split(/\s/,$i); print "$a[3]\n"'
fourth

While Perl is quite powerful, and I have made extensive use of Perl over the years, I don’t often use CLI Perl in pipelines.

For that purpose, awk is usually easier.

The exception is the use of Perl scripts that are created as filters. I won’t be demonstrating that in this article, though.

sort

The sort command does just what the name implies; it sorts data.

The data can be sorted by the entire line in the simplest case.

You can also perform sorts on “keys” found in the line.

The data can be sorted as text or numbers, and can be shown in reverse order.

uniq

The uniq command is used to deduplicate data sent to it.

Optionally, you may also use uniq to count the number of occurrences of a string.

wc

The wc command counts lines, words and characters.

tr

You can use the tr utility to transform characters. It isn’t as robust as sed, but nonetheless useful at times.

FETCH and EXE times

Let’s use SYSQL to get some FETCH and EXE information from an Oracle SQL trace file.

The trace file trace/cdb1_ora_5689_SELECT.trc contains some trace data for an SQL we’re interested in.

Most Oracle trace files contain multiple SQL statements. In addition, the cursor number may be reused.

If the cursor number has been reused, that complicates things a bit, and a one-line probe of the trace probably won’t have the desired results.

First, we find the cursor handle.

By using grep -A 6, I can see the next six lines after the target word of PARSING:

  grep -A 6 PARSING trace/cdb1_ora_5689_SELECT.trc
PARSING IN CURSOR #140154018147080 len=52 dep=0 uid=92 oct=47 lid=92 tim=532738009488 hv=1029988163 ad='838081e8' sqlid='9babjv8yq8ru3'
BEGIN DBMS_OUTPUT.GET_LINES(:LINES, :NUMLINES); END;
END OF STMT
PARSE #140154018147080:c=44,e=44,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=0,tim=532738009487
BINDS #140154018147080:

 Bind#0
--
PARSING IN CURSOR #140154018122048 len=97 dep=1 uid=0 oct=3 lid=0 tim=532740566203 hv=791757000 ad='96aeea08' sqlid='87gaftwrm2h68'
select o.owner#,o.name,o.namespace,o.remoteowner,o.linkname,o.subname from obj$ o where o.obj#=:1
END OF STMT
PARSE #140154018122048:c=1,e=434,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=532740566202
WAIT #140154018122048: nam='PGA memory operation' ela= 10 p1=65536 p2=2 p3=0 obj#=-1 tim=532740566836
WAIT #140154018122048: nam='PGA memory operation' ela= 4 p1=65536 p2=1 p3=0 obj#=-1 tim=532740567003
WAIT #140154018122048: nam='PGA memory operation' ela= 3 p1=65536 p2=1 p3=0 obj#=-1 tim=532740567193
--
PARSING IN CURSOR #140154018097232 len=153 dep=0 uid=92 oct=3 lid=92 tim=532740569568 hv=1226046019 ad='6a8c9b20' sqlid='dzmytmp4j7yk3'
select
        s.username,
        s.sid,
        p.spid
from v$session s, v$process p
where s.sid = sys_context('userenv','sid')
--
PARSING IN CURSOR #140154018098000 len=104 dep=0 uid=92 oct=47 lid=92 tim=532743008720 hv=2439783906 ad='658980a0' sqlid='2s2qwhy8qs9g2'
declare
        vDate date;
begin
        for i in 1..1e6
        loop
                select sysdate into vDate from dual;
--
PARSING IN CURSOR #140154018142216 len=24 dep=1 uid=92 oct=3 lid=92 tim=532743014936 hv=124468195 ad='7e551ea0' sqlid='c749bc43qqfz3'
SELECT SYSDATE FROM DUAL
END OF STMT
PARSE #140154018142216:c=3689,e=5599,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=1388734953,tim=532743014934
EXEC #140154018142216:c=59,e=59,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1388734953,tim=532743015228
FETCH #140154018142216:c=34,e=35,p=0,cr=0,cu=0,mis=0,r=1,dep=1,og=1,plh=1388734953,tim=532743015381
STAT #140154018142216 id=1 cnt=1 pid=0 pos=1 obj=0 op='FAST DUAL  (cr=0 pr=0 pw=0 str=1 time=2 us cost=2 size=0 card=1)'

The SQL I’m interested in is “SELECT SYSDATE FROM DUAL.”

The cursor for this SQL (cursor #140154018142216) is being called 1,000,000 times by the PL/SQL block (cursor #140154018098000).

I can verify this in the following way:

$ grep 'FETCH #140154018142216' trace/cdb1_ora_5689_SELECT.trc | wc -l
1000000

As expected, the SQL that selects from DUAL appears 1,000,000 times.

Get EXEC and FETCH times

In a performance engagement I’ll typically use a tool such as Method R Workbench to fully analyze Oracle trace files.

At times though, I may want a simpler method to get at this data.

There are a couple of reasons I might use CLI tools for initial analysis:

I’m not allowed to transfer trace files off the server.
I’m not allowed to install any software on the server, even in my home directory.

And even when I can make use of Method R, I may want to use some simple tools to get an idea of where my efforts should be focused.

So, let’s get the FETCH and EXE times for this SQL.

To do this, I need to know which fields in these lines are of interest.

Here’s an EXEC line:

EXEC #140154018142216:c=59,e=59,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1388734953,tim=532743015228

And here’s a FETCH line:

FETCH #140154018142216:c=34,e=35,p=0,cr=0,cu=0,mis=0,r=1,dep=1,og=1,plh=1388734953,tim=532743015381

The item of interest in each line is the value “e=N,” where “e” refers to the elapsed timed of the operation.

In each this appears to be the second field of comma delimited data. Let’s verify:

$  grep 'EXEC #140154018142216' trace/cdb1_ora_5689_SELECT.trc  | head -1 | cut -d, -f2
e=59

$  grep 'FETCH #140154018142216' trace/cdb1_ora_5689_SELECT.trc  | head -1 | cut -d, -f2
e=35

Now it’s necessary to separate the value from its label.

It’s convenient to initially limit the amount of data used via the head command, as the trace file may be very large.

Here I’m using the cut command again to separate the value from “e=N.”

$  grep 'FETCH #140154018142216' trace/cdb1_ora_5689_SELECT.trc  | head -10 | cut -d, -f2 | cut -f2 -d=
35
7
10
6
5
7
8
5
9
4

Seeing 10 of those is OK, but I can’t really make any sense of 1,000,000 appearing on the console, so I’ll use awk to help clarify the data.

For now, the head -10 remains, just to make experimentation run quickly, as the trace file is 256MB in size.

For readability purposes, I’ll write some of these command lines on several continuation lines.

$ grep 'FETCH #140154018142216' trace/cdb1_ora_5689_SELECT.trc \
  | head -10 \
  | cut -d, -f2 \
  | cut -f2 -d= \
  | awk 'BEGIN{t=0;a=0}{ t += $1}END{a=t / NR;  print t,a}'
96 9.6

Here, I used the awk variable NR (number of records) to compute the average.

This has printed the (t)otal and (a)verage values for elapsed time.

If you’re familiar with awk, you may recall that it uses the “-F” option to set a field delimiter.

Using “-F=,” you can eliminate the second “cut” command:

$ grep 'FETCH #140154018142216' trace/cdb1_ora_5689_SELECT.trc  \
  | head -10 \
  | cut -d, -f2 \
  | awk -F= 'BEGIN{t=0;a=0}{ t += $2}END{a=t / NR;  print t,a}'
96 9.6

In fact, you could also eliminate the first cut, and you could use the string splitting feature of awk.

However, that starts to be somewhat complicated, and one of the goals of SYSQL is to be simple to use. If I was building a shell script to run regularly, I’d be more concerned about that type of efficiency. For ad hoc analysis on the command line though, simplicity and ease of use is the rule of the day.

Finally, we can use printf to beautify the output a bit.

Doing so isn’t strictly necessary, but if you already know how to use printf, it’s a nice touch.

$ grep 'FETCH #140154018142216' trace/cdb1_ora_5689_SELECT.trc  \
  | head -10 \
  | cut -d, -f2 \
  | awk -F= 'BEGIN{t=0;a=0}{ t += $2}END{a=t / NR;  printf("Total Elapsed microseconds: %10.2f\nAverage microseconds: %10.2f\n", t,a)}'
Total Elapsed microseconds:      96.00
Average microseconds:       9.60

Now, I’ll remove the data-limiting head command, and we can see the full report for EXEC and FETCH:

EXEC:

$ grep 'EXEC #140154018142216' trace/cdb1_ora_5689_SELECT.trc  \
  | cut -d, -f2 \
  | awk -F= 'BEGIN{t=0;a=0}{ t += $2}END{a=t / NR;  printf("Total Elapsed microseconds: %10.2f\nAverage microseconds: %10.2f\n", t,a)}'
Total Elapsed microseconds: 5362065.00
Average microseconds:       5.36

FETCH:

$ grep 'FETCH #140154018142216' trace/cdb1_ora_5689_SELECT.trc  \
  | cut -d, -f2 \
  | awk -F= 'BEGIN{t=0;a=0}{ t += $2}END{a=t / NR;  printf("Total Elapsed microseconds: %10.2f\nAverage microseconds: %10.2f\n", t,a)}'
Total Elapsed microseconds: 1172276.00
Average microseconds:       1.17

Log writer (LGWR) trace file

Every time the Oracle LGWR process takes longer than 500 ms to write to the REDO log, the log writer trace file records the event.

Now let’s use SYSQL to learn something about LGWR performance.

There are four LGWR trace files:

$  ls -l trace/*lg*.trc
-rw-r----- 1 jkstill dba 36008 Jun  8 14:00 trace/cdb1_lg00_7689.trc
-rw-r----- 1 jkstill dba 28693 Jun  8 14:00 trace/cdb1_lg01_7697.trc
-rw-r----- 1 jkstill dba 39654 Jun  8 14:00 trace/cdb2_lg00_15674.trc
-rw-r----- 1 jkstill dba 30028 Jun  8 14:00 trace/cdb2_lg01_15678.trc

The following command line will get the lines showing underperforming writes, get the time field and remove non-numeric characters:

$ grep -h 'Warning: log write elapsed time' trace/*lg*.trc | head -10 | cut -d' ' -f6 | sed -e 's/ms,//'
1234
616
707
1806
513
556
501
1406
1794
585

For a simple transformation such as this one, you could use the tr command instead of sed.

$ grep -h 'Warning: log write elapsed time' trace/*lg*.trc | head -10 | cut -d' ' -f6 | tr -d '[ms,]'
1234
616
707
1806
513
556
501
1406
1794
585

In this case, tr deleted all characters in the “[ms,]” character class.

The head command is now removed from the command line chain, and the output can be piped to awk to get an average value:

$ grep -h 'Warning: log write elapsed time' trace/*lg*.trc \
   | cut -d' ' -f6 \   
   | tr -d '[ms,]' \
   | awk 'BEGIN{t=0; a=0} { t += $1 } END{ a=t/NR; print "avg: ", a }' 
avg:  743.794

By expanding the awk command a bit, I can get the minimum, maximum and average values:

$  grep -h 'Warning: log write elapsed time' trace/*lg*.trc | cut -d' ' -f6 | tr -d '[ms,]' \
  | awk '
BEGIN{t=0; a=0; min=999999999; max=0}
{
        if ($1 > max) (max=$1);
        if ($1 < min) (min=$1);
        t += $1
}
END{ a=t/NR; print "min: ",min,"\nmax:",max,"\navg: ", a }
'
min:  500
max: 4258
avg:  743.794

I can even create a histogram from the command line.

In the BEGIN section of the following awk command a string of “*” characters is created:

$ echo null | awk 'BEGIN{s="*"; for(i=0;i<6;++i) s=s s } {print s}'
****************************************************************

Now let’s create the histogram:

grep -h 'Warning: log write elapsed time' trace/*lg*.trc \
  | cut -d' ' -f6 \
  | tr -d '[ms,]' \
  | awk '{  bucket=$1-($1%100); print bucket }' \
  | sort -n \
  | uniq -c \
  | awk 'BEGIN{s="*"; for(i=0;i<10;++i) s=s s } { hlen=int($1/10)+1; print $1,$2, substr(s,1,hlen) }'
495 500 **************************************************
282 600 *****************************
178 700 ******************
103 800 ***********
68  900 *******
49 1000 *****
28 1100 ***
26 1200 ***
11 1300 **
11 1400 **
 7 1500 *
 3 1600 *
 4 1700 *
 2 1800 *
 4 1900 *
 2 2200 *
 2 2500 *
 1 2600 *
 4 2800 *
 1 3000 *
 2 3100 *
 1 3200 *
 1 4200 *

Let’s break that command down a bit so as to be more understandable.

Get the required lines from the trace files

$ grep -h ‘Warning: log write elapsed time’ trace/*lg*.trc | head -1
Warning: log write elapsed time 1234ms, size 24KB

$ grep -h 'Warning: log write elapsed time' trace/*lg*.trc | head -1
Warning: log write elapsed time 1234ms, size 24KB

Get just the sixth field

Add cut to the pipeline to capture just the sixth field.

$  grep -h 'Warning: log write elapsed time' trace/*lg*.trc | head -1 | cut -d' ' -f6 
1234ms,

Remove “ms,” from the data

The tr command is used to remove characters in the character class [ms,].

$  grep -h 'Warning: log write elapsed time' trace/*lg*.trc | head -1 | cut -d' ' -f6 | tr -d '[ms,]'
1234

Transform the milliseconds to a bucket size

The histogram will be based on bucket sizes that increment by 100.

Here, awk is used to create the bucket by subtracting the value of modulo (value,100) from the value.

awk '{ bucket=$1-($1%100); print bucket }'

Sort and count

Now to sort and count the buckets.

The data is sorted numerically, then counted with uniq -c.

grep -h 'Warning: log write elapsed time' trace/*lg*.trc \
  | cut -d' ' -f6 \
  | tr -d '[ms,]' \
  | awk '{  bucket=$1-($1%100); print bucket }' \
  | sort -n \
  | uniq -c
    495 500
    282 600
    178 700
    103 800
     68 900
     49 1000
     28 1100
     26 1200
     11 1300
     11 1400
      7 1500
      3 1600
      4 1700
      2 1800
      4 1900
      2 2200
      2 2500
      1 2600
      4 2800
      1 3000
      2 3100
      1 3200
      1 4200

No further sorting will be needed, as the data is already sorted by the bucket size.

Build the histogram string

As shown previously, the BEGIN section is just building a string of “*” characters.

awk 'BEGIN{s="*"; for(i=0;i<10;++i) s=s s } ...

The remainder of the awk one-liner determines the length of the histogram line based on the count ($1) of the bucket ($2) size.

Create the histogram source line. The substr() determines the length.

The length is calculated as the number of entries for this bucket divided by 10. Then 1 is added so there is always at least one “*.”

{ hlen=int($1/10)+1; print $1,$2, substr(s,1,hlen) }'

Additionally, the formatting can also be cleaned up a little with printf.

I’ve also broken the command into multiple lines so as to be easier to read:

$ grep -h 'Warning: log write elapsed time' trace/*lg*.trc \
    | cut -d' ' -f6  \
    | tr -d '[ms,]'  \
    | awk '{  bucket=$1-($1%100); print bucket }'  \
    | sort -n  \
    | uniq -c  \
    | awk 'BEGIN{s="*"; for(i=0;i<10;++i) s=s s } { hlen=int($1/10)+1; printf("%6d %6d %-50s\n", $1,$2, substr(s,1,hlen)) }'

   495    500 **************************************************
   282    600 *****************************
   178    700 ******************
   103    800 ***********
    68    900 *******
    49   1000 *****
    28   1100 ***
    26   1200 ***
    11   1300 **
    11   1400 **
     7   1500 *
     3   1600 *
     4   1700 *
     2   1800 *
     4   1900 *
     2   2200 *
     2   2500 *
     1   2600 *
     4   2800 *
     1   3000 *
     2   3100 *
     1   3200 *
     1   4200 *

If you’re unfamiliar with some of these commands, that command line may appear somewhat daunting.

The key is practice. Once you start making use of command line tools for quick data analysis, it really becomes fairly easy.

The commands I’ve shown are all commands that I’ve used on a regular basis. It would take only a couple of minutes to construct these, ad hoc, as needed.

Let’s consider another example.

Fail2Ban

The next example is some ad hoc security analysis on a Linux server. As it’s exposed to the Internet, there are constant malicious attempts to login via ssh.

Fail2Ban is a security tool used to ban IP addresses when there appears to be malicious intent due to frequent failed login attempts.

When there are too many failed login attempts, Fail2Ban will ban the originating IP from attempting again (for 10 minutes by default).

Fail2Ban makes use of the logfile /var/log/auth.log, and writes output to /var/log/fail2ban.log.

What I want to do is compare malicious attempts found in /var/log/auth.log to actions taken as seen in /var/log/fail2ban.log.

Initially, I’m just looking at user names of invalid login attempts:

$  grep 'Invalid user' log/auth.log  | awk '{ print $8 }'| sort | uniq -c | sort -n | tail -20
     10 ftpuser
     10 vagrant
     11 administrator
     11 guest
     14 test
     15 MikroTik
     15 tech
     16 user1
     22 admin1
     26 web
     28 support
     29 ubuntu
     40 demo
     41 es
     42 pi
    184 telecomadmin
    205 ubnt
    231 profile1
    348 admin
    545 user

You’ll find that the pattern of “something | sort | uniq -c | sort -n” is a fairly common usage.

Let’s see where the requests have been coming from.

$ grep 'Invalid user' log/auth.log  \
  | awk '{ print $10 }'\
  | sort \
  | uniq -c \
  | awk '{ if ($1 >= 10) print $2 }' | grep -v '^$' | sort -n

2.133.117.116
14.167.34.197
...
210.187.213.31
217.197.182.103

There are 64 addresses where apparently malicious ssh attempts originate, where there are 10 or more such attempts.

$ grep 'Invalid user' log/auth.log  \
  | awk '{ print $10 }'\
  | sort \
  | uniq -c \
  | awk '{ if ($1 >= 10) print $2 }' \
  | grep -v '^$' \
  | sort -n\
  | wc -l
     64

Given the current Fail2Ban configuration, the tool should have banned these addresses from further logon attempts.

Is that the case?

The grep command has a -f option that allows taking the patterns to match from some file.

For example, the following command could be used to get only nonsystem usernames from /etc/passwd.

Given this filter file:

$ cat /tmp/system-users.filter
^root
^daemon
...
^sshd
^statd
^postfix

You can easily see the nonsystem users:

$ grep -v -f /tmp/system-users.filter /etc/passwd
dnsmasq:x:111:65534:dnsmasq,,,:/var/lib/misc:/bin/false
mdm:x:117:124:MDM Display Manager:/var/lib/mdm:/bin/false
nm-openvpn:x:118:126:NetworkManager OpenVPN,,,:/var/lib/openvpn/chroot:/bin/false
saned:x:120:128::/var/lib/saned:/bin/false
perlbrew:x:1001:1001:perlbrew,,,:/opt/perlbrew:/bin/bash
oracle:x:1002:1002::/home/oracle:
ssopun:x:1003:1002::/home/ssopun:
strongswan:x:125:65534::/var/lib/strongswan:/usr/sbin/nologin
vboxadd:x:999:1::/var/run/vboxadd:/bin/false

When the shell is Bash, you can use process substitution.

That is, you can use the output of a command enclosed in <(...) as if it were a file.

To filter out all users with a UID of 1000 or less, you can use the following awk command as a pattern source:

$ awk -F: '{ if ($3 < 1000) print "^"$1 }' /etc/passwd
^root
^daemon
^bin
...
^postfix
^strongswan
^vboxadd

Now, use it in the overall command:

$ grep -v -f <(awk -F: '{ if ($3 < 1000) print "^"$1 }' /etc/passwd) /etc/passwd
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
jkstill:x:1000:1002:jared,,,:/home/jkstill:/bin/bash
perlbrew:x:1001:1001:perlbrew,,,:/opt/perlbrew:/bin/bash
jkstill-new:x:1000:1000::/home/jkstill-new:
oracle:x:1002:1002::/home/oracle:
ssopun:x:1003:1002::/home/ssopun:

You get the same results, but there was no need to manually create a temporary file for the filter patterns.

Now, apply that same logic to the Fail2Ban log:

$ grep -f <( \
    grep 'Invalid user' log/auth.log \
    | awk '{ print $10 }' \
    | sort | uniq -c \
    | awk '{ if ($1 >= 10) print $2 }' \
    | grep -v '^$' \
) log/fail2ban.log
...
2021-06-06 05:08:52,152 fail2ban.actions        [1398]: NOTICE  [sshd] Ban 36.78.79.106
2021-06-06 05:18:52,182 fail2ban.actions        [1398]: NOTICE  [sshd] Unban 36.78.79.106
2021-06-06 07:17:23,709 fail2ban.filter         [1398]: INFO    [sshd] Found 178.62.237.221
2021-06-06 07:17:59,768 fail2ban.filter         [1398]: INFO    [sshd] Found 178.62.237.221
2021-06-06 07:18:36,847 fail2ban.filter         [1398]: INFO    [sshd] Found 178.62.237.221
2021-06-06 07:19:11,909 fail2ban.filter         [1398]: INFO    [sshd] Found 178.62.237.221
2021-06-06 07:19:46,972 fail2ban.filter         [1398]: INFO    [sshd] Found 178.62.237.221
2021-06-06 07:19:47,550 fail2ban.actions        [1398]: NOTICE  [sshd] Ban 178.62.237.221
2021-06-06 07:29:48,531 fail2ban.actions        [1398]: NOTICE  [sshd] Unban 178.62.237.221

While this is interesting, it doesn’t really tell me what I want to know.

What I want to know is this—were all of these addresses banned?

This is an example of a “Ban” line in log/fail2ban.log:

2021-06-04 21:59:02,690 fail2ban.actions [1398]: NOTICE [sshd] Ban 36.69.234.183

Here, I can show all the unique addresses Fail2Ban has banned:

$ grep -E -f <( \
    grep 'Invalid user' log/auth.log \
    | awk '{ print $10 }' \
    | sort | uniq -c \
    | awk '{ if ($1 >= 10) print $2 }' \
    | grep -v '^$' \
    | awk '{print "Ban\\s+"$1}' \
) log/fail2ban.log \
| awk '{ print $NF }' |  sort -n | sort -u

And a count of those addresses:

$ grep -E -f <( \
    grep 'Invalid user' log/auth.log \
    | awk '{ print $10 }' \
    | sort | uniq -c \
    | awk '{ if ($1 >= 10) print $2 }' \
    | grep -v '^$' \
    | awk '{print "Ban\\s+"$1}' \
) log/fail2ban.log \
| awk '{ print $NF }' |  sort -n | sort -u | wc -l
63

You may recall earlier that malicious login attempts were found to originate from 64 different addresses.

As Fail2Ban appears to have only banned 63 addresses, let’s find out which addresses with 10 or more login attempts it didn’t ban.

For this I’ll create two temp files: one for the addresses where login attempts originated, and one for the addresses that were banned.

$ grep 'Invalid user' log/auth.log \
    | awk '{ print $10 }' \
    | sort | uniq -c \
    | awk '{ if ($1 >= 10) print $2 }' \
    | grep -v '^$' \
    | sort -u | sort -n > attempts.txt

$ grep -E -f <( \
    grep 'Invalid user' log/auth.log \
    | awk '{ print $10 }' \
    | sort | uniq -c \
    | awk '{ if ($1 >= 10) print $2 }' \
    | grep -v '^$' \
    | awk '{print "Ban\\s+"$1}' ) log/fail2ban.log \
    | awk '{ print $NF }' \
    | sort -u | sort -n > banned.txt

Running diff on the files reveals an address that appears in attempts.txt that doesn’t appear in banned.txt.

$   diff attempts.txt banned.txt
35d34
< 141.98.10.193

Let’s check the Fail2Ban.log file for this address:

$ grep -n '141.98.10.193' log/fail2ban.log
...
3963:2021-06-05 08:44:12,810 fail2ban.filter         [1398]: INFO    [sshd] Found 141.98.10.193
3964:2021-06-05 08:44:22,841 fail2ban.filter         [1398]: INFO    [sshd] Found 141.98.10.193
3965:2021-06-05 08:44:32,870 fail2ban.filter         [1398]: INFO    [sshd] Found 141.98.10.193
4094:2021-06-05 15:45:34,123 fail2ban.filter         [1398]: INFO    [sshd] Found 141.98.10.193
4095:2021-06-05 15:45:42,141 fail2ban.filter         [1398]: INFO    [sshd] Found 141.98.10.193
4096:2021-06-05 15:45:52,161 fail2ban.filter         [1398]: INFO    [sshd] Found 141.98.10.193
4097:2021-06-05 15:46:02,187 fail2ban.filter         [1398]: INFO    [sshd] Found 141.98.10.193
4101:2021-06-05 16:02:42,612 fail2ban.filter         [1398]: INFO    [sshd] Found 141.98.10.193
4102:2021-06-05 16:02:52,640 fail2ban.filter         [1398]: INFO    [sshd] Found 141.98.10.193
4103:2021-06-05 16:03:01,664 fail2ban.filter         [1398]: INFO    [sshd] Found 141.98.10.193
4104:2021-06-05 16:03:11,684 fail2ban.filter         [1398]: INFO    [sshd] Found 141.98.10.193
4122:2021-06-05 18:38:15,491 fail2ban.filter         [1398]: INFO    [sshd] Found 141.98.10.193
4123:2021-06-05 18:38:24,512 fail2ban.filter         [1398]: INFO    [sshd] Found 141.98.10.193
4124:2021-06-05 18:38:33,538 fail2ban.filter         [1398]: INFO    [sshd] Found 141.98.10.193
4125:2021-06-05 18:38:43,565 fail2ban.filter         [1398]: INFO    [sshd] Found 141.98.10.193
4184:2021-06-05 22:45:58,140 fail2ban.filter         [1398]: INFO    [sshd] Found 141.98.10.193
4186:2021-06-05 22:46:07,179 fail2ban.filter         [1398]: INFO    [sshd] Found 141.98.10.193
4187:2021-06-05 22:46:18,208 fail2ban.filter         [1398]: INFO    [sshd] Found 141.98.10.193
4188:2021-06-05 22:46:27,226 fail2ban.filter         [1398]: INFO    [sshd] Found 141.98.10.193

Lines 4101 to 4125 are within a time period of approximately three minutes.

The default value for the Fail2Ban parameter findtime is 600 seconds. This value defines the window size that Fail2Ban considers when determining if an IP should be banned.

The default value for the Fail2Ban parameter maxretry is five. If maxretry attempts occur within findtime seconds from the same IP, that IP should be banned.

Though it appears that 141.98.10.193 should have been banned, it wasn’t. At this time, I don’t yet know why Fail2Ban didn’t ban it.

In the meantime, I’ve manually banned this IP:

$ fail2ban-client set sshd banip 141.98.10.193


$ iptables -L -n -v

...

Chain f2b-sshd (1 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 REJECT     all  --  *      *       141.98.10.193        0.0.0.0/0            reject-with icmp-port-unreachable
 1120 89095 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0

What I found later was that Fail2Ban had “unbanned” this IP, so getting that to work as I want is a different adventure.

Though lengthy command line pipelines may appear somewhat intimidating at first, the previous explanations of a few somewhat complex pipelines should make it easier to do this yourself.

More on SYSQL

We chose the name SYSQL as a concise name that brings to mind both the goal (data analysis) and the method (linux command pipelines).

Plus, it’s tiring to keep saying “linux command pipelines.”

How do you pronounce SYSQL?

Well, there are only two possible pronunciations that come to mind:

SYS-Q-L, with each of the letters Q and L pronounced.

SYS-KEL. Rhymes with “fiscal.”

Which do I prefer? Honestly, I haven’t made up my mind on that.

If you have an opinion on the pronunciation of SYSQL, or any questions or thoughts on the above, please feel free to leave a comment!

← Two Cases of Troubleshooting with psn

Data preparation with dbt and BigQuery →

Get Email Notifications

No Comments Yet

Let us know what you think