What is SYSQL?
Recently Tanel Poder and I were discussing the use of Linux command pipelines for data analysis, and were wondering what to call that process.
We debated a few names, until we came to SYSQL: SysAdmin Query Language.
We liked the name, and it stuck. So SYSQL is what I’m calling the use of Linux command pipelines for data analysis.
After some time working on Linux and other Unix-based systems, most folks will become familiar with command line pipelines. Sometimes this may also be referred to as “chaining” commands.
A pipeline is nothing more than piping the output of one command to the input of another program, using the pipeline operator: |.
If you’d like to learn how to use the simple concept of command pipelines to perform some surprisingly complex data analysis, this is the post for you.
Let’s start with some simple examples.
For instance, counting the number of processes started by the grid user on a remote server:
$ ssh oracle@orasrvr ps -ugrid | wc -l 59
We can easily extend the pipeline to run through several commands, getting just the data we want.
The following command gets all the unique process names started by the grid user on orasrvr:
$ ssh oracle@orasrvr ps -ugrid | tail -n +2 | awk '{ print $NF }' | sort -u asm_asmb_+asm1 asm_ckpt_+asm1 asm_clmn_+asm1 ... oracle_29995_+a oracle_30009_+a scriptagent.bin tnslsnr
In the same manner, we can use command pipelines for ad hoc data analysis on log files, trace files and most any other text-based file.
In each of these examples, we could use some quoting to reduce the network traffic, as ssh is sending the data back to the local machine to be processed by tail, awk, etc.
Here are the commands quoted so that all runs remotely. Something to keep in mind is that there may be extra processes appearing in the output when the logon user and the target user are the same, as the ssh login and the command being run will both appear to ps.
In this case, the logon user is oracle, and the target user is grid, so the output doesn’t differ from the previous examples.
$ ssh oracle@ora192rac01 "ps -ugrid | wc -l" 60 $ ssh oracle@ora192rac01 "ps -ugrid | tail -n +2 | awk '{ print \$NF }' | sort -u" asm_asmb_+asm1 asm_ckpt_+asm1 ... scriptagent.bin tnslsnr
If the target and logon user are both oracle, the extra processes will appear:
$ ssh oracle@orasrvr ps -uoracle | wc -l 120 $ ssh oracle@orasrvr 'ps -uoracle | wc -l' 122
Tools of the trade
I’m going to assume you have some familiarity with the tools used here.
This won’t be a tutorial on their use, but rather, I’ll provide some examples on how you may expand on the use of these tools (for those of you not already doing so, that is).
Following are some, if not most of the tools you can use for command line analysis of files.
grep
The name of the grep command is an acronym of sorts. It stands for “globally search for regular expressions and print.”
tail
The tail command is used to display only a set number of lines from the end of its input.
head
The head command displays only a set number of lines from the beginning of a file.
sed
The sed command is a stream editor.
sed uses regular expressions to include, exclude and transform data in a stream.
cut
The cut command is used to select columns from a line, based on a delimiting character.
$ echo get,just,the,fourth,word | cut -d, -f4 fourth $ echo get just the fourth word | cut -d' ' -f4 fourth
awk
The awk command takes its name from the authors: Alfred Aho, Peter Weinberger and Brian Kernighan.
awk is a programming language designed to work on data. Its use has been somewhat supplanted in recent years by Perl, Python and even advanced Bash features.
Even so, awk is still quite useful at times, as it was designed for ease of use in command pipelines.
A common use of awk is to cut columns from a line of text, similar to cut.
There are a few additional things awk can do.
The following example gets just the fourth word, as well as printing the number of words there are in total:
$ echo get just the fourth word | awk '{ print "fourth word: " , $4, " - number of words: ", NF }' fourth word: fourth - number of words: 5
The built-in variable NF refers to the number of fields in the line.
When prefaced with “$,” it will return the last field in the line:
$ echo get just the fourth word | awk '{ print "fourth word: " , $4, " - the last word: ", $NF }' fourth word: fourth - the last word: word
perl
Perl is an interpreted programming language.
$ echo get just the fourth word | perl -e '$i=<STDIN>; @a = split(/\s/,$i); print "$a[3]\n"' fourth
While Perl is quite powerful, and I have made extensive use of Perl over the years, I don’t often use CLI Perl in pipelines.
For that purpose, awk is usually easier.
The exception is the use of Perl scripts that are created as filters. I won’t be demonstrating that in this article, though.
sort
The sort command does just what the name implies; it sorts data.
The data can be sorted by the entire line in the simplest case.
You can also perform sorts on “keys” found in the line.
The data can be sorted as text or numbers, and can be shown in reverse order.
uniq
The uniq command is used to deduplicate data sent to it.
Optionally, you may also use uniq to count the number of occurrences of a string.
wc
The wc command counts lines, words and characters.
tr
You can use the tr utility to transform characters. It isn’t as robust as sed, but nonetheless useful at times.
FETCH and EXE times
Let’s use SYSQL to get some FETCH and EXE information from an Oracle SQL trace file.
The trace file trace/cdb1_ora_5689_SELECT.trc contains some trace data for an SQL we’re interested in.
Most Oracle trace files contain multiple SQL statements. In addition, the cursor number may be reused.
If the cursor number has been reused, that complicates things a bit, and a one-line probe of the trace probably won’t have the desired results.
First, we find the cursor handle.
By using grep -A 6, I can see the next six lines after the target word of PARSING:
grep -A 6 PARSING trace/cdb1_ora_5689_SELECT.trc PARSING IN CURSOR #140154018147080 len=52 dep=0 uid=92 oct=47 lid=92 tim=532738009488 hv=1029988163 ad='838081e8' sqlid='9babjv8yq8ru3' BEGIN DBMS_OUTPUT.GET_LINES(:LINES, :NUMLINES); END; END OF STMT PARSE #140154018147080:c=44,e=44,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=0,tim=532738009487 BINDS #140154018147080: Bind#0 -- PARSING IN CURSOR #140154018122048 len=97 dep=1 uid=0 oct=3 lid=0 tim=532740566203 hv=791757000 ad='96aeea08' sqlid='87gaftwrm2h68' select o.owner#,o.name,o.namespace,o.remoteowner,o.linkname,o.subname from obj$ o where o.obj#=:1 END OF STMT PARSE #140154018122048:c=1,e=434,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=532740566202 WAIT #140154018122048: nam='PGA memory operation' ela= 10 p1=65536 p2=2 p3=0 obj#=-1 tim=532740566836 WAIT #140154018122048: nam='PGA memory operation' ela= 4 p1=65536 p2=1 p3=0 obj#=-1 tim=532740567003 WAIT #140154018122048: nam='PGA memory operation' ela= 3 p1=65536 p2=1 p3=0 obj#=-1 tim=532740567193 -- PARSING IN CURSOR #140154018097232 len=153 dep=0 uid=92 oct=3 lid=92 tim=532740569568 hv=1226046019 ad='6a8c9b20' sqlid='dzmytmp4j7yk3' select s.username, s.sid, p.spid from v$session s, v$process p where s.sid = sys_context('userenv','sid') -- PARSING IN CURSOR #140154018098000 len=104 dep=0 uid=92 oct=47 lid=92 tim=532743008720 hv=2439783906 ad='658980a0' sqlid='2s2qwhy8qs9g2' declare vDate date; begin for i in 1..1e6 loop select sysdate into vDate from dual; -- PARSING IN CURSOR #140154018142216 len=24 dep=1 uid=92 oct=3 lid=92 tim=532743014936 hv=124468195 ad='7e551ea0' sqlid='c749bc43qqfz3' SELECT SYSDATE FROM DUAL END OF STMT PARSE #140154018142216:c=3689,e=5599,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=1388734953,tim=532743014934 EXEC #140154018142216:c=59,e=59,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1388734953,tim=532743015228 FETCH #140154018142216:c=34,e=35,p=0,cr=0,cu=0,mis=0,r=1,dep=1,og=1,plh=1388734953,tim=532743015381 STAT #140154018142216 id=1 cnt=1 pid=0 pos=1 obj=0 op='FAST DUAL (cr=0 pr=0 pw=0 str=1 time=2 us cost=2 size=0 card=1)'
The SQL I’m interested in is “SELECT SYSDATE FROM DUAL.”
The cursor for this SQL (cursor #140154018142216) is being called 1,000,000 times by the PL/SQL block (cursor #140154018098000).
I can verify this in the following way:
$ grep 'FETCH #140154018142216' trace/cdb1_ora_5689_SELECT.trc | wc -l 1000000
As expected, the SQL that selects from DUAL appears 1,000,000 times.
Get EXEC and FETCH times
In a performance engagement I’ll typically use a tool such as Method R Workbench to fully analyze Oracle trace files.
At times though, I may want a simpler method to get at this data.
There are a couple of reasons I might use CLI tools for initial analysis:
- I’m not allowed to transfer trace files off the server.
- I’m not allowed to install any software on the server, even in my home directory.
And even when I can make use of Method R, I may want to use some simple tools to get an idea of where my efforts should be focused.
So, let’s get the FETCH and EXE times for this SQL.
To do this, I need to know which fields in these lines are of interest.
Here’s an EXEC line:
EXEC #140154018142216:c=59,e=59,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1388734953,tim=532743015228
And here’s a FETCH line:
FETCH #140154018142216:c=34,e=35,p=0,cr=0,cu=0,mis=0,r=1,dep=1,og=1,plh=1388734953,tim=532743015381
The item of interest in each line is the value “e=N,” where “e” refers to the elapsed timed of the operation.
In each this appears to be the second field of comma delimited data. Let’s verify:
$ grep 'EXEC #140154018142216' trace/cdb1_ora_5689_SELECT.trc | head -1 | cut -d, -f2 e=59 $ grep 'FETCH #140154018142216' trace/cdb1_ora_5689_SELECT.trc | head -1 | cut -d, -f2 e=35
Now it’s necessary to separate the value from its label.
It’s convenient to initially limit the amount of data used via the head command, as the trace file may be very large.
Here I’m using the cut command again to separate the value from “e=N.”
$ grep 'FETCH #140154018142216' trace/cdb1_ora_5689_SELECT.trc | head -10 | cut -d, -f2 | cut -f2 -d= 35 7 10 6 5 7 8 5 9 4
Seeing 10 of those is OK, but I can’t really make any sense of 1,000,000 appearing on the console, so I’ll use awk to help clarify the data.
For now, the head -10
remains, just to make experimentation run quickly, as the trace file is 256MB in size.
For readability purposes, I’ll write some of these command lines on several continuation lines.
$ grep 'FETCH #140154018142216' trace/cdb1_ora_5689_SELECT.trc \ | head -10 \ | cut -d, -f2 \ | cut -f2 -d= \ | awk 'BEGIN{t=0;a=0}{ t += $1}END{a=t / NR; print t,a}' 96 9.6
Here, I used the awk variable NR (number of records) to compute the average.
This has printed the (t)otal and (a)verage values for elapsed time.
If you’re familiar with awk, you may recall that it uses the “-F” option to set a field delimiter.
Using “-F=,” you can eliminate the second “cut” command:
$ grep 'FETCH #140154018142216' trace/cdb1_ora_5689_SELECT.trc \ | head -10 \ | cut -d, -f2 \ | awk -F= 'BEGIN{t=0;a=0}{ t += $2}END{a=t / NR; print t,a}' 96 9.6
In fact, you could also eliminate the first cut, and you could use the string splitting feature of awk.
However, that starts to be somewhat complicated, and one of the goals of SYSQL is to be simple to use. If I was building a shell script to run regularly, I’d be more concerned about that type of efficiency. For ad hoc analysis on the command line though, simplicity and ease of use is the rule of the day.
Finally, we can use printf
to beautify the output a bit.
Doing so isn’t strictly necessary, but if you already know how to use printf
, it’s a nice touch.
$ grep 'FETCH #140154018142216' trace/cdb1_ora_5689_SELECT.trc \ | head -10 \ | cut -d, -f2 \ | awk -F= 'BEGIN{t=0;a=0}{ t += $2}END{a=t / NR; printf("Total Elapsed microseconds: %10.2f\nAverage microseconds: %10.2f\n", t,a)}' Total Elapsed microseconds: 96.00 Average microseconds: 9.60
Now, I’ll remove the data-limiting head command, and we can see the full report for EXEC and FETCH:
EXEC:
$ grep 'EXEC #140154018142216' trace/cdb1_ora_5689_SELECT.trc \ | cut -d, -f2 \ | awk -F= 'BEGIN{t=0;a=0}{ t += $2}END{a=t / NR; printf("Total Elapsed microseconds: %10.2f\nAverage microseconds: %10.2f\n", t,a)}' Total Elapsed microseconds: 5362065.00 Average microseconds: 5.36
FETCH:
$ grep 'FETCH #140154018142216' trace/cdb1_ora_5689_SELECT.trc \ | cut -d, -f2 \ | awk -F= 'BEGIN{t=0;a=0}{ t += $2}END{a=t / NR; printf("Total Elapsed microseconds: %10.2f\nAverage microseconds: %10.2f\n", t,a)}' Total Elapsed microseconds: 1172276.00 Average microseconds: 1.17
Log writer (LGWR) trace file
Every time the Oracle LGWR process takes longer than 500 ms to write to the REDO log, the log writer trace file records the event.
Now let’s use SYSQL to learn something about LGWR performance.
There are four LGWR trace files:
$ ls -l trace/*lg*.trc -rw-r----- 1 jkstill dba 36008 Jun 8 14:00 trace/cdb1_lg00_7689.trc -rw-r----- 1 jkstill dba 28693 Jun 8 14:00 trace/cdb1_lg01_7697.trc -rw-r----- 1 jkstill dba 39654 Jun 8 14:00 trace/cdb2_lg00_15674.trc -rw-r----- 1 jkstill dba 30028 Jun 8 14:00 trace/cdb2_lg01_15678.trc
The following command line will get the lines showing underperforming writes, get the time field and remove non-numeric characters:
$ grep -h 'Warning: log write elapsed time' trace/*lg*.trc | head -10 | cut -d' ' -f6 | sed -e 's/ms,//' 1234 616 707 1806 513 556 501 1406 1794 585
For a simple transformation such as this one, you could use the tr command instead of sed.
$ grep -h 'Warning: log write elapsed time' trace/*lg*.trc | head -10 | cut -d' ' -f6 | tr -d '[ms,]' 1234 616 707 1806 513 556 501 1406 1794 585
In this case, tr deleted all characters in the “[ms,]” character class.
The head command is now removed from the command line chain, and the output can be piped to awk to get an average value:
$ grep -h 'Warning: log write elapsed time' trace/*lg*.trc \ | cut -d' ' -f6 \ | tr -d '[ms,]' \ | awk 'BEGIN{t=0; a=0} { t += $1 } END{ a=t/NR; print "avg: ", a }' avg: 743.794
By expanding the awk command a bit, I can get the minimum, maximum and average values:
$ grep -h 'Warning: log write elapsed time' trace/*lg*.trc | cut -d' ' -f6 | tr -d '[ms,]' \ | awk ' BEGIN{t=0; a=0; min=999999999; max=0} { if ($1 > max) (max=$1); if ($1 < min) (min=$1); t += $1 } END{ a=t/NR; print "min: ",min,"\nmax:",max,"\navg: ", a } ' min: 500 max: 4258 avg: 743.794
I can even create a histogram from the command line.
In the BEGIN section of the following awk command a string of “*” characters is created:
$ echo null | awk 'BEGIN{s="*"; for(i=0;i<6;++i) s=s s } {print s}'
****************************************************************
Now let’s create the histogram:
grep -h 'Warning: log write elapsed time' trace/*lg*.trc \ | cut -d' ' -f6 \ | tr -d '[ms,]' \ | awk '{ bucket=$1-($1%100); print bucket }' \ | sort -n \ | uniq -c \ | awk 'BEGIN{s="*"; for(i=0;i<10;++i) s=s s } { hlen=int($1/10)+1; print $1,$2, substr(s,1,hlen) }' 495 500 ************************************************** 282 600 ***************************** 178 700 ****************** 103 800 *********** 68 900 ******* 49 1000 ***** 28 1100 *** 26 1200 *** 11 1300 ** 11 1400 ** 7 1500 * 3 1600 * 4 1700 * 2 1800 * 4 1900 * 2 2200 * 2 2500 * 1 2600 * 4 2800 * 1 3000 * 2 3100 * 1 3200 * 1 4200 *
Let’s break that command down a bit so as to be more understandable.
Get the required lines from the trace files
$ grep -h ‘Warning: log write elapsed time’ trace/*lg*.trc | head -1
Warning: log write elapsed time 1234ms, size 24KB
$ grep -h 'Warning: log write elapsed time' trace/*lg*.trc | head -1 Warning: log write elapsed time 1234ms, size 24KB
Get just the sixth field
Add cut
to the pipeline to capture just the sixth field.
$ grep -h 'Warning: log write elapsed time' trace/*lg*.trc | head -1 | cut -d' ' -f6 1234ms,
Remove “ms,” from the data
The tr command is used to remove characters in the character class [ms,].
$ grep -h 'Warning: log write elapsed time' trace/*lg*.trc | head -1 | cut -d' ' -f6 | tr -d '[ms,]' 1234
Transform the milliseconds to a bucket size
The histogram will be based on bucket sizes that increment by 100.
Here, awk is used to create the bucket by subtracting the value of modulo (value,100) from the value.
awk '{ bucket=$1-($1%100); print bucket }'
Sort and count
Now to sort and count the buckets.
The data is sorted numerically, then counted with uniq -c.
grep -h 'Warning: log write elapsed time' trace/*lg*.trc \ | cut -d' ' -f6 \ | tr -d '[ms,]' \ | awk '{ bucket=$1-($1%100); print bucket }' \ | sort -n \ | uniq -c 495 500 282 600 178 700 103 800 68 900 49 1000 28 1100 26 1200 11 1300 11 1400 7 1500 3 1600 4 1700 2 1800 4 1900 2 2200 2 2500 1 2600 4 2800 1 3000 2 3100 1 3200 1 4200
No further sorting will be needed, as the data is already sorted by the bucket size.
Build the histogram string
As shown previously, the BEGIN section is just building a string of “*” characters.
awk 'BEGIN{s="*"; for(i=0;i<10;++i) s=s s } ...
The remainder of the awk one-liner determines the length of the histogram line based on the count ($1) of the bucket ($2) size.
Create the histogram source line. The substr()
determines the length.
The length is calculated as the number of entries for this bucket divided by 10. Then 1 is added so there is always at least one “*.”
{ hlen=int($1/10)+1; print $1,$2, substr(s,1,hlen) }'
Additionally, the formatting can also be cleaned up a little with printf.
I’ve also broken the command into multiple lines so as to be easier to read:
$ grep -h 'Warning: log write elapsed time' trace/*lg*.trc \ | cut -d' ' -f6 \ | tr -d '[ms,]' \ | awk '{ bucket=$1-($1%100); print bucket }' \ | sort -n \ | uniq -c \ | awk 'BEGIN{s="*"; for(i=0;i<10;++i) s=s s } { hlen=int($1/10)+1; printf("%6d %6d %-50s\n", $1,$2, substr(s,1,hlen)) }' 495 500 ************************************************** 282 600 ***************************** 178 700 ****************** 103 800 *********** 68 900 ******* 49 1000 ***** 28 1100 *** 26 1200 *** 11 1300 ** 11 1400 ** 7 1500 * 3 1600 * 4 1700 * 2 1800 * 4 1900 * 2 2200 * 2 2500 * 1 2600 * 4 2800 * 1 3000 * 2 3100 * 1 3200 * 1 4200 *
If you’re unfamiliar with some of these commands, that command line may appear somewhat daunting.
The key is practice. Once you start making use of command line tools for quick data analysis, it really becomes fairly easy.
The commands I’ve shown are all commands that I’ve used on a regular basis. It would take only a couple of minutes to construct these, ad hoc, as needed.
Let’s consider another example.
Fail2Ban
The next example is some ad hoc security analysis on a Linux server. As it’s exposed to the Internet, there are constant malicious attempts to login via ssh.
Fail2Ban is a security tool used to ban IP addresses when there appears to be malicious intent due to frequent failed login attempts.
When there are too many failed login attempts, Fail2Ban will ban the originating IP from attempting again (for 10 minutes by default).
Fail2Ban makes use of the logfile /var/log/auth.log
, and writes output to /var/log/fail2ban.log
.
What I want to do is compare malicious attempts found in /var/log/auth.log to actions taken as seen in /var/log/fail2ban.log.
Initially, I’m just looking at user names of invalid login attempts:
$ grep 'Invalid user' log/auth.log | awk '{ print $8 }'| sort | uniq -c | sort -n | tail -20 10 ftpuser 10 vagrant 11 administrator 11 guest 14 test 15 MikroTik 15 tech 16 user1 22 admin1 26 web 28 support 29 ubuntu 40 demo 41 es 42 pi 184 telecomadmin 205 ubnt 231 profile1 348 admin 545 user
You’ll find that the pattern of “something | sort | uniq -c | sort -n” is a fairly common usage.
Let’s see where the requests have been coming from.
$ grep 'Invalid user' log/auth.log \ | awk '{ print $10 }'\ | sort \ | uniq -c \ | awk '{ if ($1 >= 10) print $2 }' | grep -v '^$' | sort -n 2.133.117.116 14.167.34.197 ... 210.187.213.31 217.197.182.103
There are 64 addresses where apparently malicious ssh attempts originate, where there are 10 or more such attempts.
$ grep 'Invalid user' log/auth.log \ | awk '{ print $10 }'\ | sort \ | uniq -c \ | awk '{ if ($1 >= 10) print $2 }' \ | grep -v '^$' \ | sort -n\ | wc -l 64
Given the current Fail2Ban configuration, the tool should have banned these addresses from further logon attempts.
Is that the case?
The grep command has a -f option that allows taking the patterns to match from some file.
For example, the following command could be used to get only nonsystem usernames from /etc/passwd.
Given this filter file:
$ cat /tmp/system-users.filter ^root ^daemon ... ^sshd ^statd ^postfix
You can easily see the nonsystem users:
$ grep -v -f /tmp/system-users.filter /etc/passwd dnsmasq:x:111:65534:dnsmasq,,,:/var/lib/misc:/bin/false mdm:x:117:124:MDM Display Manager:/var/lib/mdm:/bin/false nm-openvpn:x:118:126:NetworkManager OpenVPN,,,:/var/lib/openvpn/chroot:/bin/false saned:x:120:128::/var/lib/saned:/bin/false perlbrew:x:1001:1001:perlbrew,,,:/opt/perlbrew:/bin/bash oracle:x:1002:1002::/home/oracle: ssopun:x:1003:1002::/home/ssopun: strongswan:x:125:65534::/var/lib/strongswan:/usr/sbin/nologin vboxadd:x:999:1::/var/run/vboxadd:/bin/false
When the shell is Bash, you can use process substitution.
That is, you can use the output of a command enclosed in <(...)
as if it were a file.
To filter out all users with a UID of 1000 or less, you can use the following awk command as a pattern source:
$ awk -F: '{ if ($3 < 1000) print "^"$1 }' /etc/passwd ^root ^daemon ^bin ... ^postfix ^strongswan ^vboxadd
Now, use it in the overall command:
$ grep -v -f <(awk -F: '{ if ($3 < 1000) print "^"$1 }' /etc/passwd) /etc/passwd nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin jkstill:x:1000:1002:jared,,,:/home/jkstill:/bin/bash perlbrew:x:1001:1001:perlbrew,,,:/opt/perlbrew:/bin/bash jkstill-new:x:1000:1000::/home/jkstill-new: oracle:x:1002:1002::/home/oracle: ssopun:x:1003:1002::/home/ssopun:
You get the same results, but there was no need to manually create a temporary file for the filter patterns.
Now, apply that same logic to the Fail2Ban log:
$ grep -f <( \ grep 'Invalid user' log/auth.log \ | awk '{ print $10 }' \ | sort | uniq -c \ | awk '{ if ($1 >= 10) print $2 }' \ | grep -v '^$' \ ) log/fail2ban.log ... 2021-06-06 05:08:52,152 fail2ban.actions [1398]: NOTICE [sshd] Ban 36.78.79.106 2021-06-06 05:18:52,182 fail2ban.actions [1398]: NOTICE [sshd] Unban 36.78.79.106 2021-06-06 07:17:23,709 fail2ban.filter [1398]: INFO [sshd] Found 178.62.237.221 2021-06-06 07:17:59,768 fail2ban.filter [1398]: INFO [sshd] Found 178.62.237.221 2021-06-06 07:18:36,847 fail2ban.filter [1398]: INFO [sshd] Found 178.62.237.221 2021-06-06 07:19:11,909 fail2ban.filter [1398]: INFO [sshd] Found 178.62.237.221 2021-06-06 07:19:46,972 fail2ban.filter [1398]: INFO [sshd] Found 178.62.237.221 2021-06-06 07:19:47,550 fail2ban.actions [1398]: NOTICE [sshd] Ban 178.62.237.221 2021-06-06 07:29:48,531 fail2ban.actions [1398]: NOTICE [sshd] Unban 178.62.237.221
While this is interesting, it doesn’t really tell me what I want to know.
What I want to know is this—were all of these addresses banned?
This is an example of a “Ban” line in log/fail2ban.log:
2021-06-04 21:59:02,690 fail2ban.actions [1398]: NOTICE [sshd] Ban 36.69.234.183
Here, I can show all the unique addresses Fail2Ban has banned:
$ grep -E -f <( \ grep 'Invalid user' log/auth.log \ | awk '{ print $10 }' \ | sort | uniq -c \ | awk '{ if ($1 >= 10) print $2 }' \ | grep -v '^$' \ | awk '{print "Ban\\s+"$1}' \ ) log/fail2ban.log \ | awk '{ print $NF }' | sort -n | sort -u
And a count of those addresses:
$ grep -E -f <( \ grep 'Invalid user' log/auth.log \ | awk '{ print $10 }' \ | sort | uniq -c \ | awk '{ if ($1 >= 10) print $2 }' \ | grep -v '^$' \ | awk '{print "Ban\\s+"$1}' \ ) log/fail2ban.log \ | awk '{ print $NF }' | sort -n | sort -u | wc -l 63
You may recall earlier that malicious login attempts were found to originate from 64 different addresses.
As Fail2Ban appears to have only banned 63 addresses, let’s find out which addresses with 10 or more login attempts it didn’t ban.
For this I’ll create two temp files: one for the addresses where login attempts originated, and one for the addresses that were banned.
$ grep 'Invalid user' log/auth.log \ | awk '{ print $10 }' \ | sort | uniq -c \ | awk '{ if ($1 >= 10) print $2 }' \ | grep -v '^$' \ | sort -u | sort -n > attempts.txt $ grep -E -f <( \ grep 'Invalid user' log/auth.log \ | awk '{ print $10 }' \ | sort | uniq -c \ | awk '{ if ($1 >= 10) print $2 }' \ | grep -v '^$' \ | awk '{print "Ban\\s+"$1}' ) log/fail2ban.log \ | awk '{ print $NF }' \ | sort -u | sort -n > banned.txt
Running diff on the files reveals an address that appears in attempts.txt that doesn’t appear in banned.txt.
$ diff attempts.txt banned.txt 35d34 < 141.98.10.193
Let’s check the Fail2Ban.log file for this address:
$ grep -n '141.98.10.193' log/fail2ban.log ... 3963:2021-06-05 08:44:12,810 fail2ban.filter [1398]: INFO [sshd] Found 141.98.10.193 3964:2021-06-05 08:44:22,841 fail2ban.filter [1398]: INFO [sshd] Found 141.98.10.193 3965:2021-06-05 08:44:32,870 fail2ban.filter [1398]: INFO [sshd] Found 141.98.10.193 4094:2021-06-05 15:45:34,123 fail2ban.filter [1398]: INFO [sshd] Found 141.98.10.193 4095:2021-06-05 15:45:42,141 fail2ban.filter [1398]: INFO [sshd] Found 141.98.10.193 4096:2021-06-05 15:45:52,161 fail2ban.filter [1398]: INFO [sshd] Found 141.98.10.193 4097:2021-06-05 15:46:02,187 fail2ban.filter [1398]: INFO [sshd] Found 141.98.10.193 4101:2021-06-05 16:02:42,612 fail2ban.filter [1398]: INFO [sshd] Found 141.98.10.193 4102:2021-06-05 16:02:52,640 fail2ban.filter [1398]: INFO [sshd] Found 141.98.10.193 4103:2021-06-05 16:03:01,664 fail2ban.filter [1398]: INFO [sshd] Found 141.98.10.193 4104:2021-06-05 16:03:11,684 fail2ban.filter [1398]: INFO [sshd] Found 141.98.10.193 4122:2021-06-05 18:38:15,491 fail2ban.filter [1398]: INFO [sshd] Found 141.98.10.193 4123:2021-06-05 18:38:24,512 fail2ban.filter [1398]: INFO [sshd] Found 141.98.10.193 4124:2021-06-05 18:38:33,538 fail2ban.filter [1398]: INFO [sshd] Found 141.98.10.193 4125:2021-06-05 18:38:43,565 fail2ban.filter [1398]: INFO [sshd] Found 141.98.10.193 4184:2021-06-05 22:45:58,140 fail2ban.filter [1398]: INFO [sshd] Found 141.98.10.193 4186:2021-06-05 22:46:07,179 fail2ban.filter [1398]: INFO [sshd] Found 141.98.10.193 4187:2021-06-05 22:46:18,208 fail2ban.filter [1398]: INFO [sshd] Found 141.98.10.193 4188:2021-06-05 22:46:27,226 fail2ban.filter [1398]: INFO [sshd] Found 141.98.10.193
Lines 4101 to 4125 are within a time period of approximately three minutes.
The default value for the Fail2Ban parameter findtime is 600 seconds. This value defines the window size that Fail2Ban considers when determining if an IP should be banned.
The default value for the Fail2Ban parameter maxretry is five. If maxretry attempts occur within findtime seconds from the same IP, that IP should be banned.
Though it appears that 141.98.10.193 should have been banned, it wasn’t. At this time, I don’t yet know why Fail2Ban didn’t ban it.
In the meantime, I’ve manually banned this IP:
$ fail2ban-client set sshd banip 141.98.10.193 $ iptables -L -n -v ... Chain f2b-sshd (1 references) pkts bytes target prot opt in out source destination 0 0 REJECT all -- * * 141.98.10.193 0.0.0.0/0 reject-with icmp-port-unreachable 1120 89095 RETURN all -- * * 0.0.0.0/0 0.0.0.0/0
What I found later was that Fail2Ban had “unbanned” this IP, so getting that to work as I want is a different adventure.
Though lengthy command line pipelines may appear somewhat intimidating at first, the previous explanations of a few somewhat complex pipelines should make it easier to do this yourself.
More on SYSQL
We chose the name SYSQL as a concise name that brings to mind both the goal (data analysis) and the method (linux command pipelines).
Plus, it’s tiring to keep saying “linux command pipelines.”
How do you pronounce SYSQL?
Well, there are only two possible pronunciations that come to mind:
SYS-Q-L, with each of the letters Q and L pronounced.
SYS-KEL. Rhymes with “fiscal.”
Which do I prefer? Honestly, I haven’t made up my mind on that.
If you have an opinion on the pronunciation of SYSQL, or any questions or thoughts on the above, please feel free to leave a comment!
Share this
You May Also Like
These Related Stories
No Comments Yet
Let us know what you think