Quantcast
Channel: Karl Arao's Blog
Viewing all articles
Browse latest Browse all 63

IOsaturationtoolkit-v2 with Exadata IORM and AWESOME text graph

$
0
0
I’ve got a new version of IOsaturation toolkit which you can download here http://karlarao.wordpress.com/scripts-resources/ and it has a cool script called “smartscanloop” that shows you the Smart Scan MB/s per database across the Exadata  compute nodes.. it’s a per 2secs sample so that’s a pretty fine grained perf data and near real time text graph. Very useful for doing IORM demos and monitoring what database is currently hogging the IO resources and since it’s presented in a consolidated view you don’t have to go to each Enterprise Manager performance page and have a bunch of browser windows open.

The SECTION 1 is what I usually use to validate the IO numbers on the database side from my Orion (see oriontoolkit here) and Calibrate IO runs. I’ve been using it for quite a while on new RAC/non-RAC installations from client sites.. and I used it heavily on my R&D server while continuously enhancing the toolkit

The SECTION 2 gives you a standard tool to demonstrate the behavior of IORM (http://karlarao.tiddlyspot.com/#IORM).. so let’s say you are playing around with IORM percentage allocations for let’s say 3 databases the “saturate” script works well to generate load for each database and then you can observe the effects of the percentage allocation to the IO bandwidth/latency of each database.

when you run

./saturate 4 dbm1 2 exadb1
it will create 4 sessions on dbm1 and 2 sessions on exadb1 all doing parallel select and it outputs a log file for each database session. Each session log file has details on the start and end time, elapsed, MB/s which is pretty much everything you need to know to quantify the performance from a session level perspective. You’ll appreciate this session level output and be impressed on what IORM can do when you start investigating on IO prioritization as you see sessions from the other database having higher MB/s and lower elapsed times and as you play with different IORM scenarios.
cat *log | grep benchmark
Sample output below:
 benchmark ,instance       ,start            ,end              ,elapsed   ,MBs
 ----------,---------------,-----------------,-----------------,----------,-------
 benchmark ,dbm1           ,05/13/12 19:18:28,05/13/12 19:19:32,        64,    537
 benchmark ,dbm1           ,05/13/12 19:18:28,05/13/12 19:19:30,        62,    554
 benchmark ,dbm1           ,05/13/12 19:18:28,05/13/12 19:19:32,        63,    545
 benchmark ,dbm1           ,05/13/12 19:18:28,05/13/12 19:19:32,        64,    537
 benchmark ,exadb1         ,05/13/12 19:18:28,05/13/12 19:19:32,        64,    539
 benchmark ,exadb1         ,05/13/12 19:18:28,05/13/12 19:19:32,        64,    539
So the output of smartscanloop is the high level IO numbers across the cluster and the log files are your session detail numbers. Below is the simple output which just shows the SmartScan MB/s per database

This AWESOME text graph is similar to what you see in the Enterprise Manager performance page IO tab. Note that you’ll be seeing higher numbers of MB/s on the smartscanloop compared to EM because of a more fine grained interval (2secs) which is also the same behavior when you measure the IO latency as I explained here (avg latency issue)

then I’ve modified the script to have the advanced output that shows the Hierarchy of Exadata IO. See the updated README for more details on how to use it. Below is the output

What’s good about this is the numbers are about the same when you do a per 10secs snapshot of AWR.. compare the AAS and latency (avgwt ms) columns of the above image and below

Again, this is pretty useful for monitoring the high level smart scans IO that’s happening across your Exadata cluster, if you are on an environment where there’s separation of duties you can even hand off this script to the sys admins that are monitoring the storage cells with their home grown alerting scripts, kSar, or nagios.. so this will serve as their view on the database side of things.

And if any of your clients haven’t adopted the IORM, this is very useful for DEMOs to customer sites to showcase the IORM capabilities.. and if you don’t want to show the latency and other columns you can opt to just use the simple output which only shows the smart scans MB/s (see get_smartscan.simple on README ). Most of the time.. the simpler the output the easier for them (users) to understand.

Sweet! right?!?

Wait.. Does the toolkit work on non-Exadata DBs?

The SECTION 1 works on Exa and non-Exa (I’ve also mentioned this at the beginning of this post)

The SECTION 2 works well on Exa because the underlying scripts just makes use of dcli commands, some shell and SQL.. but if you are interested to have the smartscanloop output on non-Exa environment I have the script get_smartscan.nonexa on the toolkit that shows the “physical read total bytes” instead of the “cell physical IO bytes eligible for predicate offload”..

then on each node do this if it’s a non-Exa RAC

$ while : ; do ./get_smartscan.nonexa | grep "%" ; echo "--" ; sleep 2 ; done
%,05/12/12 12:57:33,DEMO1     ,      0,
%,05/12/12 12:57:35,dbm1      ,      0,
%,05/12/12 12:57:38,DBFS1     ,      0,
%,05/12/12 12:57:40,exadb1    ,      0,
%,05/12/12 12:57:42,PRIMARY1  ,      0,
--
%,05/12/12 12:57:46,DEMO1     ,      1,
%,05/12/12 12:57:48,dbm1      ,   2078, @@@@@@@@@@@@@@@@@@@@@
%,05/12/12 12:57:50,DBFS1     ,      0,
%,05/12/12 12:57:52,exadb1    ,    972, @@@@@@@@@@
%,05/12/12 12:57:55,PRIMARY1  ,      0,
--
%,05/12/12 12:57:59,DEMO1     ,      0,
%,05/12/12 12:58:01,dbm1      ,   1088, @@@@@@@@@@@
%,05/12/12 12:58:03,DBFS1     ,      0,
%,05/12/12 12:58:05,exadb1    ,   2356, @@@@@@@@@@@@@@@@@@@@@@@@
%,05/12/12 12:58:07,PRIMARY1  ,      0,
--
%,05/12/12 12:58:11,DEMO1     ,      0,
%,05/12/12 12:58:14,dbm1      ,   1000, @@@@@@@@@@
%,05/12/12 12:58:16,DBFS1     ,      0,
%,05/12/12 12:58:18,exadb1    ,   2336, @@@@@@@@@@@@@@@@@@@@@@@
%,05/12/12 12:58:20,PRIMARY1  ,      0,
--
%,05/12/12 12:58:24,DEMO1     ,      0,
%,05/12/12 12:58:26,dbm1      ,   1031, @@@@@@@@@@
%,05/12/12 12:58:29,DBFS1     ,      0,
%,05/12/12 12:58:31,exadb1    ,   2071, @@@@@@@@@@@@@@@@@@@@@
%,05/12/12 12:58:33,PRIMARY1  ,      0,
--

But wait! there’s more!

Since the output is in comma separated format.. you can just do the following

./smartscanloop > smartscanloop.txt

and leave it running… So you can do this when you want to characterize the IO of a batch run or reporting of multiple databases and the text file that will be generated is pretty small and it doesn’t hurt in terms of CPU/IO resources because it’s just doing distributed SSH and doing some SELECT on v$ views

then from the text file you can easily see what’s the highest MB/s by executing the command below


$ less smartscanloop.txt | sort -nk5 | tail
pd01db03: %,05/10/12 21:29:28,biprd2    ,   7358, @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
pd01db03: %,05/11/12 08:57:12,biprd2    ,   7363, @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
pd01db03: %,05/10/12 14:55:54,biprd2    ,   7454, @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
pd01db03: %,05/11/12 09:29:57,biprd2    ,   7556, @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
pd01db03: %,05/10/12 18:17:50,biprd2    ,   7785, @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
pd01db03: %,05/10/12 19:12:19,biprd2    ,   7880, @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
pd01db03: %,05/11/12 10:31:24,biprd2    ,   7886, @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
pd01db03: %,05/10/12 19:15:48,biprd2    ,   8112, @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
pd01db03: %,05/11/12 10:31:40,biprd2    ,   8138, @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
pd01db03: %,05/10/12 18:49:46,biprd2    ,   9315, @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

and since it’s a CSV file you can then easily graph it in Tableau! (just put a header on it first)


host       , time            , inst     , smartscan,
pd01db01: %,05/10/12 13:08:14,hcmprd1   ,      0,
pd01db01: %,05/10/12 13:08:16,paprd1    ,      0,
pd01db01: %,05/10/12 13:08:18,rmprd1    ,      0,
pd01db01: %,05/10/12 13:08:20,lmprd1    ,      0,
pd01db01: %,05/10/12 13:08:22,DBFSPRD1  ,      0,
pd01db01: %,05/10/12 13:08:24,hcrprd1   ,      0,
pd01db02: %,05/10/12 13:08:14,hcrprd2   ,      0,
pd01db02: %,05/10/12 13:08:16,rmprd2    ,      0,
pd01db02: %,05/10/12 13:08:18,paprd2    ,      0,
pd01db02: %,05/10/12 13:08:20,hcmprd2   ,      0,
pd01db02: %,05/10/12 13:08:22,DBFSPRD2  ,      0,
pd01db02: %,05/10/12 13:08:24,lmprd2    ,      0,
pd01db03: %,05/10/12 13:08:14,biprd2    ,   3412, @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
pd01db03: %,05/10/12 13:08:16,DBFSPRD3  ,      0,
pd01db04: %,05/10/12 13:08:14,DBFSPRD4  ,      0,
pd01db04: %,05/10/12 13:08:16,mvwprd2   ,      0,
pd01db04: %,05/10/12 13:08:18,fsprd2    ,    763, @@@@@@@@
pd01db04: %,05/10/12 13:08:20,mtaprd112 ,      0,
... output snipped ...

and hack the data! remember on the advanced output you can slice and dice it by host, time, inst, aas, latency, returned, ic, smartscan.. :)

Here’s the IO graph per instance

and IO graph per instance and host

Hope I’ve shared you some good stuff ;)












Viewing all articles
Browse latest Browse all 63

Trending Articles