25 February, 2008

OS Statistics from AWR Reports

There are occassions when it is important to review the "Operating System Statistics" section of the AWR Report. Examples can be :
1. The Database under review is not the only database on the server and one or more other database(s) and/or other application(s) are contending for CPU (and I/O !) resources. If you are reviewing an AWR report for a set of jobs which are CPU intensive (eg high "consistent gets", high "latch waits" and "latch spins" etc), then it is also important to know if adequate CPU was available for the database. You may or may not have been informed of the other database(s) and/or application(s) running on the same server before you begin reviewing AWR reports.
2. The application tier for the database under consideration is on the same server. When you are investigating high "SQL*Net message from client" waits for batch jobs where you know that neither network latency nor user think time would be the cause of these waits, you need to know if CPU resources are being consumed by the application as well.

Pre-10g, there was no method to obtain OS performance statistics from within Oracle. StatsPack would show elapsed time and Oracle time but not any information about CPU usage outside of 'CPU used by this session' and 'CPU used when call started' -- ie CPU usage reported against the same database's sessions.

10g AWR Reports do provide Operating System statistics.

Here are some examples.

Example 1 :
For these 11.8 minutes :


Snap Id Snap Time Sessions Curs/Sess
--------- ------------------- -------- ---------
Begin Snap: 211 24-Feb-08 22:45:15 21 3.1
End Snap: 212 24-Feb-08 22:57:03 18 1.9
Elapsed: 11.80 (mins)
DB Time: 0.31 (mins)


we see :



Top 5 Timed Events Avg %Total
~~~~~~~~~~~~~~~~~~ wait Call
Event Waits Time (s) (ms) Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
RMAN backup & recovery I/O 1,681 82 49 437.9 System I/O
control file sequential read 3,800 20 5 109.4 System I/O
control file parallel write 526 5 9 24.2 System I/O
CPU time 3 16.2
db file sequential read 236 2 8 9.6 User I/O

Statistic Name Time (s) % of DB Time
------------------------------------------ ------------------ ------------
sql execute elapsed time 15.5 82.7
inbound PL/SQL rpc elapsed time 3.5 18.6
PL/SQL execution elapsed time 3.3 17.5
DB CPU 3.0 16.2
RMAN cpu time (backup/restore) 2.0 10.8
parse time elapsed 0.5 2.9
hard parse elapsed time 0.5 2.6


So, Oracle CPU time was only about 3 seconds. Was the server also doing something else ?
These are the Operating System statistics from the AWR Report :



Statistic Total
-------------------------------- --------------------
AVG_BUSY_TIME 43,445
AVG_IDLE_TIME 27,531
AVG_SYS_TIME 4,977
AVG_USER_TIME 38,468
BUSY_TIME 43,445
IDLE_TIME 27,531
SYS_TIME 4,977
USER_TIME 38,468
...
NUM_CPUS 1


These OS CPU statistics account for 709.76seconds(BUSY+IDLE time).
If Oracle accounted for 3seconds of CPU time, then it seems as if some other processes accounted for 381.68 seconds(384.68-3).

Also, the Operating System statistics show that the CPU utilisation [including %sys] was 61.21% (434.45 of 709.76 seconds).

Example 2:
For these 3 hours :



Snap Id Snap Time Sessions Curs/Sess
--------- ------------------- -------- ---------
Begin Snap: 213 25-Feb-08 00:01:00 18 1.9
End Snap: 216 25-Feb-08 03:00:40 18 1.9
Elapsed: 179.67 (mins)
DB Time: 0.09 (mins)


we see :



Top 5 Timed Events Avg %Total
~~~~~~~~~~~~~~~~~~ wait Call
Event Waits Time (s) (ms) Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
control file sequential read 6,096 7 1 124.0 System I/O
control file parallel write 3,593 4 1 77.7 System I/O
CPU time 3 50.4
db file parallel write 1,790 1 1 25.7 System I/O
db file sequential read 49 1 13 12.0 User I/O


Oracle CPU time was 3seconds.
But was the server CPU really idle ?



Statistic Total
-------------------------------- --------------------
AVG_BUSY_TIME 721,220
AVG_IDLE_TIME 356,777
AVG_SYS_TIME 5,531
AVG_USER_TIME 715,689
BUSY_TIME 721,220
IDLE_TIME 356,777
SYS_TIME 5,531
USER_TIME 715,689
NUM_CPUS 1


These statistics show us that CPU %utilisation [including %sys]
on the server was 66.90% (7212.22seconds out of 10,779.97seconds).
The server was, actually, quite busy -- busy doing something else. Those CPU cycles were consumed by some other application, not being Oracle.
{The other "application" was actually BOINC and BOINC Clients SETI_at_Home, Spinhenge_at_Home and malariacontrol.net running on my home PC at night}

8 comments:

Anand said...

Hi Sir,

I am having a AWR report for 1hr peak duration time and i see high "SQL*Net message from client" waits.How to investigate it?

Regards,
Anand

Hemant K Chitale said...

"SQL*Net message from client" is normally considered an "idle wait". It represents the time the database is NOT doing anything. It is waiting for the users/clients/appservers to send the next request to the database.

Only in some circumstances -- e.g. a batch job fetching a large number of rows from the database in small arrays -- may this wait event be worthy of investigation.

Hemant K Chitale

Ahmed AANGOUR said...

Hi Sir,

I have the following informations in my AWR report:
CPU time = 196479s
Busy_Time = 201096s

Does it means that 97.7% of the CPU used on the server is used by Oracle?
Does CPU time contain the CPU time used by both Background and server processes, or only by server processes?

Thanks in advance for your help

Regards,
Ahmed

Hemant K Chitale said...

Ahmed,
Yes, likely.

Raks said...

Its 2015 , Do you still think SQL*Net message from client is an Idel wait event?

Hemant K Chitale said...

Raks,
It depends on the context. For an interactive forms session of a user, it IS an idle event. As part of a batch initiated by an App Server, it is NOT an idle event as it is part of (a contributor to) the batch elapsed time that the Business side sees.

Hemant.

Aniimie said...

Hello Sir,
why you have taken the value in . secs for e.g.USER_TIME is 38,468 then why it is taken as 384.68 is it in %

as in my case the BUSY TIME IS VALUE is 5935943 and DB CPU under top 10 forground events by total wait time is 1554.3 so what will the calculation be

Hemant K Chitale said...

It is 38,468 centi-seconds.