Oracle CPU Time


 

 

oracle_cpu_bertrand_drouvot.png

Image from 

UPDATE: thanks to a discussion in the comments it’s come to my attention that there should be some more clarification on values used and what the mean at the beginning of this post.

Ever wonder where CPU wait comes from in EM performance screens as seen above? well the following discussion will give you a SQL query to calculate CPU wait. In the above image we see both CPU and CPU Wait as two colors (light green and dark green) in EM as well as two values from the output of the SQL script (CPU_ORA and CPU_ORA_WAIT).

Oracle CPU statistics are measured from calls to the OS to see how much CPU is burned over a given elapsed time (i.e. certain quantity of code). The CPU values are cycles used and not time, thus it does not include time on the run queue waiting to get the CPU.

Oracle ASH, on the other hand, lists all Oracle sessions that want to run from Oracle’s perspective, i.e. they aren’t idle and they aren’t waiting for a non-idle wait event like I/O. Thus ASH includes time spent both running on CPU burning cycles and time spent waiting to get on the CPU.

Thus we can take the the amount of time “On CPU” from ASH and subtract the amount of CPU in Oracle statistics for CPU usage then the remainder is roughly time spent by Oracle sessions waiting to get onto the CPU.

The two challenges to getting the value of “Wait for CPU” are getting  CPU cycles burned and ASH time “ON CPU” into the same units and making sure that we are measuring both over the same interval.

Oracle already reports CPU in % used and in centi-seconds used.  I like to transform this into Average Active Sessions on CPU so I can compare it to my main way of showing ASH data which is Average Active Sessions on “ON CPU” which includes wait for CPU.

There are 3 kinds of CPU in the Oracle stats.

  1. Oracle CPU used
  2. System CPU used
  3. Oracle demand for CPU

Starting in 10g Oracle records both the CPU used by the instance as well as the load on the system in v$sysmetric. This is awesome as we can see how busy the system is and how much of the CPU Oracle is responsible for:

col metric_name for a25
col metric_unit for a25
select metric_name, value, metric_unit from v$sysmetric where metric_name like'%CPU%' and group_id=2;
METRIC_NAME                         VALUE METRIC_UNIT
------------------------------ ---------- ------------------------------
CPU Usage Per Sec              251.067016 CentiSeconds Per Second
CPU Usage Per Txn              5025.52477 CentiSeconds Per Txn
Host CPU Utilization (%)       11.6985845 % Busy/(Idle+Busy)
Database CPU Time Ratio        76.3291033 % Cpu/DB_Time

Now the question is how do we convert these to something useful? For me I put it into the equivalent of AAS and compare it to the core count:

   select 'CPU_ORA_CONSUMED'                                     CLASS,
                    round(value/100,3)                             AAS
             from v$sysmetric
             where metric_name='CPU Usage Per Sec'
               and group_id=2
          union
            select 'CPU_OS'                                        CLASS ,
                    round((prcnt.busy*parameter.cpu_count)/100,3)    sAAS
            from
              ( select value busy from v$sysmetric
                where metric_name='Host CPU Utilization (%)'
                 and group_id=2 ) prcnt,
             ( select value cpu_count
                 from v$parameter
                where name='cpu_count' )  parameter;

CLASS                  AAS
---------------- ----------
CPU_ORA_CONSUMED       .002
CPU_OS                 .022

An AAS of 1 is equivalent to 100% of a core, so, OS CPU is about 2% of a core and of that Oracle used 0.2% of a core.
Not a very active system, and we can look at an active system later, but what I wanted to point out is that this query is missing an important statistic: the demand for CPU by Oracle. We can only add that, AFAIK, by joining in ASH:

   select 'CPU_ORA_CONSUMED'                                     CLASS,
                    round(value/100,3)                             AAS
             from v$sysmetric
             where metric_name='CPU Usage Per Sec'
               and group_id=2
          union
            select 'CPU_OS'                                         CLASS ,
                    round((prcnt.busy*parameter.cpu_count)/100,3)     AAS
            from
              ( select value busy from v$sysmetric
                 where metric_name='Host CPU Utilization (%)'
                   and group_id=2 ) prcnt,
              ( select value cpu_count from v$parameter
                 where name='cpu_count' )  parameter
          union
             select
               'CPU_ORA_DEMAND'                                            CLASS,
               nvl(round( sum(decode(session_state,'ON CPU',1,0))/60,2),0) AAS
             from v$active_session_history ash
             where SAMPLE_TIME > sysdate - (60/(24*60*60));

CLASS                   AAS
---------------- ----------
CPU_ORA_CONSUMED       .001
CPU_ORA_DEMAND          .02
CPU_OS                 .019

So the demand for CPU was higher than the amount consumed. Now the demand for CPU is coming from ASH which is sampled so the accuracy is weak, but in larger sample sets or busier systems it’s pretty darn good. The demand alerts us to CPU starvation on a busy  system.

I like to wrap all this up into a query with all the wait classes to see the overall load on Oracle including CPU consumed by Oracle, CPU demanded by Oracle and CPU used at the OS level:

select
                 decode(n.wait_class,'User I/O','User I/O',
                                     'Commit','Commit',
                                     'Wait')                               CLASS,
                 sum(round(m.time_waited/m.INTSIZE_CSEC,3))                AAS
           from  v$waitclassmetric  m,
                 v$system_wait_class n
           where m.wait_class_id=n.wait_class_id
             and n.wait_class != 'Idle'
           group by  decode(n.wait_class,'User I/O','User I/O', 'Commit','Commit', 'Wait')
          union
             select 'CPU_ORA_CONSUMED'                                     CLASS,
                    round(value/100,3)                                     AAS
             from v$sysmetric
             where metric_name='CPU Usage Per Sec'
               and group_id=2
          union
            select 'CPU_OS'                                                CLASS ,
                    round((prcnt.busy*parameter.cpu_count)/100,3)          AAS
            from
              ( select value busy from v$sysmetric where metric_name='Host CPU Utilization (%)' and group_id=2 ) prcnt,
              ( select value cpu_count from v$parameter where name='cpu_count' )  parameter
          union
             select
               'CPU_ORA_DEMAND'                                            CLASS,
               nvl(round( sum(decode(session_state,'ON CPU',1,0))/60,2),0) AAS
             from v$active_session_history ash
             where SAMPLE_TIME > sysdate - (60/(24*60*60));

CLASS                   AAS
---------------- ----------
CPU_ORA_CONSUMED       .002
CPU_ORA_DEMAND          .03
CPU_OS                 .023
Commit                    0
User I/O                  0
Wait                      0

Ideally I’d want the CPU stats to be subsets of each other so that I could have a graphically stack-able set of statistics

now rolling it all together
with AASSTAT as (
           select
                 decode(n.wait_class,'User I/O','User I/O',
                                     'Commit','Commit',
                                     'Wait')                               CLASS,
                 sum(round(m.time_waited/m.INTSIZE_CSEC,3))                AAS
           from  v$waitclassmetric  m,
                 v$system_wait_class n
           where m.wait_class_id=n.wait_class_id
             and n.wait_class != 'Idle'
           group by  decode(n.wait_class,'User I/O','User I/O', 'Commit','Commit', 'Wait')
          union
             select 'CPU_ORA_CONSUMED'                                     CLASS,
                    round(value/100,3)                                     AAS
             from v$sysmetric
             where metric_name='CPU Usage Per Sec'
               and group_id=2
          union
            select 'CPU_OS'                                                CLASS ,
                    round((prcnt.busy*parameter.cpu_count)/100,3)          AAS
            from
              ( select value busy from v$sysmetric where metric_name='Host CPU Utilization (%)' and group_id=2 ) prcnt,
              ( select value cpu_count from v$parameter where name='cpu_count' )  parameter
          union
             select
               'CPU_ORA_DEMAND'                                            CLASS,
               nvl(round( sum(decode(session_state,'ON CPU',1,0))/60,2),0) AAS
             from v$active_session_history ash
             where SAMPLE_TIME > sysdate - (60/(24*60*60))
)
select
       ( decode(sign(CPU_OS-CPU_ORA_CONSUMED), -1, 0, (CPU_OS - CPU_ORA_CONSUMED )) +
       CPU_ORA_CONSUMED +
        decode(sign(CPU_ORA_DEMAND-CPU_ORA_CONSUMED), -1, 0, (CPU_ORA_DEMAND - CPU_ORA_CONSUMED ))) CPU_TOTAL,
       decode(sign(CPU_OS-CPU_ORA_CONSUMED), -1, 0, (CPU_OS - CPU_ORA_CONSUMED )) CPU_OS,
       CPU_ORA_CONSUMED CPU_ORA,
       decode(sign(CPU_ORA_DEMAND-CPU_ORA_CONSUMED), -1, 0, (CPU_ORA_DEMAND - CPU_ORA_CONSUMED )) CPU_ORA_WAIT,
       COMMIT,
       READIO,
       WAIT
from (
select
       sum(decode(CLASS,'CPU_ORA_CONSUMED',AAS,0)) CPU_ORA_CONSUMED,
       sum(decode(CLASS,'CPU_ORA_DEMAND'  ,AAS,0)) CPU_ORA_DEMAND,
       sum(decode(CLASS,'CPU_OS'          ,AAS,0)) CPU_OS,
       sum(decode(CLASS,'Commit'          ,AAS,0)) COMMIT,
       sum(decode(CLASS,'User I/O'        ,AAS,0)) READIO,
       sum(decode(CLASS,'Wait'            ,AAS,0)) WAIT
from AASSTAT)
/

    CPU_OS    CPU_ORA CPU_ORA_WAIT     COMMIT     READIO       WAIT
---------- ---------- ------------ ---------- ---------- ----------
       .02       .002            0          0          0          0

Now let’s run up some load on a machine and database.
Take two databases, run up the CPU demand on both and add some wait contention. The machine has 24 cores so there is a definitely a problem when the CPU_TOTAL goes over 24. I’m running 14 sessions each trying to burn a core on two different databases. The first few lines the test is ramping up

SQL> /

 CPU_TOTAL     CPU_OS    CPU_ORA CPU_ORA_WAIT     COMMIT     READIO       WAIT
---------- ---------- ---------- ------------ ---------- ---------- ----------
    14.887       .387     13.753         .747          0          0       .023

SQL> /

 CPU_TOTAL     CPU_OS    CPU_ORA CPU_ORA_WAIT     COMMIT     READIO       WAIT
---------- ---------- ---------- ------------ ---------- ---------- ----------
    21.989      7.469     12.909        1.611          0          0       .044

SQL> /

 CPU_TOTAL     CPU_OS    CPU_ORA CPU_ORA_WAIT     COMMIT     READIO       WAIT
---------- ---------- ---------- ------------ ---------- ---------- ----------
    26.595     12.125     11.841        2.629          0          0       .025

SQL> /

 CPU_TOTAL     CPU_OS    CPU_ORA CPU_ORA_WAIT     COMMIT     READIO       WAIT
---------- ---------- ---------- ------------ ---------- ---------- ----------
    27.045     12.125     11.841        3.079          0          0       .025

Historically CPU used by Oracle was derived from

v$sysstat.name=’CPU used by this session’

but this statistic had problems as the value was only updated every time a call ended. A call could be a 1 hour PL/SQL procedure which would thus report zero cpu usage in the stats until it finished and the CPU would spike off the scale.

ASH had always been the most stable way to gather CPU demand, though Oracle has made improvements in gathering CPU statistics. I believe that the time model gathers CPU every 5 seconds in 10g, and in 11g it’s possible that CPU stats are gathered every second

Here is a visual example of a machine that has server memory contention, massive amounts of paging. There is OS CPU being used, but hardly any CPU being used by Oracle which makes sense as it’s an idle database, but what is revealing is the massive amount of CPU wait by Oracle. Oracle only has a little bit of work to do to take care of an idle database but we can see that most of Oracle’s CPU time is wait for CPU time as when it wants to work, pages have to be read back in,

I have my doubts as to the clarity of the layout of the above graph. A possibly clearer graph would be simply adding a line representing available CPU and take out the OSCPU bars. In the above graph I’ve charted OSCPU usage as AAS, ie average active sessions, mixing AAS of the database with AAS at the OS level. I think a  possible clear representation would be to show the Core count line, and draw the OSCPU usage shown upside down from the # of core lines, thus the space from the bottom axis to where the OSCPU reaches down would be available CPU.

 UPDATE

Thanks to the eagle eyes of John Beresniewicz a small error was identified in the above script. The last script didn’t correlate the time windows of v$sysmetric with v$active_session history. They both reported the last minute of statistics but the last minute reported in v$sysmetric could be up to a minute behind those in v$active_session_history, so here is a version that tries to correlate to two time windows so they are in sync

with AASSTAT as (
           select
                 decode(n.wait_class,'User I/O','User I/O',
                                     'Commit','Commit',
                                     'Wait')                               CLASS,
                 sum(round(m.time_waited/m.INTSIZE_CSEC,3))                AAS,
                 BEGIN_TIME ,
                 END_TIME
           from  v$waitclassmetric  m,
                 v$system_wait_class n
           where m.wait_class_id=n.wait_class_id
             and n.wait_class != 'Idle'
           group by  decode(n.wait_class,'User I/O','User I/O', 'Commit','Commit', 'Wait'), BEGIN_TIME, END_TIME
          union
             select 'CPU_ORA_CONSUMED'                                     CLASS,
                    round(value/100,3)                                     AAS,
                 BEGIN_TIME ,
                 END_TIME
             from v$sysmetric
             where metric_name='CPU Usage Per Sec'
               and group_id=2
          union
            select 'CPU_OS'                                                CLASS ,
                    round((prcnt.busy*parameter.cpu_count)/100,3)          AAS,
                 BEGIN_TIME ,
                 END_TIME
            from
              ( select value busy, BEGIN_TIME,END_TIME from v$sysmetric where metric_name='Host CPU Utilization (%)' and group_id=2 ) prcnt,
              ( select value cpu_count from v$parameter where name='cpu_count' )  parameter
          union
             select
               'CPU_ORA_DEMAND'                                            CLASS,
               nvl(round( sum(decode(session_state,'ON CPU',1,0))/60,2),0) AAS,
               cast(min(SAMPLE_TIME) as date) BEGIN_TIME ,
               cast(max(SAMPLE_TIME) as date) END_TIME
             from v$active_session_history ash
              where SAMPLE_TIME >= (select BEGIN_TIME from v$sysmetric where metric_name='CPU Usage Per Sec' and group_id=2 )
               and SAMPLE_TIME < (select END_TIME from v$sysmetric where metric_name='CPU Usage Per Sec' and group_id=2 )
)
select
       to_char(BEGIN_TIME,'HH:MI:SS') BEGIN_TIME,
       to_char(END_TIME,'HH:MI:SS') END_TIME,
       CPU_OS CPU_TOTAL,
       decode(sign(CPU_OS-CPU_ORA_CONSUMED), -1, 0, (CPU_OS - CPU_ORA_CONSUMED )) CPU_OS,
       CPU_ORA_CONSUMED CPU_ORA,
       decode(sign(CPU_ORA_DEMAND-CPU_ORA_CONSUMED), -1, 0, (CPU_ORA_DEMAND - CPU_ORA_CONSUMED )) CPU_ORA_WAIT,
       COMMIT,
       READIO,
       WAIT 
       -- ,(  decode(sign(CPU_OS - CPU_ORA_CONSUMED), -1, 0, 
       --                (CPU_OS - CPU_ORA_CONSUMED))
       --    + CPU_ORA_CONSUMED +
       --  decode(sign(CPU_ORA_DEMAND - CPU_ORA_CONSUMED), -1, 0, 
       --             (CPU_ORA_DEMAND - CPU_ORA_CONSUMED ))) STACKED_CPU_TOTAL 
from ( 
        select 
                min(BEGIN_TIME) BEGIN_TIME,
                max(END_TIME) END_TIME, 
                sum(decode(CLASS,'CPU_ORA_CONSUMED',AAS,0)) CPU_ORA_CONSUMED, 
                sum(decode(CLASS,'CPU_ORA_DEMAND' ,AAS,0)) CPU_ORA_DEMAND, 
                sum(decode(CLASS,'CPU_OS' ,AAS,0)) CPU_OS, 
                sum(decode(CLASS,'Commit' ,AAS,0)) COMMIT, 
                sum(decode(CLASS,'User I/O' ,AAS,0)) READIO, 
                sum(decode(CLASS,'Wait' ,AAS,0)) WAIT 
         from AASSTAT) 
/

The output now looks like

BEGIN_TI END_TIME  CPU_TOTAL	 CPU_OS    CPU_ORA CPU_ORA_WAIT     COMMIT     READIO	    WAIT
-------- -------- ---------- ---------- ---------- ------------ ---------- ---------- ----------
07:23:35 07:24:35	.044	   .024       .002	   .018 	 0	    0	    .001

I’m still open that there might be some more tweaking to do, so your milage may vary. Test, Test, Test and only trust yourself!
As a reminder the code above is on Github as part of the ASH Masters project. Feel free to get a Github account, fork the code and make changes. If you find cool new code or errors in old code, let me know and we will merge it into the ASH Masters project

 

https://github.com/khailey/ashmasters/blob/master/cpu_consumed_verses_cpuwait.sql

Print Friendly
August 27th, 2013 by

facebook comments:

  • EB

    Hi Kyle
    if you want value in second why do you calculate waits in centisecond ?(m.time_waited/m.INTSIZE_CSEC) is in centi-second ?

    Best Regards

  • JDw

    Hi Kyle,

    I really didn’t understand if the CPU_TOTAL, CPU_OS, CPU_ORA and so on columns are expressed in porcentage of CPU utilization or AAS (average active sessions) or units like i.e if I get 1 on some column, this means = 100% of one core utilization ? If this last one is correct , I can’t understand.

    I have a 80 cpu’s (4 sockets 10-core and hyperthreading ) and my actual utilization is about 3-4% in the OS view.

    If the last one is right, are you saying I have 3 core’s on the 100% utilization ?

    Could you give an example using the below information ?

    BEGIN_TIME END_TIME CPU_TOTAL CPU_OS CPU_ORA CPU_ORA_WAIT COMMIT READIO WAIT
    ——————– ——————– ———- ———- ———- ———— ———- ———- ———-
    04:09:48 04:10:48 3.347 1.724 1.623 0 .243 .332 .363

    Thank you

  • khailey

    >> are you saying I have 3 core’s on the 100% utilization ?

    no – it means that you are burning the equivalent to 3.347 (CPU_OS + CPU_ORA) CPUs at a 100%. Now I have no idea how that load is spread across the CPUs. A couple of CPUs could be at 100% if a process is pegged on it, but could also be spread across the CPUs evening. That’s not the point here. The point here is to know how much CPU we are using at the OS level and at the Oracle level and to know if Oracle processes have to wait for CPU.

    An AAS based only on CPU activity of 1 is equivalent to 1 CPU at 100% but if you have many CPUs this load could be spread across those CPUs.

  • JDw

    Hi Kyle,

    You are my super teacher…. Tks again..
    So, in this information below, I have 3.347 CPU’s (quantity) being burning at this moment (right now), in which 1.724 are being burning by S.O and 1.623 by Oracle process.

    CPU_TOTAL = 3.347
    CPU_OS = 1.724
    CPU_ORA = 1.623

    All above its about quantity of number of CPU’s being used and NOT AVERAGE ACTIVE SESSION, right ? (Because I can get more than 1 session at the same time in a CPU).

    Regards

  • khailey

    yes thats right
    as far as measuring in AAS it makes much much more intuitive sense than micro seconds on CPU or CPU %. Those don’t tell me much.
    AAS can consist of many parts such as waiting for I/O or locks or latches
    In this case we are just talking about CPU so AAS in this discussion is just the number of processes that want to run on CPU.
    How many actually get to run is based on actual CPU burned.

  • Facebook
  • Google+
  • LinkedIn
  • Twitter