HP NZ Database Services


Library Cache Hit Ratio vs Cursor Versions

I was contacted for advice after a DBA had received alerts from database monitoring software about the Library Cache Hit Ratio being 93%.

According to a Metalink note:

Library Cache Hit Ratio

The hit ratio helps to measure the usage of the shared pool based on how many times a SQL/PLSQL statement needed to be parsed instead of being reused. The following SQL statement help you to calculate the library cache hit ratio:
If the ratio of misses to executions is more than 1%, then try to reduce the library cache misses by increasing the shared pool size.

So, the question was, should the shared pool be increased?

Below is my reply:

I don’t think increasing the shared pool necessary in this case.
The usual suspect (application not using bind variables) is not to blame in this case.
If you look at the number of
cursor versions, then you will see what I believe to be the problem:

col sql format a40
 version_count ,
 users_opening ,
 substr(sql_text,1,40) "SQL"
 FROM v$sqlarea
 WHERE version_count > 10
 order by version_count
 4cfmfucf91ctq           425             8               0 INSERT INTO zz_address_info(address_info
 gt3t2nz6bg3vt           494             8               0 INSERT INTO zzzzz_ship_addr(shipping_gro
 93zhuybvwamvv           585             5               0 INSERT INTO zz_address_info(address_info
 2hmfmuvfn4x40           632             9               0 INSERT INTO zzzzz_ship_addr(shipping_gro
 205xff0tkub6z           942             1               0 INSERT INTO zzzzz_ship_addr(shipping_gro

Many cursors have multiple versions. (Child cursors that can’t be shared).

Increasing the shared pool size might cause long hash chains because of all the child cursors and may even make performance worse for a database like this.

Let’s look at one example cursor in v$SQL that has only three versions:

sql_id: 0bs8t16vfdjf7
                                                                                                                 PARSING PARSING
 CHLD      SHARE      PERST        RUN LOADED KEPT OPEN            PARSE                                            USER  SCHEMA
 ---- ---------- ---------- ---------- ------ ---- ---- ----- ---- ----- ------------------- ------------------- ------- -------
    1      26998      10976       9016      1    0    0     2    1     1 2010-01-01/09:46:54 2010-01-11/11:36:28      35      35
    2      26998      10976       9016      1    0    0     2    1     2 2010-01-01/09:46:54 2010-01-11/11:40:59      35      35
    3      26998      10976       9016      1    0    1     1    0    10 2010-01-01/09:46:54 2010-01-11/11:18:53      35      35
 CHLD                 FULL       DISK     BUFFER                DIRECT                  ROWS
 ---- --------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------
    1         1          1          0         18          0          0          1          6         5
    2         4          4          0         72          0          0          4         24         5
    3         9          9          0        162          0          0          9         54         5
 SELECT t1.product_id,t1.version,t1.creation_date,t1.product_type,t1.admin_display,t1.display_name,t1.version,t1.description,t1.end_d
 FROM zzz_product t1
 WHERE t1.product_id IN (:1,:2,:3,:4,:5,:6)

We can see that the same user account executed the same SQL statement fourteen times, yet three child cursors were needed.
Why wasn’t the first one shared?

SQL> ( SYS @ somedb ) select * from V$SQL_SHARED_CURSOR where SQL_ID='0bs8t16vfdjf7';
 ------------- ---------------- ---------------- ---- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 S R P T M B M R O P M F L
 - - - - - - - - - - - - -
 0bs8t16vfdjf7 00000003A0BD18B0 0000000398D8E090    1 N N N N N N N N N N N N N N Y N N N N N N N N N N N N N N N N N N N N N N N N N
 N N N N N N N N N N N N N
 0bs8t16vfdjf7 00000003A0BD18B0 00000003A0501C10    2 N N N N N N N N N N N N N N Y N N N N N N N N N N N N N N N N N N N N N N N N N
 N N N N N N N N N N N N N
 0bs8t16vfdjf7 00000003A0BD18B0 000000039CE992A0    3 N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N N
 N N N N N N N N N N N N N
3 rows selected.

So BIND_MISMATCH is the reason for not sharing cursors.

SQL> ( SYS @ somedb) select * from   v$sql_bind_metadata where ADDRESS='000000039CE992A0';
 ---------------- ---------- ---------- ---------- ---------- ------------------------------
 000000039CE992A0          6          1         32          0 6
 000000039CE992A0          5          1        128          0 5
 000000039CE992A0          4          1         32          0 4
 000000039CE992A0          3          1        128          0 3
 000000039CE992A0          2          1         32          0 2
 000000039CE992A0          1          1         32          0 1
6 rows selected.
SQL> ( SYS @ somedb) select * from   v$sql_bind_metadata where ADDRESS='00000003A0501C10';
 ---------------- ---------- ---------- ---------- ---------- ------------------------------
 00000003A0501C10          6          1         32          0 6
 00000003A0501C10          5          1        128          0 5
 00000003A0501C10          4          1        128          0 4
 00000003A0501C10          3          1        128          0 3
 00000003A0501C10          2          1        128          0 2
 00000003A0501C10          1          1        128          0 1
6 rows selected.
SQL> ( SYS @ somedb ) select * from   v$sql_bind_metadata where ADDRESS='0000000398D8E090';
 ---------------- ---------- ---------- ---------- ---------- ------------------------------
 0000000398D8E090          6          1         32          0 6
 0000000398D8E090          5          1        128          0 5
 0000000398D8E090          4          1         32          0 4
 0000000398D8E090          3          1        128          0 3
 0000000398D8E090          2          1         32          0 2
 0000000398D8E090          1          1        128          0 1

It looks like a fault with the application design.

The developers have set different sizes for the bind variables being used in the same SQL!

By the way, Oracle rounds the maximum length up to 32, 128, 2000 or ‘higher’.  (Search for “bind variable graduation” on this page).
A bind variable size of 33 will show as 128 in this view.

If the performance is a problem, then I suggest contacting the application developers to see if they can fix their code or check with Oracle Support to see if there is a workaround.
If the performance is not a problem, then calibrating the monitoring to the nature of the application might be the best option.


Don’t Blame the Database, part three

This post is about another example of where a performance problem was passed to us and we found that the real issue laid outside of the database.  There was a very similar issue discussed in this post, so I’ll just cover the differences in this case.

The users of a helpdesk application with an Oracle database backend were complaining about performance every Monday.  The first thing I did was establish the period in which performance was bad and compare it to a time when performance was normal.  I used Statspack reports to do this and found that I/O related waits were much higher, especially the average time spend waiting for redo log writes.  The overall workload was similar, but the significant difference was the I/O was taking much longer on Mondays.

I verified this theory using sar and vxstat to compare response times on Mondays vs Tuesdays-Fridays.

This is an example of vxstat output showing the problem:

vxstat  -i 60 -c 100 -d vol01
 OPERATIONS           BLOCKS        AVG TIME(ms)
TYP NAME              READ     WRITE      READ     WRITE   READ  WRITE

Mon Jan 18 15:32:15 2010
dm  cx00            31       219       496      3156   16.1    5.5
dm  cx01            27       247       432      4240    5.9    4.2
dm  cx02             1        20        16       320   50.0   26.0
dm  cx03             4        80        64      2800   12.5 8270.9

Mon Jan 18 15:33:15 2010
dm  cx00            29       171       560      1827    6.6    3.7
dm  cx01            29       232       464      5440    6.2 1818.0
dm  cx02            11        15       176       240   13.6   10.7
dm  cx03             2       180        32      4400   10.0  337.3

Mon Jan 18 15:34:15 2010
dm  cx00            38       131       608      1632   10.3 2299.0
dm  cx01            35       107       704      2304    7.1    8.8
dm  cx02            14         7       224       112   27.9    2.9
dm  cx03             4        94        64      2096   10.0    9.1

The effect for the user was waiting up to eight seconds for a commit now and then!

I requested SAN statistics, but these didn’t show the jump in response times that I was expecting to see.  In fact, the SAN reported better response times during the poor performance periods, and it couldn’t separate writes from reads.  It appeared that finding the cause from SAN statistics was not an option.

There were a number of Oracle databases using the same SAN, on multiple servers so I gathered sar and Statspack (or AWR for 10g), data for them, eg:

set pagesize 5000
set numwidth 15
select a.snap_time, b.TIME_WAITED_MICRO,total_waits from STATS$SNAPSHOT a, STATS$SYSTEM_EVENT b
where a.SNAP_ID=b.snap_id
and b.event='log file parallel write'
order by 1

This data showed that all of the databases using the same SAN had slower response times on Mondays, during the same 14 hour period.

Using sar -b and sar -u I could see that during the period of poor performance, there was a matching high amount of waiting for physical I/O (non-buffered).  There was no virtual memory paging, and the databases didn’t use direct I/O of any kind, so this was unexpected.

By watching the running processes on the box at the time the performance started going bad each Monday, I found that the a process called scc (system configuration collector) was running for the same 14 hour period.  It turns out that it was looping and running the format command repeatedly for hours, on a server with a thousand partitions/disk devices.  This activity must have slowed the SAN down enough to cause the response problems!

The scc script was rescheduled for Sundays and a code fix was suggested.

The important things in this investigation were:

  • Using historical information to create a baseline of normal performance for comparison.
  • Understanding that tweaking the database was not going to help when the disk write times varied by a factor of up to 1000, even if the general opinion is that the database must be at fault.
  • The format command can have an impact on SAN performance if there are many devices.


Troubleshooting Network Connection

Issue: Application not able to connect to the Oracle database. Connections are not going through and even when you try to telnet to the IP address, you get an error message:

telnet 172.32.362.171 1521

Connecting To 172.32.362.171…“Could not open connection to the host, on port 1521: Connect failed”

Troubleshooting steps

  • Checked for Firewall setting and checked fire wall logs to confirm that access is being allowed through the firewall. Firewall does show connection attempts going through it.
  • Checked for database listener and listener logs for any errors or connection attempts refused, however there were no errors or information on connection attempts.
  • Checked for listener and tnsnames entries for any incorrect information.
  • Traced network packets at the database host to confirm
    • Whether or not the database server is seeing the traffic from the application server
    • Whether database server is sending back the response to the application server.
    • After doing network trace it was confirmed that “traffic was only going in towards database, however database host server was not responding or acknowledging any packet requests”
    • On database host, done netstat –a, and from the results we could clearly see that packets are reaching from 172.32.334.100, however their status is SYN_RCVD, which means a remote host has asked for starting a connection however confirmation message is not being send to 172.32.334.100. This may be because of firewall from 172.32.362.171 to 172.32.334.100 or may be because of difference in routing. For successful connection netstat status must be “ESTABLISHED” instead of “SYN_RCVD”.

test1clust.1521      172.32.334.100.3934      0      0 49640      0 SYN_RCVD

test1clust.1521      172.32.334.100.3938      0      0 49640      0 SYN_RCVD

  • Above information proved that there are issues around database host server itself.  Possible scenarios:
    • Firewall issues (which we have already checked and confirmed that there are no issues around this)
    • Host-based firewall is not allowing these connections to go up to DB listener (hence the DB listener is not seeing any connection attempts and hence not responding back).
    • Database host doesn’t have static route to correctly route back response traffic back to the required subnet.

    Upon checking the route, it was found that there is no route for required subnet.
    The responses from database host server should leave via the same network interface (on the host) through which application server access-attempts are hitting database host.  Therefore the gateway configured for the destination subnet (172.32.334.100) should be the same as the gateway configured for this network interface.
    This was then determined to be an incorrect route on the DB host which is causing the problem.  We added a route on the DB server and that resolved the issue.


    Schema Consolidation vs Application Separation

    One of our customers is keen to reduce the number of databases they have by merging the schemata of many varying and mostly third-party applications into one (or a few) databases.  Their goal is to reduce operating costs and to use Oracle’s resource management to greater effect for applications using the same server.  However, an architect’s point of view can be very different than that of the operations staff.  Investigating problems, addressing bugs, arranging outages, tuning performance, applying upgrades, backup and recovery, etc, all would be affected by this strategy.  Our team is very aware of the operational realities of database support, and so we warn against schema consolidation as a default practice.

    A few years ago, our team won the support of a different client’s Oracle environment.  This client’s previous production DBA had squashed about thirty applications into one database, and we had to deal with all of the issues arising from his legacy.  We documented them as they arose, resulting in a list of reasons to keep applications in separate databases:

    Application Separation

    Each application must be stored in a dedicated database to make possible:

    1. Running applications with different versions and options (think licence cost) of Oracle.  Each application will have different requirements / certification.
    2. Independent Oracle upgrades or patching.
    3. Independent scheduled database outages (eg parameter change or Oracle patch application) – higher availability, less change request effort.
    4. Different service levels for different databases, eg H/A, RAC, DR etc add cost and complexity, so may not be suitable for all applications.
    5. Independent application recovery or database flashback to a previous point in time.  (Tablespace PITR is possible in some cases.  Restrictions apply, and extra planning and resources are required).
    6. Independent backup – quicker, smaller and more flexible.  (A backup regime for TSPITR is not standard practice, flexible, robust, efficient or a complete backup).
    7. Independent test database refreshes using RMAN, storage (eg SAN BCVs) or OS tools.  (Transportable tablespace might be an option, but not a great one).
    8. Customised and efficient configuration for each application (eg cursor_sharing, character set).
    9. Reduced inter-application contention, (eg same public synonym required, tablespace name, schema name, performance problems).
    10. System testing one application’s change without having to retest other applications (that share the same database).  Eg, if an upgrade script changes performance, or shared memory usage, or recreates a public synonym or schema name or database link or revokes public privileges… will that break another app inhabiting the same DB?
    11. Higher availability by restricting the outages caused by internal errors and corruption to one application.  (Eg what happens when an undo block is corrupted?  Or constant ORA-4031s?  Higher chance with multiple apps in one DB.  DB becomes SPOF – bug, file loss, corruption, latch deadlocks, etc).
    12. Lower data dictionary / fixed view overhead.  A higher number of extents, objects, etc make some regular queries much slower and heavier on resource use.  (Affects DBAs and monitoring tools).
    13. Clarity of which database objects and settings belong to which application, and which interdependencies exist (by way of database links and the privileges of the link users).  Helps when an application is decommissioned, upgraded or migrated separately.
    14. Better security between users of one application and another.  (Some applications may be designed to give the users high database level privileges which could be exploited to access data for other applications in the same database).
    15. Lower server requirements, (smaller databases need less resources, so less powerful servers are required).

    Possible Objections / Exceptions to the List Above

    • Less of the list above would be applicable when two applications essentially share the same data set.
    • Custom / in-house applications may be designed to mitigate some of the list above.
    • Some test or development schemata might be able to co-exist in one database when resources are limited and the environments can be recreated from source code or refreshed via export/import or transportable tablespaces.
    • Trivial schemata (one or two simple tables per application with little data or change) may be considered a low risk to merge into one database.
    • There will be some administration and resource overhead to having multiple databases instead of one.  Although, some management of a merged database will be more difficult – eg troubleshooting and performance tuning/investigation.  (Lots of sessions, segments, SQL etc to filter).
    • RAC allows scaling out instead of scaling up, so multiple smaller servers can still be used for a large database.  (Although RAC incurs higher costs, overhead and complexity).
    • For RAC databases, using services to partition the workload could mitigate some of the instance level contention and configuration issues.  If an instance is guaranteed, even during failover, to be exclusive to one application, then that instance could have customised initialisation parameters.
    • Using Oracle resource manager can address some of the performance / contention issues, but not all of them.  Eg, it only kicks in when CPU is maxed out, only controls foreground processes and it doesn’t manage all resources (I/O, shared pool, latches).


    Cases of Insensitivity

    I was asked to review customisations a DBA from another organisation had made to a database when installing a new application.  He had decided to deviate from the application vendor’s instructions, which had raised concerns.

    The changes that caught my attention were:

    alter system set NLS_SORT=BINARY_CI SCOPE=SPFILE;

    These setting change the way Oracle compares and sorts character strings, specifically to ignore the case (upper or lower).

    I remember in the old days (circa Oracle 7) I’d receive complaints about how applications that could run on different databases would be limited by a lack of case-insensitive searches if Oracle was used.  I used to suggest that the design could have met this requirement by storing an extra column that could be used for searches.  (Eg, all characters converted to uppercase, with spaces, apostrophes and hyphens removed, etc).  In more recent versions of Oracle, functional indexes and virtual columns could be considered instead.

    While it’s great that Oracle now provides the functionality to allow case-insensitive comparisons and sorting, one should research the feature and understand the ramifications before using it.  There is plenty of information about NLS_SORT and NLS_COMP on Metalink, blogs and the documentation, so I’ll be brief.

    If the application doesn’t support non-default NLS_SORT and NLS_COMP settings, then they should not be used.  The settings in this example can result in indexes not being used, or not being used as intended, because normal indexes are created using a binary sort order.  (Not binary_ci).  Sorting with values other than BINARY will be slower.  As usual, it pays to check for significant bugs before adopting a new feature, especially when deviating from an application vendor’s instructions.  In 10gR2 there are bugs that will return incorrect data when using the NLS_COMP & NLS_SORT values as set above.  (Eg, bugs 5225005, 5494008).
    Metalink note 227335.1 is a good source of further information on this topic.


    Supplementing Application Error Logging

    Database users may complain that something isn’t right with the application and request that the DBAs have a look in the database for problems.  Oracle doesn’t record every error reported to database clients in the alert log, just server related messages, (internal errors, snapshot too old, segment cannot extend, etc).  It is the application that should capture and/or report the full error stack returned to the database client, but unfortunately not all of them do.

    The following is an example of this situation, in which I had to track down the cause of a problem where the application should have made that clear from its error log.

    Application is Stuck – Example Case

    An application owner asked me to look at the database, because an job on the application server was stuck and appeared to  be looping.  I asked for the error messages and while waiting for them, I fired up SQL*Plus.

    The predominant and most unusual wait event for the recent and active sessions was “undo segment recovery”.  This happens when the session is waiting for the PMON to roll back a dead transaction.

    V$SESSION showed that the SQL being executed just before the wait was:

    select STATE from DUMMY where ID = :1  FOR UPDATE

    By this time I had received the application error logs.  They referred to a different select statement on the same table, but the only clue was “This may be due to a transient race condition. Rolling back and retrying the transaction.”

    The “rolling back” error text fitted with the “undo segment recovery” wait event I was seeing, even if the SQL didn’t match.  The application was trying to do the same thing with the same result repeatedly.

    I noticed that the number of “SQL*Net break/reset to client” wait events were steadily increasing.  This indicates error messages were being generated for some database sessions.

    At this point I could either try to trace from the client side, or from the server side.  I only had access to the database server, so I enabled SQL tracing for a couple of likely database sessions and soon had a trace file of a session that had been waiting on “undo segment recovery”:

    PARSING IN CURSOR #4 len=52 dep=0 uid=30 oct=3 lid=30 tim=52668306764175 hv=3327540542 ad='914b1fb8'
    select STATE from DUMMY where ID = :1  FOR UPDATE
    PARSE #4:c=0,e=941,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=52668306764164
    BINDS #4:
      oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
      oacflg=03 fl2=1000000 frm=01 csi=871 siz=24 off=0
      kxsbbbfp=ffffffff7ae74f60  bln=22  avl=04  flg=05
    WAIT #4: nam='undo segment recovery' ela= 2929424 segment#=5 tx flags=183 p3=0 obj#=27086 tim=52668309703725
    EXEC #4:c=10000,e=2939555,p=0,cr=7,cu=2,mis=1,r=0,dep=0,og=1,tim=52668309703911
    ERROR #4:err=1591 tim=1098267617
    WAIT #4: nam='SQL*Net break/reset to client' ela= 9 driver id=1952673792 break?=1 p3=0 obj#=27086 tim=52668309704412
    WAIT #4: nam='SQL*Net break/reset to client' ela= 973 driver id=1952673792 break?=0 p3=0 obj#=27086 tim=52668309705441
    WAIT #4: nam='SQL*Net message to client' ela= 5 driver id=1952673792 #bytes=1 p3=0 obj#=27086 tim=52668309705502
    WAIT #4: nam='SQL*Net message from client' ela= 2441 driver id=1952673792 #bytes=1 p3=0 obj#=27086 tim=52668309708092
    WAIT #28: nam='SQL*Net message to client' ela= 3 driver id=1952673792 #bytes=1 p3=0 obj#=27086 tim=52668309708548
    WAIT #28: nam='SQL*Net message from client' ela= 2608 driver id=1952673792 #bytes=1 p3=0 obj#=27086 tim=52668309711219
    XCTEND rlbk=1, rd_only=1
    WAIT #0: nam='SQL*Net message to client' ela= 2 driver id=1952673792 #bytes=1 p3=0 obj#=27086 tim=52668309711650

    And there is the error message!  ORA-1591 means “lock held by in-doubt distributed transaction”.

    Querying DBA_2PC_PENDING showed a couple of pending two-phase-commit transactions since the previous day.  I informed the application owner, who then asked me to ‘commit force’ those transactions.  Problem solved.

    Wrap Up

    If the users can’t provide the DBA with an ORA-NNNN error message, then the DBA can try to capture the error message by enabling SQL tracing and searching through the trace files for “err=“.


    Tune-n-Tools vs Oracle

    The Puzzle

    A UNIX administrator contacted us (all the way from San Diego) with a problem regarding semaphore configuration on a new HP-UX 11i v3 server running an Oracle database:

    The default semaphore settings were:

    SEMMSL=2048   (semaphores per set)
    SEMMNI=2048   (number of sets)

    He had read Oracle’s installation documentation which said that the product of these two parameters must be at least the number of PROCESSES as defined in the Oracle database’s init.ora.  By default SEMMSL x SEMMNI = 4,194,304; much higher than the database’s PROCESSES parameter of 10,000.

    Following Tune’n’Tools guidance, the UNIX administrator changed the parameters to:


    The new settings resulted in product of 1,048,576, which should support 10,000 database processes right?  So why did the database fail to start and report the following error?

    SQL> startup nomount pfile="/${ORACLE_SID}/oracle/admin/pfile/init${ORACLE_SID}.ora";
    ORA-27154: post/wait create failed
    ORA-27300: OS system dependent operation:semids_per_proc failed with status: 0
    ORA-27301: OS failure message: Error 0
    ORA-27302: failure occurred at: sskgpwcr2
    ORA-27303: additional information: semids = 136, maxprocs = 10000

    The Solution

    Oracle will try to get PROCESSES+4 semaphores per set/id, and then divide the result by two until it is smaller than SEMMSL.  In this case, we start with 10,004 and end up with 78 (74 for database and 4 overhead per set).  This requires 136 sets/ids.

    There is an Oracle compilation parameter SS_SEM_MAX which limits the maximum number of semaphore sets oracle will allocate, and this normally defaults to 10.  OS tuning tools may not take this into consideration when making recommendations.

    So, there is a hidden limit: Least(SEMMNI,10) x (SEMMSL-4) >= PROCESSES

    For the case above: 10 x 124 < 10,000 [error], but 10 x 2044 > 10,000 [no error].

    (See Metalink Doc 15566.1 for more about SS_SEM_MAX).

    Metalink Doc 15654.1 ( TECH: Calculating Oracle’s SEMAPHORE Requirements) gives this guidance:

    SEMMSL is an arbitrary figure which is best set to a round figure no smaller that the smallest ‘processes’ figure for any database on the system.

    Which makes sense given the SS_SEM_MAX limitation, and given that the init parameter “processes” for a DB may be increased after the initial installation.  I’d suggest SEMMSL be at least 4 + maximum expected PROCESSES value for all instances, or else Oracle’s “halve and try again” method may introduce inefficiencies.

    The default values of SEMMSL=2048 and SS_SEM_MAX=10 would support up to 16,387 processes for a single instance, or ten instances with 2,000 processes each.  Not many people would have run into the SS_SEM_MAX limitation with these default values.


    Beyond DB Time

    This post follows on from a previous one in which I used a Statspack report to show the value of the time model statistics. In this post I use some guess work to look outside the DB Time metric, to get a feel for what else might affect the response time for the application, and to gauge how important tuning the database (reducing DB Time) is from a holistic point of view.

    I should point out that I’m not as familiar with AIX (the OS in this case) as I am with Solaris, Linux, HP-UX or even Windows.  I had no access to the server in question during the period for the Statspack report.  These are theories.  Make up your own mind and send us feedback (see About us page for email address) if you have knowledge that you’d like to share.

    The system was described to me as “CPU constrained”.  The Statspack report shows that the database was directly responsible for only 24% of the 78% CPU utilisation (in other words increased server CPU utilisation by 19%).

    Host CPU  (CPUs: 2  Cores: 1  Sockets: 0)
    ~~~~~~~~              Load Average
                          Begin     End      User  System    Idle     WIO     WCPU
                        ------- -------   ------- ------- ------- ------- --------
                           5.90    5.92     44.49   34.03   21.47    1.47  309.01
    Instance CPU
    ~~~~~~~~~~~~                                       % Time (seconds)
                                                -------- --------------
                         Host: Total time (s):                  7,199.4
                      Host: Busy CPU time (s):                  5,653.5
                       % of time Host is Busy:      78.5
                 Instance: Total CPU time (s):                  1,364.2
              % of Busy CPU used for Instance:      24.1
            Instance: Total Database time (s):                  4,059.1
      %DB time waiting for CPU (Resource Mgr):       0.0
    OS Statistics
    -> ordered by statistic type (CPU use, Virtual Memory, Hardware Config), Name
    Statistic                                  Total
    ------------------------- ----------------------
    BUSY_TIME                                565,349
    IDLE_TIME                                154,590
    IOWAIT_TIME                               10,579
    SYS_TIME                                 245,021
    USER_TIME                                320,328
    OS_CPU_WAIT_TIME                       2,224,700
    RSRC_MGR_CPU_WAIT_TIME                         0
    PHYSICAL_MEMORY_BYTES              8,589,934,592
    NUM_CPUS                                       2
    NUM_CPU_CORES                                  1
    GLOBAL_RECEIVE_SIZE_MAX                1,048,576
    GLOBAL_SEND_SIZE_MAX                   1,048,576
    TCP_RECEIVE_SIZE_DEFAULT                  16,384
    TCP_RECEIVE_SIZE_MAX      ######################
    TCP_RECEIVE_SIZE_MIN                       4,096
    TCP_SEND_SIZE_DEFAULT                     16,384
    TCP_SEND_SIZE_MAX         ######################
    TCP_SEND_SIZE_MIN                          4,096
    OS Statistics - detail 
      Snap Snapshot
        Id Day Time          Load  %Busy  %User %System   %WIO  %WCPU
    ------ --------------- ------ ------ ------ ------- ------ ------
      1654 Tue 18 11:00:04    5.9
      1655 Tue 18 11:31:02    7.5   82.5   46.5    36.0    1.9
      1656 Tue 18 12:00:04    5.9   74.3   42.4    31.9    1.0

    This server is dedicated to the one database, and has no application component running on it.  So what was using the rest of the CPU?  Perhaps the database CPU measurements were not quite correct, (eg due to LOB OCI activity?), or perhaps the listener was using some CPU (tracing enabled?), but it seems to me there is more to it than that.  I see that 44% of the CPU busy time happened in system mode.  This could be related to I/O or network traffic (among other things).  It was subsequently discovered that the I/O was being cached by the OS, so perhaps that contributed to some degree.

    Finding the cause might offer some opportunity for tuning, greater than just trying to reduce background and DB CPU time.

    How about network traffic?

    I already know that this application uses LOBS, (direct I/O to LOBs showed up in the Statspack report), and  LOB access can result in a high number of round trips.

                                                                  Avg          %Total
                                              %Tim Total Wait   wait    Waits   Call
    Event                               Waits  out   Time (s)   (ms)     /txn   Time
    ---------------------------- ------------ ---- ---------- ------ -------- ------
    SQL*Net message from client    16,327,445    0    471,448     29     69.9
    Statistic                                      Total     per Second    per Trans
    --------------------------------- ------------------ -------------- ------------
    bytes received via SQL*Net from c      5,432,266,525    1,508,962.9     23,262.5
    bytes sent via SQL*Net to client       9,743,180,327    2,706,439.0     41,723.1
    SQL*Net roundtrips to/from client         16,329,496        4,536.0         69.9

    There were 16 million round trips and about 15GB of data transferred across the network during the hour.

    4,536 round trips per second means 9,072 SQL*Net packets (up to 8000bytes) arriving or departing per second, which may then be broken into multiple network packets (up to 1500bytes), each potentially causing CPU interrupts which have to be handled by the OS. (“Potentially” because of interrupt coalescing).  Surely this would contribute to the CPU time system in system mode.

    Further, it might show that the active database time is less significant to overall application performance than we might otherwise assume.

    The total wait time was 471,448 seconds, most of which will be idle application time, but not all of it.  Some of this time is network overhead, and some will be application think time.  It is not possible to calculate the network + application time from the data in the Statspack report, but we can make a safe guess at a lower limit using the wait event histogram information.

    Total ----------------- % of Waits ------------------
    Event                      Waits  <1ms  <2ms  <4ms  <8ms <16ms <32ms  <=1s   >1s
    -------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
    SQL*Net more data to clien  203K  99.1    .2    .3    .3    .0    .0    .0
    SQL*Net message from clien   16M  16.1  29.5  25.4  19.3   6.5   1.6   1.4    .2
    SQL*Net message to client    16M 100.0    .0    .0    .0    .0    .0    .0
    SQL*Net more data from cli   60K  58.5    .2    .2    .2   2.3  38.1    .4

    If we decide to treat any SQL*Net wait event longer than one second as idle (application and DB doing nothing), and treat the rest as time during which the user would be waiting, then we can get a lower limit from the highlighted row above.  (I’ll ignore the less significant more data waits for this example).

    0ms <= 16.1% of waits < 1ms     0.161 * 0ms * 16,327,445 = 0s
    1ms <= 29.5% of waits < 2ms     0.295 * 1ms * 16,327,445 = 4,816s

    Taking the lower bound of each I calculate at that 45,701 sec is the lower limit.  Compare this to the total DB Time of 3,393 sec, and we can see that the time spent in the database is no more than 6.9% of the total response time of the app server + database server components.

    Note that this doesn’t include any time used for communication between the application server and the user, browser processing, etc.

    This extra information puts into perspective the scope of any benefit from reducing DB Time.  This information could be useful to the overall tuning effort by moving the focus to options that might help to reduce the number of round trips or make them more efficient.  (Eg bulk processing, pre-fetching, using varchar2 instead of LOB, increasing SDU, removing firewalls between app and DB, tuning interrupt coalescing parameters, application server tuning, etc).

    Best of Times or Worst of Times?

    When I first saw 10g’s time model views I wasn’t impressed.  The components overlapped, so they didn’t add up to 100% and there weren’t many categories.  I’d been using selected wait events + CPU statistics to approximate a breakdown of foreground response time ever since I’d first seen Steve Adams’ IXORA site.  I didn’t see anything extra offered by the time model views….. but I was wrong.

    I’ll use a Statspack report, that was sent to me for analysis, to demonstrate how the 10g time model statistics can be useful.

    Here are extracts from a statspack report for an database on a dual CPU AIX server over a one hour period:

    Top 5 Timed Events                                                    Avg %Total
    ~~~~~~~~~~~~~~~~~~                                                   wait   Call
    Event                                            Waits    Time (s)   (ms)   Time
    ----------------------------------------- ------------ ----------- ------ ------
    log file sync                                  230,767         868      4   49.6
    log file parallel write                        227,285         577      3   33.0
    CPU time                                                       198          11.3
    db file parallel write                          17,889          26      1    1.5
    SQL*Net more data to client                    203,568          22      0    1.2

    Looking at the top wait events first gives the impression that the database wasn’t very busy, and that commits and redo log I/O are the biggest targets for tuning.  However, if we then consider the DB Time and DB CPU metrics, a different picture starts to form:

       Elapsed:      60.00 (mins) Av Act Sess:       0.9
       DB time:      56.54 (mins)      DB CPU:      22.40 (mins)
    Time Model System Stats
    -> Ordered by % of DB time desc, Statistic name
    Statistic                                       Time (s) % DB time
    ----------------------------------- -------------------- ---------
    DB time                                          3,392.7
    DB CPU                                           1,344.1      39.6
    sql execute elapsed time                           862.5      25.4
    parse time elapsed                                 100.0       2.9
    PL/SQL execution elapsed time                       10.2        .3
    connection management call elapsed                   6.2        .2
    hard parse elapsed time                              3.3        .1
    hard parse (sharing criteria) elaps                  0.3        .0
    hard parse (bind mismatch) elapsed                   0.2        .0
    sequence load elapsed time                           0.1        .0
    repeated bind elapsed time                           0.0        .0
    background elapsed time                            666.4
    background cpu time                                 20.1

    We can now see that the database was much busier than the top waits section indicated.  Remember DB Time = DB CPU + I/O time + other waits (foreground), so if all the foreground I/O and wait events were timed and captured correctly, they should add up to 2,000 seconds.  It is obvious from the top timed events section that about 1,000 seconds of I/O or other wait time is missing.

    Should we just follow the usual step of jumping to the top SQL sections of the report and start investigating how to tune each of them?  In this case, the report goes on to show that the SQL captured by Statspack for the hour covers 100% of SQL executions and 97.3% of reads from the database buffer cache but only:

    • 6.4% of Total DB CPU  (foreground database sessions)
    • 3.1% of Total Disk Reads
    • 27.5% of Total DB Time  (foreground database sessions)

    So how useful are the top SQL sections of the report?

    I still want to know what the database was doing while it was busy, so that I can characterise the database and get a feel for where the opportunities for tuning are.

    Browsing through the rest of the report, I notice a fair amount of direct I/O, mostly LOB related.

    Statistic                                      Total     per Second    per Trans
    --------------------------------- ------------------ -------------- ------------
    physical reads                                53,787           14.9          0.2
    physical reads direct                         52,626           14.6          0.2
    physical reads direct (lob)                   47,504           13.2          0.2
    physical writes                               61,375           17.1          0.3
    physical writes direct                         8,773            2.4          0.0
    physical writes direct (lob)                   8,773            2.4          0.0

    Direct I/O is performed by the foreground processes, but the wait event times for direct I/O are not accurate.  (Inaccuracies vary depending on many factors, version/patch level, OS, mount options etc).  Also, LOB APIs are not associated with a user cursor (eg top SQL), so the time for some of the LOB work (and data transmission) can be hard to measure/identify.

    Related notes for those interested:
    268476.1 – LOB Performance Guideline – (see “CPU time and Elapsed time – not reported accurately”)
    223117.1 – Tuning I/O-related waits
    50415.1 – WAITEVENT: “direct path read” Reference Note
    50416.1 – WAITEVENT: “direct path write” Reference Note

    Note 223117.1 says: Due to the way in which time for these waits is recorded (it does not measure the time taken to perform the I/O), their relative position in listings such as Statspack’s “Top 5 Wait/Timed Events” cannot be used to evaluate their true impact.

    The DB Time metric does include the time the database sessions were performing direct I/O, so it becomes a useful sanity check when reporting on the break down of the wait events.  I like to compare (DB Time – DB CPU) to the total of the non-idle foreground wait events, to give a percentage of missing event time.  (About 30% in this case).

    So…. if I assume that all the missing time (1,000 seconds) is related to direct I/O, then I can estimate that DB Time is made up of about 60% I/O and 40% CPU.

    In this particular case, the DBAs supporting the database (and that sent our team the Statspack report), discovered that the database was not using direct or concurrent I/O for the JFS2 filesystem.  After testing with direct I/O and a larger database block cache, my next target for tuning (if required) would be to investigate the LOBs to see if any were suitable to be cached.  (Reduce direct I/O reads, and perform writes in the background).

    Final Notes on DB Time

    The first thing I do when logging into a UNIX or Linux database server is check the load.  I type “w” by habit, and compare it to my memory of usual loads for the system.  I find it a very good starting point for getting a ‘feel’ of how burdened the system is.

    DB Time allows us to calculate a very simple but useful metric equivalent to load: average active sessions, calculated from DB Time / clock time.  We may write more about how we use this metric to support our customers in a future post.

    While the DBA is mostly concerned with what the database is doing while it is busy, sometimes we can and should expand the scope wider than the information provided by DB Time and the time model statistics.  For example, the Statspack report example used for this post offered more useful information about the performance of the database server and the application as a whole.  (See this post).


    When the Wrong Index Works Better

    This week I saw a case where a poor choice of index may have actually improved performance…. temporarily.

    An application with performance problems had been investigated and a new index had been implemented which reportedly improved reduced the relevant query’s execution time from 40 seconds to between 6-15 seconds.  However, there were still complaints about the variable length of the execution time, so I began to review the situation.

    From V$SQL I could see hundreds of examples of the query that had run recently – after the index had been created.  The elapsed time for these took up to 103 seconds to complete.  The number of rows returned and physical/logical block gets also varied greatly.  I began to doubt the accuracy of the description of the new index’s impact, so I needed to verify that the new index was the best choice.

    The queries were similar to this (altered for simplicity and confidentiality):

    SELECT some_columns
    FROM  dummy_table
    AND AMOUNT1 + AMOUNT2 <>0  
    AND to_char(TIME_STAMP, 'yyyy-mm-dd HH24:MI:SS') <= '2009-08-18 17:03:54'

    Each query had a literal for the TIME_STAMP comparison.
    The first thing I noticed was the common developer error of putting the function on the column instead of the constant.  It should have been rewritten as:

    TIME_STAMP <= to_date('2009-08-18 17:03:54','yyyy-mm-dd HH24:MI:SS');

    This would allow Oracle to only convert between char and date once on the constant, rather than on the column for each row processed.  Also, it would allow the TIME_STAMP value to be used to access an index, if a suitable one existed.

    These were the relevant indexes:

                                                       DIST KEYS       LEAF BLKS LAST ANALYZED
    INDEX_NAME                                         (SELCTY%)   BPK (DEPTH)   (ESTIMATE %)   COLUMN (BYTES ASC/DESC)
    -------------------------------------------------- ---------- ---- --------- -------------- ---------------------------------------
    DUMMY_TABLE_IDX1                                   34K (0)    #### 243K (3)  15/08/09 (1)   FOREIGN_KEY (22 ASC)
    DUMMY_TABLE_IDX2                                   346K (1)    108 283K (3)  15/08/09 (1)   TIME_STAMP (7 ASC)
    DUMMY_TABLE_IDX3                                   25M (48)      1 192K (3)  15/08/09 (7)   FOREIGN_KEY (22 ASC)
                                                                                                TIME_STAMP (7 ASC)
    DUMMY_TABLE_PK (U)                                 51M (100)     1 274K (2)  15/08/09 (1)   PRIMARY_KEY (22 ASC)

    DUMMY_TABLE_IDX3 was the newly created index that was supposed to improve performance.

    Although the to_char function prevented the TIME_STAMP column from being used to access the index, it could still be used to filter rows before accessing the DUMMY_TABLE to get the rest of the columns.  The DUMMY_TABLE was big and could never reside completely in the cache, so avoiding unnecessary table accesses would boost performance.  But would filtering rows based on the TIME_STAMP achieve this aim?

    I am always suspicious of date_column < :date_time_value because it is hardly ever very selective.  In this case I noticed that the date and time compared to the TIME_STAMP column was always within a second or two of the actual execution time.  As one would expect from the name, the TIME_STAMP column has only past times in it.

    Select count(1) from dummy_table where time_stamp > sysdate;

    ...returned no rows.  So, the TIME_STAMP column is of no use in an index to improve the performance of this query.

    So why was there an impression that the index was beneficial?  One possible answer is that the new index is compact (dense) because it hasn't had much/any data deleted from it, where as the index that would have previously been used by the query (DUMMY_TABLE_IDX1) had become sparse after a lot of data was deleted.  The index information above shows that the old index is 26% larger than the new index (confirmed by DBA_SEGMENTS) despite the new index having an extra column!  At first the smaller index (DUMMY_TABLE_IDX3), with less empty blocks would have been more efficient than the old one (see this post about index performance).  But, after time passes and data is deleted, DUMMY_TABLE_IDX3 will become larger and perform worse because it stores an extra column (TIME_STAMP) which adds nothing for performance because it doesn't filter any rows.

    Simply coalescing the index would have provided more benefit than adding the new index, without the overhead.  Compressing it would have improved it even more.

    An alternative to the above:
    Replace DUMMY_TABLE_IDX3 with a compressed functional index on (FOREIGN_KEY, AMOUNT1 + AMOUNT2), because the criterion "AMOUNT1 + AMOUNT2 <> 0" does filter out a lot of rows for some of the queries, and so table accesses (sequential I/O) would be reduced.

    Next Page »

    The Rubric Theme Blog at WordPress.com.


    Get every new post delivered to your Inbox.