HP NZ Database Services

2009/09/13

Beyond DB Time

This post follows on from a previous one in which I used a Statspack report to show the value of the time model statistics. In this post I use some guess work to look outside the DB Time metric, to get a feel for what else might affect the response time for the application, and to gauge how important tuning the database (reducing DB Time) is from a holistic point of view.

I should point out that I’m not as familiar with AIX (the OS in this case) as I am with Solaris, Linux, HP-UX or even Windows.  I had no access to the server in question during the period for the Statspack report.  These are theories.  Make up your own mind and send us feedback (see About us page for email address) if you have knowledge that you’d like to share.

The system was described to me as “CPU constrained”.  The Statspack report shows that the database was directly responsible for only 24% of the 78% CPU utilisation (in other words increased server CPU utilisation by 19%).

Host CPU  (CPUs: 2  Cores: 1  Sockets: 0)
~~~~~~~~              Load Average
                      Begin     End      User  System    Idle     WIO     WCPU
                    ------- -------   ------- ------- ------- ------- --------
                       5.90    5.92     44.49   34.03   21.47    1.47  309.01

Instance CPU
~~~~~~~~~~~~                                       % Time (seconds)
                                            -------- --------------
                     Host: Total time (s):                  7,199.4
                  Host: Busy CPU time (s):                  5,653.5
                   % of time Host is Busy:      78.5
             Instance: Total CPU time (s):                  1,364.2
          % of Busy CPU used for Instance:      24.1
        Instance: Total Database time (s):                  4,059.1
  %DB time waiting for CPU (Resource Mgr):       0.0

OS Statistics
-> ordered by statistic type (CPU use, Virtual Memory, Hardware Config), Name

Statistic                                  Total
------------------------- ----------------------
BUSY_TIME                                565,349
IDLE_TIME                                154,590
IOWAIT_TIME                               10,579
SYS_TIME                                 245,021
USER_TIME                                320,328
OS_CPU_WAIT_TIME                       2,224,700
RSRC_MGR_CPU_WAIT_TIME                         0
PHYSICAL_MEMORY_BYTES              8,589,934,592
NUM_CPUS                                       2
NUM_CPU_CORES                                  1
GLOBAL_RECEIVE_SIZE_MAX                1,048,576
GLOBAL_SEND_SIZE_MAX                   1,048,576
TCP_RECEIVE_SIZE_DEFAULT                  16,384
TCP_RECEIVE_SIZE_MAX      ######################
TCP_RECEIVE_SIZE_MIN                       4,096
TCP_SEND_SIZE_DEFAULT                     16,384
TCP_SEND_SIZE_MAX         ######################
TCP_SEND_SIZE_MIN                          4,096
          -------------------------------------------------------------

OS Statistics - detail 

  Snap Snapshot
    Id Day Time          Load  %Busy  %User %System   %WIO  %WCPU
------ --------------- ------ ------ ------ ------- ------ ------
  1654 Tue 18 11:00:04    5.9
  1655 Tue 18 11:31:02    7.5   82.5   46.5    36.0    1.9
  1656 Tue 18 12:00:04    5.9   74.3   42.4    31.9    1.0
          -------------------------------------------------------------

This server is dedicated to the one database, and has no application component running on it.  So what was using the rest of the CPU?  Perhaps the database CPU measurements were not quite correct, (eg due to LOB OCI activity?), or perhaps the listener was using some CPU (tracing enabled?), but it seems to me there is more to it than that.  I see that 44% of the CPU busy time happened in system mode.  This could be related to I/O or network traffic (among other things).  It was subsequently discovered that the I/O was being cached by the OS, so perhaps that contributed to some degree.

Finding the cause might offer some opportunity for tuning, greater than just trying to reduce background and DB CPU time.

How about network traffic?

I already know that this application uses LOBS, (direct I/O to LOBs showed up in the Statspack report), and  LOB access can result in a high number of round trips.

                                                              Avg          %Total
                                          %Tim Total Wait   wait    Waits   Call
Event                               Waits  out   Time (s)   (ms)     /txn   Time
---------------------------- ------------ ---- ---------- ------ -------- ------
SQL*Net message from client    16,327,445    0    471,448     29     69.9

Statistic                                      Total     per Second    per Trans
--------------------------------- ------------------ -------------- ------------
bytes received via SQL*Net from c      5,432,266,525    1,508,962.9     23,262.5
bytes sent via SQL*Net to client       9,743,180,327    2,706,439.0     41,723.1
SQL*Net roundtrips to/from client         16,329,496        4,536.0         69.9

There were 16 million round trips and about 15GB of data transferred across the network during the hour.

4,536 round trips per second means 9,072 SQL*Net packets (up to 8000bytes) arriving or departing per second, which may then be broken into multiple network packets (up to 1500bytes), each potentially causing CPU interrupts which have to be handled by the OS. (“Potentially” because of interrupt coalescing).  Surely this would contribute to the CPU time system in system mode.

Further, it might show that the active database time is less significant to overall application performance than we might otherwise assume.

The total wait time was 471,448 seconds, most of which will be idle application time, but not all of it.  Some of this time is network overhead, and some will be application think time.  It is not possible to calculate the network + application time from the data in the Statspack report, but we can make a safe guess at a lower limit using the wait event histogram information.

Total ----------------- % of Waits ------------------
Event                      Waits  <1ms  <2ms  <4ms  <8ms <16ms <32ms  <=1s   >1s
-------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
SQL*Net more data to clien  203K  99.1    .2    .3    .3    .0    .0    .0
SQL*Net message from clien   16M  16.1  29.5  25.4  19.3   6.5   1.6   1.4    .2
SQL*Net message to client    16M 100.0    .0    .0    .0    .0    .0    .0
SQL*Net more data from cli   60K  58.5    .2    .2    .2   2.3  38.1    .4

If we decide to treat any SQL*Net wait event longer than one second as idle (application and DB doing nothing), and treat the rest as time during which the user would be waiting, then we can get a lower limit from the highlighted row above.  (I’ll ignore the less significant more data waits for this example).

0ms <= 16.1% of waits < 1ms     0.161 * 0ms * 16,327,445 = 0s
1ms <= 29.5% of waits < 2ms     0.295 * 1ms * 16,327,445 = 4,816s
etc...

Taking the lower bound of each I calculate at that 45,701 sec is the lower limit.  Compare this to the total DB Time of 3,393 sec, and we can see that the time spent in the database is no more than 6.9% of the total response time of the app server + database server components.

Note that this doesn’t include any time used for communication between the application server and the user, browser processing, etc.

This extra information puts into perspective the scope of any benefit from reducing DB Time.  This information could be useful to the overall tuning effort by moving the focus to options that might help to reduce the number of round trips or make them more efficient.  (Eg bulk processing, pre-fetching, using varchar2 instead of LOB, increasing SDU, removing firewalls between app and DB, tuning interrupt coalescing parameters, application server tuning, etc).

Best of Times or Worst of Times?

When I first saw 10g’s time model views I wasn’t impressed.  The components overlapped, so they didn’t add up to 100% and there weren’t many categories.  I’d been using selected wait events + CPU statistics to approximate a breakdown of foreground response time ever since I’d first seen Steve Adams’ IXORA site.  I didn’t see anything extra offered by the time model views….. but I was wrong.

I’ll use a Statspack report, that was sent to me for analysis, to demonstrate how the 10g time model statistics can be useful.

Here are extracts from a statspack report for an 11.1.0.7 database on a dual CPU AIX server over a one hour period:

Top 5 Timed Events                                                    Avg %Total
~~~~~~~~~~~~~~~~~~                                                   wait   Call
Event                                            Waits    Time (s)   (ms)   Time
----------------------------------------- ------------ ----------- ------ ------
log file sync                                  230,767         868      4   49.6
log file parallel write                        227,285         577      3   33.0
CPU time                                                       198          11.3
db file parallel write                          17,889          26      1    1.5
SQL*Net more data to client                    203,568          22      0    1.2
          -------------------------------------------------------------

Looking at the top wait events first gives the impression that the database wasn’t very busy, and that commits and redo log I/O are the biggest targets for tuning.  However, if we then consider the DB Time and DB CPU metrics, a different picture starts to form:

   Elapsed:      60.00 (mins) Av Act Sess:       0.9
   DB time:      56.54 (mins)      DB CPU:      22.40 (mins)
Time Model System Stats
-> Ordered by % of DB time desc, Statistic name

Statistic                                       Time (s) % DB time
----------------------------------- -------------------- ---------
DB time                                          3,392.7
DB CPU                                           1,344.1      39.6
sql execute elapsed time                           862.5      25.4
parse time elapsed                                 100.0       2.9
PL/SQL execution elapsed time                       10.2        .3
connection management call elapsed                   6.2        .2
hard parse elapsed time                              3.3        .1
hard parse (sharing criteria) elaps                  0.3        .0
hard parse (bind mismatch) elapsed                   0.2        .0
sequence load elapsed time                           0.1        .0
repeated bind elapsed time                           0.0        .0
background elapsed time                            666.4
background cpu time                                 20.1
-------------------------------------------------------------

We can now see that the database was much busier than the top waits section indicated.  Remember DB Time = DB CPU + I/O time + other waits (foreground), so if all the foreground I/O and wait events were timed and captured correctly, they should add up to 2,000 seconds.  It is obvious from the top timed events section that about 1,000 seconds of I/O or other wait time is missing.

Should we just follow the usual step of jumping to the top SQL sections of the report and start investigating how to tune each of them?  In this case, the report goes on to show that the SQL captured by Statspack for the hour covers 100% of SQL executions and 97.3% of reads from the database buffer cache but only:

  • 6.4% of Total DB CPU  (foreground database sessions)
  • 3.1% of Total Disk Reads
  • 27.5% of Total DB Time  (foreground database sessions)

So how useful are the top SQL sections of the report?

I still want to know what the database was doing while it was busy, so that I can characterise the database and get a feel for where the opportunities for tuning are.

Browsing through the rest of the report, I notice a fair amount of direct I/O, mostly LOB related.

Statistic                                      Total     per Second    per Trans
--------------------------------- ------------------ -------------- ------------
physical reads                                53,787           14.9          0.2
physical reads direct                         52,626           14.6          0.2
physical reads direct (lob)                   47,504           13.2          0.2
physical writes                               61,375           17.1          0.3
physical writes direct                         8,773            2.4          0.0
physical writes direct (lob)                   8,773            2.4          0.0

Direct I/O is performed by the foreground processes, but the wait event times for direct I/O are not accurate.  (Inaccuracies vary depending on many factors, version/patch level, OS, mount options etc).  Also, LOB APIs are not associated with a user cursor (eg top SQL), so the time for some of the LOB work (and data transmission) can be hard to measure/identify.

Related notes for those interested:
268476.1 – LOB Performance Guideline – (see “CPU time and Elapsed time – not reported accurately”)
BUG 3504487 – DBMS_LOB/OCILob* CALL RESOURCE USAGE IS NOT REPORTED, AS THEY ARE NOT PART OF A CURSOR
223117.1 – Tuning I/O-related waits
50415.1 – WAITEVENT: “direct path read” Reference Note
50416.1 – WAITEVENT: “direct path write” Reference Note

Note 223117.1 says: Due to the way in which time for these waits is recorded (it does not measure the time taken to perform the I/O), their relative position in listings such as Statspack’s “Top 5 Wait/Timed Events” cannot be used to evaluate their true impact.

The DB Time metric does include the time the database sessions were performing direct I/O, so it becomes a useful sanity check when reporting on the break down of the wait events.  I like to compare (DB Time – DB CPU) to the total of the non-idle foreground wait events, to give a percentage of missing event time.  (About 30% in this case).

So…. if I assume that all the missing time (1,000 seconds) is related to direct I/O, then I can estimate that DB Time is made up of about 60% I/O and 40% CPU.

In this particular case, the DBAs supporting the database (and that sent our team the Statspack report), discovered that the database was not using direct or concurrent I/O for the JFS2 filesystem.  After testing with direct I/O and a larger database block cache, my next target for tuning (if required) would be to investigate the LOBs to see if any were suitable to be cached.  (Reduce direct I/O reads, and perform writes in the background).

Final Notes on DB Time

The first thing I do when logging into a UNIX or Linux database server is check the load.  I type “w” by habit, and compare it to my memory of usual loads for the system.  I find it a very good starting point for getting a ‘feel’ of how burdened the system is.

DB Time allows us to calculate a very simple but useful metric equivalent to load: average active sessions, calculated from DB Time / clock time.  We may write more about how we use this metric to support our customers in a future post.

While the DBA is mostly concerned with what the database is doing while it is busy, sometimes we can and should expand the scope wider than the information provided by DB Time and the time model statistics.  For example, the Statspack report example used for this post offered more useful information about the performance of the database server and the application as a whole.  (See this post).

2009/07/21

LOB vs Network

In a previous post,  I pointed out that SQL*Net message from client is not always an idle wait from the user’s point of view.

During a performance investigation, I traced the application that wasn’t performing to expectations.  Here is an excerpt from a trace file:

OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse     1145      0.12       0.13          0          0          0           0
Execute   1145      0.28       0.33          2          7         37           4
Fetch     1645      0.46       0.52          0       4314          0        5242
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     3935      0.86       0.99          2       4321         37        5246

Misses in library cache during parse: 0

Elapsed times include waiting on following events:
 Event waited on                             Times   Max. Wait  Total Waited
 ----------------------------------------   Waited  ----------  ------------
 SQL*Net message to client                    5197        0.00          0.02
 SQL*Net message from client                  5197        0.90         58.88
 SQL*Net more data to client                   516        0.00          0.05
 SQL*Net more data from client                  40        0.00          0.00
 direct path read                               28        0.00          0.02
 db file sequential read                         2        0.01          0.02
 log file sync                                   4        0.00          0.01

Most of the response time from the application/user’s point of view ocurred “outside” of the database.  What was surprising, was that so many round trips were required for what seems like a simple function of the application, and that the tkprof report doesn’t show the waits associated with a cursor.

Looking at the trace file, the SQL*Net round trips always followed certain SQL statements:

PARSING IN CURSOR #1 len=120 dep=0 uid=53 oct=3 lid=53 tim=19783142399235 hv=1095791004 ad='906c6620'
SELECT * FROM TABLE_A WHERE COL1=:a AND COL2=:b
END OF STMT
PARSE #1:c=0,e=195,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=19783142399223
EXEC #1:c=0,e=271,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=19783142399755
WAIT #1: nam='SQL*Net message to client' ela= 4 driver id=1952673792 #bytes=1 p3=0 obj#=72685 tim=19783142399974
WAIT #1: nam='SQL*Net more data to client' ela= 93 driver id=1952673792 #bytes=2001 p3=0 obj#=72685 tim=19783142400471
FETCH #1:c=0,e=374,p=0,cr=3,cu=0,mis=0,r=1,dep=0,og=1,tim=19783142400540
WAIT #1: nam='SQL*Net message from client' ela= 13126 driver id=1952673792 #bytes=1 p3=0 obj#=72685 tim=19783142413820
WAIT #0: nam='SQL*Net message to client' ela= 2 driver id=1952673792 #bytes=1 p3=0 obj#=72685 tim=19783142413956
WAIT #0: nam='SQL*Net message from client' ela= 8580 driver id=1952673792 #bytes=1 p3=0 obj#=72685 tim=19783142422616
WAIT #0: nam='SQL*Net message to client' ela= 3 driver id=1952673792 #bytes=1 p3=0 obj#=72685 tim=19783142422836
WAIT #0: nam='SQL*Net message from client' ela= 12421 driver id=1952673792 #bytes=1 p3=0 obj#=72685 tim=19783142435416
WAIT #0: nam='SQL*Net message to client' ela= 2 driver id=1952673792 #bytes=1 p3=0 obj#=72685 tim=19783142435546
WAIT #0: nam='SQL*Net message from client' ela= 10679 driver id=1952673792 #bytes=1 p3=0 obj#=72685 tim=19783142446296
WAIT #0: nam='SQL*Net message to client' ela= 2 driver id=1952673792 #bytes=1 p3=0 obj#=72685 tim=19783142446409
WAIT #0: nam='SQL*Net message from client' ela= 10579 driver id=1952673792 #bytes=1 p3=0 obj#=72685 tim=19783142457049
WAIT #0: nam='SQL*Net message to client' ela= 2 driver id=1952673792 #bytes=1 p3=0 obj#=72685 tim=19783142457144
WAIT #0: nam='SQL*Net message from client' ela= 11231 driver id=1952673792 #bytes=1 p3=0 obj#=72685 tim=19783142468451
WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1952673792 #bytes=1 p3=0 obj#=72685 tim=19783142468561
WAIT #0: nam='SQL*Net message from client' ela= 12782 driver id=1952673792 #bytes=1 p3=0 obj#=72685 tim=19783142481436
WAIT #0: nam='SQL*Net message to client' ela= 9 driver id=1952673792 #bytes=1 p3=0 obj#=72685 tim=19783142481763
WAIT #0: nam='SQL*Net message from client' ela= 7636 driver id=1952673792 #bytes=1 p3=0 obj#=72685 tim=19783142489618
WAIT #0: nam='SQL*Net message to client' ela= 3 driver id=1952673792 #bytes=1 p3=0 obj#=72685 tim=19783142489736
WAIT #0: nam='SQL*Net message from client' ela= 12990 driver id=1952673792 #bytes=1 p3=0 obj#=72685 tim=19783142502814
WAIT #0: nam='SQL*Net message to client' ela= 6 driver id=1952673792 #bytes=1 p3=0 obj#=72685 tim=19783142503127
WAIT #0: nam='SQL*Net message from client' ela= 10648 driver id=1952673792 #bytes=1 p3=0 obj#=72685 tim=19783142513875
WAIT #0: nam='SQL*Net message to client' ela= 2 driver id=1952673792 #bytes=1 p3=0 obj#=72685 tim=19783142514010
WAIT #0: nam='SQL*Net message from client' ela= 8445 driver id=1952673792 #bytes=1 p3=0 obj#=72685 tim=19783142522541

Notice the waits for cursor #0.  I guessed this was some sort of OCI call that happened after the rows for the cursor had been returned.  This seems to get used for low level work that isn’t associated directly to a user’s cursor.  When I examined TABLE_A I saw that it had several CLOBs and BLOBs in it.  Were they to blame for the high number of round trips?

I found some Metalink notes which explain what is going on:
Note:398380.1     Poor Performance When Using CLOBS and Oracle Net
Bug No. 3892362   CLOBS HAVE POOR NETWORK PERFORMANCE – REQUEST FOR PRODUCT ENHANCEMENT
Note:66431.1      LOBS – Storage, Redo and Performance Issues
Note:401589.1     Performance Problems Using Clob and Blobs
Bug:7341386

Basically, not only does having LOB columns in a select restrict each fetch to one row at a time, additional round trips are needed to handle LOB locators and to retrieve data from the LOBs.  This can cause performance problems with high latency networks because of the high number of round trips (over 5000 in our case).

However, in another Oracle note, it is said:

“If I write OCI code then I can define an array of OCILobLocators and use array fetch with this and you get multiple rows in each net packet. In summary I do not see any server bug here. Whether array fetch is used for CLOBs or not is up to the client code. “

From the manual http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28395/oci07lob.htm :

Array Interface for LOBs
You can use the OCI array interface with LOBs, just as with any other datatype. There are two ways of using the array interface.
1.    Using the data interface

You can bind or define arrays of character data for a CLOB column or RAW data for a BLOB column. You can use array bind and define interfaces to insert and select multiple rows with LOBs in one round trip to the server…..”

In the example traced above, the problem ended up being with the configuration of the virtual application server, (okay, so not strictly LOB vs Network this time), but any communication latency can be multiplied by applications that access LOBs via many round trips as in the example above.  This is another LOB related performance issue that can be difficult to identify.  (See the first one I posted about).

2009/06/28

The Secret Life of LOBs

Now and then we are passed a problem that other people have been working on for a while, but despite lots of changes, it hasn’t been resolved.  In some cases this makes it impossible to obtain a clear description of the symptoms and history, and may even have introduced other issues.  I much prefer a ‘diagnostic approach’ rather the ‘try the usual suspects first method’.  One might ‘get lucky’ by simply changing a common tuning parameter, but I prefer to consider the risks, costs and testing effort the customer will incur, and so attempt to understand the problem and provide the best solution.

The case I’m going to briefly cover in this post was one of these problems with more rumour than substance to the problem history.  Significant changes had been made to the Oracle database, including changing the optimiser back and forward between cost and rule and cursor sharing between exact and force.  I’m not even sure if the problem I investigated was the original one or a side-effect of the various attempts at a solution.

Obvious Symptoms

The highest component of response time was the enqueue wait event.

There were about ten blocked sessions with the same SQL statement:

UPDATE dummytable SET nextId = nextId + 1 WHERE schemaId = 123;

My first thought was that the locking was due to the application incrementing a number field instead of using a sequence.  Then I noticed that each blocker relinquished the lock after a minute or so, and one of the other ten sessions became the blocker for another minute, and so on.  Assuming the application design hadn’t just changed, it was probably this 60 second delay while holding a lock that was at the heart of the problem.

The Not-so-Obvious

Finding what was happening during that 60 second delay was trickier than usual.  How so?  I’ll start by sharing a portion of the trace file.  Pay attention to the timing parts in blue.

PARSE #1:c=0,e=980,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=3,tim=31325192050363
EXEC #1:c=0,e=158,p=0,cr=4,cu=1,mis=0,r=0,dep=0,og=3,tim=31325192050680
WAIT #1: nam='SQL*Net message to client' ela= 1 p1=1413697536 p2=1 p3=0
FETCH #1:c=10000,e=64,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=3,tim=31325192050859
WAIT #1: nam='SQL*Net message from client' ela= 381 p1=1413697536 p2=1 p3=0
WAIT #33: nam='db file sequential read' ela= 103 p1=15 p2=1542515 p3=1
WAIT #33: nam='db file sequential read' ela= 317 p1=15 p2=1542520 p3=1
WAIT #33: nam='direct path write (lob)' ela= 178 p1=10 p2=2008755 p3=1
WAIT #0: nam='SQL*Net message to client' ela= 7 p1=1413697536 p2=1 p3=0
WAIT #0: nam='SQL*Net message from client' ela= 596 p1=1413697536 p2=1 p3=0
STAT #1 id=1 cnt=1 pid=0 pos=1 obj=0 op='FOR UPDATE  (cr=4 r=0 w=0 time=44 us)'
STAT #1 id=2 cnt=2 pid=1 pos=1 obj=42798 op='TABLE ACCESS BY INDEX ROWID table_a (cr=8 r=0 w=0 time=97 us)'
STAT #1 id=3 cnt=2 pid=2 pos=1 obj=42799 op='INDEX UNIQUE SCAN table_a_index (cr=6 r=0 w=0 time=57 us)'
=====================
PARSING IN CURSOR #1 len=66 dep=0 uid=23 oct=6 lid=23 tim=31325202616132 hv=4020366925 ad='ff8b078'
UPDATE table_a SET C2 = EMPTY_CLOB() WHERE C1 = '000000003637834'
END OF STMT
PARSE #1:c=0,e=2286,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=3,tim=31325202616118
EXEC #1:c=0,e=670,p=0,cr=7,cu=6,mis=0,r=1,dep=0,og=3,tim=31325202616968
WAIT #1: nam='SQL*Net message to client' ela= 2 p1=1413697536 p2=1 p3=0
WAIT #1: nam='SQL*Net message from client' ela= 313 p1=1413697536 p2=1 p3=0
STAT #1 id=1 cnt=1 pid=0 pos=1 obj=0 op='UPDATE  (cr=7 r=0 w=0 time=475 us)'
STAT #1 id=2 cnt=1 pid=1 pos=1 obj=42799 op='INDEX UNIQUE SCAN table_a_index (cr=3 r=0 w=0 time=114 us)'
=====================
PARSING IN CURSOR #1 len=63 dep=0 uid=23 oct=3 lid=23 tim=31325202618985 hv=1434031237 ad='156ec818'
SELECT C2 FROM table_a WHERE C1 = '000000003637834' FOR UPDATE
END OF STMT
PARSE #1:c=0,e=1203,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=3,tim=31325202618977
EXEC #1:c=0,e=217,p=0,cr=4,cu=1,mis=0,r=0,dep=0,og=3,tim=31325202619378
WAIT #1: nam='SQL*Net message to client' ela= 1 p1=1413697536 p2=1 p3=0
FETCH #1:c=0,e=90,p=0,cr=4,cu=0,mis=0,r=1,dep=0,og=3,tim=31325202619616
WAIT #1: nam='SQL*Net message from client' ela= 505 p1=1413697536 p2=1 p3=0
WAIT #34: nam='db file sequential read' ela= 146 p1=15 p2=1473089 p3=1
*** 2008-11-05 17:14:00.054
WAIT #34: nam='db file sequential read' ela= 950 p1=15 p2=1473090 p3=1
*** 2008-11-05 17:14:14.217
WAIT #34: nam='db file sequential read' ela= 138 p1=15 p2=1473091 p3=1
WAIT #34: nam='SQL*Net more data from client' ela= 110 p1=1413697536 p2=1 p3=0
WAIT #34: nam='direct path write (lob)' ela= 183 p1=10 p2=1595041 p3=1
WAIT #0: nam='SQL*Net message to client' ela= 8 p1=1413697536 p2=1 p3=0
WAIT #0: nam='SQL*Net message from client' ela= 579 p1=1413697536 p2=1 p3=0
STAT #1 id=1 cnt=1 pid=0 pos=1 obj=0 op='FOR UPDATE  (cr=4 r=0 w=0 time=56 us)'
STAT #1 id=2 cnt=2 pid=1 pos=1 obj=42798 op='TABLE ACCESS BY INDEX ROWID table_a (cr=8 r=0 w=0 time=135 us)'
STAT #1 id=3 cnt=2 pid=2 pos=1 obj=42799 op='INDEX UNIQUE SCAN table_a_index (cr=6 r=0 w=0 time=81 us)'
=====================
PARSING IN CURSOR #1 len=66 dep=0 uid=23 oct=6 lid=23 tim=31325238065528 hv=3338015855 ad='165cdab0'

We can see that the blocking session is eating up time:

31325202616132 – 31325192050859 = 10565273, or 10 seconds.

31325238065528 – 31325202619616 = 35445912, or 35 seconds.

Whatever is causing the delay, it hasn’t been assigned a wait event name.  One clue the trace does have for us is that there are reads to a couple of segments during the mysterious delay and just before writes to a LOB.  Using DBA_EXTENTS we can see that these segments are LOB indexes for TABLE_A.

Watching the blocking sessions revealed that they clock up a lot of CPU and consistent gets.  The LOB indexes for TABLE_A have high logical read counts. The buffer cache is also full of copies of the LOB index blocks. The picture is becoming clearer – something must be wrong with the LOB indexes.

Theory, Confirmation, Resolution

My best guess at this point was that this performance issue was caused by a big delete of data in TABLE_A.  The LOB chunks were marked as deleted (not free), so each time there was a new insert Oracle had to find re-usable space, change the deleted chunks to free and update the LOB index, (as well as manage the various bitmap blocks and undo blocks).  According to the Metalink Note:365156.1 this process can take a long time because it “has to traverse the empty LOB INDEX leaf blocks created by all the previous transactions”.  This would explain the high number of logical reads recorded against the LOB indexes during each insert/update.
This Oracle database had a long statspack history, so I was able to write some ad-hoc queries to generate a graph of enqueue waits over time to see when the first big locking problems really started, and I was also able to find some large delete statements on TABLE_A.  The archive log history showed a huge spike at the time of one of these deletes, and so it looked like the theory was right.  This data helped to jog a few memories and we learned that 90% of TABLE_A’s data had been deleted around the time the ‘slowness’ started.

I can hardly do the explanation justice here, but if you know how Oracle LOBS and indexes work in general, then this may make sense:
LOB segments don’t use undo like everything else, but instead mark old chunks of LOB data as re-usable and write over it a later time when an insert or update needs some space.  These re-usable chunks are tracked by the LOB index – the same LOB index that tracks the current chunks of LOB data.  The current data is in the left side of the index and the re-usable/undo/consistent read data is in the right side.  When data is deleted, (or updated), the old LOB locator entries are deleted from the index and new re-usable index rows are entered into the right side of the index.  The emptied leaf blocks on the left side of the index are reclaimed and used for leaf blocks during inserts, while the “re-usable” index entries are removed from the tail of the right half of the index with each insert.  After a big delete (or many small ones in a row) the freelist is full of blocks from the left side of the index… so as data is inserted again over time, a gap opens up with empty leaf blocks between the left and right sides (used and re-usable) of the index.  The empty blocks are on the freelist but not removed from the index structure until they are taken from the freelist.
So… when an insert happens it checks for re-usable space.  The index is range scanned for the first free chunk.  (Re-usable space is indexed chronologically).  Unfortunately,
this process takes a lot of CPU time to complete when there are large numbers of empty leaf blocks in the index and multiple LOBs in the same table; over a minute in the example in this blog.  Sometimes the delay can be smaller – just a few seconds, and so even harder to identify.

A dump of the index showed the problem clearly, (although it helped to a dump of a few of the index blocks at key points to be sure where the current data index entries stopped and where re-usable chunk list started).

A rebuild of the LOBs for TABLE_A resolved this problem.    I set the LOBs to cache before rebuilding them to speed the process up.

Moral of the Story

My recommendation to prevent this from happening again was the usual ‘rebuild/repack database segments after mass deletions, but even better is to purge in small amounts regularly’.  This case was an extreme example of why that is important.

A performance tuning lesson emphasised here is to use a multiple diagnostic tools, deductive reasoning and research to identify the real problem so that the best solution can be found and a strategy for preventing its re-occurrence can be designed.  Also, tuning LOB related activity can be tricky for those relying on the standard tuning tools and statistics.  In a future post, I’ll give another example of how LOB activity cause performance problems that aren’t obvious.

Theme: Rubric. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.