HP NZ Database Services

2009/10/31

Schema Consolidation vs Application Separation

One of our customers is keen to reduce the number of databases they have by merging the schemata of many varying and mostly third-party applications into one (or a few) databases.  Their goal is to reduce operating costs and to use Oracle’s resource management to greater effect for applications using the same server.  However, an architect’s point of view can be very different than that of the operations staff.  Investigating problems, addressing bugs, arranging outages, tuning performance, applying upgrades, backup and recovery, etc, all would be affected by this strategy.  Our team is very aware of the operational realities of database support, and so we warn against schema consolidation as a default practice.

A few years ago, our team won the support of a different client’s Oracle environment.  This client’s previous production DBA had squashed about thirty applications into one database, and we had to deal with all of the issues arising from his legacy.  We documented them as they arose, resulting in a list of reasons to keep applications in separate databases:

Application Separation

Each application must be stored in a dedicated database to make possible:

  1. Running applications with different versions and options (think licence cost) of Oracle.  Each application will have different requirements / certification.
  2. Independent Oracle upgrades or patching.
  3. Independent scheduled database outages (eg parameter change or Oracle patch application) – higher availability, less change request effort.
  4. Different service levels for different databases, eg H/A, RAC, DR etc add cost and complexity, so may not be suitable for all applications.
  5. Independent application recovery or database flashback to a previous point in time.  (Tablespace PITR is possible in some cases.  Restrictions apply, and extra planning and resources are required).
  6. Independent backup – quicker, smaller and more flexible.  (A backup regime for TSPITR is not standard practice, flexible, robust, efficient or a complete backup).
  7. Independent test database refreshes using RMAN, storage (eg SAN BCVs) or OS tools.  (Transportable tablespace might be an option, but not a great one).
  8. Customised and efficient configuration for each application (eg cursor_sharing, character set).
  9. Reduced inter-application contention, (eg same public synonym required, tablespace name, schema name, performance problems).
  10. System testing one application’s change without having to retest other applications (that share the same database).  Eg, if an upgrade script changes performance, or shared memory usage, or recreates a public synonym or schema name or database link or revokes public privileges… will that break another app inhabiting the same DB?
  11. Higher availability by restricting the outages caused by internal errors and corruption to one application.  (Eg what happens when an undo block is corrupted?  Or constant ORA-4031s?  Higher chance with multiple apps in one DB.  DB becomes SPOF – bug, file loss, corruption, latch deadlocks, etc).
  12. Lower data dictionary / fixed view overhead.  A higher number of extents, objects, etc make some regular queries much slower and heavier on resource use.  (Affects DBAs and monitoring tools).
  13. Clarity of which database objects and settings belong to which application, and which interdependencies exist (by way of database links and the privileges of the link users).  Helps when an application is decommissioned, upgraded or migrated separately.
  14. Better security between users of one application and another.  (Some applications may be designed to give the users high database level privileges which could be exploited to access data for other applications in the same database).
  15. Lower server requirements, (smaller databases need less resources, so less powerful servers are required).

Possible Objections / Exceptions to the List Above

  • Less of the list above would be applicable when two applications essentially share the same data set.
  • Custom / in-house applications may be designed to mitigate some of the list above.
  • Some test or development schemata might be able to co-exist in one database when resources are limited and the environments can be recreated from source code or refreshed via export/import or transportable tablespaces.
  • Trivial schemata (one or two simple tables per application with little data or change) may be considered a low risk to merge into one database.
  • There will be some administration and resource overhead to having multiple databases instead of one.  Although, some management of a merged database will be more difficult – eg troubleshooting and performance tuning/investigation.  (Lots of sessions, segments, SQL etc to filter).
  • RAC allows scaling out instead of scaling up, so multiple smaller servers can still be used for a large database.  (Although RAC incurs higher costs, overhead and complexity).
  • For RAC databases, using services to partition the workload could mitigate some of the instance level contention and configuration issues.  If an instance is guaranteed, even during failover, to be exclusive to one application, then that instance could have customised initialisation parameters.
  • Using Oracle resource manager can address some of the performance / contention issues, but not all of them.  Eg, it only kicks in when CPU is maxed out, only controls foreground processes and it doesn’t manage all resources (I/O, shared pool, latches).

Feedback Welcome

We don’t usually accept comments, but this time I’m seeking feedback.  Are there any challenges or benefits you’ve found by consolidating schemata into one or a few databases?  Do you think recent database features assisted to make this strategy more successful?  Is anything missing from the list?

2009/10/03

Cases of Insensitivity

I was asked to review customisations a DBA from another organisation had made to a database when installing a new application.  He had decided to deviate from the application vendor’s instructions, which had raised concerns.

The changes that caught my attention were:

alter system set NLS_SORT=BINARY_CI SCOPE=SPFILE;
alter system set NLS_COMP=LINGUISTIC SCOPE=SPFILE;

These setting change the way Oracle compares and sorts character strings, specifically to ignore the case (upper or lower).

I remember in the old days (circa Oracle 7) I’d receive complaints about how applications that could run on different databases would be limited by a lack of case-insensitive searches if Oracle was used.  I used to suggest that the design could have met this requirement by storing an extra column that could be used for searches.  (Eg, all characters converted to uppercase, with spaces, apostrophes and hyphens removed, etc).  In more recent versions of Oracle, functional indexes and virtual columns could be considered instead.

While it’s great that Oracle now provides the functionality to allow case-insensitive comparisons and sorting, one should research the feature and understand the ramifications before using it.  There is plenty of information about NLS_SORT and NLS_COMP on Metalink, blogs and the documentation, so I’ll be brief.

If the application doesn’t support non-default NLS_SORT and NLS_COMP settings, then they should not be used.  The settings in this example can result in indexes not being used, or not being used as intended, because normal indexes are created using a binary sort order.  (Not binary_ci).  Sorting with values other than BINARY will be slower.  As usual, it pays to check for significant bugs before adopting a new feature, especially when deviating from an application vendor’s instructions.  In 10gR2 there are bugs that will return incorrect data when using the NLS_COMP & NLS_SORT values as set above.  (Eg, bugs 5225005, 5494008).
Metalink note 227335.1 is a good source of further information on this topic.

2009/09/26

Supplementing Application Error Logging

Database users may complain that something isn’t right with the application and request that the DBAs have a look in the database for problems.  Oracle doesn’t record every error reported to database clients in the alert log, just server related messages, (internal errors, snapshot too old, segment cannot extend, etc).  It is the application that should capture and/or report the full error stack returned to the database client, but unfortunately not all of them do.

The following is an example of this situation, in which I had to track down the cause of a problem where the application should have made that clear from its error log.

Application is Stuck – Example Case

An application owner asked me to look at the database, because an job on the application server was stuck and appeared to  be looping.  I asked for the error messages and while waiting for them, I fired up SQL*Plus.

The predominant and most unusual wait event for the recent and active sessions was “undo segment recovery”.  This happens when the session is waiting for the PMON to roll back a dead transaction.

V$SESSION showed that the SQL being executed just before the wait was:

select STATE from DUMMY where ID = :1  FOR UPDATE

By this time I had received the application error logs.  They referred to a different select statement on the same table, but the only clue was “This may be due to a transient race condition. Rolling back and retrying the transaction.”

The “rolling back” error text fitted with the “undo segment recovery” wait event I was seeing, even if the SQL didn’t match.  The application was trying to do the same thing with the same result repeatedly.

I noticed that the number of “SQL*Net break/reset to client” wait events were steadily increasing.  This indicates error messages were being generated for some database sessions.

At this point I could either try to trace from the client side, or from the server side.  I only had access to the database server, so I enabled SQL tracing for a couple of likely database sessions and soon had a trace file of a session that had been waiting on “undo segment recovery”:

=====================
PARSING IN CURSOR #4 len=52 dep=0 uid=30 oct=3 lid=30 tim=52668306764175 hv=3327540542 ad='914b1fb8'
select STATE from DUMMY where ID = :1  FOR UPDATE
END OF STMT
PARSE #4:c=0,e=941,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=52668306764164
BINDS #4:
kkscoacd
 Bind#0
  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=03 fl2=1000000 frm=01 csi=871 siz=24 off=0
  kxsbbbfp=ffffffff7ae74f60  bln=22  avl=04  flg=05
  value=376059
WAIT #4: nam='undo segment recovery' ela= 2929424 segment#=5 tx flags=183 p3=0 obj#=27086 tim=52668309703725
EXEC #4:c=10000,e=2939555,p=0,cr=7,cu=2,mis=1,r=0,dep=0,og=1,tim=52668309703911
ERROR #4:err=1591 tim=1098267617
WAIT #4: nam='SQL*Net break/reset to client' ela= 9 driver id=1952673792 break?=1 p3=0 obj#=27086 tim=52668309704412
WAIT #4: nam='SQL*Net break/reset to client' ela= 973 driver id=1952673792 break?=0 p3=0 obj#=27086 tim=52668309705441
WAIT #4: nam='SQL*Net message to client' ela= 5 driver id=1952673792 #bytes=1 p3=0 obj#=27086 tim=52668309705502
WAIT #4: nam='SQL*Net message from client' ela= 2441 driver id=1952673792 #bytes=1 p3=0 obj#=27086 tim=52668309708092
WAIT #28: nam='SQL*Net message to client' ela= 3 driver id=1952673792 #bytes=1 p3=0 obj#=27086 tim=52668309708548
WAIT #28: nam='SQL*Net message from client' ela= 2608 driver id=1952673792 #bytes=1 p3=0 obj#=27086 tim=52668309711219
XCTEND rlbk=1, rd_only=1
WAIT #0: nam='SQL*Net message to client' ela= 2 driver id=1952673792 #bytes=1 p3=0 obj#=27086 tim=52668309711650

And there is the error message!  ORA-1591 means “lock held by in-doubt distributed transaction”.

Querying DBA_2PC_PENDING showed a couple of pending two-phase-commit transactions since the previous day.  I informed the application owner, who then asked me to ‘commit force’ those transactions.  Problem solved.

Wrap Up

If the users can’t provide the DBA with an ORA-NNNN error message, then the DBA can try to capture the error message by enabling SQL tracing and searching through the trace files for “err=“.

2009/09/19

Tune-n-Tools vs Oracle

The Puzzle

A UNIX administrator contacted us (all the way from San Diego) with a problem regarding semaphore configuration on a new HP-UX 11i v3 server running an Oracle database:

The default semaphore settings were:

SEMMSL=2048   (semaphores per set)
SEMMNI=2048   (number of sets)

He had read Oracle’s installation documentation which said that the product of these two parameters must be at least the number of PROCESSES as defined in the Oracle database’s init.ora.  By default SEMMSL x SEMMNI = 4,194,304; much higher than the database’s PROCESSES parameter of 10,000.

Following Tune’n'Tools guidance, the UNIX administrator changed the parameters to:

SEMMSL=128
SEMMNI=8192

The new settings resulted in product of 1,048,576, which should support 10,000 database processes right?  So why did the database fail to start and report the following error?

SQL> startup nomount pfile="/${ORACLE_SID}/oracle/admin/pfile/init${ORACLE_SID}.ora";
ORA-27154: post/wait create failed
ORA-27300: OS system dependent operation:semids_per_proc failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: sskgpwcr2
ORA-27303: additional information: semids = 136, maxprocs = 10000

The Solution

There is an Oracle compilation parameter SS_SEM_MAX which limits the maximum number of semaphore sets oracle will allocate, and this normally defaults to 10.

So, the formula for success is actually Least(SEMMNI,10) x SEMMSL >= PROCESSES

For the case above: 10 x 128 < 10,000 [error], but 10 x 2048 > 10,000 [no error].

(See Metalink Doc 15566.1 for more about SS_SEM_MAX).

Metalink Doc 15654.1 ( TECH: Calculating Oracle’s SEMAPHORE Requirements) gives this guidance:

SEMMSL is an arbitrary figure which is best set to a round figure no smaller that the smallest ‘processes’ figure for any database on the system.

Which makes sense given the SS_SEM_MAX limitation, and given that the init parameter “processes” for a DB may be increased after the initial installation.

The default values of SEMMSL=2048 and SS_SEM_MAX=10 would support 20,480 processes per database, so I’d guess that not many people would have run into the SS_SEM_MAX limitation.

2009/09/13

Beyond DB Time

This post follows on from a previous one in which I used a Statspack report to show the value of the time model statistics. In this post I use some guess work to look outside the DB Time metric, to get a feel for what else might affect the response time for the application, and to gauge how important tuning the database (reducing DB Time) is from a holistic point of view.

I should point out that I’m not as familiar with AIX (the OS in this case) as I am with Solaris, Linux, HP-UX or even Windows.  I had no access to the server in question during the period for the Statspack report.  These are theories.  Make up your own mind and send us feedback (see About us page for email address) if you have knowledge that you’d like to share.

The system was described to me as “CPU constrained”.  The Statspack report shows that the database was directly responsible for only 24% of the 78% CPU utilisation (in other words increased server CPU utilisation by 19%).

Host CPU  (CPUs: 2  Cores: 1  Sockets: 0)
~~~~~~~~              Load Average
                      Begin     End      User  System    Idle     WIO     WCPU
                    ------- -------   ------- ------- ------- ------- --------
                       5.90    5.92     44.49   34.03   21.47    1.47  309.01

Instance CPU
~~~~~~~~~~~~                                       % Time (seconds)
                                            -------- --------------
                     Host: Total time (s):                  7,199.4
                  Host: Busy CPU time (s):                  5,653.5
                   % of time Host is Busy:      78.5
             Instance: Total CPU time (s):                  1,364.2
          % of Busy CPU used for Instance:      24.1
        Instance: Total Database time (s):                  4,059.1
  %DB time waiting for CPU (Resource Mgr):       0.0

OS Statistics
-> ordered by statistic type (CPU use, Virtual Memory, Hardware Config), Name

Statistic                                  Total
------------------------- ----------------------
BUSY_TIME                                565,349
IDLE_TIME                                154,590
IOWAIT_TIME                               10,579
SYS_TIME                                 245,021
USER_TIME                                320,328
OS_CPU_WAIT_TIME                       2,224,700
RSRC_MGR_CPU_WAIT_TIME                         0
PHYSICAL_MEMORY_BYTES              8,589,934,592
NUM_CPUS                                       2
NUM_CPU_CORES                                  1
GLOBAL_RECEIVE_SIZE_MAX                1,048,576
GLOBAL_SEND_SIZE_MAX                   1,048,576
TCP_RECEIVE_SIZE_DEFAULT                  16,384
TCP_RECEIVE_SIZE_MAX      ######################
TCP_RECEIVE_SIZE_MIN                       4,096
TCP_SEND_SIZE_DEFAULT                     16,384
TCP_SEND_SIZE_MAX         ######################
TCP_SEND_SIZE_MIN                          4,096
          -------------------------------------------------------------

OS Statistics - detail 

  Snap Snapshot
    Id Day Time          Load  %Busy  %User %System   %WIO  %WCPU
------ --------------- ------ ------ ------ ------- ------ ------
  1654 Tue 18 11:00:04    5.9
  1655 Tue 18 11:31:02    7.5   82.5   46.5    36.0    1.9
  1656 Tue 18 12:00:04    5.9   74.3   42.4    31.9    1.0
          -------------------------------------------------------------

This server is dedicated to the one database, and has no application component running on it.  So what was using the rest of the CPU?  Perhaps the database CPU measurements were not quite correct, (eg due to LOB OCI activity?), or perhaps the listener was using some CPU (tracing enabled?), but it seems to me there is more to it than that.  I see that 44% of the CPU busy time happened in system mode.  This could be related to I/O or network traffic (among other things).  It was subsequently discovered that the I/O was being cached by the OS, so perhaps that contributed to some degree.

Finding the cause might offer some opportunity for tuning, greater than just trying to reduce background and DB CPU time.

How about network traffic?

I already know that this application uses LOBS, (direct I/O to LOBs showed up in the Statspack report), and  LOB access can result in a high number of round trips.

                                                              Avg          %Total
                                          %Tim Total Wait   wait    Waits   Call
Event                               Waits  out   Time (s)   (ms)     /txn   Time
---------------------------- ------------ ---- ---------- ------ -------- ------
SQL*Net message from client    16,327,445    0    471,448     29     69.9

Statistic                                      Total     per Second    per Trans
--------------------------------- ------------------ -------------- ------------
bytes received via SQL*Net from c      5,432,266,525    1,508,962.9     23,262.5
bytes sent via SQL*Net to client       9,743,180,327    2,706,439.0     41,723.1
SQL*Net roundtrips to/from client         16,329,496        4,536.0         69.9

There were 16 million round trips and about 15GB of data transferred across the network during the hour.

4,536 round trips per second means 9,072 SQL*Net packets (up to 8000bytes) arriving or departing per second, which may then be broken into multiple network packets (up to 1500bytes), each potentially causing CPU interrupts which have to be handled by the OS. (“Potentially” because of interrupt coalescing).  Surely this would contribute to the CPU time system in system mode.

Further, it might show that the active database time is less significant to overall application performance than we might otherwise assume.

The total wait time was 471,448 seconds, most of which will be idle application time, but not all of it.  Some of this time is network overhead, and some will be application think time.  It is not possible to calculate the network + application time from the data in the Statspack report, but we can make a safe guess at a lower limit using the wait event histogram information.

Total ----------------- % of Waits ------------------
Event                      Waits  <1ms  <2ms  <4ms  <8ms <16ms <32ms  <=1s   >1s
-------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
SQL*Net more data to clien  203K  99.1    .2    .3    .3    .0    .0    .0
SQL*Net message from clien   16M  16.1  29.5  25.4  19.3   6.5   1.6   1.4    .2
SQL*Net message to client    16M 100.0    .0    .0    .0    .0    .0    .0
SQL*Net more data from cli   60K  58.5    .2    .2    .2   2.3  38.1    .4

If we decide to treat any SQL*Net wait event longer than one second as idle (application and DB doing nothing), and treat the rest as time during which the user would be waiting, then we can get a lower limit from the highlighted row above.  (I’ll ignore the less significant more data waits for this example).

0ms <= 16.1% of waits < 1ms     0.161 * 0ms * 16,327,445 = 0s
1ms <= 29.5% of waits < 2ms     0.295 * 1ms * 16,327,445 = 4,816s
etc...

Taking the lower bound of each I calculate at that 45,701 sec is the lower limit.  Compare this to the total DB Time of 3,393 sec, and we can see that the time spent in the database is no more than 6.9% of the total response time of the app server + database server components.

Note that this doesn’t include any time used for communication between the application server and the user, browser processing, etc.

This extra information puts into perspective the scope of any benefit from reducing DB Time.  This information could be useful to the overall tuning effort by moving the focus to options that might help to reduce the number of round trips or make them more efficient.  (Eg bulk processing, pre-fetching, using varchar2 instead of LOB, increasing SDU, removing firewalls between app and DB, tuning interrupt coalescing parameters, application server tuning, etc).

Best of Times or Worst of Times?

When I first saw 10g’s time model views I wasn’t impressed.  The components overlapped, so they didn’t add up to 100% and there weren’t many categories.  I’d been using selected wait events + CPU statistics to approximate a breakdown of foreground response time ever since I’d first seen Steve Adams’ IXORA site.  I didn’t see anything extra offered by the time model views….. but I was wrong.

I’ll use a Statspack report, that was sent to me for analysis, to demonstrate how the 10g time model statistics can be useful.

Here are extracts from a statspack report for an 11.1.0.7 database on a dual CPU AIX server over a one hour period:

Top 5 Timed Events                                                    Avg %Total
~~~~~~~~~~~~~~~~~~                                                   wait   Call
Event                                            Waits    Time (s)   (ms)   Time
----------------------------------------- ------------ ----------- ------ ------
log file sync                                  230,767         868      4   49.6
log file parallel write                        227,285         577      3   33.0
CPU time                                                       198          11.3
db file parallel write                          17,889          26      1    1.5
SQL*Net more data to client                    203,568          22      0    1.2
          -------------------------------------------------------------

Looking at the top wait events first gives the impression that the database wasn’t very busy, and that commits and redo log I/O are the biggest targets for tuning.  However, if we then consider the DB Time and DB CPU metrics, a different picture starts to form:

   Elapsed:      60.00 (mins) Av Act Sess:       0.9
   DB time:      56.54 (mins)      DB CPU:      22.40 (mins)
Time Model System Stats
-> Ordered by % of DB time desc, Statistic name

Statistic                                       Time (s) % DB time
----------------------------------- -------------------- ---------
DB time                                          3,392.7
DB CPU                                           1,344.1      39.6
sql execute elapsed time                           862.5      25.4
parse time elapsed                                 100.0       2.9
PL/SQL execution elapsed time                       10.2        .3
connection management call elapsed                   6.2        .2
hard parse elapsed time                              3.3        .1
hard parse (sharing criteria) elaps                  0.3        .0
hard parse (bind mismatch) elapsed                   0.2        .0
sequence load elapsed time                           0.1        .0
repeated bind elapsed time                           0.0        .0
background elapsed time                            666.4
background cpu time                                 20.1
-------------------------------------------------------------

We can now see that the database was much busier than the top waits section indicated.  Remember DB Time = DB CPU + I/O time + other waits (foreground), so if all the foreground I/O and wait events were timed and captured correctly, they should add up to 2,000 seconds.  It is obvious from the top timed events section that about 1,000 seconds of I/O or other wait time is missing.

Should we just follow the usual step of jumping to the top SQL sections of the report and start investigating how to tune each of them?  In this case, the report goes on to show that the SQL captured by Statspack for the hour covers 100% of SQL executions and 97.3% of reads from the database buffer cache but only:

  • 6.4% of Total DB CPU  (foreground database sessions)
  • 3.1% of Total Disk Reads
  • 27.5% of Total DB Time  (foreground database sessions)

So how useful are the top SQL sections of the report?

I still want to know what the database was doing while it was busy, so that I can characterise the database and get a feel for where the opportunities for tuning are.

Browsing through the rest of the report, I notice a fair amount of direct I/O, mostly LOB related.

Statistic                                      Total     per Second    per Trans
--------------------------------- ------------------ -------------- ------------
physical reads                                53,787           14.9          0.2
physical reads direct                         52,626           14.6          0.2
physical reads direct (lob)                   47,504           13.2          0.2
physical writes                               61,375           17.1          0.3
physical writes direct                         8,773            2.4          0.0
physical writes direct (lob)                   8,773            2.4          0.0

Direct I/O is performed by the foreground processes, but the wait event times for direct I/O are not accurate.  (Inaccuracies vary depending on many factors, version/patch level, OS, mount options etc).  Also, LOB APIs are not associated with a user cursor (eg top SQL), so the time for some of the LOB work (and data transmission) can be hard to measure/identify.

Related notes for those interested:
268476.1 – LOB Performance Guideline – (see “CPU time and Elapsed time – not reported accurately”)
BUG 3504487 – DBMS_LOB/OCILob* CALL RESOURCE USAGE IS NOT REPORTED, AS THEY ARE NOT PART OF A CURSOR
223117.1 – Tuning I/O-related waits
50415.1 – WAITEVENT: “direct path read” Reference Note
50416.1 – WAITEVENT: “direct path write” Reference Note

Note 223117.1 says: Due to the way in which time for these waits is recorded (it does not measure the time taken to perform the I/O), their relative position in listings such as Statspack’s “Top 5 Wait/Timed Events” cannot be used to evaluate their true impact.

The DB Time metric does include the time the database sessions were performing direct I/O, so it becomes a useful sanity check when reporting on the break down of the wait events.  I like to compare (DB Time – DB CPU) to the total of the non-idle foreground wait events, to give a percentage of missing event time.  (About 30% in this case).

So…. if I assume that all the missing time (1,000 seconds) is related to direct I/O, then I can estimate that DB Time is made up of about 60% I/O and 40% CPU.

In this particular case, the DBAs supporting the database (and that sent our team the Statspack report), discovered that the database was not using direct or concurrent I/O for the JFS2 filesystem.  After testing with direct I/O and a larger database block cache, my next target for tuning (if required) would be to investigate the LOBs to see if any were suitable to be cached.  (Reduce direct I/O reads, and perform writes in the background).

Final Notes on DB Time

The first thing I do when logging into a UNIX or Linux database server is check the load.  I type “w” by habit, and compare it to my memory of usual loads for the system.  I find it a very good starting point for getting a ‘feel’ of how burdened the system is.

DB Time allows us to calculate a very simple but useful metric equivalent to load: average active sessions, calculated from DB Time / clock time.  We may write more about how we use this metric to support our customers in a future post.

While the DBA is mostly concerned with what the database is doing while it is busy, sometimes we can and should expand the scope wider than the information provided by DB Time and the time model statistics.  For example, the Statspack report example used for this post offered more useful information about the performance of the database server and the application as a whole.  (Will be discussed in a future post).

2009/08/22

When the Wrong Index Works Better

This week I saw a case where a poor choice of index may have actually improved performance…. temporarily.

An application with performance problems had been investigated and a new index had been implemented which reportedly improved reduced the relevant query’s execution time from 40 seconds to between 6-15 seconds.  However, there were still complaints about the variable length of the execution time, so I began to review the situation.

From V$SQL I could see hundreds of examples of the query that had run recently – after the index had been created.  The elapsed time for these took up to 103 seconds to complete.  The number of rows returned and physical/logical block gets also varied greatly.  I began to doubt the accuracy of the description of the new index’s impact, so I needed to verify that the new index was the best choice.

The queries were similar to this (altered for simplicity and confidentiality):

SELECT some_columns
FROM  dummy_table
WHERE FOREIGN_KEY = :b1
AND AMOUNT1 + AMOUNT2 <>0  
AND to_char(TIME_STAMP, 'yyyy-mm-dd HH24:MI:SS') <= '2009-08-18 17:03:54'
/

Each query had a literal for the TIME_STAMP comparison.
The first thing I noticed was the common developer error of putting the function on the column instead of the constant.  It should have been rewritten as:

TIME_STAMP <= to_date('2009-08-18 17:03:54','yyyy-mm-dd HH24:MI:SS');

This would allow Oracle to only convert between char and date once on the constant, rather than on the column for each row processed.  Also, it would allow the TIME_STAMP value to be used to access an index, if a suitable one existed.

These were the relevant indexes:

                                                   DIST KEYS       LEAF BLKS LAST ANALYZED
INDEX_NAME                                         (SELCTY%)   BPK (DEPTH)   (ESTIMATE %)   COLUMN (BYTES ASC/DESC)
-------------------------------------------------- ---------- ---- --------- -------------- ---------------------------------------
DUMMY_TABLE_IDX1                                   34K (0)    #### 243K (3)  15/08/09 (1)   FOREIGN_KEY (22 ASC)

DUMMY_TABLE_IDX2                                   346K (1)    108 283K (3)  15/08/09 (1)   TIME_STAMP (7 ASC)

DUMMY_TABLE_IDX3                                   25M (48)      1 192K (3)  15/08/09 (7)   FOREIGN_KEY (22 ASC)
                                                                                            TIME_STAMP (7 ASC)

DUMMY_TABLE_PK (U)                                 51M (100)     1 274K (2)  15/08/09 (1)   PRIMARY_KEY (22 ASC)

DUMMY_TABLE_IDX3 was the newly created index that was supposed to improve performance.

Although the to_char function prevented the TIME_STAMP column from being used to access the index, it could still be used to filter rows before accessing the DUMMY_TABLE to get the rest of the columns.  The DUMMY_TABLE was big and could never reside completely in the cache, so avoiding unnecessary table accesses would boost performance.  But would filtering rows based on the TIME_STAMP achieve this aim?

I am always suspicious of date_column < :date_time_value because it is hardly ever very selective.  In this case I noticed that the date and time compared to the TIME_STAMP column was always within a second or two of the actual execution time.  As one would expect from the name, the TIME_STAMP column has only past times in it.

Select count(1) from dummy_table where time_stamp > sysdate;

…returned no rows.  So, the TIME_STAMP column is of no use in an index to improve the performance of this query.

So why was there an impression that the index was beneficial?  One possible answer is that the new index is compact (dense) because it hasn’t had much/any data deleted from it, where as the index that would have previously been used by the query (DUMMY_TABLE_IDX1) had become sparse after a lot of data was deleted.  The index information above shows that the old index is 26% larger than the new index (confirmed by DBA_SEGMENTS) despite the new index having an extra column!  At first the smaller index (DUMMY_TABLE_IDX3), with less empty blocks would have been more efficient than the old one (see this post about index performance).  But, after time passes and data is deleted, DUMMY_TABLE_IDX3 will become larger and perform worse because it stores an extra column (TIME_STAMP) which adds nothing for performance because it doesn’t filter any rows.

Simply coalescing the index would have provided more benefit than adding the new index, without the overhead.  Compressing it would have improved it even more.

An alternative to the above:
Replace DUMMY_TABLE_IDX3 with a compressed functional index on (FOREIGN_KEY, AMOUNT1 + AMOUNT2), because the criterion “AMOUNT1 + AMOUNT2 <> 0″ does filter out a lot of rows for some of the queries, and so table accesses (sequential I/O) would be reduced.

2009/08/15

Reduce Processing Time by Reducing the Index

Rebuilding Oracle indexes is nothing new, but there are contrasting ideas about when this action should be taken.  Some argue that there is usually no real benefit, and others set automatic jobs to rebuild indexes regularly as a rule.

I think that each case should be evaluated on its merits.  Without going into it (the topic is well covered elsewhere) some indexes naturally tend towards 75% capacity and others >90%.  Index capacities well under 75% are usually due to deletions.  The only general rule I’d suggest is that when data is deleted, especially by a few large operations, then the related indexes might be worth checking to see if a reorganisation is warranted.

Compact (dense) indexes mean less disk space, less memory for caching the same information, less I/O and less processing (CPU time).

Rebuilding, coalescing (Enterprise Edition) or shrinking space (10g+) can be big and easy wins.  There are many cases I could describe, but here is the most extreme example.

Purging Performance Problem

An organisation approached me after they been trying to purge years worth of data from an Oracle 9i database, but found that the performance was atrocious, and they were after that elusive go fast switch.

The application had a purging function built in, (rare but so beneficial!), so the code couldn’t be adjusted.  There were no application design changes possible, eg partition by time and efficiently drop partitions instead of deleting.

The application purged data by asking for a period to retain and one to purge, then looping – deleting more recent data in the first iteration followed by incrementally older data in the next iterations.  Here is a simplified example of what I believe it was doing from the description given to me:

delete where data_date between :retention_date-1 and :retention_date;
commit;
delete where data_date between :retention_date-2 and :retention_date;
commit;
delete where data_date between :retention_date-3 and :retention_date;
commit;
....

This was probably designed to reduce undo space requirements.

The data_date key was indexed, which avoided tablescans.  Oracle would handle the first deletion by entering the index at retention_date – N and then (range) scanning the index until it found the retention date, deleting as it went.

The problem is obvious to those who understand how Oracle indexes work.  The deleted index entries result in more and more empty leaf blocks between the data yet to be purged and the retention date.  These empty leaf blocks are not taken out of the index structure until they are reused elsewhere (by an insert or update).  The index range scan had to access more and more of these empty leaf blocks as the purging process continued, requiring more processing time for each subsequent delete.

The solution I suggested was to punctuate the purge cycles with manual index coalesce commands to remove the empty leaf blocks.  The first coalesce of the relevant index took several minutes, but took less time after that. This is the feedback we received when the index coalesce was added to the purging process:

“Wow……..what previously took 1 hour now takes about 25s.  Yes, I can now purge a day off the table in about 25s.   This is great.  A whole week in 1 minute. and a month in 10 minutes.”

In this case, it wasn’t even necessary to access the database to find the solution.  When one does have access to the database, evidence that the index should be rebuilt / shrunk / coalesced can come from:

  • a high logical read count for a SQL statement, and
  • a high logical read count for an index segment, and
  • a low density shown by the results of analyze index .. validate structure, or
  • a low density shown by the results of the segment advisor (10g), or
  • a low estimated density from the key length x number of keys versus actual index size, or
  • a tree dump of the index showing many empty (or mostly empty) leaf blocks in the structure

Big Wins are the Easiest

Isn’t that always the way with Oracle tuning?  The really dramatic improvements are often the easiest to find and implement.  Tuning tasks that are highly restrictive, complex and only offer a small scope for performance gains may be the most challenging but the least appreciated.  Finding the “low hanging fruit” (beneficial, cheap and low risk) is justifiably what customers want first.

Here are two contrasting examples from my experience:

An Easy Win

We had just taken over support of a customer’s Oracle database from a competitor.  It was a terabyte database on Oracle 8i and Windows (32-bit).  When doing the initial audit, I was shocked to learn that their monthly billing process ran for two weeks solid!  This meant no outages, slow performance for OLTP users, and if something went wrong, they’d only just have time to run it again before the next billing process needed to start.

It was obvious to me that before we could start making other improvements highlighted in our audit, we’d need to get the billing process down to a reasonable length of time.  Sure, the platform was hardly appropriate for the amount of data and processing, but two weeks to process one batch job each month was ridiculous.

It took a few minutes to spot two inefficient SQL statements and to recommend a couple of indexes that ended up reducing the monthly billing run to less than a day.  The only problem was that the companies being billed were surprised to get their bills earlier.  ;)

This kind of miracle tuning is welcomed by the customer and looks great on a CV, but it was embarrassingly easy.

A Hard Win

In complete contrast to the previous example, this one took a lot of work for small but necessary gains.  A different customer had a legacy monthly billing process that was very poorly designed, and was taking longer each month as the data grew.  The core engine could not be altered; only tweaks were possible.  The application was due to be replaced in the new year, but until then, we were tasked with keeping it running within a window of a weekend.  (Users could not be in the system during this monthly billing process, so it had to complete within a weekend).

This application had already been through iterations of tuning, so there were no easy and quick wins left.  It took a lot of analysis and work to save 10% here and 5% there, just to hold a steady run time each month.  We tuned multiple layers: the hardware, storage, OS, Oracle parameters, data density, SQL, indexes and finally the PL/SQL packages.

Although the results were appreciated by the customer, and provided me with job satisfaction, stopping an application from slowing down doesn’t grab attention in the same way as the easy win described above.

How All this Relates to the HP Database Services Blog

Big and easy wins are often not so technically interesting or challenging, and are usually well covered in books and forums and other blogs.  Therefore, it is tempting to just write about more difficult or rare performance problems.  However, “low hanging fruit” are commonly sought after, so I’ll try to keep this in mind for future posts.

Here’s a start in that theme.

2009/08/08

Recycling or Wasting?

Pre-Ramble

My local council collects plastic, glass, paper, etc separately from the rubbish so that it can be put to good use for another purpose rather than these resources being wasted in a landfill somewhere.

Oracle databases (from 10g) collect dropped segments in a recycle bin, until the resource (tablespace capacity) needs to be used for something else.

My local council is having trouble finding a use (or buyer) for the collected materials, and so those materials are stockpiled and over time take up more and more space.  Some databases are in a similar situation.

Recent Case

A monitoring tool had been activated for an Oracle 10g database for the first time.  The monitoring tool’s database session was running for a long time and using a lot of CPU.  In fact, the next check was starting before the previous one had finished.

I could see that these sessions were doing a lot of consistent reads from memory (not much physical I/O) so high CPU usage was understandable because a lot of processing was occurring.

The SQL was referencing v$datafile, v$log, v$tempfile and dba_free_space.  It was a common enough piece of code that returned immediately on most 10g databases.  What was different about this database?  The first thing I thought to check was the number and size of the free and used chunks of space in the database.  When ordering the segments by total size or number of extents, I noticed a lot of segments beginning with “BIN$%”.  This made me think that the recycle bin had something to do with this issue.

SQL> select sum(bytes)/1048576 from dba_segments where segment_name like 'BIN$%';

SUM(BYTES)/1048576
------------------
        147092.125

1 row selected.

SQL> select min(droptime) from dba_recyclebin;

MIN(DROPTIME)
-------------------
2008-11-13:08:24:45

SQL> select count(1) from dba_recyclebin;

  COUNT(1)
----------
      9525

There are 9,525 items in the recycle bin that take up 150GB out of a total database size of 250GB!  Just think of the wasted space on disk, tape, etc.

Further reports on DBA_RECYCLEBIN showed that the application regularly drops some segments (partitions mostly).  I surmised at this point that the dropped objects are never purged, and that the datafiles are manually extended whenever the recycle bin contents fill up the tablespace enough to make the used capacity reach a certain threshold.  Note that if auto-extension had been left to handle growth, then the dropped objects would be purged as needed before the datafile was extended.  It was human intervention that was circumventing Oracle’s recycling process.

I suggested that this database should have the recycle bin purged and disabled so that the DBAs could continue to manage the capacity with their standard procedures.

A Little More for Anyone Still Interested

A “My Oracle Support” search showed that there were some bugs and problems in previous versions /patch sets regarding performance of DBA_FREE_SPACE when objects had been dropped.  Although there were no exact bug/note matches for this database’s version, I suspect that the underlying SQL was still not being executed efficiently.  Gathering statistics on the X$ views (via DBMS_STATS.GATHER_FIXED_OBJECTS_STATS) didn’t help, because one of the referenced v$ views (V$DATAFILES I think it was) had a RULE hint in it.  However, if the same SQL was forced to use CBO  then it ran much quicker.

This is the logical block accesses they’ve done so far:

SID BLOCK_GETS CONSISTENT_GETS PHYSICAL_READS BLOCK_CHANGES CONSISTENT_CHANGES

———- ———- ————— ————– ————- ——————

43        142        73903895 14224             0                  2

68        142        75150676 14573             0                  2

189        143        23221325 9066             1                  2

267        142        22997804 9059             0                  2

This is the SQL statement they are all running:

PARSING PARSING

SQL_TEXT

—————————————————————-

select  total_mb as met212_1, total_mb-free_mb as met212_2

from(select a.m + b.m + decode(c.m,null,0,c.m) total_mb, d.m fr

ee_mb    from(select sum(bytes)/1048576 m from v$datafile) a,

(select sum(bytes)/1048576 m from v$log) b,     (select sum(by

tes)/1048576 m from v$tempfile) c,     (select sum(bytes)/104867

6 m from dba_free_space) d)

CHLD      SHARE      PERST        RUN LOADED KEPT OPEN            PARSE                                            USER  SCHEMA

NUM        MEM        MEM        MEM   VERS VERS VERS LOADS INVL CALLS FIRST_LOAD_TIME     LAST_LOAD_TIME           ID      ID

—- ———- ———- ———- —— —- —- —– —- —– ——————- ——————- ——- ——-

0     153076      68608      67336      1    0    1     1    0     8 2009-07-30/11:51:10 2009-07-30/11:51:10      66      66

CHLD                 DISK     BUFFER                             ROWS

NUM     EXECS      READS       GETS      SORTS    FETCHES  PROCESSED      COST

—- ——— ———- ———- ———- ———- ———- ———

0         8 120258  549877971 64          4          4         1

CHLD    CPU_TIME     ELAPSED

NUM         SEC         SEC MODULE                                             ACTION

—- ———– ———– ————————————————– ————————————————–

0    18916.85 20740.22 SQL*Plus

Next Page »

Blog at WordPress.com.