Statistics on Partitioned Tables - Part 1

Doug's Oracle Blog

  • Home
  • Papers
  • Books
  • C.V.
  • Fun
  • Oracle Blog
  • Personal Blog

Feb 17: Statistics on Partitioned Tables - Part 1

If you've ever worked on large databases that use partitioned and subpartitioned tables, you'll be aware that there are significant challenges in maintaining up-to-date/appropriate statistics. We've encountered a few problems at work recently and I decided it would be an idea to put together a series of posts covering the basics of what can become quite an involved topic because it's not difficult to find yourself going round in circles reading the documentation, Oracle Support Notes, blog posts, forum threads and the rest until you don't know whether you're coming or going!

I'll steer clear of any remotely advanced angle and try to take some time to show simple, practical examples that might be useful to the great unwashed masses (like me). I'm pretty certain that everything I'm going to post has already been written about by the likes of Jonathan Lewis, Christian Antognini, Randolf Geist, Martin Widlake and others, but I want to write it in my own way that I can understand ;-) Sometimes I have a feeling when I write certain blog posts that I'm going to be discussing things which are apparently obvious to experienced people but I'm not convinced most people quite understand. I've no idea how many parts there might be because there's no plan here, but I know it's going to end up being too much for one post.

Added later - whilst digging out a link to Martin's blog, I noticed that he's planning a whole DBMS_STATS series soon. Sigh. Keep an eye out for that, because it will be as in-depth as always. I'll stick to the simple stuff here!

This is all on Oracle 10.2.0.4 running on Linux although we have several stats-related patches applied (probably more on those later) and I'll probably run the same tests on my own 11.2.0.1 installation later to identify any differences.

All of the examples will be based on the following table definition

SQL> CREATE TABLE TEST_TAB1
(
  REPORTING_DATE            NUMBER              NOT NULL,
  SOURCE_SYSTEM             VARCHAR2(30 CHAR)   NOT NULL,
  SEQ_ID                    NUMBER              NOT NULL,
  STATUS                    VARCHAR2(1 CHAR)    NOT NULL
)
PARTITION BY RANGE (REPORTING_DATE)
SUBPARTITION BY LIST (SOURCE_SYSTEM)
SUBPARTITION TEMPLATE
  (SUBPARTITION GROT VALUES ('GROT') TABLESPACE TEST_DAT01,
   SUBPARTITION JUNE VALUES ('JUNE') TABLESPACE TEST_DAT01,
   SUBPARTITION HALO VALUES ('HALO')  TABLESPACE TEST_DAT01,
   SUBPARTITION OTHERS  VALUES (DEFAULT)   TABLESPACE TEST_DAT01)
(  
  PARTITION P_20100131 VALUES LESS THAN (20100201) NOLOGGING NOCOMPRESS,  
  PARTITION P_20100201 VALUES LESS THAN (20100202) NOLOGGING NOCOMPRESS,  
  PARTITION P_20100202 VALUES LESS THAN (20100203) NOLOGGING NOCOMPRESS,  
  PARTITION P_20100203 VALUES LESS THAN (20100204) NOLOGGING NOCOMPRESS,  
  PARTITION P_20100204 VALUES LESS THAN (20100205) NOLOGGING NOCOMPRESS,  
  PARTITION P_20100205 VALUES LESS THAN (20100206) NOLOGGING NOCOMPRESS,  
  PARTITION P_20100206 VALUES LESS THAN (20100207) NOLOGGING NOCOMPRESS,  
  PARTITION P_20100207 VALUES LESS THAN (20100208) NOLOGGING NOCOMPRESS  
)
NOCOMPRESS 
NOCACHE
NOPARALLEL
MONITORING;

Table created.

SQL> CREATE UNIQUE INDEX TEST_TAB1_IX1 ON TEST_TAB1
(REPORTING_DATE, SOURCE_SYSTEM, SEQ_ID)
  LOCAL NOPARALLEL COMPRESS 1;

Index created.

So there is a partition per REPORTING_DATE which is sub-partitioned depending on the SOURCE_SYSTEM that sent the data. It's probably worth pointing out at this stage that the table definition and test data does not match that used in the system I'm working on, but is similar enough to illustrate the issues and is pretty similar to several other systems I've seen or worked on in the past. Speaking of test data, I'd better insert some.

SQL> INSERT INTO TEST_TAB1 VALUES (20100201, 'GROT', 1000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100202, 'GROT', 30000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100203, 'GROT', 2000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100204, 'GROT', 1000, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100205, 'GROT', 2400, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100201, 'JUNE', 500, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100201, 'HALO', 700, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100202, 'HALO', 1200, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100201, 'WINE', 400, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100206, 'WINE', 600, 'P');

1 row created.

SQL> INSERT INTO TEST_TAB1 VALUES (20100204, 'WINE', 700, 'P');

1 row created.

SQL> COMMIT;

Commit complete.

With table and data created, I'll gather some statistics using default options and it's probably worth pointing out at this stage that everyone I've spoken to at Oracle is very keen that people should start off with the default options for reasons that will hopefully become apparent.

SQL> exec dbms_stats.gather_table_stats('TESTUSER', 'TEST_TAB1', GRANULARITY => 'DEFAULT');

PL/SQL procedure successfully completed.

So let's see what statistics have been gathered and focus on the simple NUM_ROWS for now.

SQL> select  table_name, global_stats, last_analyzed, num_rows
  2  from user_tables
  3  where table_name='TEST_TAB1'
  4  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     GLO LAST_ANALYZED          NUM_ROWS
------------------------------ --- -------------------- ----------
TEST_TAB1                      YES 10-FEB-2010 16:31:17         11

SQL> select  table_name, partition_name, global_stats, last_analyzed, num_rows
  2  from user_tab_partitions
  3  where table_name='TEST_TAB1'
  4  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     PARTITION_NAME                 GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131                     YES 10-FEB-2010 16:31:17          0
TEST_TAB1                      P_20100201                     YES 10-FEB-2010 16:31:17          4
TEST_TAB1                      P_20100202                     YES 10-FEB-2010 16:31:17          2
TEST_TAB1                      P_20100203                     YES 10-FEB-2010 16:31:17          1
TEST_TAB1                      P_20100204                     YES 10-FEB-2010 16:31:17          2
TEST_TAB1                      P_20100205                     YES 10-FEB-2010 16:31:17          1
TEST_TAB1                      P_20100206                     YES 10-FEB-2010 16:31:17          1
TEST_TAB1                      P_20100207                     YES 10-FEB-2010 16:31:17          0  

8 rows selected.

SQL> select  table_name, subpartition_name, global_stats, last_analyzed, num_rows
  2  from user_tab_subpartitions
  3  where table_name='TEST_TAB1'
  4  order by 1, 2, 4 desc nulls last;

TABLE_NAME                     SUBPARTITION_NAME              GLO LAST_ANALYZED          NUM_ROWS
------------------------------ ------------------------------ --- -------------------- ----------
TEST_TAB1                      P_20100131_GROT                NO   
TEST_TAB1                      P_20100131_HALO                NO   
TEST_TAB1                      P_20100131_JUNE                NO   
TEST_TAB1                      P_20100131_OTHERS              NO   
TEST_TAB1                      P_20100201_GROT                NO   
TEST_TAB1                      P_20100201_HALO                NO   

<output snipped ....there are a lot of subpartitions, all missing stats!>

So at the moment the row counts look spot-on and there are Global Statistics on both the Table and the Partitions of the table and no statistics at all on the Subpartitions. First, let's talk about global statistics. There are several good resources kicking around describing global stats so I'll list just a couple here. I always like a documentation reference and although this is the 11.2 documentation and so some of it isn't correct for 10g, I like the very simple mention of global stats given in the first two paragraphs on 13.3.1.3 - global stats are statistics on the table that describe the table as a whole, in addition to the stats on the underlying partitions. The important point is that sometimes the optimiser will use the global stats, sometimes the partition stats and sometimes both, depending on the query.  For those of you with Support access, Note 236935.1 goes into more detail.

However, our example is complicated by the fact that we have subpartitions too. So at this stage we have global stats that describe the table as a whole (including all of the underlying partitions) and global stats on each partition that describe that partition (and all of its underlying subpartitions). At this stage, let's just assume that having global stats is 'a good thing' which is why Oracle's default option is to gather them at the Table and Partition levels. In the next post I'll look at why they're important.

Why no Subpartition stats, then? Well, the optimiser is only going to use stats on subpartitions when it can guarantee that it's going to use a single subpartition and as that's probably less likely than you think, Oracle doesn't collect those stats by default, but is able to use higher level partition stats to guess what's going on at the subpartition level too. However, if you do think your queries are going to be able to drill down to a specific subpartition effectively, you can choose to gather subpartition statistics too. Beware though that, as far as I'm aware, the optimiser won't use subpartition stats at all, prior to 10.2.0.4 so there's no benefit to the additional overhead if you're running an earlier version.

In the next post I'll look at why global stats are both a good and bad thing ....
Posted by Doug Burns Comments: (11) Trackbacks: (9)
Defined tags for this entry: dbms_stats, optimiser, partitions
Related entries by tags:
Statistics on Partitioned Tables - Part 5
Statistics on Partitioned Tables - Part 4
Statistics on Partitioned Tables - Part 3
Statistics on Partitioned Tables - Part 2

Trackbacks
Trackback specific URI for this entry

Log Buffer #179: a Carnival of the Vanities for DBAs
You have found the 179th edition of Log Buffer, the weekly review of database blogs. Welcome. Enjoy your stay. We begin with . . . SQL Server Merrill Alrich gets going with a fresh juxtaposition–his thoughts on motorcycles a...
Weblog: The Pythian Blog
Tracked: Feb 19, 19:48
Statistics on Partitioned Tables - Part 2
In the last part, I asked you to trust me that true Global Stats are a good thing so in this post I hope to show you why they are, to make sure you don't kid yourself that you can avoid them. (Updated later - this is all on 10.2.0.4)Why would you even w
Weblog: Doug's Oracle Blog
Tracked: Feb 23, 04:58
Statistics on Partitioned Tables - Part 3
As soon as I'd committed my last post, I knew it wasn't what I'd hoped for and said as much to a couple of people before they'd read it. I knew it would probably just add to any confusion people already had about this subject (something I'm particularly k
Weblog: Doug's Oracle Blog
Tracked: Feb 23, 06:53
Statistics on Partitioned Tables - Part 3
As soon as I'd committed my last post, I knew it wasn't what I'd hoped for and said as much to a couple of people before they'd read it. I knew it would probably just add to any confusion people already had about this subject (something I'm particularly k
Weblog: Doug's Oracle Blog
Tracked: Feb 23, 06:53
Statistics on Partitioned Tables - Part 4
In the last post I illustrated the problems you can run into when you rely on Oracle to aggregate statistics on partitions or subpartitions to generate estimated Global Statistics at higher levels of the table. Until there are statistics for all of the re
Weblog: Doug's Oracle Blog
Tracked: Feb 28, 08:38
PingBack
Weblog: jonathanlewis.wordpress.com
Tracked: Mar 17, 07:44
PingBack
Weblog: coskan.wordpress.com
Tracked: Mar 18, 16:45
PingBack
Weblog: www.jcon.no
Tracked: Oct 27, 18:36
PingBack
Weblog: www.jcon.no
Tracked: Oct 27, 18:36

Comments
Display comments as (Linear | Threaded)

#1 - Pete Scott said:
2010-02-17 07:07 - (Reply)

Maybe the lack of sub-partition stats by default is a hang-over from the days when the only sub-partition option was hash partitioning... and queries would probably hit all (or is that any?) of the sub-partitions which were nominally the same size assuming even data distribution on the hash key.

Speaking to some Oracle DB development staff a few years back (9.2 days) it would seem that sub-partitioning for query performance rather than space management was something that came in from left-field.

#2 - Doug Burns said:
2010-02-17 07:45 - (Reply)

I think you're probably right, Pete. If I recall correctly, wasn't Hash the only option, followed by List in 9i? Subpartition stats are not going to be that useful on Hash subpartitions.

#3 - Pete Scott said:
2010-02-17 10:06 - (Reply)

Indeed hash subpartitioning was the only option in 8i

I am looking forward to the next part, Doug - Global stats are fun... and then there is that ability (and Martin Widlake recently mentioned this on his blog .. oh and in the pub the other night) of Oracle to 'invent' not existent global stats based on the values contained in partition stats for *ALL* the table's partitions.... great until someone creates a new (but empty) partition during the working day and wonders why query performance bombs

#4 - Doug Burns said:
2010-02-17 15:11 - (Reply)

Now, now, Pete. You know how this works. When I flag up a series, you're not actually supposed to post comments about future posts that haven't appeared yet! LOL

#5 - Wolfgang Breitling 2010-02-18 03:56 - (Reply)

"the optimiser won't use subpartition stats at all, prior to 10.2.0.4"

that is not entirely correct. The optimizer uses BLOCKS from the subpartition statistics, when gathered, to calculate the cost of a full scan of the subpartition. Tested in 10.2.0.3

The index cost calculations, however, are based on the partition statistics even if subpartition statistics are available.

That obviously makes index accesses relatively more expensive than a full subpartition scan and can in extreme cases lead to a different plan.

#6 - Doug Burns said:
2010-02-18 08:13 - (Reply)

Thanks for the correction, Wolfgang. I've learnt something new today already :-)

#7 - Amardeep Sidhu said:
2010-03-17 11:50 - (Reply)

Hi Doug,

Very nice writeup :-)

As per 10g documentation granularity=>DEFAULT option is obsolete now and the default option is AUTO. So shouldn't we do it with AUTO ? ;-)

Will read other parts and try to come up with more questions ;-)

Cheers !

#8 - Doug Burns said:
2010-03-17 12:24 - (Reply)

Yes, you're right. In fact, after I'd written this post, I was looking at the docs for another post and noticed it. Decided to leave it as is, though, because I'm not sure it will make much difference, but I'm glad you commented to make that clear.

Cheers,

Doug

#9 - CattleSpeed 2010-10-22 11:04 - (Reply)

Hi Doug,

I need some clarification with the Gather_stale utility when used on a partioned table. I am aware that the stats are marked as stale if 10% of the table(rows) change. what i need to know is, do partition exchanges count as part of the 10% because i only see insert,update and delete columns on the dba_tab_modifications view

#10 - Robin said:
2010-12-23 09:44 - (Reply)

For info, the link to 11.2 docs is 404

This one currently works: http://download.oracle.com/docs/cd/E11882_01/server.112/e16638/stats.htm

#11 - Doug Burns said:
2010-12-23 10:00 - (Reply)

Thanks Robin! Updated ...

Cheers, Doug


Add Comment

Standard emoticons like :-) and ;-) are converted to images.
E-Mail addresses will not be displayed and will only be used for E-Mail notifications.
BBCode format allowed
 
 

Statistics on Partitioned Tables

Contents

Part 1 - Default options - GLOBAL AND PARTITION
Part 2 - Estimated Global Stats
Part 3 - Stats Aggregation Problems I
Part 4 - Stats Aggregation Problems II
Part 5 - Minimal Stats Aggregation
Part 6a - COPY_TABLE_STATS - Intro
Part 6b - COPY_TABLE_STATS - Mistakes
Part 6c - COPY_TABLE_STATS - Bugs and Patches
Part 6d - COPY_TABLE_STATS - A Light-bulb Moment
Part 6e - COPY_TABLE_STATS - Bug 10268597

A couple of posts about Incremental Stats confusion

Part 1
Part 2

Comments

personal blog about Moving Sideways
Wed, 01.06.2016 18:34
That is a good tip particularl y to those fresh to the blogos phere. Short [...]
odziezprestige.pl about Moving Sideways
Wed, 01.06.2016 17:07
Please let me know if you're l ooking for a article writer fo r your site. [...]
Doug Burns about Moving Sideways
Tue, 10.05.2016 22:43
Oh, I won't give it that long unless I enjoy it ;-)

Bookmark

Open All | Close All

Syndicate This Blog

  • XML RSS 2.0 feed
  • ATOM/XML ATOM 1.0 feed
  • XML RSS 2.0 Comments
  • Feedburner Feed

Powered by

Serendipity PHP Weblog

Show tagged entries

xml 11g
xml ACE
xml adaptive thresholds
xml ASH
xml Audit Vault
xml AWR
xml Blogging
xml conferences
xml Cuddly Toys
xml Database Refresh
xml dbms_stats
xml Direct Path Reads
xml Fun
xml Grid Control
xml hotsos 2010
xml listener
xml locking
xml oow
xml oow2009
xml optimiser
xml OTN
xml Parallel
xml partitions
xml Patching
xml Swingbench
xml The Reality Gap
xml Time Matters
xml ukoug
xml ukoug2009
xml Unix/Shell
xml Useful Links

Disclaimer

For the avoidance of any doubt, all views expressed here are my own and not those of past or current employers, clients, friends, Oracle Corporation, my Mum or, indeed, Flatcat. If you want to sue someone, I suggest you pick on Tigger, but I hope you have a good lawyer. Frankly, I doubt any of the former agree with my views or would want to be associated with them in any way.

Design by Andreas Viklund | Conversion to s9y by Carl