Doug's Oracle Blog

  • Home
  • Papers
  • Books
  • C.V.
  • Fun
  • Oracle Blog
  • Personal Blog

Aug 20: Time Matters - DB Time

[In retrospect, the title of that first blog post might have suited the subject, but doesn't translate too well for subsequent related blog posts. That was a lack of planning or foresight on my part. These blog posts are tumbling out of my head in a fairly incoherent way. Maybe when they're finished, I could revisit them and write them up in a single, sensible article! In the meantime, I've retitled the previous post.]

The previous post might have given the impression that you shouldn't pay too much attention to the values in the "Time (S)" column of the Top 5 Timed Events section of a Statspack or AWR report*. That's not true, particularly when comparing system workload between two different periods. (I was just building up to this slowly ;-)) In fact, although the values in that column are not "DB Time" they are all components of it and, just as an increase in the values in that column indicate increased workload or decreased performance during the two different periods, so does an increase in an Instance's DB Time.

As Chen pointed out in her comment on the last blog, there's a direct relationship between the number of sessions that were running during the reporting period and the total time values. No matter how busy the system, the total time available can not exceed the maximum number of concurrent sessions * wall clock duration that they were running. That's a hard limit, but there are other relationships, too. If you think about it, as the number of active concurrent sessions increased from one to four so did the database instance's workload as more sessions were either working or waiting for something and, if the database is working on something then that implies the application is waiting for it to complete! In all likelihood on a small laptop with limited resources, the actual time spent waiting on individual events would increase as well, as contention for resources increases. Both an increase in the duration of events and an increase in the number of different sessions running or waiting on events will increase DB Time.

Here's a definition of DB Time that you'll see in various presentations from the Oracle guys who worked on this. (e.g. John Beresniewicz's excellent presentation "Average active sessions: the magic metric?" which is available on Kyle Hailey's website. I've never seen JB give the presentation, but I love the slides.)
Database time is total time spent by user processes either actively working or actively waiting in a database call

The same presentation describes the components of DB Time as
Time spent in the database by foreground sessions
Includes CPU time, IO time and wait time
Excludes idle wait time

In the context of a Statspack or AWR report, the top 5 timed events section is showing you the top 5 contributors to DB Time. (In fact, what I find most interesting about the latest versions of the Statspack report is the increased focus on time. For example, the first section of top SQL statements by resource consumption is ordered by DB CPU, followed by the SQL ordered by Elapsed Time, before we get anywhere near Logical or Physical Reads. Regardless of whether you're licensed for the Diagnostics Pack and have access to ASH, AWR and ADDM, you can still take advantage of some of the instrumentation improvements - Event Histograms, for example - and it's clear that Oracle is increasing its focus on time.)

Note that, although I'm choosing to write about aggregated system-wide values as a performance indicator because it's one convenient use of DB Time (particularly when comparing two different periods of time that might be seperated by months or when you have hundreds of systems to support or, most important, if you're writing a tool like ADDM.), I don't think there is anything inherently system-wide about DB Time. If you insist that the only valid performance analysis is that carried out at the session level then, make no mistake, DB Time is recorded for each individual session. It can be used as the input to response time tuning and is really just the Oracle Servers recording of the R in R=S+W. i.e. it's equally useful for Response Time tuning at the session level. In fact, I wish it had been called DB Response Time but I understand that even mentioning 'Response Time' might have led to the argument that the Database Server is only one component of the user's end-to-end response experience and that Response Time doesn't make sense as a term for data that's going to be aggregated for multiple sessions in some cases. (i.e. Can an entire instance have a aggregated Response Time?)

DB Time is the most important of the various Time Model Statistics, which break down the Service component of R = S + W into more detail. Here are the Time Model statistics from the Statspack report for the single-user test. (Don't forget that, as always, this is just reporting underlying statistics. In this case, these statistics are also exposed via v$sys_time_model and v$sess_time_model)

Time Model System Stats  DB/Inst: TEST1020/test1020  Snaps: 22-23
-> Ordered by % of DB time desc, Statistic name

Statistic                                       Time (s) % of DB time
----------------------------------- -------------------- ------------
sql execute elapsed time                           282.9         88.0
DB CPU                                              57.4         17.9
PL/SQL execution elapsed time                        7.8          2.4
parse time elapsed                                   6.2          1.9
hard parse elapsed time                              4.6          1.4
hard parse (sharing criteria) elaps                  2.3           .7
hard parse (bind mismatch) elapsed                   0.6           .2
PL/SQL compilation elapsed time                      0.4           .1
connection management call elapsed                   0.0           .0
repeated bind elapsed time                           0.0           .0
sequence load elapsed time                           0.0           .0
DB time                                     321.5
background elapsed time                            249.4
background cpu time                                  2.3

... and here are the Time Model stats from the Statspack report for the four-user test

Time Model System Stats  DB/Inst: TEST1020/test1020  Snaps: 24-25
-> Ordered by % of DB time desc, Statistic name

Statistic                                       Time (s) % of DB time
----------------------------------- -------------------- ------------
sql execute elapsed time                           779.5         64.8
DB CPU                                             107.7          9.0
PL/SQL execution elapsed time                       15.7          1.3
parse time elapsed                                   2.8           .2
hard parse elapsed time                              2.2           .2
hard parse (sharing criteria) elaps                  1.1           .1
hard parse (bind mismatch) elapsed                   1.1           .1
connection management call elapsed                   0.1           .0
repeated bind elapsed time                           0.0           .0
sequence load elapsed time                           0.0           .0
DB time                                   1,202.4
background elapsed time                            408.3
background cpu time                                  2.7

(For someone who asked me this question recently, DB CPU is just the value you would see in the CPU row of the existing top 5 timed events section although I understand that CPU measurement has been improved significantly in recent versions.)

In this case, there's a relationship between the number of people trying to do something and the total amount of DB Time consumed for a given wall clock duration. If we divide DB Time by the Wall Clock time, we're left with the average number of active sessions during the period. If I apply this first to the single-user test, using the values from the Statspack report :-
Elapsed Time (from report header) = 5.95 minutes = 357 seconds
DB Time (from Time Model Statistics) = 321 seconds
321/357 = 0.89 Average Active Sessions

and, for the four-session test :-
Elapsed Time = 5.5 minutes = 330 seconds
DB Time = 1202 seconds
1202/330 = 3.64 Average Active Sessions

Clearly, the instance was a lot busier during the four session test.

Why weren't the values exactly 1 and 4? Well, for starters, the Statspack reporting periods were slightly longer than the test run period as I executed snapshots manually in a sqlplus session. Therefore there was a period of time in both reports when the server was dead quiet. There is also the application overhead of Swingbench running the benchmarks and submitting the calls to Oracle. Oracle can only sensibly record the time when it's actually doing work on behalf of the application, not when the application is doing it's own thing, processing the results of the last database request. None of the database sessions in this test that were performing many small transactions was active for 100% of the time.

But that's only one part of the relationship. In this case, the instances workload has increased because more users are working at the same time, but there's more to performance problems than that. How about a different example? Imagine a single user is running a single query that takes 2 minutes to complete. The session has been connected for 20 minutes, most of which has been idle, waiting for the user to submit the query. If we look at the session-level information (using v$sess_time_model), we'd see something like this.
DB Time = 120 seconds

i.e. DB Time shows us 'the Oracle bit' that we might be able to tune. The goal of the DB Time Performance Method that Graham Wood presented at last year's UKOUG conference (amongst others) is to reduce the amount of DB Time taken to deliver the same results. So, how can we reduce DB Time here? By making the query run more quickly, whether it's through tuning it to do less work, or increasing the efficiency of that work by reducing bottlenecks. Regardless of *how* I improve the performance of the query, let's say I happen to make the query run in 50 seconds.
DB Time = 50 seconds

The end user's experience has improved.

There's a lot more I could say about DB Time. Like all of the best performance concepts or methods (e.g. YAPP, Method-R) it can seem so obvious as to not be worth saying, but contains an enormous amount of common sense and technical rigour. I suppose one important aspect of DB Time that I would highlight is that it's a common currency for ADDM. If you were to write an automatic performance diagnostic tool, what would be your goal? To reduce the time that any particular action takes, whether that's by eliminating a bottleneck at the server level, or in the instance configuration or a particular part of the application code. By combining ASH and AWR data using DB Time as the key measurement, ADDM can focus on the actions that will deliver the most significant response time reductions, whether they be more CPU or disk resource, tuning individual SQL statements or eliminating locking issues. Because DB Time can be aggregated at many levels (e.g. Session, Service, Instance) , it can be used in a number of different ways, depending on what data is available and what the goal of the optimisation exercise is.

DB Time - it's the future ;-)

In the meantime, here are the slides from a couple of other presentations that cover DB Time in much more depth than I have here. Until Graham Wood starts that blog of his (hint, hint) these are the next best thing.

http://www.oracle.com/technology/products/manageability/database/pdf/ow07/diag_techniques_presentation_ow07.pdf

http://doug.nl/downloads/OGH20080410_GRAHAM_WOOD.pdf

Updated later - this article from Chris Foot on dbazine.com looks at v$sys_time_model and v$sess_time_model in more detail.

* Of course, some people would say that you shouldn't pay too much attention to any values in Statspack reports, but they've worked so many times for me that I'll stick to my view that they're not worthless. In any case, you have a balancing contrary view.
Posted by Doug Burns Comments: (8) Trackbacks: (0)

Aug 17: My Openworld Schedule

Knowing myself, I doubt it will end up quite like this, but I thought I'd throw together an approximation. (I must say, the schedule builder this year is a significant improvement on previous years. It doesn't have a jet-lag adjustor, but you can't have everything.)

Sunday, Sep 21

10:00-11:30 Building an Oracle Real Application Clusters Environment on VMWare for Free (IOUG)
13:00-14:30 Under the Hood of Oracle Clusterware (IOUG)
16:00-17:30 Storage: A New Paradigm for Databases (IOUG)
19:00 - ?     Blogger Meet-up

Monday, Sep 22

11:30-12:30 Undocumented Oracle DBA Utilities: Black Magic for the Oracle Expert
13:00-14:00 Simple Tips to Quickly Find Answers to Your Oracle-Related Questions and Keep Your Oracle Skills Up to Date
14:30-15:30 Oracle Real Application Clusters SIG: Oracle Grid Expert Panel
16:00-17:00 Go Behind the Scenes of Oracle’s On Demand Data Centers and Global Operations
19:30-23:00 OTN Night

Tuesday, Sep 23

09:00-10:00 Inside Oracle Database 11g Optimizer: Removing the Mystery
11:30-12:30 Advanced Performance Diagnostics: What the GUI Doesn’t Tell You
13:00-14:00 Current Trends in Real-World Database Performance
15:00-16:00 ACE presentation at the Unconference
17:00-18:00 Top 10 Things You Wanted to Know About Automatic Storage Management but Were Afraid to Ask

Wednesday, Sep 24

09:00-10:00 Keep Auditors Happy for Both Oracle Database and SQL Server with Oracle Audit Vault
11:30-12:30 Time-Travel Through Your Data with Oracle Total Recall
12:00-13:30 OTN Lounge - ACE Office Hours
13:00-14:00 SQL Tuning Roundtable with the Experts
20:00-00:00 Appreciation Event*

Thursday, Sep 25

09:00-10:00 Advanced Oracle Troubleshooting: No Magic Is Needed--A Systematic Approach Will Do
10:30-11:30 Oracle ACE Director Birds-of-a-Feather Tips and Techniques Panel
12:00-13:00 Real-World Database Performance Techniques and Methods
13:30-14:30 Real-World Database Performance Roundtable

There are all sorts of reasons behind these choices. Some are about products I'm interested in, some are panel sessions (in which I often enjoy the questions as much as the answers) and some are because I know the presenters are good. You'll notice a distinct lack of Web 2.0 or 'Futures' presentations, though ;-) I'm an incorrigible dinosaur.

I'm also hoping to be a bit more involved personally this year. First, there are the Oracle ACE 'Office Hours' mentioned by Tim Hall in his blog. The ACE organisers sent out an email suggesting that it might be best to have pre-set times for ACEs to hang around the OTN lounge in case anyone actually wanted to talk to them ;-) (about Oracle stuff, presumably). Well, by the time I'd got there, the one slot that seemed to fit in with my plans was already occupied by various Oracle performance experts but I reckon I might learn something by hanging around anyway ;-) It'll be interesting to see how that goes.

Second, I'm thinking about presenting at the Unconference, but I'm not sure yet. There are a couple of existing presentations that might fit the mood and tone of an Unconference but, like last year, I might hang fire to see if new faces step forward first.

* Scoop! Elvis Costello - could be worse. UB40 could even be alright if they were the band I liked as a teenager, but I expect it'll be a greatest hits set :-(
Posted by Doug Burns Comments: (2) Trackbacks: (0)

Aug 14: Time Matters - An Infinite Capacity for Waiting*

When I was teaching the 10g Performance class at my last customer site, I remarked that the total number of seconds you see in the "Top 5 Timed Events" section of a Statspack or AWR report will often be greater than the total number of seconds between the snapshot intervals. For example, this is the "Top 5 Timed Events" section of a Statspack report covering two snapshots that are 5 minutes 30 seconds apart, either side of a 5 minute Sales Order Entry benchmark that I ran using Dominic Giles' Swingbench utility.

Top 5 Timed Events                                                    Avg %Total
~~~~~~~~~~~~~~~~~~                                                   wait   Call
Event                                            Waits    Time (s)   (ms)   Time
----------------------------------------- ------------ ----------- ------ ------
db file sequential read                          5,814         461     79   38.0
log file sync                                    4,246         421     99   34.7
log file parallel write                          3,222         109     34    9.0
CPU time                                                       108           8.9
db file parallel write                           1,718          60     35    4.9

(Note that this is from a 10.2.0.4 Statspack report, but you would see something similar in all AWR reports and Statspack reports that include CPU time in this section.)

Add up those seconds in the Time column and, even just for the top 5 events, the total is 1159 seconds or 19 minutes and 19 seconds. How can the system have spent 19 minutes and 19 seconds using CPU or waiting on various events when there were only 330 seconds in the reporting period?

The person who asked me to clarify this after the training has seen a few Statspack reports in his time and understood that there is more CPU time available in a given period than the Wall Clock period because servers often have multiple CPUs. But, in this case, I ran the test on my laptop which has a dual-core CPU. Therefore the maximum CPU time available between these snapshots was 660 seconds, or 11 minutes.

No, the real reason is that Statspack is reporting system-wide event timings for more than one session and then aggregating them. In this case, there were four user sessions running the SOE benchmark during the period. Given that each user and background session is instrumented and clocking up both service and wait time, aggregating the data for all of the sessions will record more time than has passed if you look at the clock on the wall.

This is such an intuitive idea to me that I hadn't thought to explain it properly during the course, so I set a mental reminder that I would blog about it soon. Having said that, this has been written about by others on many occasions (*not least by Cary Millsap and Jeff Holt in Optimizing Oracle Performance. In fact, I stole the title of this blog post from their book - see page 215 for more information.) Therefore, if this post makes complete sense to you so far or you've read any of those sources before, continued reading will probably be a little boring. If not, this post is especially for you.

So let's revisit the tests. I ran two 5 minute tests using Swingbench's supplied Sales Order Entry benchmark, but I could have used anything for this. The important thing is that the first test consisted of one session running the SOE application, while four sessions were running during the second test.

Here is the top 5 timed events section of the Statspack report for the single session test.

Top 5 Timed Events                                                    Avg %Total
~~~~~~~~~~~~~~~~~~                                                   wait   Call
Event                                            Waits    Time (s)   (ms)   Time
----------------------------------------- ------------ ----------- ------ ------
db file sequential read                          4,818         175     36   46.3
CPU time                                                        57          15.2
db file parallel write                           1,085          42     39   11.2
log file sync                                      821          38     46   10.0
log file parallel write                          2,702          28     10    7.5

... and from the Statspack report for the four session test.

Top 5 Timed Events                                                    Avg %Total
~~~~~~~~~~~~~~~~~~                                                   wait   Call
Event                                            Waits    Time (s)   (ms)   Time
----------------------------------------- ------------ ----------- ------ ------
db file sequential read                          5,814         461     79   38.0
log file sync                                    4,246         421     99   34.7
log file parallel write                          3,222         109     34    9.0
CPU time                                                       108           8.9
db file parallel write                           1,718          60     35    4.9

So the two tests were running the same application, on the same hardware and for the same duration. Yet the values in the Time (s) column are much bigger when more sessions are running. The total for just the top 5 timed events is 340 seconds compared to 1159 seconds. (The average wait times have also increased in some cases, because the system is busier, and different events become more significant as concurrent sessions contend with each other. There are various things we could try to deduce from this section, particularly as this was a controlled test, but we'd need more supporting information. Let's just stick to the apparently increased time.)

The problem here, if you see it as a problem, is that aggregated data is being reported. Statspack is not the only example of this, though. Tracing Parallel Execution tasks and then aggregating the trace files using trcsess into one consolidated trace file will also result in variable timings, depending on the Degree of Parallelism. I talked about this in slides 33 to 37 of an earlier presentation. In fact, I can make the Time (s) values as big as I want to, just by running the SOE benchmark with more and more sessions.

That's why the % Total Call Time is a far more useful column. It tells you the system-wide percentage contribution of each event to any performance problem over the given period so that you can focus on the most significant contributors. It should go without saying that looking at session-level statistics is more specific and therefore more reliable but that's not always possible unless you're using ASH, Kyle Hailey's Simulated ASH, some other session history recorder or can recreate the problem session on demand, but that's another, bigger argument that I won't go into here.

So if wall clock time for those tests was about 300 seconds, then what is that bigger value being recorded in the Time (s) column? What I'm really talking about is DB Time, but I'll leave that for the next post.
Posted by Doug Burns Comments: (4) Trackback: (1)

Aug 13: Book Review - Oracle ASM

[Disclaimer - this is a review of a free copy I received from Oracle Press in the UK]

Well, it turned out that I had requested the ASM book (or "Oracle Automatic Storage Management: Under-the-Hood & Practical Deployment Guide", to give it it's full title) as one to review. It turned up a couple of months ago and so I tucked it away until I had some free time. The fact that I managed to read some of it on a beach in Spain is probably a good sign ;-) However, I should also point out that most of my beach reading was a 500+ page Pete Townshend biography, so I'm not a complete Oracle nut-case!

I enjoyed the book a lot more than I'm used to enjoying Oracle Press offerings, so maybe I shouldn't judge a book by the publisher. However, as ASM is still something of a rarity in the UK market, I don't have as much experience with it as others do, so I suggest you read Jason Arneil's review here, including the follow-up comments. Personally, I thought the technical content was high, other than the odd chapter that felt a little light and packed with screenshots, as Jason highlighted. I would say about 2/3 of the chapters were more than technical enough for me, a sign that some of the ASM development team were involved. but others didn't seem to have the same depth; Chapter 9 for example. Maybe it's because of the multiple authors?

I particularly liked the occasional O/S-specific hints (which are useful if you move between different operating systems as part of your work), the introduction to storage technology (which I can see being useful to new DBAs) and the technical details in the latter chapters that Jason highlighted, including the destruction of various ASM myths. Chapter 12 also raises, discusses and answers some of the organisational obstructions to implementing ASM in the type of large companies I usually work for. I'm convinced these are the main reason for the lack of ASM take-up I've witnessed, so maybe I'll carry a copy of the book around to future sites, in case it helps with those arguments.

All-in-all, a worthwhile book from my perspective and one that I'll actually be keeping and can see myself referring back to occasionally. I'd recommend it, particularly if you have a passing interest in ASM and want to know more about how it works.

P.S. I also liked the Hitchiker's Guide and Spinal Tap references ;-)
Posted by Doug Burns Comment: (1) Trackback: (1)

Aug 11: Advert - More Seminar Dates

As previewed by Kurt on his blog, I'll be giving the 10g Performance seminar for Oracle Belgium on the 16th and 17th October (Oh, and thanks for the kind words, Kurt). The reason I didn't mention it before now was because I needed to update the agenda to reflect the course a little better. The previous version was cut short but it's been fixed now with Ramona's help. Let's face it, it's not the greatest sales pitch, but will hopefully give people some idea of the topics covered. (Actually, it looks like the link in Kurt's blog is broken as a result.)

I discovered the updated agenda had been posted because a little Belgian bird told me that he'd spotted a Google Ads link for the course. If you go to a Google Belgium search page for Doug Burns, there it is. I hasten to add I'm not in the habit of searching for myself on Google Belgium ;-)

I'll also be presenting the same seminar in Austria on 15th and 16th December and the registration page for that one should be available shortly. (Updated later - here is the link for the Austrian dates)

But, for now, I'm going to use some much needed time at home to implement some improvement ideas I had after the Prague seminar. There might be a technical blog or two after that, because I have a few of those outstanding too.
Posted by Doug Burns Comments: (0) Trackbacks: (0)
« previous page   (Page 1 of 127, totaling 635 entries)   next page »

Comments

Jeroen about Time Matters - DB Time
Sat, 23.08.2008 01:42
Thanks Doug and JB for your explanation. This is a really interesting topic and [...]


JB about Time Matters - DB Time
Fri, 22.08.2008 18:44
The Time Model stats are notoriously difficult to understand because there are [...]


Doug Burns about Time Matters - DB Time
Fri, 22.08.2008 07:16
Looking at what the various Time Model statistics mean and how they relate to [...]


Links in Comments

It's a minor source of frustration to me that you can't just paste a Hypertext link into the comments form here but, should you ever want to include a link, all you need to do is use the BBCode format, as mentioned below the comment form.

Here is a link to the relevant part of the document that explains how.

Bookmark

Open All | Close All

Syndicate This Blog

XML RSS 2.0 feed
ATOM/XML ATOM 1.0 feed
XML RSS 2.0 Comments
Feedburner Feed

Powered by

Serendipity PHP Weblog

Design by Andreas Viklund | Conversion to s9y by Carl