Dec 24: The Reality Gap (4) - It's never the SAN
It's never the SAN.
Here's a true story. Rest assured that I've watched the same story develop several times, so there's no possibility you can identify an individual client. In fact, the thing you're most likely to recognise is a similar experience of your own.

- The business has problems with the overnight batch schedule on a significant system. The problem is that the execution times from night to night are unacceptably variable, for similar data volumes and identical code. The batch window is already tight, so this leads to the system being unavailable during critical hours. The business is upset, particularly as they are fined if they don't deliver certain transaction messages within set deadlines.
- The first port of call is the DBAs because everyone assumes there's a problem with the database. Despite the usual grumbles at this point, I have some sympathy with that perspective because it's a key component of the overall system that's both critical and quite opaque outside the DBA team. Besides, we have the best instrumentation so are likely to be able to help.
- We look at Statspack or AWR reports (this is a complex batch with hundreds of jobs in multiple job streams - a little difficult to trace) and notice that the average single block read time varies from night to night, and that there is a close correlation between the nights when i/o performance is poor and the nights when the batch jobs take longer to run (see the first sketch after this list). The vast majority of wait events are related to i/o. At this stage, we ask the O/S or Storage guys to take a look at our numbers and investigate i/o performance. They do so and come back with "no, everything looks fine to us". Which is a little tricky to deal with when we've demonstrated single block read times over 30ms! Still, what can you do? You ask for expert help and the experts tell you everything is ok.
- The next step might be to trace a few of the batch jobs to show the individual wait times (see the tracing sketch after this list). These look even worse - some are as high as 90ms! We start to look at filesystem configuration options, reducing the i/o workload and several other bright ideas as we desperately cast around for a solution.
- This part of the story lasts for as long as the DBAs and Storage guys want to make it last. At some point, however, the business has had enough of this nonsense and calls in the storage vendor.
- This part saddens me the most. I haven't met every employee of every storage vendor and so I'm sure there are bright guys there (maybe I've been unlucky) but, invariably, there is 'no problem' with the storage. Again. There never is. More to the point, they join in the general finger-pointing in the direction of the DBAs, start asking for initialisation parameter values (guaranteed to drive me daft, is that one, because it's just a knee-jerk reaction) and explain to the business that the back-end infrastructure is working well. Of course, the last thing the vendor is going to do is to explain that the system that they helped the business to specify (typically on cost per Megabyte, rather than bandwidth) is under-specified. Oh, and if you've spent three million quid on something nice, big and shiny, the last thing you want to hear about are its limitations.
- Now at this point of the story, you might start postulating theories which 'prove' it might not be the SAN and I've had a few extended conversations on this subject with sys admins (Hi, Mike), but here's the clincher. A true clincher.
- Coincidentally, new SAN infrastructure is due for deployment. When the database is moved to the new infrastructure, single block read times are reduced to single figures and are consistent. The performance problems are completely solved. "Ah", say the Storage guys, "that's just because not all of the databases have been moved yet. When they are, you should expect to see similar performance levels as before."
- All of the databases are eventually moved and the new improved performance levels are maintained.
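For the record, the sort of evidence I'm talking about isn't difficult to gather. Here's a minimal sketch (assuming a single instance and that AWR is licensed and populated - the specifics are illustrative, not a recommendation) of how you might track the average single block read time from snapshot to snapshot, so you can line the bad i/o nights up against the slow batch runs.

```sql
-- Minimal sketch: average 'db file sequential read' latency per AWR
-- snapshot, calculated from the deltas of the cumulative counters in
-- DBA_HIST_SYSTEM_EVENT. Assumes a single instance; rows spanning an
-- instance restart will look odd or negative - ignore them.
SELECT s.snap_id,
       TO_CHAR(s.end_interval_time, 'DD-MON-YYYY HH24:MI') AS snap_end,
       ROUND((e.time_waited_micro
                - LAG(e.time_waited_micro) OVER (ORDER BY s.snap_id))
             / 1000
             / NULLIF(e.total_waits
                - LAG(e.total_waits) OVER (ORDER BY s.snap_id), 0),
             1) AS avg_single_block_read_ms
FROM   dba_hist_system_event e
       JOIN dba_hist_snapshot s
         ON  s.snap_id         = e.snap_id
         AND s.dbid            = e.dbid
         AND s.instance_number = e.instance_number
WHERE  e.event_name = 'db file sequential read'
ORDER  BY s.snap_id;
```

And when you need the individual wait times for a specific job, an extended SQL trace with wait events shows each read, one line at a time. This sketch uses the 10g DBMS_MONITOR interface; the SID and serial# are made-up values for illustration.

```sql
-- Hedged sketch: enable a 10046-style trace, including wait events,
-- for one batch session (the SID/serial# values are hypothetical).
BEGIN
  DBMS_MONITOR.session_trace_enable(
    session_id => 123,
    serial_num => 4567,
    waits      => TRUE,    -- record individual wait events and timings
    binds      => FALSE);
END;
/

-- ... let the job run, then switch the trace off again.
BEGIN
  DBMS_MONITOR.session_trace_disable(
    session_id => 123,
    serial_num => 4567);
END;
/
```

The 'db file sequential read' lines in the resulting trace file (or a tkprof summary) give you the individual read times - which is where those 90ms horrors showed up.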
My cuddly toy mate, Flatcat, could analyse this situation and see it for what it was (and he's not exactly a computer expert).
It *was* the SAN and I don't enjoy wasting 9 months of my life debating it with people who I'm looking to for expertise, not hand-waving. If this was a one-off story, I might not mind, but I sometimes feel like I'm going round in circles.
The last part will discuss the problems caused by The Reality Gap.
P.S. Despite the tone of this post (which has been kicking around my brain cells for a while) I'm looking forward to Christmas more than a middle-aged man should. So, I'm not feeling grumpy and you'll forgive me if I log off now and pick up any comments later. Then again, I'm not expecting too many comments for a few days ....
#1 - Yas said:
2007-12-24 19:31 - (Reply)
In fact, the thing you're most likely to recognise is a similar experience of your own
This is one of the most common responses I get from the storage guys. It is never the SAN; all the numbers look fantastic at the SAN level.
#2 - Chen Shapira said:
2007-12-24 21:39 - (Reply)
Ah, I've been there as well. Familiar indeed.
I've actually been in an even nicer situation, where an entire system had to be moved to cheaper, slower storage temporarily as part of a data center move. Everyone was really surprised to see that the cheap, slow system significantly outperformed the fast one, resolving one of our most persistent performance issues. Exactly the same one that the vendor guaranteed was 100% not related to storage.
We've also seen similar issues with the network. RAC has significant performance issues? The interconnect is just fine, the network admin promises. Except that a day later we discover that it is a 100Mbit network, or that storage traffic is also using the interconnect interface.
I think that SAN is truly too complicated for anyone to properly debug.
#3 - Ben 2007-12-25 09:18 - (Reply)
Applies to NAS as well. That's why I have non-Oracle scripts which do reads/writes on a regular basis to establish a baseline. If a go-slow happens (and they do) and they say it's Oracle, I flash up the timings from the non-Oracle scripts and say "No, it's not!".
#4 - Frits Hoogland said:
2007-12-25 11:18 - (Reply)
This sounds so familiar.
If this sends shivers down your spine, read my comment on Alex's story about storage QoS.
#5 - Markus Perdrizat said:
2007-12-25 11:31 - (Reply)
The situation is well known across the industry, this obviously also happens at our site. Now I'd like to hear suggestions on how the situation can improve.
Do we have to learn to talk the SAN talk?
Do SAN admins have to get smarter?
Do DBAs have to get smarter?
Does Oracle or do the DBAs have to provide better scripts/tools?
Do the SAN vendors have to provide better tools?
#6 - Pete Scott said:
2007-12-25 16:34 - (Reply)
One of the key problems with SANs is the way they have been sold as data buckets to business people; a "one size fits all" way to store your emails, word-processing, SAP OLTP, Data Warehouse and even serve up your website; often RAID5 configured for economy (fewer disks) and with a shed load of cache to speed things along. The problem with that is that there is no single ideal access path to get at all of the data, a point I emphasise in my data warehouse design course. OLTP users dart all over the place to get a few rows from here or there; document storage users often only deal with one or two smallish documents at a time (a few megabytes to read and then relatively little action until the next document is accessed); but data warehouse users read masses of data then write it back to disk to sort it (not all DW sorts can happen in the PGA), which more likely than not floods the SAN cache.
Understanding how the data is accessed is essential in designing a mixed workload SAN - knowing where it is more important to have volume of storage and where it more important to have IO resource available.
#7 - Daniel Fink said:
2007-12-25 17:57 - (Reply)
I think this problem is pretty common; what is uncommon is the DBA team (including management) who are willing to not take the "There is nothing wrong with ..." response from other teams at face value. I have seen single block read event times in excess of 5 seconds on a regular basis. Averages in excess of 50 milliseconds. And the response: "We see no problems with the SAN."
It is not always just the SAN; it is the whole infrastructure stack from the instance/database to the actual disk. An architecture that uses a single controller for all servers to access all SANs is not exactly high performance, but it does meet the "it's simple, so it's best" expectations of management. No consideration of performance... until the users start screaming. And then it is almost too late.
#8 - John Flack said:
2007-12-26 13:16 - (Reply)
Similar problem - client/server system, performance great at headquarters where the server was accessed via the LAN. When the system was installed in the regional offices - connected to headquarters on a WAN, performance was horrible. It's the WAN, I said. Took us months to PROVE that it wasn't the database or the application - it was the WAN.
#9 - Michael Hagmann said:
2007-12-26 14:22 - (Reply)
That's exactly what I hear every time: the storage has absolutely no problem, all green. There must be a problem with the server (I'm from the server team) or with Oracle.
What I also hear often is: "If we used the PowerPath driver, then everything would be better". The problem is, we run SAP on top of Oracle and SAP doesn't allow us to use any non-open driver (with Linux) like PowerPath; we can only use device-mapper multipath.
#10 - Doug Burns said:
2007-12-27 17:48 - (Reply)
Then again, I'm not expecting too many comments for a few days ....
Well, that shows how much I know!
Although I confess to having a quick glance at the comments on my phone during the occasional quiet moment on Christmas Day, I was determined not to login and respond. So here's a batch update.
Chen said ....
Everyone was really surprised to see that the cheap, slow system outperformed the fast one significantly
That's an interesting and common mistake - equating high cost with high quality. Never mind that the high cost item may be configured incorrectly.
Ben said ....
That's why I have non-Oracle scripts which do reads/writes on a regular basis to establish a baseline.
Yes, it's a terrific idea to strip away as many potential culprits as you can so that they can't be used as an excuse. Then again, you need to do it just right, or people will always find *something* else to blame, even to the extent of discounting the blindingly obvious possibilities.
Frits said ...
If this sends shivers down your spine, read my comment on Alex's story about storage QoS: http://www.pythian.com/blogs/759/where-is-storage-qos
Ah, yes, I meant to include a link to Alex's blog, so thanks for that (I've turned it into a link). It's a good job I mentioned this subject to several people a few weeks ago or people might think I was "inspired" by him! (I am, but only by his extraordinary physical constitution!)
John said ...
Took us months to PROVE that it wasn't the database or the application - it was the WAN.
I know what you mean and sometimes it's a combination of the WAN and the application.
Pete said ....
Understanding how the data is accessed is essential in designing a mixed workload SAN
Even an attempt at 'designing' a SAN (and doing our best to maintain the integrity of that design) would be a big step forward. In most cases, I think we're talking about Pete's "data buckets" - a "one size fits all". In the end, I don't think that SAN technology is anything to do with performance for most sites - it's just a way of simplifying storage provisioning and making it more flexible. Rapidly expanding space requirements and numbers of systems probably necessitate this approach. (Actually, I might ask a more qualified friend to comment on this ...)
The cache can then resolve any and all associated performance problems!
Dan said ...
what is uncommon is the DBA team (including management) who are willing to not take the "There is nothing wrong with ..." response at face value
I always find that it's best to enlist the business to help put together a multi-disciplined team to focus on the problem. Without business involvement, it just becomes a game of Techie Tennis.
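Incidentally, averages like Dan's hide the worst of it. Here's a minimal sketch (10g onwards) of pulling the full latency distribution, which shows the multi-second outliers in black and white:

```sql
-- Distribution of single block read times since instance startup.
-- WAIT_TIME_MILLI is the upper bound of each histogram bucket, so any
-- populated rows with wait_time_milli >= 1024 are the multi-second
-- reads Dan mentions, however healthy the average looks.
SELECT wait_time_milli, wait_count
FROM   v$event_histogram
WHERE  event = 'db file sequential read'
ORDER  BY wait_time_milli;
```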
... and, saving the best until last, because this comment recognises the need for improvements, rather than just ranting :-
Markus said ...
Do we have to learn to talk the SAN talk?
Yes, to a certain extent. Because only by understanding a subject reasonably well can we hope to discuss it sensibly. However, I don't expect to become an expert, just as I don't expect storage guys to become Oracle experts.
Do SAN admins have to get smarter?
Yes, but it depends on the SAN admin. Like most roles, it will be filled by some who are more competent than others and the DBAs I've met certainly have no special claim to be universally smart!
Do DBAs have to get smarter?
I refer you to my previous answer.
Does Oracle or do the DBAs have to provide better scripts/tools?
I think Oracle already recognise that they do, which is one of the reasons why they've started to produce storage plug-ins for OEM and i/o calibration in 11g. (At this stage, I don't have enough practical experience to say how well they work, mind you.) Having said that, the wait interface is not a bad starting point for recognising the type of obvious but common problems that I'm talking about here. In other words, we already have tools that can show us i/o response times, although we need to be careful: there are multiple components wrapped up in the event timings, and the timings themselves tell us virtually nothing about the cause of the problems.
There's an important issue here, though, in that we need better visibility of the entire I/O stack, because otherwise we have to rely on the interpretations of those who have their own axe to grind.
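For what it's worth, here's roughly what the 11g i/o calibration looks like - a hedged sketch only, given my limited practical experience with it so far. It needs asynchronous i/o enabled and the database open, and the disk count below is a made-up example, not a recommendation.

```sql
-- Hedged sketch of DBMS_RESOURCE_MANAGER.CALIBRATE_IO (11g).
-- The num_physical_disks value is hypothetical; use your array's figure.
SET SERVEROUTPUT ON
DECLARE
  l_max_iops PLS_INTEGER;
  l_max_mbps PLS_INTEGER;
  l_latency  PLS_INTEGER;
BEGIN
  DBMS_RESOURCE_MANAGER.calibrate_io(
    num_physical_disks => 16,   -- assumption: spindles behind this database
    max_latency        => 20,   -- acceptable single block read latency (ms)
    max_iops           => l_max_iops,
    max_mbps           => l_max_mbps,
    actual_latency     => l_latency);
  DBMS_OUTPUT.put_line('Max IOPS: '            || l_max_iops);
  DBMS_OUTPUT.put_line('Max MB/s: '            || l_max_mbps);
  DBMS_OUTPUT.put_line('Actual latency (ms): ' || l_latency);
END;
/
```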
Do the SAN vendors have to provide better tools?
I'm in the unfortunate position of rarely having access to the tools. All I can do is make my request for someone to investigate and then they use the tools to do so.
However, I think my main request of SAN vendors and Storage Admins is to change the way the technology is represented to the business. It's a pretty hard sell to say something might cost more, yet deliver worse absolute performance, but they could focus on the flexibility and capacity as acceptable trade-offs. They could also emphasise the performance risks of not spending more on fibre, HBAs, additional spindles, rather than specifying down to the lowest cost per gig in order to win the business! Yeah, I know, I've moved to cloud cuckoo land now ...
#11 - Mike 2007-12-27 19:20 - (Reply)
> "I don't think that SAN technology is anything to do with performance for most sites - it's just a way of simplifying storage provisioning and making it more flexible"
*Bing* - Mr Burns wins a prize (ahhh - you _have_ been listening!). The data explosion over the last few years has made conventional storage intolerably difficult for even a medium-sized company to manage efficiently.
Also - don't forget the requirements of storage availability... we've all seen pretty major SAN cockups, but a well-managed SAN infrastructure should allow unprecedented levels of availability when compared with traditional storage systems. This level of availability is sometimes mandated by regulatory requirements - sometimes it's driven by pure financials. Either way, there are many good reasons for SAN-ing your storage.
Note this is a pretty broad definition of "SAN". The traditional view of a SAN is one or more chuffing great storage arrays, hooked up to FC switches and fanned out to hosts as appropriate... but where is it going to end? Data warehouse hosts already generally have their own SAN, and quite often are direct-connected, omitting the requirement for an FC switch.
So what's different between this and the traditional SAN model? - no, not the fact that we're using fibres to connect, or even the fact that there may be switches in between the storage and the host (although that may be a factor).
It's the black-box model, and the siloed support arrangements. DBAs speak to Unix Admins, who speak to SAN admins, who speak to the hardware vendor - and each of them speaks a different language. How is the DBA's message supposed to survive that chain intact?
Once upon a time there was just the DBA and SysAdmin, and although we had difficulty understanding the grunts of the DBAs, with enough patience you could get your message through. Now you've got multiple levels of support to get your message through, and it's not easy.
> "we need better visibility of the entire I/O stack" & "Do the SAN vendors have to provide better tools?"
You see your I/O request disappearing from Oracle, and you can't trace it through to the spindle like we used to be able to; again, the black-box implementation makes true end-to-end tracing difficult. The I/O disappears into the HBA driver, and we don't see it again until it has traversed the SAN and the result is known.
The tools are still catching up. They've got a lot to deal with too... again, back to the data explosion - the arrays have hundreds of spindles installed, and tracking or correlating any particular I/O from a host would be a difficult task. Not impossible... just difficult... for now.