How does one show the impact of long seek times on the total I/O response time? I took a Maxtor 200 GB FireWire hard drive (for my Linux RAC test installation) and used fdisk to repartition it with a 2 GB partition on the outer tracks and another 2 GB partition on the inner tracks.
Disk /dev/sda: 203.9 GB, 203927060480 bytes
255 heads, 63 sectors/track, 24792 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sda1 1 250 2008093+ 83 Linux
/dev/sda2 251 24542 195125490 83 Linux
/dev/sda3 24543 24792 2008125 83 Linux
Then I downloaded the Reader.c program from the website of Jonathan Lewis (I didn't want to write my own program). Each test ran two processes concurrently against the partitions. The partition size is important to overcome caching on the disk (and in the system buffer cache). I also chose raw partitions to avoid potential locking issues in the file system(s).
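For readers who want to see the shape of such a test, here is a minimal sketch of a random reader in Python. It is not the Reader.c program used above; the function name `random_reads` and the meaning of `boundary` (assumed here to be the number of read-size blocks that offsets are drawn from) are my own assumptions for illustration.

```python
import os
import random
import time


def random_reads(path, read_size, boundary, count, seed):
    """Issue `count` random reads of `read_size` bytes; return elapsed seconds.

    Assumption: `boundary` is the number of read_size blocks that offsets
    are drawn from, so every read starts in [0, boundary * read_size).
    """
    rng = random.Random(seed)           # fixed seed gives a repeatable offset sequence
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        for _ in range(count):
            block = rng.randrange(boundary)
            # Positioned read: no separate seek call is needed.
            os.pread(fd, read_size, block * read_size)
        return time.perf_counter() - start
    finally:
        os.close(fd)
```

A call such as `random_reads("/dev/sda1", 8192, 262144, 1000, 13)` would mirror the parameters shown in the output below. Note that this sketch reads through the operating system's page cache unless the device itself bypasses it, so repeated runs against a small area may be served from memory.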
Here are the results:
Test 1 (run 2 processes reading from partition 1, doing 1000 random reads each)
+ ./read /dev/sda1 R 8192 262144 1000 13
File name: /dev/sda1
Random or Serial: R
Read Size: 8192
Boundary: 262144
Read Count: 1000
Rand Seed: 13
Descriptor: 3
+ ./read /dev/sda1 R 8192 262144 1000 19
[root@linrac1 tmp]# File name: /dev/sda1
Random or Serial: R
Read Size: 8192
Boundary: 262144
Read Count: 1000
Rand Seed: 19
Descriptor: 3
real 0m11.323s
user 0m0.010s
sys 0m0.030s
real 0m11.536s
user 0m0.000s
sys 0m0.040s
Now we run the same test against partition 3 (inside of the disk)
+ ./read /dev/sda3 R 8192 262144 1000 23
+ ./read /dev/sda3 R 8192 262144 1000 29
File name: /dev/sda3
Random or Serial: R
Read Size: 8192
Boundary: 262144
Read Count: 1000
Rand Seed: 29
Descriptor: 3
[root@linrac1 tmp]# File name: /dev/sda3
Random or Serial: R
Read Size: 8192
Boundary: 262144
Read Count: 1000
Rand Seed: 23
Descriptor: 3
real 0m17.398s
user 0m0.000s
sys 0m0.010s
real 0m17.482s
user 0m0.000s
sys 0m0.090s
The last test (run 1 process against partition 1 and 1 process against partition 3)
+ ./read /dev/sda1 R 8192 262144 1000 33
File name: /dev/sda1
Random or Serial: R
Read Size: 8192
Boundary: 262144
Read Count: 1000
Rand Seed: 33
Descriptor: 3
+ ./read /dev/sda3 R 8192 262144 1000 39
[root@linrac1 tmp]# File name: /dev/sda3
Random or Serial: R
Read Size: 8192
Boundary: 262144
Read Count: 1000
Rand Seed: 39
Descriptor: 3
real 0m32.673s
user 0m0.010s
sys 0m0.090s
real 0m32.926s
user 0m0.000s
sys 0m0.070s
So to summarize:
Outside Partition: 11.4 seconds for 2000 8K reads
Inside Partition: 17.4 seconds for 2000 8K reads (about 53 percent slower)
Inside/Outside Partition: 32.7 seconds for 2000 8K reads (about 187 percent slower)
Average read Outside: 11.4 seconds / 2000 = 0.0057 seconds (5.7 ms)
Average read Inside: 17.4 seconds / 2000 = 0.0087 seconds (8.7 ms)
Average read Inside/Out: 32.7 seconds / 2000 = 0.0164 seconds (16.4 ms)
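The arithmetic behind these averages can be reproduced with a short Python sketch. The elapsed times are the rounded figures from the three tests; the helper names are mine and purely illustrative.

```python
READS = 2000  # two processes x 1000 reads each

# Rounded wall-clock times (seconds) from the three tests.
elapsed_seconds = {
    "outside": 11.4,
    "inside": 17.4,
    "inside/outside": 32.7,
}


def per_read_ms(total_seconds, reads=READS):
    """Average time per read in milliseconds."""
    return total_seconds / reads * 1000.0


def percent_slower(total_seconds, baseline_seconds):
    """How much longer a run took than the baseline, in percent."""
    return (total_seconds / baseline_seconds - 1.0) * 100.0


baseline = elapsed_seconds["outside"]
for name, secs in elapsed_seconds.items():
    ms = per_read_ms(secs)
    print(f"{name:15s} {ms:5.2f} ms/read  {1000.0 / ms:6.1f} reads/s  "
          f"{percent_slower(secs, baseline):6.1f}% slower than outside")
```

Expressed as throughput for the two processes combined, that is roughly 175 reads per second on the outside partition, 115 on the inside partition, and 61 when the head has to travel between the two.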
So the total number of I/Os that we can do really depends on where the data is located on the disk. According to Maxtor the average seek time for this disk is around 9.3 ms, which suggests that the full seek time will be around 16-18 ms. That seems to fit. Subtract the average outside time from the average inside/out time (16.4 - 5.7) and we get 10.7 milliseconds of extra time per read, which is roughly 60 to 65 percent of that estimated full seek time.
So even for the simple ATA drives, it is important where the data is stored.
Thursday, June 16, 2005
Sunday, June 12, 2005
Outer Edge of Disk
There is a thread on AskTom about where to place data on the disk. The question is why one should place data on the outside of the disk and whether that helps performance. I would just like to add my view to that discussion. The number of physical I/Os per disk is limited by the (full/average) seek time. The seek time is the biggest component of the I/O time (with a normal Oracle block size). The full seek time (seeking from the outermost track to the innermost track or vice versa) can be greatly reduced by only using the outermost tracks on a disk. Why the outermost tracks? Well, there is this statistic that the outermost 1/3 of the tracks holds around 50 percent of the disk capacity. If I only have to seek 1/3 of the distance to reach 50 percent of the data all the time, the average seek time will be greatly reduced. If the seek time is greatly reduced, the total I/O time is greatly reduced and we can do more random I/O operations per second. So to summarize:
1) Use around 50 percent of the disk capacity (outer 1/3 of the tracks)
2) That will increase the number of random I/Os greatly.
It doesn't matter if you use SAN, NAS or JBOD, it is the same principle. Well, with SAN and NAS you may not know where your data is located :-) But that is a completely different story.
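To put a rough number on the seek-distance argument, here is a small Python sketch. It assumes head positions are uniformly distributed over the tracks in use and measures distance as a fraction of a full stroke; that is a simplification of mine, not a model of any particular drive. Under that assumption the mean distance between two random positions is one third of the range in use.

```python
import random


def expected_seek_fraction(used_fraction):
    """Closed form: mean |a - b| for a, b uniform on [0, used_fraction]."""
    return used_fraction / 3.0


def mean_seek_fraction(used_fraction, samples=200_000, seed=1):
    """Monte Carlo estimate of the same quantity, as a cross-check."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        a = rng.uniform(0.0, used_fraction)
        b = rng.uniform(0.0, used_fraction)
        total += abs(a - b)
    return total / samples


# Whole disk in use versus only the outer third of the tracks.
for used in (1.0, 1.0 / 3.0):
    print(f"tracks used: {used:.2f}  "
          f"mean seek distance: {mean_seek_fraction(used):.3f} "
          f"(closed form {expected_seek_fraction(used):.3f}) of a full stroke")
```

Restricting the data to the outer third of the tracks cuts the mean seek distance from about 1/3 to about 1/9 of a full stroke. Seek time is not proportional to distance (the head has to accelerate, decelerate and settle), so the gain in time will be smaller than the gain in distance.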
Friday, June 10, 2005
Physical I/O
Last week I was involved in solving a performance problem for a customer that suffered from enormous performance slowdowns. The funny thing was that the customer had assigned a special task force to this problem, and they had been working on it since mid-December. That task force consisted of 22 people, and some were actually flown in to help solve the problem. A consultant was asked to take a fresh look at the problem by installing Veritas i3 for Oracle (which had been used successfully in another project). After installing the tool and getting the performance data, he called me after one day to confirm his findings (basically all the wait time was I/O related). While we spoke on the phone we checked the mount options for the file systems and guess what: the directio mount option was commented out. It was used on many other servers, but not on this particular one. After talking to the task force it wasn't clear why this had been done. So the decision was made to include the directio mount option again and at the same time change two init.ora parameters: optimizer_index_cost_adj and optimizer_index_caching. The database was bounced and a simple batch job was run to validate the performance improvement. That batch job previously ran at 50-70 calculations per hour, and after the changes it ran at 2500+ calculations per hour, an improvement by a factor of 35-50. This was great news of course; however, the next day one of the other databases on the system reported a severe performance problem. A quick check revealed that it was spending most of its time waiting on I/O related events. Further analysis showed a couple of things:
1) The Oracle buffer cache was really small and so the File System buffer cache was heavily used to compensate for that small Oracle buffer cache.
2) It also showed that a handful of statements were responsible for 80+ percent of the I/O workload.
3) In fact it was discovered that one statement did around 600K physical I/Os in just over 2 hours.
4) And there were a couple of other statements just like that.
So it was clear that a handful of poorly tuned statements were responsible for bringing down the performance of the whole machine (when the directio mount option was not used). The other team, working on the database with the large number of I/Os, blamed the directio change for causing the performance problem in their database (in principle they are right about this) and basically wanted the directio option removed. So the question becomes: should that be done? Or should the second team fix their SQL and application so that less physical I/O is done and everybody has good performance? My take is that the second team should fix their application, and that would be my recommendation, but clearly that team doesn't see it that way.
And a conflict is born ....
So who should win?
YAPP Ten years later: What has changed?
In 1995 I started to work on the wait events paper based on Oracle7.3, and that formed the basis for the YAPP white paper. That in turn formed the basis for the wave of response time tuning books and presentations. Now it is funny to see how the same thing is rehashed over and over again and nothing really new is being added. In fact I believe that all the attention on wait events and response time or resource tuning in the Oracle RDBMS is taking the focus away from performance problems that actually originate outside the database. The words "over exposure" come to mind. Does this mean that wait events are no longer important? Well, yes and no. Believe it or not, most databases suffer from the same performance problems. They differ in the symptoms that they show. For example, many databases suffer from I/O performance problems, and Oracle has quite a number of wait events that are directly and indirectly associated with I/O. So instead of approaching each of these events individually, they could be grouped together to just show the symptom. In fact, Oracle 10g has finally done this and has introduced wait event classes (not new: a company called Precise, now part of Veritas, did that first in their product back in 1998). So Oracle is still expanding the number of wait events, but at the same time it is grouping them together again.
Before response time tuning (and even today) people actually tuned based on best practices (BP). Each best practice has a ratio associated with it. For example, the Buffer Cache Hit Ratio basically belongs to the Best Practice that tells people to cache frequently used data (which is a good thing in principle). The problem with tuning today is that many Best Practices exist and they all have some kind of ratio associated with them. So if a performance problem occurs, DBAs start working through this list of Best Practices to check if something applies to their problem. There are a couple of problems with that:
1) Not everybody may have the same list of Best Practices or the same threshold for the ratios.
2) The list of Best Practices may not be sorted in the same way for different people.
The result is that the problem finding process takes a long time and is not really repeatable by different people (different lists, different ratios). So starting the problem finding process with Best Practices is like shooting a gun and hoping that you will hit something.
The Response Time tuning process basically tells you which Best Practice to use. For example, if the Response Time consists mostly of I/O related waits, we should start looking at the Best Practices for I/O. If CPU is the largest resource consumer, we should start looking at the Logical I/Os.
In this approach the different wait event groups play an important role, because each group is basically associated with one or more Best Practices. We still do care about what wait events are actually in the group, but for selecting the right Best Practice it doesn't matter. For solving the problem, it may be important.
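As an illustration of this grouping idea, here is a minimal Python sketch that rolls individual wait events up into groups and sorts the response time components, largest first. The event-to-class mapping covers only a handful of well-known events, the catch-all "Other" bucket is my own simplification, and the timings are made-up numbers, not measurements from any real system.

```python
from collections import defaultdict

# Wait class for a few well-known events (Oracle 10g wait classes).
WAIT_CLASS = {
    "db file sequential read": "User I/O",
    "db file scattered read": "User I/O",
    "log file parallel write": "System I/O",
    "log file sync": "Commit",
}


def response_time_profile(cpu_seconds, wait_seconds):
    """Group wait time by class, add CPU time, and sort largest first."""
    totals = defaultdict(float)
    totals["CPU"] = cpu_seconds
    for event, secs in wait_seconds.items():
        # Events missing from the mapping land in a catch-all bucket.
        totals[WAIT_CLASS.get(event, "Other")] += secs
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical timings in seconds, for illustration only.
sample_waits = {
    "db file sequential read": 400.0,
    "db file scattered read": 250.0,
    "log file sync": 60.0,
    "log file parallel write": 30.0,
}

profile = response_time_profile(120.0, sample_waits)
for component, secs in profile:
    print(f"{component:12s} {secs:7.1f} s")
```

With these numbers the User I/O group dominates, so the I/O Best Practices would be the place to start; which individual event is the largest only matters once you get to solving the problem.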
I believe that people should start thinking this way instead of worrying about the individual events. I am not saying that the individual events are not important, but keep an eye on the complete picture before diving into the individual events.
So one of these days (maybe this year) I should write an update to the YAPP paper, and hopefully it can be the basis for a wave in response time tuning. Oh yeah, I don't see myself as the inventor of all this. As far as I am concerned, Response Time Tuning has always been there; it was just not really accepted in the Oracle world as the way of tuning your Oracle database. So maybe I started that, but I just wrote the paper(s) and other people like Mogens Norgaard made it popular. It is kind of funny that I actually still use the same method(s) almost 10 years later (and it doesn't matter if they are on instance, session or SQL statement level, it is the same methodology, with different results but still solving the same problems; think about that one .....)

