Saturday, September 30, 2006

This is a picture taken by the Hubble telescope before it broke down. NASA is desperately trying to repair it. But this last picture shows the surface of URANUS. It has great detail. More detail will follow in a later blog.

Wednesday, September 13, 2006

OUG Scotland (making money while flying)

On Sunday, September 10th, I flew to Scotland to present at the Oracle User Group in Glasgow. I was aware of the new rules for carry-on luggage (well, I thought I was) and handed my carry-on bag over at the check-in counter, just double-checking whether the rules were still very strict. Turns out I was allowed to bring the bag on board; however, it was too heavy, and that is why it had to be checked in. Fine, I was not in a hurry, so no problem. Oh, I flew Transavia and paid 2 euro (excluding tax) for this round-trip ticket.

Got on board with 50 other passengers (the plane was a 737-700, so many empty seats). Got to Glasgow Prestwick International Airport, walked to the terminal building to get my luggage, and of course it didn't show up. Filed a missing luggage report (that was a big challenge due to the noise in the building and the accent of the lady asking the questions). Turns out there was another passenger missing his bag; that made me feel good :)

Turns out he was going to Glasgow also and helped me to find the train (no ticket needed) and when we arrived at the Central Train station in Glasgow he helped me to find the hotel. It was a beautiful day and many people were out shopping.

Checked in to the hotel and again had trouble understanding the girl at check-in, got to the room and of course got on to the internet.

Monday arrives and I write my presentation in the morning, as I have my presentation at 14:30. Decide to buy a clean shirt and make my way to the venue (the Radisson hotel next to Central Station). But I can't find it. Get phone calls from the organisers, who are worried (I am not) that I may miss the presentation slot. So I ask them to tell me the name of the road that the hotel is on, and they can't. For the next 10 minutes I wander around Glasgow and then Doug Burns finally gives me permission to land at the hotel. Put on the new shirt, did the presentation and then there was beer.

Oh, and the luggage didn't show up until Tuesday afternoon (after my second presentation). Turns out that Transavia pays 50 euro per day for missing luggage. That is at least 100 euro. Not bad if the ticket cost 2 euro. Minus the price of the shirt, which I needed anyway :-)

Thursday, August 24, 2006

Miracle Director Training Program

It has been a hectic week. My wife got a job and is now working 32 hours a week (that took less than 2 weeks), and on her first day I was in Denmark with my good friend Mogens Norgaard from Miracle Denmark. He said to me that I should come and be part of the Miracle Director Training Program. Last week it was Thomas Presslie from Miracle Scotland who got the same treatment.

The training program is pretty tough, and Mogens took a picture of me this morning after I had slept a couple of hours on the couch in the office. Of course one has to wake up when the first people get into the office: "Good Morning Anjo!" So I roll off the couch, go to the yellow house, fire up the laptop at the big Oaktable (yes, that famous table) and start doing email.

So we are making plans for Miracle Benelux and we are planning some Oracle and SQL Server events in November and December. We will present some of the big names of the Oracle and SQL Server world. So keep your calendar and budget available for these events :)

More info on these events later .......

(I got some problems posting the picture, will try to figure it out later)

Monday, August 21, 2006

Back again

It took a while for a new post to show up on this blog, but I am back. The reason is that I quit my comfy job at Symantec and will start up Miracle Benelux together with Mogens Norgaard. So now I can post whatever I want without having some corporate lawyer looking over my shoulder.

So why quit? I tried to make a difference for close to 5 years, and after the 3rd reorg in the last 3 years I realized that I wouldn't be able to make the product and the difference that I wanted. I am used to change, and change is good if progress is made towards your goal. But I felt that all the changes were actually causing us to lose focus on what needed to be done. And working for a large company on a product that has less than 2 percent revenue impact ............

So is the new thing going to be easy? Of course not, but I will have fun trying!

Will keep you posted on the progress!

Thursday, June 16, 2005

Running a simple test

How does one show the impact of long seek times on the total I/O response time? I took a Maxtor 200GB FireWire hard drive (for my Linux RAC test installation) and reconfigured it with fdisk to have 2 GB partitions on the outer tracks and on the inner tracks.

Disk /dev/sda: 203.9 GB, 203927060480 bytes
255 heads, 63 sectors/track, 24792 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1         250     2008093+  83  Linux
/dev/sda2             251       24542   195125490   83  Linux
/dev/sda3           24543       24792     2008125   83  Linux


Then I downloaded the Reader.c program from the website of Jonathan Lewis (I didn't want to write my own program). Each test was done by running two processes at the same time against the partitions. The partition size is important to overcome caching issues on the disk (and in the system buffer cache). I also chose raw partitions to overcome potential locking issues in the file system(s).
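For readers who do not have Reader.c at hand, here is a rough Python sketch of the kind of random reader used in the tests. It is not Jonathan Lewis's program. The function name `random_reads` is made up, and I am assuming that "Boundary" means the number of read-sized blocks the random offsets are spread over (8192 * 262144 bytes is 2 GB, which matches the partition size).

```python
import os
import random
import time


def random_reads(path, read_size=8192, boundary=262144, count=1000, seed=13):
    """Issue `count` random reads of `read_size` bytes; return elapsed seconds.

    Offsets are block aligned and fall within the first `boundary` blocks.
    That meaning of "boundary" is an assumption, not taken from Reader.c.
    """
    rng = random.Random(seed)          # fixed seed gives a repeatable offset sequence
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        for _ in range(count):
            block = rng.randrange(boundary)            # pick a random block number
            os.lseek(fd, block * read_size, os.SEEK_SET)
            os.read(fd, read_size)                     # read one block at that offset
        return time.perf_counter() - start
    finally:
        os.close(fd)


# Example call (needs read access to the device, so it is left commented out):
# print(random_reads("/dev/sda1", 8192, 262144, 1000, 13))
```

Note that this sketch opens the device without any direct I/O flags, so on a second run the operating system cache may serve some of the reads. The timings would then say more about memory than about seek time.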

Here are the results:

Test 1 (run 2 processes reading from partition 1, doing 1000 random reads each)

+ ./read /dev/sda1 R 8192 262144 1000 13
File name: /dev/sda1
Random or Serial: R
Read Size: 8192
Boundary: 262144
Read Count: 1000
Rand Seed: 13
Descriptor: 3
+ ./read /dev/sda1 R 8192 262144 1000 19
[root@linrac1 tmp]# File name: /dev/sda1
Random or Serial: R
Read Size: 8192
Boundary: 262144
Read Count: 1000
Rand Seed: 19
Descriptor: 3

real 0m11.323s
user 0m0.010s
sys 0m0.030s

real 0m11.536s
user 0m0.000s
sys 0m0.040s

Now we run the same test against partition 3 (inside of the disk)
+ ./read /dev/sda3 R 8192 262144 1000 23
+ ./read /dev/sda3 R 8192 262144 1000 29
File name: /dev/sda3
Random or Serial: R
Read Size: 8192
Boundary: 262144
Read Count: 1000
Rand Seed: 29
Descriptor: 3
[root@linrac1 tmp]# File name: /dev/sda3
Random or Serial: R
Read Size: 8192
Boundary: 262144
Read Count: 1000
Rand Seed: 23
Descriptor: 3

real 0m17.398s
user 0m0.000s
sys 0m0.010s

real 0m17.482s
user 0m0.000s
sys 0m0.090s

The last test (run 1 process against partition 1 and 1 process against partition 3)
+ ./read /dev/sda1 R 8192 262144 1000 33
File name: /dev/sda1
Random or Serial: R
Read Size: 8192
Boundary: 262144
Read Count: 1000
Rand Seed: 33
Descriptor: 3
+ ./read /dev/sda3 R 8192 262144 1000 39
[root@linrac1 tmp]# File name: /dev/sda3
Random or Serial: R
Read Size: 8192
Boundary: 262144
Read Count: 1000
Rand Seed: 39
Descriptor: 3

real 0m32.673s
user 0m0.010s
sys 0m0.090s

real 0m32.926s
user 0m0.000s
sys 0m0.070s

So to summarize:

Outside Partition          11.4 seconds for 2000 8K reads
Inside Partition           17.4 seconds for 2000 8K reads (about 50 percent slower)
Inside/Outside Partition   32.7 seconds for 2000 8K reads (close to 200 percent slower)

Average read Outside      11.4 seconds / 2000 = 0.0057 seconds (5.7 ms)
Average read Inside       17.4 seconds / 2000 = 0.0087 seconds (8.7 ms)
Average read Inside/Out   32.7 seconds / 2000 = 0.0164 seconds (16.4 ms)
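The arithmetic behind this summary can be redone with a few lines of Python. This is only a sketch: the elapsed times are the rounded "real" times from the three tests, and the helper names (`avg_ms`, `pct_slower`, `reads_per_second`) are mine.

```python
READS = 2000  # two processes doing 1000 reads each

# Rounded elapsed ("real") times in seconds, taken from the three tests
elapsed = {"outside": 11.4, "inside": 17.4, "inside/outside": 32.7}


def avg_ms(seconds, reads=READS):
    """Average time per read in milliseconds."""
    return seconds / reads * 1000.0


def pct_slower(seconds, baseline):
    """How much slower `seconds` is than `baseline`, in percent."""
    return (seconds / baseline - 1.0) * 100.0


def reads_per_second(seconds, reads=READS):
    """Random reads per second achieved during the test."""
    return reads / seconds


for name, secs in elapsed.items():
    print(f"{name:15s} {avg_ms(secs):5.1f} ms/read  "
          f"{reads_per_second(secs):6.1f} reads/s  "
          f"{pct_slower(secs, elapsed['outside']):6.1f}% slower than outside")
```

With these numbers the inside partition comes out roughly 53 percent slower than the outside one, and the mixed test roughly 187 percent slower.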

So the total number of I/Os that we can do really depends on where the data is located on the disk. According to Maxtor, the average seek time for this disk is around 9.3 ms, which means that the full seek time will be around 16-18 ms. That seems to fit. Subtract the average outside time from the average inside/out (16.4 - 0.5) and we get 15.9 milliseconds, which may be 90 percent of the full seek time.

So even for the simple ATA drives, it is important where the data is stored.

Sunday, June 12, 2005

Outer Edge of Disk

There is a thread on AskTom about where to place data on the disk. The question is why one should place data on the outside of the disk and whether that helps performance. I would just like to add my view to that discussion. The number of physical I/Os per disk is limited by the (full/average) seek time. The seek time is the biggest component of the I/O time (with a normal Oracle block size). The full seek time (seeking from the outermost track to the innermost track or vice versa) can be greatly reduced by only using the outermost tracks on a disk. Why the outermost tracks? Well, there is this statistic that the outermost 1/3 of the tracks holds around 50 percent of the disk capacity. If I only ever have to seek across 1/3 of the distance to reach 50 percent of the data, the average seek time will be greatly reduced. If the seek time is greatly reduced, the total I/O time is greatly reduced and we can do more random I/O operations per second. So to summarize:
1) Use around 50 percent of the disk capacity (outer 1/3 of the tracks)
2) That will increase the number of random I/Os greatly.
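To get a feel for the effect, here is a small simulation sketch. It assumes that requests are spread uniformly over the tracks in use and it only measures seek distance, not seek time (a real drive also has settle time and head acceleration, so the gain in time will be smaller than the gain in distance). The function name `mean_seek_distance` is made up for this example.

```python
import random


def mean_seek_distance(track_fraction, samples=100_000, seed=42):
    """Mean distance between two random tracks, as a fraction of the full stroke.

    `track_fraction` is the part of the disk's tracks that is actually used
    (1.0 = whole disk, 1/3 = outer third only). Uniform access is assumed.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        current = rng.uniform(0.0, track_fraction)   # where the head is now
        target = rng.uniform(0.0, track_fraction)    # where the next read is
        total += abs(current - target)
    return total / samples


full = mean_seek_distance(1.0)
outer_third = mean_seek_distance(1.0 / 3.0)
print(f"whole disk:  {full:.3f} of a full stroke")
print(f"outer third: {outer_third:.3f} of a full stroke")
```

Under these assumptions the average seek distance drops to about a third of the whole-disk value, while (going by the statistic above) still covering about half of the capacity.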

It doesn't matter if you use SAN, NAS or JBOD; it is the same principle. Well, with SAN and NAS you may not know where your data is located :-) But that is a completely different story.

Friday, June 10, 2005

Physical I/O

Last week I was involved in solving a performance problem for a customer that suffered from enormous performance slowdowns. The funny thing was that the customer had assigned a special task force to this problem, and they had been working on it since mid-December. That task force consisted of 22 people, and some were actually flown in to help solve the problem. A consultant was asked to take a fresh look at the problem by installing Veritas i3 for Oracle (which had been used successfully in another project). After installing the tool and getting the performance data, he called me after one day to confirm his findings (basically all the wait time was I/O related), and while we spoke on the phone we checked the mount options for the file systems and guess what, the directio mount option was commented out. It was used on many other servers, but not on this particular server. After talking to the task force it wasn't clear why this was done.

So the decision was made to include the directio mount option again and at the same time change two init.ora parameters: optimizer_index_cost_adj and optimizer_index_caching. The database was bounced and a simple batch job was run to validate the performance improvement. That batch job previously ran at 50-70 calculations per hour, and after the changes it ran at 2500+ calculations per hour: an improvement by a factor of 35-50.

So this was great news of course; however, the next day one of the other databases on the system reported a severe performance problem. In fact a quick check revealed that it was waiting most of the time on I/O related events. A further analysis showed a couple of things:
1) The Oracle buffer cache was really small and so the File System buffer cache was heavily used to compensate for that small Oracle buffer cache.
2) It also showed that a handful of statements were responsible for 80+ percent of the I/O workload.
3) In fact it was discovered that one statement did around 600K physical I/Os in just over 2 hours.
4) And there were a couple of other statements just like that.
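Finding that handful of statements is a simple ranking exercise once you have the physical reads per statement. The sketch below is generic Python, not the output or API of any particular tool. The function name `top_io_statements` is made up, and where the per-statement numbers come from is left open.

```python
def top_io_statements(physical_reads, share=0.8):
    """Return the fewest statements whose physical reads cover `share` of the total.

    `physical_reads` maps a statement identifier to its physical read count.
    The result is a list of (statement, reads) pairs, biggest reader first.
    """
    total = sum(physical_reads.values())
    picked, running = [], 0
    ranked = sorted(physical_reads.items(), key=lambda kv: kv[1], reverse=True)
    for stmt, reads in ranked:
        if total == 0 or running / total >= share:
            break                      # the requested share is already covered
        picked.append((stmt, reads))
        running += reads
    return picked


# Made-up numbers, purely to show the call:
sample = {"stmt_a": 600, "stmt_b": 300, "stmt_c": 50, "stmt_d": 50}
print(top_io_statements(sample))
```

The statements that come back from a ranking like this are the ones worth tuning first, because fixing them takes away most of the physical I/O on the machine.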

So it was clear that a handful of poorly tuned statements were responsible for bringing down the performance of the whole machine (when the directio mount option was not used). The other team, working on this database with the large number of I/Os, blamed the directio change for causing a performance problem in their database (in principle they are right about this) and basically wanted the directio option removed. So the question becomes: should that be done? Or should the second team fix their SQL and application so that less physical I/O is done and everybody has good performance? My take is that the second team should fix their application, and that would be my recommendation, but clearly that team doesn't see it that way.
And a conflict is born ....

So who should win?

YAPP Ten years later: What has changed?

In 1995 I started to work on the wait events paper based on Oracle 7.3, and that formed the basis for the YAPP white paper. That in turn formed the basis for the wave of response time tuning books and presentations. Now it is funny to see how the same thing is rehashed over and over again and nothing really new is being added. In fact I believe that all the attention on wait events and response time or resource tuning in the Oracle RDBMS is taking the focus away from performance problems that actually originate outside the database. The word "overexposure" comes to mind. Does this mean that wait events are no longer important? Well, yes and no. Believe it or not, most databases suffer from the same performance problems. They differ in the symptoms that they show. For example, many databases suffer from I/O performance problems, and Oracle has quite a number of wait events that are directly and indirectly associated with I/O. So instead of approaching each of these events individually, they could be grouped together to just show the symptom. In fact, Oracle 10g has finally done this and has introduced wait event classes (not new; a company called Precise (now part of Veritas) did that first in their product back in 1998). So Oracle is still expanding the number of wait events, but at the same time it is grouping them together again.
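As an illustration of the grouping idea, here is a small Python sketch that rolls wait time per event up into wait classes. The mapping is a hand-picked subset using Oracle 10g class names (in a real 10g database the class of each event can be looked up in v$event_name). The function name `time_by_class` and the "Unmapped" bucket are my own choices for this example.

```python
from collections import defaultdict

# Small, hand-picked subset of events and their 10g wait classes
WAIT_CLASS = {
    "db file sequential read": "User I/O",
    "db file scattered read": "User I/O",
    "direct path read": "User I/O",
    "db file parallel write": "System I/O",
    "log file parallel write": "System I/O",
    "log file sync": "Commit",
    "buffer busy waits": "Concurrency",
}


def time_by_class(event_seconds):
    """Sum wait time per wait class.

    `event_seconds` maps an event name to the seconds waited on it.
    Events missing from WAIT_CLASS go into an "Unmapped" bucket.
    """
    totals = defaultdict(float)
    for event, seconds in event_seconds.items():
        totals[WAIT_CLASS.get(event, "Unmapped")] += seconds
    return dict(totals)
```

Looking at the class totals first shows the symptom (for example "this is an I/O problem"); the individual events inside the class only come into play when you start solving it.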

Before response time tuning (and even today) people actually tuned based on best practices (BP). Each best practice has a ratio associated with it. For example, the Buffer Cache Hit Ratio is basically the Best Practice that tells people to cache frequently used data (which is a good thing in principle). The problem with tuning today is that many Best Practices exist and they all have some kind of ratio associated with them. So if a performance problem occurs, DBAs start working through this list of Best Practices to check if something applies to their problem. There are a couple of problems with that:
1) Not everybody may have the same list of Best Practices or the same threshold for the ratios.
2) The list of Best Practices may not be sorted in the same way for different people.

The result is that the problem finding process takes a long time and is not really repeatable by different people (different lists, different ratios). So starting the problem finding process with Best Practices is like shooting a gun and hoping that you will hit something.

The Response Time tuning process is basically telling you which Best Practice to use. For example, if the Response Time consists mostly of I/O related waits, we should start looking at the Best Practices for I/O. If CPU is the biggest resource consumer, we should start looking at the Logical I/Os.

In this approach the different wait event groups play an important role, because each group is basically associated with one or more Best Practices. We still do care about what wait events are actually in the group, but for selecting the right Best Practice it doesn't matter. For solving the problem, it may be important.
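The selection step can be sketched in a few lines of Python: take the response time as CPU time plus the wait time per group, find the biggest component, and let that decide where to start. The names `dominant_component`, `where_to_start` and the `START_WITH` table are made up for this sketch. Only the two starting points mentioned above (I/O and CPU) are filled in; anything else falls through to a generic hint.

```python
# Starting points per response time component (only the two named in the text)
START_WITH = {
    "I/O": "Best Practices for I/O",
    "CPU": "Logical I/Os",
}


def dominant_component(cpu_seconds, wait_seconds_by_group):
    """Return (name, share) of the largest response time component.

    Response time is taken as CPU time plus the wait time of every group.
    """
    components = {"CPU": cpu_seconds, **wait_seconds_by_group}
    total = sum(components.values())
    name = max(components, key=components.get)
    share = components[name] / total if total else 0.0
    return name, share


def where_to_start(cpu_seconds, wait_seconds_by_group):
    """Suggest which list of Best Practices to open first."""
    name, _ = dominant_component(cpu_seconds, wait_seconds_by_group)
    return START_WITH.get(name, f"Best Practices for the {name} group")
```

The point of doing it this way is that two different people looking at the same breakdown end up at the same starting point, which is exactly what the list-of-ratios approach cannot guarantee.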

I believe that people should start thinking this way instead of worrying about the individual events. I am not saying that the individual events are not important, but keep an eye on the complete picture before diving into the individual events.

So one of these days (maybe this year) I should write an update to the YAPP paper, and hopefully it can be the basis for a wave in response time tuning. Oh yeah, I don't see myself as the inventor of all this. As far as I am concerned, Response Time Tuning has always been there; it was just not really accepted in the Oracle world as the way of tuning your Oracle database. So maybe I started that, but I just wrote the paper(s) and other people like Mogens Norgaard made it popular. It is kind of funny that I actually still use the same method(s) almost 10 years later (and it doesn't matter if they are on instance, session or SQL statement level, it is the same methodology, with different results but still solving the same problems; think about that one .....)