2012-05-29 14:15:57

Background
In January 2011 I posted Inline SQL PL vs compiled SQL PL.
One of the points I made in this entry was that inline SQL PL is preferable for simple SQL PL logic because it is faster.
This implies of course that compiled SQL PL has some headroom when it comes to performance improvements.
There is always room for improvement just about anywhere, so it can be interesting to talk about what motivated investments in this area.
In this case an IBM partner was enabling an Oracle application to DB2.
Most times we find that applications operate at competitive speeds after moving to DB2 thanks to the law of averages.
So DB2 may be slower for some SQL statements, but it will be faster for others.
As long as the fast statements cancel out the slow statements for all the time critical transactions everything is good.
In this case however the partner ended up with one critical transaction that exposed a performance regression.
Some further analysis revealed that there was a single PL/SQL function which consumed a lot more time in DB2.
A look at the function revealed that the function did not contain a single SQL statement.
All the function did was some sophisticated  scanning of strings to mine for patterns.
In this post I will describe the problem and how DB2 10 has fixed it.
Compiled SQL PL in DB2 9.7
Naturally I cannot reveal the partner's source code here.
So I have chosen an artificial test to illustrate the problem.
Some of you may know the "3*a+1" sequence.
I don't even recall when I bumped into it the first time.
But it's easy to understand and code:
1. Take a whole number n
2. If n is 1 then the sequence finishes
If n odd then multiply n by 3 and add one. This is the new n
If n is even then divide n by 2. This is the new n
3. Go back to step 2 until the sequence finishes
The question is whether this sequence is always finite.
The following SQL PL function tries out the first n whole numbers and verifies the sequence reaches 1 in each case.
To ensure we are comparing apples to apples we make this test in DB2 10, but restricting it to DB2 9.,7 technology:

1. CALL set_routine_opts('PVM_OPTLEVEL 127');
2. CREATE OR REPLACE FUNCTION three_a_plus_one(IN stop INTEGER)
3. RETURNS INTEGER SPECIFIC three_a_plus_one
4. BEGIN
5. DECLARE len INTEGER;
6. DECLARE n, a INTEGER DEFAULT 1;
7. DECLARE stamp TIMESTAMP;
8. SET (len, stamp) = (0, TIMESTAMP(GENERATE_UNIQUE()));
9. WHILE n < stop DO
10. SET a = n;
11. WHILE a <> 1 DO
12. IF (a / 2) * 2 = a THEN
13. SET a = a / 2;
14. ELSE
15. SET a = 3 * a + 1;
16. END IF;
17. END WHILE;
18. SET n = n + 1;
19. END WHILE;
20. RETURN midnight_seconds(TIMESTAMP(GENERATE_UNIQUE()))- midnight_seconds(stamp);
21. END
22. /
23. VALUES three_a_plus_one(100000);
24. 1
25. -----------
26. 209

This is the baseline to beat.
But where does time go here?

The SQL PL Profiler described previously in this blog can tell us that:

1. LINE EXEC TIME(ACT) CPU_TIME TEXT
2. ---- --------- --------- --------- --------------------------------------------------------------------
3. 1 CREATE OR REPLACE FUNCTION three_a_plus_one(IN stop INTEGER)
4. 2 RETURNS INTEGER SPECIFIC three_a_plus_one
5. 3 BEGIN
6. 4 DECLARE len INTEGER;
7. 5 DECLARE n, a INTEGER DEFAULT 1;
8. 6 DECLARE stamp TIMESTAMP;
9. 7 1 0 0 SET (len, stamp) = (0, TIMESTAMP(GENERATE_UNIQUE()));
10. 8 WHILE n < stop DO
11. 9 SET a = n;
12. 10 WHILE a <> 1 DO
13. 11 10753712 46283 64847998 IF (a / 2) * 2 = a THEN
14. 12 SET a = a / 2;
15. 13 ELSE
16. 14 SET a = 3 * a + 1;
17. 15 END IF;
18. 16 END WHILE;
19. 17 SET n = n + 1;
20. 18 END WHILE;
21. 19 1 0 0 RETURN midnight_seconds(TIMESTAMP(GENERATE_UNIQUE()))- midnight_seconds(stamp);
22. 20 END

What this is showing is that several of the statements inside of the loop are seen by DB2 as SQL statements.
Most of the math already happens within the Procedure Virtual Machine (PVM).
For example comparison on line 8 or the division in line 12.
It is interesting to see that about 10 million statements are being executed here.
Now let's take a look at the best case.
Inlined SQL PL
The densest way to drive the logic is to inline the SQL PL.
This is done by adding an ATOMIC keyword to the function.
When this is done DB2  will macro expand the logic into the VALUES statement.

1. CREATE OR REPLACE FUNCTION three_a_plus_one(IN stop INTEGER)
2. RETURNS INTEGER SPECIFIC three_a_plus_one
3. BEGIN ATOMIC
4. DECLARE len INTEGER;
5. DECLARE n, a INTEGER DEFAULT 1;
6. DECLARE stamp TIMESTAMP;
7. SET (len, stamp) = (0, TIMESTAMP(GENERATE_UNIQUE()));
8. WHILE n < stop DO
9. SET a = n;
10. WHILE a <> 1 DO
11. IF (a / 2) * 2 = a THEN
12. SET a = a / 2;
13. ELSE
14. SET a = 3 * a + 1;
15. END IF;
16. END WHILE;
17. SET n = n + 1;
18. END WHILE;
19. RETURN midnight_seconds(TIMESTAMP(GENERATE_UNIQUE()))- midnight_seconds(stamp);
20. END
21. /
22. VALUES three_a_plus_one(100000);
23. 1
24. -----------
25. 17

Due to inlining there are no SQL statements by definition.
So I cannot show profiler output here.
Instead we would have to look at the optimizer plan which I will skip to not distract from the goal of the post.
Now, let's take a look at normal DB2 10 operation.
Compiled SQL PL in DB2 10
We reset the "magic switch" we used to place DB2 10 into  DB2 9.7 mode and recreate the function without the ATOMIC keyword.

1. CALL set_routine_opts('');
2. CREATE OR REPLACE FUNCTION three_a_plus_one(IN stop INTEGER)
3. RETURNS INTEGER SPECIFIC three_a_plus_one
4. BEGIN
5. DECLARE len INTEGER;
6. DECLARE n, a INTEGER DEFAULT 1;
7. DECLARE stamp TIMESTAMP;
8. SET (len, stamp) = (0, TIMESTAMP(GENERATE_UNIQUE()));
9. WHILE n < stop DO
10. SET a = n;
11. WHILE a <> 1 DO
12. IF (a / 2) * 2 = a THEN
13. SET a = a / 2;
14. ELSE
15. SET a = 3 * a + 1;
16. END IF;
17. END WHILE;
18. SET n = n + 1;
19. END WHILE;
20. RETURN midnight_seconds(TIMESTAMP(GENERATE_UNIQUE()))- midnight_seconds(stamp);
21. END
22. /
23. VALUES three_a_plus_one(100000);
24. 1
25. -----------
26. 9

We have reduced the time 20 fold!

Even better we have beaten inline SQL PL by a factor of 2!

A look at the profiler shows why

1. LINE EXEC TIME(ACT) CPU_TIME TEX
2. ---- ---------- ---------- ---------- -------------------------------------------------------------
3. 1 CREATE OR REPLACE FUNCTION three_a_plus_one(IN stop INTEGER)
4. 2 RETURNS INTEGER SPECIFIC three_a_plus_one
5. 3 BEGIN
6. 4 DECLARE len INTEGER;
7. 5 DECLARE n, a INTEGER DEFAULT 1;
8. 6 DECLARE stamp TIMESTAMP;
9. 7 1 0 0 SET (len, stamp) = (0, TIMESTAMP(GENERATE_UNIQUE()));
10. 8 WHILE n < stop DO
11. 9 SET a = n;
12. 10 WHILE a <> 1 DO
13. 11 IF (a / 2) * 2 = a THEN
14. 12 SET a = a / 2;
15. 13 ELSE
16. 14 SET a = 3 * a + 1;
17. 15 END IF;
18. 16 END WHILE;
19. 17 SET n = n + 1;
20. 18 END WHILE;
21. 19 1 0 0 RETURN midnight_seconds(TIMESTAMP(GENERATE_UNIQUE()))- midnight_seconds(stamp);
22. 20 END

The only remaining SQL statements are the ones used for the timer.
The reason being that DB2 presently cannot execute expressions locally which are non deterministic.
GENERATE_UNIQUE() falls into this category.
Other limiting factors are global variables or user defined function invocations.
Restoring the Honor of inline SQL PL
Beating inline SQL PL was unexpected.
The reason for this is in the IF THEN ELSE statement.
Inline SQL PL has to emulate that in a relational access plan which is not trivial complex.
If we squeeze that IF out we should restore inline SQL PL to the top:

1. CREATE OR REPLACE FUNCTION three_a_plus_one(IN stop INTEGER)
2. RETURNS INTEGER SPECIFIC three_a_plus_one
3. BEGIN ATOMIC
4. DECLARE len INTEGER;
5. DECLARE n, a INTEGER DEFAULT 1;
6. DECLARE stamp TIMESTAMP;
7. SET (len, stamp) = (0, TIMESTAMP(GENERATE_UNIQUE()));
8. WHILE n < stop DO
9. SET a = n;
10. WHILE a <> 1 DO
11. SET a = CASE WHEN (a / 2) * 2 = a
12. THEN a / 2
13. ELSE 3 * a + 1 END;
14. END WHILE;
15. SET n = n + 1;
16. END WHILE;
17. RETURN midnight_seconds(TIMESTAMP(GENERATE_UNIQUE()))- midnight_seconds(stamp);
18. END
19. /
20. VALUES three_a_plus_one(100000);
21. 1
22. -----------
23. 4

That's much better!
But really we are not comparing apples to apples anymore.
Will the same trick help compiled SQL PL as well

1. CREATE OR REPLACE FUNCTION three_a_plus_one(IN stop INTEGER)
2. RETURNS INTEGER SPECIFIC three_a_plus_one
3. BEGIN
4. DECLARE len INTEGER;
5. DECLARE n, a INTEGER DEFAULT 1;
6. DECLARE stamp TIMESTAMP;
7. SET (len, stamp) = (0, TIMESTAMP(GENERATE_UNIQUE()));
8. WHILE n < stop DO
9. SET a = n;
10. WHILE a <> 1 DO
11. SET a = CASE WHEN (a / 2) * 2 = a
12. THEN a / 2
13. ELSE 3 * a + 1 END;
14. END WHILE;
15. SET n = n + 1;
16. END WHILE;
17. RETURN midnight_seconds(TIMESTAMP(GENERATE_UNIQUE()))- midnight_seconds(stamp);
18. END
19. /
20. VALUES three_a_plus_one(100000);
21. 1
22. -----------
23. 7

That's better as well, but doesn't beat inline SQL PL
To complete the circle let's try the same routine without the optimizations of DB2 10

1. CALL set_routine_opts('PVM_OPTLEVEL 127');
2. CREATE OR REPLACE FUNCTION three_a_plus_one(IN stop INTEGER)
3. RETURNS INTEGER SPECIFIC three_a_plus_one
4. BEGIN
5. DECLARE len INTEGER;
6. DECLARE n, a INTEGER DEFAULT 1;
7. DECLARE stamp TIMESTAMP;
8. SET (len, stamp) = (0, TIMESTAMP(GENERATE_UNIQUE()));
9. WHILE n < stop DO
10. SET a = n;
11. WHILE a <> 1 DO
12. SET a = CASE WHEN (a / 2) * 2 = a
13. THEN a / 2
14. ELSE 3 * a + 1 END;
15. END WHILE;
16. SET n = n + 1;
17. END WHILE;
18. RETURN midnight_seconds(TIMESTAMP(GENERATE_UNIQUE()))- midnight_seconds(stamp);
19. END
20. /
21. VALUES three_a_plus_one(100000);
22. 1
23. -----------
24. 211

Roughly the same speed.
So with 210s/7s that's a factor of 30 for a well tuned function.
Conclusion
DB2 10 significantly improves SQL PL (and with that also PL/SQL) performance.
In extreme cases this may be a factor of 30!
As SQL statements such as SELECT or UPDATE are added to a routine this relative increase will drop.
But for logic-heavy routines you can expect solid improvements which reduce the need to resort to inline SQL PL
Note that what we have not measured here was the overhead of invoking a compiled routine over inlining SQL PL.