I ran into an interesting behavior with MySQL this week. I was helping someone speed up a slow page, and we got a sizable performance gain simply by re-arranging a SQL statement. The page was calling a SQL statement inside a loop, probably around 150 times per page load. My initial idea (and still the best long-term one, I think) was to gather all the information from the database in a single cfquery rather than hitting the database over and over. That approach would have required quite a lot more refactoring, so first we tried to squeeze better performance out of the SQL already being called.

This site used MySQL, which I am not as familiar with as MSSQL, but I started by looking at an EXPLAIN statement. In MySQL, placing the keyword EXPLAIN before your SELECT causes the engine to output a result set describing how the select would be performed.
[code]EXPLAIN SELECT *
FROM my_table
WHERE column = 'foo'[/code]
I will start by saying that MySQL (community edition) gives you far less information here than a more robust (and expensive) DBMS does. Perhaps the Enterprise edition of MySQL is better, but there's a REASON you pay for the Enterprise stuff. I LOVE how MS Enterprise Manager graphically maps out the entire query. The select in question was basically this:
[code]SELECT count(id) AS totalcount
FROM table1
WHERE fkid IN (SELECT id FROM table2 WHERE fkid2 = 100)[/code]
You can see it uses a sub-select on table2 in the WHERE clause to limit the records from table1. Table1 had a few hundred thousand records and table2 around twenty thousand. It performed about 30 times faster when I re-wrote it like so:
[code]SELECT count(table1.id) AS totalcount
FROM table1
INNER JOIN table2 ON table1.fkid = table2.id
	AND table2.fkid2 = 100;[/code]
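To sanity-check that the two forms really return the same result, here's a minimal sketch using SQLite via Python's built-in sqlite3 module (standing in for MySQL; the table and column names mirror the ones above, and the toy data is entirely made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Toy stand-ins for the real tables: table2 rows belong to a parent
# (fkid2), and table1 rows reference table2 (fkid).
cur.execute("CREATE TABLE table2 (id INTEGER PRIMARY KEY, fkid2 INTEGER)")
cur.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, fkid INTEGER)")
cur.executemany("INSERT INTO table2 VALUES (?, ?)",
                [(i, 100 if i % 2 == 0 else 200) for i in range(1, 21)])
cur.executemany("INSERT INTO table1 VALUES (?, ?)",
                [(i, (i % 20) + 1) for i in range(1, 201)])

# Version 1: sub-select in the WHERE clause
v1 = cur.execute(
    "SELECT count(id) FROM table1 "
    "WHERE fkid IN (SELECT id FROM table2 WHERE fkid2 = 100)"
).fetchone()[0]

# Version 2: the same filter expressed as an INNER JOIN
v2 = cur.execute(
    "SELECT count(table1.id) FROM table1 "
    "INNER JOIN table2 ON table1.fkid = table2.id AND table2.fkid2 = 100"
).fetchone()[0]

print(v1, v2)
assert v1 == v2  # same count either way; only the plan differs
```

Because every table1.fkid matches at most one table2.id (it joins on a primary key), the join can't duplicate rows, so the counts agree.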
All I did was move the sub-select into an inner join. Let's take a quick look at the execution plans.

Version 1 (slower):
[code]id  select_type         table   type             possible_keys  key      key_len  ref     rows    Extra
1   PRIMARY             table1  index            {null}         fkid     4        {null}  249844  Using where; Using index
2   DEPENDENT SUBQUERY  table2  unique_subquery  PRIMARY,fkid2  PRIMARY  4        func    1       Using where[/code]
Version 2 (faster):
[code]id  select_type  table   type  possible_keys  key    key_len  ref        rows  Extra
1   SIMPLE       table2  ref   PRIMARY,fkid2  fkid2  4        const      50    Using index
1   SIMPLE       table1  ref   fkid           fkid   4        table2.id  4     Using index[/code]
I'm not going to get too deep into what all that means. If you want a full explanation, check out the MySQL reference here: http://dev.mysql.com/doc/refman/5.0/en/using-explain.html There's quite a bit of information there.

The main difference I noticed in the plans is that the first version expected to walk the entire table1 index (about 250,000 rows), while the second version expected to read only about 50 rows from table2, with a handful of index lookups into table1 for each. What's a bit more interesting is that I have tried this setup both ways on MSSQL and there never seemed to be much of a difference between the two; it seems Microsoft's query optimizer works a little differently. I don't have access to MSSQL right now, or I would compare the plan there as well.

Overall, simply by re-arranging the query a bit, we got a 20-second page load down to well under a second. I love improving performance. A fast web site is a happy web site.
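As a footnote: if you want to poke at query plans without a MySQL instance handy, SQLite ships with a similar facility. This is a hedged sketch using Python's built-in sqlite3 module; EXPLAIN QUERY PLAN is SQLite syntax, and its output format differs from MySQL's EXPLAIN, but the idea of reading the plan before trusting a query is the same:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE table2 (id INTEGER PRIMARY KEY, fkid2 INTEGER)")
cur.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, fkid INTEGER)")
cur.execute("CREATE INDEX idx_table1_fkid ON table1 (fkid)")
cur.execute("CREATE INDEX idx_table2_fkid2 ON table2 (fkid2)")

# Prefixing a SELECT with EXPLAIN QUERY PLAN returns the plan as rows
# instead of running the query (analogous in spirit to MySQL's EXPLAIN).
rows = cur.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT count(table1.id) FROM table1 "
    "INNER JOIN table2 ON table1.fkid = table2.id AND table2.fkid2 = 100"
).fetchall()
for row in rows:
    print(row)
```

The exact text of each plan row varies by SQLite version, so treat the output as a diagnostic to eyeball, not something to parse.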