Option 1: I can just use the same expressions in full:
Sum(Sales) - Sum(CostPrice*Quantity)
Option 2: I can reference the column values using the Column() function:
Column(1)-Column(2)
Option 3: I can reference the labels of the first expressions:
[Total Sales] - [Total Costs]
Now, someone once said to me that the last option, although it might appear to be the best because you are re-using existing values, actually calculates slower than the others. I have often wondered if that was true. Someone else suggested to me recently that the 2nd option actually performs quicker. I decided that I should find out!
Buried deep in the Document Properties of a QVW, on the Sheets tab, there is a list of all the objects on the sheet along with their last calculation time (CalcTime) in milliseconds. If you looked at this for one of your charts, you might find that it says "0". This is because the last calculation might have been from cached data so the only accurate CalcTime is when the chart is first initialized in QlikView or when the first of a particular set of selections are made in a recently opened document - where the cache has not been established yet.
So, what I did was to take a document with about 5,000,000 sales transaction rows and build a simple Straight Table of Sales / Costs / Margin by Country. I made 3 copies of this document, each copy making the Margin calculation in one of the above ways. The only object in each document is the straight table.
I then completely closed QlikView. I opened one of the documents, recorded the "Initialize" CalcTime for the chart, made a selection of 2 countries (UK and USA) and recorded the "Selection" CalcTime for the chart. Then I shut down QlikView again and repeated. I did this 8 times for each of the documents, opening them in random order.
Here are the results:
Option 1 | Option 2 | Option 3 | ||||
---|---|---|---|---|---|---|
Initialize | Select | Initialize | Select | Initialize | Select | |
219 | 125 | 219 | 125 | 219 | 125 | |
250 | 140 | 250 | 125 | 202 | 125 | |
234 | 125 | 265 | 156 | 203 | 124 | |
218 | 125 | 219 | 125 | 218 | 125 | |
250 | 125 | 218 | 141 | 218 | 140 | |
219 | 125 | 281 | 156 | 250 | 156 | |
234 | 141 | 218 | 141 | 219 | 140 | |
219 | 125 | 218 | 125 | 297 | 141 | |
Mean | 230.4 | 128.9 | 236.0 | 136.8 | 228.3 | 134.5 |
Mode | 219.0 | 125.0 | 218.0 | 125.0 | 219.0 | 125.0 |
Median | 226.5 | 125.0 | 219.0 | 133.0 | 218.5 | 132.5 |
There is some slight differences. But, statistically, I would have to say that there is no real difference between the 3 options!
One thing that I would say is that I don't like using the Column() function in a Straight Table. This is because the user may have the option to swap columns around and that would invalidate the expression. Other than that, it appears that you can just use whatever option you like and it won't make any difference to QlikView performance.
Stephen Redmond is CTO of CapricornVentis a QlikView Elite Partner. We are always looking for the right people to join our team.
Follow me on Twitter: @stephencredmond
Hi Stephen,
ReplyDeleteI did a similar test some time ago (http://www.qlikfix.com/2011/06/21/testing-the-performance-implications-of-variables-and-label-referencing-versus-direct-expressions/), comparing expressions, variables and column references. Came to the same conclusion, the differences are negligible.
I had actually expected column referencing to use some sort of caching, but maybe QlikView is so smart that it recognizes that you're reusing the same expression ;)
Cheers,
Barry
Whoops! Didn't mean to look like I was copying some existing work.
DeleteInterestingly, my results show a very slight advantage for using label references. However, as I said, I don't believe that it is statistically significant.
No worries, wasn't meant like that. I get this question quite often, so I assume others do as well, turns out to be the case :)
DeleteStephen and Barry: I got different results when using more resource intensive functions in the chart. http://blog.axc.net/?p=1042
ReplyDeleteVery interesting, seems like results are being cached. Do you have any data for using variables as well?
DeleteI do not. I would expect the same results as explicit expressions, because the variables are really just doing text replacement.
ReplyDeleteconcur with Mike - with very complex expressions, labels were two times faster in my case. My table was over 90 million rows
ReplyDelete