Monday, 25 August 2014

Update to Link Table rules

Back in 2011, I wrote a blog post entitled "Rules for creating a Key/Link Table in QlikView".  There were 3 basic rules:

1. Create a primary key in all the tables,
2. Rename the foreign keys to break the links,
3. Use a mixture of Concatenate and Join to generate the Key table using the Resident data.

These rules will still create you a Key/Link table that will work, but it might not always be the most efficient.  As I have had to deal with larger and larger data sets in QlikView, I have learned the hard way that things that work quite well on a table with a few hundred thousand records do not work quite as well when your tables with a few hundred million.

It is really the first "rule" here that can be the problem.  In a dimension table, the key that associates to the Key/Link table does need to be the primary key, however, the key that associates fact tables to the Key/Link table does not have to be a primary key!  This key only needs to be unique for the set of keys being linked to in the Key/Link table, not for the whole record.

For example, if my fact table contains CustomerID and DateID, most of the time that combination will not make a primary key in the fact table, but it is all the key that we need to associate to a Key/Link table containing CustomerID and DateID.  By not using a primary key in the fact table, we are vastly reducing the cardinality of the key being used and hence the amount of memory required to store it.

Another change that I would make is that I no longer use AutoNumberHashxxx to generate a key value.  Now, I always use AutoNumber with an AutoID.  This is because AutoNumber with an AutoID will generate a set of sequential integers which do not, in general, get stored in symbol tables, further reducing the memory requirement.  Using AutoNumberHashxxx to generate multiple different keys will result in non-sequential integer key values which will have to be stored in Symbol tables.

Yet another change that I would require is that the Key/Link table is generated using the Distinct keyword.  Nothing kills performance quite like having duplicate key values in a Key/Link table.

Finally, I no longer require to rename the foreign keys and then rename them back when creating the Key/Link table - I just leave them as is and then use the Drop Field xxx From yyy statement to remove the old foreign keys.

So, the new rules become:

1.  Create a key in each table at the correct granularity to associate to the Key/Link table.  Use AutoNumber with an AutoID.

2.  Use a combination of Concatenate, Join and Distinct to load a Key/Link table of distinct values.

3.  Use Drop Field xxx From yyy to remove the old foreign keys from the fact tables.

For example:

I have a simple structure with Customer, Calendar, Orders.  I have a separate fact table with Customer, Calendar and Delivery information.  I get a synthetic key that I want to resolve by using a Key/Link table:




Step 1 - Customer and Calendar (DateID) already have a primary key.  In Order I will create an ID from CustomerID and DateID.  I will do the same in Delivery:

   ...
   AutoNumber(CustomerID & '-' & DateID, 'CD') As LinkID,
   ...


Step 2 - Now I load my key table.  I will begin with data from the Order table:

   Key:
   Load Distinct
LinkID,
CustomerID,
DateID
   Resident
Order;

Now Join in the Product and OrderDetail keys from the OrderDetail table.

   Concatenate(Key)
   Load Distinct
LinkID,
CustomerID,
DateID
   Resident
Delivery;


Step 3 - Drop the old foreign keys:

   Drop Fields CustomerID, DateID From Order;
   Drop Fields CustomerID, DateID From Delivery;

Now all should be good:


Of course, we don't just create link tables to avoid synthetic keys, we usually create them to overcome association difficulties such as loops.

There is a special prize of a free copy of my forthcoming book, Mastering QlikView, for anyone who can give me a good, technical (i.e. not "because Qlik say so"), reason as to why the original, synthetic key containing, data structure is less desirable than the link table structure.


Stephen Redmond is author of QlikView Server and Publisher and the QlikView for Developer's Cookbook
He is CTO of CapricornVentis a QlikView Elite Partner. We are always looking for the right people to join our team.
Follow me on Twitter: @stephencredmond