Saturday, February 25, 2012

Attributes in referenced dimension

I am wondering about the correct/best/most efficient way to access attributes contained in a referenced dimension. ie in AdventureWorks: say I had population data for states (or some other level other than the base key/postal code level), and I added this as a column in the Geography dimension (Population attribute/property related to state attribute); I would then like to use this new data in a query against Reseller Sales by the Reseller dimension. Is this the "right"/optimal approach:

with member [Measures].[popu] as linkmember([Reseller].[Geography Key].currentmember, [Geography].[Geography]).parent.parent.properties('Population', TYPED)

select [Reseller].[Reseller].members on 1,

[Measures].[popu] on 0

from [Adventure Works]

While this works, I do wonder about the efficiency, and Mosha's comments regarding the general avoidance of both linkmember and properties has me suspicious that "there must be a better way"....

Thanks!

-f

Efficiency can be a question of how your users would like to see data in dimensions or how fast a query will be.

If you create a view on the Resellers and the Geography dimension you will have autoexist between attributes in these dimensions. That can be an efficient and a better solution than using referenced dimensions. This is from the point of view of how SSAS2005 works.

HTH

Thomas Ivarsson

|||

Thanks, Thomas -- in my particular situation, I'm pretty stuck with the referenced dimension (my real situation is a lot more complex). I do pull many attributes into the logical equivallent of the Reseller dimension in my situation, but can't pull all -- so I was really focused on the question of if this is indeed the right way to cross the referenced boundary and effect the "indirect overwrite" (to coin a phrase) to grab the attribute of interest (on a different level/hierarchy in the referenced dimension just to add to the complexity). I'm a little surprised if there isn't an easier/more efficient way to do this....

On an indirectly related note, this difficulty with referenced dimensions does really bug me, as while they do provide some measure of reuse, it seems that we still need to do a lot of data duplication between dimensions to really make use of the data values for calculations and even browsing, as in my example. While this is not a huge impact in terms of storage (dimensions are small and all that), it seems wrong from a development POV as we have data access/migration spread between ETL and/or AS DSV and/or AS calcs to overcome an implementation issue. I understand that there is a performance impact to keeping things clean, but as attributes in related dimensions are logically really just attributes in other hierarchies in the relating dimension, it would seem to me important to unify the handling of related dimensions with hierarchies within a dimension. (Logically, I would have expected this sort of functionality to be implemented by having related dimensions implemented directly attached to a dimension, in the dimension design area -- to perhaps be materialized as part of the processing of the relating dimension, rather than via the dimension usage area, but perhaps I am missing something fundamental about the implementation of AS.)

Thanks again,

-f

|||

Hello again Sir SliceDice. I am not sure about your business problem but most design approaches in SSAS2005 simply follows the same rules that you have with joins between tables in the relational world.

You have one-to-many, one-to-one and many-to-many.

Referenced dimensions can be subsituted with views, like I suggested before, or by having both a resellers key and and geography key in the fact table.

I suggest you to have a look here ( www.kimballgroup.com ) for guidelines of how to solve your business problem.

To solve business problems when the structure is fixed in a starschema, with MDX or dimension relations, is always harder than changing or rewrite the structure of the data warehouse.

I know that design changes in a data warehouse will effect the report side negatively, but I have no clear idea of why Microsoft have added referenced dimensions to SSAS2005? I have a hard time finding out why I should use them except when something is wrong with the design of my data mart /data warehouse.

Kind Regards

Thomas Ivarsson

|||

Thomas,

Thanks again for your reply. I agree that we can work around the limitations of the current implementation of the UDM via a view, or via recording the additional geography attributes in the fact table or the (Resellers in the case of AdventureWorks) dimension table.

BUT, let me ask: is geography a useful abstraction in its own right? Of course it is! It makes a lot of sense to seperate and isolate the geography information/attributes to their own table; it makes a lot of sense for lots of reasons to use a "snowflake" approach for geography and reseller. But now having the snowflake, we need to incorporate geography hierarchies/levels into the Reseller dimension OR use a referenced dimension. In the case of the Customer dimension, the AdventureWorks demo takes the former approach (Geo in the dimension); in the case of Reseller, it takes the latter approach (referenced dimension).

With the embedded snowflake approach (eg: Customer dimension), it is painful and wasteful to have to repeatedly design all the same hierarchies and attribute relationships, particularly if there is a lot of data related to the flaked dimension (Geography; e.g.: say you had geographical area data, population/census data, government data, etc). Do the users need to browse and/or calculate with all these items in all the related dimensions? Maybe, maybe not (but we can be sure the need will grow in the future) -- and while we can certainly work around the situation (views and the like), it still is a waste of time & effort to repeat the work.

And, in the realm of software development beyond the realm of data warehousing/business intelligence, this sort of repeating of effort and data is a non-starter. Sure, some exceptions from 3NF are reasonable and well justifiable, even if only to improve performance (especially with the strict gatekeeping administration of a strong ETL system), but wholesale duplication of dimension definitions/embedding sounds like a recipe for a lot of waste and problems with future change, as well as problems from unexpected behavior ("but I did it like that with the customer data, why doesn't it work right for the resellers -- they're in the same state!").... The problem of future change is especially significant: in my experience, it is particularly easy to break existing reports via changes to cube structure, making for very unhappy users (and the worst is when you break your own reports!): thus, solid upfront architecture is especially important.

Of course, to some extent, this is exactly the problem: these BI systems have originated in one world oriented to reporting and analysis, when my application, while still being described in those terms, is in reality quite a bit outside of the typical BI realm: I have pulled quite a bit of business logic into cube space, and have had great sucess with cubes as an effective general compute platform. After all, ALGORITHM+DATA STRUCTURE = PROGRAM, and in cubes we have a general purpose data structure (can model any data structure desired, and can do so very efficiently for exceptionally large structures), and we have a very descriptive algorithm implementation language (that allows effective blending of both declarative and procedural code efficiently over large data sets). The UDM was a huge step in this direction; with a few more improvements the AS05/UDM environment could be an ideal general compute platform....

But all that aside, back to my original question: is this the preferred approach to cross related dimension boundaries and grab properties?

linkmember([Reseller].[Geography Key].currentmember, [Geography].[Geography]).parent.parent.properties('Population', TYPED)

-f

|||

I don't know the complexities involved in why you chose a reference dimension, but I agree with Thomas that you can combine those two tables using a view and put all attributes into one dimension. Then you get auto-exists which should make it easier to efficiently get the population # you need.

It sounds like you're not going to go that direction, so optimizing the MDX is something to look into. That exact MDX you had above returned nulls for me because I don't believe there's a Population member property in the Geography dimension. (But it illustrated what you were doing, so it was fine.) I rewrote it as follows to test performance:

with member [Measures].[popu] as

Exists([Geography].[Geography].[Country].Members, linkmember([Reseller].[Geography Key].currentmember, [Geography].[Geography Key])).Item(0).Name

select Exists([Date].[Month Name].Members*[Reseller].[Reseller].[Reseller].members,,"Reseller Sales") on 1,

[Measures].[popu] on 0

from [Adventure Works]

In terms of performance, the following query outperforms it and returns just the same results I think. It's using Exists with a measure group to find the right combinations of Resellers and Geographies. You might consider trying this as I suspect it will perform better:

select Exists([Reseller].[Reseller].[Reseller].members*[Geography].[Geography].[Country].Members,,"Reseller Sales") on 1,

{} on 0

from [Adventure Works]

No comments:

Post a Comment