MDX Beginner to Professional ASAP

So been visiting a couple of sites recently to collect information on MDX from basic information to advanced but almost every site had some advantages to it and some disadvantages to it. Now just to make the information on MDX querying more concise, I am writing up this article. Lets start with the basics....

Step 1:
Syntax-->
SELECT Measure/Dimension on 0/Axis(0)/Columns,
Measure/Dimension on 1/Axis(1)/Rows,
Measure/Dimension on 2/Axis(2) [Note: From this point onwards... even though this MDX is valid, SSMS editor will not accept the information as it strictly adheres to cell information encapsulated between rows and columns].
. on 3/Axis(3)
.
.
.
from [Cube Name] where slicer information

Step 2:
Figure out what information it is that you exactly want from the cube......
Extremely important here is to follow the rules below:
Update the cube to take in a default measure.
ALTER CUBE [CUBENAME] UPDATE DIMENSION Measures, DEFAULT_MEMBER=DefaultMeasure;
Now once this is performed, all the infromation that you slice without explicitly specifying a measure will take the default measure used by the update statement as mentioned above.
SELECT Measures.DefaultMember FROM [CUBENAME] 


In the above diagram, we have the traditional Co-ordinate system with the positive X and Y planes indicated by straight lines, However when it comes to MDX we have to realize that barring the X and Y co-ordinates we can write queries across multiple axes as indicated by the dotted lines. Hence the term "Multidimensional Expressions". 
Coming back to the MDX aspects we divide the MDX into 3 categories-->
1] CellSpace/CellSet/Cell
2] Member
3] Tuple
4] Set
 
If we go with the standard definitions we could define that the CellSpace/CellSet/Cell (Cell) is the value of intersection between all the axes.
A member is any of the co-ordinate intersections which defines a particular behavior along one of the axes.
Another layer of abstraction over the member is called tuple and it basically is a collection of members from different access but share the same behavior.
A collection of tuples is called a Set.
 
Now we can get more clarity by reading the article as mentioned below (It is a quick read):
 
 
Once you have read this article, let us continue our foray into the world of MDX:-
General Syntax-->
With CalculatedMember as CalculatedMember formaula
Select  Measure/Dimension/CalculatedMember (Cell/Member/Tuple/Set) as Columns/0/Axes(0),
Measure/Dimension/CalculatedMember (Cell/Member/Tuple/Set) as Columns/0/Axes(1)
from [CubeName] where slicerinformation
Let us take a couple of examples from our AdventureWorks 2008 R2 cube.
In the following example we are taking the Measure amount and verifying it across the different members belonging to the Account dimension:
select [Measures].[Amount] on 0,
{[Account].[Account Number].[Account Number]} on 1
from [Adventure Works]
[This is the basic MDX statement that is normally written by Analysts and Dev to verify information]

Let us continue with other examples:
select
[Account].[Account Number].[Account Number] on 0 //Columns
from
[Adventure Works]

In this case we will still get a value for all the [Account Number] member's present in the Account dimension and the value we will get is the information pertaining to the default measure member as discussed earlier.

Now let us fetch the information for only one cell:
select [Account].[Account Number].[Account Number].&[4] on 0
from [Adventure Works]
This will fetch the Cell/Cell Space with the Reseller Sales (which is my default measure) for the Account numbered 4 from the Account dimension.

Now let us fetch a member using MDX:
select [Geography].[Geography].&[United States].Children on 0 from [Adventure Works]
The result will give us the cumulative Reseller Sales Amount for all the states belonging to the United States. So all the information returned represents one or more occurrence of the states belonging to the United States.

Let us discuss a couple of concepts here for Members:
Calculated Member:-
A calculated member is a custom calculation that we perform on the existing MDX query. The calculated member is generated from the storage engine once it determines that the expression is a valid expression. The following gives a return value of "TRUE" or "FALSE" based on the whether the reseller's internet sales is greater than the amount specified in the query:
WITH member x as
IIF([Measures].[Reseller Sales Amount] > 45450
,"TRUE"
,"FALSE")
select {[Measures].[Reseller Sales Amount],x} on 0,
{[Geography].[Geography].&[United States].Children} on 1
from [Adventure Works]

Here I am using the IIF statement which is similar to the if statement in traditional T-SQL. Here the concept is that if the condition is verified it returns the first argument else it returns the second argument : - IIF(Condition,Argument 1,Argument 2)
Something very similar to the IIF is the CASE function which can be utilized in the following manner:

WITH member x as
CASE WHEN ([Measures].[Reseller Sales Amount] > 45450)
THEN "TRUE"
ELSE "FALSE"
END
select {[Measures].[Reseller Sales Amount],x} on 0,
{[Geography].[Geography].&[United States].Children} on 1
from [Adventure Works]

The CASE statement in an MDX expression follows the T-SQL convention to the "T".
In the above MDX expressions, the calculated member x is derived by our custom expression which is using an IIF in one and a CASE function in the other.


Now let us move on to a tuple:
If we would like to consider the total internet sales per product sold in the st semester of a financial year(2007) like the following will fetch all the tuples in that section of the cube:
select
CROSSJOIN([Product].[Product].[Product].Members,
[Date].[Fiscal Semester of Year].&[FY H1]) on 0,
[Measures].[Internet Sales Amount] on 1
from [Adventure Works]
where
[Date].[Fiscal Year].&[2007]

The CROSSJOIN is similar to the CROSS JOIN in a T-SQL query.

Note: In the above query it is not necessary to provide the CROSSJOIN, however I feel that it is a good practice to use it because you know exactly what one is doing rather than write the query in the following manner-
select
([Product].[Product].[Product].Members,
[Date].[Fiscal Semester of Year].&[FY H1]) on 0,
[Measures].[Internet Sales Amount] on 1
from [Adventure Works]
where
[Date].[Fiscal Year].&[2007]

Also a very good approach when time constraints are less and someone is approaching MDX from a beginners perspective (or to Test the MDX query) is to verify it against the underlying datawarehouse (Let us verify it for a particular product say....."Mountain-200 Silver, 46":
select SUM(SalesAmount) from FactInternetSales
where ProductKey=356
and OrderDateKey in
(select datekey from DimDate where FiscalYear=2007 and FiscalSemester=1)
Ans: 103570.98

Let us take a look at the final key aspect which are Sets:
In the adventureworks datawarehouse the Fiscal Semester starts from the month of July every year, Now let us pull up the same information as the previous query for a set of tuples belonging to the 1st semester (July to December) for the year 2007 but let us restrict the set from July to September only:
select
([Product].[Product].[Product].Members,
[Date].[Month of Year].&[7]:[Date].[Month of Year].&[9]) on 0,
[Measures].[Internet Sales Amount] on 1
from [Adventure Works]
where
([Date].[Fiscal Year].&[2007],[Date].[Fiscal Semester of Year].&[FY H1])

In the above MDX query we are considering the set of months "[Date].[Month of Year].&[7]:[Date].[Month of Year].&[9]", we can replace this expression in the MDX query with the
expression "{[Date].[Month of Year].&[7],[Date].[Month of Year].&[8],[Date].[Month of Year].&[9]}"

We can verify the same at the DW (Adventure Works 2008 DataWarehouse) for the product "Mountain-200 Silver, 38" in the month of July with the following query:
select SUM(SalesAmount) from FactInternetSales
where ProductKey in (352,353)
and OrderDateKey in
(select datekey from DimDate where FiscalYear=2007 and FiscalSemester=1 and EnglishMonthName
in ('July'))

Quick Recap on what we have learnt so far:
1. What is an MDX query?
2. Cell/Cell Space
3. Member
4. Tuple
5. Set

Let us ramp up on some functions.... There are functions @ the member level and there are functions at the tuple leve. But before that there are some attributes of members or cell's that needs to be explained(In MSDN, these have been mentioned as functions but I feel that attributes is a better definition for the following).
Parent:
In a hierarchy, when we define the attribute relationships in SSAS, we also define a root attribute which can further be sliced along any of it's child attributes. And intermediate attributes are also parents of their respective children. A diagram to emphasize this is as follows:

If we would like to select the parent of the City attribute, which in our case is the State-Province we could go about doing it in this manner:
select [Geography].[Geography].[City].&[Coffs Harbour]&[NSW].PARENT on 0 from [Adventure Works]
Ans: New South Wales

The above MDX query gives us the parent for a specific city which in this case is New South Wales.

Children:
This will give us the child for any intermediate Hierarchy member present in the  above diagram-->
select [Geography].[Geography].[State-Province].&[BC]&[CA].Children on 0
from [Adventure Works]

It will give all the cities belonging to the State-Province of "British-Columbia"

Then we have the firstchild, lastchild, lastsibling, firstsibling and NextMember set of attributes.
select [Geography].[Geography].[State-Province].&[BC]&[CA].lastchild on 0 from [Adventure Works]
This will return the lastchild belonging to the State-Province of "British-Columbia". Since the order of the cities is in ascending order, the query will return "Westminster".

select [Geography].[Geography].[State-Province].&[BC]&[CA].firstchild on 0 from [Adventure Works]
This will return the firstchild belonging to the State-Province of "British-Columbia". Since the order of the cities is in ascending order, the query will return "Burnaby".

select [Geography].[Geography].[State-Province].&[BC]&[CA].lastsibling on 0 from [Adventure Works]
This will return the last member in the State-Province which is a sibling to the State-Province of "British-Columbia" which in this case will the state of "Quebec"

select [Geography].[Geography].[State-Province].&[BC]&[CA].firstsibling on 0 from [Adventure Works]
This will return the first member in the State-Province which is a sibling to the State-Province of "British-Columbia" which in this case will the state of "Alberta"

Moving on Let us consider the Hierarchy and Level functions which are provided to us.
Hierarchy Functions:
A] Dimension: This will return the dimension of the respective member used in the MDX query.

With member test as [Geography].[Geography].[State-Province].&[BC]&[CA].Dimension.Name
select test on 0 from [Adventure Works]
This will return the name of the dimension to which the State-Province of "British-Columbia" belongs to which in our case is "Geography".


B] Hierarchy: This will return the hierarchy of the respective member used in the MDX query.
With member test as [Geography].[Geography].[State-Province].&[BC]&[CA].Hierarchy.Name
select test on 0 from [Adventure Works]
This will return the name of the hierarchy to which the State-Province of "British-Columbia" belongs to which in our case is "Geography".

Level Functions:
As the name indicates, it is used to find all the level related information belonging to a specific member.
A] LEVEL: This will return the level (from the root) of a specific member used in the MDX query.
With member test as [Geography].[Geography].[State-Province].&[BC]&[CA].Level.Name
select test on 0 from [Adventure Works]
B] LEVELS: This is used to fetch a particular level in a hierarchy (or member relationship)
With member test as [Geography].[Geography].Levels(0).name
select test on 0 from [Adventure Works]
Ans: All
With member test as [Geography].[Geography].Levels(1).name
select test on 0 from [Adventure Works]
Ans: Country
With member test as [Geography].[Geography].Levels(2).name
select test on 0 from [Adventure Works]
Ans: State-Province
With member test as [Geography].[Geography].Levels(3).name
select test on 0 from [Adventure Works]
Ans: City

String Functions

A] UserName: This will return the current user's credentials under which the MDX queries are being executed.
WITH MEMBER test AS UserName
SELECT test ON COLUMNS
FROM [Adventure Works]

B] Name: This will give us the name of the dimension, measure or hierarchy (We have gone through some examples in the Hierarchy and Level Functions)

C] UniqueName: The only difference between Name and UniqueName attributes is the fact that the Unique Name will return the entire sequence of members from the root till the child member but the Name will only return the name of the member instance. Let us take a look at both:-
WITH
MEMBER ExampleUniqueName
   AS [Product].[Product Categories].UniqueName
MEMBER ExampleName
   AS [Product].[Product Categories].Name
SELECT
   {ExampleUniqueName
   , ExampleName
   }
   ON 0
FROM [Adventure Works]
Results:
    ExampleUniqueName                       ExampleName
[Product].[Product Categories]         Product Categories

D]MemberToStr: As the name suggests, this function is used to return a name for the member used. I have not used this function a whole lot of times but it is extremely useful when you want to send information between different teams as to what members need to be used for a specific pivot during a particular sprint.
 With member Test as
 MemberToStr([Geography].[Geography].&[United States])
 select Test on 0 from [Adventure Works]
The result from the following query is [Geography].[Geography].[Country].&[United States]

E] TupleToStr: As the name suggests, this function is used to return a name for the tuple used. I have not used this function a whole lot of times but it is extremely useful when you want to send information between different teams as to what tuples need to be used for a specific sprint.
With member test as
TUPLETOSTR(([Product].[Category].&[1],
 [Date].[Fiscal Semester of Year].&[FY H1]))
select
 test on 0
  from [Adventure Works]
The result from the following query is ([Product].[Category].&[1],[Date].[Fiscal Semester of Year].&[FY H1])


F] SetToStr: As the name suggests, this function is used to return a name for the Set used in the MDX query. I have not used this function a whole lot of times but it is extremely useful when you want to send information between different teams as to what Sets need to be used for specific pivots.
WITH member Test as
SetToStr([Date].[Month of Year].&[7]:[Date].[Month of Year].&[9])
 select (Test)
  on 0
  from [Adventure Works]
The result from the following query is {[Date].[Month of Year].&[7],[Date].[Month of Year].&[8],[Date].[Month of Year].&[9]}

G] LookUpCube: This function is extremely useful in order to fetch information from a cube located on the same server on which one is running queries against a different cube. A great example here is that one might create a history cube to collect statistics on the past, and there might be a need to compare the statistics from the History Cube to the current cube. In the following example I am executing the query from a different perspectives just to showcase how the function works.
Current Perspective: [Channel Sales]
Measure Considered/Cube: [Measures].[Customer Count]/[Direct Sales]
General Syntax: LookUpCube(CubeName,String or Numeric Expression)
WITH member Test as
LookUpCube("[Direct Sales]","[Measures].[Customer Count]")
select Test  on 0
from [Channel Sales]

Note: This query will always take a little time to execute, Do this only when necessary. So it is a peformance intensive function....

H] Properties: This function will get specific properties of a member based on values/expressions (these might need to be memorized) that a user requires. I do not use this function on a regular basis but I suggest going through the following link in case someone does feel the necessity to do so:
http://msdn.microsoft.com/en-us/library/ms144821.aspx

I] CoalesceEmpty: Now this is an MDX function that can have regular usage in almost everybody's queries. Similar to the Coalesce function in T-SQL, it basically return the first non empty (non nullable) value present in the argument set. The general syntax is CoalesceEmpty(MemberValue,value1,value2,.....)
WITH member Test as
CoalesceEmpty([Measures].[Customer Count],0)
select [Product].[Category].Members on 0,Test on 1
from [Adventure Works]
Wherever we get a null value, it will be replaced by the 0 digit.
Let us now move onto some Math functions:

Min: Will give us the minimum value in a given Set.
Example- We want to find the minimum discounted amount value for reseller sales based on the country-->
With member test as min(Geography.Geography.Country.allmembers,[Measures].[Discount Amount])
select test on 0
from [Adventure Works]

Max: Will give us the maximum value in a given Set.
Example- We want to find the maximum discounted amount value for reseller sales based on the country-->
With member test as max(Geography.Geography.Country.allmembers,[Measures].[Discount Amount])
select test on 0
from [Adventure Works]

Median: Will give us the median of a defined Set.
Going back to Math 101, the median of a defined set is not the average of a given set of numbers which will be the sum of all the numbers in a particular set divided by the count of numbers. Let us consider the set of numbers --> {12,21,34,45,51},
the average is (12+21+34+45+51)/5 = 32.6 but the median is taken by arranging the numbers in an ascending fashion and taking the number in the middle which in this case is 34
Example-We want to find the median of the discounted amount value for reseller sales based on the country-->
With member test as median(Geography.Geography.Country.allmembers,[Measures].[Discount Amount])
select test on 0
from [Adventure Works]
Var: This is the variance for a specific set of data points based on their distance from the Mean.
Let us take {1,2,3,4,5}
The Mean is 3.
The variance will be given by ((1-3)^2 + (2-3)^2 + (3-3)^2+(4-3)^2+(5-3)^2)/5 = (4+1+0+1+4)/5=10/5=2
Example-We want to find the variance of the discounted amount value for reseller sales based on the country-->
With member test as var(Geography.Geography.Country.allmembers,[Measures].[Discount Amount])
select test on 0
from [Adventure Works]

stddev: This is the standard deviation for a specific set of data. The standard Deviation will be the squareroot of the variance.
The Standard Deviation for the set of numbers {1,2,3,4,5} is basically sqrt(2)
Example -
With member test as stddev(Geography.Geography.Country.allmembers,[Measures].[Discount Amount])
select test on 0
from [Adventure Works]


Count: We have two different counts...Count(Tuple) and Count(Set)
Example - Count(Set)
With member test as Count(Geography.Geography.Country.allmembers,[Measures].[Discount Amount])
select test on 0
from [Adventure Works]

Count(Tuple)
With member test as
 Count(CROSSJOIN([Product].[Product].[Product].Members,
 [Date].[Fiscal Semester of Year].&[FY H1]))
select
 test on 0
 from [Adventure Works]

Aggregate: Aggregate operations are applied on Sets.
Example -
With member test as Aggregate(Geography.Geography.Country.allmembers,[Measures].[Discount Amount])
select test on 0
from [Adventure Works]


Let us now move onto some Date Related Functions. This is extremely important and is almost the fundamental key principles for business analysts to succeed when querying against Cubes.

Let us start with the highest level which is Year and keep moving down to lower levels like Quarter--> Month --> Week --> Day --> Hour

1. YTD (Year to date) This function will return the values for the specific year upto a specific date within the year. For example this following MDX query will give us the count of the customers for the entire year:

SELECT ([Measures].[Customer Count],YTD([Date].[Calendar Year].&[2003])) on 0 from [Adventure Works]

2. QTD(Quarter To Date) This function is used to return all the values belonging to a quarter defined within the cube. For example Jan - March might be considered as the first quarter and every 3 months from March can be grouped into a quarter group.

SELECT ([Measures].[Customer Count],QTD([Date].[Calendar].[Calendar Quarter].&[2001]&[3])) ON 0 from [Adventure Works]

3. WTD(Week To Date) This function returns a set of members from the date and time dimension starting with the first day of the week up to the specified member.

In case an invalid grain is given as part of the function then the following error will be raised: Executing the query ... Query (1, 37) By default, a week level was expected. No such level was found in the cube. Execution complete

4. MTD(Month to Date) This function is used to return the required aggregation from a measure based on the number of months(always the beginning of the year) till a specified date in a year

SELECT ([Measures].[Customer Count],MTD([Date].[Calendar].[Date].&[2])) on 0 from [Adventure Works]

5. We have a higher grain greater than the years which is the PeriodsToDate function which basically gives us all the information upto a particular period in time. The periods to date function utilizes any level within the time hierarchy to find the periodic information. The general syntax for this function is PeriodsToDate(LEVEL, MEMBER).

SELECT PeriodsToDate([Date].[Calendar].[Month], [Date].[Calendar].[Date].&[28]) on 0 FROM [Adventure Works]

6. Parallel Period is a function that is regularly used. Now parallel period is used to fetch information from a specific time range that occured but in a different time period. The genral syntax is ParallelPeriod( [ Level_Expression [ ,Index [ , Member_Expression ] ] ] ) Ex: The following will return the values from January 2003 as it is less than 3 quarters away from October 2003. SELECT ParallelPeriod ([Date].[Calendar].[Calendar Quarter],3,[Date].[Calendar].[Month].[October 2003])ON 0 FROM [Adventure Works] This is where our MDX function expedition comes to an end. The next part of the Article will deal with the storage and function engine in SSAS and how MDX queries are executed.

Comments

Popular posts from this blog

Rhino - ETL

Microsoft acquires LinkedIn

Redshift Experience