Simran Jindal's Blog

my effort to understand what the "it depends" really depends upon

Drill-Across Reports

leave a comment »

In a data warehouse system there can be multiple star schemas addressing differing business processes because it is difficult to find a subject area that can be fully described by a single fact table. In almost every practical application, multiple fact tables will be necessary. Multiple star schemas result in some challenging reporting requirements; Facts need to be aggregated in multiple ways.

Drill-across reports are a type of reports which require merged query results from more than one star schema. The way to compare two business processes is called Drilling Across. This term usually causes quite a lot of confusion. Although the term drill is used, it is completely unrelated to drill-up, drill-down or drill-through capabilities of many reporting tools. Instead, this term means “Crossing Multiple Processes“.

Below is an example of two fact tables, FactOrders (measuring the order process) and FactShipments (measuring the shipment process). The two facts in this example will be analyzed by Date, Product and Customer dimensions.

Drill-Across example

Drill-Across example

In order to summarize orders and shipments for May 2012, it is first necessary to summarize the individual orders and shipments separately at the desired level or dimensional details. The two result sets will contain at most one row for each Product given both the results sets have the same dimensionality. These two result sets can now be merged using the FULL OUTER JOJN in SQL language. A FULL OUTER JOIN includes all data from both the result sets, even if there is a row in one result set without a corresponding row in the other result set. During the merge process you can produce comparison of various metrics. For example, a ratio of orders to shipments can be added.

Cross-Process Ratios – These are really powerful measurements. These measures are often lost in the metadata, because they do not correspond to a single column or a single table. It is important to document these rations at design time, highlighting the interaction between business processes that various star schemas can support.

The above procedure is called Drill-across procedure. It is completed by successfully by decomposing the collection of data into discrete steps.

Step 1: summarize facts from each fact table at a common level of detail or dimensionality

Step 2: merge the intermediate result sets together created in Step 1 and add comparisons of the facts from different processes.

This technique can be used on two or more fact tables or on ‘n‘ number of facts tables, across multiple databases and even on data stored on RDBMs from different vendors. Drilling across can be also applied to a single star schema more than once to produce multiple useful comparison reports, for example “this year vs. last year” comparison.

Here’s the graphical representation of the Drilling-Across Procedure

Drill-Across Procedure

Drill-Across Procedure

.

For the Drill-Across Procedure to work, it is very important that the common dimensions be the same in each database, both in terms of structure and content. In terms of structure, the dimensions presence in each star schema allows the common dimensions to be retrieved by each Step 1 query. In terms of content, the identical representation of dimension values enables merging of the intermediate result sets in Step 2.

Hope this was helpful information. I’d be glad to hear your experiences about how you have handled drill-across reports in your data warehousing environments.

In the next post, we will look at how the Drill-Across procedure can be implemented at data warehouse load times and in what scenarios doing this would be helpful.

Disclaimer: How a process is defined or must every fact table correspond to one business process or similar questions are out of scope of this post as there are multiple design strategies. For this post the focus is Drill-Across reports and how to handle them when we need results from multiple star schemas.

Written by Simran Jindal

May 18, 2012 at 1:50 pm

How does a Star Schema Work?

leave a comment »

Ever wondered how does a start schema really work? I did, when I was new to BI and Data warehousing. It is really important to understand how a star schema is actually used. The basic understanding allows a dimensional designer to make intelligent choices.

Let’s look at a very simple example of star schema for an Order Process.

A sample star schema

A sample star schema

A note to the OLTP designers – Don’t react because this star schema is not in the third normal form. Remember, that a dimensional model serves a different purpose from and ER model.

In the above diagram we have four dimensions (Product, SalesPerson, Customer and Day) and a fact table FactOrder. Fact table is the core of the star schema. In addition to presenting facts, the fact table includes the surrogate keys that refer to each of the associated dimension tables. The three facts in the FactOrder table include QuantityOrdered, Revenue and Cost. It also includes surrogate keys that refer to Products, Sales Persons, Customers and Order Dates. In this example, all the surrogate keys together are used to identify a unique row. There are cases where these foreign keys or surrogate keys are not sufficient to identify a unique row but it is out scope for this post.

Each row in the fact table stores facts at a specific level of detail. This level of detail is known as the fact table’s grain. Now that we have the basics ready, let’s look at how is the start schema actually used?

Most queries against a star schema follow a consistent pattern. One or more facts are requested, along with the dimensional attributes that provide the desired context. The facts get summarized in accordance with the dimensions present in the query. Dimension values are also used to limit the scope of the query allowing filtering the data to be fetched and aggregated. Any relational database is well equipped to respond to such a query. Let’s say, I want a report showing the revenue by product category and product name for the month of January 2009. The star schema above can give these details even though the revenue fact is also available at a lower level of detail. The SQL query below produces the required results, summarizing tens of thousands of rows.

The SELECT clause of the query indicates the dimensions that should appear in the query results (category and product), the fact that is requested (Revenue), and the manner in which it will be aggregated (through the SQL SUM() operation).

The FROM clause specifies the star schema tables that are involved in the query.

The WHERE clause serves two purposes. First, it filters the query results based on the values of specific dimension columns (month and year). It also specifies the join relationships between tables in the query. In terms of processing time, joins are among the most expensive operations the database must perform;

The GROUP BY clause specifies the context to which the fact will be aggregated by the relational database.

The ORDER BY clause uses dimensions to specify how the results will be sorted.

For readers new to dimensional design, there are two key insights to take away.

First, the star schema can be used in this manner with any combination of facts and dimensions. This permits the star to answer questions that may not have been posed during the design process.

Although facts are stored at a specific level of detail, they can be rolled up or summarized at various levels of detail. The reporting possibilities increase dramatically as the richness of the dimension tables is increased.

Second, note that the ability to report facts is primarily limited by the level of detail at which they are stored. While it is possible to aggregate the detailed fact table rows in accordance with any set of dimensions, it is not possible to produce a lower level of detail. If a fact table stores daily totals, for example, it cannot be used to look at an individual order. The importance of this limitation depends in part on your data warehouse architecture.

Of course, star schema queries can get much more complex than this example. But The objective of this blog post is to give you an understanding how the star schema is actually used and also, why it is very important to think through the grain of your fact tables.

Star schema query interpretation

Star schema query interpretation

Start schema results

Start schema results

Happy learning!

Written by Simran Jindal

May 14, 2012 at 11:24 pm

NonEmpty() versus NON EMPTY

leave a comment »

This post is in continuation to my MDX techniques series.  To continue further, it is important to understand the difference between NonEmpty() function and NON EMPTY keyword as we would be using them a lot in the future.

Both the NonEmpty() function and NON EMPTY keyword are used to reduce sets but they do it in completely different ways.

The NON EMPTY keyword removes empty rows or columns or both, depending on the axis on which it is used in the query.  This means three things:

  1. NON EMPTY keyword is used only on the axes.
  2. NON EMPTY keyword is highly dependent on the members on axis and their values in columns and rows.
  3. NON EMPTY keyword pushes the evaluation of cells to an early stage wherever and whenever it is possible. This procedure brings a reduced set on the axis giving a huge performance advantage.

The NonEmpty() function on the contrary  can be used anywhere in the query. The NonEmpty() function removes all the members from its first set where the value of one or more measures in the second set is empty. If there is no measure specified, the function is evaluated in the context of the current member. This means that the NonEmpty() function is highly dependent on the members in the second set, the slicer or the current co-ordinate.

Hope this clarifies the distinction between the two available ways of reducing result sets.

Written by Simran Jindal

March 26, 2012 at 9:50 pm

Posted in MDX

My favorite BASIC MDX Techniques

leave a comment »

I have thought of putting together some of my favorite MDX tricks for a while now. Here is the first set of some basic tricks. All of the code examples have been tested on SQL Server 2008 R2 Analysis Services and on Adventure Works DW 2008 R2 sample database which can be downloaded from here.

Axis Skipping
Technically, we are not allowed to skip the axis but by providing an empty set on them we can either get only rows in the result set or only columns in the result set. Here is an example:

SELECT
{ } ON 0,
{ [Employee].[Employee Department].[Employee].MEMBERS } ON 1
FROM
 [Adventure Works]

The result of this query shows nothing on rows and all the employee names in a column. You can also define a measure returning null, 0 or 1 or any other constant and use that instead of an empty set. I prefer to use the empty set.

Division by Zero
This is a very common task pretty much like in relational queries. A division by zero occurs when the denominator is null or zero and the numerator is not null. To prevent this error, we need to check if the denominator is 0 using the IIF() statement. Here is an example:

WITH
MEMBER [Date].[Calendar Year].[CY 2006 vs 2005] AS
[Date].[Calendar Year].[Calendar Year].&[2006] /
[Date].[Calendar Year].[Calendar Year].&[2005],
FORMAT_STRING = 'Percent'
SELECT
{
 [Date].[Calendar Year].[Calendar Year].&[2006],
 [Date].[Calendar Year].[Calendar Year].&[2005],
 [Date].[Calendar Year].[CY 2006 vs 2005] } *
 [Measures].[Sales Amount] ON 0,
{
 [Product].[Product Categories].MEMBERS
}
ON 1
FROM
 [Adventure Works]

The above query returns 1.#INF on some cells for the measure [CY 2006 vs 2005] which is a ratio of Sales Amount between year 2006 and 2005. You will also notice that 1.#INF (the formatted value for infinity) appears on rows where Sales Amount for CY 2005 is NULL.

To solve this problem we need to wrap up the measure in an outer IIF() statement. Here is how the query looks now

WITH
MEMBER [Date].[Calendar Year].[CY 2006 vs 2005] AS
IIF
(
 [Date].[Calendar Year].[Calendar Year].&[2005] = 0,
 null,
 [Date].[Calendar Year].[Calendar Year].&[2006] /
 [Date].[Calendar Year].[Calendar Year].&[2005]
),

FORMAT_STRING = 'Percent'
SELECT
{
 [Date].[Calendar Year].[Calendar Year].&[2006],
 [Date].[Calendar Year].[Calendar Year].&[2005],
 [Date].[Calendar Year].[CY 2006 vs 2005] } *
 [Measures].[Sales Amount] ON 0,
{
 [Product].[Product Categories].MEMBERS
}
ON 1
FROM
 [Adventure Works]

I highly encourage you to read Jeffery Wang’s article explaining the details of the IIF()
http://mdxdax.blogspot.com/2011/01/mdx-iif-execution-plans-and-plan-hints.html

Setting default member of a hierarchy
Default member of a hierarchy can be set in the following three ways:
using the DefaultMember property found on every attribute
role, on Dimension Data tab.
MDX script
Let’s take a look at an example:

WITH
MEMBER [Measures].[Default Sales Territory] AS
[Sales Territory].[Sales Territory Country].DefaultMember.Name
SELECT
{
 [Measures].[Default Sales Territory],
 [Measures].[Sales Amount]
 } ON 0
FROM
[Adventure Works]

If you run the above script, you will see “All Sales Territories” as Default Sales Territory. To change the default Sales Territory using MDX, open the Adventure Works DW 2008 cube, go to the Calculations tab and choose the Script View. Paste the following script just below the Calculate command

Alter Cube
 CurrentCube
 Update Dimension [Sales Territory].[Sales Territory Country],
 Default_Member = [Sales Territory].[Sales Territory Country].&[Canada];

Save and deploy. Run the previous query again and you will see Canada as the Default Sales Territory.

Caution: Be careful when you set default members for dimensions especially dimensions with multiple hierarchies as it can lead to unexpected, incorrect or no results at all.


Using NonEmpty() function

NonEmpty() function is a very powerful function because it helps to reduce the result sets in a very fast and efficient way unlike Filter() and Existing() functions. Let look at an example. If you run the following query, it returns a result set which shows Internet Sales per Customer and Date and shows only those combinations where the sales are greater than 1000 USD. This query takes a little over a minute and returns 716 rows

SELECT
 { [Measures].[Sales Amount] } ON 0,
Filter
(
 { [Customer].[Customer].[Customer].MEMBERS } *
 { [Date].[Date].[Date].MEMBERS },
 [Measures].[Internet Sales Amount] > 1000
) ON 1
FROM
 [Adventure Works]

Now if we introduce the NonEmpty() function and run the query again, it runs in under 3 seconds returning the same number of rows.

SELECT
 { [Measures].[Sales Amount] } ON 0,
NON EMPTY
Filter
(
 NonEmpty
 (
 { [Customer].[Customer].[Customer].MEMBERS } *
 { [Date].[Date].[Date].MEMBERS }
 ,[Measures].[Internet Sales Amount]
 ),
 [Measures].[Internet Sales Amount] > 1000
) ON 1
FROM
 [Adventure Works]

So why does this happen? The Cartesian product between the Customer and Date dimensions has several combinations. The Filter() function is not optimized to work in the block mode and therefore, a lot of calculations have to be made by the engine to evaluate the set on rows. NonEmpty() helps in reducing the sets and is very effective in multi-dimensional sets resulting from cross-join operations. Because the set is reduced, the query runs much faster. In this little example using NonEmpty() reduced the query execution time from 1 minute 4 seconds to just under 2 seconds.

Using Non_Empty_Behavior property
This property is used to tell the analysis services engine to NOT perform any calculations if the dependent expressions are empty. Setting this property correctly, can give tremendous performance improvements in query execution time as because the Analysis Services engine needs to scan through fewer number of cells. Let’s look at a measure called “Internet Average Unit Price”. The measure is quite self-explanatory. It calculates the Average Unit Price for online products. The Non_Empty_Behavior here tells the engine that if “Internet Unit Price” is empty then then don’t perform the calculation and just show an empty result.

Create Member CurrentCube.[Measures].[Internet Average Unit Price]

As [Measures].[Internet Unit Price]
 /
 [Measures].[Internet Transaction Count],

Format_String = "Currency",
Associated_Measure_Group = 'Internet Sales',
Non_Empty_Behavior = [Measures].[Internet Unit Price] ;

But in Analysis Services 2008, it is recommended to avoid the Non_Empty_Behavior altogether because the engine deals with non-empty cells in most cases. It can lead to degraded performance. A better approach is to do some performance testing with and without the script and make decisions based on the result whether to set this property or not.

Watch this space for more of MDX nuggets.

Written by Simran Jindal

March 15, 2012 at 8:28 pm

Posted in MDX

No more Vertipaq, it’s now called xVelocity in-memory technologies

with one comment

As almost everyone out there is aware of the launch of SQL Server 2012 last week. Along with the launch Microsoft announced xVelocity in-memory technologies. This is an umbrella name for all the in-memory technologies. This includes re-branding of the Vertipaq engine that runs inside PowerPivot and Analysis Services 2012 tabular to “xVelocity in-memory analytics engine” and the term xVelocity will be used to refer to the Column Store Indexes feature in SQL Server 2012 database.

All this re-naming and re-branding does cause some confusion but again we’ll get used to it :-) . I understand better with pictures and diagrams than plain text, so here is how I have sketched the new term “xVelocity”

xVelocity in-memory technologies

xVelocity in-memory technologies

Reference links:
xVelocity and Analysis Services
Introducing xVelocity in-memory technologies in SQL Server 2012 for 10-100X performance
SQL Server 2012 – RTM
SQL Server 2012 – Resources

Hope this helps!

Written by Simran Jindal

March 15, 2012 at 11:14 am

Posted in SQL Server 2012

#SQLPASS Business Intelligence Virtual Chapter Slides and References

with 5 comments

Thank you #SQLPASSBIVC for having me and thank you to everyone who attended my SQL Azure Reporting session today on the SQLPASS Business Intelligence Virtual Chapter.

The sample databases that I referred to during the presentation can be downloaded from SQL Server Database Product Samples.

Besides the reference links provided in the presentation, here is a great three-part series on SQL Azure Reporting Preview and Management Portal that should help you get started:

SQL Azure Reporting Services and Management Portal Walkthrough – Part 1
SQL Azure Reporting Services and Management Portal Walkthrough – Part 2
SQL Azure Reporting Services and Management Portal Walkthrough – Part 3

What I love about the SQL Azure Reporting is the ease of server provisioning in almost not time. When was it that setting up Windows Server, Database Server, Databases, Reporting Server and creating and publishing reports online could be done under an hour.  The service has evolved a lot since its first CTP and I am certain that most of the on-premise reporting features will be eventually incorporated in Azure Reporting in due course of time.

Here are the slides from the session today. Please feel free to post your questions or comments. I’d be happy to answer them and blog about them.

Written by Simran Jindal

March 8, 2012 at 10:40 pm

Posted in Azure, Cloud

Azure SDK Install – Failed to Install Dependencies

with 2 comments

This is a very quick post about installing the Azure SDK. If you are trying to install the Azure SDK using the Microsoft Web Platform Installer and have run into an error that says that the following four dependencies could not be installed.

  • IIS: HTTP Redirection
  • IIS: Logging Tools
  • IIS: Tracing
  • IIS:CGI

The simple solution to this error (instead of going through the install log files, getting fooled by the multiple resstarts required and the worst of all assuming that Microsoft Web Platform Installer is actually trying to install these dependencies) is to install these Web Server Roles yourself before installing the Azure SDK.

Since I was trying to install the Azure SDK on a Windows 7 machine, all you have to do is to go to Start->Control Panel ->Programs->Turn Windows Features On and Off

Turn Windows Features on or off

Turn Windows Features on or off

Click on Turn Windows features on or off and you will get another window with all the features currently installed on your machine.

Turn Windows features on or off

Turn Windows features on or off

Expand Internet Information Service, under it expand World Wide Web Services and under it expand Application Development Features, Common HTTP Features and Health and Diagnostics and select the four IIS dependencies in each of the categories as highlighted in the screen shot below:

IIS Web Roles

Once you are done installing these roles, restart the MicrosoftMicrosoft Web Platform Installer and install Azure SDK again. The installation problems will go through without any problems.

Azure SDK Install Complete

Hope this helps!

Written by Simran Jindal

March 5, 2012 at 12:45 pm

Posted in Azure

SQL Server 2012 RC0 – Tabular Projects – Creating Calculated Columns, Measures and KPIs

leave a comment »

I am getting back to continuing writing on SQL Server 2012 – Tabular Projects. A series that I started back in September 2011 and then dropped off. If you are new to Tabular Projects I encourage you to read my earlier posts about them here.

This blog post will walk you through creating calculated columns, measures and KPIs in Tabular Projects. For the tabular series, I have a hyper-v virutal machine with Windows 2008R2 and SQL Server 2012 RC0 (in Tabular Mode) installed. I also have the AdventureWorks sample database which you can download from here.

To create a new SSDT (formerly known as BIDS) project, go to Start -> All Programs->Microsoft SQL Server 2012 RC0 ->SQL Server Data Tools. On the File Menu select New Project. Select the category Business Intelligence, then the Sub-Category Analysis Services, as show in the screen shot below.

NewProject.png

Give whatever name you like to your project. Go to Model->Import From Data Source->Microsoft SQL Server. You will be prompted to create a connection to an SQL Server. I connected to the local tabular instance installed on my virtual machine and selected the AdentureWorksDWDenali database.
The next step will prompt you to select the user account. A windows account or the SQL Server service account. For my instance, I selected a windows account which had access to the database.
In the next step, choose the option “Select from a list of tables and views to choose the data to import”
In the next step, select the tables you want. For this post, I selected DimCustomer, DimDate and FactInternet Sales”. Click Finish. This will import the data into the tabular project.

Once the import is complete, the diagram view should look like the screen shot below. Note, that DimDate is a role-playing dimension and the appropriate relationships between the tables are already interpreted.

cmk_diagram.png

Now we are ready to create calculated columns, measures and KPIs. In order to that, you need to use the Data Analysis Expression Language (DAX). DAX is considered quite similar to the Excel formulas but in my personal opinion it does have its own learning curve. DAX works with the Vertipaq engine to quickly perform calculation on large volumes on in-memory data.

Calculated columns are the colums that you add to an existing table in the tabular model. The value of the column is calculated for each row at the time you create the column. It is recalcualted if the underlying data is refreshed. These values are static values that do not change when the end user slices the data in a PivotTable.

To create a calculated column in the tabular model, change the model designer so that it is in the Data View Mode. The Data View mode shows the data in an Excel-like sheet, with each table as a separate tab as shown in the screen shot below.

cmk_excel.png

Select the FactInternetSales tab. To create a calculated column, right click on any column and select insert column. At the top of the sheet is the formula bar where you enter the formula for the calculated column. For example, the screen shot below, shows a calculated column “Margin” in the FactInternetSales table. The formula entered in the formula bar is similar to how you add formulas in Excel. [Sales Amount] and [Total Product Cost] are columns that already existed in the FactInternetSales table. Also notice, that the values are calculated for each row.

cmkcc.png

A measure is a calculation based on the set of data being evaluated. Measures are often based on aggregate functions, such as COUNT and SUM. Measures are evaluated for each cell since the value in each cell is dependent on the row and column headers. As the end user applies different filters, the values are dynamically recalculated for the cells.

To create a measure you use the measure grid at the bottom of the table. You simply need to click on an empty cell in the measure grid and type the formula in the formula bar. The screen shot shows a [Internet Total Sales] measure that simply sums up the [Sales Amount] column.

cmk_m.png

The measures can be formatted in differnt formats for better presentation. You can find the different formats in the measure’s properties. In this case, I have selected currency and to get a nicely formatted measure value.

Key Performance Indicators are used to gauge performance against a pre-set goal and trend over a certain period. A KPI calculation includes a base value, target value, and a status threshold. The base value is the measure you are interested in analyzing, for example, the quarterly sales. The target value is the goal and you are comparing the base value to this goal, for example, quarterly sales quota. The status threshold defines how the comparison is interpreted and is often used with a graphic (red, yellow, green) to help users quickly see the performance.

To create a KPI, right-click on the measure you want to use for the base value and select Create KPI. This launches the Key Performance Indicator dialog. The following screen shot shows a KPI created using the [Internet Total Sales] measure as the base value and [Internet Total Product Cost] as the target value.

cmk_kpi.png

The next step in creating a KPI is to set the threshold values and choose an image to display for the values. The following screen shot shows the threshold values for the [Internet Total Sales] KPI. The threshold values here mean that if the [Internet Total Sales] are greater than or equal to 110% of the target value which in this case is [Internet Total Product Cost] then the company is doing good and the KPI is represented with the green indicator. [Internet Total Sales] between 100% and 110% of the [Internet Total Product Cost] is acceptable but is not considered a good indicator. Therefore, the KPI for such values will be represented as the yellow icon. [Internet Total Sales] less than or equal to the [Internet Total Product Cost] is not a good indicator and is not acceptable to the company. Therefore, the KPI for such values will be represented as the red icon.

cmk_kpi_threshold.png

You can process the tabular project and see the calculated column, the measure and the KPI in action by browsing the tabular project database on the SQL Server 2012 Analysis Services tabular mode instance.

Additional References:
DAX in the BI Tabular Model Whitepaper and Samples

Happy Friday!

Written by Simran Jindal

February 17, 2012 at 12:23 pm

MOLAP Storage Engine vs Vertipaq Engine

with 2 comments

With SQL Server 2012 we have a new way of working with Analysis Services. The Unified Data Model (UDM) in SQL Server 2005 and 2008 will be replaced by Business Intelligence Semantic Model (BISM) which is intended to bring together the relational and multidimensional models under a unified BI platform. BISM supports both traditional MULTIDIMENSIONAL models using the MULTIDIMENSIONAL ONLINE ANALYTICAL PROCESSING (MOLAP) storage engine and TABULAR models (relational data model consisting of tables and relations) using the Vertipaq engine.

The traditional MOLAP engine is optimized for OLAP using techniques such as pre-built aggregates, bitmap indexes and compression to deliver great performance and scalability. The Vertipaq engine is an in-memory column store engine that combines data compression and scanning algorithms to deliver fast performance. It does need any indexes or pre-build aggregations. Also, since all the aggreegations are done on the fly in memory, it also avoids costly I/O reads from disk storage.

Vertipaq engine (aka in-memory columnar database or column store engine) means that instead of considering rows of a table as a main unit of storage, it considers every column as a separate entity and data for every column in stored independently. Vertipaq has a very special storage architecture. For each column a dictionary of distinct values is created. The actual data in each column is stored as pointers to the dictionary. These pointers are bitmap indexes that reference the dictionary. Both the dictionary and bitmap indexes are highly compressed and are stored in memory and hard disk. When you are working with a tabular model, all the data is loaded into memory. The data on the hard disk is for backup purposes. The special storage architecture and data compression makes columnar storage ideal for ad-hoc reporting and analysis as data reads are blazing fast.

With all the benefits of columnar storage, there is an imporant catch. Columnar storage and its dictionary plus bitmap index architecture makes it very fast to query data for a single column. Querying multiple columns for a single row can be a computationally expensive affair. Also, columnar storage performs much better if the number of unique values in the dictionary is not too large. If you have large tables with high cardinality, reading such data could be very slow depening on the size of the tables, number of columns being read etc.

I highly recommend you to read an excellant article written by Marco Ruso
Optimizing High Cardinality Columns in Vertipaq

Hope this helps!

Written by Simran Jindal

February 16, 2012 at 11:44 am

PowerView – How does it affect the current landscape of SharePoint Reporting?

with 2 comments

This blog post is in continuation to my previous blog post on SharePoint 2010′s BI capabilities. In that blog post we looked at what is included in the SharePoint 2010′s BI stack. In this blog post, we will look at a tool called Power View, Microsoft’s new Data Visualization tool, which is part of the SQL 2012 release. It is quite interesting that Power View is part of the SQL Server 2012 release but needs SharePoint 2010 to run. In my opinion it is a big diversion compared to previous releases of SQL Server. On the other hand, it also shows the need for ready to use, end user data visulization tool.

Power View is a server side Silverlight based application which runs in a web browser. The intention of this product is to be able to present the data in a FUN and MEANINGFUL way using highly interactive visualizations, animations, smart querying and rich STORYBOARDING (a feature that allows users to embed charts and graphs in PowerPoint presentations making it possible to show the effects of data changes) presentation capabilities. With this tool we can forget all the complex multidimensional concepts like dimensions, measure groups etc and focus only on DATA PRESENTATION.

To be able to use Power View, you need SharePoint 2010 Enterprise Edition SP1 installed and SQL Server 2012 Reporting Services installed and configured in SharePoint integrated mode. Power View reports can be created against PowerPivot workbooks (which are based on BISM tabular model) OR BISM tabular model created and published in SQL Server Analysis Services. To know more about BISM and BISM tabular, I encourage you to read on of my earlier blog posts – So What is Business Intelligence Semantic Model or BISM Really?

The screen shot below shows an example of a Power View report featuring a number of visualizations, including tiles, cards, small multiples, a bubble chart, and a slicer:-

Power View Dashboard

In the screen shot above three main areas of the Power View designer are highlighted

  1. Ribbon (giving you the office 2010 like user experience)
  2. Canvas (the data presentation area)
  3. Field List (the top half is from the underlying data model and the bottom half shows the selections made for the visualization).

This ten minutes video SQLShorts: SQL 2012 Power View on SharePoint 2010 on channel 9 is a great resource to familiarize you with the visualization features of Power View.

Now, lets look at some fundamental questions:-

How does the availability of this tool affects what we currently have?

Is Power View a replacement for SQL Server Reporting Services tools like Report Designer in BIDS (Business Intelligence Development Studio) for Visual Studio, Report Builder or Performance Point Services in SharePoint?

Tools like Report Designer and Report Builder are used to create powerful data rich static reports by power users and IT pros for end users. The end user is just consuming the reports, they are not creating the reports themselves.

Performance Point Services is used for creating complex dashboards including KPIs and Scorecards, collecting data from multiple data sources. This tool is also used by advanced users.

Power View on the other is targeted at end users or business users. Business users have no idea about the underlying data models but they know the data, they know how they want their reports. Power View is aimed at presentation ready, Ad-hoc reporting or Self-Service BI so the business users can create compelling and interactive reports.

Power View is definitely a great complement to the SharePoint BI tool set making it possible to quickly VISUALIZE the data in a fun and interactive way. To me it provides a canvas for the creative mind to go wild with the data presentation capabilties.

Reference Links:

You can download the SQL 2012 BI Image at
http://www.microsoft.com/download/en/details.aspx?id=28802

SQL Server Virtual Labs
http://www.microsoft.com/sqlserver/en/us/learning-center/virtual-labs.aspx
Look at the Exploring Power View (SQL 140) lab and start playing around with Power View

Written by Simran Jindal

February 15, 2012 at 2:51 pm

Follow

Get every new post delivered to your Inbox.

Join 185 other followers