1.
Background
The
purpose of this document is to provide details on importance of aggregation, how
effectively we can design aggregations, performance monitoring & usage
based aggregation.
2.
What is Aggregation and why
build aggregation
The aggregations
are the high level summaries of data. It is analogous to calculated value by
GROUP BY statements of T-SQL with some attributes. SQL Server pre-aggregates
the measures with respect to dimension attributes at some extent instead of
executing the query runtime. Pre-aggregation improve the query performance.
Following
are the reasons to build the aggregation:
I.
Performance is the top priority of all the OLAP system
and having aggregated data will helpful to easily retrieve the data. e.g. if we
aggregate the yearly data based on days in a year. Cube has to aggregate a
large no. of columns. In place of days wise, if we have aggregated data up to
months or quarter, we will get yearly aggregated data more quickly.
II.
Aggregations improve query performance by limiting the
amount of partition data that needs to be scanned to answer a query. Instead of
querying the fact level data of the partition, SSAS will query the aggregation
and because the amount of data in the aggregation is smaller, the query can be
answered more succinctly
3.
How to do logging and
Performance monitoring
1.
To enable the logging, go to analysis service instance
where your cube exists, Right click on the server èProperties
By default, Analysis
Services does not log queries into the query log table. To log queries, you
need to set Analysis Services properties. Change the below properties to enable
logging:
I.
Create QueryLog table è True will create a table
named OLAPqueryLog in DB based on connection string in
QueryLogConnectionstring.
II.
QueryLogConnectionstring will contain the ConnectionString
e.g Server name & DB name to create log table
III.
QueryLog sampling: The default value for this property is
10, meaning that 1 out of every 10 server queries is logged. I set to 1 to log
all the queries hit to cube.
2.
To Check whether table is created or not, connect the
OLTP server & DB based on ConnectionString provided and check whether table is available or not:
Table is
created.
3.
Browse the cube and check whether query information is
logging into table or not. Start the SQL server profiler also to see the query
performance:
Checking the logging information:
I
have browsed the cube for End Customer revenue fact and logging information
with fact name & query execution duration is properly logged in the
OlapQueryLog table.
By
using OLAPQueryLog table, we can get information about facts on which users are
mostly interested and frequently called facts
4.
Check the profiler for the logged information:
4.
How to design aggregation
Based on the above
statistical data, if we feel that query is taking time and performance of the
query can be improved. In this case, we can go for the designing aggregation
for Fact EndCustomerRevenue.
I.
For designing the aggregation, open the cube solution and
go to Aggregation tabè End Customer Revenue
fact. Here it is showing ‘unassigned aggregation design’, it means there is
explicitly aggregation designed for this fact.
II.
Select ‘Design Aggregations’ to design aggregation. It
will start the aggregation design wizard
Click next
III.
It will open Review Aggregation Usage window. Here if we
want to exclude any dimension attributes from aggregation, we can remove it to
avoid unnecessary aggregation.
The options are Default, Full, None, and Unrestricted. Full means that the
attribute must be included in every aggregation, while None means that the
attribute will not be in any aggregation. Default means that the SSAS engine
will examine the attribute and associated relationships and try to determine
whether or not to include that attribute in aggregations using a set of
internal rules. These rules are very restrictive and a lot of attributes get
excluded from consideration. Unrestricted means that the attribute will be
considered without regard to the internal set of rules. For the first creation
of aggregations, I just leave all attributes set to Default. We will keep analyze our cube and accordingly
will change the usage configuration of the aggregation. Click next
IV.
It will open the object count window, either we can enter
the estimated count value or wizard will automatically count the row. Entering
count values manually will be good in case of bulk amount of data because
wizard may take lot of time to count. While specifying the value manually, make
sure value should be nearer to exact count because the Aggregation Design
Wizard uses the object counts to estimate storage requirements.
Click on count to get the
estimated count
V.
Next is Aggregation option. It will provide an option to
design aggregation with respect to storage space or performance point of view.
In designing our cube, you will have to balance the
storage needs of the aggregation tables against the speed and performance of
the queries. The three approaches to achieving this balance are as follows:
·
Set the storage size and let Microsoft® SQL Server Analysis
Services determine which aggregations to store. This approach works well when
you have limited storage space.
·
Set the percentage of performance gain and let the
necessary aggregation tables take as much storage space as they need.
·
I click Stop: Manually determine the best balance by
watching the progress of the Performance vs. Size graph.
I am selecting Performance gain because for many
applications, performance is more important as compared to storage space and
Cube processing time.
VI.
Click start to design aggregation
It creates 274 aggregations
with occupying 446.5MB disk space. Save the aggregation name. We can either
deploy the changes now or we can save only. Click finish.
VII.
In the Cube aggregation tab, we can see the designed
aggregation:
5.
Monitor the performance of the
cube after aggregation design
Deploy the designed aggregation and
check the query performance by using profiler:
Fact End
Customer Revenue:
Fig showing query execution duration
before designing aggregation- 7952 Secs
Fig showing query execution duration
after designing aggregation- 4000 Secs
Now the same query that we executed in
section 3.4 taking only 4000 seconds
Net New
Quota
Fig showing the query execution
duration before designing aggregation- 1484 secs
Fig showing the query execution
duration after designing aggregation- 1484 secs
True up
Fig showing the query execution
duration before designing aggregation- 1344
Fig showing the query execution
duration after designing aggregation- 953
6.
Usage based optimization
The problem with the
Aggregation Design Wizard is that it assumes all queries are equallylikely. Therefore,
some of the aggregations it designs are likely to be useful, but some will not
be. More importantly, there will be aggregations you need that are not created
by the Aggregation design wizard. To create those aggregations that are
required but aggregation wizard does not created, we can use usage based
optimization.
Click on the advance view
under aggregation view:
It will display the aggregation designed with
respect to dimension attributes:
Based on the logging information, if
we found that we need some more aggregation then we can check the box against
the associated dimension attributes. If
we feel some of the aggregations are not required and wizard is creating
unnecessary, in this case, we can uncheck it. Removing unnecessary aggregations
will helpful to manage disk space for the partitions as well as it reduce the
cube process time.
7.
Disadvantages of more no. of
aggregations
i.
Storage space required for the aggregated table is more.
ii.
Processing time for the cube is more.
8. Conclusion
By using
the above steps, we can understand how to design aggregation to improve cube
query performance and based on user’s action, we can do optimize the
aggregation to provide best performance and adequate disk storage to the cube.
----------------------------------------End
of Document---------------------------------------------