Monday, February 29, 2016

Slowly Changing Dimension Type II: My Simple Personal Alternative To Merge

I've been working with SQL Server's MERGE statement for some time now, and I truly believe it's a great feature. Alas, the world out there is not so perfect. You'll run into a few hurdles sooner than you know it, ranging from cross-server limitations to deadlocks and performance degradation. The list goes on and on...
Google: sql server merge statement limitations

I found myself having to come up with a custom solution. Yes, a lot has been written on SCD Type II. Reinvent the wheel much???

I will create two tables...
1. dbo.customer: the transactional (source) table
2. dbo.dim_customer: the dimension table

create table dbo.customer
(
customer_id int identity(1,1) not null,
customer_name varchar(50) null,
city varchar(50) null,
favorite_car varchar(50) null
)

create table dbo.dim_customer
(
customer_id int null,
customer_name varchar(50) null,
city varchar(50) null,
favorite_car varchar(50) null,
row_is_current int not null, /*1 = current, 0 = expired, -1 = temporary marker used mid-load, -999 = deleted from source*/
row_effective_date date not null,
row_expire_date date not null /*'99991231' while the row is current*/
)
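
I'm deliberately leaving both tables without keys or indexes to keep the demo simple. In a real warehouse I'd at least consider something like the following (my own suggestion, not part of the pattern itself), since every step below joins on customer_id and filters on row_is_current:

create clustered index cix_dim_customer
  on dbo.dim_customer (customer_id, row_is_current)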


Step 1: seed the transactional table with some test data.
truncate table dbo.customer
dbcc checkident ('dbo.customer', reseed,1)
;
with
cte_customers as
(
select 'john' customer_name,'cape town' city,'jaguar' favorite_car union all
select 'paul' customer_name,'johannesburg' city,null union all
select 'peter' customer_name,'pretoria' city,null union all
select 'matthew' customer_name,'durban' city,null
)
insert into dbo.customer(customer_name,city,favorite_car)
select customer_name,city,favorite_car
from cte_customers

select * from dbo.customer
 [img1]

/*load variables: @load_time stamps this run, @not_expired is the open-ended expiry*/
declare @not_expired date
declare @load_time date
set @not_expired = cast('99991231' as date)
set @load_time = cast(getdate() as date)

/*flag current rows whose attributes changed in the source; -1 is a temporary marker*/
update t set
  t.row_is_current = -1,
  t.row_expire_date = @load_time
from dbo.customer s
  inner join dbo.dim_customer t on s.customer_id = t.customer_id
  and t.row_is_current = 1 and t.row_expire_date = @not_expired
where isnull(t.city,'') <> isnull(s.city,'') /*isnull so null-to-value changes are caught too*/
  or isnull(t.favorite_car,'') <> isnull(s.favorite_car,'')


/*insert changes where row_is_current is -1*/
insert into dbo.dim_customer (customer_id, customer_name, city, favorite_car, row_is_current, row_effective_date, row_expire_date)
select s.customer_id, s.customer_name, s.city, s.favorite_car, 1, @load_time, @not_expired
from dbo.customer s
  inner join dbo.dim_customer t on s.customer_id = t.customer_id
where t.row_is_current = -1 and t.row_expire_date = @load_time

/*update changes set row_is_current to 0 where row_is_current is -1*/
update t set
  t.row_is_current = 0,
  t.row_expire_date = @load_time
from dbo.dim_customer t
where t.row_is_current = -1 and t.row_expire_date = @load_time

/*new*/
insert into dbo.dim_customer (customer_id, customer_name, city, favorite_car, row_is_current, row_effective_date, row_expire_date)
select s.customer_id, s.customer_name, s.city, s.favorite_car, 1, @load_time, @not_expired
from dbo.customer s
  left join dbo.dim_customer t on s.customer_id = t.customer_id
where t.customer_id is null

/*
deleted rows - how does everyone deal with these, anyway?
*/
;
with
cte_deleted_records as
(
select t.customer_id, t.row_is_current, t.row_expire_date /*expire date included so the optional set below compiles*/
from dbo.customer s
  right join dbo.dim_customer t on s.customer_id = t.customer_id
where s.customer_id is null and t.row_is_current = 1
)
update dels
set dels.row_is_current = -999
--  ,dels.row_expire_date = @load_time /*maybe optional*/
from cte_deleted_records dels

select * from dbo.customer
select * from dbo.dim_customer
--truncate table dbo.dim_customer

[img2]
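
A quick note before we simulate changes in the source: the five statements above run independently, so a failure halfway through a load can leave rows stranded with the -1 marker. In production I'd wrap the whole thing in one transaction; a minimal sketch (the step comments stand in for the statements above):

begin try
  begin transaction;

  /*1. flag changed rows with -1 and expire them*/
  /*2. insert the new current versions of the changed rows*/
  /*3. flip the -1 markers to 0*/
  /*4. insert brand-new customers*/
  /*5. flag rows deleted from the source with -999*/

  commit transaction;
end try
begin catch
  if @@trancount > 0 rollback transaction;
  throw /*re-raise so the calling job sees the failure; needs sql server 2012+*/
end catch

Now let's make some changes to the source table: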


update dbo.customer
set  favorite_car = 'jaguar'
where customer_id = 1

update dbo.customer
set  favorite_car = 'mazda',
  city = 'bloemfontein'
where customer_id = 2

insert into dbo.customer(customer_name,city,favorite_car)
  values ('mary','port elizabeth','ford')

select * from dbo.customer
delete from dbo.customer where customer_id = 1

Re-run the load steps above and the dimension picks up the changed, new and deleted rows:
[img3]
[img4]
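
That's really the payoff of carrying row_effective_date and row_expire_date around: you can ask what the dimension looked like on any given day. A small point-in-time sketch (the date is just an example, and the filter skips the -999 delete markers):

declare @as_of date
set @as_of = cast('20160215' as date)

select customer_id, customer_name, city, favorite_car
from dbo.dim_customer
where row_is_current in (0, 1) /*ignore rows flagged as deleted in the source*/
  and row_effective_date <= @as_of
  and row_expire_date > @as_of /*expiry is exclusive, so a row changed on @as_of doesn't show up twice*/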


Thursday, May 21, 2015

Twenty Signs Your SSAS Implementation Will Fail Even If You Did Hire The Right SSAS Consultant

This post first appeared here on Thomas Ivarsson's blog...and it was initially titled: "Twenty signs that you have hired the wrong SSAS consultant" 

These are some of my observations during my years with Analysis Services since OLAP Services in 1999.

  1. All dimensions have parent-child hierarchies. There is absolutely no excuse for this. Use parent-child hierarchies only when you have no other option.
  2. The data source view has named queries with a lot of transformations, column renames, derived tables, subqueries and unions in the same mess. The ETL system has moved into the cubes, and the source data warehouse (if there is one) is not properly structured. It is a clear indication that the back-end data warehouse is a mess.
  3. The measure names are still not right, so all measure group measures have visibility = false and are replaced by calculated measures.
  4. The dimension sources consist of 5 to 15 views joined together and have several layers of views on top of each other, the last with different naming standards. This makes it very hard to follow the ETL flow back to the source tables.
  5. The dimension keys are text based, not integers.
  6. All dimensions have no attribute relations in the user or natural hierarchies.
  7. The natural hierarchies in the source dimension tables or views have undetected many-to-many relations that will make the distribution of data across dimension members random. This can happen when you have hidden relations in the source table between attributes that are not primary keys.
  8. The cube dimensions have a lot of report hierarchies, that is, hierarchies where the attributes do not have one-to-many relations from the top level down to the subordinate attributes. Color and size of a product are an example.
  9. The cube has no aggregations.
  10. The cube consists of more than 15 dimensions in a single measure group, and many of the dimensions consist mainly of single attributes with 1-10 members and no hierarchies.
  11. The cube has 50-200 measures in a single measure group. Sometimes dimensions move into measures, like actual, budget and forecast measures.
  12. The cube has 5000 rows of MDX code when you copy and paste it into Word and use 8 as the font size. To be fair, you can be forced to do this by business requirements, but you should warn about it from a maintenance perspective. It is very hard to debug.
  13. Measures have null for many dimensions. This is a clear sign that the data model behind the cube is not properly designed.
  14. The cube has no check for division-by-zero errors in calculated measures, so users will see ugly error messages in the table cells (a null-safe division sketch follows this list).
  15. The consultants do not know what UAT is, have never seen a test protocol and do not know how to do tests while developing the cube.
  16. The consultant thinks that Kimball and Inmon are two soccer teams.
  17. The consultants do not know how to write documentation for a cube except dropping all the MDX scripts into tables in Word.
  18. Dimension key errors while processing have been set to ignore errors and discard records in a production system. This is a sign that orphan dimension foreign keys in the fact table are not being handled. SSAS has a great feature to take care of missing dimension keys. With the discard approach the cube ends up with only a subset of the data in the source table.
  19. The cube has 4-5 customer dimensions, not one customer dimension with 4-5 natural hierarchies. This is a migration from AS 2000 without a review of the cube model despite all the changes that came in SSAS 2005 and later.
  20. There are no partitions in the cube when the fact table has 50-100 million records.
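
On point 14 the fix is cheap. Here is the idea in T-SQL terms rather than MDX (the same principle applies to calculated measures: turn a zero denominator into null instead of letting the division error through):

select sales_amount / nullif(order_count, 0) as avg_order_value
from (values (100.0, 4), (250.0, 0)) as fact_rows(sales_amount, order_count)
/*the second row returns null instead of a divide-by-zero error; fact_rows is made-up demo data*/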

Tuesday, March 11, 2014

Compare Data from Two Tables

There comes a time when you need to compare data from two structurally identical tables. When that time comes, you can either go the Excel route (I respect Excel), ask your boss to lend you the company credit card and purchase a commercial package from the likes of RedGate®, or you could just code the darn thing.

I must say, however, that I've looked at many different options: I've tried SSIS, SQL Server Data Tools, RedGate® Data Compare, and SQL Server's lesser-known tablediff.exe. None of these actually gave me the results I needed, or I just wasn't happy with the output format.

I found a nifty little piece of code written by Roman Pekar, and it sort of made the job easier. I improved it a little and added a bit of dynamism to the script. Now you can pass in any two identical tables and let SQL Server compare the data for you.

Without further ado, here is the code...Please feel free to improve it. I tried to comment it as much as I could.
create procedure table_compare
@first_table varchar(20),
@second_table varchar(20),
@id_column varchar(20),
@column_list varchar(max) = null
as
/********************************************************************************************************************************
*********************************************************************************************************************************
*********************************************************************************************************************************
****** Author: Thato Mantai
****** Purpose: Compare data from two identical tables
****** Date: 11 March 2014
****** Version: 1.0
****** Credits: Roman Pekar (http://careers.stackoverflow.com/romanpekar , http://ru.linkedin.com/in/romanpekar/)
*********************************************************************************************************************************
*********************************************************************************************************************************
*********************************************************************************************************************************/
declare @xml_first_table xml   --xml structure to hold data from the first table
declare @xml_second_table xml   --xml structure to hold data from the second table
declare @sql_all_data varchar(4000)  --dynamic sql for storing unique records from both tables
declare @first_table_columns varchar(500) --stores all columns from first table
declare @second_table_columns varchar(500) --stores all columns from second table
declare @err_table_not_found varchar(200)
set @err_table_not_found = 'Sorry, I can''t find specified table, "[plchldr_table_name]" in the current database.'
declare @err_key_column_not_found varchar(200)
set @err_key_column_not_found = 'Sorry, I can''t find specified key column, "'+@id_column+'" in table "[plchldr_table_name]".'

if object_id(@first_table, 'u') is null
begin
  set @err_table_not_found = replace(@err_table_not_found,'[plchldr_table_name]',@first_table)
  raiserror(@err_table_not_found,18,1)
  return -1
end
if object_id(@second_table, 'u') is null
begin
  set @err_table_not_found = replace(@err_table_not_found,'[plchldr_table_name]',@second_table)
  raiserror(@err_table_not_found,18,1)
  return -1
end

if not exists(select 1 from sys.columns where object_name(object_id) = @first_table and name = @id_column)
begin
  set @err_key_column_not_found = replace(@err_key_column_not_found,'[plchldr_table_name]',@first_table)
  raiserror(@err_key_column_not_found,18,1)
  return -1
end
if not exists(select 1 from sys.columns where object_name(object_id) = @second_table and name = @id_column)
begin
  set @err_key_column_not_found = replace(@err_key_column_not_found,'[plchldr_table_name]',@second_table)
  raiserror(@err_key_column_not_found,18,1)
  return -1
end

/*let's start with a bit of housekeeping*/
if object_id('table_compare_all_data', 'u') is not null drop table table_compare_all_data
if object_id('temp_table_for_xml_1', 'u') is not null drop table temp_table_for_xml_1
if object_id('temp_table_for_xml_2', 'u') is not null drop table temp_table_for_xml_2

/*get all columns from the first and second tables; the else branch (an explicit @column_list) still needs to be coded. suppose i could have used temp tables here...grrr*/
if @column_list is null
begin
  select @first_table_columns = stuff((
    select distinct top 100 percent '],[' + t.name
    from sys.columns as t
    where object_name(object_id) = @first_table
    order by '],[' + t.name
    for xml path('')), 1, 2, '') + ']'

  select @second_table_columns = stuff((
    select distinct top 100 percent '],[' + t.name
    from sys.columns as t
    where object_name(object_id) = @second_table
    order by '],[' + t.name
    for xml path('')), 1, 2, '') + ']'

  if @first_table_columns <> @second_table_columns
  begin
    raiserror('The two tables do not contain the same columns',18,1)
    return -1
  end
end

/*i suppose for this part i could have used "../ *[local-name()=sql:variable("@varname")]", but i ended up with errors. will look into this later*/
set @first_table_columns = replace(@first_table_columns,'['+@id_column+']','['+@id_column+'] as id_column')
set @second_table_columns = replace(@second_table_columns,'['+@id_column+']','['+@id_column+'] as id_column')
print @first_table_columns
print @second_table_columns

/*build dynamic sql to store all data*/
set @sql_all_data =
  'with cte_all_data as ' +
  '( ' +
    'select ' + @first_table_columns + ',' + char(39) + 'first_table' + char(39) + ' source_table from [' + @first_table + '] ' +
    'union all ' +
    'select ' + @first_table_columns + ',' + char(39) + 'second_table' + char(39) + ' source_table from [' + @second_table + ']' +
  ')' +
  ' select * into table_compare_all_data from cte_all_data'
execute(@sql_all_data)

/*clean up the comma separated column list*/
if @column_list is null
begin
  set @first_table_columns = replace(@first_table_columns,'['+@id_column+'] as id_column','id_column')
  set @second_table_columns = replace(@second_table_columns,'['+@id_column+'] as id_column','id_column')
end

declare @sql_for_temp_1 varchar(500)
declare @sql_for_temp_2 varchar(500)

/*rows in one table that differ from (or are missing in) the other; the tops cap how many differing rows get pulled into xml*/
set @sql_for_temp_1 = ' select top 5 * into temp_table_for_xml_1 from (' +
  ' select ' + @first_table_columns + ' from table_compare_all_data where source_table = ' + char(39) + 'first_table' + char(39) +
  ' except ' +
  ' select ' + @second_table_columns + ' from table_compare_all_data where source_table = ' + char(39) + 'second_table' + char(39) + ' ) as t1'
set @sql_for_temp_2 = ' select top 15 * into temp_table_for_xml_2 from (' +
  ' select ' + @second_table_columns + ' from table_compare_all_data where source_table = ' + char(39) + 'second_table' + char(39) +
  ' except ' +
  ' select ' + @first_table_columns + ' from table_compare_all_data where source_table = ' + char(39) + 'first_table' + char(39) + ' ) as t1'
execute(@sql_for_temp_1)
execute(@sql_for_temp_2)

select @xml_first_table = (select * from (select * from temp_table_for_xml_1) as a for xml raw('xml_first_table'))
select @xml_second_table = (select * from (select * from temp_table_for_xml_2) as a for xml raw('xml_second_table'))
--print 'here'
--print @sql_all_data --print @sql_for_temp_2
--select * from table_compare_all_data
--select * from temp_table_for_xml_1
--select * from temp_table_for_xml_2
;with cte1 as
(
  select
    t.c.value('../@id_column', 'nvarchar(20)') as id,
    t.c.value('local-name(.)', 'nvarchar(128)') as name,
    t.c.value('.', 'nvarchar(max)') as value
  from @xml_first_table.nodes('xml_first_table/@*') as t(c)
),
cte2 as
(
  select
    t.c.value('../@id_column', 'nvarchar(20)') as id,
    t.c.value('local-name(.)', 'nvarchar(128)') as name,
    t.c.value('.', 'nvarchar(max)') as value
  from @xml_second_table.nodes('xml_second_table/@*') as t(c)
)
select
  isnull(c1.id, c2.id) as key_value,
  isnull(c1.name, c2.name) as column_name,
  isnull(c1.value, 'value not found in table ' + @first_table) as value_in_first_table,
  isnull(c2.value, 'value not found in table ' + @second_table) as value_in_second_table
from cte1 as c1
  full outer join cte2 as c2 on c2.id = c1.id and c2.name = c1.name
where not
(
  c1.value is null and c2.value is null or
  c1.value is not null and c2.value is not null and c1.value = c2.value
)
--exec table_compare 'user1','user2','uid'
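
And a quick usage sketch, with made-up tables whose names mirror the commented call above:

create table user1 (uid int not null, full_name varchar(50) null, city varchar(50) null)
create table user2 (uid int not null, full_name varchar(50) null, city varchar(50) null)

insert into user1 values (1, 'john', 'cape town'), (2, 'paul', 'johannesburg')
insert into user2 values (1, 'john', 'durban'), (2, 'paul', 'johannesburg')

exec table_compare 'user1', 'user2', 'uid'
/*expected output: one row - key_value 1, column_name city,
  'cape town' under value_in_first_table and 'durban' under value_in_second_table*/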

Monday, March 10, 2014

How Can I Find How Much Space Each Table Uses In Powerpivot

If you are ever interested in finding out how much space each table uses in your PowerPivot model, this post might be useful.
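
One trick I know of is the Analysis Services DMVs. Connect to the PowerPivot/SSAS instance from SSMS, open an MDX query window, and run something like this (a sketch; DISCOVER_OBJECT_MEMORY_USAGE is a standard SSAS DMV, but the DMV dialect only supports a limited subset of SQL, so any grouping has to happen client-side):

select object_parent_path,
  object_id,
  object_memory_nonshrinkable
from $system.discover_object_memory_usage
order by object_memory_nonshrinkable desc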

Handling Slowly Changing Dimension Type II in PowerPivot

I think this is a very interesting, simple yet effective approach to handling SCD Type II in a BISM environment. It bypasses the need to follow the ofttimes long-winded ETL process, where you'd usually have to store all your historical data in relational tables. See Alberto Ferrari's post here: PowerPivot and the Slowly Changing Dimensions

Thursday, September 12, 2013

Could not load file or assembly 'Microsoft.AnalysisServices.SharePoint.Integration'

A lot of articles have been written about this particular issue. It seems like a really big issue, with errors popping up everywhere, but in all honesty, its bark is worse than its bite. I'll update this post later with a few online resources which I've found very useful while trying to solve this error.

The error usually goes something like this...

Error: Could not load file or assembly 'Microsoft.AnalysisServices.SharePoint.Integration, Version=10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91' or one of its dependencies. The system cannot find the file specified.
Option 1

Open up a command prompt with elevated permissions and run this command:

RD /S C:\Windows\Assembly\GAC_MSIL\Microsoft.AnalysisServices.SharePoint.Integration

In most cases you will get an Access Denied error. That is because w3wp.exe is holding onto the file. You will need to stop IIS (for example with iisreset /stop), run the command again, then start IIS back up.

Option 2


1. After the installer fails, disable the assembly cache viewer.
2. Go into the assembly cache and find the file Microsoft.AnalysisServices.SharePoint.Integration.dll
3. Copy this dll to the bootstrap location: C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\SQLServer2008R2\x64
4. Remove the PowerPivot instance from Programs and Features .. SQL Server 2008 R2 Uninstall
5. Enable the assembly cache viewer and copy the Microsoft.AnalysisServices.SharePoint.Integration.dll into the GAC.
6. Run the installation process again to install the PowerPivot instance.

OLEDB vs ADO.Net error in SSAS Tabular Models

OLE DB or ODBC error: A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.; 08001; Client unable to establish connection; 08001; Encryption not supported on the client.; 08001.

So I come into the office pretty early, with a spring in my step, definitely amped up for the day ahead. I mean, who wouldn't be, especially if you are working with Tabular Models in SSAS? That aside... ofttimes the excitement can be short-lived, as was my experience today. I try to load my table into the designer so I can start performing some DAX magic. Let's agree, DAX is magic!!! Comprendo? Good.

When I go into table properties and click "OK", I get an OLEDB or ODBC error.

The error reads as such:
OLE DB or ODBC error: A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.; 08001; Client unable to establish connection; 08001; Encryption not supported on the client.; 08001.
A connection could not be made to the data source with the DataSourceID of 'c11c7644-9785-4e7f-89bb-e227848edd62', Name of '<Data Source Name>'.

An error occurred while processing the partition 'PARTITION_NAME_b799baff-060f-4e30-8dcc-bd8aff962b25' in table 'PARTITION_NAME_b799baff-060f-4e30-8dcc-bd8aff962b25'.
The current operation was cancelled because another operation in the transaction failed.




This error is caused by the ADO.Net provider, so I had to change my provider from SQL Server Native Client to the OLE DB Provider for SQL Server.

I don't know why this happens, but it does. If at first the change doesn't seem to work, just close and reopen the project, because connections are cached. You already know how to change providers, but here goes anyway...