1. Background
The
purpose of this article is providing a way of managing duplicate string data in
such a case when users will enter duplicate data as part of their job. Best
example of such kind of case is giving feedback for a product or service.
Whenever we give feedback for a product, many people may write “Product Quality is good”, for Service, some people can write for
service- “Service is Awesome” or “Great Service” or “Great Help” etc.
If we are product or service company and after delivering our product\service,
if we send invitation for survey feedback to 1 million people then there is chance
to get same answer(string) from many people. Storing the redundant data in
tables is not a good practice and redundant data may confuse us, it may reduce
performance and can create storage issue.
If we see
the same case in Data Warehousing then we need a Feedback Dimension that will
have FeedbackID and FeedbackText. Multiple FeedbackIDs for same Feedbacktext
will do redundancy in dimension and may create confusion while loading Fact
table.
We are
going to manage this condition by using CHECKSUM
function that returns the checksum value computed on data and a unique
clustered index over that checksum value column to restrict redundancy.
2. Steps to restrict redundancy using CHECKSUM function:
i.
Create the table in such a way that ID column should be
computed based on Text column using CHECKSUM function. Below is the example showing
a table named DimFeedback having FeedbackNote and FeecbackID column whose value
computes based on FeedbackNote for the particular row using CHECKSUM FUNCTION :
CREATE TABLE [dbo].[DimFeedBack (
CREATE TABLE [dbo].[DimFeedBack (
[FeedBackID] AS (CHECKSUM([FeedBackNote])),
[FeedBackNote] [NVARCHAR](MAX) NULL
)
ii.
Now create a unique non clustered index on FeecbackID
column to maintain unique FeedbackIDs in the table. (Optional step)
CREATE UNIQUE NONCLUSTERED INDEX [FeedbackID] ON
[dbo].[DimFeedBack] ([FeedBackID] ASC ).
CREATE UNIQUE NONCLUSTERED INDEX [FeedbackID] ON
[dbo].[DimFeedBack] ([FeedBackID] ASC ).
iii. Finally
insert the data in such a way that system first should check whether checksum
value for the input string is avalable in table or not. IF not avalable then
insert the data else do not insert the data.
DECLARE @FeedBack NVARCHAR(500)
SET @FeedBack =
'Thank you so much for your great service.'
IF NOT EXIST (SELECT 1 FROM FeedBack (NOLOCK) WHERE FeedBackID = CHECKSUM(@FeedBack))
BEGIN
INSERT INTO FeedBack
(FeedBackNote)
SELECT @FeedBack
PRINT 'Feedback
has been sucessfully submitted.'
END
ELSE
BEGIN
PRINT 'Feedback
string is already available in the system.'
END
3. Conclusion
By using above steps,
we can avoid data redundancy in the table using CHECKSUM() function
in SQL Server..
in SQL Server..
Generally speaking, SQL is an easy language to learn. If you understand programming and already know some other languages, you can learn SQL in a few weeks. If you're a beginner, completely new to programming, it can take longer. best Udemy courses to learn SQL
ReplyDelete