In our world of Big Data, not all data is formatted in our desired format. Oftentimes the solution to normalizing data seems simple but would require significant manual effort and time. In this post I will walk you through a SQL solution to normalizing a commonly found denormalized table. In the example below, we have multiple CustomerIDs associated with single Customer Names. The CustomerIDs are separated by commas and also have spaces thrown in here and there. The goal is to show this data so that a single CustomerID appears on each line with its corresponding Customer Name – and without superfluous spaces. This example would be easy to do manually, but imagine you have a table with hundreds or thousands of rows – it would take forever! Step 1) Create a table in SQL. Be sure to add an extra field for rowID as shown below. Step 2) Import your table into the newly created ‘Table1’. Step 3) Run the following procedure. If your table is separated by something other than a comma, change the @Separator variable. The first ‘While’ loop puts separates the CustomerIDs into new rows. The second ‘While’ loop removes superfluous spaces (use if necessary). WHILE @a <= @b
BEGIN
SET @a=@a
--If there's no comma, go to the next CustomerName
IF (
SELECT CHARINDEX(@Separator, CustomerID)
FROM Table1
WHERE rowID = @a
) = 0
BEGIN
SET @a= @a + 1
END
--If there is a comma, add a new row to the table with the same CustomerName, plus the extracted CustomerID.
--Then, delete that CustomerID from the original CustomerName / CustomerID list.
ELSE
BEGIN
SET @SeparatorCharNum = (
SELECT CHARINDEX(@Separator, CustomerID)
FROM Table1
WHERE rowID = @a
) END SET @a = 1
WHILE @a <= @b
BEGIN
IF (
SELECT CHARINDEX(@Space, CustomerID)
FROM Table1
WHERE rowID = @a
) = 0
BEGIN
SET @a = @a + 1
END
ELSE
BEGIN
UPDATE Table1
SET CustomerID = REPLACE(CustomerID, ' ', '')
WHERE rowID = @a
SET @a = @a
END
END SELECT * FROM Table1
GO Your final output will yield the following table: Michael Kovalsky
CustomerName
CustomerID
Walmart
32221, 492
Microsoft
27346990, 472
Exxon
1235
Target
5234, 28372936
Bank of America
5231
BMW
2839264,28284
CREATE TABLE Table1 (
CustomerName VARCHAR(255)
,CustomerID VARCHAR(255)
,rowID INT IDENTITY(1,1) PRIMARY KEY
)
DECLARE @Separator VARCHAR(1) = NULL
IF @Separator IS NULL SET @Separator = ','
DECLARE @SeparatorCharNum INT
DECLARE @a INT= 1
DECLARE @Space VARCHAR(1) = ' '
DECLARE @b INT= (SELECT MAX(rowID) FROM Table1)
INSERT INTO Table1
SELECT CustomerName, LEFT(CustomerID, @SeparatorCharNum - 1) AS CustomerID
FROM Table1
WHERE rowID = @a
UPDATE Table1
SET CustomerID = SUBSTRING(CustomerID, @SeparatorCharNum + 1, 255)
WHERE rowID = @a
END
SET @a = @a
CustomerName
CustomerID
rowID
Walmart
492
1
Microsoft
472
2
Exxon
1235
3
Target
28372936
4
Bank of America
5231
5
BMW
28284
6
Walmart
32221
7
Microsoft
27346990
8
Target
5234
9
BMW
2839264
10
Big Data Normalization
Published on 07/16/2013
Authored by admin