SQL Remove Duplicates without Using Temp Table or Modifying Existing Table

Up until now, the way I understood to "fix" a table with duplicate data and no primary key or unique key is either:
  1. Modify the existing table, add a column and populate with unique key data, then delete off that unique key, or
  2. SELECT DISTINCT * INTO [a temp table] FROM [current table], TRUNCATE FROM [current table], INSERT INTO [current table] SELECT * FROM [a temp table]
I am not saying one way is better than the other.  In fact, if our table is huge and we are worried about transaction log file usage during this operation, option #2 is probably our best bet.  Just like everything else in BI, we need to understand the data first before developing.  That being said, the following way is fairly elegant in my option.  I have included the test script and screenshot of the result.  I have tested this in SQL 2014, 2016, and SQL Azure; all successful.


Test Code:

-- test delete duplicate rows from table without changing table

CREATE TABLE dbo.DeleteDupFromViewTest

       (FirstName VARCHAR(100) NOT NULL

       , LastName VARCHAR(100) NOT NULL)

GO

 

INSERT INTO dbo.DeleteDupFromViewTest

              (FirstName, LastName)

       VALUES

              ('FirstNameOne', 'LastNameOne')

              , ('FirstNameTwo', 'LastNameTwo')

              , ('FirstNameThree', 'LastNameThree')

              , ('FirstNameThree', 'LastNameThree')

              , ('FirstNameThree', 'LastNameThree')

              , ('FirstNameOne', 'LastNameOne')

              , ('FirstNameTwo', 'LastNameTwo')

              , ('FirstNameTwo', 'LastNameTwo')

              , ('FirstNameThree', 'LastNameThree')

              , ('FirstNameThree', 'LastNameThree')

              , ('FirstNameOne', 'LastNameOne')

GO

 

CREATE VIEW dbo.vw_DeleteDupFromViewTest AS

       SELECT

              FirstName

              , LastName

              , ROW_NUMBER() OVER(ORDER BY FirstName, LastName) AS RowID

       FROM dbo.DeleteDupFromViewTest

GO

 

SELECT * FROM dbo.vw_DeleteDupFromViewTest

 

DELETE t1

       FROM dbo.vw_DeleteDupFromViewTest t1

              LEFT OUTER JOIN (SELECT FirstName

                                                , LastName

                                                , MIN(RowID) AS RowID

                                         FROM dbo.vw_DeleteDupFromViewTest (NOLOCK)

                                         GROUP BY FirstName

                                                , LastName

              ) t2

                     ON t1.FirstName = t2.FirstName

                           AND t1.LastName = t2.LastName

                           AND t1.RowID = t2.RowID

       WHERE t2.FirstName IS NULL

 

SELECT * FROM dbo.vw_DeleteDupFromViewTest

 

SELECT * FROM dbo.DeleteDupFromViewTest

 

DROP VIEW dbo.vw_DeleteDupFromViewTest

DROP TABLE dbo.DeleteDupFromViewTest


Screenshot of result:



Print | posted on Friday, May 4, 2018 9:11 AM

Feedback

No comments posted yet.

Your comment:





 

Copyright © Kevin Shyr

Design by Bartosz Brzezinski

Design by Phil Haack Based On A Design By Bartosz Brzezinski