Deleting duplicate rows in SQL Server


Today I found a pretty nice way to delete duplicate rows in a table on SQL Server. I had a table with 25,000 rows where 7,500 rows where rows containing one or more duplicates. I was not very eager to manually delete these duplicate, so I started to googling for answers. I found many different approaches, but suddenly I found my answer at stackoverflow.com. This thread help me rewriting a simple SQL statement after I added an [Id] column to uniquely identify a row. The clue is to use Common Table Expression (CTE) in SQL Server together with the OVER() function to create an unique row number for all duplicates within the key expression (in my case [Name] column), and ordering by the [Id] column. I only want to keep the first row for each duplicate item. Therefore, deleting all [rowno] greater than 1.

WITH cte_duplicates AS 
(
	SELECT [Name], 
		row_number() OVER
		(
			PARTITION BY [Name] ORDER BY [Id]
		) AS [rowno]
	FROM [dbo].[fm2011]
)
DELETE FROM cte_duplicates WHERE [rowno] > 1
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s