Wednesday, October 18, 2006

Delete Duplicate Rows

It's easy to delete duplicate rows in Oracle using cursor and rownum. Here shows an example.

Create table and insert some duplicate rows.

CREATE TABLE DUP_TEST
(
COMPANY_ID VARCHAR(8),
COMPANY_NAME VARCHAR(80),
ADDRESS VARCHAR(80)
);

-- Create test data, dup by company id
INSERT INTO DUP_TEST VALUES ('1', 'Company One', 'Address1');
INSERT INTO DUP_TEST VALUES ('1', 'Company One', 'Address1');
INSERT INTO DUP_TEST VALUES ('2', 'Company Two', 'Address2');
INSERT INTO DUP_TEST VALUES ('2', 'Company Two', 'Address');
INSERT INTO DUP_TEST VALUES ('3', 'Company Three', 'Address3');

The duplicates are with company id 1 and 2. Let delete the duplicates.
set serveroutput on
declare
  rows_deleted integer := 0;
begin
  for cur in ( select company_id, count(*) cnt from DUP_TEST
               group by company_id having count(*) > 1 )
  loop
    delete from DUP_TEST
    where company_id = cur.company_id and rownum < cur.cnt;
    rows_deleted := rows_deleted + SQL%ROWCOUNT;
  end loop;
  dbms_output.put_line( 'records deleted: '  rows_deleted );
end;
/


Now show the result after deleting.
SQL> select * from dup_test;
COMPANY_ COMPANY_NAME ADDRESS
-------- -------------------- --------------------
1 Company One Address1
2 Company Two Address
3 Company Three Address3
It's a little bit diffcult to delete duplicate rows in SQL Server. I'll show one solution using a cursor. Consider the performance when you have large amount of data to search or delete.
declare cur cursor for select COMPANY_ID from DUP_TEST
declare @CompanyId int

open cur
fetch cur into @CompanyId

while @@fetch_status = 0
begin
if ( select count(*) from DUP_TEST where COMPANY_ID = @CompanyId ) > 1
begin
delete from DUP_TEST where current of cur
end
fetch cur into @CompanyId
end
close cur
deallocate cur
go
It goes through the full table and search duplicates for each row, if found, delete the current of cursor.

Monday, October 16, 2006

Ref: moving from Sybase to SQL

This is a good article discussing the difference between Sybase and SQL Server.

Thursday, October 05, 2006

SQL Server - Append Query Result to .csv file

While searching on the web, I found a lot talking about appending SQL Server data to Excel files. But not many about appending data to .csv file. After some research, I got one.

First, create the csv file and prepare the column header:
master..xp_cmdshell 'echo EmpID,LastName,FirstName > C:\temp\Employees.csv'
It creates a file c:\temp\Employees.csv with 3 columns.

Then, append the data using OPENROWSET:
INSERT INTO OPENROWSET('Microsoft.Jet.OleDB.4.0',
'Text;Database=C:\temp',
[Employees#csv])
select EmployeeID, LastName, FirstName from NOrthwind..Employees
(9 row(s) affected)
Open the file c:\temp\Employees.csv, you'll see there are 9 rows appended.
Some points here are: 1) Database=C:\temp (need to put the folder here only), and 2) [Employees#csv] (need to replace . with # as the table name.

OK, it's similar to read the file back:
select * from OPENROWSET('Microsoft.Jet.OleDB.4.0',
'Text;Database=C:\temp', [Employees#csv])

1 Davolio Nancy
...
9 Dodsworth Anne
At the end, I'll show another way to read the data in csv file:
select * from OpenRowset('MSDASQL',
'Driver={Microsoft Text Driver (*.txt; *.csv)};DefaultDir=C:\temp;',
'select top 3 * from "Employees.csv"')
It seems you cannot append to csv using MSDASQL.