Removing duplicate records in a database

Questions or information that don't quite fit anywhere else

Moderator: Neosoft Support

Locked
User avatar
beno
Posts: 678
Joined: Fri Apr 01, 2005 9:03 am
Location: México
Contact:

Removing duplicate records in a database

Post by beno »

Hi,

Well my question is not about NB, is about DBs but I put it here, maybe someone can hekp me.

I have a NBDB with 11,000 records. Different persons use it and I'm affraid now it has duplicate records.

This database is a medical app wich stores patients records. Each record is for a particular patient. We have its name, sex, age, history, evolution, etc.

I know that if I sort the name field, duplicates will appear together... but if I have 11,000 it is going to be a bad idea to try to check the list manually.

There is no key for each record, just the fields described..

Mmmm, If I try to compare each one with the rest... this is going to be real slow...

Maybe I can compare each person name with the records that only starts with the same letter... as I type this some ideas are comming to my neurons...

Thanks for your time...

Saludos

beno
User avatar
Leos
Posts: 178
Joined: Mon Apr 04, 2005 11:13 am
Location: Coimbra, Portugal
Contact:

Post by Leos »

hi Beno,

My suggestion:

in this case i use dbfquery command

....
dbfQuery "sample.dbf" "Name==Beno AND cidade==mexico"

if [dbfQueryResult]>0
... you have great chances that it is a duplicated regist

so, use

dbfDeleteRecord
dbfshowall
dbfNext


you can do this procedure with a 'loop', or 'while'

I think this is not to slow with 11.000 records (i do this with a database with over 70.000 records, without problems... (just about one hour)

THIS is one more useful command that i suggest to include in the next update from this GREAT plugin

...find duplicated regists
User avatar
Gaev
Posts: 3782
Joined: Fri Apr 01, 2005 7:48 am
Location: Toronto, Canada
Contact:

Post by Gaev »

beno:
If I sort the name field, duplicates will appear together... If I try to compare each one with the rest... this is going to be real slow...
You only have to compare each current record with the previous (not all the rest) ... something like this ...

dbfSort "myDB" ...
SetVar "[Previous.Name]" ""
SetVar "[Previous.Age]" ""

Loop "1" "[dbfRecordCount]" "[counter]"

If "[myDB.Name]" "=" "[Previous.Name]"
If "[myDB.Age]" "=" "[Previous.Age]"
... same as previous record ... delete it ?
EndIf
EndIf

SetVar "[Previous.Name]" "[myDB.Name]"
SetVar "[Previous.Age]" "[myDB.Age]"
dbfNext "myDB"

EndLoop

... the key question is going to be which of the duplicate records should be deleted ? or are they 100% identical ?

If looping through 11,000 records takes too long ... the only thing I can suggest is ... a faster (make that super fast) machine.
User avatar
beno
Posts: 678
Joined: Fri Apr 01, 2005 9:03 am
Location: México
Contact:

Post by beno »

Gaev,

You are right!

Once sorted the DB I have only to compare with previous and next.

Possibly do a loop if there are many duplicates.

Thanks a lot,

beno

PS thanks for your time too Leos!
User avatar
Sam Cox
Posts: 768
Joined: Fri Apr 01, 2005 7:30 am
Location: Loveland CO USA

Post by Sam Cox »

Hi beno,

Gaev's solution is a good one -- and it's now in my notebook!

Can you improve your medical record database program so that continued use will not result in more duplicate entries?

-- Sam
User avatar
beno
Posts: 678
Joined: Fri Apr 01, 2005 9:03 am
Location: México
Contact:

Post by beno »

Hi Sam,

Good idea, now I think I can add your idea: check the existence of records using the patient's name before adding it to the database...

Thanks for your help,

beno
cp4w
Posts: 533
Joined: Sun Apr 03, 2005 4:37 pm
Location: Great Neck, NY

Post by cp4w »

Have not the patients a personal code, like fiscal code or medical public service code?

Bye
Licia
Locked