Hi,
Well my question is not about NB, is about DBs but I put it here, maybe someone can hekp me.
I have a NBDB with 11,000 records. Different persons use it and I'm affraid now it has duplicate records.
This database is a medical app wich stores patients records. Each record is for a particular patient. We have its name, sex, age, history, evolution, etc.
I know that if I sort the name field, duplicates will appear together... but if I have 11,000 it is going to be a bad idea to try to check the list manually.
There is no key for each record, just the fields described..
Mmmm, If I try to compare each one with the rest... this is going to be real slow...
Maybe I can compare each person name with the records that only starts with the same letter... as I type this some ideas are comming to my neurons...
Thanks for your time...
Saludos
beno
Removing duplicate records in a database
Moderator: Neosoft Support
hi Beno,
My suggestion:
in this case i use dbfquery command
....
dbfQuery "sample.dbf" "Name==Beno AND cidade==mexico"
if [dbfQueryResult]>0
... you have great chances that it is a duplicated regist
so, use
dbfDeleteRecord
dbfshowall
dbfNext
you can do this procedure with a 'loop', or 'while'
I think this is not to slow with 11.000 records (i do this with a database with over 70.000 records, without problems... (just about one hour)
THIS is one more useful command that i suggest to include in the next update from this GREAT plugin
...find duplicated regists
My suggestion:
in this case i use dbfquery command
....
dbfQuery "sample.dbf" "Name==Beno AND cidade==mexico"
if [dbfQueryResult]>0
... you have great chances that it is a duplicated regist
so, use
dbfDeleteRecord
dbfshowall
dbfNext
you can do this procedure with a 'loop', or 'while'
I think this is not to slow with 11.000 records (i do this with a database with over 70.000 records, without problems... (just about one hour)
THIS is one more useful command that i suggest to include in the next update from this GREAT plugin
...find duplicated regists
beno:
dbfSort "myDB" ...
SetVar "[Previous.Name]" ""
SetVar "[Previous.Age]" ""
Loop "1" "[dbfRecordCount]" "[counter]"
If "[myDB.Name]" "=" "[Previous.Name]"
If "[myDB.Age]" "=" "[Previous.Age]"
... same as previous record ... delete it ?
EndIf
EndIf
SetVar "[Previous.Name]" "[myDB.Name]"
SetVar "[Previous.Age]" "[myDB.Age]"
dbfNext "myDB"
EndLoop
... the key question is going to be which of the duplicate records should be deleted ? or are they 100% identical ?
If looping through 11,000 records takes too long ... the only thing I can suggest is ... a faster (make that super fast) machine.
You only have to compare each current record with the previous (not all the rest) ... something like this ...If I sort the name field, duplicates will appear together... If I try to compare each one with the rest... this is going to be real slow...
dbfSort "myDB" ...
SetVar "[Previous.Name]" ""
SetVar "[Previous.Age]" ""
Loop "1" "[dbfRecordCount]" "[counter]"
If "[myDB.Name]" "=" "[Previous.Name]"
If "[myDB.Age]" "=" "[Previous.Age]"
... same as previous record ... delete it ?
EndIf
EndIf
SetVar "[Previous.Name]" "[myDB.Name]"
SetVar "[Previous.Age]" "[myDB.Age]"
dbfNext "myDB"
EndLoop
... the key question is going to be which of the duplicate records should be deleted ? or are they 100% identical ?
If looping through 11,000 records takes too long ... the only thing I can suggest is ... a faster (make that super fast) machine.