Monday, April 25, 2022

dirty data and the database problem

Who isn't underwhelmed by the claims made for 'big data'?

The two problems which will never go away are 1) proliferating databases and 2) unclean data.

Unclean data is the biggest problem with any plan to compare and utilise data. It is a purely human problem. Bad inputting, bad collection, multiple sources: these are the reasons for unclean data.

Data has to be inputted to fit the database in question. Transferring it to a new database, you find it needs a different format, a different setting. But you input it as best you can, and so dirty data gets into the system.
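A small Python sketch of how this happens, using a date as the example. The date value and the two formats are assumptions made up for illustration, not taken from any real system. The old database holds a free-text date, the new one wants a strict format, and whoever does the transfer has to guess which reading was meant.

```python
from datetime import datetime

# Hypothetical free-text date as it sits in the old database.
old_value = "3/4/2021"

# The new database wants an ISO date, so someone has to pick a reading.
as_day_first = datetime.strptime(old_value, "%d/%m/%Y").date().isoformat()
as_month_first = datetime.strptime(old_value, "%m/%d/%Y").date().isoformat()

print(as_day_first)
print(as_month_first)
```

Both readings are accepted without complaint, and they are a month apart. Whichever one goes in, the new database has no record that a guess was made.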

Next: the number of databases is always rising. This violates the first rule of databases, which is: only one database for one set of data. Why? Because one of the databases will be neglected and the data will soon 'not match'. The classic case of this is when you have two address lists and forget to update both with every change of address. Pretty soon you won't know which database is correct.
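The address-list case can be sketched in a few lines of Python. The names, the addresses and the `find_mismatches` helper are all invented for illustration. The sketch only shows how two copies of the same data drift apart once a single update reaches one copy and not the other.

```python
def find_mismatches(list_a, list_b):
    """Return the names whose addresses differ between two copies of a list."""
    names = set(list_a) | set(list_b)
    return sorted(n for n in names if list_a.get(n) != list_b.get(n))


# Two address lists that start out identical (hypothetical data).
office_list = {"Ann": "1 High Street", "Bob": "2 Low Road"}
mailing_list = dict(office_list)

# Bob moves house, but only one list gets updated.
office_list["Bob"] = "9 New Lane"

mismatched = find_mismatches(office_list, mailing_list)
print(mismatched)
```

The check can tell you that the two lists disagree about Bob, but nothing in the data says which list is right.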

The number of databases is always rising. So data is always at risk. We will always have this problem...
