Data quality, or how does missing data affect business reports?
The data that a business collects is called raw data. This data is often unprocessed and unstructured.
Raw data needs to be interpreted before it is useful and yields insights.
Just as people make mistakes and software has bugs, data has quality problems.
Bad data quality leads to inaccurate reports and insights, and it slows decision making. It can also cause economic damage through additional expenses, lost sales, and fines, not to mention the loss of trust in the data.
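To make the effect on a report concrete, here is a minimal sketch in Python. The store names and sales figures are hypothetical, invented only for illustration. It shows how one missing value changes an average depending on whether the report silently treats it as zero or leaves it out and states the coverage.

```python
# Hypothetical monthly sales per store; None marks a store that did not report.
sales = {"north": 100.0, "south": 120.0, "east": None, "west": 80.0}

# Values that were actually reported.
reported = [v for v in sales.values() if v is not None]

# Naive report: the missing value is silently counted as zero.
naive_avg = sum(v or 0.0 for v in sales.values()) / len(sales)

# More careful report: average only over stores that reported,
# and state what share of stores the number is based on.
careful_avg = sum(reported) / len(reported)
coverage = len(reported) / len(sales)

print(f"naive average:   {naive_avg}")    # 75.0
print(f"careful average: {careful_avg}")  # 100.0
print(f"coverage:        {coverage:.0%}") # 75%
```

Neither number is automatically the right one. The point is that the same raw data gives two different answers, and a reader of the report cannot tell which was used unless the missing data is made visible.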
The criteria for quality data:
- Completeness: no crucial information should be missing.
- Consistency: no contradiction or conflict in the data (the same thing stored in two or more places should match).
- Timeliness: is the information available when it is expected?
- Integrity: no missing links, no orphaned records.
- Accuracy: does the information reflect real-life values?
- Standardization: the database design should follow common standards, or the organization's own documented variations of them.
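Several of these criteria can be checked automatically. The sketch below uses plain Python with hypothetical `customers` and `orders` records; the field names, the plausible amount range, and the deadline are assumptions made for the example, not part of any standard. Each helper returns the rows that fail one criterion.

```python
from datetime import date

# Hypothetical sample data: customers and the orders that reference them.
customers = [
    {"id": 1, "name": "Ada", "country": "DE"},
    {"id": 2, "name": "Grace", "country": None},  # missing country
]
orders = [
    {"id": 10, "customer_id": 1, "amount": 120.0, "loaded_on": date(2024, 1, 2)},
    # Order 11 points at a customer that does not exist, has a negative
    # amount, and was loaded after the expected date.
    {"id": 11, "customer_id": 3, "amount": -5.0, "loaded_on": date(2024, 1, 9)},
]

def incomplete(rows, required):
    """Completeness: rows where a required field is absent or None."""
    return [r for r in rows if any(r.get(f) is None for f in required)]

def orphans(children, parents, fk, pk="id"):
    """Integrity: child rows whose foreign key matches no parent row."""
    known = {p[pk] for p in parents}
    return [c for c in children if c[fk] not in known]

def out_of_range(rows, field, low, high):
    """Accuracy (rough proxy): values outside a plausible range."""
    return [r for r in rows if r[field] is not None and not (low <= r[field] <= high)]

def late(rows, field, deadline):
    """Timeliness: rows that arrived after the expected date."""
    return [r for r in rows if r[field] > deadline]

print(incomplete(customers, ["name", "country"]))
print(orphans(orders, customers, fk="customer_id"))
print(out_of_range(orders, "amount", 0, 10_000))
print(late(orders, "loaded_on", date(2024, 1, 5)))
```

A range check is only a proxy for accuracy: it catches implausible values, but a plausible value can still be wrong, so real accuracy checks usually compare against a trusted source.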
Data quality, in my opinion, is one of the most important aspects of data: without confidence in the data and its reliability, no dashboard or analysis built on it is useful, and the effort put into creating those reports is wasted.
Data accuracy is the most important criterion for high-quality data. The data must be correct to avoid faulty results, so inaccurate data needs to be identified and fixed.
Maintaining high-quality data reduces the company’s costs of identifying and correcting bad data, and it reduces operational errors and the cost of bad decisions.
Therefore, high quality in the raw data is very important.
All the tasks that ensure high data quality are valuable to many business stakeholders, managers, and decision makers.
**Fun fact:** by 2025, global data creation is projected to grow to more than 180 zettabytes. The byte prefixes go like this, each one a thousand times the previous (these are the decimal metric prefixes, not the binary ones):
- kilo: 1,000 or 1 thousand bytes;
- mega: 1,000,000 or 1 million bytes, or 1,000 kilobytes;
- giga: 1,000,000,000 or 1 billion bytes, or 1,000 megabytes;
- tera: 1,000,000,000,000 or 1 trillion bytes, or 1,000 gigabytes;
- peta: 1,000,000,000,000,000 or 1 quadrillion bytes, or 1,000 terabytes;
- exa: 1,000,000,000,000,000,000 or 1 quintillion bytes, or 1,000 petabytes;
- zetta: 1,000,000,000,000,000,000,000 or 1 sextillion bytes, or 1,000 exabytes (21 zeros);
- yotta: 1,000,000,000,000,000,000,000,000 or 1 septillion bytes, or 1,000 zettabytes (24 zeros).
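The same ladder can be written as a few lines of Python, which also makes it easy to convert the 180 zettabyte projection into a more familiar unit. This is only an illustrative sketch; the names `PREFIXES` and `bytes_in` are made up for the example.

```python
# Decimal (metric) prefixes: each step multiplies the previous one by 1,000.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

def bytes_in(prefix):
    """Number of bytes in one unit of the given decimal prefix."""
    return 1000 ** (PREFIXES.index(prefix) + 1)

# Print each prefix as a power of ten (3 zeros per step).
for step, prefix in enumerate(PREFIXES, start=1):
    print(f"1 {prefix}byte = 10^{3 * step} bytes")

# The 180 zettabyte projection expressed in terabytes.
print(180 * bytes_in("zetta") // bytes_in("tera"))  # 180000000000
```

In other words, 180 zettabytes is 180 billion terabytes.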