Data Integrity Check - A 6-Step Process

March 19, 2021

Hanh Truong

Introduction to Data Integrity

In today's data-driven world, metrics are constantly collected, mined, and used for making critical decisions. Many businesses depend on data to gain insight into their operations, financial health, and markets. In fact, a study from Sisense found that 55% of companies use data to improve efficiency and to forecast outcomes.

As large volumes of information are continually compiled, organizations need to perform data integrity checks. Preserving database integrity helps ensure that the information is high quality and reliable enough to base business decisions on.

What is Data Integrity?

Data integrity refers to the authenticity, accuracy, and consistency of data. One way to determine whether an organization's data has integrity is to look at its retrievability and accessibility. It is also important to look at whether the data is traceable and reliable. To achieve these properties, organizations often put security measures in place to protect data integrity.

There are four common types of data integrity that businesses work to preserve.

1. Entity Integrity

Generally, a relational database organizes data into tables made up of rows and columns. Entity integrity guarantees that every row in a table can be uniquely identified. Each table has a primary key, and no two rows may share the same primary key value, nor may that value be null.
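A minimal sketch of entity integrity, using Python's built-in `sqlite3` module purely for illustration. The `customers` table and its columns are hypothetical. Note that SQLite, unlike the SQL standard, does not treat most `PRIMARY KEY` columns as `NOT NULL` automatically, so the constraint is declared explicitly.

```python
import sqlite3

def make_customers_table(conn: sqlite3.Connection) -> None:
    # Hypothetical table: customer_id is the primary key. SQLite only rejects
    # NULLs in a TEXT primary key when NOT NULL is declared explicitly.
    conn.execute(
        "CREATE TABLE customers ("
        " customer_id TEXT PRIMARY KEY NOT NULL,"
        " name TEXT NOT NULL)"
    )

def try_insert(conn: sqlite3.Connection, customer_id, name) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO customers VALUES (?, ?)", (customer_id, name))
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
make_customers_table(conn)
print(try_insert(conn, "C001", "Acme Corp"))  # True: new, non-null key
print(try_insert(conn, "C001", "Duplicate"))  # False: key already exists
print(try_insert(conn, None, "No key"))       # False: key is null
```

Letting the database enforce the rule means every application writing to the table gets the same guarantee.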

2. Referential Integrity

Referential integrity ensures that the relationships between two or more tables stay consistent and accurate. It is enforced with foreign keys: every foreign key value in one table must match an existing primary key value in the table it references.
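A minimal sketch of a foreign key check, again using Python's `sqlite3` for illustration with hypothetical `customers` and `orders` tables. SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` is set on the connection; other databases enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customers (customer_id TEXT PRIMARY KEY NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id TEXT NOT NULL REFERENCES customers(customer_id))"
)
conn.execute("INSERT INTO customers VALUES ('C001')")
conn.commit()

def add_order(customer_id: str) -> bool:
    """Insert an order; return False if its customer_id has no matching customer."""
    try:
        with conn:
            conn.execute("INSERT INTO orders (customer_id) VALUES (?)", (customer_id,))
        return True
    except sqlite3.IntegrityError:
        return False

print(add_order("C001"))  # True: C001 exists in customers
print(add_order("C999"))  # False: no such customer, so the order is rejected
```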

3. Domain Integrity

Domain integrity ensures that every value entered in a column falls within that column's set of acceptable values, or domain. This involves specifying a standard data type and format for each column. For instance, a database may require all monetary entries to use a single decimal point and no commas (1234.50 rather than 1,234.50).
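The monetary format rule can be expressed as a pattern check. This is a sketch under an assumed rule (digits, optionally followed by one decimal point and exactly two digits, with no commas, signs, or currency symbols); a real system would match its own specification and could mirror the same rule in a database `CHECK` constraint.

```python
import re

# Assumed domain for a money column: digits, then optionally a single
# decimal point followed by exactly two digits. [0-9] is used instead of \d
# so that non-ASCII digits are rejected.
MONEY_PATTERN = re.compile(r"[0-9]+(\.[0-9]{2})?")

def is_valid_money(value: str) -> bool:
    """Return True if the whole string matches the assumed monetary format."""
    return MONEY_PATTERN.fullmatch(value) is not None

for raw in ["1234.50", "1,234.50", "12.5.0", "99"]:
    print(raw, is_valid_money(raw))
```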

4. User-Defined Integrity

User-defined integrity covers rules that an organization creates to fit its own needs. Entity, referential, and domain integrity are often not enough on their own to keep data accurate, so users implement additional requirements of their own, typically business rules enforced through check constraints, triggers, or application code.
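One way to enforce user-defined rules is with `CHECK` constraints. The sketch below uses Python's `sqlite3`, and both business rules are hypothetical: discounts may not exceed 30%, and an order cannot ship before it was placed. Dates are stored as ISO-8601 text, which compares correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical business rules expressed as CHECK constraints.
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " order_date TEXT NOT NULL,"
    " ship_date TEXT,"
    " discount_pct REAL NOT NULL DEFAULT 0"
    "   CHECK (discount_pct BETWEEN 0 AND 30),"
    " CHECK (ship_date IS NULL OR ship_date >= order_date))"
)

def add_order(order_date, ship_date, discount_pct) -> bool:
    """Insert an order; return False if any business rule is violated."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO orders (order_date, ship_date, discount_pct) VALUES (?, ?, ?)",
                (order_date, ship_date, discount_pct),
            )
        return True
    except sqlite3.IntegrityError:  # raised for failed CHECK constraints
        return False

print(add_order("2021-03-01", "2021-03-05", 10))  # True
print(add_order("2021-03-01", "2021-02-28", 10))  # False: ships before order date
print(add_order("2021-03-01", None, 45))          # False: discount above 30%
```

Rules that span several tables usually cannot be written as a single `CHECK`, which is where triggers or application-level validation come in.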

Data Integrity Risks

By some estimates, the average business loses 30% of its annual revenue due to poor data quality. The following are common threats to data integrity.

Human Error

People can compromise data either unintentionally or maliciously, for example by entering inaccurate information, deleting data, or creating duplicate entries. Data integrity is also compromised when users do not follow established data entry protocols or when they make security mistakes.

Errors in Transmission

Transmission errors occur when data does not transfer successfully or accurately from one system to another. As a result, the same metrics become inconsistent across multiple databases.
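A common way to detect transmission errors is to compare checksums: the sender publishes a digest of the data, and the receiver recomputes it. This sketch uses SHA-256 from Python's `hashlib`; the CSV payload is made up for illustration.

```python
import hashlib

def sha256_digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of a payload."""
    return hashlib.sha256(data).hexdigest()

def verify_transfer(received: bytes, expected_digest: str) -> bool:
    """Recompute the digest on the receiving side and compare it to the sender's."""
    return sha256_digest(received) == expected_digest

# Sender side: compute a digest alongside the exported data.
payload = b"order_id,amount\n1,1234.50\n2,99.00\n"
digest = sha256_digest(payload)

# Receiver side: an intact copy passes; a single altered value fails.
print(verify_transfer(payload, digest))                        # True
print(verify_transfer(payload.replace(b"99", b"98"), digest))  # False
```

A plain checksum catches accidental corruption. It does not stop a deliberate attacker who can replace both the data and the digest; that case calls for a keyed MAC or a digital signature.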

Malware and Viruses

Malware and viruses, as well as other threats such as software bugs and hacking, can cause data to be altered, deleted, or stolen.

Compromised Hardware

Hardware systems can be compromised by accident or through malfunctions. For example, a server may crash, or a device may be physically damaged in transport. When hardware is impaired, data may be corrupted or displayed incorrectly, or users may have difficulty accessing databases.

How to Preserve Data Integrity

To reduce these risks and preserve data integrity, organizations should implement the following best practices.

1. Validate Input

Before processing any data set, organizations need to perform input validation. Input can come from a known source or an unknown one: an end user, another software system, or a malicious individual. Validation verifies that the input is correct and reliable before it enters the system.
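A minimal sketch of input validation in Python, assuming a hypothetical customer-signup record with `email`, `age`, and `country` fields. It takes an allowlist approach: anything not explicitly expected is rejected, which matters when the source might be malicious. The email check is deliberately simplistic.

```python
# Hypothetical schema for an incoming signup record.
ALLOWED_FIELDS = {"email", "age", "country"}
ALLOWED_COUNTRIES = {"US", "CA", "MX"}

def validate_input(record: dict) -> list:
    """Return a list of problems; an empty list means the record is accepted."""
    errors = []
    unexpected = set(record) - ALLOWED_FIELDS
    if unexpected:
        errors.append(f"unexpected fields: {sorted(unexpected)}")
    missing = ALLOWED_FIELDS - set(record)
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    email = record.get("email")
    if not isinstance(email, str) or email.count("@") != 1:
        errors.append("email must be a string containing exactly one '@'")
    age = record.get("age")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(age, int) or isinstance(age, bool) or not 0 < age < 130:
        errors.append("age must be an integer between 1 and 129")
    if record.get("country") not in ALLOWED_COUNTRIES:
        errors.append("country must be one of " + ", ".join(sorted(ALLOWED_COUNTRIES)))
    return errors

print(validate_input({"email": "ana@example.com", "age": 34, "country": "US"}))  # []
print(validate_input({"email": "not-an-email", "age": -1, "country": "US", "admin": True}))
```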

2. Validate Data

Once the input is verified, business teams need to validate the data sets themselves. This confirms that the data was not corrupted during processing and that the incoming metrics are accurate. Defining specifications and the key attributes of the data up front makes this step faster and more consistent.

For example, a business may require that all financial data be processed in U.S. dollars. Establishing this requirement from the start will ensure the metrics are validated correctly.
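The U.S. dollar requirement can be checked across a whole batch at once. This sketch assumes each row is a dictionary with hypothetical `currency` and `amount` fields, and adds an optional row-count comparison against a count reported by the source system, a common control total for spotting lost or duplicated records.

```python
from decimal import Decimal, InvalidOperation

REQUIRED_CURRENCY = "USD"  # business rule: all financial data in U.S. dollars

def validate_dataset(rows, expected_count=None):
    """Validate a batch of financial rows.

    Returns a list of (row_index, problem) tuples; row_index is None for
    batch-level problems. An empty list means the batch passed.
    """
    problems = []
    if expected_count is not None and len(rows) != expected_count:
        problems.append((None, f"expected {expected_count} rows, received {len(rows)}"))
    for i, row in enumerate(rows):
        if row.get("currency") != REQUIRED_CURRENCY:
            problems.append((i, f"currency {row.get('currency')!r} is not {REQUIRED_CURRENCY}"))
        try:
            amount = Decimal(str(row.get("amount")))
        except InvalidOperation:  # raised for non-numeric strings
            amount = None
        if amount is None or not amount.is_finite():  # also rejects NaN and Infinity
            problems.append((i, "amount is not a finite number"))
    return problems

batch = [
    {"currency": "USD", "amount": "1234.50"},
    {"currency": "EUR", "amount": "99.00"},
    {"currency": "USD", "amount": "n/a"},
]
for problem in validate_dataset(batch, expected_count=3):
    print(problem)
```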

3. Remove Duplicate Entries

Confidential information from a database is sometimes copied into public documents, spreadsheets, or files shared online. Business teams should promptly remove these duplicate copies of the data to prevent unauthorized access.
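Duplicate entries inside a data set, which the Human Error section lists as a common risk, can also be flagged programmatically. This sketch keeps the first record for each key and sets the rest aside for review. The `email` key and the normalization (trimming whitespace, lowercasing) are assumptions; the right key depends on the data.

```python
def deduplicate(records: list, key_fields: tuple) -> tuple:
    """Keep the first record per key; return (unique_records, duplicates)."""
    seen = set()
    unique, duplicates = [], []
    for rec in records:
        # Normalize so trivial differences in case or spacing don't hide duplicates.
        key = tuple(str(rec.get(f, "")).strip().lower() for f in key_fields)
        if key in seen:
            duplicates.append(rec)
        else:
            seen.add(key)
            unique.append(rec)
    return unique, duplicates

customers = [
    {"email": "ana@example.com", "name": "Ana"},
    {"email": " ANA@example.com ", "name": "Ana R."},
    {"email": "bo@example.com", "name": "Bo"},
]
unique, dupes = deduplicate(customers, ("email",))
print(len(unique), len(dupes))  # 2 1
```

Returning the duplicates instead of silently dropping them lets a person decide which version is correct.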

4. Perform Regular Back-Ups

Backing up data regularly lets organizations recover from accidental data loss and unintentional alterations. It also ensures they have a clean copy of their data to restore in the event of a cyber attack.
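A minimal backup sketch for an SQLite database, using Python's `sqlite3.Connection.backup` (which wraps SQLite's online backup API) and then running `PRAGMA integrity_check` on the copy. The timestamped file naming is an assumption. Other database systems have their own backup tools, and a real schedule would also keep copies off-site.

```python
import sqlite3
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def backup_database(src: sqlite3.Connection, backup_dir: str) -> Path:
    """Copy an open SQLite database to a timestamped file and verify the copy."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    dest_path = Path(backup_dir) / f"backup-{stamp}.db"
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)  # online backup: src can stay open while copying
        # integrity_check returns a single row containing 'ok' for a healthy file.
        result = dest.execute("PRAGMA integrity_check").fetchone()[0]
        if result != "ok":
            raise RuntimeError(f"backup failed integrity check: {result}")
    finally:
        dest.close()
    return dest_path

# Usage: back up a small in-memory database and read the value back from the copy.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (x INTEGER)")
src.execute("INSERT INTO t VALUES (42)")
src.commit()
with tempfile.TemporaryDirectory() as tmp:
    path = backup_database(src, tmp)
    restored = sqlite3.connect(path)
    restored_row = restored.execute("SELECT x FROM t").fetchone()
    restored.close()
print(restored_row)  # (42,)
```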

5. Control Access

All database systems should have security controls in place to prevent hackers and unauthorized users from accessing information. These individuals can compromise the integrity of data and expose sensitive information to the public. Software applications should enforce access control, such as passwords and two-factor authentication, and hardware systems should be physically secured to a floor or wall to prevent theft.
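Once a user has authenticated (for example, with a password and a second factor), the application still has to decide what that user may do. This sketch shows simple role-based authorization with a deny-by-default rule; the roles and permissions are hypothetical, and a real system would load them from its identity or directory service.

```python
from enum import Enum

class Action(Enum):
    READ = "read"
    WRITE = "write"
    DELETE = "delete"

# Hypothetical role-to-permission mapping.
ROLE_PERMISSIONS = {
    "analyst": {Action.READ},
    "editor": {Action.READ, Action.WRITE},
    "admin": {Action.READ, Action.WRITE, Action.DELETE},
}

def is_allowed(role: str, action: Action) -> bool:
    """Deny by default: unknown roles receive no permissions."""
    return action in ROLE_PERMISSIONS.get(role, set())

print(is_allowed("analyst", Action.READ))    # True
print(is_allowed("analyst", Action.DELETE))  # False
print(is_allowed("guest", Action.READ))      # False
```

Granting each role only the actions it needs limits how much damage a compromised account can do to the data.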

6. Have an Audit Trail

Organizations should maintain an audit trail as part of their integrity checks. If a data breach occurs, the trail allows teams to pinpoint the cause of the threat and prevent similar damage to data integrity in the future.

Generally, an audit trail records every event involving the data, such as when records were created, read, modified, and deleted. It also identifies which user accessed the system and when.
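The sketch below illustrates the events described above with a toy in-memory key-value store that logs the user, action, key, and UTC timestamp for every create, read, update, and delete. The class and method names are hypothetical. Reads are logged in application code because database triggers typically cannot observe them.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class AuditEvent:
    timestamp: datetime
    user: str
    action: str  # "create", "read", "update", or "delete"
    key: str

class AuditedStore:
    """Toy key-value store that records who did what, and when, on every access."""

    def __init__(self):
        self._data = {}
        self.log = []  # append-only list of AuditEvent records

    def _record(self, user, action, key):
        self.log.append(AuditEvent(datetime.now(timezone.utc), user, action, key))

    def put(self, user, key, value):
        action = "update" if key in self._data else "create"
        self._data[key] = value
        self._record(user, action, key)

    def get(self, user, key):
        self._record(user, "read", key)
        return self._data.get(key)

    def delete(self, user, key):
        self._data.pop(key, None)
        self._record(user, "delete", key)

store = AuditedStore()
store.put("alice", "q1_revenue", 1000)
store.get("bob", "q1_revenue")
store.put("alice", "q1_revenue", 1200)
store.delete("carol", "q1_revenue")
for e in store.log:
    print(e.timestamp.isoformat(), e.user, e.action, e.key)
```

In practice, the log would be written to separate, append-only storage so that an attacker who alters the data cannot also erase the record of doing so.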