What is Data Integrity?
Incorrect information can very easily ruin your day.
Missing the bus because of an outdated timetable, missing out on love because of an incorrect phone number, missing the point of a joke because someone told the punchline wrong - all of these day-to-day frustrations could be avoided with accurate data.
However, having accurate data is even more vital when it comes to business because the stakes are often much, much higher. Corrupt data can cost your company its reputation, customers and revenue. Gartner has reported that poor data quality costs organisations an average of $15 million per year.
Source: Forbes
Data Integrity Definition
In its simplest sense, data integrity is the practice of ensuring that data remains accurate, valid and consistent throughout its entire life cycle. To understand the concept fully, you need to know that data integrity has two definitions, depending on the context in which you approach it.
First, there is logical data integrity: a process that keeps data accurate and consistent. Its primary purpose is to stop data from becoming compromised and, essentially, useless.
There are three basic types of logical data integrity:
- Entity integrity
When data is stored in tables in a relational database, entity integrity ensures that every row has a primary key (such as an ID) that uniquely identifies it. No two rows can share the same primary key, and a primary key can never be empty (null).
- Domain integrity
In data integrity, a domain is the predefined set of values that a field is allowed to take when records are created, defined by rules such as its data type, length, range, constraints and date format. Domain integrity sets out these rules and ensures that recorded data is restricted to the values they allow.
- Referential integrity
In relational databases, two or more tables can be linked together in a ‘relationship’ because they contain related data. These relationships are created using a foreign key in the child (referencing) table that points to the primary key of the parent table.
Referential integrity requires every foreign key value to match an existing primary key in the parent table, ensuring consistency between these tables.
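To make the three types concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `customers` and `orders` tables, their columns and the `try_insert` helper are hypothetical, invented for illustration; the point is that the database itself rejects rows that would break entity, domain or referential integrity.

```python
import sqlite3

# Hypothetical two-table schema, kept in memory so nothing is written to disk.
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,   -- entity integrity: one unique key per row
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),            -- referential integrity
    quantity    INTEGER NOT NULL CHECK (quantity BETWEEN 1 AND 1000), -- domain: allowed range
    order_date  TEXT NOT NULL
        CHECK (order_date GLOB '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]')  -- domain: format
);
""")

def try_insert(sql, params):
    """Attempt one INSERT and report whether the constraints accepted it."""
    try:
        with conn:  # commits on success, rolls back on failure
            conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

ADD_CUSTOMER = "INSERT INTO customers (id, name) VALUES (?, ?)"
ADD_ORDER = "INSERT INTO orders (customer_id, quantity, order_date) VALUES (?, ?, ?)"

results = {
    "valid customer": try_insert(ADD_CUSTOMER, (1, "Ada")),
    "duplicate key": try_insert(ADD_CUSTOMER, (1, "Bob")),                   # entity
    "quantity out of range": try_insert(ADD_ORDER, (1, 0, "2024-01-05")),  # domain
    "bad date format": try_insert(ADD_ORDER, (1, 5, "05/01/2024")),        # domain
    "unknown customer": try_insert(ADD_ORDER, (99, 5, "2024-01-05")),      # referential
    "valid order": try_insert(ADD_ORDER, (1, 5, "2024-01-05")),
}
for case, outcome in results.items():
    print(f"{case}: {outcome}")
```

One SQLite-specific caveat: it uses flexible typing, so an `INTEGER` column will still accept text unless you add a `CHECK` or declare the table `STRICT` (available from SQLite 3.37). Other relational databases handle type and foreign key enforcement differently, so check the defaults of whichever one you use.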
Second, there is physical data integrity as a state: a data set that is accurate and valid, which is the product of these processes. Here we are concerned with storing and fetching the data so that it is not corrupted by events such as power outages, natural disasters, hardware corrosion, etc. For many businesses, the introduction of cloud storage has greatly reduced the threat of losing physical data integrity.
Source: Forbes
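One common way to detect physical corruption of stored data is to record a checksum when a file is written and compare it whenever the file is read back. The post does not prescribe a specific technique; this is a minimal sketch using SHA-256 from Python's standard `hashlib` module, and the file name and contents are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path: Path, expected: str) -> bool:
    """True if the file still matches the checksum recorded when it was stored."""
    return sha256_of(path) == expected

with tempfile.TemporaryDirectory() as tmp:
    record = Path(tmp) / "sales_2024.csv"  # hypothetical data file
    record.write_bytes(b"id,amount\n1,100\n2,250\n")
    stored_checksum = sha256_of(record)     # saved alongside the file at write time

    intact = verify(record, stored_checksum)

    # Simulate silent corruption on disk: a single byte changes ("250" becomes "251").
    data = bytearray(record.read_bytes())
    data[-2] = ord("1")
    record.write_bytes(bytes(data))
    corrupted = verify(record, stored_checksum)

print(intact, corrupted)
```

A checksum only tells you that the data has changed; recovering the correct version still depends on backups or redundant copies.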
Data integrity problems are likely to occur when…
If data integrity processes are not followed, the cost, as we have already mentioned, can be high for businesses, researchers and anyone else making decisions based on that data.
Here are some scenarios and instances where data integrity can become compromised:
- Human error - mistakes in recording or maintaining data, ranging in intent from accidental to malicious. Examples include recording data activities incorrectly or deleting files by mistake.
- Transfer errors - unintentional corruptions that occur during the movement of data between devices, applications or databases. Alterations, duplications of records, or undetected failed transfers can all threaten data integrity.
- Cyber threats - hacking and viruses/malware are insidious attempts to gain unauthorised access to sensitive private data, often by exploiting software bugs.
- Compromised hardware - device or disk crashes can cause data to be corrupted or lost entirely, meaning it is no longer usable for analysis or record keeping.
- Remote working - moving from enterprise content management systems controlled by the IT team to off-premise remote working solutions can cause a loss of data integrity. For example, if personal laptops or USB drives are used to make data more accessible, sensitive data can be leaked or misplaced.
Source: Harvard Business Review
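Transfer errors in particular can often be caught by reconciling the source and the destination after a move. The sketch below assumes both copies can be loaded as lists of records that share a unique `id` field; the `reconcile` function and the sample records are hypothetical.

```python
from collections import Counter

def reconcile(source: list[dict], target: list[dict], key: str = "id") -> dict:
    """Compare a source data set with its copy after a transfer."""
    source_counts = Counter(row[key] for row in source)
    target_counts = Counter(row[key] for row in target)
    source_by_key = {row[key]: row for row in source}
    return {
        # In the source but never arrived: e.g. an undetected failed transfer.
        "missing": sorted(source_counts.keys() - target_counts.keys()),
        # Arrived more than once: duplicated records.
        "duplicated": sorted(k for k, n in target_counts.items() if n > 1),
        # Arrived, but the contents no longer match the source: alterations.
        "altered": sorted({
            row[key] for row in target
            if row[key] in source_by_key and row != source_by_key[row[key]]
        }),
        # Present in the target but not in the source at all.
        "unexpected": sorted(target_counts.keys() - source_counts.keys()),
    }

source = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 250},
    {"id": 3, "amount": 75},
]
target = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 205},  # digits transposed in transit
    {"id": 2, "amount": 205},  # and the record was written twice
]                              # record 3 never arrived

result = reconcile(source, target)
print(result)
```

Comparing whole records is fine for small sets; for large tables the same idea is usually applied to row counts and per-row hashes rather than full rows.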
How can you ensure that your data is accurate and consistent when it's generated, duplicated, accessed, and moved around your enterprise at such a rapid rate?
The FDA (Food and Drug Administration) has outlined a set of principles to help the pharmaceutical industry ensure data integrity, whether data is recorded on paper or electronically. Although originally intended for pharmaceuticals, these principles have been widely circulated and adopted as a standard in many other industries. They can be remembered by the acronym ALCOA, which stands for:
- Attributable
This principle concerns accountability for data: the ability to trace any action to a single user. To ensure data is attributable, anyone who performs a data action (recording, transforming or moving data) must be identifiable as the person who took that action.
Analysts must create data logs for every action that include the name, computer ID, date of the data action, etc.
- Legible
Simply put, this principle aims to ensure that data can be read and understood by everyone who accesses it - whether it is recorded on paper or electronically.
Ensure that data is recorded in standard terms and values so that, even after the person who recorded it has left the organisation, the data remains valid and usable.
- Contemporaneous
Data integrity processes should occur at the same time as the data activity or immediately afterwards. All data activities should be timestamped to ensure that analysts have a clear record of the date and time when they took place.
Back-dating or overwriting data activity logs is a threat to data integrity as it increases the likelihood of human error or data loss.
- Original
Data must be recorded as raw or source data in the original location. In other words, when recording any new data or data activity, you must ensure that you not only record it immediately but that you enter it into the correct system.
If data is first jotted down in paper notes and later transferred onto official forms or into electronic databases, errors and corruptions can creep in during data entry. The original data must always be kept as the true copy so that a data audit trail can be maintained.
- Accurate
Recorded data must be free from errors and complete. However, we all know errors can occur, and in these circumstances, corrections must also be recorded.
When recording data on paper, this usually means crossing a line through the mistake and clearly marking the correction. But what about electronic data (which most of us now use)? Any database you use should have automated audit trails and edit checks, so that incorrect alterations are flagged and corrected before resubmission.
Source: Laafon
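Several ALCOA principles map naturally onto an append-only audit log. The following is a minimal sketch, not a compliant system: each entry records who acted and on which machine (Attributable), a system-generated UTC timestamp (Contemporaneous), the values as first entered (Original) and corrections as new entries that point at the entry they fix (Accurate). The `AuditLog` class, its field names and the hash chain used to detect overwritten entries are illustrative assumptions, not part of the FDA guidance.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(entry: dict) -> str:
    """SHA-256 of an entry's fields, serialised in a stable key order."""
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode("utf-8")).hexdigest()

class AuditLog:
    """Append-only log: entries are added, never edited or deleted."""

    def __init__(self) -> None:
        self._entries: list[dict] = []

    def append(self, user: str, computer_id: str, action: str,
               payload: dict, corrects: Optional[int] = None) -> int:
        entry = {
            "seq": len(self._entries),
            "user": user,                # Attributable: who acted
            "computer_id": computer_id,  # Attributable: on which machine
            # Contemporaneous: stamped by the system at the time of the action, in UTC.
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "payload": dict(payload),    # Original: a copy of the values as entered
            "corrects": corrects,        # Accurate: a correction points at the entry it fixes
            "prev_hash": self._entries[-1]["entry_hash"] if self._entries else GENESIS,
        }
        entry["entry_hash"] = _digest(entry)
        self._entries.append(entry)
        return entry["seq"]

    def current_value(self, seq: int) -> dict:
        """Latest value for an entry, applying corrections without erasing the original."""
        value = self._entries[seq]["payload"]
        for entry in self._entries:
            if entry["corrects"] == seq:
                value = entry["payload"]  # later corrections supersede earlier ones
        return value

    def verify(self) -> bool:
        """Recompute the hash chain; editing any stored entry breaks it."""
        prev = GENESIS
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            if entry["prev_hash"] != prev or _digest(body) != entry["entry_hash"]:
                return False
            prev = entry["entry_hash"]
        return True

# Hypothetical user, machine and sample values.
log = AuditLog()
first = log.append("a.smith", "LAB-PC-07", "record", {"sample": "S-101", "ph": 7.9})
log.append("a.smith", "LAB-PC-07", "correct", {"sample": "S-101", "ph": 7.4}, corrects=first)

corrected = log.current_value(first)                 # the corrected reading
original_ph = log._entries[first]["payload"]["ph"]   # the original entry is still kept
intact_before = log.verify()

# Simulate someone overwriting the original reading directly in storage.
log._entries[first]["payload"]["ph"] = 7.4
intact_after = log.verify()

print(corrected, original_ph, intact_before, intact_after)
```

A real system would also need access control, a trusted clock and durable storage; the hash chain only makes in-place edits detectable, it does not prevent them.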
As the ALCOA principles were adopted more widely, including by large institutions such as the WHO (World Health Organisation), they were expanded. The extended set is now referred to as ALCOA+ and includes the following additional factors:
- Complete
The information recorded must be complete enough to recreate the event, test or analysis carried out. If all information is not documented or disclosed, it can undermine data integrity and reliability.
- Consistent
All recording procedures must be consistently carried out, in the correct order and with all time-recording apparatus synchronised to ensure accurate recording.
- Enduring
The material used to record the data must be maintained so that the records endure. For example, if paper files are kept, they should now be backed up electronically; if electronic databases are held on physical storage, they should be backed up to cloud storage, etc.
- Available
Records should be accessible to the organisation for review, audit and analysis at any time during the data lifecycle.
Source: Laafon
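Two of the ALCOA+ additions, Complete and Consistent, can be partly checked automatically when a record is saved. The sketch below is hypothetical: the required fields describe an imaginary lab test, and it assumes timestamps are stored as ISO 8601 strings with a UTC offset, so that records from different machines can be compared. Keeping the clocks themselves synchronised is outside the scope of the code.

```python
from datetime import datetime

# Hypothetical fields needed to reconstruct a single laboratory test.
REQUIRED_FIELDS = ("test_id", "operator", "instrument", "started_at", "finished_at", "result")

def check_record(record: dict) -> list[str]:
    """Return the ALCOA+ problems found in one record (an empty list means none found)."""
    problems = []

    # Complete: every field needed to recreate the test must be present and non-empty.
    missing = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
    if missing:
        problems.append("incomplete: missing " + ", ".join(missing))

    # Consistent: timestamps must parse, carry a time zone, and be in the right order.
    try:
        start = datetime.fromisoformat(record["started_at"])
        end = datetime.fromisoformat(record["finished_at"])
    except (KeyError, TypeError, ValueError):
        problems.append("inconsistent: timestamps missing or unreadable")
        return problems
    if start.tzinfo is None or end.tzinfo is None:
        problems.append("inconsistent: timestamps have no time zone")
    elif end < start:
        problems.append("inconsistent: finished before it started")
    return problems

good = {
    "test_id": "T-1", "operator": "j.doe", "instrument": "HPLC-2",
    "started_at": "2024-03-01T09:00:00+00:00",
    "finished_at": "2024-03-01T09:45:00+00:00",
    "result": "pass",
}
bad = dict(good, operator="", started_at="2024-03-01T10:00:00+00:00")
naive = dict(good, finished_at="2024-03-01T09:45:00")

for name, rec in [("good", good), ("bad", bad), ("naive", naive)]:
    print(name, check_record(rec))
```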
Do you need data integrity for data integration?
As we discussed above, data integrity can be compromised when problems occur during data migration. In some forms of data integration, data is transferred and replicated between systems for communication and analytics, and this is precisely when unwanted duplications or alterations tend to occur.
Steps must be taken to avoid the corruption of data during the integration process:
- Data cleaning - the process of identifying and fixing corrupt, incorrect or inconsistent data.
- Data profiling - categorising and documenting the structure of source data to understand its content and estimate how likely it is to contain errors. Data profiling can help identify integrity problems early so they can be resolved quickly.
- Data quality rules - custom business rules that identify which source data is accurate, essential and reliable.
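The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a data integration tool: the column names, the cleaning steps and the rules in `QUALITY_RULES` are hypothetical examples of the kind of business rules an organisation might define.

```python
def profile(rows: list[dict], columns: list[str]) -> dict:
    """Data profiling: summarise each column before it is integrated."""
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v not in (None, "")]
        report[col] = {
            "nulls": len(values) - len(present),
            "distinct": len(set(present)),
            "types": sorted({type(v).__name__ for v in present}),
        }
    return report

def clean(rows: list[dict]) -> list[dict]:
    """Data cleaning: trim text, normalise email and country formats, drop exact duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        fixed = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        if isinstance(fixed.get("email"), str):
            fixed["email"] = fixed["email"].lower()
        if isinstance(fixed.get("country"), str):
            fixed["country"] = fixed["country"].upper()
        key = tuple(sorted(fixed.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(fixed)
    return cleaned

# Data quality rules: each maps a human-readable name to a check on one record.
QUALITY_RULES = {
    "email contains @": lambda r: "@" in (r.get("email") or ""),
    "age between 0 and 120": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
    "country is a 2-letter code": lambda r: isinstance(r.get("country"), str) and len(r["country"]) == 2,
}

def apply_rules(rows: list[dict], rules: dict) -> tuple[list, list]:
    """Split records into those that pass every rule and those that break at least one."""
    passed, failed = [], []
    for row in rows:
        broken = [name for name, check in rules.items() if not check(row)]
        if broken:
            failed.append({"row": row, "broken_rules": broken})
        else:
            passed.append(row)
    return passed, failed

raw = [
    {"email": " Ana@Example.com ", "country": "ie", "age": 34},
    {"email": "ana@example.com", "country": "IE ", "age": 34},  # same person, formatted differently
    {"email": "bob.example.com", "country": "GB", "age": 41},   # malformed email
    {"email": "cy@example.com", "country": "FR", "age": None},  # missing age
]

report = profile(raw, ["email", "country", "age"])
cleaned = clean(raw)
passed, failed = apply_rules(cleaned, QUALITY_RULES)
print(report["age"])
print(len(cleaned), len(passed), len(failed))
```

In practice profiling runs first, so that its findings (here, a missing age and inconsistently formatted country codes) inform which cleaning steps and rules you need.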
Attempting to integrate data without adequate data integrity protocols can lead to wasted resources and inaccurate business intelligence, meaning you’re basing essential decisions on bad data. Not ideal.
Summing up
Poor data integrity will undermine any effort to become a data-driven business. It seems like common sense, yet it’s estimated that only 3% of companies meet basic data quality standards.
Safeguarding data integrity can ensure:
- Quality in your products and services
- Confidence in your business intelligence and decision making
- Safety and data privacy for your customers
If you follow the principles laid out in this blog, you will be on the right track to keep your data accurate, valid and consistent, so that you can feel secure in the decisions you make with it.
Are you struggling to get the full picture when analysing campaign results? Book a free demo today and see how Hurree can help you transform your company reporting to improve your sales and marketing output 💌 Don't hesitate to get in touch via contact@hurree.co if you have any inquiries - we’re happy to chat!