Explained: data enrichment

How do your favorite brands know to use your first name in the subject line of their emails? Why do you seem to get discounts and special offers on products you’ve recently purchased? Businesses are able to personalize their marketing messages thanks to data enrichment.

Data enrichment is the process of enhancing, refining, and improving raw data. It is usually the last step in constructing a dataset for a marketing campaign, but it can serve several other goals.

Contact enrichment is the most common form of data enrichment. It is the process of adding information to existing contact records to make them more complete.

Consider, for example, a database that contains names and addresses but lacks the telephone numbers that sales teams need to reach prospective customers. One option is to apply contact enrichment, matching the records in the existing database against the telephone numbers listed in another database.
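The telephone number scenario can be sketched in a few lines of Python. This is a minimal illustration, not a production matcher: the field names (name, address, phone), the sample records, and the helper functions normalize and enrich_with_phones are all assumptions made up for this example, and real-world matching usually needs far more tolerant address comparison.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(text.lower().split())


def enrich_with_phones(contacts, phone_directory):
    """Return copies of contacts, adding a phone number where the directory has a match."""
    # Index the external directory by a normalized (name, address) key.
    index = {
        (normalize(entry["name"]), normalize(entry["address"])): entry["phone"]
        for entry in phone_directory
    }
    enriched = []
    for contact in contacts:
        key = (normalize(contact["name"]), normalize(contact["address"]))
        record = dict(contact)  # leave the original record untouched
        # Only fill the gap; never overwrite a number we already hold.
        if not record.get("phone") and key in index:
            record["phone"] = index[key]
        enriched.append(record)
    return enriched


# Hypothetical sample data: our own contacts and an external directory.
contacts = [
    {"name": "Ada Jones", "address": "1 Main St"},
    {"name": "Bo Smith", "address": "22 Oak Ave"},
]
phone_directory = [
    {"name": "ada jones", "address": "1  main st", "phone": "555-0100"},
]
enriched = enrich_with_phones(contacts, phone_directory)
```

Contacts without a match in the external directory are returned unchanged, so the gaps that remain after enrichment stay visible.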

Definition of data enrichment, extended

Data enrichment is defined as merging third-party data from an external authoritative source with an existing database of first-party customer data. Some organizations do this to enhance the data they already possess so they can make more informed decisions.

More broadly, data enrichment refers to processes used to enhance, refine, or otherwise improve on raw data. In this context, it encompasses the whole strategy and process needed to improve existing databases. This idea and other related concepts are essential in making data a valuable asset for almost any modern enterprise.

Data enrichment processes

Even though data enrichment can be accomplished in several different ways, many of the tools used to refine a dataset focus on correcting errors or filling in incomplete data. A common data enrichment process would, for example, correct likely misspellings or typographical errors in a database by using algorithms designed for that purpose. Some data enrichment tools can also add information to simple data tables.
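As a rough illustration of correcting likely misspellings, the sketch below compares a value against a list of known-good values using the difflib module from Python's standard library. The reference list KNOWN_CITIES, the function correct_city, and the 0.8 similarity cutoff are assumptions chosen for this example; commercial enrichment tools may work quite differently.

```python
import difflib

# Hypothetical reference list of valid values for a "city" column.
KNOWN_CITIES = ["Amsterdam", "Rotterdam", "Utrecht", "Eindhoven"]


def correct_city(value, known=KNOWN_CITIES, cutoff=0.8):
    """Return the closest known city, or the original value if nothing is close enough."""
    matches = difflib.get_close_matches(value.title(), known, n=1, cutoff=cutoff)
    return matches[0] if matches else value


fixed = correct_city("amsterdm")   # close to a known value, so it is corrected
unknown = correct_city("Paris")    # no close match, so it is left as entered
```

A higher cutoff makes the correction more conservative. Values that fall below it are left alone rather than guessed at, which avoids replacing a valid but unlisted entry with a wrong one.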

Another way in which data enrichment can work is by extrapolating data. Through methodologies such as fuzzy logic, engineers can produce extra information from a given raw data set. This and other similar projects can also be described as data enrichment activities.

Data enrichment can also include merging data tables into a new dataset by using corresponding fields. In layman's terms: companies can buy access to other databases, look up additional information about their customers, and add that information to their own database.

Privacy concerns

Data is hardly ever merged or combined with the subject's prior permission. This poses a privacy problem: users typically have a reasonable idea of which information they have provided to a specific organization, but if that organization adds information from other databases, this picture is skewed. The organization will hold information about them of which they are not aware.

As long as this is generally available information, the problem is minor. But consider the famous example of your insurance company getting hold of the data gathered on the client-card of your supermarket. Knowing what you buy and consume may be something you would rather keep from them.

There are some privacy regulations that limit data enrichment for this very reason. The General Data Protection Regulation (GDPR) is a regulation on data protection and privacy in the European Union (EU) and the European Economic Area (EEA). It also addresses the transfer of personal data outside the EU and EEA. GDPR allows customers to ask which information about them is present in an organization's database and to have records, or parts of records, deleted.

Since GDPR also regulates the exchange and transfer of personal data, it can severely limit an organization's choice of data enrichment providers. In GDPR terminology, any data provider you use is a “Data Processor.” In order to send personal data about people in the EU to any Data Processor for any purpose, including enrichment, you must have a Data Processing Agreement (DPA) signed with the vendor.

A DPA is a legally binding contract that states the rights and obligations of each party concerning the protection of personal data. DPAs are mandatory to establish a chain of responsibility for the use and safety of personal data.

Steps to successful data enrichment

There are a few things to be done before you embark on a successful data enrichment process:

  • Sanitize your own data, or you will end up paying for data you will never use. Getting extra information about nonexistent people, or adding to incomplete records, is a waste.
  • Determine your goals and purpose for the data enrichment exercise. Again, avoid paying for data that turns out to be useless. Don’t pay for data tables just because they are available. If you are not going to use them, skip them.
  • Determine which processes the enriched data will support. Will the projected return outweigh the cost?
  • Determine your target market in terms of account profiles and personas. Do you want the data for a subset of customers that meet certain criteria, or would you, for example, like to exclude residents of GDPR-enforcing countries?

Sanitizing not only means removing duplicates, but also checking the validity of older data and the usefulness of entries that were filled out by customers or prospects themselves—on your website, for example.
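The sanitizing step could look something like the sketch below, which drops implausible self-entered email addresses, discards records that have not been verified recently, and keeps only the freshest copy of each duplicate. The field names (email, last_verified), the two-year staleness limit, and the function sanitize are assumptions for illustration only. Checking for an "@" is a deliberately crude plausibility test, not real email validation.

```python
from datetime import date


def sanitize(records, today, max_age_days=730):
    """Drop unusable, stale, and duplicate records; keep the freshest copy per email."""
    freshest = {}
    for record in records:
        email = record.get("email", "").strip().lower()
        # Crude plausibility check for self-entered addresses (an assumption,
        # not full validation).
        if "@" not in email:
            continue
        # Treat records not verified within max_age_days as stale.
        if (today - record["last_verified"]).days > max_age_days:
            continue
        kept = freshest.get(email)
        if kept is None or record["last_verified"] > kept["last_verified"]:
            freshest[email] = {**record, "email": email}
    return list(freshest.values())


# Hypothetical sample data: a duplicate, a junk entry, and a stale record.
records = [
    {"email": "Ada@Example.com ", "last_verified": date(2024, 1, 10)},
    {"email": "ada@example.com", "last_verified": date(2024, 5, 1)},
    {"email": "asdf", "last_verified": date(2024, 5, 1)},
    {"email": "bo@example.com", "last_verified": date(2019, 3, 2)},
]
clean = sanitize(records, today=date(2024, 6, 1))
```

Running a pass like this before buying data means you only pay to enrich records you can actually use.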

Once you have determined your goals and decided which data are crucial to achieving them, start looking for a data provider. Some providers may be more expensive but stronger in a certain data field. You can maximize your success by finding the data provider that best fits your needs.

Not all data enrichment makes you rich

Keep in mind that buying—and storing—the extra data will cost you. Data needs to be backed up and protected, and the storage costs can amount to a pretty sum depending on the size of the datasets. And if the data is not kept up to date, then it may soon become worthless.

Finally, if you’re ever breached, the amount and type of leaked data are determining factors for the ensuing loss of reputation.

ABOUT THE AUTHOR

Pieter Arntz

Malware Intelligence Researcher

Was a Microsoft MVP in consumer security for 12 years running. Can speak four languages. Smells of rich mahogany and leather-bound books.