The Customer Identity Resolution Conundrum

In today's data-rich world, one challenge reigns supreme in creating truly personalized customer experiences: accurately identifying the same individual across multiple touchpoints and systems.

Identity resolution aims to accomplish the fundamental task of recognizing the same person across all your customer data sources. Traditionally, businesses have relied on two main approaches: Unique Identifier methods and Static Rule systems.

In this post, we'll explore both conventional methods in depth, examining their strengths and limitations.

Unique Identifier Approach

In a perfect world, each individual customer would be represented by the same unique identifier across a brand’s datasets. A perfect unique identifier would be something inimitable, like a social security number or a thumbprint.

In practice, only a few of a brand’s systems (if any) have identifiers that are truly unique to an individual. For those systems, a simple primary key/foreign key match is sufficient to connect data about an individual from one table to the next.

Unfortunately, for the majority of a brand’s data, primary keys are not unique to individuals. For example:

Figure: identifiers for an ecommerce customer (primary key/ID, customer name, phone, and email address)

Using a simple Unique Identifier to identify people, one would assume that IDs 123, 456, and 789 represent three distinct customers without considering that all four records belong to the same individual.
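
To make the problem concrete, here is a minimal Python sketch of grouping records by primary key. The original figure is not reproduced here, so the record layout, phone numbers, and email addresses below are hypothetical; only the IDs and the name "Cassandra Leonard" come from the text.

```python
from collections import defaultdict

# Hypothetical sample data: four records that all describe one person.
# Phone numbers and email addresses are invented for illustration.
records = [
    {"id": 123, "name": "Cassandra Leonard", "phone": "555-0100", "email": "cleonard@example.com"},
    {"id": 123, "name": "Cassandra Leonard", "phone": "555-0100", "email": "cleonard@example.com"},
    {"id": 456, "name": "Cassandra Leonard", "phone": "555-0100", "email": "cassandra.l@example.com"},
    {"id": 789, "name": "Cassie Leonard", "phone": "555-0100", "email": "cleonard@example.com"},
]


def group_by_key(rows, key="id"):
    """Treat every distinct key value as a distinct customer."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


customers = group_by_key(records)
print(len(records), "records ->", len(customers), "customers")  # 4 records -> 3 customers
```

Grouping on the key alone reports three customers, even though the shared phone number and overlapping email addresses suggest a single person.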

Static Rule Approach

Ideally, customer data would be clean, consistent, and perfectly aligned. In reality, it’s anything but. Businesses juggle multiple systems, deal with human errors in data entry, and constantly update personal information—all of which create major pitfalls in maintaining accurate records.

For the past decade, many organizations have relied on rules-based approaches to manage these inconsistencies. While this method has its merits, it’s far from foolproof.

Revisiting the sample data above, let’s say we establish the following rules:

IF names are an exact match, THEN it's a match

IF email addresses are an exact match, THEN it's a match

IF primary keys or foreign keys are an exact match, THEN it's a match

There’s a major flaw with this approach. Take the first rule—it successfully classifies all records with an exact match on "Cassandra Leonard" as the same person.

But what happens when you apply this rule to a more common name, like "John Smith"? Suddenly, records belonging to entirely different individuals get lumped together, leading to inaccurate data and potential business risks. This underscores how a rules-based data approach often can’t account for real-world complexities.

To weed out some of these false positives, we could instead define the rules as:

IF names are an exact match AND email addresses are an exact match, THEN it's a match

IF primary keys or foreign keys are an exact match, THEN it's a match

While this refined approach does improve accuracy, it has the potential to introduce more false negatives.

Take a less common name like Calliope Glaser. If one record has the email cglaser@gmail.com and another has calliopeg78@yahoo.com, a strict rules-based system will treat them as two separate individuals. However, common sense suggests they are likely the same person.
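
The trade-off between the two rule sets can be sketched in a few lines of Python. The function names, record layout, IDs, and the John Smith email addresses are assumptions made for illustration; the Calliope Glaser addresses are the ones quoted above.

```python
def loose_rules(a, b):
    """First rule set: any single exact match (name, email, or key) is a match."""
    return a["name"] == b["name"] or a["email"] == b["email"] or a["id"] == b["id"]


def strict_rules(a, b):
    """Refined rule set: name AND email must both match exactly, or the keys match."""
    return (a["name"] == b["name"] and a["email"] == b["email"]) or a["id"] == b["id"]


# Hypothetical records for two different people who share a common name.
john_a = {"id": 1, "name": "John Smith", "email": "john.smith@example.com"}
john_b = {"id": 2, "name": "John Smith", "email": "jsmith1985@example.org"}

# Hypothetical records for what is probably one person with two email addresses.
calliope_a = {"id": 3, "name": "Calliope Glaser", "email": "cglaser@gmail.com"}
calliope_b = {"id": 4, "name": "Calliope Glaser", "email": "calliopeg78@yahoo.com"}

print(loose_rules(john_a, john_b))           # True  (false positive)
print(strict_rules(john_a, john_b))          # False (false positive removed)
print(loose_rules(calliope_a, calliope_b))   # True
print(strict_rules(calliope_a, calliope_b))  # False (new false negative)
```

Tightening the rule fixes the John Smith case and breaks the Calliope Glaser case; neither rule set gets both right.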

Another challenge? Human error in a typo-riddled world. Casandra Leonard and Cassandra Leonard could refer to the same person, but a rigid system won’t recognize them as a match.

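This is where fuzzy matching comes into play. Instead of requiring exact matches, fuzzy matching uses string-similarity measures such as Levenshtein distance (the number of single-character insertions, deletions, and substitutions needed to turn one value into another), so near-identical values like the two spellings above can still be recognized as a match.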

By shifting from rigid rules to intelligent matching, businesses can better handle real-world data inconsistencies.
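
As an illustration, here is a small Python implementation of Levenshtein distance using the standard dynamic-programming recurrence. The similarity normalization and the 0.9 threshold are assumptions chosen for this example, not values prescribed by any particular system.

```python
def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn s into t."""
    previous = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        current = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(
                previous[j] + 1,         # delete cs
                current[j - 1] + 1,      # insert ct
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similarity(s, t):
    """Scale the distance to a 0..1 score (1.0 means identical)."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1 - levenshtein(s, t) / longest


def is_fuzzy_match(s, t, threshold=0.9):
    # The threshold is an illustrative guess, not a recommended value.
    return similarity(s, t) >= threshold


print(levenshtein("Casandra Leonard", "Cassandra Leonard"))     # 1
print(is_fuzzy_match("Casandra Leonard", "Cassandra Leonard"))  # True
print(is_fuzzy_match("Cassandra Leonard", "John Smith"))        # False
```

A single missing "s" costs one edit, so the two spellings score well above the threshold, while unrelated names do not.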

As more matching signals are introduced and the number of hand-coded rules grows, so does their complexity. Simple logic like (Rule A AND Rule B) OR Rule C is no longer enough. Instead, rules must be weighted based on how strongly they indicate that two records belong to the same person. Some rules carry more significance, while others provide only a weak signal.

To manage this, organizations often rely on massive static-rule tables. While this approach may work in the short term, it isn’t tenable. As data evolves and new sources are added, maintaining these rules becomes a constant challenge.

Moreover, the weight-setting process is far from scientific. Data administrators typically estimate values, leading to inconsistencies and potential errors.
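
A weighted rule table might look something like the Python sketch below. The field names, weights, and threshold are all hypothetical, and they are hand-picked guesses on purpose: that guesswork is exactly the weakness described above.

```python
# Hand-estimated weights: how strongly an exact match on each field
# suggests that two records describe the same person (hypothetical values).
WEIGHTS = {"id": 10, "email": 8, "phone": 6, "name": 3}
MATCH_THRESHOLD = 10  # also a guess


def match_score(a, b):
    """Sum the weights of every field that is present and identical in both records."""
    return sum(
        weight
        for field, weight in WEIGHTS.items()
        if a.get(field) is not None and a.get(field) == b.get(field)
    )


def is_match(a, b):
    return match_score(a, b) >= MATCH_THRESHOLD


rec_1 = {"id": 123, "name": "Cassandra Leonard", "email": "cleonard@example.com"}
rec_2 = {"id": 456, "name": "Cassandra Leonard", "email": "cleonard@example.com"}
rec_3 = {"id": 789, "name": "Cassandra Leonard", "email": "other@example.com"}

print(match_score(rec_1, rec_2), is_match(rec_1, rec_2))  # 11 True  (name + email)
print(match_score(rec_1, rec_3), is_match(rec_1, rec_3))  # 3 False  (name only)
```

Every new data source adds fields to the table, and every weight or threshold change can silently flip earlier match decisions, which is why these tables become hard to maintain.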

There are two additional things to consider in a typical static rule-set approach:

1. What is Being Compared?

Generally, legacy systems perform pairwise comparisons, meaning that one record is compared against another. A determination is then made whether these two records are a match or not. What happens when there are more than two records?

Suppose pairwise comparison finds that A matches B and B matches C, but A does not match C. This is conflicting information. Based on it, should A, B, and C be considered the same person or not? Pairwise matching with binary (yes/no) outputs cannot answer this question definitively.

2. What is the Output of the Comparison?

In the static rule-set world, when two records are compared, the output of the comparison is a binary answer. While this may seem sensible, it discards important information: whether the quality of the match was high, medium, or low, and which records were “near” matches that fell just short. This can be problematic for two reasons, the first outlined in the example above.

Imagine we compared A, B, and C with each other and found:

A and B = strong match
A and C = weak mismatch
B and C = strong match

If we could measure match strength, we’d confidently link A, B, and C—even if A and C show a weak mismatch. Different scenarios require different thresholds: some allow weaker matches, while others demand certainty.
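
One way to picture this is to keep a numeric score for every pair and link records through any pair that clears a chosen threshold. The Python sketch below uses a small union-find structure to do that. It is an illustration under assumed scores and thresholds, not a description of how any particular product resolves identities.

```python
def cluster(pair_scores, threshold):
    """Group records by linking every pair whose score meets the threshold."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), score in pair_scores.items():
        root_a, root_b = find(a), find(b)
        if score >= threshold and root_a != root_b:
            parent[root_a] = root_b  # merge the two groups

    groups = {}
    for record in list(parent):
        groups.setdefault(find(record), set()).add(record)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical match strengths on a 0..1 scale for the A/B/C example.
pair_scores = {
    ("A", "B"): 0.95,  # strong match
    ("A", "C"): 0.45,  # weak mismatch
    ("B", "C"): 0.92,  # strong match
}

print(cluster(pair_scores, threshold=0.90))  # [['A', 'B', 'C']]
print(cluster(pair_scores, threshold=0.99))  # [['A'], ['B'], ['C']]
```

With a threshold of 0.90, the two strong matches connect all three records despite the weak A and C score. Raising the threshold to 0.99 models a use case that demands near certainty, and the records stay separate.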

For example, in personalized recommendations, even a partial match can be useful. If one record has a name, email, and zip code, and another has a name, email, and gender, you can tailor offerings, like beauty boxes for a female-identifying customer or winter gear for someone in a cold climate.

Consider two records showing a probable but not unequivocal match. We might want to use gender information from the second record to enrich the first, but a static-rules approach can’t handle that nuance.

Amperity is different. Thanks to our innovative solution to customer identification, we’re able to address the complex challenges that have long frustrated marketers and data teams alike. Download our identity resolution playbook to learn more.