Babel Street Match

Fine-tune your Match Threshold and Parameters

Remember playing “Concentration” as a kid? It’s a tile-match game. Each player must turn over two tiles and remember their placement to win future rounds. It’s a strict match/no match scenario. The red chair matches the other red chair, not the refrigerator. The chicken matches the other chicken, not the xylophone.

Name matching works nothing like this. You rarely find an exact chicken-to-chicken match. Nor are mismatches as clear as chicken-to-xylophone. High-velocity, high-stakes name matching in finance, national security, and other industries feels more like uncovering a tile picturing a chicken, another with an egg, and trying to decide whether the two align. To make these determinations, you need a name matching system that allows you to set your own match thresholds and tune match parameters, in alignment with your use case and tolerance for risk.

Match can help

Babel Street Match (formerly Rosette) is a scalable software solution that automates the matching of personal names, organizational names, and addresses. It employs AI-powered fuzzy matching to recognize names in all their varieties and, by appending identifying information to names, helps your organization match more quickly and with greater certainty, reducing alerts and the staff time required to investigate them. It is interoperable with a broad variety of applications and databases, and is offered as a plug-in for popular search engines. Match can match names across dozens of languages, including languages written in complex, non-Latin scripts such as Arabic, Chinese, Hebrew, Japanese, Korean, and Russian.

Rather than just providing a “match/no match” report, Match gives you normalized, actionable scores indicating its confidence in each match. Scores run from 0 to 1: a score of 0.5, for example, indicates 50% confidence in the match, and 0.75 indicates 75% confidence.
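To make the contrast between a match/no match report and a normalized score concrete, here is a minimal Python sketch. It uses the standard library's difflib.SequenceMatcher purely as a stand-in scorer; this is an assumption for illustration and is not how Match computes its scores.

```python
from difflib import SequenceMatcher


def exact_match(a: str, b: str) -> bool:
    """Strict match/no match: True only for identical strings (ignoring case)."""
    return a.casefold() == b.casefold()


def similarity_score(a: str, b: str) -> float:
    """Graded score in [0, 1] based on shared character runs.

    A stand-in for illustration only, not Match's scoring method.
    """
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


pairs = [
    ("John Aaron Smith", "John Aaron Smith"),
    ("John Aaron Smith", "John Aron Smythe"),
    ("John Aaron Smith", "Mary Xylophone"),
]
for a, b in pairs:
    print(f"{a!r} vs {b!r}: exact={exact_match(a, b)}, "
          f"score={similarity_score(a, b):.2f}")
```

The strict comparison rejects the misspelled name outright, while the graded score places it well above the unrelated name and leaves the decision to a threshold you choose.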

Just as important, Match offers you the opportunity to fine-tune match parameters and adjust match thresholds according to your specific needs.

Setting your own match parameters

The default settings used in Match deployments are based on decades of research and tens of millions of successfully matched names in hundreds of environments. Still, you may choose to fine-tune about 20 of those parameters, weighting each according to your use case and risk tolerance.

These parameters include:

  • Disordered name components: John Aaron Smith vs. Smith, John Aaron
  • Translations or transliterations: John Aaron Smith vs. Juan Aarón Herrara (Spanish) vs. Jean Aron Havre (French) vs. 존 아론 스미스 (Korean)
  • Initialisms: John Aaron Smith vs. John A. Smith vs. J.A. Smith
  • Missing middle names: John Aaron Smith vs. John Smith
  • Nicknames and aliases: John Aaron Smith vs. Jack Smith vs. ZeZe Herrara
  • Misspellings: John Aron Smythe
  • Homonyms and gender conflicts: John Aaron Smith vs. Jan Erin Smith
  • Titles
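As a rough illustration of what weighting parameters can mean in practice, the toy scorer below handles three of the variations listed above (disordered components, initialisms, and missing middle names) and subtracts a tunable penalty for each. The class MatchParams, its field names, and the default values are all hypothetical; they are not Match's actual parameters or algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MatchParams:
    """Hypothetical tuning knobs; names and defaults are illustrative only."""
    reorder_penalty: float = 0.05         # same components, different order
    initial_penalty: float = 0.10         # "A." standing in for "Aaron"
    missing_middle_penalty: float = 0.15  # a component absent on one side


def tokens(name: str) -> List[str]:
    """Lowercase the name and split it on whitespace, periods, and commas."""
    for ch in ".,":
        name = name.replace(ch, " ")
    return name.casefold().split()


def token_match(x: str, y: str) -> Optional[str]:
    """Return 'full', 'initial', or None for a pair of name components."""
    if x == y:
        return "full"
    if (len(x) == 1 and y.startswith(x)) or (len(y) == 1 and x.startswith(y)):
        return "initial"
    return None


def score(a: str, b: str, params: Optional[MatchParams] = None) -> float:
    """Start at 1.0 and subtract a penalty for each variation found."""
    p = params or MatchParams()
    short, long_ = sorted((tokens(a), tokens(b)), key=len)
    used: List[int] = []
    penalty = 0.0
    for tok in short:
        found = None
        for kind in ("full", "initial"):  # prefer full matches over initials
            for i, cand in enumerate(long_):
                if i not in used and token_match(tok, cand) == kind:
                    found = (i, kind)
                    break
            if found:
                break
        if found is None:
            return 0.0  # toy rule: an unmatched component means no match
        used.append(found[0])
        if found[1] == "initial":
            penalty += p.initial_penalty
    if used != sorted(used):
        penalty += p.reorder_penalty
    penalty += p.missing_middle_penalty * (len(long_) - len(short))
    return max(0.0, 1.0 - penalty)


strict_middle = MatchParams(missing_middle_penalty=0.60)
for other in ["Smith, John Aaron", "John A. Smith", "J.A. Smith", "John Smith"]:
    print(other, round(score("John Aaron Smith", other), 2),
          round(score("John Aaron Smith", other, strict_middle), 2))
```

Raising missing_middle_penalty in this sketch pushes "John Smith" far down the ranking while leaving the reordered and initialed forms untouched, which is the kind of trade-off parameter tuning is meant to express.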

Why would you choose to weight certain match parameters differently?

If you know that John Aaron Smith often translates his name into Spanish or German, you can set Match to give extra weight to those translations. If you’re matching lists written in the United States with those compiled in the United Kingdom, you may want to adjust for the British spelling of Smith’s last name: Smythe. If you have reason to know that John Aaron Smith always uses his middle name or middle initial, you may want to give more weight to middle names and initialisms, and so reject instances of John Smith without the “Aaron,” as well as persons with different middle names or initials, such as John Michael Smith or John M. Smith.

Matching beyond personal names

Matching addresses and corporate names presents particular challenges. Different countries have different address conventions, listing various address components in different orders. Companies are known by a variety of names and initialisms. (The “North American Bee Company” may be more often labeled “NABC.” “The Yummy Yawning Yogurt Company” might be better known as “3Y.”) Match/no match systems are unlikely to make these matches. Match does.

The deeper the knowledge of your data, the better you can fine-tune parameters. If you know that a data set contains only names, you can tune Match to dismiss any numbers as typos or other recording errors. If, however, you’re matching a list of addresses, you may add more weight to numbers than to other address components (“123 Main St.” more closely matches “123 Main” than it matches “368 Main St.”). If you’re dealing with a list of medical professionals, partial matches may inappropriately arise because of the regular use of titles such as “Dr.” or designations such as “RN.” You can tune Match to ignore those appellations. If you have a list in which names are consistently ordered, you can tune Match to consider “Aaron John Smith” not as a disordered “John Aaron Smith,” but as a mismatch.
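Two of the tunings described above can be sketched in a few lines of Python: giving numbers extra weight when comparing addresses, and ignoring titles when comparing names. The function names, the number_weight parameter, and the IGNORED_TITLES set are assumptions made for this example; the scoring is a simple weighted token overlap, not Match's method.

```python
from typing import List

# Hypothetical set of appellations to ignore; extend to suit your data.
IGNORED_TITLES = {"dr", "rn"}


def tokens(text: str) -> List[str]:
    """Lowercase the text and split it on whitespace, periods, and commas."""
    for ch in ".,":
        text = text.replace(ch, " ")
    return text.casefold().split()


def address_score(a: str, b: str, number_weight: float = 3.0) -> float:
    """Weighted token overlap in [0, 1].

    Numeric tokens count number_weight times as much as word tokens, so a
    differing house number costs more than a missing "St.".
    """
    ta, tb = set(tokens(a)), set(tokens(b))

    def weight(tok: str) -> float:
        return number_weight if tok.isdigit() else 1.0

    union = ta | tb
    if not union:
        return 0.0
    return sum(weight(t) for t in ta & tb) / sum(weight(t) for t in union)


def strip_titles(name: str) -> str:
    """Drop ignored titles and designations before names are compared."""
    return " ".join(t for t in tokens(name) if t not in IGNORED_TITLES)


print(address_score("123 Main St.", "123 Main"))      # same number, no "St."
print(address_score("123 Main St.", "368 Main St."))  # different number
print(strip_titles("Dr. John Smith"), "|", strip_titles("John Smith, RN"))
```

With number_weight set to 1.0 the two address comparisons score much closer together; increasing the weight widens the gap in favor of the pair that shares a house number.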

The match threshold balance

Finance and national security organizations need exact or strong matches in specific, significant instances. You can’t empower human traffickers to launder money through your bank. You can’t allow a terrorist to enter the country. These determinations almost always require human investigation. And rightly so. But significant investigative time and salaries are too often spent examining a deluge of false positives. In fact, over 95 percent of system-generated alerts are closed as “false positives” in the first phase of review, with approximately 98 percent of alerts never culminating in a suspicious activity report (SAR).[1]

To match names in a fiscally sound manner, you need to strike a balance between automation where possible and human investigation where warranted. This can be accomplished in part by adjusting match thresholds: the levels at which a name matching system deems two or more names to be an exact match, a strong match, a partial match, or not a match.
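One way to picture match thresholds is as cut-off points that divide the 0-to-1 score range into bands. The sketch below is a minimal example; the cut-off values 0.85 and 0.60 are placeholders chosen for illustration, not recommended or default settings.

```python
def classify(score: float, strong: float = 0.85, partial: float = 0.60) -> str:
    """Map a normalized score to a band using adjustable thresholds.

    The default cut-offs are illustrative placeholders only.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score == 1.0:
        return "exact match"
    if score >= strong:
        return "strong match"
    if score >= partial:
        return "partial match"
    return "no match"


for s in (1.0, 0.9, 0.7, 0.3):
    print(s, "->", classify(s))

# Lowering the 'strong' threshold reclassifies the same score.
print(0.7, "->", classify(0.7, strong=0.65))
```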

Consider the following scenarios

You work for an international bank. You’re worried Customer A may be using your bank to launder millions to fund human trafficking. You’re considering giving Customer B, a young adult, his first credit card. It has a €1,500 limit.

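Matching for Customer A is a more significant task in every conceivable way. The stakes of a missed match (fines, loss of reputation, the moral culpability of enabling a heinous crime) are far higher than the potential consequences of a missed match for Customer B, who in a worst-case scenario may scam you out of €1,500. Adjustable thresholds let you reflect that difference: for Customer A you might set a lower threshold so that more potential matches reach a human investigator, while for Customer B a higher threshold lets automation resolve more of them.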
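The effect of risk tolerance on investigator workload can be sketched as follows. The tier names, the threshold values, and the list of candidate scores are all made up for illustration; they are not taken from Match or from any real screening data.

```python
from typing import Dict, List

# Hypothetical review thresholds per risk tier: a lower threshold sends
# more candidate matches to a human investigator.
REVIEW_THRESHOLDS: Dict[str, float] = {
    "high_risk": 0.60,  # e.g. suspected large-scale money laundering
    "low_risk": 0.85,   # e.g. a first credit card with a small limit
}


def alerts(scores: List[float], risk_tier: str) -> List[float]:
    """Return the candidate scores that would be escalated for review."""
    threshold = REVIEW_THRESHOLDS[risk_tier]
    return [s for s in scores if s >= threshold]


# Invented scores for a handful of candidate matches against a watchlist.
candidate_scores = [0.95, 0.82, 0.71, 0.64, 0.40]

for tier in REVIEW_THRESHOLDS:
    escalated = alerts(candidate_scores, tier)
    print(f"{tier}: {len(escalated)} of {len(candidate_scores)} escalated")
```

The same candidates produce more alerts under the high-risk tier than under the low-risk one, which is the balance between investigation cost and missed-match risk that a threshold setting expresses.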

Name matching is never as easy as turning over tiles in a children’s game. Rather, it is an onerous, time-consuming process fraught with opportunities for bad matches and missed matches. Match automates much of the name matching process: it spots potential matches across a broad array of languages, empowers you to set match parameters and thresholds, and provides normalized scoring that reflects the system’s confidence in each match. In doing so, it helps you match names more effectively and efficiently, while saving the expense of live investigations.

Try it yourself. For free.

Sign up for a free trial license at https://developer.rosette.com.

End Note

  1. “Anti-money laundering controls failing to detect terrorists, cartels, and sanctioned states,” Reuters. https://www.reuters.com/article/bc-finreg-laundering-detecting-idUSKCN1GP2NV
