When matching names across English and Japanese, it’s obvious which name is English and which is Japanese. The same is true for any pair of names written in different scripts: Arabic and Cyrillic, Devanagari and Latin, and so on. Telling apart names written in the same script is trickier. Chinese, Japanese, and Korean can all use the Han script (Chinese ideographs), so a name written in Han characters could belong to any of the three. Thankfully, this is not a problem for Babel Street Match fuzzy name matching and name translation.
First, you have to know which language you are starting from. In many cases, knowing how the characters are pronounced is necessary to match a name to the same name in another language, or to transliterate it correctly into your chosen language. Even though the three languages share a script, a name from one of them is usually written differently in the other two, so you need a tool that can reliably select the right language model.
Preferred ways to translate
Take the Korean name 김정일, which is usually written in Hangul. Chinese would use its Hanja equivalent, 金正日, because Hanja are Chinese ideographs. Japanese, however, would write it phonetically in Katakana as キム・ジョンイル.
English | Korean (Hangul) | Chinese (uses the Korean Hanja) | Japanese (phonetic rendering of the Korean) |
Kim Jong-il | 김정일 | 金正日 | キム・ジョンイル |

English | Korean (phonetic rendering of the Japanese) | Chinese | Japanese |
Yasunari Kawabata | 가와바타 야스나리 | 川端康成 (uses the Japanese Kanji) | 川端康成 |
Hibari Misora | 미소라 히바리 | 美空雲雀 (uses 雲雀, the Kanji spelling of the word hibari, "skylark") | 美空ひばり |

Conversely, the Japanese name 川端康成 is written as is in Chinese. The name 美空ひばり, however, combines Kanji (美空) with Hiragana (ひばり), and Chinese has no Hiragana, so it replaces ひばり with 雲雀, the Kanji that write the Japanese word hibari ("skylark").
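The conventions above can be summarized as a small decision function. This is a minimal sketch under our own simplifying assumptions, not how Match is implemented: the function name `preferred_rendering`, the language codes, and the returned labels are all hypothetical, and mixed-script names such as 美空ひばり are not modeled.

```python
from typing import Optional


def preferred_rendering(source: str, target: str, han_spelling: Optional[str]) -> str:
    """Describe how `target` ("zh", "ja" or "ko") usually writes a name from `source`.

    `han_spelling` is the name's known Hanja/Kanji form, or None if it has none
    (or none is known). Hypothetical summary of the conventions described above.
    """
    if source == target:
        return "native spelling"
    if target == "ko":
        return "phonetic Hangul"  # e.g. 川端康成 -> 가와바타 야스나리
    if target == "ja":
        return "phonetic Katakana"  # e.g. 김정일 -> キム・ジョンイル
    if target == "zh":
        if han_spelling is not None:
            return f"Han characters as is: {han_spelling}"  # e.g. 金正日, 川端康成
        return "phonetic Hanzi"  # e.g. Portman -> 波特曼
    raise ValueError(f"unsupported target language: {target!r}")
```

For example, `preferred_rendering("ko", "zh", "金正日")` returns `"Han characters as is: 金正日"`, while the same name with target `"ja"` returns `"phonetic Katakana"`.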
Varying pronunciations for the same character
Variations in character pronunciation are another reason you must know which language you are starting from. For example, the character 金 is pronounced differently in each language:
Language | Pronunciation of 金 |
Japanese | kane (kah-nay) or kin |
Chinese | jin |
Korean | kim or gim |
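To see why the starting language matters for matching, here is a minimal sketch that checks a romanized name against 金 using only the readings in the table above. The table `READINGS` and the helper names are hypothetical; a real system would need a full reading dictionary, and romanization schemes vary (Kim and Gim are two spellings of the same Korean reading).

```python
from __future__ import annotations

# Hypothetical reading table holding only the romanized readings of 金
# listed in the table above; a real system needs a full reading dictionary.
READINGS: dict[str, dict[str, set[str]]] = {
    "金": {"ja": {"kane", "kin"}, "zh": {"jin"}, "ko": {"kim", "gim"}},
}


def readings(char: str, lang: str) -> set[str]:
    """Romanized readings of `char` in `lang`; empty set if unknown."""
    return READINGS.get(char, {}).get(lang, set())


def matches_romanized(latin: str, char: str, lang: str) -> bool:
    """True if `latin` (case-insensitive) is a reading of `char` in `lang`."""
    return latin.lower() in readings(char, lang)
```

With the source language known, `matches_romanized("Kim", "金", "ko")` is true but `matches_romanized("Kim", "金", "zh")` is false, because the Chinese reading is jin; without the language, a matcher would have to try every language's readings at once.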
As a non-speaker, how do you know what language a name is in?
Determining the language is relatively easy for someone who knows one of the languages: although all three share the Han script (Chinese ideographs), the characters typically used in names differ distinctly from one language to another. This is analogous to an English speaker recognizing that “Jose Perez” is probably Spanish, while “Olivier Cousteau” is probably French.
The character 金, for instance, is the very common Korean surname “Kim,” but it is very rare as a Japanese surname and only occasionally a Chinese one, as with the famous Qing dynasty writer Jin Shengtan (金聖嘆). Japanese and Chinese do, however, frequently combine 金 with other characters in given names.
Match understands these idiosyncrasies of Han-script names. If a name's language is unknown, Match first determines it and then proceeds with name matching; in other words, it doubles as a language identifier and a name matcher for these three languages.
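A rough first pass at language identification, assuming nothing about how Match itself works, can be sketched with Unicode block ranges: Hangul is used only for Korean, and Hiragana or Katakana only for Japanese. Names written purely in Han characters, such as 金正日, stay ambiguous, which is exactly where character-level knowledge like the surname frequencies above has to take over. The block ranges are standard Unicode; the function names are our own hypothetical choices.

```python
from __future__ import annotations

# Unicode block ranges (inclusive code points).
HANGUL = [(0xAC00, 0xD7A3), (0x1100, 0x11FF), (0x3130, 0x318F)]  # syllables, jamo, compatibility jamo
KANA = [(0x3040, 0x309F), (0x30A0, 0x30FF)]  # Hiragana, Katakana (incl. ・ and ー)
HAN = [(0x4E00, 0x9FFF), (0x3400, 0x4DBF), (0x3005, 0x3005)]  # CJK ideographs, Ext. A, 々


def _in_any(cp: int, ranges: list[tuple[int, int]]) -> bool:
    return any(lo <= cp <= hi for lo, hi in ranges)


def scripts_in(name: str) -> set[str]:
    """Return which of 'hangul', 'kana', 'han' occur in `name`."""
    found: set[str] = set()
    for ch in name:
        cp = ord(ch)
        if _in_any(cp, HANGUL):
            found.add("hangul")
        elif _in_any(cp, KANA):
            found.add("kana")
        elif _in_any(cp, HAN):
            found.add("han")
    return found


def candidate_languages(name: str) -> set[str]:
    """First-pass guess at the language(s) a CJK name could be written in."""
    scripts = scripts_in(name)
    if "hangul" in scripts:
        return {"ko"}  # Hangul is used only for Korean
    if "kana" in scripts:
        return {"ja"}  # Hiragana/Katakana are used only for Japanese
    if "han" in scripts:
        return {"zh", "ja", "ko"}  # Han only: the script alone cannot decide
    return set()
```

Note that this identifies the language a name is *written* in, not where the person is from: キム・ジョンイル comes back as Japanese even though the name is Korean.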
Curious about how each of these languages expresses foreign names compared to the other two? Read on.
Chinese names
Chinese is written only in Han characters (called Hanzi), so all Chinese names are written in Hanzi. Each character has a basic meaning and usually one pronunciation:
Script | Family name | Given name |
Hanzi | 章 | 娟 |
English | Zhang | Juan |
Foreign names in Chinese
Since Chinese has no script other than Hanzi, foreign names are written phonetically, using Hanzi whose sounds approximate the foreign name. The choice of characters varies by region: Portman, for example, is transliterated as 波特曼 (bo te man) in mainland China, 波曼 (bo man) in Taiwan, and 寶雯 (bao wen) in Hong Kong.
For Korean names whose Hanja form is commonly known, Chinese uses the Hanja; if the Hanja are not known, the name is transliterated phonetically. Both can happen to the same name: Gong Hyo Jin (Hangul 공효진, Hanja 孔曉振) is sometimes also written phonetically in Chinese as 孔孝真.
The exceptions to phonetic transliteration, then, are Japanese names written in Kanji and Korean names with known Hanja, which are used as is, although Chinese speakers are sure to pronounce them differently!
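One way to see that 孔曉振 and 孔孝真 are the same name is to compare their Mandarin readings rather than their characters. The sketch below assumes a hand-written, hypothetical table of toneless pinyin covering only the characters in these examples; a production system would use a full reading dictionary and a fuzzier comparison.

```python
from __future__ import annotations

# Hypothetical mini dictionary: toneless Mandarin pinyin for the characters
# used in the examples above. A real system needs a full reading dictionary.
PINYIN = {
    "孔": "kong", "曉": "xiao", "振": "zhen", "孝": "xiao", "真": "zhen",
    "波": "bo", "特": "te", "曼": "man", "寶": "bao", "雯": "wen",
}


def mandarin_key(name: str) -> tuple[str, ...]:
    """Toneless Mandarin reading of each character (KeyError if not in PINYIN)."""
    return tuple(PINYIN[ch] for ch in name)


def sounds_alike(a: str, b: str) -> bool:
    """True if two Hanzi names have identical toneless Mandarin readings."""
    return mandarin_key(a) == mandarin_key(b)
```

Both spellings of Gong Hyo Jin reduce to `("kong", "xiao", "zhen")`. Reading comparison does not settle everything, though: the regional renderings of Portman (波特曼, 波曼, 寶雯) have different Mandarin readings, so a matcher would also need to know they are established variants of the same name.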
Korean names
Korea borrowed Chinese characters as a writing system (called Hanja) centuries ago. Exactly when is unknown, but Hanja was already in use when King Sejong the Great commissioned scholars in the 1440s to create a uniquely Korean script, Hangul, which is used almost exclusively to write Korean today. Hangul is purely phonetic, and although Korean names are now written mostly in Hangul, they often still correspond to particular Hanja that carry their meaning. Because Korean has many homophones, a given Hangul name can correspond to several different Hanja spellings.
Script | Family name | Given name |
Hanja | 朴 | 明洙 |
Hangul | 박 | 명수 |
English | Park | Myeong-su |
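Because one Hangul syllable can stand for several Hanja, matching a Hangul name against a Han-script name means considering every Hanja spelling it could correspond to. The table below is hypothetical and deliberately tiny (real dictionaries list many more Hanja per reading); `hanja_candidates` is our own illustrative helper that simply expands it with `itertools.product`.

```python
from __future__ import annotations

from itertools import product

# Hypothetical, deliberately tiny table of Hanja sharing each Hangul reading.
HANJA_BY_READING = {
    "박": ["朴"],
    "명": ["明", "名", "命"],
    "수": ["洙", "秀", "水"],
}


def hanja_candidates(hangul: str) -> list[str]:
    """Every Hanja spelling consistent with a Hangul string, syllable by syllable."""
    options = [HANJA_BY_READING[syllable] for syllable in hangul]
    return ["".join(combo) for combo in product(*options)]
```

For the given name 명수, this yields 3 × 3 = 9 candidates, 明洙 among them. The list grows multiplicatively with name length, so in practice it would need to be ranked or pruned, for example by how often each Hanja appears in names; that ranking step is our assumption and is not shown.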
Foreign names in Korean
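In Korean, foreign names are simply transliterated phonetically and written in Hangul. What about names written in Kanji or Hanzi? They too are written in Hangul: the Japanese name 川端康成, for example, becomes 가와바타 야스나리, following its Japanese pronunciation.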
Japanese names
Japanese names come in the greatest variety, because Japanese uses three scripts. Kanji are Chinese characters, probably borrowed around the 4th century, from which the Japanese later derived Hiragana and Katakana, whose characters have no inherent meaning and simply represent sounds. Most Kanji have at least two readings, depending on the word they appear in. Names can be written in any of the three scripts, but each person has one official spelling of their name, and different spellings of the same-sounding name are not interchangeable, just as “Cyndi Hawkins” is not the same person as “Cindy Haukens” in English. The given name Yoko, for example, can be written with several different Kanji, because Japanese also has many homophones.
Script | Family name | Given name |
Kanji | 佐々木 | 洋子 |
English | Sasaki | Yoko |
Kanji; Hiragana + Kanji | 菅野 | よう子 |
English | Kanno | Yoko |
Katakana | トリヤベ | ヨーコ |
English | Toriyabe | Yoko |
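To make the mix of scripts in these names concrete, the sketch below labels each character by its Unicode block. The function names are hypothetical; 々 (the Kanji repetition mark in 佐々木) is treated as Kanji, and ー (the long-vowel mark in ヨーコ) falls inside the Katakana block.

```python
from __future__ import annotations


def char_script(ch: str) -> str:
    """Label one character as hiragana, katakana, kanji, or other."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"  # includes ー (U+30FC) and ・ (U+30FB)
    if 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF or ch == "々":
        return "kanji"  # CJK Unified Ideographs, Extension A, iteration mark
    return "other"


def script_profile(name: str) -> list[str]:
    """Script label for every character in `name`."""
    return [char_script(ch) for ch in name]
```

The three spellings of Yoko in the table yield three different profiles (Kanji only, Hiragana plus Kanji, Katakana only), a quick reminder that they are distinct written names even though they sound alike.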
Foreign names in Japanese
Generally speaking, foreign names are written phonetically in Japanese using Katakana. Korean names with Hanja and Chinese names in Hanzi may also show the characters next to the Katakana, but it is rare to see the characters without the Katakana.
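In Katakana, the parts of a foreign name are conventionally separated by a middle dot (・, called nakaguro), as in キム・ジョンイル. A minimal sketch, assuming full-width Katakana input with no accompanying Hanja or Hanzi, splits such a name into its parts; the function name is hypothetical.

```python
from __future__ import annotations

KATAKANA_BLOCK = range(0x30A0, 0x3100)  # U+30A0..U+30FF, includes ・ and ー
NAKAGURO = "\u30fb"  # ・ KATAKANA MIDDLE DOT


def split_katakana_name(name: str) -> list[str]:
    """Split an all-Katakana foreign name on the middle dot."""
    if not name or any(ord(ch) not in KATAKANA_BLOCK for ch in name):
        raise ValueError(f"expected an all-Katakana name, got {name!r}")
    return name.split(NAKAGURO)
```

`split_katakana_name("キム・ジョンイル")` returns `["キム", "ジョンイル"]`; comparing those parts with the Korean 김 and 정일 by sound would be a separate step, not shown here.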
Disclaimer: All names, companies, and incidents portrayed in this document are fictitious. No identification with actual persons (living or deceased), places, companies, and products are intended or should be inferred.