How I Named My Newborn Child
An entertaining, and hopefully inspiring tale of a first generation Chinese American couple
We planned to have a baby and we found out we’re having a boy!
Now we have a daunting task awaiting us with a completely non-negotiable deadline. We’ve got to name the baby.
With my wife and me both being Chinese and living in the U.S., we had a specific set of criteria for the name:
1. We wanted a name whose spelling is both a Western name (or an English word) and a valid romanized Chinese name (i.e. pinyin). For example, John is out because no Chinese characters spell out J-o-h-n. On the other hand, Zhongxiang is out too because it is recognizably Chinese and is not a Western name or English word.
2. We wanted the name to be 2 Chinese characters. I have a single-character name, and I never quite liked the rhythm of it. For example, Kai is a great name, but it’s just a single character.
3. We wanted the name to be a generally accepted “boy’s name” in both English and Chinese. For example, while it might be possible to come up with Chinese characters that spell Maya so that Chinese readers would see it as a boy’s name, it’s out because the English name “Maya” is generally perceived as a girl’s name.
4. We wanted the name to have some meaning in both Chinese and English. For example, even though the name Luke satisfies the first 3 conditions, in Chinese it sounds very much like a direct transliteration of the English name and doesn’t have a traditional Chinese meaning behind it.
5. We didn’t explicitly talk about this at first, but we realized we didn’t want a traditional or very typical name from another culture. For example, Yuri is a classic Eastern European or Slavic name. Since my son has no Slavic heritage, we didn’t want a dissonance between the person and the name.
At first glance, this seems like an impossible set of conditions. We thought so too. But we thought we’d give it a try anyway, and with 9 months to work with, maybe we’d come up with something.
First 8 Months
We knew the task was difficult, but we didn’t really put much concentrated effort into it. We wrote down names here and there as they popped into our heads, but none of them seemed entirely satisfactory.
We had names such as Luke, Lian, Timo, and Jude. For quite some time, we really liked Timo, and it was an acceptable backup if we weren’t able to come up with anything else. It gave us some confidence that finding a “perfect” name was still possible.
Interestingly, there seem to be a lot more girls’ names that meet the conditions. For example, Miya, Lexi, and Nina are all great choices.
3-4 Weeks to Due Date
With the due date coming up, we put more effort into generating options. We found certain patterns that gave us a bunch of choices, such as Zimo, Zemo, Zeno, Chase, and Channing, but they also led to some rather uncommon ones, such as Size, Mice, Dice, Ate, Late, Mate, and Make.
Of course, we asked ChatGPT for help as well. Somewhat disappointingly, I couldn’t get ChatGPT to understand the first condition. It kept giving me Western names that aren’t valid Chinese pinyin, no matter how I explained it.
At this point, we still liked Timo the most, but we hesitated over condition 5. Since Timo is primarily of Finnish/Germanic origin, we weren’t too sure about its cultural implications.
1 Week to Due Date
We weren’t able to make a lot of progress in coming up with more options, even with more frequent and intense discussions and brainstorming sessions.
Exactly 1 week before the due date, in the evening, my wife felt a strong wave of contractions. Even though the baby didn’t come that day, it was a strong reminder of how imminent the event was.
We had to have a name ready ASAP!
With a renewed sense of urgency, I realized we didn’t have to keep banging our heads against the wall like this. We could automate an exhaustive search!
Attempt 1: Naive starter
The first idea I had was to simply generate all combinations of 2 Chinese characters and look through them to discover suitable names.
Even though there might be up to 20,000 different characters, the number of sounds (i.e. spellings) is far smaller, especially when tones are ignored. The spelling of a single character, which I’ll loosely call a “word” here, typically consists of an Initial (usually the opening consonants) and a Final (which supplies the vowels).
There are exceptions, and there are combinations that are not valid, but instead of listing out all actually valid spellings, I thought I’d use the combination of all Initials and all Finals to make up a word, and then list out all 2-word combinations.
Here’s the Python code in its entirety:
initials = ['b', 'p', 'm', 'f', 'd', 't', 'n', 'l', 'g', 'k', 'h', 'j', 'q', 'x', 'zh', 'ch', 'sh', 'r', 'z', 'c', 's', 'y', 'w']
finals = ['a', 'o', 'e', 'i', 'u', 'v', 'ai', 'ei', 'ui', 'ao', 'ou', 'iu', 'ie', 've', 'er', 'an', 'en', 'in', 'un', 'vn', 'ang', 'eng', 'ing', 'ong', 'iao', 'ian', 'iang']
whole_words = ['zhi', 'chi', 'shi', 'ri', 'zi', 'ci', 'si', 'yi', 'wu', 'yu', 'ye', 'yue', 'yin', 'yun', 'ying', 'yuan']
words = [i + f for i in initials for f in finals]
words += whole_words
words += ['ai', 'ou', 'er', 'an', 'ang']
names = [w1+w2 for w1 in words for w2 in words]
print(',\n'.join(names))
I ran this, and the approach proved way too naive. The script prints more than 412k combinations (642 spellings squared is 412,164), and removing duplicates still leaves far too many to look through manually. Worse, many of these combinations are not even valid!
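To see where a count of that size comes from, here is a small sketch that tallies the pieces of the script above. It reuses the same lists; the `count_candidates` helper and the `standalone` variable name are hypothetical, introduced only for this illustration.

```python
initials = ['b', 'p', 'm', 'f', 'd', 't', 'n', 'l', 'g', 'k', 'h', 'j', 'q', 'x',
            'zh', 'ch', 'sh', 'r', 'z', 'c', 's', 'y', 'w']
finals = ['a', 'o', 'e', 'i', 'u', 'v', 'ai', 'ei', 'ui', 'ao', 'ou', 'iu', 'ie',
          've', 'er', 'an', 'en', 'in', 'un', 'vn', 'ang', 'eng', 'ing', 'ong',
          'iao', 'ian', 'iang']
whole_words = ['zhi', 'chi', 'shi', 'ri', 'zi', 'ci', 'si', 'yi', 'wu', 'yu',
               'ye', 'yue', 'yin', 'yun', 'ying', 'yuan']
standalone = ['ai', 'ou', 'er', 'an', 'ang']  # spellings with no Initial


def count_candidates(initials, finals, whole_words, standalone):
    """Return (spellings, raw_pairs, unique_names) for the naive generator."""
    words = [i + f for i in initials for f in finals] + whole_words + standalone
    names = [w1 + w2 for w1 in words for w2 in words]
    return len(words), len(names), len(set(names))


spellings, raw_pairs, unique_names = count_candidates(
    initials, finals, whole_words, standalone)
print(spellings, raw_pairs, unique_names)
```

The 23 Initials and 27 Finals give 621 combinations, and the 16 + 5 extra entries bring that to 642 spellings, hence 642 × 642 = 412,164 raw pairs. The unique count is lower because several entries in `whole_words` (for example `zhi`) are also produced by the Initial + Final product.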
Attempt 2: Automated filter
The obvious optimization is to write down the list of spellings that are actually valid for a Chinese character instead of using every Initial + Final combination.
That turned out to be relatively straightforward. I found this website that contains a table of exactly the valid spellings, so I simply typed all 413 of them into a list and saved it to a file (`words.txt`).
Immediately, we narrow down the valid 2-word combinations to 413*413=170,569 possible names. Unfortunately that’s still too many to look through one by one.
with open('words.txt') as w:
    words = [wo.strip() for wo in w.readlines()]
names = set([w1 + w2 for w1 in words for w2 in words])
print("total number of raw names to start with", len(names))
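Wrapping the names in a set matters because two different pairs of spellings can concatenate to the same string, which is why the unique count ends up below 413 × 413. Here is a minimal sketch of that effect, using a hand-picked list of four spellings rather than the full `words.txt`; `find_collisions` is a hypothetical helper, not part of the original script.

```python
from collections import defaultdict
from itertools import product


def find_collisions(words):
    """Map each concatenated spelling to the pairs that produce it,
    keeping only spellings produced by more than one pair."""
    by_spelling = defaultdict(list)
    for w1, w2 in product(words, repeat=2):
        by_spelling[w1 + w2].append((w1, w2))
    return {name: pairs for name, pairs in by_spelling.items() if len(pairs) > 1}


sample = ['dan', 'gan', 'dang', 'an']  # tiny illustrative subset
collisions = find_collisions(sample)
print(collisions)
# {'dangan': [('dan', 'gan'), ('dang', 'an')]}
```

The 16 pairs from this sample collapse to 15 distinct strings, because `dan` + `gan` and `dang` + `an` spell the same thing.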
Now, since we started writing code anyway, we might as well start filtering the possible names in code as well.
After a quick look around, the first filter I added was a nice spell checker called Enchant, which simply narrows the list down to valid English words.
import enchant

# Keep only the candidates that the US English dictionary accepts.
d = enchant.Dict("en_US")
english_words = [w for w in names if d.check(w)]
print("total number of names that are also words", len(english_words))
This filtered the list all the way down to just 681 names. That’s perfect!
As I looked through this list, I quickly realized that we’re missing something. This is now a list of valid 2-word Chinese combinations that are valid English words, not names!
We don’t want to name the baby Like, or Wanna, or Women, or Rerun!
Attempt 3: Get the names in
Now the task is clear. We have to filter for English names in the valid Chinese spellings. To do that, I just need to find a list of boy’s names.
The first library I found, names-dataset, looked great. It’s a dataset of popular names in a large number of countries. It contains popularity data, and it has a nice querying interface. I made a few attempts at using it as a filter, but it didn’t work out quite as I expected.
When I used all boy’s names in the U.S., I ended up with still a very large number of names remaining, including all the very Chinese ones. It turns out, this data was based on Facebook, and of course, there are many Chinese people based in the U.S., so that didn’t help much.
On the other hand, when I used only top boy’s names in the U.S. to filter, even up to top 500,000, it couldn’t find a single overlap with the list of Chinese spellings.
I didn’t look much further, but I guess this is because all the names that are also valid Chinese are relatively far down on the popularity list, which is quite reasonable given the scale of the dataset.
On top of this, this dataset is rather large (2.3 GB) and takes a while to load, so I moved on.
Looking around more, I found another couple of datasets on names:
This dataset contains a list of popular names in the U.S. by year from 2000 to 2021, based on Social Security data.
This dataset from a Kaggle competition also contains similar data but dating back all the way to 1880.
So I integrated them into the algorithm as well:
import csv
import json

# Boys' names from the per-year popularity lists (2000 through 2021).
names_by_year = set()
for year in range(2000, 2022):
    boy_file = f'popular-baby-names/{year}/boy_names_{year}.json'
    with open(boy_file) as bf:
        names_by_year |= set([n.lower() for n in json.load(bf)['names']])
print("total number of names by year dataset", len(names & names_by_year))

# Boys' names from the CSV. As used below, the columns are
# year, name, sex, count.
names_from_file = set()
baby_names_file = 'babyNamesUSYOB-full.csv'
with open(baby_names_file) as bnf:
    reader = csv.reader(bnf)
    rows = [r for r in reader][1:]  # skip the header row
    for r in rows:
        # Male names after 1940 given to at least 10 babies in that year.
        if r[2] == 'M' and int(r[0]) > 1940 and int(r[3]) > 9:
            names_from_file.add(r[1].lower())
print("total number of names from file", len(names & names_from_file))

# Candidates that appear in either name dataset, followed by the
# English-word matches from the Enchant filter.
all_names = names & (names_by_year | names_from_file)
with open('results.txt', 'w') as nf:
    nf.writelines([name+'\n' for name in all_names])
    nf.writelines(['--------------------\n'])
    nf.writelines([name+'\n' for name in english_words])
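An alternative way to run the same intersection, sketched here as an assumption rather than what the script above does, is to start from the candidate English names and test whether each one splits into exactly two valid spellings. That avoids building every pair up front. The `splits_into_two` helper and the small `syllables` set are hypothetical; in practice the set would be loaded from `words.txt`.

```python
def splits_into_two(name, syllables):
    """Return every (first, second) split of name in which both halves
    are valid spellings."""
    name = name.lower()
    return [
        (name[:i], name[i:])
        for i in range(1, len(name))
        if name[:i] in syllables and name[i:] in syllables
    ]


# Illustrative subset only; the real list has 413 entries.
syllables = {'ti', 'mo', 'da', 'ren', 'ya', 'le', 'lu', 'ke'}

for candidate in ['Timo', 'Daren', 'Yale', 'John']:
    print(candidate, splits_into_two(candidate, syllables))
```

A name with an empty result, such as John here, cannot be written as two characters from the given list. A name with more than one split can be read as two different character pairs.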
Conclusion
With this final version, we started out with 167,626 raw “names” that are all valid Chinese 2-word combinations (down from 170,569 after deduplication). 681 of them are also valid English words. Independently, 96 of them overlapped with popular names from 2000 to 2021. Furthermore, 934 of them overlapped with popular boys’ names (at least 10 boys had the name in a given year) in the years after 1940.
After deduplication, we now had exactly 1,617 names to pick from.
Here are a few that made our final shortlist: Daren, Boyan, Randi, Sinan, Yale, along with Timo and Channing from earlier.
This is now looking much better.
3 Days to Due Date
Funny how life works.
We looked at all the names on the shortlist, and we pronounced them over and over. All of a sudden, a completely new name that’s not on any of these lists came to us, and we loved it. We looked it up, and it has a great meaning. It’s easy to pronounce, and it’s cultured in both Chinese and Western contexts.
Just like that, we got a name for the baby.
However, I have no doubt that we wouldn’t have come up with it, had it not been inspired by all the work that went into it.
I suppose the takeaway is that hard work pays off, though sometimes in unexpected ways.