Generating Look-alike Names via Distributed Representations

Abstract—Motivated by applications in login challenges and privacy protection, we consider the problem of large-scale construction of realistic-looking names to serve as aliases for real individuals. We seek look-alike names that preserve name characteristics like gender, ethnicity, and frequency of occurrence while being unlinkable back to the source individual. We introduce the technique of distributed name embeddings, representing names in a high-dimensional space such that the distance between name components reflects the degree of cultural similarity between these strings. We present different approaches to constructing name embeddings, and evaluate their cultural coherence. We demonstrate that name embeddings strongly encode gender and ethnicity, as well as name frequency.

I. INTRODUCTION

Names are important. The names that people carry with them are arguably the strongest single facet of their identity. Names convey cues to people’s gender, ethnicity, and family history. Hyphenated last names suggest possible marital relationships. Names even encode information about age, as social trends alter the popularity of given names.

That names serve as people’s primary societal identifier gives them even more power. Privacy requirements often make it undesirable or even illegal to publish people’s names without their express permission. Yet there are often technical contexts where we need names which can be shared to represent things: to serve as placeholders in databases, demonstrations, and scientific studies.

In this paper, we consider the problem of constructing realistic-looking names on a large scale to serve as aliases for real individuals. The task here is more subtle than it may appear at first. We might consider randomly assigning names from reference sources such as telephone books, but these are actual names and hence violate privacy concerns.
We might consider generating names at random from component first/last name parts. But these fake names will not respect gender, ethnicity, and temporal biases: consider the implausibility of names like “Wei Hernandez” or “Roberto Chen”. When replacing names to anonymize a medical study, dissonance is created when female names are replaced by male ones, and the names of elderly patients are aliased by newly coined names.

Our interest in generating look-alike names arose through a computer security application: how might an email user who lost their password be able to convince the account provider of their identity as part of the account recovery process, or as part of the second-login challenge? We reasoned that the genuine account holder should be able to distinguish the actual contacts they have corresponded with from a background of imitation names (Fig. 1). But this is only effective when the background names are culturally indistinguishable from the contacts, a property which did not hold under naive name generation methods. Otherwise, the correct answer might stand out and hence be easily guessed by an attacker who tries to take control of user accounts. For example, consider a challenge question posed to a hypothetical user, wendy wong@, given in Table I (left). If background names are generated naively without preserving ethnic properties (right), guessing the correct answer becomes much easier. However, when the generated names preserve ethnic and cultural properties of the real contacts that they replace (middle), the guessing task for an attacker remains hard, because the imitation names look very similar to the real contacts.

Fig. 1: Three contact list based challenges using the technique proposed in this paper. In each challenge only one contact is real.

TABLE I: A security challenge question: “pick someone you contacted among the following”. Left: the contact list of a hypothetical user wendy wong@. Middle: a replacement list generated using the technique proposed in this paper (retaining one real contact, Charles Wan). Right: a naively generated random replacement list.

Real Contacts    Proposed Challenge    Naive Challenge
Angela Chiang    Amanda Hsu            John Sander
Paresh Singh     Nirav Sharma          Steve Pignootti
Charles Wan      Charles Wan           Charles Wan
Yuda Lin         Joko Yu               Jeff Guibeaux
Lin Wong         Hua Li                Sam Khilkevich
Tony Kuang       David Feng            Mary Lopez
Hua Yim          Jie Fung              Ron Clemens

The major contributions of our work are: