On Friday, I joined Rachel Maddow for a segment on the Rachel Maddow Show about news that ICANN will soon begin supporting truly internationalized domain name extensions -- in other words, dot-com, dot-(country name), and the like, typed out in non-Latin character sets. Chinese, Hebrew, you name it.
A number of Boing Boing readers commented on that video clip, and had questions. What about scripts that go right to left, like Arabic or Hebrew? Will we all have to buy new keyboards (and keyboard cats)? Is this the end of the internet as a unifying force, and the beginning of greater cultural rifts? WHAT ABOUT THE CATS AND THEIR CATSPEAK WHICH MUST BE TYPED? The Internet is serious business.
Paul Hoffman was one of the authors of the original standards that led to this news. I asked him to address our commenters' questions, and to go into a little more of the geeky techno-historical detail that wouldn't really work in a five-minute television news segment. Paul kindly obliged -- his thoughts follow, and he answers your questions after the jump:
Ever since the DNS was created, many people had the feeling that requiring all names to be only in a limited set of characters was not going to work for the whole world. The Internet was never just a US invention; there were plenty of Europeans soon after things got going. Names are *very* important to people, and the old DNS rules basically force many people to misspell their names, their companies' names, and even worse, their countries' names.
Starting almost exactly ten years ago, there was a groundswell of interest in the IETF to fix this. The IETF is the standards organization that makes the technical rules for how the Internet works. Unlike most standards organizations, anyone can contribute in the IETF, and all of its standards have always been open and freely available. This helped make fixing this part of the DNS easier, because people from all over the world could help without having to pay anything, or even formally "join" the IETF.
The work from 2000 to 2003 was fairly intense. People unfamiliar with how the DNS works, but who wanted it to work for their languages, had to learn the technology so they could weigh the various proposals for fixing it. People who knew the DNS technology intimately had to learn to let go of their "it works just fine" mentality. Everyone had to get over "the keyboard issue", and to remember that most people go to web sites and send email by clicking, not by typing.
In 2003, the IETF published the standard, called IDNA (Internationalized Domain Names in Applications). The "in applications" part is important: we didn't change the DNS technology at all, we only changed how applications like web browsers and email clients would display domain names. Many application vendors such as Microsoft, Mozilla, and so on, embraced it immediately, as did the domain registries. Thus, you have been able to go to éxample.com for many years.
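To make the "in applications" part concrete, here is a minimal sketch in Python. It assumes the standard library's "idna" codec, which implements the 2003 version of the standard (RFC 3490), and the variable names are only illustrative. The application converts the name a person sees into an ASCII-only form before it ever asks the DNS, and converts it back for display.

```python
# Minimal sketch: what an IDNA-aware application does before a DNS lookup.
# Assumes Python's built-in "idna" codec (IDNA 2003, RFC 3490).

# The name as a person would see it; \u00e9 is the letter e with an acute accent.
display_name = "\u00e9xample.com"

# ToASCII: each non-ASCII label becomes "xn--" plus its Punycode form (RFC 3492).
wire_name = display_name.encode("idna")
print(wire_name)  # b'xn--xample-9ua.com'

# ToUnicode: the application turns the ASCII form back into the readable name.
round_trip = wire_name.decode("idna")
print(round_trip == display_name)  # True
```

The DNS itself only ever sees the ASCII form, which is why no DNS servers had to change.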
The IDNA standard we created over five years ago applies to *all* parts of the DNS, including the top level domains (TLDs). However, ICANN was hesitant to open up the TLDs to using IDNs because of many political, non-technical concerns. The news this week is that ICANN has finally opened up the root zone to useful IDNs, not that the technology was just developed. The technology has been available for years, but the desire to make it available has finally achieved a critical mass within ICANN.
Also note that ICANN's announcement *only* covers IDNs in country names, not in new TLDs. The latter will (or won't) happen a few years from now, and the political discussions there will be even more difficult than they were for the country names. For example, it makes a great deal of sense to let China have an equivalent of .cn in their native script. VeriSign owns .com, which most people think means "commercial"; does it make sense to let VeriSign have the equivalent of the word "commercial" in every native script in the world? If not, in any?
All of that is politics and business, not technology. What the IETF did in 2003 was to make it possible and then let everyone else decide how to use it. The IETF is just about finished with a revision to IDNA that will change things a bit, but again, only on the technology level. Later on, you will see IETF standards for making email addresses (the stuff to the left of the "@") also be internationalized. It's a long, slow process, particularly because it is done by volunteers.
YOUR QUESTIONS ANSWERED.
"What if a human rights group in Canada wants to register a domain name in Chinese or Arabic, in the native-alphabet country extensions for China or Saudi Arabia," she said, "Can the countries involved deny that request? Those are the sort of challenges to free speech that lie ahead."
That is incorrect: they have been able to do it since 2003. In fact, that was one of my driving motivations: I should be able to have a domain name that speaks to the intended recipient. All that happened this week is that now the whole domain name *might* be able to be the native script. I emphasize *might* because many countries don't have open registration policies for names under the country's name.
Maybe it's a bad day for web browser developers. Whatever code just needed to work with ascii characters now needs to work with unicode characters. Or maybe it's a good day for web browser developers because they have more work to do.
That is incorrect: the browser makers added this years ago.
This is a good day for phishers. How long until "ebay.còm" and such addresses that look like one domain but are another.
This is partially incorrect: ICANN is responsible for preventing look-alike TLDs, and has promised that they will be vigilant. In fact, that is part of the reason that we have had to wait as long as we did. The phishing topic has been discussed for over a decade, and the number of people who would mistype a domain name is dwarfed by the number of people who will just click on anything. Avram got it right in #8.
How does this interact with right-to-left scripts?
Great question, and they won't like the answer: not so well. Right-to-left scripts (also called "BiDi scripts" because they use bidirectional characters) have lots of very difficult-to-handle side effects. IDNA deals with them by restricting the labels that have right-to-left characters, and the update coming next year loosens that restriction a bit. This will take over a decade to fully sort out, I'm afraid, but our experience since 2003 is that we did pretty well on the first round.
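For the curious, the restriction on right-to-left labels can be sketched in a few lines of Python. This is a rough illustration of the bidirectional rule in RFC 3454 (section 6), not the full IDNA processing; the function name `passes_bidi_check` is made up for this example, and it leans on the character tables in Python's standard `stringprep` module.

```python
import stringprep

def passes_bidi_check(label: str) -> bool:
    """Rough sketch of the RFC 3454 section 6 rule for a single label."""
    # Table D.1 lists right-to-left characters (Hebrew, Arabic, and so on).
    has_rtl = any(stringprep.in_table_d1(ch) for ch in label)
    if not has_rtl:
        return True  # The rule only constrains labels with right-to-left characters.
    # Table D.2 lists left-to-right characters; mixing the two is not allowed.
    if any(stringprep.in_table_d2(ch) for ch in label):
        return False
    # The first and last characters must both be right-to-left characters.
    return stringprep.in_table_d1(label[0]) and stringprep.in_table_d1(label[-1])

hebrew = "\u05e9\u05dc\u05d5\u05dd"  # a four-letter Hebrew word, written with escapes

print(passes_bidi_check("example"))       # True
print(passes_bidi_check(hebrew))          # True
print(passes_bidi_check(hebrew + "abc"))  # False
print(passes_bidi_check(hebrew + "1"))    # False
```

The last case shows how strict the 2003 rule is: a Hebrew label that merely ends in a digit is rejected, because the digit is not a right-to-left character.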
The homogenization and standardization of English, is one of the main reasons for the explosive growth and globalism of the net. This will only serve to fractionate things and in the end will hurt growth and usefulness. Trendy dialects belong in the history books, English, my friend is the language of the future.
Incorrect: we have seen no significant fractionalization since 2003. The content of web pages has always been able to be non-English, and that is much more important than the domain names that lead to the pages. Internationalizing domain names just makes the access to that content easier for those who type names.
The biggest myth is that this is the first time that you could have internationalized domain names. It will be (once they approve some in a few months) the first time that you can have fully internationalized domain names. People have been using internationalized domain names in a variety of scripts for years now.
It seems like the second myth is that people who can't type the names won't be able to reach the web sites. This myth is hard to kill, even though doing so only takes one word: click.
FURTHER READING: If you want to learn more about the IETF, a good intro is "The Tao of the IETF." The actual standards for IDNA are RFC 3454, RFC 3490, RFC 3491, and RFC 3492, all available from this link. Some of Paul's notes on ICANN are here.