Endangered languages and gadgets that record them

There's a really interesting piece in today's NYT by John Noble Wilford about "endangered languages" --
Of the estimated 7,000 languages spoken in the world today, linguists say, nearly half are in danger of extinction and likely to disappear in this century. In fact, one falls out of use about every two weeks.
So, this is not news, but the piece goes on to explore a project led by K. David Harrison, an associate professor of linguistics at Swarthmore --

Beginning what is expected to be a long-term project to identify and record endangered languages, Dr. Harrison has traveled to many parts of the world with Gregory D. S. Anderson, director of the Living Tongues Institute, in Salem, Ore., and Chris Rainier, a filmmaker with the National Geographic Society.

The researchers, focusing on distinct oral languages, not dialects, interviewed and made recordings of the few remaining speakers of a language and collected basic word lists. The individual projects, some lasting three to four years, involve hundreds of hours of recording speech, developing grammars and preparing children’s readers in the obscure language. The research has concentrated on preserving entire language families.

When I saw this story, my eyes zoomed in on one detail: it looks like Mr. Harrison, in the photo above at far left, is doing all that recording on the same device I use for audio recording. Link to larger image, so you can see more clearly.

Unless I'm mistaken, that's a Marantz PMD660. Amazon link for purchase ($489-499, generally), and here's a super informative item about the device on transom.org.

The short version: it's not the smallest, it's not the lightest, it's not the cheapest, but it's a trusty workhorse capable of producing pro quality audio in a variety of formats. I began using it because it was standard-issue among NPR reporters, and audio engineers there showed me ways to get the most out of it. It's served me well in difficult sonic and environmental conditions, and I wouldn't readily trust another device. You got one shot at capturing something, you want to nail it. "Newest" isn't as important as "least likely to fail."

I'm planning a longer post sometime soon about the mics I use for field recording, and some of the smaller, more compact alternatives for recording units. (Hm, wonder what kind of mics they're using?)

Image: Chris Rainier/National Geographic.



  1. As the technical operations manager for a small public radio news organization, though the largest radio presence in the U.S. Capitol, I concur 100% with Xeni. The PMD 660 is an excellent pro-sumer device that is both flexible and durable.

    That being said, I must share two reservations about the unit.

    First, these units burn through batteries like a lunatic. Four AAs, on a good day, get you four hours of audio. By spending more on Lithium Ion batteries you can stretch your recording time to almost fifteen hours.

    Second, Marantz rushed the PMD 660 to market and as such the bulk of the units available (and in production) share one glaring defect. Namely, when running a line level signal into the unit, the cheap preamp is apt to distort — even if the incoming signal is as clean as a whistle. Colossal and rich NPR was able to strong arm Marantz into fixing this problem just for them, though these upgrades aren’t readily available to the general public via the manufacturer.

    Also, in reference to the Transom article mentioned, I highly recommend spending the extra few hundred dollars to get the unit modded with better preamps. The pre-s currently used work a lot better with condenser mics than dynamic; and though in the city this isn’t much of a problem, you should always have a back up dynamic in the field, should phantom power become a problem. Besides, even if you have the best microphone in the world, if your preamp sucks, your audio quality will suffer.

  2. As a botanist, the article about endangered languages hits me in two different ways.

    First, I would say that linguists themselves are responsible for the loss of information, particularly in hunter-gatherer languages. For instance, an elderly woman is the last speaker of a northern aboriginal language. A linguist sits down with her, and records names for three different species of blue-tongued lizard (presumably, three different species of skinks), or a term for fruits that grow on bushes on the sides of sandhills, or a term for a small, edible mushroom. Because the linguist has no biological background, and so no way to determine the proper referent for these words, they will be lost. A biologist, particularly one with some local knowledge, would know how to find the organism that fits the term, and either determine the name in scientific terms, or collect a specimen and name it. Biologists, particularly trained systematists, routinely create new names. Were life scientists to get involved in these efforts, much more of the knowledge encoded in these languages could be brought forward and properly valued.

    My screen name, Heteromeles, is an example of this. Heteromeles is the genus for toyon, a large shrub in the rose family that grows in California. It’s one of my favorite plants, but more importantly, it’s the only major plant in California that still bears the name given to it by a native California tribe. All others have been renamed according to European conventions.

    The second point is that there are a number of endangered languages in English. I have a PhD in botany, and I have spent over a decade learning the unique terms for plants and fungi. My particular specialty has only a few hundred trained people, which means, in effect, my tribe has only a few hundred fluent speakers. Currently, I am working as a consultant, and not working in my specialty (although, thankfully, I’m still a botanist). Unless I can get an academic job, I will probably not be able to pass on my language, and it will die with me (most botanists’ children do not become botanists).

    There are a large number of small sciences out there that are currently endangered by the science funding practices of the Bush administration and the desires of society at large. This number includes, by the way, ALL of the people who can deal with biodiversity, the naming, counting, and conserving of the species on this planet.

    However, our society thinks it is more proper to fund additional studies on the dangers of erectile dysfunction drugs than it is to fund the people who know how to name and care for the myriad species of the world. This is true whether those people live on reservations, or in small offices in small colleges. Knowledge dies if it is not passed on.

  3. This is an endlessly fascinating subject. For instance, minor languages aren’t just spoken by obscure tribes living in areas that are only visited by National Geographic photographers.

    The idea that there are single unitary languages — English, French, German, Italian, Spanish, et cetera — is an artificial imposition that came in with the nation-state. “English” is the Southern English that was spoken around London. “Italian” is Tuscan. Find a good website on minor languages and have a look at Italy, or at the related language groups that bracket the Pyrenees but are classified as dialects of French or Spanish depending on where they are relative to the border, or at the mountain valleys where France, Germany, and Italy intersect, where every valley has its own tongue.

    Or look at Lallans Scots, a.k.a. Braid Scots, or various other names. It’s not just a few quaint regional terms plus a thick accent. But for hundreds of years, kids who’ve grown up speaking it have been told they speak nonstandard English — not separate, just wrong.

    Have a look at the article. Languages are part of the basic texture of the real world.

  4. I used to think “good, let more languages die, maybe someday we will all speak the same language and there will be no more wars.” Of course that’s naive: wars start over power and money and resources. So now I think: if we all spoke the same language it would make it that much easier for us to argue all the time.

    Wish I still had my Marantz portable stereo cassette recorder, that thing was a workhorse, though I do admire all the new flash recorders out now (Zoom and Edirol are particularly nice).

  5. I hope you’re using the MP3 mode, Xeni. There’s just no point in recording at 44.1 kHz or 48 kHz for radio. The FM band is always going to be less than 25 kHz wide.

  6. I’m a linguist who works on a North American aboriginal language (Cree) & language family (Algonquian). Happened to see your article, and the article in the New York Times.

    First of all, be aware that the numbers they’re quoting for the world’s languages are very fuzzy, and not well documented. The details of what makes something a “language” and not a “dialect” or a set of “technical terminology” are usually not well-understood by the general public (reference the Botanist at the top – English is a language. Technical terminology for a specific field is better considered a little subordinate lexicon, rather than a whole different language.).

    Yes, most of us use Marantz recorders like that – they’re lovely in the field, and quite durable. The sound quality is usually good enough for acoustic studies. I’ve used a variety of them over the years, actually. They haven’t let me down yet! Lately, I’ve been using a nifty handheld one from Olympus – the sound quality is plenty good (I’m mostly doing semantic/syntax stuff, mind you, not the fancy acoustic analysis), and it’s not very intrusive, which is important, particularly for linguistic / ethnographic work where you need to get it out of people’s faces so they can think.

    With regards to the botanist’s concerns – people need to realize that languages are very BIG and linguists are very SMALL. There are many, many kinds of knowledge encoded in a language, and only a few of us to try and document it. Think of it like being in a house that’s on fire – the community wants to get what they can before it burns down. I’m sorry if some of us haven’t gotten your botany or biological terms, but our arms are only so large, and we can only carry so much. For the speakers themselves, the main concerns are often not the documentation of these terms, but rather the documentation of philosophical or religious concepts – the stuff they really desire to pass on. On the big list, the names for lizard species fall pretty far down, unless they have some cultural significance.

  7. @verafides, which Olympus recorder do you like?

    Many thanks to all who’ve weighed in on this thread, really fascinating to read your insight here. Even you, Jason.


  8. We can’t let French go, or we’ll never again be able to order dinner in Montreal, which would be sad. Also, losing it would take the fun out of Haitian Kreol.

  9. Attempts at getting the confirmation e-mail have been met with crickets, so anonymous I am, for the moment.

    As a linguist that’s spent plenty of time in the field with his trusty Marantz, I’ll say that the machine is a worthy mainstay. We looked at recording directly onto a laptop, but couldn’t find a USB Mic that my prof thought was good enough. Being a product of the digital age, I felt a little weird recording things on tape, but the sound quality was more than worth it. Every time I opened a sound file and saw how the background noise was almost nil, I ruffled the Marantz’s hair lovingly.

    As for linguists destroying language, I have a flippant answer and a more meaningful one:

    Flippant: I don’t exactly see anyone else rushing out to learn anything about these languages, botanists included. The analogy above of the burning home is very appropriate; some of these languages don’t have a native speaker that’s younger than 60.

    More meaningful: Linguistic fieldwork is not usually a speedy affair. Linguists are often trying to learn the language they’re studying and exposing themselves to the culture, possibly living there for months. Although my fieldwork was with English-speaking Lebanese immigrants, I was spending time at community centers, mosques, and in people’s homes. I only learned scraps of Arab culture in all the time I did my fieldwork, but it was certainly not an affair of me waltzing in, barking questions, and running off, data in hand.

    You can’t be a white guy going into an Arab neighborhood in Michigan with a tape recorder and asking ‘Can I ask you personal information and record you? I’m not from the government, really!’ and expect to get too far unless you’ve invested yourself in the culture a bit.

  10. With respect to Verafides above, I have to answer the “house on fire” comment.

    Personally, I’d be happy to volunteer to go out with a linguist to help come up with vocabulary lists, especially for areas where I am familiar. I think a number of us would love to do that, and a number of us have done it.

    Here’s the question: what knowledge is valuable? Certainly, it’s nice to know how people named the parts of their bodies, and how they organized their social lives. That’s where most people live, and I fully understand that this is what most people want to pass on.

    But what about the plants and animals? I would argue they are far more important, to a wider audience. The problem most conservation biologists face is total lack of knowledge: the species are poorly known, and they have only been studied for a short time. Traditional knowledge can be enormously helpful both in understanding natural history (how and where species live, what they do, etc.), and in understanding how conditions have changed in the lifetime of the informant. Typically, we biologists don’t have that knowledge. And it matters, because we are now the caretakers of these areas, and it’s hard to take good care when you don’t know enough.

    An example might help. I work in Southern California, and I work on habitat restoration. One of the hot areas around here is native grassland restoration. We know that native grasslands (which used to be very common) were pretty intensively managed by the local Chumash and Gabrieleno. We know little beyond this, and to my knowledge, the traditional management practices and terminology have disappeared. This is a huge problem for us restorationists, because many of the native plants we are trying to restore did well under some sort of human management regime. We don’t know what that was, and since many weeds do well under our modern disturbance patterns, it’s difficult to know how we can get the natives to grow without getting them swamped with weeds. I would give quite a lot to sit down with a Chumash or Gabrieleno elder who knew about the old practices, but to my knowledge, those people passed on two generations ago. Now, government agencies are paying hundreds of thousands of dollars to rediscover that knowledge, and now it is done in the name of conservation, by people like me. That is the price of the loss of those languages and their technical lexicons. I don’t know how to price the cost of the loss of philosophical terms, but I do know that the loss of lizard and plant knowledge can be very expensive.

    In any case, I appreciate the work Verafides is doing, and I appreciate the correction–my knowledge is a lexicon, not a language. However, the central observation remains. Even if they are not languages, there are a number of endangered lexicons in our world, and that is a problem. Rediscovery is every bit as expensive as learning it the first time.

  11. I was one of David’s advisees at Swarthmore, and I can say that his research interests (and those of most linguists) are not related to the lexicon per se, but rather to cognitive systems reflected in the lexicon and in other structures in the language. The type of knowledge that David is documenting has more to do with understanding the general cognitive structures of the mind and the space of possible languages than with more practical concerns like botany.

    There are even reasons to be suspicious of those who would mine traditional cultures for technical information. In too many instances, that information is promptly patented and sold back to them.

  12. I am a freelance radio journalist and documentarist. My experience with the Pmd 660 is that, while it does offer a few “pro” advantages such as XLR in, the mic preamp hiss is just too gross. The one I tested was nowhere near the quality of the Pmd 671 (the larger, upgraded version). These mp3s demonstrate the difference in sound between the 660 and the 671:

    Pmd 671 w. Neumann u68 studio mic (expensive!).

    Pmd 660 w. Neumann u68.

    Pmd 671 w. Electrovoice 635a (oldschool dynamic reporter mic)

    Pmd 660 w. Electrovoice 635a.

    While this test was done a while ago, I am not sure Marantz has addressed this issue.

    However, the Oade Brothers modded version of the Pmd 660 should do the trick. Note the examples at the bottom of the page.

    If anyone has experience with this upgraded version, please share with me in the comments on this page. While I have not personally tested the Oade mod, I would definitely go for that if I were to buy a recorder.

  13. I, too, am a linguist who does fieldwork on an endangered language – an indigenous language of California with fewer native speakers left than the fingers on one hand. I was happy to see that an article on language documentation made it onto BoingBoing. I too use a Marantz PMD 660, with a head-mounted condenser microphone. I know that many linguists use Marantz recorders with the Oade Brothers mod, and I have used both modded and unmodified PMD660s. There is a little improvement in the preamp hiss with the Oade Brothers mod, but I have found that using a condenser mic with either a modded or unmodded PMD660 produces superior recordings (in terms of noise) to a dynamic mic with either version of the recorder. When I bought my own recorder I skipped the mod and spent my money on a really nice condenser mic, and I don’t regret it. In any case, the little, more affordable Marantz recorder is far superior to the minidisc recorders and magnetic tapes that were common in the field just a few years ago, so as a poor but dedicated field linguist I am counting my blessings.

    I’d like to respond to Heteromeles, as well. It is true that each linguist goes to the field with personal interests, pet projects, and a sense of what is the basic information that should be documented first so that something useful survives of an endangered language. Many of our ideas of what information should be acquired first are shaped by what previous generations of linguists have done. For example, many linguists collect lists of words designed by M. Swadesh for lexicostatistics, even though lexicostatistics is mostly considered obsolete. Yet having lists of semantically similar words in many languages is still quite useful, so the practice continues. It is always difficult to decide what sort of information to gather first – especially in a language that is losing ground quickly. A lot of linguists are happy to document anything that is volunteered by their consultants, including extensive lists of flora and fauna terms. I have even known linguists who have consulted with specialists to scientifically identify the species named by speakers, and linguists who couldn’t take botanists into the field but have brought back some specimens, labeled with their native names. We are aware of the gaps in our knowledge, but we are only human. I am sure that the sensitivity of linguists to what sort of information is culturally and scientifically valuable varies from researcher to researcher, but none of us is working to exclude information from the records so I think it is a bit unreasonable to blame linguists for the loss of information that happens as languages become obsolete.

    Heteromeles must also consider that native names for different species, and even the ability to distinguish between some of those species, are part of the great mass of cultural knowledge that is passed down from generation to generation in many cultures. When the breakdown in the generational inheritance of this information starts causing the language to be lost, it also frequently causes cultural knowledge like names for different grasses to be lost. A child who learns English in school and has little interest in speaking the ancestral language is likely also to shop at the grocery store instead of participating in the traditional subsistence methods for which botanical terms were previously very important. In many cases, the information which would be so dear to Heteromeles is lost before linguists even arrive on the scene.

    As far as preserving the technical botany lexicon — Heteromeles, if you feel so passionately that this is a problem akin to language endangerment, then the solution is to teach this terminology to your son, make recordings, and write down everything that you can so that some interested person down the road might be saved some effort in recovering it. If that wouldn’t help to solve your problem, then you are probably not the last speaker of a dying language — you are just in the same boat as every researcher whose field is underfunded.

  14. Wow, it’s nice to see some people take an interest in endangered languages! Despite the flashy talk you sometimes see in newspapers or glossy magazines, there’s really very little interest in what we do. Most people feel that these languages (and the cultures that go with them) are better off dead in our age of universalism.

    Xeni – I use an Olympus DS-2, actually. Several other linguists have recommended it, because the audio is pretty good, for its size. It also has an external mic jack, and doesn’t do too much of the crazy compression stuff. I’ve recorded about 25 hours of speakers on it, and it is nice because it is OUT OF THE WAY. People need to concentrate, and think, when they tell you stuff, and a big thing sticking in their face makes them blank out. Also, if I drop it in the river, I’m only out $135…

    Botanist Person – Honestly, I don’t particularly care about the “wider audience.” I’m not a tool for turning Indians into soundbites for “The Learning Channel.” Nor am I a tool for modern western scientists – I honestly don’t care if you guys want certain kinds of information out of these people – it’s not my job to serve it to you. If you want it, come and get it – learn the language, like I have, and come and sit with the old people and ask them. There is no other way to knowledge than that (They will tell you the same thing, by the way).

    You’re absolutely right – the loss of botanical and biological information is a significant problem. However, the SouthernCal linguist Marketa is absolutely right – expertise varies a great deal. Also, be aware that it varies among speakers, too. In some communities, herbal and biological knowledge is restricted to particular people, and has complex social and religious issues associated with it. So, you may end up working with speakers for 40 years before you even come across someone who knows or wants to talk about those things. When most linguists do, they are more than happy to oblige.

    Have you checked, by the way? Chumash in particular has had a lot of work done on it. It went extinct in 1965, which is not that long ago. Maybe you just don’t know where to look?

    Not everything is about prices… There’s a lot more to what they have to say than social lives and body parts …

    I’m not too keen on squeezing indigenous people so that the modern locals can have nice parks. If you ask them, they’ll usually tell you that this is the price we pay for what we’ve done. Maybe it’s not SUPPOSED to be fixed? Kill the caretakers, exterminate their families, throw all their books in the river, and still hope to have what they had? Seems unreasonable to hope for, don’t you think?

    Regardless, I’m glad you care about these issues, and hope it will motivate you to do something about it.

  15. Back to the gadgets.

    I have an Olympus WS100 in addition to my Marantz. It is a cheaper handheld than the DS-2, and I am sure that its internal mic is not as good as the DS-2’s. I have used my Olympus in situations where I want a less conspicuous recorder (e.g. in conversational settings where a big mic in the middle of the room would change the comfort level of speakers). I usually work with a single speaker at a time and I have found that after some initial laughter and awkwardness, most people forget they are wearing a head-mounted mic. I use the (discontinued) AKG C420, which has proven to be surprisingly durable. It is less conspicuous than a handheld mic and has the benefit of maintaining a fixed position near the speaker’s mouth and out of their airstream. Of course for languages with more robust speaker populations than the one I work on, it might make more sense to find an inconspicuous way to record multiple people at once.

    The real reason I rarely use the handheld Olympus is that it records in .WMA format. Even if this format isn’t too lossy for you, it has major compatibility issues, and it makes it very difficult to work with these files using sound analysis and editing programs. The other issue I have with the Olympus is that its recording time is limited. I can put a 12G card in my Marantz and be o.k. if I need to record a lot of sessions and am unable to set my computer up to dump data every couple of hours. I bring both into the field, but I always end up using the Marantz.

    I am not a phonetician, so I don’t do a lot of acoustic analysis. But I operate with the philosophy that if I archive the best possible recordings, then other linguists who may want to do phonetic analysis will be able to use what I have done and that the recordings I make will be maximally useful (in terms of ease of use and quality) to the communities I work in.

    I should also add that all of the Marantz PMD660 recorders I have worked with (both Oade Bros. mod and unmodified versions) were purchased in the last 6-8 months. I am probably less sensitive to preamp hiss than journalists are, but perhaps Marantz has recently improved the 660.
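For a rough sense of how far a card like the "12G" one mentioned above goes, here is a minimal back-of-the-envelope sketch. It assumes uncompressed 16-bit PCM (WAV) at the 44.1 kHz and 48 kHz rates discussed earlier in the thread, and reads "12G" as 12 decimal gigabytes. The bit depth, the card size interpretation, and the function names are illustrative assumptions, not Marantz specifications, and file-system overhead is ignored.

```python
# Rough estimate of recording time for uncompressed PCM (WAV) audio.
# All figures are illustrative assumptions, not manufacturer specifications.


def pcm_bytes_per_second(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Data rate of uncompressed PCM audio, in bytes per second."""
    return sample_rate_hz * (bit_depth // 8) * channels


def recording_hours(card_bytes: int, sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Hours of audio that fit on a card, ignoring file-system overhead."""
    seconds = card_bytes / pcm_bytes_per_second(sample_rate_hz, bit_depth, channels)
    return seconds / 3600


CARD_BYTES = 12 * 10**9  # assumption: "12G" means 12 decimal gigabytes
BIT_DEPTH = 16           # assumption: 16-bit samples

# (label, sample rate in Hz, channel count)
SETTINGS = [
    ("44.1 kHz mono", 44_100, 1),
    ("44.1 kHz stereo", 44_100, 2),
    ("48 kHz stereo", 48_000, 2),
]

for label, rate, channels in SETTINGS:
    hours = recording_hours(CARD_BYTES, rate, BIT_DEPTH, channels)
    print(f"{label}: about {hours:.1f} hours")
```

Under these assumptions the card holds roughly 17 hours of 48 kHz stereo, or close to 38 hours of 44.1 kHz mono, which fits the point that a large card can cover many sessions between data dumps.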

  16. Yes, you’re absolutely right about the Olympus and compatibility and losses! The .wma format is for sure a braindead decision on their part. I use a Mac to work with the files, and you can get the Mac to mess with them okay. It just takes some noodling. Also, Praat will work with .wma, too.

    Really, though, I wasn’t too worried about those things. The main reason is that I’ve done fairly reliable acoustic work on recordings from much much worse equipment (tabletop audiocassette recorders from the 1980s, for example). Even tapes that are full of hiss and crackle still give you good fundamental frequency and formant information, overall. Sure, you can’t measure some of the fancy stuff, but I think we even did spectral tilt on vowels (Cree has much preaspiration) using older recordings like that.

    Really, though, for doing the kind of acoustic measurements that the laboratory phoneticians want, ANY fieldwork situation is going to be insufficient. You need a soundproof booth, a fixed mic at a fixed distance from the speaker, etc. Usually, when I show ANY acoustic stuff to one of those people, they’re horrified. So, given that I can’t please them ANYWAY, I might as well do it cheaply, reliably, and out of the consultant’s face, I figure.

    But in the end, the Marantz setup you’re using is the best solution for fieldwork, for sure.

  17. Something else –

    The late Dale Kinkade once showed me a very old photograph of Melville Jacobs doing fieldwork on the Northwest Coast in the 1930s. He had this 1920s Ford pickup truck with a GIGANTIC contraption on the back, complete with a wax cylinder and some kind of gramophone. He was having a Salish speaker talk into it. I wish I had a copy of it, it was quite the sight!

    Here’s a photo of him some years later doing similar work.


