Oud-Hollands in Friesland

Wednesday, April 30, 2008

This post originally appeared in Dutch only. The point turns on the difference between oud-Hollands and oud-Nederlands, which is hard to carry into English, so the key terms are kept in Dutch.

De Volkskrant reported today that the queen enjoyed herself in Makkum and Franeker: “The Frisians go all out while the royal family amuses itself with predominantly oud-Hollandse (‘old Hollandic’) activities.” Why those oud-Hollandse activities? Are oud-Friese (old Frisian) activities less fun? Or are oud-Nederlandse (old Dutch, in the sense of the whole country) activities meant? Friesland really is not a part of Holland, never has been, and presumably never will be.


Chinglish for toddlers

Monday, February 11, 2008

The Chinese recognize that foreign tourists and business travellers are an important source of income, so more and more places include English translations on their Chinese signs. Unfortunately, few Chinese are fluent in English, so the translations are not always correct. In fact, many of them are downright incomprehensible. Language Log has a growing number of examples of Chinglish, like this one, this one and this one.

Their latest example, borrowing from peer-see.com, provides little hope of the translations improving any time soon. It shows a set of blocks to help toddlers learn English, with pictures accompanied by the Chinese and English words. Kind of, anyway:

[Image: Chinglish for toddlers]

For the full set, head over to the peer-see.com post.


German efficiency

Thursday, November 15, 2007

I was looking up a word just now at Dictionary.com, and my attention was drawn less by the definition than by an advertising banner:

[Banner image: Die Bahn streikt]

I’ve retained enough high school German to understand that. For those who don’t, I’ll translate:

The German railways are striking — what do you think of this?
O Yes    O No    O Not interested
Vote and win EUR 10,000.      www.Initiative-Deutschland.net

Must be German efficiency to be able to answer a question about your opinion with a simple yes or no.


So many translations

Thursday, March 1, 2007

The automatic translation service from Babel Fish remains a source of fun. As Tersie discovered, the phrase so many birds is translated as zo vele vogels. While technically correct, that sounds rather archaic. We would typically say zo veel vogels.

I wondered whether Babel Fish was simply thrown off by the lack of context, so I fed it the full sentence I see so many birds in the sky. The result: Ik zie zo vele vogels in de hemel. That is a good translation, but again has the archaic vele instead of the common veel. It also gives hemel instead of lucht for sky; hemel is much closer in meaning to heaven. Incidentally, this is one of the few sentences with the words in exactly the same order in Dutch and in English, so all words translate one-to-one. Usually Dutch puts the verb in a different location.

As I was typing this post, I heard the traffic information on the radio, so I tried another sentence: There are so many traffic jams today. This becomes Er is vandaag zo vele verkeersjam. That’s correct… sort of. Jam is a difficult word to translate, since it has two completely different meanings: traffic congestion or the jelly you put on your toast. The Fish oddly seems to have taken the second one, creating a Dutch sentence that literally means There is so much traffic jelly today. Try putting that on your toast!


Professor Harm tired cherry

Wednesday, January 24, 2007

Following up on the previous post and Mark Liberman’s Language Log post, I thought I’d delve a little deeper into the Babel Fish translation of the opening paragraph of the Leiden University newsletter article on the WNT going online. In Dutch, it goes as follows:

Met ingang van zaterdag 27 januari is het Woordenboek der Nederlandsche Taal (WNT) voor iedereen gratis op het internet te raadplegen. Is dit nieuws alleen van belang voor neerlandici, filologen en taalkundigen? “Magnifiek”, reageert Harm Beukers, hoogleraar geschiedenis van de geneeskunde.

I gave a translation in my previous post, which I will modify here slightly to be more literal:

As of Saturday 27 January, the Dictionary of the Dutch Language will be on the internet for everyone to consult for free. Is this news only important to scholars of Dutch, philologists and linguists? “Magnificent,” responds Harm Beukers, professor in history of medicine.

Altavista’s Babel Fish service gives the following translation, as already provided by Mark in his LL post:

As of Saturday 27 January the dictionary is for free consult language (WNT) for everyone of the Nederlandsche on the Internet. Is this news important only for neerlandici, philologists and linguists? “magnificent”, Harm tired cherry, hoogleraar history of medicine react.

Surprisingly, the type of quotation marks used around magnifiek matters to Babel Fish. I used double quotes, whereas Mark used single quotes, resulting in a slightly different translation:

  • “Magnifiek”, reageert Harm Beukers, hoogleraar geschiedenis van de geneeskunde. is translated as “magnificent”, Harm tired cherry, hoogleraar history of medicine react.
  • ‘Magnifiek’, reageert Harm Beukers, hoogleraar geschiedenis van de geneeskunde. is translated as ‘ magnificently, react Harm tired cherry, hoogleraar history of medicine.

Why is the capital M lost in both cases? Why is the closing quotation mark lost in the single-quotes case? Why is a space inserted after the opening quotation mark in the single-quotes case? Why does the position of react depend on the type of quotation marks?

Moving on to the rest of the text, it is clear that the old-style spelling of Nederlandsche is confusing. Using modern spelling (Nederlandse), the translation is better, but still not good: As of Saturday 27 January the dictionary is for free consult of the Dutch language (WNT) for everyone on the Internet.

The word neerlandici (scholars of the Dutch language) is left untranslated. The singular form, neerlandicus, is also unknown to Babel Fish. The word hoogleraar (professor), a rather common Dutch word and easy to translate, poses another problem.

Most puzzling is the transformation of the name Harm Beukers into Harm tired cherry. If I remove the rest of the text, leaving only the name, it yields the same translation. Removing the first name, leaving only Beukers, results in a translation of tired cherry. This makes absolutely zero sense to me. A Google search on {“beukers” “tired cherry”} comes up empty, only adding to my wonder. Where did Babel Fish pull this from? [Update (January 24th, 2007): See the comments for a further discussion and a likely answer.]

On a related note, can anyone tell me what Beukers means as a last name? As a word, it is something like batterers or bashers, from beuken, to batter, to bash, but I doubt it means the same as a name. If no one knows, I’ll consult the WNT in a few days and see what I can find there.

[Update 2 (January 24th, 2007): Mark Liberman posted a follow-up on Language Log, adding his thoughts on Babel Fish's handling of Beukers and hoogleraar and machine translation in general.]


World’s biggest dictionary goes online

Wednesday, January 24, 2007

This week’s Leiden University newsletter has a story on the Dictionary of the Dutch Language becoming freely available online in a few days. I forwarded this report to Mark Liberman, co-creator and senior writer of Language Log, a weblog about language (no kidding!) that I have greatly enjoyed reading since I discovered it last year. Mark found the story interesting enough that he posted it on LL, expanding it a bit to make it a more entertaining read than I could ever manage.

Unfortunately, there doesn’t seem to be an English report on the WNT going online, so all the juicy details were lost on non-Dutch speakers. (Of which I’m sure there are many amongst the Language Log audience.) Not anymore, though, as I will provide a translation right here:

The Woordenboek der Nederlandsche Taal (WNT; Dictionary of the Dutch Language) will become freely available on the internet on Saturday, January 27, at wnt.inl.nl. Is this news important only to scholars of Dutch, philologists and linguists? “Magnificent,” responds Harm Beukers, professor in history of medicine.

Records
The Woordenboek der Nederlandsche Taal is a record-breaking piece of work. It required 134 years of work, from 1864 to 1998. It contains hundreds of thousands of entries with definitions of Dutch words and more than one and a half million quotes from sources from between 1500 and 1976. The dictionary was published in 686 parts collected in forty volumes. This makes it a very complete account of nearly five centuries of Dutch language history.

CD-ROM
On the other hand, this also makes it a bulky and even unwieldy dictionary; it is not one any person would readily have on their bookshelves. A trip to the university library or another scientific library is required to consult it. This situation improved when the dictionary was published on CD-ROM in 2000. (An incomplete edition, up to the W, was already published in 1995.) However, this CD-ROM edition had its own disadvantages, certainly compared to the online availability soon to be realized.

Useful sources
Professor Beukers is very happy the WNT will soon be available on the internet. Up to now, he had to cycle to the university library to do research. “There was the CD-ROM, of course,” he says, “but I just never got around to buying it. The biggest advantage is that one can now consult the dictionary while writing a paper.” Rob Visser, professor in history of the natural sciences [and no relative of mine, --Ruud], is also delighted. “I only used the WNT sporadically, but if it becomes more easily accessible, I will certainly consult it more often. The WNT uses sources that are not always obvious for my area of work.” Visser recalls a student who quickly found a list of sources in the WNT they could use for their research on evolution.

Magnifying glass
Marietje van der Schaar, a researcher at the university’s philosophy department, also makes frequent use of the WNT and, because she often writes in English, the Oxford English Dictionary (OED), the English equivalent to the WNT. Van der Schaar: “It is wonderful that the WNT will be available online. I have the OED at home, but I can only read it with the magnifying glass that came with it. It is important for me to know how certain words were used in the past, and these dictionaries provide a lot of information on the development of words like kennen and weten. In modern English there is no distinction between these words; both are translated as to know. The OED tells me there was a distinction in the past: to ken and to wit.”

Definitions
All words in the online WNT can be looked up using the original 1863 spelling rules or modern rules. It is also possible to look for parts of words, like suffixes and prefixes, for word categories, like interjections and conjunctions, or for terms used in the definitions, like all words that have the term plant or ship in their definition.

Information outside the dictionary
An important advantage of the online WNT over the CD-ROM edition is that links could be added to information outside the dictionary. For instance, all words that have been published so far in the Etymologisch Woordenboek van het Nederlands (Etymological Dictionary of the Dutch Language), with the most recent developments in etymological research, are coupled to their equivalents in the WNT. Further links are available to similar words in Afrikaans, to figures of plants and animals, and to dialect charts. The source list of the online WNT was completely revised: it contains a large number of new works, which also turned out to be used for the printed WNT. This new source list allowed many entries in the WNT to be dated more accurately.

Using the online WNT will be free of charge. After a one-time registration as a user, the dictionary can be consulted wherever and whenever one wants to.

The newsletter article also contains two pieces of text set apart from the main body. The first piece explains how the WNT came to be:

Historical dictionary
The WNT is a historical dictionary. For every word, it lists the grammatical characteristics, the origin, the original meaning, and other meanings that developed over time. The WNT also gives derivations and compound words and information concerning usage in expressions and proverbs. Of particular note is the fact that the descriptions are fully based on an independent collection of source material: almost ten thousand literary and non-literary sources with millions of quotes. However, the WNT is also a historical dictionary in another sense.

New spelling rules
Matthias de Vries and Lammert te Winkel, the driving forces behind the WNT, created a new set of spelling rules to be used in the dictionary. These rules are appropriately known nowadays as the De Vries and Te Winkel spelling. In 1863, Te Winkel published De grondbeginselen der Nederlandsche spelling. Ontwerp der spelling voor het aanstaande Nederlandsch Woordenboek (The foundations of the Dutch spelling. Design for the spelling rules for the upcoming Dictionary of the Dutch Language). These rules soon became very popular and were adopted in Belgium as early as November 21, 1863. De Vries and Te Winkel published the Woordenlijst voor de spelling der Nederlandsche taal (List of words for the spelling rules in the Dutch language) in 1866 to be used by the common man. The entire WNT was written according to these rules, which meant it outlasted two spelling reforms before it was completed in 1998.

1921
In order to finish before 2000, the board of the Instituut voor Nederlandse Lexicologie (Institute for Dutch Lexicology), founded in 1967 and overseeing work on the WNT ever since, decided in 1976 that no words first used after 1921 would be added. Words like vacantiegeld and zappen are therefore absent.

The second additional bit of text compares the WNT to some other large dictionaries, but I’ll leave that out here, because for some reason my weblog refuses to display the table properly. Suffice it to say the WNT is of equal size to the Oxford English Dictionary (OED), the Deutsches Wörterbuch (DWB) by the Grimm brothers and the Dai Kan-Wa Jiten (DKWJ; a Chinese-Japanese dictionary) by Tetsuji Morohashi. It has been said the WNT is actually the world’s biggest dictionary; in terms of pages, that certainly seems to be true, but the OED contains more entries. As often with size comparisons, the winner depends on the exact definition of “biggest”.


Plutoed planet gets Word of the Year honours

Wednesday, January 10, 2007

Astronomy meets linguistics! The American Dialect Society picked plutoed, from the verb to pluto, as 2006’s Word of the Year:

In its 17th annual words of the year vote, the American Dialect Society voted plutoed as the word of the year, in a run-off against climate canary. To pluto is to demote or devalue someone or something, as happened to the former planet Pluto when the General Assembly of the International Astronomical Union decided Pluto no longer met its definition of a planet.

Pluto’s demotion to dwarf planet is getting this little ball of rock and ice more fame in a single year than it had in the 76 years since its discovery combined!


Teens use 20 words for third of speech… as do we all

Monday, December 18, 2006

Last week, the BBC (which, if anything, has a reputation for reliability) ran a story on teenagers’ extremely poor language skills. The article begins as follows:

Britain’s teenagers risk becoming a nation of “Vicky Pollards” held back by poor verbal skills, research suggests. And like the Little Britain character the top 20 words used, including yeah, no, but and like, account for around a third of all words, the study says.

It then goes on to provide some nuances, which of course were ignored by most other media reporting on this research. Hence, the main point came to be the factoid that teenagers use only 20 words for a third of their speech and writing.

Over at Language Log, there are two posts showing why this is hugely misleading. First off, Mark Liberman writes:

I’m sure that Britain’s teens would benefit from additional vocabulary instruction. But the assertion that they “use just 20 words for a third of everything they say” is a spectacularly lousy argument for this conclusion.

Here’s why. The Zipf’s-law distribution of words, whether in speech or in writing, whether produced by teens or the elderly or anyone in between, means that the commonest few words will account for a substantial fraction of the total number of word-uses. And in modern English, the fraction accounted for by the commonest 20 orthographical word-forms is in the range of 25-40%, with the 33% claimed for the British teens being towards the low side of the observed range.

For example, in the Switchboard corpus — about 3 million words of conversational English collected from mostly middle-aged Americans in 1990-91 — the top 20 words account for 38% of all word-uses. In the Brown corpus, about a million words of all sorts of English texts collected in 1960, the top 20 words account for 32.5% of all word-uses. In a collection of around 120 million words from the Wall Street Journal in the years around 1990, the commonest 20 words account for 27.5% of all word-uses.

Following up on that, Geoff Pullum had a closer look at the original BBC article:

I took the entire text of the actual BBC article (…), computed the top 20 most frequent words in it, and worked out what percentage of the total it was. The answer is between 36 and 40 percent. (The difference depends on how much you collapse different word forms together into lexemes. Collapsing genitives and plurals with non-genitive singulars makes hardly any difference to the results, but treating is, are, was, and were as different words rather than as representatives of the verb be lowers the figure slightly. If you do the collapsing, the top 20 words make up over 39.5% of the text. If you don’t, the top 20 account for just over 36%.)

So this is the situation. This staggeringly stupid news report states that Britain’s teenagers are “held back by poor verbal skills” because the evidence shows that the top 20 words in their speech account for 33% of all the words they use — the implication being that they aren’t using enough words, they’re just repeating a few words like “yeah” and “no” and “but” and “like”. But in the staggeringly stupid article itself, the top 20 words account for substantially more than that. So Britain’s science writers (at least at the BBC) are even more verbally retarded.

In case you want to see the results I got (which you can easily check for yourself), here they are (with the lexeme collapsing done). There are 402 words in the text (if you replace hyphens by spaces), and this table shows the numbers of occurrences for the top 20 in frequency:

25   the
16   forms of the verb be
13   of
10   and
10   in
10   to
9    forms of the noun word
8    a
7    but
6    as
6    forms of the pronoun it
5    forms of the pronoun he
5    no
5    forms of the verb say
5    speech
4    by
4    forms of the noun school
4    that
4    which
4    with

These words account for 25 + 16 + 13 + 10 + 10 + 10 + 9 + 8 + 7 + 6 + 6 + 5 + 5 + 5 + 5 + 4 + 4 + 4 + 4 + 4 = 160 occurrences, and 160/402 = 39.8%.

Even if you insist on going with raw word forms with not even the singulars and plurals collapsed, my count shows the percentage only going down to 36%, which is still higher than the teenagers’ alleged 33%.

Ergo, the teenagers sampled in the study reported by the BBC are more verbally skilled than the writers of the BBC article.
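
For anyone who wants to repeat this kind of count on a text of their own, here is a minimal sketch in Python. It is not the script Geoff Pullum used (his post does not show one). The function name top_n_share and the tokenizing rule are assumptions made purely for illustration. It counts raw word forms without collapsing lexemes, so on the BBC article it would correspond to his lower figure of about 36% rather than to the 39.8%. The last few lines simply redo the arithmetic from his table.

```python
import re
from collections import Counter


def top_n_share(text, n=20):
    """Return (share, top), where share is the fraction of all word tokens
    covered by the n most frequent raw word forms and top is the list of
    (word, count) pairs for those forms. Hypothetical helper, not from the
    Language Log posts."""
    # Lowercase, replace hyphens by spaces (as in Pullum's count) and
    # normalize curly apostrophes so that "don’t" stays one word.
    cleaned = text.lower().replace("-", " ").replace("’", "'")
    words = re.findall(r"[a-z']+", cleaned)
    if not words:
        return 0.0, []
    top = Counter(words).most_common(n)
    share = sum(count for _, count in top) / len(words)
    return share, top


# Tiny demonstration: "the" (3) and "and" (2) cover 5 of the 8 words.
share, top = top_n_share("the cat and the dog and the bird", n=2)
print(f"{share:.1%}")  # 62.5%

# Re-checking the arithmetic in the table quoted above.
pullum_counts = [25, 16, 13, 10, 10, 10, 9, 8, 7, 6,
                 6, 5, 5, 5, 5, 4, 4, 4, 4, 4]
print(sum(pullum_counts))                 # 160
print(f"{sum(pullum_counts) / 402:.1%}")  # 39.8%
```

Run on any longer stretch of English, the top 20 forms should come out somewhere in the 25-40% range Mark Liberman mentions, which is exactly why the 33% for teenagers says nothing about their vocabulary.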


Parentheses and slashes

Monday, December 11, 2006

I once read that one of the differences between Dutch and English (apart from several obvious ones) is the Dutch practice of using parentheses and slashes to allow for several possibilities in one sentence without having to write them all down explicitly. A simple example:

The student(s) that fail(s) the exam can try again in a few months.

Although there’s technically nothing wrong with this sentence in English, native speakers typically wouldn’t write it. In Dutch, however, this is quite common. Sometimes people try to do too much, though. I found this sentence in a contract for my new savings account:

De bank is bevoegd (één van) de/het bankdienst(en) en/of product(en) te beëindigen en/of (de daarvoor verschuldigde vergoeding) te wijzigen.

I can’t even translate that literally, but I’ll try as best as I can:

The bank is allowed to terminate and/or change (one of) the service(s) and/or product(s) or its/their corresponding compensation.

I guess what they mean is that they can either terminate or change any of the services and products I’m signing up for. In addition, they can change the compensation (e.g. interest). (As I translated it, the compensation can also be terminated, but the original Dutch sentence doesn’t allow for that.)

And there’s more. A few lines down, there’s this beauty:

Voor zover met betrekking tot de in deze overeenkomst vermelde (aanvragen (tot wijziging) van) product(en) reeds eerder met dezelfde rekeninghouder één/meer overeenkomst(en) is/zijn aangegaan, …

Translated as literally as possible, that becomes:

If one/more contract(s) was/were signed by this account owner in relation to (requests for (changes in)) the services mentioned in the present contract, …

Note, in particular, the nested parentheses. I really wonder if the person that wrote these sentences thought this was the clearest and easiest way to put it.


Buffaloing Buffalo buffaloes

Monday, December 4, 2006