Wednesday 19 November 2014

No stone left unkerned

One of the difficulties I always have with Japanese is reading. While I can glide very quickly over English text (and, to a lesser extent, other languages written in the Roman alphabet), my Japanese reading is painfully slow and hesitant. It's not only that my Japanese isn't up to it; I find there's something monotonous about the actual presentation of Japanese text that makes it hard for me to focus and analyse it. After a lot of musing, I think it's because I'm used to kerned, spaced Roman text. What Japanese reminds me of is good old monospaced text, from the days of yore.

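By "kerned" I mean that the font has been designed so that letters are spaced elegantly: a slim I takes up less space than a broad W, giving a more pleasing appearance to most readers. (Strictly speaking, that is proportional spacing; kerning proper is the further fine-tuning of the gap between particular pairs of letters.) Times New Roman is spaced this way, but Courier, a monospaced font, isn't.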

I am Times New Roman and I am kerned. WWWWWIIIII.

I am Courier New and I am not kerned. WWWWWIIIII.

A lot of written Japanese (such as novels) includes oral features like certain changes to pronunciation. Sometimes it moves towards full-fledged dialect, but even with more standard writing there are some changes to the 'standard' form that add to my confusion. To make matters worse, there are often three or more ways of writing the exact same word: kanji, hiragana or katakana. Sometimes these have identical meanings; sometimes not. Apparently お化け tends to mean a classic Japanese ghost, while オバケ (exactly the same pronunciation, "oh-ba-ke") tends to mean a cutesy Halloween ghost. ずれている花瓶 and ズレている花瓶 are both pronounced "zurete iru kabin", and are technically the exact same verb, but have mildly different implications.

If you're interested, they mean "the vase is out of place". The first version, in the usual hiragana, implies a physical displacement. The second, in katakana (usually reserved for foreign or emphasised words), implies the vase just doesn't fit in with the decor.

Anyway, the point of this post is, I thought it might be interesting (for both English and Japanese readers) to compare and contrast the two. This will require your computer to understand how to show Japanese characters, so if not, sorry.

Here I have attempted to produce roughly the same effect with English. There's really no way to substitute for the complications of the three Japanese scripts, and I haven't tried. English also provides more clues about word boundaries, because of the way syllables are constructed. For example, you can't start any English words with ng, pth, rd and so on. Japanese kana form a syllabic script, where any character can (in theory) appear at the start, middle or end of a word, so for the non-fluent reader, it's harder to judge boundaries.

I have, however:

  1. Removed all the spaces.
  2. Put it all in lower case.
  3. Made the font monospaced, giving a similar monotonous look to the text.
  4. Thrown in a few oral features, tweaking the written forms to show some features of authentic pronunciation.
  5. Reduced the spacing between lines of text, so it's harder to focus on the line you're reading.

Because I'm fairly kind, I kept the spelling alterations pretty mild, and didn't attempt to make the whole thing sound like I do.
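For the curious, steps 1 and 2 are easy to mechanise. Here's a minimal Python sketch, assuming that stripping whitespace and apostrophes and folding the case is all that's wanted. The function name `squash` is just a label made up for this example, and the oral tweaks in step 4 aren't attempted at all.

```python
def squash(text: str, upper: bool = False) -> str:
    """Strip whitespace and apostrophes, then fold everything to one case."""
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces back together removes all of it (step 1).
    joined = "".join(text.split())
    # Apostrophes go too; all other punctuation is left alone.
    joined = joined.replace("'", "")
    # Step 2: lower case by default, upper case for the harder variant.
    return joined.upper() if upper else joined.lower()


if __name__ == "__main__":
    sample = "The note was immediately dispatched, and its contents as quickly complied with."
    print(squash(sample))
    print(squash(sample, upper=True))
```

The monospaced font and tight line spacing are matters of presentation, so they are outside what a text transformation like this can do.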

For really authentic nostalgia, this should be printed in green text on a black screen, like the computers of my youth.

EDIT: after some time (and some readers) it occurs to me that uppercase is probably even harder to parse.

ELIZABETHPASSEDTHECHIEFOTHENIGHTINERSISTERSROOM,ANINTHMORNINGHADTHEPLEASUREOBEINGABLETOSENDATOLERABLEANSWERTOTHEINQUIRIESWHICHSHEVERYEARLYRECEIVEDFROMMRBINGLEYBYAHOUSEMAID,ANSOMETIMEAFTERWARDSFROMTHTWOELEGANTLADIESWHOWAITEDONISSISTERS.INSPITEOTHISAMENDMENT,HOWEVER,SHEREQUESTEDTOWAVEANOTESENTTOLONGBOURN,DESIRINERMOTHERTOVISITJANE,ANFORMEROWNJUDGEMENTOERSITUATION.THENOPEWASIMMEDIATELYDISPATCHED,ANITSCONTENTSASQUICKLYCOMPLIEBWITH.MRSBENNET,ACCOMPANIEDBYERTWOYOUNGESTGIRLS,REACHEDNETHERFIELDSOONAFTERTHEFAMILYBREAKFAST.

ADSHEFOUNJANEINANYAPPARENTDANGER,MRSBENNETWOULDABEENVERYMISERABLE;BUPBEINSATISFIEDONSEEINERTHATERILLNESSWASNOTALARMING,SHEADNOWISHOFERRECOVERINGIMMEDIATELY,ASHERRESTORATIONTOHEALTHWOULDPROBLYREMOVERFROMNETHERFIELD.SHEWOULDNOTLISTEN,THEREFORE,TOERDAUGHTERSPROPOSALOBEINGCARRIEDHOME;NEITHERDIDTHEAPOTHECARY,WHOARRIVEDABOUTTHESAMETIME,THINKITATALLADVISABLE.AFTERSITTINALITTLEWHILEWIJANE,ONMISSBINGLEYSAPPEARANCEANINVITATION,THEMOTHERANDTHREEDAUGHTERSALLATTENDEDERINTOTHEBREAKFASPPARLOUR.BINGLEYMETTHEMWITHHOPESTHAPMRSBENNETHADNOFFOUMMISSBENNETWORSETHANSHEEXPECTED.

The first, lowercase version:

elizabethpassedthechiefothenightinersistersroom,aninthmorninghadthepleasureobeingabletosendatolerableanswertotheinquirieswhichsheveryearlyreceivedfrommrbingleybyahousemaid,ansometimeafterwardsfromthtwoelegantladieswhowaitedonissisters.inspiteothisamendment,however,sherequestedtowaveanotesenttolongbourn,desirinermothertovisitjane,anformerownjudgementoersituation.thenopewasimmediatelydispatched,anitscontentsasquicklycompliebwith.mrsbennet,accompaniedbyertwoyoungestgirls,reachednetherfieldsoonafterthefamilybreakfast.

adshefounjaneinanyapparentdanger,mrsbennetwouldabeenverymiserable;bupbeinsatisfiedonseeinerthaterillnesswasnotalarming,sheadnowishoferrecoveringimmediately,asherrestorationtohealthwouldproblyremoverfromnetherfield.shewouldnotlisten,therefore,toerdaughtersproposalobeingcarriedhome;neitherdidtheapothecary,whoarrivedaboutthesametime,thinkitatalladvisable.aftersittinalittlewhilewijane,onmissbingleysappearanceaninvitation,themotherandthreedaughtersallattendederintothebreakfaspparlour.bingleymetthemwithhopesthapmrsbennethadnoffoummissbennetworsethansheexpected.

Even I have some difficulties with that. Note that the longer words are generally easier to read, because a fluent reader recognises words based on overall shape, not by reading individual letters. A long word tends to have a more distinctive shape, whereas passages of short words require you to identify more words in the same space. For example, "judgement" and "accompanied" are easier to spot than "met them with".

And here's the original:

Elizabeth passed the chief of the night in her sister's room, and in the morning had the pleasure of being able to send a tolerable answer to the inquiries which she very early received from Mr. Bingley by a housemaid, and some time afterwards from the two elegant ladies who waited on his sisters. In spite of this amendment, however, she requested to have a note sent to Longbourn, desiring her mother to visit Jane, and form her own judgement of her situation. The note was immediately dispatched, and its contents as quickly complied with. Mrs. Bennet, accompanied by her two youngest girls, reached Netherfield soon after the family breakfast.

Had she found Jane in any apparent danger, Mrs. Bennet would have been very miserable; but being satisfied on seeing her that her illness was not alarming, she had no wish of her recovering immediately, as her restoration to health would probably remove her from Netherfield. She would not listen, therefore, to her daughter's proposal of being carried home; neither did the apothecary, who arrived about the same time, think it at all advisable. After sitting a little while with Jane, on Miss Bingley's appearance and invitation, the mother and three daughters all attended her into the breakfast parlour. Bingley met them with hopes that Mrs. Bennet had not found Miss Bennet worse than she expected.

For comparison, here's some actual Japanese. I believe it is technically kerned, but because of differences in the scripts, I still find it looks essentially monospaced to me.

メル・ヘイスティングスはいきなりヒステリックな笑い声を出した。「これであなたの頭が変だということがわかりましたよ、先生!あのやさしい、愛情深いアリスに心臓《ハート》がないだなんて!彼女はよく言ったものです。『わたしにはアタマがないのよ。そうでなきゃ、どんくさい新聞記者なんかと結婚するもんですか。でも、だからわたしにはハートがあるの。あなたと恋に落ちたのはわたしのハートなのよ。アタマじゃなくて』彼女はぼくを愛していたんです。わかりませんか」

ドクタ・ウインタースは彼をそっと連れ出そうとした。「もちろん、わかりますよ。どうぞこちらへ、ミスタ・ヘイスティングス。しばらく横になって休みましょう。気付けになにか持ってきます」

メルは導かれるままに近くの小部屋に入った。医者が持ってきた液体は飲んだが、横になろうとはしなかった。

「たしかに見ましたよ」呆然としながらもきっぱり彼はそう言った。「でも説明なんかどうでもいい。ぼくはアリスを知っていたんです。彼女はぼくや先生以上にちゃんとした人間だった。異常なんてちっともなかった――もっとも去年ごろから、ぼくらがあるとき一緒に火星に行ったと思いこんでいたけど」

And purely for completeness, here's the same text roughly as it'd look given the Roman treatment. It's a little easier for me to read, especially once I added a spacing line between the characters, but still feels somewhat monotonous to me (is it the squareness of the characters, I wonder? I find the very distinctively sized っ makes a word much easier to parse...), and I imagine it makes little difference to a fluent reader:

メル・ヘイスティングス  は  いきなり  ヒステリック  な  笑い声  を  出した。  「これ  で  あなた  の  頭  が  変  だ  という  こと  が  わかりました  よ、  先生!  あの  やさしい、  愛情  深い  アリス  に  心臓  《ハート》  が  ない  だ  なんて!  彼女  は  よく  言った  もの  です。  『わたし  に  は  アタマ  が  ない  の  よ。  そう  で  なきゃ、  どん  くさい  新聞  記者  なんか  と  結婚  する  もん  です  か。  でも、  だから  わたし  に  は  ハート  が  ある  の。  あなた  と  恋  に  落ちた  の  は  わたし  の  ハート  な  の  よ。  アタマ  じゃ  なく  て』  彼女  は  ぼく  を  愛して  いたん  です。  わかりません  か」

ドクタ・ウインタース  は  彼  を  そっと  連れ出そう  と  した。  「もちろん、  わかります  よ。  どうぞ  こちら  へ、  ミスタ・ヘイスティングス。  しばらく  横  に  なって  休みましょ  う。  気付け  に  なにか  持って  きます」

メル  は  導かれる  まま  に  近く  の  小  部屋  に  入った。  医者  が  持って  き  た  液体  は  飲ん  だ  が、  横  に  なろ  う  と  は  し  なかった。

「たしか  に  見ました  よ」  呆然  と  し  ながら  も  きっぱり  彼  は  そう  言った。  「でも  説明  なんか  どう  でも  いい。  ぼく  は  アリス  を  知って  い  た  ん  です。  彼女  は  ぼく  や  先生  以上  に  ちゃんと  した  人間  だった。  異常  なんて  ちっとも  なかった――  もっとも  去年  ごろ  から、  ぼく  ら  が  ある  とき  一緒  に  火星  に  行った  と  思いこん  で  いた  けど」

Why?

There are reasons for these differences, including the fact that in normal Japanese, a reader can break up the text by the placement of the kanji (the more complicated characters). Even Japanese people find text written entirely in hiragana or katakana to be slow and inconvenient to read, apparently.

Studies indicate that spaces are helpful in reading kana text, but not in reading text with kanji. The kanji allow readers to break up the text enough to spot the word-shapes easily. I can see that to some extent, but would really have liked to see this analysed in more detail.

I noticed that the sample "kanji with kana" text has a very heavy density of kanji compared to anything I normally see: 86 of 199 (average) characters were kanji, which is about 43%, so approaching half. This is about the proportion I found in a short formal newspaper article about politics. Formal writing often uses Chinese-derived kanji words where more ordinary writing would use a Japanese expression with one or no kanji. Quite a lot of these were proper nouns written out in full, which suggests it would decrease with a longer article, and also vary by subject. I've also seen it noted that kanji use is partly a deliberate choice to save space, as kanji are far more compact. Here, the kana mostly mark certain grammatical information in between kanji words; there are rarely more than two kana terms adjacent. Mostly we have a sort of alternation which means the word boundaries are actually pretty clear.

菅義偉官房長官は19日午前の記者会見で、消費増税の税収を財源に想定している「子ども・子育て支援新制度」について、増税時期を先送りしても「予定通り施行したい」と述べた。

菅義偉  官房長官  は  19日  午前  の  記者会見  で、  消費増税  の  税収  を  財源  に  想定  して  いる  「子ども・子育て  支援  新制度」  について、  増税  時期  を  先送り  して  も  「予定  通り  施行  したい」  と  述べた。

As you can see, in most cases the word boundary occurs at the switch between kanji and kana. There are a couple of more complex cases where it's arguable (to me) whether something is one or two words, like "消費増税" (consumption tax increase) which can't quite be broken down the same way as the English can.
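The "boundary at the script switch" idea can be sketched in a few lines of Python. This is only an illustration under some assumptions: the Unicode ranges below are the basic hiragana, katakana and CJK ideograph blocks, the names `script_of` and `split_on_script` are made up for the example, and it is nowhere near a real word segmenter.

```python
from itertools import groupby


def script_of(ch: str) -> str:
    """Classify one character by Unicode block (a rough approximation)."""
    code = ord(ch)
    if 0x4E00 <= code <= 0x9FFF:   # CJK Unified Ideographs
        return "kanji"
    if 0x3040 <= code <= 0x309F:   # Hiragana block
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:   # Katakana block (includes the ・ and ー marks)
        return "katakana"
    return "other"                 # digits, punctuation, Roman letters...


def split_on_script(text: str) -> list[str]:
    """Cut the text wherever the script changes from one character to the next."""
    return ["".join(group) for _, group in groupby(text, key=script_of)]


if __name__ == "__main__":
    print(split_on_script("税収を財源に想定している"))
    # → ['税収', 'を', '財源', 'に', '想定', 'している']
```

That comes close to the hand-spaced version for this stretch, but the same rule cuts 先送り into 先送 and り, which is exactly the sort of case where the script switch and the word boundary don't line up.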

A random blogpost had 62 kanji out of 213 characters, so just over a quarter. As you might expect, it's much harder to pick out the words in this one.

黄のシールお使いになる前に店員に一声おかけください。使用には知識が必要なものに張られていることが多いのですが、自分で動かせることがお貸出し条件となります。赤のシール修理中のものや店員にも使い方が良く分かっていないものに張られています。

黄  の  シール  お使い  に  なる  前  に  店員  に  一声  おかけ  ください。使用  に  は  知識  が  必要  な  もの  に  張られて  いる  こと  が  多い  の  です  が、自分  で  動かせる  こと  が  お貸出し  条件  と  なります。 赤  の  シール  修理中  の  もの  や  店員  に  も  使い  方  が  良く  分かって  いない  もの  に  張られて  います。

Having looked around, I can't find a reliable source, but I've seen several places stating that the average kanji proportion in writing is only around 30%, which feels about right for the stuff I see day-to-day (I rarely glance at a newspaper). Novels seem a bit higher, but I would say not much unless you look at Serious Literature. This helps explain why I find reading so difficult... not enough kanji, it would seem!
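If you want to check proportions like these on your own samples, a rough count is easy to script. The sketch below makes some assumptions: "kanji" means anything in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), whitespace is ignored, and punctuation and digits still count as characters, which may not match how the figures quoted earlier were counted. The names `kanji_ratio` and `percent` are made up for the example.

```python
def kanji_ratio(text: str) -> tuple[int, int]:
    """Return (number of kanji, number of non-whitespace characters)."""
    chars = [ch for ch in text if not ch.isspace()]
    # Assumption: only the basic CJK Unified Ideographs block counts as kanji.
    kanji = sum(1 for ch in chars if "\u4e00" <= ch <= "\u9fff")
    return kanji, len(chars)


def percent(part: int, whole: int) -> int:
    """Express part/whole as a percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


if __name__ == "__main__":
    kanji, total = kanji_ratio("税収を財源に想定している")
    print(kanji, total, percent(kanji, total))
    # The two samples discussed above, using the counts given in the text:
    print(percent(86, 199))  # the formal sample
    print(percent(62, 213))  # the blog post
```

Plugging in the counts from the text gives roughly 43% for the formal sample and 29% for the blog post, which sits right around that 30% figure.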

Basically it seems that reading in Japanese (as opposed to simply being able to read) is a substantially different skill from reading in English. You can't just transfer over existing ability and then learn the language; you have to train your brain to do a different kind of pattern-recognition to interpret the text. Until I manage that, I'm stuck with painstakingly pushing through a character or two at a time, trying to actively analyse rather than having the words and sentence structure leap out at me.

Right, musing over, time for bed.
