Friday, July 15, 2011

fuck iso-8859-16

... and the ignorant bastards that cannot distinguish between an aesthetic choice when designing font faces and character encoding.

iso-8859-16 is not for Romanian, and the traditional and correct way to write in Romanian is with s+cedilla, t+cedilla ... which means iso-8859-2. Want ş with a tiny comma below the "s"? Fucking design a new font face and give the rest of us a break. Rumanians wrote with s+cedilla for 150 years and did not lose sleep over it, but now we have to deal with two fucking character sets that don't translate into each other and that look so much alike on screen that you need fucking glasses to tell them apart.

Yes, you know who you are.
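
For reference, a minimal Python sketch of why text in the two character sets does not translate directly: the same byte decodes to the cedilla letter under iso-8859-2 and to the comma-below letter under iso-8859-16. The codec names are the ones Python's standard library ships; the sample string and variable names are only illustrative.

```python
# Byte 0xBA is "s with cedilla" (U+015F) in iso-8859-2
# and "s with comma below" (U+0219) in iso-8859-16.
raw = b"Bucure\xbati"

as_latin2 = raw.decode("iso8859_2")
as_latin10 = raw.decode("iso8859_16")

print(as_latin2 == as_latin10)   # False
print(hex(ord(as_latin2[6])))    # 0x15f
print(hex(ord(as_latin10[6])))   # 0x219
```

The bytes are identical, but once decoded the two strings no longer compare equal.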

4 comments:

  1. Do you like how the Unicode Character Set handles it? I'm looking for weird situations in as many languages as I can find for my Perl Unicode class at YAPC. :)

  2. not sure what you mean by "Unicode Character Set" when it's so capitalized and formal :)

    In UTF-8, "București" ne "Bucureşti": Unicode encodes ş and ș as different letters, which is a pain, since the shapes are almost the same and most people, including professional linguists, do not distinguish between the two. Even though I have been aware of this issue since it started, yesterday I wasted two hours trying to understand why the two strings were not equal ... I did not see that one was with comma and the other with cedilla.

    The two shapes have been used interchangeably ever since Rumania began using the Latin alphabet (1850s). For example, on the same page the cedilla version was used for italics, capitalized titles or larger regular type, and the comma version was used for smaller type.

  3. First of all the correct way to write Romanian (as standardized in SR 13411:1999) is using ș (U+0219 LATIN SMALL LETTER S WITH COMMA BELOW) and ț (U+021B LATIN SMALL LETTER T WITH COMMA BELOW) and their uppercase variants. This is covered by the "Latin Extended-B Unicode Block" characters.

    The "Latin Extended-A Unicode Block", which contains the S WITH CEDILLA and T WITH CEDILLA wrongly used for Romanian, notes that the S and T with COMMA BELOW are preferred for Romanian (see Unicode Standard Version 6).

    If you look at any book printed before the 1990s, you will see that it uses S and T with COMMA BELOW. It's only since the 1990s that wrong computer standardization and implementation started this mess.

    Please read more about this confusion in Romanian diacritic marks - Kit blog and Secărică's page (in Romanian).

    Secondly, it's 2011: ISO-8859-2 (latin2), ISO-8859-16 (latin10) and any other legacy character sets must die! Everything should be written in or converted to UTF-8.

    I don't know what font you are using, but I haven't had this problem with the DejaVu fonts on GNU/Linux and Windows systems, even at small sizes.

  4. Dear Arpad,

    the "comma" versus "cedilla" problem is a fake one: it's a matter of designing type, not a matter of code points. ş and ș are the same letter.

    I am looking right now at a book published by the Romanian Academy right after WWI, and every ş is set with cedilla, except for the capital Ş, which is set with comma. Even "Monitorul Oficial" has used cedilla and comma for aesthetic effect since it was first published, the same way all the major publishing houses did.

    The printing houses from right before 1989 often used a single typeface in order to save costs.

    I already wrote at length to Secarica about this fucking problem when he was just lobbying for it. I was unable to persuade him, because the reason the "cedilla" version was considered "un-Rumanian" was that it is used by the Turkish version of the Latin alphabet, and for some weird reason that was not regarded as a good thing.

    Anyway, the "no diacritics" version of Rumanian writing is gaining ground and public acceptance. The Rumanian Academy and ICI can decree as much as they want that the "interchange standard for Romanian" is iso-8859-16, or change orthographic rules every 10 years, or swap letters around; what is used in the wild is what makes sense to people who don't have their heads stuffed up their ass.

    This is about ignorant asses going on power trips and changing stuff for bizarre (if not xenophobic or plain racist) reasons all the time, not about which font shows the difference between the "comma" and the "cedilla" better.

    I will waste no more time with this; I'm wasting enough time already normalizing text.

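
The string comparison problem from comment 2 and the normalization chore from comment 4 can be sketched in Python roughly as follows. This is only an illustration, not anyone's actual workflow: `fold_romanian` and `CEDILLA_TO_COMMA` are hypothetical names, and the choice to fold cedilla forms onto comma-below forms is arbitrary. Swap the keys and values to fold the other way.

```python
import unicodedata

# Hypothetical mapping: cedilla code points -> comma-below code points.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015e": "\u0218",  # S with cedilla -> S with comma below
    "\u015f": "\u0219",  # s with cedilla -> s with comma below
    "\u0162": "\u021a",  # T with cedilla -> T with comma below
    "\u0163": "\u021b",  # t with cedilla -> t with comma below
})

def fold_romanian(text: str) -> str:
    """Compose combining marks (NFC), then fold cedilla forms to comma-below forms."""
    return unicodedata.normalize("NFC", text).translate(CEDILLA_TO_COMMA)

with_cedilla = "Bucure\u015fti"
with_comma = "Bucure\u0219ti"

print(with_cedilla == with_comma)                                # False
print(fold_romanian(with_cedilla) == fold_romanian(with_comma))  # True
```

Folding both sides before comparing makes the two spellings equal without having to spot the difference by eye.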