Ruby, Unicode and Tcl

Posted by David N. Welton Sun, 22 Oct 2006 04:49:36 GMT

Apparently Tim Bray gave a talk at RubyConf about "RubyConf: I18n, M17n, Unicode, and all that", which discussed the approaches other languages take. It's a pity that Tcl wasn't mentioned, as it basically gets things right. It is fully Unicode-aware and generally "just works", because all strings are Unicode strings by default: there is no separate syntax, and there are no separate commands, for creating and manipulating multibyte strings and characters.

% set c \u0065
e
% string bytelength $c
1
% string length $c
1

% set c \u2022
•
% string bytelength $c
3
% string length $c
1

The same character-level handling applies to regular expressions, too.
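As a rough sketch of what that means in practice (the sample string and the variable names `s` and `ch` are made up for illustration), a pattern can name a character by its Unicode escape, and `.` consumes one character rather than one byte:

```tcl
# Assumed sample text; the bullet is U+2022, as in the session above.
set s "one \u2022 two"

# A Unicode escape inside the pattern matches the single bullet character.
set found [regexp {\u2022} $s]
puts $found

# "." matches the bullet as one character, not as three separate bytes.
set matched [regexp {^one (.) two$} $s -> ch]
puts $matched
puts [string length $ch]
```

Both `regexp` calls should report a match, and the captured `ch` should have a string length of 1.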

2 comments

  1. Anton Tagunov
    1 day later:

    "because all strings are unicode strings, by default"

    could be

    "because all strings are unicode strings, UTF-8 encoded, by default"

    ?

    (Judging by bytelength)

  2. Michael Schlenker
    1 day later:

    More or less. Tcl uses a variant of UTF-8 internally (no NUL bytes; those are encoded differently to ease the use of C string functions), and UCS-2 in other places.

    It's an internal detail, and 'string bytelength' is more of a debugging tool than a heavily used command. If you use Tcl you don't see the byte length anywhere unless you really want to; usually you just configure the encodings at the boundaries of I/O and never worry again.
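
Configuring the encoding at an I/O boundary, as described in the comment above, might look like the following minimal sketch. The file name `unicode-demo.txt` is hypothetical and used only for illustration; the channel encoding is set once with `fconfigure`, and the rest of the script deals in ordinary strings.

```tcl
# Hypothetical scratch file, used only for this illustration.
set path [file join [pwd] unicode-demo.txt]

# Writing: declare the channel's encoding once; puts takes a normal string.
set out [open $path w]
fconfigure $out -encoding utf-8
puts -nonewline $out "\u2022"
close $out

# On disk the bullet occupies three bytes...
set bytes [file size $path]
puts $bytes

# ...but read back through a UTF-8 channel it is a single character again.
set in [open $path r]
fconfigure $in -encoding utf-8
set text [read $in]
close $in
puts [string length $text]

file delete $path
```

The byte count only shows up when asking the file system; the script itself never handles anything but characters.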