Apparently Tim Bray gave a talk at RubyConf, “RubyConf: I18n, M17n, Unicode, and all that”, which discussed the approaches other languages take. It’s a pity that Tcl wasn’t mentioned, as it basically gets things right. It is fully Unicode-aware and generally “just works”, because all strings are Unicode strings by default: there is no separate syntax, and no separate set of commands, for creating and manipulating multibyte strings and characters.
% set c \u0065
e
% string bytelength $c
1
% string length $c
1
% set c \u2022
•
% string bytelength $c
3
% string length $c
1
That works for regular expressions, too.
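As a quick illustration of the regular expression claim, here is a minimal sketch. The sample string and the variable names (s, hasBullet, ch, chLen) are made up for this example; regexp and string length are the standard Tcl commands.

```tcl
# Build a string containing a non-ASCII character (U+2022, BULLET).
# The sample text is hypothetical, chosen only for illustration.
set s "item \u2022 one"

# A \u escape inside the braced pattern is interpreted by the regexp
# engine itself and matches the same single character.
set hasBullet [regexp {\u2022} $s]

# "." matches one character, not one byte, so the bullet is captured whole.
# "->" is just a throwaway variable name that receives the full match.
regexp {item (.) one} $s -> ch
set chLen [string length $ch]

puts "hasBullet=$hasBullet chLen=$chLen"
```

Run in tclsh, this should print hasBullet=1 chLen=1: the three-byte bullet is treated as one character by the pattern, just as it is by string length.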