Apparently Tim Bray gave a talk at RubyConf about “RubyConf: I18n, M17n, Unicode, and all that”, which discussed the approaches other languages take. It’s a pity that Tcl wasn’t mentioned, as it basically gets things right. It is fully Unicode-aware and generally “just works”, because all strings are Unicode strings by default: there is no separate syntax, and there are no separate commands, for creating and manipulating multibyte strings and characters.
% set c \u0065
e
% string bytelength $c
1
% string length $c
1
% set c \u2022
•
% string bytelength $c
3
% string length $c
1
The same character-based handling works for regular expressions, too.
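As a rough illustration of the regular expression side, here is a small sketch. The variable names and the sample string are made up for the example; it only relies on the standard `regexp` command and on the `\uXXXX` escape that Tcl's regular expression syntax accepts.

```tcl
# "a", BULLET (U+2022), "b": three characters, five bytes in UTF-8
set s "a\u2022b"

# "." consumes one character, not one byte, so the anchored pattern spans the whole string
set whole [regexp {^a.b$} $s]

# -all returns the number of matches; "." matches once per character
set count [regexp -all {.} $s]

# the regexp engine handles the \u2022 escape itself, since the braces stop Tcl substituting it first
set found [regexp {\u2022} $s]

puts "$whole $count $found"   ;# prints: 1 3 1
```

The point is that the pattern never has to care how many bytes the bullet occupies: it is matched as a single character, just like the plain ASCII letters on either side of it.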