[xmlsec] Re: non us-ascii filenames in user locale

xs04.jmdesp at free.fr xs04.jmdesp at free.fr
Tue Jun 29 03:52:55 PDT 2004

Quoting Roumen Petrov <xmlsec at roumenpetrov.info>:
> When application create xmldsig xml file should be in user locale.
> Sample:
> 1.) LANG=fr_FR, charmap ISO-8859-1
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
>   <SignedInfo>
> ......
> In method "xmlSecTransformCtxExecute" from "transforms.c" we know 
> document encoding and url.

Roumen, in the end the problem is: "how should URLs be interpreted when they
have non-ASCII content?"

URLs should not include non-ASCII characters directly; they should use the %XX
escaped form instead.

It is recommended, but not guaranteed, that those %XX escapes represent
content in UTF-8.

The *simple* solution is for the application to decode the %XX encoding and
pass the result directly to the OS API. It is then up to the person who created
the file to set the correct parameters so that it works.
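
As a rough illustration of that simple approach, here is a minimal Python
sketch (not xmlsec code). The function name url_path_to_os_bytes and the
sample file name are made up for the example; unquote_to_bytes comes from the
Python standard library. It only undoes the %XX escapes and makes no guess
about the character set.

```python
from urllib.parse import unquote_to_bytes


def url_path_to_os_bytes(url_path: str) -> bytes:
    """Undo the %XX escapes and return the raw bytes, untouched otherwise."""
    # Hypothetical helper: no charset guessing, the bytes go to the OS as-is.
    return unquote_to_bytes(url_path)


# The same file name escaped from ISO-8859-1 and from UTF-8 gives different bytes.
latin1_bytes = url_path_to_os_bytes("r%E9sum%E9.xml")
utf8_bytes = url_path_to_os_bytes("r%C3%A9sum%C3%A9.xml")
```

On a POSIX system the resulting bytes object can be handed to open() directly,
which is the "send it to the OS API" step; whether the file is found then
depends entirely on how its name was created.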

This is in fact how most browsers handle URLs; I believe Mozilla does nothing else.

Now, can we be smarter? Not by much.

Under Unix, if you have reason to believe the URL is in UTF-8, it would be
better to convert it to the encoding of the locale before trying to open the
file. Fortunately, if the string is valid UTF-8, there's a 99.96% probability
that it really is UTF-8 (the 99.96% figure comes from a test over several
hundred thousand messages on Usenet).
So you can just decode the %XX encoding and try to convert the string to the
local encoding. If that fails, you just use the string as is.
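
The Unix heuristic above could be sketched roughly as follows in Python (again
not xmlsec code). The function name url_path_to_local_bytes and the
local_encoding parameter are assumptions made for the example; a strict UTF-8
decode stands in for the "is it valid UTF-8?" check.

```python
from urllib.parse import unquote_to_bytes


def url_path_to_local_bytes(url_path: str, local_encoding: str) -> bytes:
    """Decode %XX escapes, then convert UTF-8 to the locale encoding if possible."""
    raw = unquote_to_bytes(url_path)
    try:
        # A strict decode raises on any byte sequence that is not valid UTF-8.
        text = raw.decode("utf-8")
        # Raises if the locale encoding cannot represent some character.
        return text.encode(local_encoding)
    except (UnicodeDecodeError, UnicodeEncodeError):
        # Either step failed: use the decoded bytes as is.
        return raw
```

In a real program the local encoding would typically come from something like
locale.getpreferredencoding(False) rather than being passed by hand.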

Under Windows, you can just use the ANSI file API and do the same thing. 

You can try to use the Windows Unicode API, but:
- you'll need to special-case Win9x/Me as described earlier
- if the URL is not UTF-8, you have no idea what it is, so you cannot convert it
to Unicode. On the other hand, if it's neither UTF-8 nor CP_ACP, you don't know
what to do with it anyway, and if the content is not CP_ACP-compatible
it will *not* be possible to open the file under 9x/Me. So you might handle only
those two cases and ignore the possibility that it's something else.
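
The "handle only those two cases" idea from the list above might look roughly
like this in Python (a sketch, not xmlsec code). The function name
url_path_to_unicode is made up, and "cp1252" is only a stand-in for the ANSI
code page (CP_ACP); on Windows, Python's "mbcs" codec would be the closer
equivalent.

```python
from urllib.parse import unquote_to_bytes


def url_path_to_unicode(url_path: str, ansi_codepage: str = "cp1252") -> str:
    """Decode %XX escapes to Unicode, trying UTF-8 first, then the ANSI code page."""
    raw = unquote_to_bytes(url_path)
    try:
        # Case 1: the bytes are valid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Case 2: assume the ANSI code page. Anything else is not handled,
        # so a UnicodeDecodeError here is simply propagated to the caller.
        return raw.decode(ansi_codepage)
```

The resulting string is what would be passed to a Unicode file API; the
Win9x/Me special case mentioned in the list is deliberately left out of this
sketch.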
