[xmlsec] Re: non us-ascii filenames in user locale

Roumen Petrov xmlsec at roumenpetrov.info
Fri Jun 25 02:46:26 PDT 2004


Aleksey Sanin wrote:

> [SNIP]
>
>> Before to xmlSecTransformCtxUriExecute(...) when encoding is not 
>> NULL(is it posible?) or UTF-8 we can convert ctx->url from UTF-8 to 
>> "document encoding", to replace temporary ctx->url with new string 
>> and to call xmlSecTransformCtxUriExecute.
>
> It's a guess. Who said that the document filename is in the document
> locale???


A.) From libxml "Encodings support" page 
(http://www.xmlsoft.org/encoding.html) :
....
for examples when adding a text node to a document, the content would 
have to be provided in the document encoding
....

B.) From rfc2396 (http://www.ietf.org/rfc/rfc2396.txt):
....
  However, there is currently
   no provision within the generic URI syntax to accomplish this
   identification. An individual URI scheme may require a single
   charset, define a default charset, or provide a way to indicate the
   charset used.

   It is expected that a systematic treatment of character encoding
   within URI will be developed as a future modification of this
   specification.
....

C.) From "XML-Signature Syntax and Processing " 
(http://www.w3.org/TR/xmldsig-core/)
....
4.3.3.1 The URI Attribute ...
The URI attribute identifies a data object using a URI-Reference, as 
specified by RFC2396 [URI]. The set of allowed characters for URI 
attributes is the same as for XML, namely [Unicode]. However, some 
Unicode characters are disallowed from URI references including all 
non-ASCII characters and the excluded characters listed in RFC2396 [URI, 
section 2.4]. However, the number sign (#), percent sign (%), and square 
bracket characters re-allowed in RFC 2732 [URI-Literal] are permitted. 
Disallowed characters must be escaped as follows:

Each disallowed character is converted to [UTF-8] as one or more octets.
Any octets corresponding to a disallowed character are escaped with the 
URI escaping mechanism (that is, converted to %HH, where HH is the 
hexadecimal notation of the octet value).
The original character is replaced by the resulting character sequence.
....
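
To make the escaping rule quoted in C concrete, here is a minimal sketch in
Python (not xmlsec code). The function name escape_uri and the ALLOWED set
are my own assumptions: the set is my reading of RFC 2396 (unreserved and
reserved characters) plus '#', '%', '[' and ']', and may need adjusting.

```python
import string

# Characters left unescaped. This set is an assumption: RFC 2396 unreserved
# and reserved characters, plus '#', '%', '[' and ']' as permitted by XMLDSig.
ALLOWED = set(
    string.ascii_letters + string.digits
    + "-_.!~*'()"      # RFC 2396 unreserved marks
    + ";/?:@&=+$,"     # RFC 2396 reserved characters
    + "#%[]"           # explicitly permitted by XMLDSig 4.3.3.1
)

def escape_uri(uri):
    """Escape disallowed characters as %HH over their UTF-8 octets."""
    out = []
    for ch in uri:
        if ch in ALLOWED:
            out.append(ch)
        else:
            # Convert the character to UTF-8 octets, then escape each octet.
            out.extend("%{:02X}".format(octet) for octet in ch.encode("utf-8"))
    return "".join(out)

print(escape_uri("файл.xml"))  # %D1%84%D0%B0%D0%B9%D0%BB.xml
```

Note that the result does not depend on the document encoding at all: the
escaped form is always built from UTF-8 octets.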



From A, I expect the URI in a Reference node to be in the document encoding.
From B, I see that we are free to use any charset in a URI.
C defines that we should use the UTF-8 encoding.


If the document encoding is not acceptable as the default charset for 
"Reference URIs", perhaps we should provide a way in xmlsec "to indicate 
the charset used"?



For me the solution is clear. I will create the xmldsig document with the 
same encoding as the user locale charmap; the filename (URI) will be 
converted from the locale charmap to UTF-8 and escaped. Later I will 
convert the UTF-8 URI back to the charset specified as the xmldsig 
document encoding.
When I want to use a UTF-8 URI, I will create the xmldsig document in 
UTF-8 encoding.
When I want to use a URI in ISO-8859-1 or CP1251, I will create the 
xmldsig document in the corresponding encoding.
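
The round trip described above could be sketched roughly as follows. This is
only an illustration in Python, not xmlsec code: the helper names
filename_to_uri and uri_to_filename are hypothetical, and CP1251 stands in
for both the user locale charmap and the document encoding.

```python
from urllib.parse import quote, unquote_to_bytes

def filename_to_uri(raw_name, charmap):
    """Locale-encoded filename bytes -> UTF-8 based, percent-escaped URI."""
    text = raw_name.decode(charmap)              # locale charmap -> Unicode
    return quote(text, safe="/", encoding="utf-8")

def uri_to_filename(uri, doc_encoding):
    """Percent-escaped URI -> filename bytes in the document encoding."""
    utf8_octets = unquote_to_bytes(uri)          # undo the %HH escaping
    return utf8_octets.decode("utf-8").encode(doc_encoding)

# Assumed example: a Cyrillic filename as stored under a CP1251 locale.
raw = "файл.xml".encode("cp1251")
uri = filename_to_uri(raw, "cp1251")
print(uri)                                       # %D1%84%D0%B0%D0%B9%D0%BB.xml
print(uri_to_filename(uri, "cp1251") == raw)     # True
```

The second step raises an error if the URI contains a character that the
document encoding cannot represent, which is the case where the document
would have to be created in UTF-8 instead.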


Regards,
Roumen Petrov



