[xmlsec] non us-ascii filenames in user locale
igor at zlatkovic.com
Wed Jun 23 07:31:22 PDT 2004
On 22.06.2004 21:50, Aleksey Sanin wrote:
> > Oh, you need also an easy autoreply for all these "but it works on NT"
> > posts which follow :-)
> Well, the interesting thing is that it might not be that bad. We have
> this problem, because NT allows one to have Unicode filenames. Thus,
> you can have Russian filename on Windows NT box with German default
> locale. AFAIK, this does not work on Win9x and the Russian filename
> would be corrupted (i.e. it will have all these '?' characters) if
> the default locale is not Russian.
Win9x and NT don't use the same filesystem. A file name on NTFS will
always be displayed correctly if you have a font with the required
glyphs. A file name on FATxx will always be guesswork, because its
interpretation depends on the current locale.
This is not a platform oddity to be accepted and worked around.
Rather, it is a simple fact that Win9x isn't fit for computing.
> A long time ago when I was doing client programming, the full solution
> for this problem was pretty complex. The interesting fact is that while
> _wfopen() function is not implemented on Win9x, the stub for this
> function is still present in the MSVC runtime DLL. This means that you can
> use this function in your program and the program will be loaded
> correctly (the function will be found in the DLL!) but when you try to
> call it, you'll get an error back. Thus the solution was the following:
> 1) Function XYZ() always accepts UTF8 string for the filename.
> 2) In runtime, function XYZ() determines the operating system and
> - if it is WinNT then filename is converted from UTF8 to UCS2 and
> _wfopen() function is used
> - if it is Win9x then filename is converted from UTF8 to current
> locale, fopen() function is used and programmer crosses his/her
> fingers that user never has a filename in different locale.
> Not sure if this is overkill for LibXML2 or not but this is the best
> solution for this problem I know.
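The two-branch scheme quoted above can be sketched in C roughly as follows. This is a minimal sketch, not libxml2 or xmlsec code: the helper name `open_utf8` is hypothetical, the Windows branch assumes the Win32 calls `MultiByteToWideChar`, `WideCharToMultiByte` and `GetVersion` plus the MSVC runtime's `_wfopen`, and it assumes `CP_UTF8` conversion is available, which may not hold on every Win9x release. The non-Windows branch assumes the file system accepts UTF-8 names as raw bytes.

```c
#include <stdio.h>
#include <stdlib.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: open a file whose name is given in UTF-8. */
FILE *open_utf8(const char *utf8_name, const char *mode)
{
#ifdef _WIN32
    /* Step 1: UTF-8 -> UTF-16 (assumes CP_UTF8 is supported). */
    int wlen = MultiByteToWideChar(CP_UTF8, 0, utf8_name, -1, NULL, 0);
    if (wlen == 0)
        return NULL;                      /* invalid UTF-8 or other error */
    wchar_t *wname = malloc(wlen * sizeof *wname);
    if (wname == NULL)
        return NULL;
    MultiByteToWideChar(CP_UTF8, 0, utf8_name, -1, wname, wlen);

    FILE *fp = NULL;
    if ((GetVersion() & 0x80000000u) == 0) {
        /* NT family: Unicode file names, so use _wfopen(). */
        wchar_t wmode[8];
        size_t i;
        for (i = 0; mode[i] != '\0' && i < 7; i++)
            wmode[i] = (wchar_t)(unsigned char)mode[i]; /* mode is ASCII */
        wmode[i] = L'\0';
        fp = _wfopen(wname, wmode);
    } else {
        /* Win9x/Me: convert to the ANSI code page and use fopen();
         * characters outside that code page are replaced, so the
         * open fails unless the name fits the current locale. */
        int alen = WideCharToMultiByte(CP_ACP, 0, wname, -1, NULL, 0, NULL, NULL);
        if (alen > 0) {
            char *aname = malloc(alen);
            if (aname != NULL) {
                WideCharToMultiByte(CP_ACP, 0, wname, -1, aname, alen, NULL, NULL);
                fp = fopen(aname, mode);
                free(aname);
            }
        }
    }
    free(wname);
    return fp;
#else
    /* Assumption: other platforms take the UTF-8 bytes as-is. */
    return fopen(utf8_name, mode);
#endif
}
```

Usage would be something like `open_utf8("\xd1\x82\xd0\xb5\xd1\x81\xd1\x82.txt", "rb")` for a Russian file name. The version check runs at call time, which matches the point about `_wfopen()` above: the symbol resolves on Win9x, so the program has to avoid calling it rather than avoid linking it.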
Since we deal with UTF-8 in libxml and NTFS deals with UTF-16, such
a conversion will probably not be avoidable in the long term. The
conversion is guaranteed to succeed without data loss, so fortunately
that is all there is to it.
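Why the UTF-8 to UTF-16 step loses nothing can be seen in a small portable decoder. The following is a minimal sketch, not the libxml implementation, and `utf8_to_utf16` is a hypothetical name. Every well-formed UTF-8 sequence encodes a scalar value up to U+10FFFF, and each such value becomes either one UTF-16 unit or a surrogate pair. The guarantee only covers well-formed input, so the sketch rejects overlong forms, encoded surrogates and truncated sequences instead of guessing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode NUL-terminated UTF-8 into UTF-16 code units.
 * Returns the number of units written (excluding the terminator), or -1 on
 * malformed input or when out_cap is too small. */
long utf8_to_utf16(const char *src, uint16_t *out, size_t out_cap)
{
    static const uint32_t min_cp[4] = {0, 0x80, 0x800, 0x10000};
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    while (*s != 0) {
        unsigned char c = *s++;
        uint32_t cp;
        int extra;

        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else return -1;              /* stray continuation or invalid lead byte */

        for (int i = 0; i < extra; i++) {
            if ((*s & 0xC0) != 0x80)
                return -1;           /* truncated sequence (also stops at NUL) */
            cp = (cp << 6) | (uint32_t)(*s++ & 0x3F);
        }

        /* Reject overlong forms, UTF-16 surrogates and values past U+10FFFF. */
        if (cp < min_cp[extra] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return -1;

        if (cp < 0x10000) {
            if (n + 1 >= out_cap)    /* keep room for the terminator */
                return -1;
            out[n++] = (uint16_t)cp;
        } else {
            if (n + 2 >= out_cap)
                return -1;
            cp -= 0x10000;           /* split into a surrogate pair */
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    if (n >= out_cap)
        return -1;
    out[n] = 0;
    return (long)n;
}
```

For example, U+1F600 (UTF-8 `F0 9F 98 80`) becomes the pair `D83D DE00`, and decoding the pair back gives the same code point. The reverse direction is lossless for the same reason, which is what makes UTF-8 a workable internal format for NTFS names.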
As for Win9x: Microsoft doesn't support these systems any longer, so
why should we? Most such computers connected to the net are probably
not doing more than spreading infected mail.