Comical / Feature Requests / #14 Support for Unicode

Nobody/Anonymous - 2006-03-26

Logged In: NO

I can confirm this. It works for some non-ascii chars but if
I put a Japanese character in the filename comical refuses
to open it. And if I put a Japanese character in one of the
image filenames it crashes with this bt:

Program received signal SIGABRT, Aborted.
[Switching to Thread 1083197792 (LWP 2490)]
0x00002aaaac09913d in raise () from /lib/libc.so.6
(gdb) bt
#0  0x00002aaaac09913d in raise () from /lib/libc.so.6
#1  0x00002aaaac09a86e in abort () from /lib/libc.so.6
#2  0x00002aaaabb7f7d7 in __gnu_cxx::__verbose_terminate_handler ()
    from /usr/lib/libstdc++.so.6
#3  0x00002aaaabb7d866 in __gxx_personality_v0 () from /usr/lib/libstdc++.so.6
#4  0x00002aaaabb7d893 in std::terminate () from /usr/lib/libstdc++.so.6
#5  0x00002aaaabb7d97a in __cxa_throw () from /usr/lib/libstdc++.so.6
#6  0x00000000004350ad in ComicBookZIP::ExtractStream ()
#7  0x000000000042f276 in ComicBook::Entry ()
#8  0x00002aaaab94a942 in wxThreadInternal::PthreadStart ()
    from /usr/lib/libwx_baseu-2.6.so.0
#9  0x00002aaaabf5b12a in start_thread () from /lib/libpthread.so.0
#10 0x00002aaaac1313c3 in clone () from /lib/libc.so.6
#11 0x0000000000000000 in ?? ()

I can look into making a patch for it if I have time.


Nobody/Anonymous - 2006-03-27

Logged In: NO

This is exactly the kind of bug report I asked for a few
months back when I switched to Unicode here. Can you point
me toward some comic books which Comical objects to?


Nobody/Anonymous - 2006-03-27

Logged In: NO

Not really. I didn't have any comic books with those types
of characters so I just added some of the characters into
them myself, both in the images inside and the filename of
the comic. For example, I added unicode char U+BB82 to the
files. On further testing, if the filename of the comic
itself contains unicode it works fine for rar archives but
it refuses to open with unzip, implying a limitation with
the unzip library you're using. Forget what I said earlier
about it working for "some" non-ascii chars; zips don't work
for any non-ascii. At least this fails gracefully and
outputs an error and continues. I see in ComicBookZip.cpp
that you assume the filename is ascii and convert to that
(this again might be because of the unzip limitation).

For the filenames of the images inside an archive, if they
contain unicode then it crashes with the backtrace I posted earlier. I
could only test zip files for this since there is no linux
support for creating rar archives (f**kin rarlabs).
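
To illustrate the crash path: frames 5 and 6 of the backtrace show an exception thrown from ComicBookZIP::ExtractStream () going straight to std::terminate (), which is what happens when nothing on the worker thread catches it. Below is a minimal sketch of catching it at the thread's entry point instead, so that a bad entry name becomes an error message rather than an abort. extractPage and runGuarded are hypothetical names, not Comical's code, and the non-ASCII check only simulates the failure.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the extraction step. It throws for any
// non-ASCII byte purely to simulate the failure seen in the backtrace.
void extractPage(const std::string& entryName) {
    for (unsigned char c : entryName) {
        if (c > 0x7F) {
            throw std::runtime_error("cannot open archive entry: " + entryName);
        }
    }
}

// Runs a thread body and turns any escaping exception into a message.
// Returns an empty string on success.
std::string runGuarded(const std::function<void()>& body) {
    try {
        body();
        return std::string();
    } catch (const std::exception& e) {
        return e.what();
    } catch (...) {
        return "unknown error";
    }
}
```

In a wxThread subclass, the work done in Entry() is what would be passed to runGuarded, and the returned message could presumably be reported the same way the archive-filename error already is.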

Also, can you please add a debugging mode to the makefile?
CXXFLAGS = '-g -ggdb -D_DEBUG' and LDFLAGS = '-g -ggdb'
would need to be added. This would allow me to see in which
line it actually failed in the backtrace.

-Steven Sheehy


Nobody/Anonymous - 2006-04-18

Logged In: NO

I made a patch to fix the unicode problems with Comical. It
fixes all problems I mentioned earlier (at least on linux
amd64...needs testing elsewhere). I used mb_str(wxConvLocal)
to convert from unicode to the system's ANSI code page. I
also went ahead and rewrote the unicode stuff in
ComicBookRar to not have to use #ifdef wxUSE_UNICODE
everywhere. I've tested with and without unicode support
compiled into Comical and it works properly for both. It
works for both RAR and ZIPs with non-ASCII chars in their
filename and it works for non-ASCII chars for the images
inside the archive (not able to test the latter for RAR
archives since I'm on Linux).

I also rewrote setPassword to convert the inputted password
to ANSI code page before sending to the rar and zip
libraries. This part is untested since I don't have any
password protected archives. Patch is here:

http://www.utdallas.edu/~sas014510/unicode.patch

This patch was made with svn rev 157 since current versions
don't work.
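
For readers without wxWidgets at hand, the conversion step described above (wide-character unicode to the system's local encoding) can be sketched with the standard C library's wcstombs(). This is only an analogue of what mb_str(wxConvLocal) is described as doing, not the code from the patch, and toLocaleBytes is a made-up name. It converts using the current C locale, so a real program would normally call setlocale(LC_ALL, "") first.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Converts a wide string to the multibyte encoding of the current C locale.
// Returns false if some character cannot be represented in that encoding.
bool toLocaleBytes(const std::wstring& wide, std::string& out) {
    // MB_CUR_MAX is the longest byte sequence one character can need
    // in the current locale; +1 leaves room for the terminator.
    std::vector<char> buffer(wide.size() * MB_CUR_MAX + 1);
    const std::size_t written =
        std::wcstombs(buffer.data(), wide.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1)) {
        return false;  // unrepresentable character in this locale
    }
    out.assign(buffer.data(), written);
    return true;
}
```

If the local encoding cannot represent a character, the function returns false, which is the case a caller has to handle before handing the name to the zip or rar library.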

Steven Sheehy (steven[d0t]sheehy[@t]gmail[d0t]com)

Dennis Lim - 2006-04-24

Logged In: YES
user_id=117202

Latest version 0.8 of comical is using the minizip libraries.
Unfortunately this requires a filename.toAscii() call while
opening the file. This is moving towards less unicode
support, not more.
I'm not sure why the wxZipInputStream was abandoned. At
least that had support for unicode on the filenames. We can
work out the other problems later.
I'm a programmer and I don't mind helping to fix bugs but so
far I've not encountered any other unicode related problems
with the comics I read.


Nobody/Anonymous - 2006-04-24

Logged In: NO

Um...did you not notice my post right before yours? I fixed
the problem with unicode, including zips. The minizip does
not require ascii, it just requires that the filename passed
to it is encoded in the system's ANSI codepage since it just
calls the C library's fopen(), thus they can't be wide
character strings. Why not try out the patch I posted and
then comment?

-Steven S.


Dennis Lim - 2006-04-25

Logged In: YES
user_id=117202

Thanks for the patch. I've tested your patch on v0.8 on windows.
The problem I originally faced was with a (c) copyright
symbol on the path. The patch has fixed this.
Further testing with some chinese string I copied somewhere
didn't work. The test string was
"韩剧热线诚聘[韩、英语翻译].cbz". Testing with comical v0.7
worked fine.

I guess I was just venting frustration at having something
working get broken. The move to minizip made it more trouble
to compile on windows. I had to download some DLL from here
and header files from there, etc. I was actually hoping to
move towards wxZipInputStream and perhaps even create a
wrapper to wxRarInputStream to solve this. Oh well.

Anyway, I'll be using your patch for now since it solves my
particular problem. However it probably is not a long term
solution since it does not handle all cases. If you could
get it to work with the string I pasted above please let us
know. Thanks.


Nobody/Anonymous - 2006-04-25

Logged In: NO

Thanks for testing. I knew it worked for me but it's good to
have other people confirm it. Can you also test unicode in
the files inside the archive?

I tested that string and it worked perfectly for me. I'm not
sure why it failed for you. By fails, do you mean it crashes
or outputs an error or what? Can you put some printfs to see
if it's passed to minizip fine?

BTW, it may be that your cbz file is actually a cbr file.
When this happens, comical refuses to open it (another bug
not related to unicode). Try renaming it to cbr and see if
it works.

-Steven S.


Dennis Lim - 2006-04-27

Logged In: YES
user_id=117202

Hmm.. the email sent out by sourceforge seems to have
mangled the filename. I can not see the chinese chars even
though I have 'asian fonts' installed on XP. (I don't know
how these things work on linux)
However, from the web page, the filename looks fine. i.e. I
can still see the chinese characters.
Perhaps if you used the email to test, it was mangled. try
to cut and paste from the web. You should be able to
actually see some chinese characters before you test. Also,
perhaps the behavior is different on linux and XP. I'm
running on windowsXP.
It fails inside minizip. We get a null from the unzOpenFile
or something.

The problem may not be a big deal. I tested with winzip and
it doesn't seem to want to open it either. However,
comical 0.7 does open it fine. I guess it depends on what
tools were used to create these files.

At this point I would consider your patch an improvement
over the existing 0.8 code base. Have you considered
submitting in the patch section?


Nobody/Anonymous - 2006-04-27

Logged In: NO

As you can see, I'm not registered so I don't get
sourceforge email. I used the chinese characters you posted
on the web. On linux, we don't have to install "asian fonts"
since we have proper unicode support built right in. That's
the beauty of unicode, all the necessary characters are in
one encoding...don't know why ms makes you install extra
stuff to do that. If I can't reproduce your problem, then I
can't really fix it. So feel free to have a look at it if you
can. I don't think minizip is the problem since it works for
me just fine on linux. You may just want to create a simple
test file that uses fopen() passed with a utf8 filename to
see if it works on windows.
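
A minimal sketch of the fopen() test suggested above, assuming the comic already exists on disk under a name containing 韩 (U+97E9). The name is spelled out as raw UTF-8 bytes so the result does not depend on the encoding of the source file. canOpenForReading and kUtf8Name are illustrative names only.

```cpp
#include <cstdio>
#include <string>

// "韩.cbz" written as explicit UTF-8 bytes (U+97E9 is E9 9F A9).
const char kUtf8Name[] = "\xE9\x9F\xA9.cbz";

// Returns true if the narrow-character fopen() can open the file
// when handed the name as a plain byte string.
bool canOpenForReading(const std::string& path) {
    std::FILE* file = std::fopen(path.c_str(), "rb");
    if (file == nullptr) {
        return false;
    }
    std::fclose(file);
    return true;
}
```

On a system whose code page is utf8 this should succeed. If it fails on windows while the file is visibly there, that would suggest the narrow fopen() is interpreting the bytes in some other code page.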

btw, can you verify if the bug I submitted on the bug page
happens to you?


Dennis Lim - 2006-04-28

Logged In: YES
user_id=117202

I think that what happens in windows is that the unicode
support is there in the applications. However, if you try
to view a chinese text, it will just output some funny
square characters instead. This is due to lack of font. The
fonts are not installed by default because unicode fonts
can be rather big.

I do like unicode and it would be great if the minizip
interfaces were all using that. Unfortunately, they use
char*.

At this point, your patch fixes my problem with
the 'copyright' symbol. I do not read manga yet so I'm not
personally facing any problems with that. I'm more
interested to see if the original poster has any sample
strings or sample files that I can test with.
The string I posted earlier is interesting in that although
comical 0.7 can open it, winzip cannot. Therefore, it may
not be a fair test.

So I'm just going to use your patch and leave it at that
unless somebody is facing a problem and can submit a
sample file that does not work. (i.e. actual situation, not
some contrived test). I don't want to be spending time
fixing something which nobody needs. Who knows, I may
revisit this later and do something about it.

BTW, about the bug, I tested and posted in the bug section.


Nobody/Anonymous - 2006-04-28

Logged In: NO

I think you're confused. It is completely acceptable that
minizip uses char for its strings. The other choice would
be wchar_t (ie wxString in unicode mode). char uses 8-bit
characters, whereas wchar_t is 32 bits on linux (and 16 bits
on windows). Unicode comes in many shapes and sizes: 7, 8, 16
and 32 bit encodings are all possible. So to encode unicode
into 8 bits, one would just use utf8. However, not all input
is necessarily unicode. It could be ISO-8859-1, CP1252, etc.
depending on the operating system's ANSI code page. So in
wxwidgets, the sequence of encodings for a system that is
CP1252, for example, would be CP1252 (from file being opened)
-> wide character unicode (ISO 10646 stored in wxString) ->
CP1252 (using mb_str() in my patch). Perhaps your system's
codepage is not an 8-bit encoding and that is why it's not
working. The copyright symbol still fits in 8 bits in CP1252
(but is above ASCII) so that's probably why that worked. Maybe
you need to force an 8-bit encoding being passed off to
minizip. Try using wxConvUTF8 instead of wxConvLocal and see
if that works. Make sure you replace all of them if you do.

I tried the above and it worked for me, but my system's code
page is utf8 so that is to be expected.
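
To make the byte counts concrete, here is a small sketch of how utf8 packs a code point into 8-bit units. encodeUtf8 is an illustrative helper, not part of wxWidgets or the patch, and it does not reject surrogates or values above U+10FFFF.

```cpp
#include <cstdint>
#include <string>

// Encodes one Unicode code point as a sequence of UTF-8 bytes.
std::string encodeUtf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {                       // 7 bits: plain ASCII, 1 byte
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // up to 11 bits: 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // up to 16 bits: 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // up to 21 bits: 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

The copyright sign U+00A9 is the single byte 0xA9 in CP1252 and ISO-8859-1 but becomes the two bytes C2 A9 in utf8, while a Chinese character such as U+97E9 takes three bytes (E9 9F A9) and has no CP1252 byte at all.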


Dennis Lim - 2006-05-02

Logged In: YES
user_id=117202

You're right. I got confused. I forgot unicode can be
encoded in UTF8. It is usually the case that unicode is
encoded in wchar_t.

I'm not too familiar with what my OS native encoding is. I
believe that Windows has 2 sets of API, one for the ANSI code
page and one for unicode, indicated by a suffix A or W. This
is usually handled at compile time depending on the unicode
define. (and the unicode version usually uses wchar_t)

Anyway, I tested with wxConvUTF8 and it didn't work either.
I believe that looking at the bytestream, the OS cannot
automatically determine UTF16 or UTF8, therefore this
determination is by convention or API documentation. Also,
usually windows is using UTF16 for its API so that's why
it's not working.
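
One way around the A/W split is to keep filenames as utf8 inside the program and convert to UTF16 only at the point where the file is opened. This is a sketch under those assumptions: openUtf8Path and widenUtf8 are hypothetical helpers, and the windows branch is untested here. MultiByteToWideChar and _wfopen are the Win32 and Microsoft C runtime calls for the conversion and the wide-character open.

```cpp
#include <cstdio>
#include <cwchar>
#include <string>
#ifdef _WIN32
#include <windows.h>
#include <vector>

// Converts UTF-8 bytes to the UTF-16 wide string the W functions expect.
static std::wstring widenUtf8(const std::string& utf8) {
    // Passing -1 as the length makes the count include the terminator.
    const int needed =
        MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), -1, nullptr, 0);
    if (needed <= 0) {
        return std::wstring();
    }
    std::vector<wchar_t> buffer(static_cast<std::size_t>(needed));
    MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), -1, buffer.data(), needed);
    return std::wstring(buffer.data());
}
#endif

// Opens a file whose name is given as UTF-8 bytes on every platform.
std::FILE* openUtf8Path(const std::string& utf8Path, const std::string& mode) {
#ifdef _WIN32
    // Bypass the ANSI code page by calling the wide-character variant.
    return _wfopen(widenUtf8(utf8Path).c_str(), widenUtf8(mode).c_str());
#else
    // Elsewhere the bytes are handed to fopen() unchanged.
    return std::fopen(utf8Path.c_str(), mode.c_str());
#endif
}
```

This does not by itself fix minizip, which opens the file itself from a char* name. It only shows the conversion that would have to happen somewhere before the wide API is reached.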


Hito - 2006-06-21

Logged In: YES
user_id=880396

I do not think winzip (or winrar) has proper unicode
support. The only program I know that does it nicely is
7-zip (and that's why I use it).

I, too, am looking for an image viewing program with unicode
support. If this program becomes one, it would be great :)


