
#4 Today's CVS: segmentation rules?

open-accepted
nobody
None
5
2014-03-29
2007-01-26
No

I have 2 straightforward files already aligned:

1610 lines, one file in JA, one in EN

The files are taken from a user manual, with chapter headings etc.

I spent the whole day manually aligning the files and I just need them to be exported to TMX.

So I open them one by one in b2t and set the language and encoding, but when b2t loads them I get a totally unexpected display: the Japanese side is much shorter than expected, and the full display is only 1400 lines or so.

So I go to tools, set segmentation to line breaks and load the files again.

Here again I get weird segmentation. For example, the chapter headings come out split: the number on one line, the chapter name on the next.

I can't find any way to set segmentation rules, or rather unset them. So the situation is a little frustrating...
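For reference, the behaviour expected here (one segment per line, no further splitting, then export to TMX) can be sketched outside the tool. This is a minimal Python sketch, not B2T code: the function names `segment_by_line` and `build_tmx` and the `creationtool` value are hypothetical, and it assumes both files are already aligned line for line.

```python
import xml.etree.ElementTree as ET

# Qualified name for the xml:lang attribute used on <tuv> elements.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def segment_by_line(text):
    """Split on line breaks only; never split inside a line."""
    return text.splitlines()


def build_tmx(src_lines, tgt_lines, src_lang="ja", tgt_lang="en"):
    """Pair line i of the source with line i of the target as one <tu>."""
    if len(src_lines) != len(tgt_lines):
        raise ValueError(
            "files are not aligned: %d vs %d lines"
            % (len(src_lines), len(tgt_lines))
        )
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "line-align-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "paragraph",
        "o-tmf": "plain-text",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in zip(src_lines, tgt_lines):
        tu = ET.SubElement(body, "tu")
        for lang, seg_text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = seg_text
    return tmx


if __name__ == "__main__":
    ja = segment_by_line("1. はじめに\n本文。\n")
    en = segment_by_line("1. Introduction\nBody text.\n")
    print(ET.tostring(build_tmx(ja, en), encoding="unicode"))
```

With this kind of line-only segmentation, a chapter heading such as "1. Introduction" stays in a single segment, and two 1610-line files would yield 1610 translation units.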

Discussion

  • Raymond Martin

    Raymond Martin - 2008-02-15

    Logged In: YES
    user_id=1111672
    Originator: NO

    This situation will be looked into. The present segmentation is based upon very simple rules (i.e. a simple regexp string) and probably cannot handle Japanese to any reasonable extent (although I haven't tested with Japanese text). Better segmentation will be implemented as B2T moves to version 1.0.

    Raymond
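To illustrate how a simple regexp rule could produce both reported symptoms, here is a small Python sketch. The actual B2T regexp is not shown in this ticket, so `SIMPLE_RULE` below is an assumed pattern chosen only for illustration, and `CJK_AWARE_RULE` is one possible variant, not the planned implementation.

```python
import re

# Hypothetical "simple" rule: split after ., ! or ? when followed by
# whitespace. Assumed for illustration; not B2T's actual regexp.
SIMPLE_RULE = re.compile(r"(?<=[.!?])\s+")

# Variant that also splits after full-width Japanese terminators, which
# are normally not followed by a space. The second alternative is a
# zero-width match, so this needs Python 3.7+ for re.split to honour it.
CJK_AWARE_RULE = re.compile(r"(?<=[.!?])\s+|(?<=[。！？])")


def segment(text, rule):
    """Split text with the given rule and drop empty segments."""
    return [s for s in rule.split(text) if s]


if __name__ == "__main__":
    # A chapter heading is split after the number.
    print(segment("1. Introduction", SIMPLE_RULE))
    # Japanese sentences are not split at all by the simple rule.
    print(segment("最初の文。次の文。", SIMPLE_RULE))
    # The CJK-aware variant separates them.
    print(segment("最初の文。次の文。", CJK_AWARE_RULE))
```

Under the assumed simple rule, "1. Introduction" comes out as two segments while the two Japanese sentences stay merged into one, which would be consistent with split chapter headings and a shorter Japanese side.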

     
  • Raymond Martin

    Raymond Martin - 2008-02-15
    • status: open --> open-accepted
     