This HTML text is in UTF-8.

The fixed USF file, USFV100_sample-Arabic-fixed.usf [ CRC: 932092AC ] is here:
http://www.faireal.net/matroska/USFV100_sample-Arabic-fixed.usf

Explanation for Arabic Fix for USFV100_sample.usf
2003-03-01

The first letter, LAM ل (U+0644), is D9 84 in UTF-8, and the last character, the Arabic question mark ؟ (U+061F), is D8 9F.

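Arabic text in UTF-8 is stored in logical (reading) order, first letter first; the renderer, not the file, takes care of the right-to-left display. Therefore, in the file USFV100_sample.usf, the bytes [ D89F D8A8 ... D985 D984 ] at 001E5E-001EA9 should be [ D984 D985 ... D8A8 D89F ] to be a standard Arabic stream in UTF-8.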
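
The byte values quoted above can be checked with a few lines of Python. This is only a small sketch; the helper name utf8_hex is made up for the illustration, and the characters are written as escapes so that the listing itself is not reordered by a bidi-aware editor.

```python
# Characters named in the text, written as escapes.
LAM = "\u0644"            # ARABIC LETTER LAM
QUESTION_MARK = "\u061F"  # ARABIC QUESTION MARK


def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of ch as uppercase hex digits."""
    return ch.encode("utf-8").hex().upper()


print(utf8_hex(LAM))            # D984
print(utf8_hex(QUESTION_MARK))  # D89F

# Number of bytes in the inclusive range 001E5E-001EA9.
print(0x1EA9 - 0x1E5E + 1)      # 76
```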

Because this Arabic sample also contains left-to-right sections (i.e. the tags <b> and </b>), you will see something like this in your editor:

[PNG image: what you will see in your editor]

In Binary

Original

 ADDRESS   00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
----------------------------------------------------------
 00001E50                                            D8 9F
 00001E60  D8 A8 D8 B3 D8 AD D9 81 20 3C 62 3E D8 A9 D9 8A
 00001E70  D8 A8 D8 B1 D8 B9 D9 84 D8 A7 3C 2F 62 3E 20 D8
 00001E80  A9 D8 BA D9 91 D9 84 D9 84 D8 A7 20 D9 86 D9 88
 00001E90  D9 85 D9 84 D9 83 D8 AA D9 8A 20 D8 A7 D9 84 20
 00001EA0  D8 A7 D8 B0 D8 A7 D9 85 D9 84

Fixed

 ADDRESS   00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
----------------------------------------------------------
 00001E50                                            D9 84
 00001E60  D9 85 D8 A7 D8 B0 D8 A7 20 D9 84 D8 A7 20 D9 8A
 00001E70  D8 AA D9 83 D9 84 D9 85 D9 88 D9 86 20 D8 A7 D9
 00001E80  84 D9 84 D9 91 D8 BA D8 A9 20 3C 62 3E D8 A7 D9
 00001E90  84 D8 B9 D8 B1 D8 A8 D9 8A D8 A9 3C 2F 62 3E 20
 00001EA0  D9 81 D8 AD D8 B3 D8 A8 D8 9F                  
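
The relation between the two dumps can be sketched in Python. The following is a minimal illustration, not the method actually used to produce the fixed file: it assumes that every text run between tags is purely right-to-left, reverses the characters of each run, reverses the order of the runs, and leaves the tags in their original slots. The function name visual_to_logical is hypothetical. Text that mixes Arabic with digits or Latin words would need a real implementation of the bidirectional algorithm instead.

```python
import re

# Bytes 001E5E-001EA9, copied from the two dumps above.
ORIGINAL = bytes.fromhex(
    "D8 9F"
    " D8 A8 D8 B3 D8 AD D9 81 20 3C 62 3E D8 A9 D9 8A"
    " D8 A8 D8 B1 D8 B9 D9 84 D8 A7 3C 2F 62 3E 20 D8"
    " A9 D8 BA D9 91 D9 84 D9 84 D8 A7 20 D9 86 D9 88"
    " D9 85 D9 84 D9 83 D8 AA D9 8A 20 D8 A7 D9 84 20"
    " D8 A7 D8 B0 D8 A7 D9 85 D9 84"
)
FIXED = bytes.fromhex(
    "D9 84"
    " D9 85 D8 A7 D8 B0 D8 A7 20 D9 84 D8 A7 20 D9 8A"
    " D8 AA D9 83 D9 84 D9 85 D9 88 D9 86 20 D8 A7 D9"
    " 84 D9 84 D9 91 D8 BA D8 A9 20 3C 62 3E D8 A7 D9"
    " 84 D8 B9 D8 B1 D8 A8 D9 8A D8 A9 3C 2F 62 3E 20"
    " D9 81 D8 AD D8 B3 D8 A8 D8 9F"
)

# A capturing group makes re.split keep the tags in its result.
TAG = re.compile(r"(<[^>]*>)")


def visual_to_logical(raw: bytes) -> bytes:
    """Turn a visually ordered, purely right-to-left line into logical order."""
    parts = TAG.split(raw.decode("utf-8"))
    # Even indexes hold text runs, odd indexes hold tags.
    runs = [run[::-1] for run in parts[0::2]][::-1]
    tags = parts[1::2]
    out = []
    for i, run in enumerate(runs):
        out.append(run)
        if i < len(tags):
            out.append(tags[i])  # tags keep their original order
    return "".join(out).encode("utf-8")


# The sketch reproduces the fixed dump byte for byte.
assert visual_to_logical(ORIGINAL) == FIXED
```

Reversing whole characters rather than bytes matters here: each Arabic character occupies two bytes, and a byte-wise reversal would produce invalid UTF-8.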

In Source Text

Original

    <subtitle start="00:01:36.000" stop="00:01:40.000">
      <text alignment="TopCenter">
        <font face="Bitstream Cyberbit" size="-2">
          Russian : Почему же они не говорят <b>по-русски</b>?<br/>
          Chinese : 他們爲什麽不說中文(<b>台灣</b>)?<br/>
          Arabic : ؟بسحف  <b>ةيبرعلا</b> ةغّللا نوملكتي ال اذامل</font></text>
      <text style="NarratorSpeaking">
          from <i>The "anyone can be provincial!" page</i> :<br/>
          http://www.trigeminal.com/samples/provincial.html
      </text>
    </subtitle>

Fixed

    <subtitle start="00:01:36.000" stop="00:01:40.000">
      <text alignment="TopCenter">
        <font face="Bitstream Cyberbit" size="-2">
          Russian : Почему же они не говорят <b>по-русски</b>?<br/>
          Chinese : 他們爲什麽不說中文(<b>台灣</b>)?<br/>
          Arabic : لماذا لا يتكلمون اللّغة <b>العربية</b> فحسب؟</font></text>
      <text style="NarratorSpeaking">
          from <i>The "anyone can be provincial!" page</i> :<br/>
          http://www.trigeminal.com/samples/provincial.html
      </text>
    </subtitle>

In Pictures (what you will see in your editor, not the final presentation)

Original

PNG Image

Fixed

PNG Image

Link

Unicode Standard Annex #9: The Bidirectional Algorithm
http://www.unicode.org/unicode/reports/tr9/
This document specifies how to position characters that flow from right to left, such as Arabic or Hebrew.
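
As a rough illustration of the input that the bidirectional algorithm works from, Python's standard unicodedata module reports the bidirectional class of each character. The sample below is only a sketch; the selection of characters is taken from the Arabic line and its <b> tags, and the dictionary name is made up for the example.

```python
import unicodedata

# A few characters from the Arabic line and its markup, written as escapes.
samples = {
    "LAM": "\u0644",
    "SHADDA": "\u0651",
    "letter b": "b",
    "less-than": "<",
    "space": " ",
}

for name, ch in samples.items():
    # bidirectional() returns the class name used by the annex,
    # e.g. AL (Arabic letter), NSM (non-spacing mark), L (left-to-right).
    print(f"{name:10} U+{ord(ch):04X} {unicodedata.bidirectional(ch)}")
```

The Arabic letters and the Latin letters inside the tags fall into different classes, which is why an editor lays out the tags left to right in the middle of a right-to-left line.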

seelie317@faireal.net
