Shift Tables – National Language SMS in 160 characters without Unicode

Posted by on Dec 9, 2010 in Support Blog


Messages that contain characters outside of the GSM character set must be sent with Unicode encoding, which limits them to 70 characters per message instead of the expected 160. For many languages, this is a frustrating limitation.

Messages longer than 70 characters can, of course, be sent, but they are sent as multipart (segmented) messages that the receiving client reassembles. If a message that contains Unicode characters is longer than 70 characters, it is broken into segments of 67 characters each, because part of every segment is taken up by the header that tells the receiver how to reassemble the parts. As a result, a 160-character message requires 3 SMS messages if it contains even one character that is not part of the GSM character set.
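To make the arithmetic concrete, here is a minimal Python sketch (the function name `segment_count` is hypothetical, not part of NowSMS). It assumes the message length is already known; for GSM 7-bit messages the length should be counted in septets, where characters from the GSM extension table (such as € or [) take two. The 153-septet figure for concatenated GSM 7-bit parts is the counterpart of the 67-character figure for Unicode: in both cases the concatenation header takes up the difference.

```python
def segment_count(length: int, needs_unicode: bool) -> int:
    """Number of SMS segments needed for a message of `length` units.

    GSM 7-bit: 160 septets fit in one SMS, 153 per part when concatenated.
    UCS-2 (Unicode): 70 characters fit in one SMS, 67 per part when concatenated.
    """
    single, per_part = (70, 67) if needs_unicode else (160, 153)
    if length <= single:
        return 1
    # Ceiling division: a partly filled last segment still costs a full SMS.
    return (length + per_part - 1) // per_part


print(segment_count(160, needs_unicode=False))  # → 1
print(segment_count(160, needs_unicode=True))   # → 3
```

The second call reproduces the case described above: the same 160 characters cost three messages once a single non-GSM character forces Unicode.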

The original GSM protocol was developed in Western Europe, so it includes most Western European characters, plus the Greek capital letters that have no Latin look-alike (such as Δ, Σ and Ω), so that Greek text can be written in capitals using Latin letters for the rest. You can view a table of characters in the GSM character set at the following link:  http://www.nowsms.com/long-sms-text-messages-and-the-160-character-limit
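The following Python sketch checks which characters in a message fall outside the GSM 7-bit default alphabet and therefore force Unicode encoding. The character sets are transcribed from the GSM default alphabet and extension table in 3GPP TS 23.038; the function name `non_gsm_characters` is hypothetical. Applied to the Turkish characters from the forum question below, it shows that upper case Ç is in the GSM alphabet while the others are not.

```python
# GSM 7-bit default alphabet (3GPP TS 23.038), basic table listed in code
# order 0x00-0x7F, with the escape code 0x1B left out.
GSM_BASIC = frozenset(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table, reached via the escape code (two septets per character).
GSM_EXTENSION = frozenset("\f^{}\\[~]|€")


def non_gsm_characters(text: str) -> list[str]:
    """Return the distinct characters of `text` that force UCS-2 encoding."""
    found: list[str] = []
    for ch in text:
        if ch not in GSM_BASIC and ch not in GSM_EXTENSION and ch not in found:
            found.append(ch)
    return found


print(non_gsm_characters("Ğ, ğ, Ş, ş, İ, ı, ç, Ç"))
# → ['Ğ', 'ğ', 'Ş', 'ş', 'İ', 'ı', 'ç']
```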

A recent posting on the NowSMS Technical Forums discusses new developments in the 3GPP specifications that add national language support to the SMS standard and may eventually overcome these limitations. For more on this topic, we recommend joining the discussion at http://www.nowsms.com/discus/messages/1/69650.html. The start of this discussion is reproduced below.

Question:

I have a question about sending SMS messages using Turkish national characters. Specifically, these characters are a problem:

Ğ, ğ, Ş, ş, İ, ı, ç (upper case Ç is ok?)

I can send messages that contain these characters, but NowSMS encodes the messages with Unicode. This means that if I send a message longer than 70 characters, I pay for two or more messages instead of getting the normal 160-character limit.

But in Turkey I have heard that there is a way to send these national characters without forcing the whole message to use Unicode encoding. I do not know how it works, but I have heard that this feature is called a locking shift table. Instead of the standard GSM 7-bit character table, mobile phones in Turkey must support a locking shift table that replaces the GSM characters with national characters for Turkey.

Can NowSMS support this SMS locking shift table?

Response:

It is indeed frustrating for many languages that messages containing characters outside of the GSM character set require Unicode encoding and are limited to 70 characters per message, instead of the expected 160.

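The shift tables that you mention are a relatively new development. There are two kinds: a locking shift table, which replaces the default GSM 7-bit character table for the whole message, and a single shift table, which replaces the GSM extension table and provides additional characters. Each single shift character is sent as an escape code followed by the character code, so it counts as two characters.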

You are correct that these shift tables can replace and extend the default GSM 7-bit character set table so that more national characters can fit into a single SMS.
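The difference between the two mechanisms can be illustrated with a minimal decoding sketch in Python. It assumes the septets have already been unpacked from the packed 7-bit user data and the header has been removed; `decode_septets` and the table names are hypothetical. The tables used here are the GSM default basic and extension tables, which is what a receiver uses when no national language tables are signaled. A national locking shift table (a 128-character string) or single shift table (a code-to-character mapping) from 3GPP TS 23.038 would be passed in their place; those tables are not reproduced here.

```python
ESC = 0x1B

# GSM 7-bit default alphabet, basic table indexed by code 0x00-0x7F
# (3GPP TS 23.038). Index 0x1B is the escape code, kept as a placeholder.
DEFAULT_LOCKING = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Default extension table: characters reached via ESC + code.
DEFAULT_SINGLE_SHIFT = {
    0x0A: "\f", 0x14: "^", 0x28: "{", 0x29: "}", 0x2F: "\\",
    0x3C: "[", 0x3D: "~", 0x3E: "]", 0x40: "|", 0x65: "€",
}


def decode_septets(septets, locking_table=DEFAULT_LOCKING,
                   single_shift_table=DEFAULT_SINGLE_SHIFT) -> str:
    """Map unpacked septet values to text.

    The locking table applies to every septet of the message; the single
    shift table applies only to the one septet that follows ESC.
    """
    out = []
    codes = iter(septets)
    for code in codes:
        if code == ESC:
            nxt = next(codes, None)
            if nxt is None:  # dangling ESC at the end of the message
                break
            # Codes missing from the shift table fall back to the locking table.
            out.append(single_shift_table.get(nxt, locking_table[nxt]))
        else:
            out.append(locking_table[code])
    return "".join(out)


print(decode_septets([0x48, 0x69, 0x20, ESC, 0x65]))  # → Hi €
```

The design point is that switching languages only swaps the two tables; the escape logic stays the same.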

We’ve received a number of inquiries from handset testing labs about them, but I’m not sure that they are used in production systems. (Perhaps they are in active use in Turkey, as I can see that national legislation there prompted 3GPP to develop a solution.)

In addition to Turkish, there are shift tables defined for the Spanish and Portuguese languages. These were all added in 3GPP Release 8 (3GPP TS 23.038 and 23.040), whose specifications only began appearing in 2008.
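Under 3GPP TS 23.040, the sender signals which tables are in use through information elements in the User Data Header: IEI 0x24 for a national language single shift table and IEI 0x25 for a locking shift table, each three octets long (IEI, length, language identifier). That header uses some of the 140 octets of user data, so a shift-table message holds slightly fewer than 160 characters. The following Python sketch works out the capacity; `septets_per_sms` is a hypothetical name, and it assumes the common 8-bit-reference concatenation element (five octets) for multipart messages.

```python
UDHL_OCTETS = 1          # the User Data Header Length octet itself
SHIFT_IE_OCTETS = 3      # IEI (0x24 or 0x25) + length (1) + language ID
CONCAT8_IE_OCTETS = 5    # IEI 0x00 + length (3) + reference, total, sequence


def septets_per_sms(single_shift: bool, locking_shift: bool,
                    concatenated: bool) -> int:
    """GSM 7-bit septets left for text in one SMS (140 octets of user data)."""
    ie_octets = SHIFT_IE_OCTETS * (int(single_shift) + int(locking_shift))
    if concatenated:
        ie_octets += CONCAT8_IE_OCTETS
    if ie_octets == 0:
        return 160  # no User Data Header at all
    udh_bits = (UDHL_OCTETS + ie_octets) * 8
    # The header plus fill bits occupies a whole number of septets (ceiling).
    return 160 - (udh_bits + 6) // 7


print(septets_per_sms(True, False, False))  # → 155
print(septets_per_sms(True, True, False))   # → 152
```

So a single message using one shift table carries 155 characters, or 152 with both tables, which is still more than twice the 70-character Unicode limit.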

The Spanish shift table adds ç, Á, Í, Ó, Ú, á, í, ó, and ú.

The Portuguese shift tables add support for the following national language characters: Á À Â Ã ª á à â ã É Ê é ê Í í Ó Ô Õ º ó ô õ Ú Ü ú ü ` ç ∞

3GPP Release 9 adds shift tables for 10 languages of the Indian subcontinent: Bengali, Gujarati, Hindi, Kannada, Malayalam, Oriya, Punjabi, Tamil, Telugu, and Urdu.

Update:  Shift table support is now available in NowSMS.  Additional information and discussion is available at the following link:  http://www.nowsms.com/discus/messages/1/70000.html.


Responses to “Shift Tables – National Language SMS in 160 characters without Unicode”

  1. SMS shift table support is available in NowSMS version 2011.03.21 and later.

    http://www.nowsms.com/category/updates

    2-way SMS automatically decodes messages that are encoded with any of the 13 currently defined shift tables (Turkish, Spanish, Portuguese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Oriya, Punjabi, Tamil, Telugu, and Urdu).
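Before decoding, a receiver has to find out which tables the sender used. How NowSMS does this internally is not described here; the following is a hypothetical Python sketch of that first step, scanning the User Data Header for the TS 23.040 information elements 0x24 (single shift) and 0x25 (locking shift). It assumes `udh` starts with the User Data Header Length octet; the function name is an assumption.

```python
def shift_tables_from_udh(udh: bytes) -> dict[str, int]:
    """Return the national language IDs signaled in a User Data Header.

    `udh` begins with the UDHL octet. IEI 0x24 = single shift table,
    IEI 0x25 = locking shift table; other elements are skipped.
    """
    udhl = udh[0]
    body = udh[1:1 + udhl]
    found: dict[str, int] = {}
    i = 0
    while i + 1 < len(body):
        iei, length = body[i], body[i + 1]
        value = body[i + 2:i + 2 + length]
        if length == 1 and len(value) == 1:
            if iei == 0x24:
                found["single_shift"] = value[0]
            elif iei == 0x25:
                found["locking_shift"] = value[0]
        i += 2 + length  # move to the next information element
    return found


# Part 1 of 2 (8-bit reference 0x2A) using the Portuguese single shift table.
print(shift_tables_from_udh(bytes([0x08, 0x00, 0x03, 0x2A, 0x02, 0x01,
                                   0x24, 0x01, 0x03])))
# → {'single_shift': 3}
```

A language ID found this way selects the matching table from the list of values below; an empty result means the default GSM tables apply.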

    Outbound message encoding using SMS shift tables is only performed when explicitly configured.

    To enable support for one or more SMS shift tables, edit the SMSGW.INI file and add the following under the [SMSGW] section header:

    ShiftTables=##,##,##

    The values specified are the 3GPP-assigned decimal numeric values of the shift tables that should be enabled for encoding outbound messages.

    Supported values include:

    1 – Turkish
    2 – Spanish
    3 – Portuguese
    4 – Bengali
    5 – Gujarati
    6 – Hindi
    7 – Kannada
    8 – Malayalam
    9 – Oriya
    10 – Punjabi
    11 – Tamil
    12 – Telugu
    13 – Urdu
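If the setting is generated by a provisioning script, the language names can be mapped to the values listed above. This is a hypothetical Python helper, not part of NowSMS; sorting and removing duplicate IDs are choices of this sketch, and it does not assume NowSMS requires any particular order.

```python
# Language identifiers as listed above (3GPP TS 23.038).
SHIFT_TABLE_IDS = {
    "Turkish": 1, "Spanish": 2, "Portuguese": 3, "Bengali": 4,
    "Gujarati": 5, "Hindi": 6, "Kannada": 7, "Malayalam": 8,
    "Oriya": 9, "Punjabi": 10, "Tamil": 11, "Telugu": 12, "Urdu": 13,
}


def shift_tables_setting(*languages: str) -> str:
    """Build a ShiftTables= line for the [SMSGW] section of SMSGW.INI.

    Raises KeyError for a language that has no defined shift table.
    """
    ids = sorted({SHIFT_TABLE_IDS[lang] for lang in languages})
    return "ShiftTables=" + ",".join(str(i) for i in ids)


print(shift_tables_setting("Portuguese", "Turkish"))  # → ShiftTables=1,3
```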

    It is very important to note that if a message encoded with shift tables is received by a device that does not support shift tables, that device will not display the message correctly. For Turkish, Spanish and Portuguese, the national characters will appear as somewhat similar characters on such a device. For the Indian languages, however, the message will be unreadable on a device that does not support shift table encoding.