SMPP Character Set Issues

Posted on Feb 20, 2014 in Support Blog


This article provides troubleshooting advice when sending SMS messages with NowSMS and experiencing one or more of the following issues:

a.) Some characters such as @, $, £ or € are not correct.

b.) Some or all messages are truncated or garbage (possibly only messages over a certain length).

c.) Some accented characters are not correct (è, é, etc.)

d.) Some or all Greek characters are not correct (Δ, Φ, etc.)

e.) Messages containing non-English/Latin characters are not correct (Arabic, Chinese, Japanese, Korean, etc.)

f.) Emoticon/Emoji characters are not correct ( 🙂 , etc.)

Character set problems are among the most frustrating issues in SMS implementations, especially in SMPP environments, where different providers use different character sets and are often unaware of how their implementation differs from others.

The NowSMS approach is to offer as much flexibility as we can with regard to character set handling. However, limitations of some provider implementations may require compromises and trade-offs.

In this post, we’re going to focus on common issues, troubleshooting and configuration options to address commonly encountered problems.

Unfortunately, finding the best settings for your provider often requires trial and error.

For a technical discussion of these issues, see http://www.nowsms.com/long-sms-text-messages-and-the-160-character-limit

The first step in troubleshooting is to send test messages containing the problem characters from the NowSMS web interface only, using the built-in “Send Text Message” form. This step is essential because it determines whether the character encoding problem lies with input from a client to NowSMS or with output from NowSMS to the SMSC provider.

If the characters are sent correctly using the “Send Text Message” form, then the encoding issue is input related. If using HTTP, NowSMS expects UTF-8 character encoding to be used for all text (which is the character set that the web form is configured to use). If you cannot convert your text to UTF-8, you can add a parameter to your HTTP request to tell NowSMS what character set is being used. More information can be found here: http://www.nowsms.com/discus/messages/1/5754.html
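As a rough illustration of the input side, the sketch below builds a UTF-8 encoded query string in Python. The host, port, and the parameter names PhoneNumber and Text are assumptions used for illustration only, as is the helper name build_send_url; check them against your own NowSMS configuration.

```python
from urllib.parse import urlencode

def build_send_url(base_url, phone_number, text):
    """Build a request URL whose query string is percent-encoded UTF-8."""
    # urlencode() percent-encodes str values as UTF-8 by default.
    # "PhoneNumber" and "Text" are assumed parameter names for this sketch.
    query = urlencode({"PhoneNumber": phone_number, "Text": text})
    return f"{base_url}/?{query}"

# Hypothetical host, port and phone number.
url = build_send_url("http://127.0.0.1:8800", "+447700900123", "Price: 5€ @ café")
print(url)
```

In the resulting URL the € sign appears as the three UTF-8 bytes %E2%82%AC and é as %C3%A9. If your client produces single-byte values such as %E9 for é instead, it is sending a different character set and the text will not arrive as intended unless NowSMS is told which character set is in use.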

This article will focus on output issues where there is a character set issue between NowSMS and the SMSC provider, or in other words, issues where the character problem can be recreated using the built-in “Send Text Message” form.

We recommend working through each of these issues in sequence. Verify that you are not experiencing the problem described before moving on to the next issue, as the solution may be related to an issue that you do not realise you are experiencing.

a.) Some characters such as @, $, £ or € are not correct.

The character set used for SMS is different from the standard character sets used by computers.

By default, NowSMS uses the GSM SMS character set encoding over SMPP. However, some SMPP providers expect one of the computer character sets to be used.
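To see why these particular characters are affected, the following sketch compares the byte value of a few characters in the GSM default alphabet and in iso-8859-1. Only three GSM codes are listed here, and the helper name compare is made up for this example.

```python
# Partial GSM default-alphabet codes for the characters discussed in this section.
GSM_CODES = {"@": 0x00, "£": 0x01, "$": 0x02}

def compare(ch):
    """Return (GSM code, iso-8859-1 code) for one character."""
    gsm = GSM_CODES[ch]
    latin1 = ch.encode("iso-8859-1")[0]
    return gsm, latin1

for ch in "@£$":
    gsm, latin1 = compare(ch)
    print(f"{ch!r}: GSM 0x{gsm:02X}, iso-8859-1 0x{latin1:02X}")
```

Plain letters and digits have the same value in both character sets, which is why ordinary English text can look fine while @, $ and £ come out wrong when the two ends disagree.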

Test first with @ and $ characters. If those characters do not work, go into the “Advanced Settings” for the SMPP connection in NowSMS and try changing the “SMSC Character Set” to “iso-8859-1 as Default”. Note that in order to save and activate that setting change it is necessary to press “OK” twice then “Apply” when you are returned to the “SMSC” list. (If NowSMS asks if you wish to test the SMSC connection, it is ok to answer “No”.) Wait about 60 seconds for the server to activate the settings change and try another test message.

Other character sets can also be tried, but they are more rarely used.

Assuming that @ and $ are correct, but the € character is not correct, try changing the “SMSC Character Set” to “iso-8859-15”. If this causes problems, it may be necessary to manually add a setting to the SMSGW.INI file. Under the [SMPP - server:port] section header of SMSGW.INI, which contains your SMPP configuration information, there should be a setting SMSCCharset=iso-8859-15. Below that setting, on a new line, add SMSCCharsetDefault=Yes.
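After the manual edit, the relevant part of SMSGW.INI would look roughly like this. The server:port in the header is a placeholder for your own connection, the header should be left exactly as it already appears in your file, and any other settings already in the section stay unchanged.

```ini
[SMPP - server:port]
SMSCCharset=iso-8859-15
SMSCCharsetDefault=Yes
```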

b.) Some or all messages are truncated or garbage (possibly only messages over a certain length).

The @ character is encoded as a NULL value in the GSM SMS character set, so if messages are truncated where the @ character should appear, this is a good indication that the provider expects iso-8859-1 or iso-8859-15 as described in the previous section.
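The truncation effect can be reproduced with a small sketch. It assumes a receiver that treats the zero byte as an end-of-string marker, which is one plausible cause rather than a confirmed description of any particular provider. The encoder handles only @ plus characters whose GSM code equals their ASCII code.

```python
GSM_AT = 0x00  # '@' in the GSM default alphabet

def gsm_encode_basic(text):
    """Encode '@', ASCII letters, digits, '.' and space as GSM codes (no packing)."""
    out = bytearray()
    for ch in text:
        if ch == "@":
            out.append(GSM_AT)
        elif ch.isascii() and (ch.isalnum() or ch in " ."):
            out.append(ord(ch))  # same value in GSM and ASCII
        else:
            raise ValueError(f"character {ch!r} is outside this simplified table")
    return bytes(out)

def read_as_c_string(data):
    """Hypothetical receiver that stops reading at the first zero byte."""
    return data.split(b"\x00", 1)[0].decode("ascii")

encoded = gsm_encode_basic("Mail joe@example.com today")
print(read_as_c_string(encoded))  # prints "Mail joe"
```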

If simple short text messages containing only English alphabet characters appear corrupt, then it is possible that you are interfacing with an older SMPP server that expects all text messages to be in 7-bit packed encoding, which is the actual over-the-air format. Under the “Advanced Settings” for the SMPP connection in NowSMS, enable the “Encode text messages with 7-bit packing” setting. Note that in order to save and activate that setting change it is necessary to press “OK” twice then “Apply” when you are returned to the “SMSC” list. (If NowSMS asks if you wish to test the SMSC connection, it is ok to answer “No”.) Wait about 60 seconds for the server to activate the settings change and try another test message.
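For reference, 7-bit packing squeezes eight 7-bit character codes into seven bytes, so the packed bytes look nothing like the original text. A minimal sketch of the packing step follows. It is an illustration of the standard GSM scheme, not NowSMS code, and the function name pack_septets is made up for this example.

```python
def pack_septets(septets):
    """Pack 7-bit GSM character codes into bytes, least significant bits first."""
    acc = 0
    for i, septet in enumerate(septets):
        # Each character occupies the next 7 bits of one long bit string.
        acc |= (septet & 0x7F) << (7 * i)
    nbytes = (7 * len(septets) + 7) // 8  # round up to whole bytes
    return acc.to_bytes(nbytes, "little")

# Lowercase ASCII letters have the same code in the GSM default alphabet.
packed = pack_septets([ord(c) for c in "hello"])
print(packed.hex())  # prints "e8329bfd06"
```

A server that expects packed data but receives one character per byte (or the reverse) will show garbage even for plain English text, which is the symptom described above.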

If only longer messages are affected, try each of the following combinations of settings under the “Advanced Settings” for the SMPP connection:

Note: To apply a change to these settings, it is necessary to press “OK” twice, then “Apply”, and either wait 1 minute for the server to load the changed settings or restart the service.

1.) “Encode long messages with 7-bit packed encoding” – NOT CHECKED
“Use TLV parameters for port numbers and segmentation” – NOT CHECKED
“Use WDP Adaptation for WAP Push and MMS” – NOT CHECKED

2.) “Encode long messages with 7-bit packed encoding” – CHECKED
“Use TLV parameters for port numbers and segmentation” – NOT CHECKED
“Use WDP Adaptation for WAP Push and MMS” – NOT CHECKED

3.) “Encode long messages with 7-bit packed encoding” – NOT CHECKED
“Use TLV parameters for port numbers and segmentation” – CHECKED
“Use WDP Adaptation for WAP Push and MMS” – NOT CHECKED

4.) “Encode long messages with 7-bit packed encoding” – NOT CHECKED
“Use TLV parameters for port numbers and segmentation” – NOT CHECKED
“Use WDP Adaptation for WAP Push and MMS” – CHECKED

If your SMPP provider can support long messages, at least one of these options should work.

Some providers might prefer that you do not segment long messages, but instead send the entire long message in one transaction and allow the provider to perform segmentation for delivery. This is often referred to as the “message payload” method. This setting can be enabled in NowSMS by enabling “Use WDP Adaptation for WAP Push and MMS” and disabling “Use TLV Parameters for port numbers and segmentation” (option #4 above).
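For background, SMPP offers three general ways to carry a long message: a concatenation header (UDH) at the start of each segment, segmentation TLV parameters attached to each segment, or the whole text in a single message_payload TLV. The sketch below builds the raw bytes for each. It illustrates the standard conventions only and is not NowSMS code. Exactly how each checkbox above maps onto these techniques is an assumption beyond what is stated here, and the helper names are made up for this example.

```python
import struct

# Optional parameter tags defined by SMPP v3.4.
SAR_MSG_REF_NUM = 0x020C
SAR_TOTAL_SEGMENTS = 0x020E
SAR_SEGMENT_SEQNUM = 0x020F
MESSAGE_PAYLOAD = 0x0424

def concat_udh(ref, total, seq):
    """Concatenation UDH with an 8-bit reference: length 05, IEI 00, IEDL 03."""
    return bytes([0x05, 0x00, 0x03, ref & 0xFF, total, seq])

def tlv(tag, value):
    """SMPP optional parameter: 2-byte tag, 2-byte length, then the value."""
    return struct.pack(">HH", tag, len(value)) + value

def sar_tlvs(ref, total, seq):
    """Segmentation info expressed as TLV parameters instead of a UDH."""
    return (tlv(SAR_MSG_REF_NUM, struct.pack(">H", ref))
            + tlv(SAR_TOTAL_SEGMENTS, bytes([total]))
            + tlv(SAR_SEGMENT_SEQNUM, bytes([seq])))

print(concat_udh(0x2A, 3, 1).hex())           # header for segment 1 of 3
print(sar_tlvs(0x2A, 3, 1).hex())             # same information as TLVs
print(tlv(MESSAGE_PAYLOAD, b"hi").hex())      # entire text in one TLV
```

A provider usually accepts only one of these forms, which is why testing the combinations one at a time is necessary.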

Another situation that can affect only some long messages is that some SMPP providers have different encoding expectations if a message contains “message class” encoding. Normal text messages will have a DCS/data_coding value of 0 (standard text) or 8 (Unicode text). If the problem messages show a different DCS value in the NowSMS logs, refer to the following discussion threads for information on advanced settings: http://www.nowsms.com/discus/messages/1/71597.html http://www.nowsms.com/discus/messages/1/71862.html

c.) Some accented characters are not correct (è, é, etc.)
and/or
d.) Some or all Greek characters are not correct (Δ, Φ, etc.)

Refer to the GSM character set table at http://www.nowsms.com/long-sms-text-messages-and-the-160-character-limit

Are the characters in the GSM character set?

If they are not in the GSM character set, NowSMS is using Unicode format (DCS/data_coding=8) to encode the message, and you may need to speak with your provider about enabling Unicode support. It is possible to disable automatic Unicode detection for SMS text messages submitted via HTTP. When DisableHttpUnicodeSMS=Yes is set under the [SMSGW] header of SMSGW.INI, NowSMS will disable automatic Unicode detection and will replace Unicode characters with a close equivalent, or with – or ?.

If the characters are in the GSM character set, it is possible that you are using the iso-8859-1 or iso-8859-15 character set, and those characters are not in that character set. In NowSMS 2014.02.17 and later, there are configuration settings that can force Unicode encoding to be used for these characters. Under the [SMSGW] header, add either TestUnicodeSMSForISO88591=Yes or TestUnicodeSMSForASCII=Yes. The latter setting forces Unicode if a message contains characters outside of the ASCII (7-bit subset of iso-8859-1) character set.
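To check which characters in a test message fall outside a given character set, a small helper like the one below can be used. It is a sketch of the kind of test these settings are described as performing, not the actual NowSMS logic, and the function name outside_charset is made up for this example.

```python
def outside_charset(text, charset):
    """Return the characters of text that cannot be encoded in charset."""
    missing = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing

print(outside_charset("Δ é", "iso-8859-1"))   # Greek delta is not in iso-8859-1
print(outside_charset("é", "ascii"))          # accented letters are not in ASCII
print(outside_charset("€", "iso-8859-15"))    # the euro sign is in iso-8859-15
```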

The upside of forcing Unicode encoding is that no characters are lost. The downside is that segmentation occurs for these messages at 70 characters instead of 160.
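The cost can be estimated with a short calculation. The sketch assumes segments are joined with a 6-byte concatenation header, which leaves 153 characters per segment for GSM text and 67 for Unicode. It also counts every character as one unit, which undercounts GSM extension characters such as € and Unicode characters outside the basic plane.

```python
def segment_count(length, unicode_message):
    """Estimate how many SMS segments a message of `length` characters needs."""
    # (single-message limit, per-segment limit once a concatenation header is added)
    single, multi = (70, 67) if unicode_message else (160, 153)
    if length <= single:
        return 1
    return -(-length // multi)  # ceiling division

print(segment_count(160, False))  # 1
print(segment_count(160, True))   # 3
```

In other words, a 160-character message that fits in one SMS with GSM encoding needs three when Unicode is forced.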

As an alternative to Unicode encoding, it has been observed that some SMPP providers that use iso-8859-1 or iso-8859-15 based encoding use custom character set tables to encode the missing GSM characters in unused/reserved portions of the character set. NowSMS supports character set conversion overrides for this scenario. More detail is available at http://www.nowsms.com/discus/messages/1/72341.html.

e.) Messages containing non-English/Latin characters are not correct (Arabic, Chinese, Japanese, Korean, etc.)

NowSMS uses Unicode format (DCS/data_coding=8) to encode a message if it contains any characters outside the 7-bit GSM character set. You may need to speak with your provider about enabling Unicode support. It is possible to disable automatic Unicode detection for SMS text messages submitted via HTTP. When DisableHttpUnicodeSMS=Yes is set under the [SMSGW] header of SMSGW.INI, NowSMS will disable automatic Unicode detection and will replace Unicode characters with a close equivalent, or with – or ?.
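The rule described above can be approximated as follows: if every character is in the GSM default alphabet or its extension table, data_coding 0 is enough, otherwise data_coding 8 is needed. This is a sketch for predicting which test messages will trigger Unicode. The real decision inside NowSMS may differ, for example when other character set settings are active, and the function name choose_data_coding is made up for this example.

```python
# GSM default alphabet, in code order 0x00 to 0x7F (0x1B is the escape code).
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNO"
    "PQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmno"
    "pqrstuvwxyzäöñüà"
)
# Characters reachable through the escape code (each uses two 7-bit codes).
GSM_EXTENSION = "^{}\\[~]|€\f"

GSM_ALPHABET = set(GSM_BASIC + GSM_EXTENSION) - {"\x1b"}

def choose_data_coding(text):
    """Return 0 if text fits the GSM alphabet, otherwise 8 (Unicode)."""
    return 0 if all(ch in GSM_ALPHABET for ch in text) else 8

print(choose_data_coding("Hello @ £5 Δ"))  # 0
print(choose_data_coding("مرحبا"))          # 8
```

Note that a single character outside the table is enough to switch the entire message to Unicode. For instance, the GSM alphabet contains Ç but not the lowercase ç.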

f.) Emoticon/Emoji characters are not correct ( 🙂 , etc.)

🙁 How very sad for you.

Emoticon and Emoji encoding is discussed in excruciating detail at http://www.nowsms.com/emoticons
