What is Unicode SMS?

Nov 16, 2021

Unicode SMS lets you include emojis and send messages in languages whose characters fall outside the standard SMS (GSM) character set. Text in Chinese, Russian and other Cyrillic-script languages, Arabic, Hindi, Kannada, Marathi, and so on contains such characters and must be sent using Unicode encoding.

What is Unicode?

Unicode is a worldwide character encoding standard that assigns a unique code point to every character in virtually all of the world's writing systems, including the characters covered by ASCII. The Unicode Consortium, a non-profit organization, maintains, develops, and promotes the Unicode Standard.

ASCII is based on the English alphabet and consists of only 128 characters, while Unicode has room for more than 1.1 million code points, of which well over 100,000 are currently assigned to characters.

Unicode supports character sets in languages around the globe. An ASCII character fits in exactly 7 bits, whereas a Unicode character takes a variable amount of space depending on the encoding form. The two most common encoding forms are UTF-8 and UTF-16. With UTF-8, a character occupies one to four bytes depending on the character; with UTF-16, it occupies one or two 16-bit units.
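To make the variable-width point concrete, here is a minimal Python sketch that measures how many bytes a few sample characters need in UTF-8 and UTF-16. The helper name `encoded_sizes` and the choice of sample characters are illustrative assumptions, not part of any SMS API.

```python
def encoded_sizes(ch: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UTF-16 bytes) needed to encode the string."""
    # "utf-16-be" is used so that no byte-order mark is added to the count.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


# Latin A, e with acute accent, Devanagari letter HA, grinning-face emoji.
for ch in ["A", "\u00e9", "\u0939", "\U0001F600"]:
    utf8_len, utf16_len = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} byte(s), UTF-16 {utf16_len} byte(s)")
```

The emoji is the interesting case: it lies outside the range that fits in a single 16-bit unit, so it takes two units (4 bytes) in UTF-16.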

What is SMS Unicode?

Unicode SMS refers to messages encoded using the Unicode standard. A message is sent as Unicode when it contains at least one character that is not in the default GSM character set. The GSM character set has 128 characters, including the English letters A-Z and a-z, the digits 0-9, and symbols such as !, @, and &.
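As a rough illustration of how such a check might work, the sketch below lists the characters in a message that fall outside the GSM character set. The character table is transcribed from the GSM 03.38 default alphabet and its basic extension table, and it should be verified against the specification or your provider's documentation before being relied on; national language shift tables are ignored. The function name `non_gsm_characters` is hypothetical.

```python
# GSM 03.38 default alphabet (127 printable/control characters; the 128th
# code is the escape to the extension table). Transcribed by hand: verify
# against the specification before production use.
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table characters (each costs two 7-bit slots when sent).
GSM7_EXTENSION = "^{}\\[~]|€\f"

GSM7_ALL = set(GSM7_BASIC) | set(GSM7_EXTENSION)


def non_gsm_characters(text: str) -> list[str]:
    """Return the characters in `text` that would force Unicode encoding."""
    return [ch for ch in text if ch not in GSM7_ALL]


print(non_gsm_characters("Hello, world!"))   # nothing forces Unicode
print(non_gsm_characters("It\u2019s done"))  # the curly apostrophe does
```

A check like this is handy for spotting characters that look harmless, such as curly quotes pasted from a word processor or a backtick, but are not part of the GSM set.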

The standard SMS character limit is 160 (without concatenation). A single SMS carries 140 bytes of payload: GSM characters take 7 bits each, so 160 of them fit, whereas Unicode messages are sent with 16 bits per character, so they can contain only up to 70 characters. Unicode messages that exceed the character limit are segmented into multiple parts. Please note that inadvertently including a single Unicode character, such as a curly quote or an emoji, switches the whole message to Unicode and can result in a multi-part message being sent.
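The arithmetic behind these limits can be sketched in a few lines of Python. This is a simplified estimate under stated assumptions: concatenated messages are assumed to use the common 6-byte header, which leaves 153 GSM characters or 67 Unicode (16-bit) units per part, and the function names are hypothetical. Actual segmentation is decided by your provider's gateway and may differ, for example to avoid splitting an emoji across two parts.

```python
import math

# Capacity per message, derived from the 140-byte SMS payload.
GSM7_SINGLE, GSM7_MULTI = 160, 153   # 7-bit characters
UCS2_SINGLE, UCS2_MULTI = 70, 67     # 16-bit units


def utf16_units(text: str) -> int:
    """Number of 16-bit units the text occupies when sent as Unicode."""
    return len(text.encode("utf-16-be")) // 2


def segments(units: int, single_limit: int, multi_limit: int) -> int:
    """Estimate how many SMS parts a message of `units` length needs."""
    if units <= single_limit:
        return 1
    return math.ceil(units / multi_limit)


# A 161-character plain GSM message needs two parts.
print(segments(161, GSM7_SINGLE, GSM7_MULTI))

# A 71-character Unicode message also needs two parts.
message = "\u0939" * 71  # 71 Devanagari letters
print(segments(utf16_units(message), UCS2_SINGLE, UCS2_MULTI))
```

Note that an emoji counts as two units in a Unicode message, so a text with many emojis reaches the 70-unit limit sooner than its visible length suggests.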

If you want to send SMS globally, Kaleyra’s API supports automatic detection of Unicode SMS. The system automatically detects whether a message contains Unicode characters and charges you accordingly.