Unicode Support in Delphi: UnicodeString and AnsiString in Delphi

Recently I was migrating an old Delphi 7 application to Delphi XE4. The main roadblock I faced was Unicode strings (UnicodeString versus AnsiString). Delphi has supported Unicode strings since Delphi 2009, so there are some points about UnicodeString and AnsiString you need to keep in mind when migrating a Delphi application from a version older than Delphi 2009 to a newer one. Let me first explain what Unicode is and why modern applications need it.

What is Unicode?

Unicode is a character encoding standard that allows virtually all writing systems to be encoded in a single character set. Unicode-enabled applications can store, retrieve, and display characters and symbols from all of the world’s written languages.

Unicode is the name of an international character set, encompassing the symbols of all written alphabets of the world, of today and of the past, plus a few more. Unicode also includes technical symbols, punctuation, and many other characters used in writing text, even if they are not part of any alphabet.

Why use Unicode?

With Unicode, Delphi developers can serve a global market with their applications, even if they don’t do anything special to localize or internationalize them. Windows itself ships in many localized versions, and Delphi applications need to adapt and work on machines running any of the locales that Windows supports, including the Japanese, Chinese, Greek, or Russian versions of Windows.

Users of your software may enter non-ANSI text into your application or use non-ANSI path names. ANSI-based applications won’t always work as desired in those scenarios, whereas Windows applications built with a fully Unicode-enabled Delphi can handle them. Even if you don’t translate your application into any other language, it still needs to work properly, no matter what the end user’s locale is.

UnicodeString and AnsiString in Delphi 

Starting with Delphi 2009, the default string type is the new UnicodeString type. UnicodeString is encoded as UTF-16, the same encoding Windows uses natively. This is a change from earlier versions, where string was an alias for AnsiString.
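
As a rough illustration of what the new default means in practice, the following console sketch prints the element sizes of the character and string types. The program and variable names are only illustrative, and the sketch assumes Delphi 2009 or later on Windows:

 program StringSizes;
 {$APPTYPE CONSOLE}
 var
   S: string;       // alias for UnicodeString in Delphi 2009 and later
   A: AnsiString;   // one byte per element
 begin
   S := 'abc';
   A := 'abc';
   Writeln(SizeOf(Char));          // bytes in one Char (a UTF-16 code unit)
   Writeln(SizeOf(AnsiChar));      // bytes in one AnsiChar
   Writeln(StringElementSize(S));  // element size of the UnicodeString
   Writeln(StringElementSize(A));  // element size of the AnsiString
 end.

On a Unicode-enabled Delphi this should print 2, 1, 2 and 1. Old code that assumes one byte per character, for example when sizing buffers, is a likely place to check during migration.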

UnicodeString is assignment compatible with all other string types; however, assignments between AnsiString and UnicodeString perform a conversion. Assigning a UnicodeString to an AnsiString can therefore lose data: any character that cannot be represented in the AnsiString’s code page is lost in the conversion, typically replaced by a substitute character such as ‘?’.
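
A minimal sketch of that kind of data loss, assuming a Windows system whose ANSI code page cannot represent Chinese characters (for example a Western European locale). The program and variable names are hypothetical, and the characters are written as UTF-16 code units so the example does not depend on how the source file is saved:

 program AnsiLoss;
 {$APPTYPE CONSOLE}
 var
   U, RoundTrip: string;
   A: AnsiString;
 begin
   U := #$4E16#$754C;        // the two characters U+4E16 and U+754C
   A := AnsiString(U);       // explicit cast: converts to the system ANSI code page
   RoundTrip := string(A);   // convert back to UTF-16
   if RoundTrip = U then
     Writeln('Round trip preserved the text')
   else
     Writeln('Round trip lost data');
 end.

Which branch runs depends on the machine: on a system with a Chinese ANSI code page the text may survive, which is exactly why such bugs can go unnoticed on the developer’s own machine.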

The important thing to note here is that this new UnicodeString behaves pretty much like strings always have (with the notable exception of their ability to hold Unicode data, of course). You can still add any string data to them, you can index them, you can concatenate them with the ‘+’ sign, etc.

For example, instances of a UnicodeString will still be able to index characters. Consider the following code:

 procedure ShowFirstChar;
 var
   MyChar: Char;
   MyString: string;
 begin
   MyString := 'This is a string';
   MyChar := MyString[1];  // strings are 1-based, so this is 'T'
 end;

The variable MyChar will still hold the character found at the first index position, i.e. ‘T’. The functionality of this code hasn’t changed at all. Similarly, if we are handling Unicode data:

 procedure ShowFirstUnicodeChar;
 var
   MyChar: Char;
   MyString: string;
 begin
   MyString := '世界您好';
   MyChar := MyString[1];  // first UTF-16 element, here '世'
 end;

The variable MyChar will still hold the character found at the first index position, i.e. ‘世’.
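
One caveat worth keeping in mind: indexing returns a UTF-16 code unit, and characters outside the Basic Multilingual Plane occupy two Char elements (a surrogate pair). The sketch below is only an illustration with hypothetical names; it uses U+1F600, written as its two surrogate code units, and checks the surrogate range by hand rather than relying on any library helper:

 program SurrogateDemo;
 {$APPTYPE CONSOLE}
 var
   S: string;
   C: Char;
 begin
   S := #$D83D#$DE00;   // U+1F600 encoded as a UTF-16 surrogate pair
   C := S[1];
   Writeln(Length(S));                 // counts Char elements, not visible characters
   Writeln(Length(S) * SizeOf(Char));  // size of the text in bytes
   if (Ord(C) >= $D800) and (Ord(C) <= $DBFF) then
     Writeln('S[1] is a high surrogate, only half of a character');
 end.

For the characters in the examples above this never comes up, since each of them fits in a single Char, but code that walks a string element by element should not assume that one element always equals one character.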