Unit CastleUnicode

Description

Unicode utilities.

Overview

Classes, Interfaces, Objects and Records

Name Description
Class TUnicodeCharList  
Record TCastleStringIterator Iterate over a String containing Unicode characters, in a way suitable for both FPC (with default String = AnsiString) and Delphi (with default String = UnicodeString).

Functions and Procedures

function StringLength(const S: String): Integer;
function StringCopy(const S: String; const StartIndex, CountToCopy: Integer): String;
function StringEnding(const S: String; const StartIndex: Integer): String;
function UnicodeCharToString(const C: TUnicodeChar): String;
function UnicodeCharToReadableString(const C: TUnicodeChar): String;
function StringWithHtmlEntities(const S: String): String;

Types

TUnicodeChar = Cardinal;

Description

Functions and Procedures

function StringLength(const S: String): Integer;

Length of the string, in Unicode characters.

This is like standard Pascal Length, but safe for Unicode, and works with the default String of both FPC and Delphi (see https://castle-engine.io/coding_conventions#strings_unicode ).

This assumes that:

  • with FPC, we expect String = AnsiString, holding UTF-8 data,

  • with Delphi, we expect String = UnicodeString, holding UTF-16 data.

See https://castle-engine.io/coding_conventions#strings_unicode .
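To illustrate why plain Length is not enough, here is a small Python sketch (Python is used only as a language-neutral illustration; this is not the Pascal implementation). It counts the same text in three ways: UTF-8 code units (what Length reports for an FPC AnsiString), UTF-16 code units (what Length reports for a Delphi UnicodeString), and code points, which is what we assume "Unicode characters" means for StringLength. The helper names are hypothetical.

```python
def utf8_length(s: str) -> int:
    # Number of bytes (AnsiChar values) when s is stored as UTF-8,
    # i.e. what plain Length would report for an FPC AnsiString.
    return len(s.encode("utf-8"))


def utf16_length(s: str) -> int:
    # Number of 16-bit code units (WideChar values) when s is stored as UTF-16,
    # i.e. what plain Length would report for a Delphi UnicodeString.
    return len(s.encode("utf-16-le")) // 2


def unicode_length(s: str) -> int:
    # Number of Unicode code points. A Python str is a sequence of
    # code points, so len() gives this count directly.
    return len(s)


text = "zażółć 😀"  # Polish letters, a space, and an emoji outside the BMP
print(utf8_length(text), utf16_length(text), unicode_length(text))  # prints: 15 9 8
```

The Polish letters take 2 bytes each in UTF-8 but 1 code unit in UTF-16, while the emoji takes 4 bytes in UTF-8 and a surrogate pair (2 code units) in UTF-16. Only the code point count is the same regardless of encoding.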

function StringCopy(const S: String; const StartIndex, CountToCopy: Integer): String;

Copy a given number of Unicode characters from the string, starting at the given position.

This is like standard Pascal Copy, but safe for Unicode, and works with the default String of both FPC and Delphi (see https://castle-engine.io/coding_conventions#strings_unicode ).

StartIndex is 1-based, i.e. the first Unicode character in S has index 1 and the last Unicode character has index StringLength(S).

In case CountToCopy is too large (there are fewer than CountToCopy characters from StartIndex to the end of S), it is guaranteed to copy only the characters that are available, without causing any memory overruns.

Note that it doesn't try to deal with strings that end abruptly in the middle of a Unicode character. A single Unicode character may span multiple Pascal Char (AnsiChar or WideChar) values, both in case of UTF-8 in AnsiString and UTF-16 in UnicodeString. The results of such an abrupt ending are undefined: this routine may copy the partial (unfinished) Unicode character, or it may reject the partial character altogether.

This assumes that:

  • with FPC, we expect String = AnsiString, holding UTF-8 data,

  • with Delphi, we expect String = UnicodeString, holding UTF-16 data.

See https://castle-engine.io/coding_conventions#strings_unicode .
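The indexing and clamping rules above can be modeled in Python, where a str is already a sequence of code points (illustration only, not the Pascal implementation). The function name string_copy is hypothetical. The sketch assumes StartIndex >= 1; for smaller values it simply returns an empty string, and the real routine may behave differently. With a real UTF-8 or UTF-16 buffer, the routine must additionally walk the encoded data to find where each character starts.

```python
def string_copy(s: str, start_index: int, count_to_copy: int) -> str:
    """Hypothetical model of StringCopy on a sequence of code points.

    start_index is 1-based. Asking for more characters than remain
    returns only the available ones, never reading past the end.
    """
    if start_index < 1 or count_to_copy <= 0:
        # Assumption of this sketch; the real routine may differ here.
        return ""
    start = start_index - 1  # convert 1-based index to 0-based offset
    # Python slicing clamps at the end of the string automatically.
    return s[start:start + count_to_copy]


print(string_copy("zażółć", 3, 2))    # prints: żó
print(string_copy("zażółć", 5, 100))  # prints: łć (count clamped)
```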

function StringEnding(const S: String; const StartIndex: Integer): String;

Copy all Unicode characters from the given string, starting at the given position. StartIndex is 1-based, i.e. the first Unicode character in S has index 1 and the last Unicode character has index StringLength(S).

This is like SEnding, but safe for Unicode, and works with the default String of both FPC and Delphi (see https://castle-engine.io/coding_conventions#strings_unicode ).
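A minimal Python model of this behavior, treating the string as a sequence of code points (illustration only; the name string_ending is hypothetical, and the handling of StartIndex below 1 is an assumption of this sketch):

```python
def string_ending(s: str, start_index: int) -> str:
    """Hypothetical model of StringEnding: all code points from the
    1-based start_index to the end of s."""
    if start_index < 1:
        # Assumption of this sketch; the real routine may differ here.
        return ""
    # Slicing past the end yields "", so a too-large index is safe.
    return s[start_index - 1:]


print(string_ending("zażółć", 3))  # prints: żółć
```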

function UnicodeCharToString(const C: TUnicodeChar): String;

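Express a single Unicode character code as a String that you can write. The result follows the String conventions described above (UTF-8 with FPC, UTF-16 with Delphi), so it may consist of more than one Pascal Char.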
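To show what "as a String" means for each compiler, this Python sketch (illustration only; the function names are hypothetical) encodes one code point the way it would be stored in an FPC UTF-8 AnsiString and in a Delphi UTF-16 UnicodeString.

```python
def unicode_char_to_utf8(c: int) -> bytes:
    # With FPC, String holds UTF-8: one code point becomes 1 to 4 bytes.
    return chr(c).encode("utf-8")


def unicode_char_to_utf16(c: int) -> list:
    # With Delphi, String holds UTF-16: one code point becomes 1 or 2
    # 16-bit code units (2 = a surrogate pair, for code points above U+FFFF).
    data = chr(c).encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


print(unicode_char_to_utf8(0xF3))                          # prints: b'\xc3\xb3'
print([hex(u) for u in unicode_char_to_utf16(0x1F600)])    # prints: ['0xd83d', '0xde00']
```

The letter ó (U+00F3) needs 2 bytes in UTF-8 but only 1 UTF-16 code unit, while the emoji U+1F600 needs a surrogate pair in UTF-16.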

function UnicodeCharToReadableString(const C: TUnicodeChar): String;

Like UnicodeCharToString, but in case C is not a printable character (like ASCII control characters with code < 32), show it as '#' followed by the character number.

Use this only for debugging, or to display error messages, because the output is not unambiguous: if the original string contains a sequence like #xxx, we make no attempt to "quote" this sequence. So the output is ambiguous, both for human and machine processing. It is just "useful enough" for some cases of debugging output.

To get unambiguous output, use StringWithHtmlEntities. It uses HTML entity encoding and takes care to also escape the special '&' character. Note that StringWithHtmlEntities also converts characters above 128, like Polish and Chinese letters, to numeric entities. Whether this is more readable depends on your needs and on how you display the output in practice.
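The ambiguity described above is easy to demonstrate. This Python sketch (illustration only; the names are hypothetical) assumes that only codes below 32 count as non-printable and that the number is written in decimal, as in Pascal's #9 notation; the real routine may classify more characters as non-printable.

```python
def readable_char(c: int) -> str:
    """Hypothetical model of UnicodeCharToReadableString.

    Assumption: only ASCII control characters (code < 32) are treated
    as non-printable, and their number is written in decimal.
    """
    if c < 32:
        return "#" + str(c)
    return chr(c)


def readable_string(s: str) -> str:
    # Apply the per-character conversion to every code point.
    return "".join(readable_char(ord(ch)) for ch in s)


# A real tab and the literal text "#9" produce identical output:
print(readable_string("a\tb"))  # prints: a#9b
print(readable_string("a#9b"))  # prints: a#9b
```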

function StringWithHtmlEntities(const S: String): String;

Convert all special Unicode characters in the given string to HTML entities. This is a helpful routine to visualize a string with any Unicode characters using simple ASCII.

"Special" Unicode characters are those outside of the safe ASCII range, which is between space and ASCII code 128. The resulting string contains these special characters encoded as HTML entities that show the Unicode code point in hex, like &#xNNNN; (see https://en.wikipedia.org/wiki/Unicode_and_HTML ). The ampersand & is also converted to &amp; to prevent ambiguities.

Tip: You can look up a Unicode code point by going to e.g. https://codepoints.net/U+F3 for &#xF3;. Just edit the code in this URL in your web browser's address bar.
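A Python sketch of this encoding (illustration only; the name string_with_html_entities is hypothetical). It assumes the safe range is code points 32 to 127 inclusive and that hex digits are uppercase without zero padding, matching the &#xF3; example above; the exact bounds and formatting of the real routine may differ slightly.

```python
def string_with_html_entities(s: str) -> str:
    """Hypothetical model of StringWithHtmlEntities.

    Assumption: code points 32 (space) to 127 are "safe" and kept as-is;
    everything else, plus '&', is escaped.
    """
    parts = []
    for ch in s:
        cp = ord(ch)
        if ch == "&":
            parts.append("&amp;")  # escape & so the output stays unambiguous
        elif 32 <= cp < 128:
            parts.append(ch)  # safe ASCII, kept unchanged
        else:
            parts.append(f"&#x{cp:X};")  # code point in hex, e.g. &#xF3;
    return "".join(parts)


print(string_with_html_entities("zażółć & co\n"))
# prints: za&#x17C;&#xF3;&#x142;&#x107; &amp; co&#xA;
```

Because '&' is always escaped, a literal input text "&#xA;" becomes "&amp;#xA;" and can never be confused with an encoded newline "&#xA;".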

Types

TUnicodeChar = Cardinal;

A single Unicode character code (code point), as used by UnicodeCharToString and UnicodeCharToReadableString.


Generated by PasDoc 0.16.0-snapshot.