Unit CastleUnicode

Description

Unicode utilities.

Uses

Overview

Classes, Interfaces, Objects and Records

Name	Description
Class `TUnicodeCharList`
Record `TCastleStringIterator`	Iterate over a String that contains Unicode characters, in a way suitable for both FPC (with default String = AnsiString) and Delphi (with default String = UnicodeString).

Functions and Procedures

function StringLength(const S: String): Integer;

function StringCopy(const S: String; const StartIndex, CountToCopy: Integer): String;

function StringEnding(const S: String; const StartIndex: Integer): String;

function UnicodeCharToString(const C: TUnicodeChar): String;

function UnicodeCharToReadableString(const C: TUnicodeChar): String;

function StringWithHtmlEntities(const S: String): String;

Types

TUnicodeChar = Cardinal;

Description

Functions and Procedures

function StringLength(const S: String): Integer;

Length of the string, in Unicode characters.

This is like standard Pascal Length, but safe for Unicode, and working with both FPC and Delphi default String (see https://castle-engine.io/coding_conventions#strings_unicode ).

This works taking into account that:

with FPC, we expect String = AnsiString and holding UTF-8 data,
with Delphi we expect String = UnicodeString and holding UTF-16 data.

See https://castle-engine.io/coding_conventions#strings_unicode .
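
A minimal sketch of how StringLength differs from the standard Length. It assumes a console program that has the CastleUnicode unit on its search path; the program name and the sample string are only illustrative. The string is built from code points, so the example does not depend on the encoding of the source file.

```pascal
program StringLengthDemo; // hypothetical program name

{$ifdef FPC} {$mode objfpc}{$H+} {$endif}
{$ifdef MSWINDOWS} {$apptype CONSOLE} {$endif}

uses CastleUnicode;

var
  S: String;
begin
  // Build a 3-character string: 'z', then U+F3 (the letter ó), then 'b'.
  S := 'z' + UnicodeCharToString($F3) + 'b';

  // Number of Unicode characters, the same with FPC and Delphi.
  Writeln('StringLength: ', StringLength(S));

  // Number of Char values. With FPC (UTF-8 in AnsiString) the ó takes
  // 2 bytes, so this is larger than StringLength. With Delphi (UTF-16
  // in UnicodeString) the ó takes a single WideChar.
  Writeln('Length: ', Length(S));
end.
```

Code that indexes or measures text shown to the user should generally rely on StringLength, as Length counts storage units that differ between the two compilers.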

function StringCopy(const S: String; const StartIndex, CountToCopy: Integer): String;

Copy a number of Unicode characters from given string, from given position.

This is like standard Pascal Copy, but safe for Unicode, and working with both FPC and Delphi default String (see https://castle-engine.io/coding_conventions#strings_unicode ).

StartIndex is 1-based, i.e. the first Unicode character in String has index 1, last Unicode character has index StringLength(S).

If the parameters request more characters than exist, this routine copies only as many characters as are available (without causing any issues like memory overruns). For example, StringCopy('foobar', 4, 100) will return 'bar'.

The input string is considered incorrect when it ends abruptly in the middle of a Unicode character (one that spans multiple Pascal Char values, AnsiChar or WideChar; this is possible both with UTF-8 in AnsiString and with UTF-16 in UnicodeString). The result is undefined in this case: we may copy the partial (unfinished) Unicode character, making the output incorrect too, or we may reject it. However, we guarantee not to cause any memory overruns (and thus potential crashes or security issues) in this case.

This works taking into account that:

with FPC, we expect String = AnsiString and holding UTF-8 data,
with Delphi we expect String = UnicodeString and holding UTF-16 data.

See https://castle-engine.io/coding_conventions#strings_unicode .
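
The indexing and clamping rules described above can be sketched as follows. This is only an illustration, assuming a console program with the CastleUnicode unit available; the program name is hypothetical.

```pascal
program StringCopyDemo; // hypothetical program name

{$ifdef FPC} {$mode objfpc}{$H+} {$endif}
{$ifdef MSWINDOWS} {$apptype CONSOLE} {$endif}

uses CastleUnicode;

begin
  // StartIndex is 1-based: take 3 characters starting from the first one.
  Writeln(StringCopy('foobar', 1, 3)); // foo

  // CountToCopy is larger than what remains after index 4,
  // so only the available characters are copied.
  Writeln(StringCopy('foobar', 4, 100)); // bar
end.
```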

function StringEnding(const S: String; const StartIndex: Integer): String;

Copy all characters from given string, from given position. StartIndex is 1-based, i.e. the first Unicode character in String has index 1, last Unicode character has index StringLength(S).

This is like SEnding, but safe for Unicode, and working with both FPC and Delphi default String (see https://castle-engine.io/coding_conventions#strings_unicode ).
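
A short sketch of StringEnding, assuming a console program with the CastleUnicode unit available (the program name is hypothetical). The second call uses a string that starts with a non-ASCII character, to show that StartIndex counts Unicode characters and not bytes.

```pascal
program StringEndingDemo; // hypothetical program name

{$ifdef FPC} {$mode objfpc}{$H+} {$endif}
{$ifdef MSWINDOWS} {$apptype CONSOLE} {$endif}

uses CastleUnicode;

var
  S: String;
begin
  // Everything from the 4th character to the end.
  Writeln(StringEnding('foobar', 4)); // bar

  // U+F3 (the letter ó) followed by 'smy'. With FPC the ó occupies
  // 2 bytes, but it still counts as the character with index 1.
  S := UnicodeCharToString($F3) + 'smy';
  Writeln(StringEnding(S, 2)); // smy
end.
```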

function UnicodeCharToString(const C: TUnicodeChar): String;

Express a single Unicode character code as a String that you can write.

function UnicodeCharToReadableString(const C: TUnicodeChar): String;

Like UnicodeCharToString, but in case C is not a printable character (like ASCII control characters with code < 32), show it as '#' + character number.

Use this only for debugging, or to display error messages, because the output is not 100% unambiguous: if the original string contains a sequence like #xxx, we make no attempt to "quote" this sequence. Thus the output is ambiguous, both for human and machine processing. It is just "useful enough" for some cases of debugging output.

To have unambiguous output, use StringWithHtmlEntities. It uses HTML entity encoding and takes care to also quote the special '&'. Note that StringWithHtmlEntities also converts characters above 128, like Polish and Chinese, to numbers. Whether this is more readable depends on how you output the result in practice.
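
A small sketch of the difference between a printable and a non-printable input, assuming a console program with the CastleUnicode unit available (the program name is hypothetical). The exact text produced for the control character is not asserted here; the description above only says it is '#' followed by the character number.

```pascal
program ReadableCharDemo; // hypothetical program name

{$ifdef FPC} {$mode objfpc}{$H+} {$endif}
{$ifdef MSWINDOWS} {$apptype CONSOLE} {$endif}

uses CastleUnicode;

begin
  // Printable character: shown as-is.
  Writeln(UnicodeCharToReadableString(Ord('A'))); // A

  // ASCII control character (line feed, code 10):
  // shown as '#' followed by the character number.
  Writeln(UnicodeCharToReadableString(10));
end.
```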

function StringWithHtmlEntities(const S: String): String;

Convert all special Unicode characters in the given string to HTML entities. This is a helpful routine to visualize a string with any Unicode characters using simple ASCII.

"Special" Unicode characters are anything outside of the safe ASCII range, which is between space and ASCII code 128. The resulting string contains these special characters encoded as HTML entities that show the Unicode code point in hex, like &#xNNNN; (see https://en.wikipedia.org/wiki/Unicode_and_HTML ). The ampersand & is also converted to &amp; to prevent ambiguities.

Tip: You can check Unicode codes by going to e.g. https://codepoints.net/U+F3 for ó. Just edit this URL in the WWW browser address bar.
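
A minimal sketch of StringWithHtmlEntities on a string that mixes plain ASCII, an ampersand and a non-ASCII character. It assumes a console program with the CastleUnicode unit available; the program name and sample text are illustrative. The exact hex formatting of the entity is not asserted here.

```pascal
program HtmlEntitiesDemo; // hypothetical program name

{$ifdef FPC} {$mode objfpc}{$H+} {$endif}
{$ifdef MSWINDOWS} {$apptype CONSOLE} {$endif}

uses CastleUnicode;

var
  S: String;
begin
  // 'z', then U+F3 (the letter ó), then 'b&c'.
  S := 'z' + UnicodeCharToString($F3) + 'b&c';

  // The result is pure ASCII: the ó becomes an entity of the form
  // &#xNNNN; and the ampersand becomes &amp;
  Writeln(StringWithHtmlEntities(S));

  // Plain ASCII letters are left unchanged.
  Writeln(StringWithHtmlEntities('abc')); // abc
end.
```

Because the ampersand itself is escaped, a literal sequence such as &#x41; in the input cannot be confused with an entity produced by the conversion.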

Types

TUnicodeChar = Cardinal;

This item has no description.

Generated by PasDoc 0.16.0-snapshot.
