Unit CastleUnicode

Description

Unicode utilities.

Uses

Overview

Classes, Interfaces, Objects and Records

Name Description
Class TUnicodeCharList  

Functions and Procedures

function UTF8CharacterLength(p: PChar): Integer;
function UTF8Length(const s: string): PtrInt; overload;
function UTF8Length(p: PChar; ByteCount: PtrInt): PtrInt; overload;
function UTF8CharStart(UTF8Str: PChar; Len, CharIndex: PtrInt): PChar;
function UTF8Copy(const s: string; StartCharIndex, CharCount: PtrInt): string;
function UTF8SEnding(const S: String; const StartCharIndex: PtrInt): String;
function UTF8CharacterToUnicode(p: PChar; out CharLen: integer): TUnicodeChar;
function UnicodeToUTF8(CodePoint: TUnicodeChar): string;
function UnicodeToUTF8Inline(CodePoint: TUnicodeChar; Buf: PChar): integer;
function UTF8ToHtmlEntities(const S: String): String;
function StringLength(const S: String): Integer;
function StringCopy(const S: String; const StartIndex, CountToCopy: Integer): String;

Types

TUnicodeChar = Cardinal;

Description

Functions and Procedures

function UTF8CharacterLength(p: PChar): Integer;

This item has no description.

function UTF8Length(const s: string): PtrInt; overload;

This item has no description.

function UTF8Length(p: PChar; ByteCount: PtrInt): PtrInt; overload;

This item has no description.

function UTF8CharStart(UTF8Str: PChar; Len, CharIndex: PtrInt): PChar;

This item has no description.

function UTF8Copy(const s: string; StartCharIndex, CharCount: PtrInt): string;

This item has no description.

function UTF8SEnding(const S: String; const StartCharIndex: PtrInt): String;

This item has no description.

function UTF8CharacterToUnicode(p: PChar; out CharLen: integer): TUnicodeChar;

Return the Unicode character pointed to by P. CharLen is set to 0 only when P is nil; otherwise it is always > 0.

Typical usage is to iterate over a UTF-8 string character by character, like this:

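{$mode objfpc}{$H+}
program IterateUTF8;
uses CastleUnicode;
var
  S: String;
  C: TUnicodeChar;
  TextPtr: PChar;
  CharLen: Integer;
begin
  S := 'A' + #$C3#$B3; // with FPC, String holds UTF-8: 'A' followed by U+00F3
  TextPtr := PChar(S);
  C := UTF8CharacterToUnicode(TextPtr, CharLen);
  // PChar(S) is #0-terminated, so C = 0 signals the end of the string
  while (C > 0) and (CharLen > 0) do
  begin
    Inc(TextPtr, CharLen); // move past the bytes of C
    // here process C, for example:
    WriteLn(C);
    C := UTF8CharacterToUnicode(TextPtr, CharLen);
  end;
end.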
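To show where CharLen comes from, here is a minimal standalone sketch of decoding one UTF-8 sequence: the lead byte determines the length (1 to 4 bytes), and each continuation byte (10xxxxxx) adds 6 more bits. DecodeUtf8Sketch is a hypothetical name and this is not the CastleUnicode implementation. Returning U+FFFD for invalid or truncated sequences is this sketch's own choice, and overlong forms and surrogates are not rejected.

```pascal
{$mode objfpc}{$H+}
program DecodeUtf8Demo;

type
  TUnicodeChar = Cardinal; // same definition as in CastleUnicode

{ Hypothetical sketch: decode the UTF-8 sequence at P (not the CastleUnicode code).
  Returns U+FFFD for invalid or truncated sequences; overlong forms are not rejected. }
function DecodeUtf8Sketch(P: PChar; out CharLen: Integer): TUnicodeChar;
var
  B: Byte;
  I: Integer;
begin
  if P = nil then
  begin
    CharLen := 0;
    Exit(0);
  end;
  B := Ord(P[0]);
  // The lead byte determines how many bytes the character occupies
  if B < $80 then
    CharLen := 1
  else if (B and $E0) = $C0 then
    CharLen := 2
  else if (B and $F0) = $E0 then
    CharLen := 3
  else if (B and $F8) = $F0 then
    CharLen := 4
  else
  begin
    CharLen := 1; // stray continuation byte or invalid lead byte
    Exit($FFFD);
  end;
  // Keep only the payload bits of the lead byte
  case CharLen of
    1: Result := B;
    2: Result := B and $1F;
    3: Result := B and $0F;
  else
    Result := B and $07;
  end;
  // Each continuation byte (10xxxxxx) contributes 6 more bits
  for I := 1 to CharLen - 1 do
  begin
    if (Ord(P[I]) and $C0) <> $80 then
    begin
      // Truncated sequence (e.g. the #0 terminator was reached): stop reading
      CharLen := I;
      Exit($FFFD);
    end;
    Result := (Result shl 6) or (Ord(P[I]) and $3F);
  end;
end;

var
  S: String;
  P: PChar;
  C: TUnicodeChar;
  Len: Integer;
begin
  S := 'A' + #$C3#$B3 + #$E2#$82#$AC; // 'A', U+00F3, U+20AC
  P := PChar(S);
  C := DecodeUtf8Sketch(P, Len);
  while (C > 0) and (Len > 0) do
  begin
    Inc(P, Len);
    WriteLn(C, ' (', Len, ' byte(s))');
    C := DecodeUtf8Sketch(P, Len);
  end;
end.
```

The main block uses the same loop shape as the iteration example: decode, advance by the returned length, and stop at the #0 terminator.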

function UnicodeToUTF8(CodePoint: TUnicodeChar): string;

function UTF8CharacterToUnicode(const S: string): TUnicodeChar;

function UnicodeToUTF8Inline(CodePoint: TUnicodeChar; Buf: PChar): integer;

This item has no description.
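UnicodeToUTF8 and UnicodeToUTF8Inline are not documented here; judging by their names, they perform the inverse of UTF8CharacterToUnicode, encoding one code point as UTF-8. Purely as an illustration of the standard UTF-8 encoding rules, here is a standalone sketch. EncodeUtf8Sketch is a hypothetical name, it does not reject surrogates (U+D800..U+DFFF), and returning '' above U+10FFFF is this sketch's own choice, not documented CastleUnicode behavior.

```pascal
{$mode objfpc}{$H+}
program EncodeUtf8Demo;

type
  TUnicodeChar = Cardinal; // same definition as in CastleUnicode

{ Hypothetical sketch: encode one code point as 1 to 4 UTF-8 bytes.
  Surrogates are not rejected; values above U+10FFFF give ''. }
function EncodeUtf8Sketch(CodePoint: TUnicodeChar): String;
begin
  case CodePoint of
    0..$7F:
      Result := Chr(CodePoint);
    $80..$7FF:
      Result := Chr($C0 or (CodePoint shr 6)) +
                Chr($80 or (CodePoint and $3F));
    $800..$FFFF:
      Result := Chr($E0 or (CodePoint shr 12)) +
                Chr($80 or ((CodePoint shr 6) and $3F)) +
                Chr($80 or (CodePoint and $3F));
    $10000..$10FFFF:
      Result := Chr($F0 or (CodePoint shr 18)) +
                Chr($80 or ((CodePoint shr 12) and $3F)) +
                Chr($80 or ((CodePoint shr 6) and $3F)) +
                Chr($80 or (CodePoint and $3F));
  else
    Result := ''; // not a valid Unicode code point
  end;
end;

var
  S: String;
  I: Integer;
begin
  S := EncodeUtf8Sketch($20AC); // EURO SIGN
  for I := 1 to Length(S) do
    Write(HexStr(Ord(S[I]), 2), ' ');
  WriteLn; // prints: E2 82 AC
end.
```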

function UTF8ToHtmlEntities(const S: String): String;

Convert all special Unicode characters in the given UTF-8 string to HTML entities. This is helpful to visualize a string containing any Unicode characters using only plain ASCII.

"Special" Unicode characters are anything outside of the safe ASCII range, which is between space and ASCII code 128. The resulting string contains these special characters encoded as HTML entities that show the Unicode code point in hex, like &#xNNNN; (see https://en.wikipedia.org/wiki/Unicode_and_HTML ). The ampersand & is also converted to &amp; to prevent ambiguities.

Tip: You can look up Unicode code points at e.g. https://codepoints.net/U+F3 (the code point of ó). Just edit the number at the end of this URL in your browser's address bar.
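The escaping rule above can be sketched as a standalone function, assuming well-formed UTF-8 input and assuming that "between space and ASCII code 128" means codes 32 to 127 pass through unchanged. HtmlEntitiesSketch is a hypothetical name and not the CastleUnicode implementation; in particular, padding the hex value to at least 4 digits is this sketch's choice.

```pascal
{$mode objfpc}{$H+}
program HtmlEntitiesDemo;

{ Hypothetical sketch of the rule described above (not the CastleUnicode code):
  - characters from space up to code 127 are copied as-is (assumed range),
  - "&" becomes "&amp;",
  - anything else becomes &#xNNNN; with its code point in hex.
  Assumes well-formed UTF-8 input. }
function HtmlEntitiesSketch(const S: String): String;
var
  I, K, Len: Integer;
  B, Digits: Byte;
  C: Cardinal;
begin
  Result := '';
  I := 1;
  while I <= Length(S) do
  begin
    B := Ord(S[I]);
    // Sequence length and payload bits of the lead byte
    if B < $80 then begin Len := 1; C := B; end
    else if (B and $E0) = $C0 then begin Len := 2; C := B and $1F; end
    else if (B and $F0) = $E0 then begin Len := 3; C := B and $0F; end
    else if (B and $F8) = $F0 then begin Len := 4; C := B and $07; end
    else begin Len := 1; C := B; end; // stray byte: escaped as its byte value
    // Add 6 bits per continuation byte, never reading past the string end
    for K := 1 to Len - 1 do
      if I + K <= Length(S) then
        C := (C shl 6) or (Ord(S[I + K]) and $3F);
    if C = Ord('&') then
      Result := Result + '&amp;'
    else if (C >= Ord(' ')) and (C < 128) then
      Result := Result + Chr(C)
    else
    begin
      // At least 4 hex digits, more if the code point needs them
      Digits := 4;
      while (Digits < 8) and ((C shr (4 * Digits)) <> 0) do
        Inc(Digits);
      Result := Result + '&#x' + HexStr(Int64(C), Digits) + ';';
    end;
    Inc(I, Len);
  end;
end;

begin
  WriteLn(HtmlEntitiesSketch('Tom & Jerry ' + #$C3#$B3));
  // prints: Tom &amp; Jerry &#x00F3;
end.
```

Escaping the ampersand first keeps the output unambiguous: a literal "&#x00F3;" in the input would come out as "&amp;#x00F3;" and cannot be confused with an encoded ó.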

function StringLength(const S: String): Integer;

Length of the string, in Unicode characters.

This takes into account that:

  • with FPC, we expect String = AnsiString and holding UTF-8 data,

  • with Delphi we expect String = UnicodeString and holding UTF-16 data.

See https://castle-engine.io/coding_conventions#strings_unicode .
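For the FPC case (UTF-8 in AnsiString), counting Unicode characters amounts to counting the bytes that are not UTF-8 continuation bytes (10xxxxxx), since every code point has exactly one lead byte. A minimal standalone sketch of that idea, assuming valid UTF-8; Utf8CountSketch is a hypothetical name, not the StringLength implementation. In the Delphi case (UTF-16), a code point outside the Basic Multilingual Plane occupies two WideChar values (a surrogate pair), so the counting would differ.

```pascal
{$mode objfpc}{$H+}
program Utf8CountDemo;

{ Hypothetical sketch (UTF-8 / FPC case only): count code points by counting
  bytes that are not continuation bytes (10xxxxxx). Assumes valid UTF-8. }
function Utf8CountSketch(const S: String): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  S: String;
begin
  S := 'A' + #$C3#$B3 + #$E2#$82#$AC; // 'A', U+00F3 (2 bytes), U+20AC (3 bytes)
  WriteLn(Length(S), ' bytes, ', Utf8CountSketch(S), ' characters');
  // prints: 6 bytes, 3 characters
end.
```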

function StringCopy(const S: String; const StartIndex, CountToCopy: Integer): String;

Copy the given number of Unicode characters from the given string.

StartIndex is 1-based, i.e. the first Unicode character in the string has index 1, and the last Unicode character has index StringLength(S).

If CountToCopy is too large, it is guaranteed that only the maximum possible number of characters is copied, without causing any memory overruns.

Note that it doesn't try to deal with strings that end abruptly in the middle of a Unicode character (a Unicode character may span multiple Pascal Char values, i.e. AnsiChar or WideChar; this is possible both with UTF-8 in AnsiString and with UTF-16 in UnicodeString). The results of such an abrupt ending are undefined: this routine may copy the partial (unfinished) Unicode character, or it may reject the unfinished character altogether.

This takes into account that:

  • with FPC, we expect String = AnsiString and holding UTF-8 data,

  • with Delphi we expect String = UnicodeString and holding UTF-16 data.

See https://castle-engine.io/coding_conventions#strings_unicode .
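The StringCopy contract (1-based StartIndex, a too-large CountToCopy clamped to what exists) can be sketched for the UTF-8 case only, assuming valid UTF-8. SkipCharsSketch and Utf8CopySketch are hypothetical names, not the CastleUnicode implementation, and returning '' for StartIndex < 1 or CountToCopy <= 0 is this sketch's own choice. The end position is found by walking forward from the start byte, so even CountToCopy = MaxInt cannot overflow an index.

```pascal
{$mode objfpc}{$H+}
program Utf8CopyDemo;

{ Hypothetical helper: starting at byte index P (1-based), skip N characters
  and return the byte index just after them (at most Length(S) + 1). }
function SkipCharsSketch(const S: String; P, N: Integer): Integer;
begin
  Result := P;
  while (N > 0) and (Result <= Length(S)) do
  begin
    Inc(Result); // leave the lead byte
    while (Result <= Length(S)) and ((Ord(S[Result]) and $C0) = $80) do
      Inc(Result); // skip continuation bytes 10xxxxxx
    Dec(N);
  end;
end;

{ Hypothetical sketch of the StringCopy contract for UTF-8 data. }
function Utf8CopySketch(const S: String; StartIndex, CountToCopy: Integer): String;
var
  FromByte, ToByte: Integer;
begin
  if (StartIndex < 1) or (CountToCopy <= 0) then
    Exit('');
  FromByte := SkipCharsSketch(S, 1, StartIndex - 1);
  ToByte := SkipCharsSketch(S, FromByte, CountToCopy); // clamped at the end
  Result := Copy(S, FromByte, ToByte - FromByte);
end;

var
  S: String;
begin
  S := 'A' + #$C3#$B3 + #$E2#$82#$AC + 'B'; // 4 characters, 7 bytes
  WriteLn(Length(Utf8CopySketch(S, 2, 2))); // prints: 5 (bytes of U+00F3 and U+20AC)
end.
```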

Types

TUnicodeChar = Cardinal;

This item has no description.


Generated by PasDoc 0.16.0-snapshot.