Unit CastleUnicode
Description
Unicode utilities.
Uses
Overview
Classes, Interfaces, Objects and Records
Name | Description |
---|---|
Class TUnicodeCharList | |
Record TCastleStringIterator | Iterate over a String that contains Unicode characters, in a way suitable for both FPC (with default String = AnsiString) and Delphi (with default String = UnicodeString). |
Functions and Procedures
function StringLength(const S: String): Integer; |
function StringCopy(const S: String; const StartIndex, CountToCopy: Integer): String; |
function StringEnding(const S: String; const StartIndex: Integer): String; |
function UnicodeCharToString(const C: TUnicodeChar): String; |
function UnicodeCharToReadableString(const C: TUnicodeChar): String; |
function StringWithHtmlEntities(const S: String): String; |
Types
TUnicodeChar = Cardinal; |
Description
Functions and Procedures
function StringLength(const S: String): Integer; |
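Length of the string, in Unicode characters. This is like the standard Pascal Length, but safe for Unicode, and working with both FPC and Delphi default String. This works taking into account that with FPC the default String is AnsiString (holding UTF-8), while with Delphi the default String is UnicodeString (holding UTF-16).
See https://castle-engine.io/coding_conventions#strings_unicode . |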
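The difference between StringLength and the standard Length can be sketched as below. This is only an illustration, not code from the engine: the program name and the sample string are assumptions, and the string is built with UnicodeCharToString so that the result does not depend on the encoding of the source file.

```pascal
program StringLengthSketch;

{$ifdef FPC} {$mode objfpc}{$H+} {$endif}
{$ifdef MSWINDOWS} {$apptype CONSOLE} {$endif}

uses CastleUnicode;

var
  S: String;
begin
  { Hypothetical sample: code points U+017C, U+00F3, U+0142, then ASCII 'w'. }
  S := UnicodeCharToString($17C) + UnicodeCharToString($F3) +
    UnicodeCharToString($142) + 'w';

  { Counts Unicode characters: expected 4 with both compilers. }
  Writeln('StringLength: ', StringLength(S));

  { Counts Char units: expected 7 with FPC (UTF-8 bytes, 2+2+2+1)
    and 4 with Delphi (UTF-16 code units). }
  Writeln('Length: ', Length(S));
end.
```

Code that indexes or counts characters with Length alone would therefore behave differently between the two compilers for this string.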
function StringCopy(const S: String; const StartIndex, CountToCopy: Integer): String; |
Copy a number of Unicode characters from the given string, starting at the given position. This is like the standard Pascal Copy, but safe for Unicode, and working with both FPC and Delphi default String.
StartIndex is 1-based, i.e. the first Unicode character in the String has index 1 and the last Unicode character has index StringLength(S).
If the parameters indicate that more characters should be copied than exist, this routine copies only as many characters as are available (without causing any issues like memory overruns).
The input string is considered incorrect when it ends abruptly in the middle of a Unicode character (one that spans multiple Pascal Char values, AnsiChar or WideChar; this is possible both for UTF-8 in AnsiString and for UTF-16 in UnicodeString). The result is undefined in this case: the partial (unfinished) Unicode character may be copied (making the output incorrect too), or it may be rejected. However, even in this case the routine guarantees not to cause memory overruns (and thus potential crashes or security issues).
This works taking into account that with FPC the default String is AnsiString (holding UTF-8), while with Delphi the default String is UnicodeString (holding UTF-16).
See https://castle-engine.io/coding_conventions#strings_unicode . |
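A minimal sketch of the 1-based indexing and of the clamping behavior described above. The program name, variable names and sample string are assumptions made for illustration; the expected results in the comments follow from the description of StringCopy, not from a test run.

```pascal
program StringCopySketch;

{$ifdef FPC} {$mode objfpc}{$H+} {$endif}
{$ifdef MSWINDOWS} {$apptype CONSOLE} {$endif}

uses CastleUnicode;

var
  S, Middle, Tail: String;
begin
  { Hypothetical sample: code points U+017C, U+00F3, U+0142, then ASCII 'w'. }
  S := UnicodeCharToString($17C) + UnicodeCharToString($F3) +
    UnicodeCharToString($142) + 'w';

  { Unicode characters 2 and 3, i.e. U+00F3 and U+0142. }
  Middle := StringCopy(S, 2, 2);
  { Expected to print TRUE. }
  Writeln(Middle = UnicodeCharToString($F3) + UnicodeCharToString($142));

  { Asking for 100 characters from index 3: only the 2 remaining
    characters should be copied. }
  Tail := StringCopy(S, 3, 100);
  { Expected to print 2. }
  Writeln(StringLength(Tail));
end.
```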
function StringEnding(const S: String; const StartIndex: Integer): String; |
Copy all characters from given string, from given position. StartIndex is 1-based, i.e. the first Unicode character in String has index 1, last Unicode character has index StringLength(S). This is like SEnding, but safe for Unicode, and working with both FPC and Delphi default String (see https://castle-engine.io/coding_conventions#strings_unicode ). |
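As an illustration, StringEnding can be paired with StringCopy to split a string at a Unicode character boundary. This is a sketch under assumptions: the program name, variable names and sample string are hypothetical, and the expected results follow from the descriptions of both routines.

```pascal
program StringEndingSketch;

{$ifdef FPC} {$mode objfpc}{$H+} {$endif}
{$ifdef MSWINDOWS} {$apptype CONSOLE} {$endif}

uses CastleUnicode;

var
  S, Head, Tail: String;
begin
  { Hypothetical sample: code points U+017C, U+00F3, U+0142, then ASCII 'w'. }
  S := UnicodeCharToString($17C) + UnicodeCharToString($F3) +
    UnicodeCharToString($142) + 'w';

  Head := StringCopy(S, 1, 2); { first 2 Unicode characters }
  Tail := StringEnding(S, 3);  { everything from the 3rd Unicode character }

  { The two parts should reassemble into the original: expected TRUE. }
  Writeln(Head + Tail = S);
  { Expected to print 2 + 2. }
  Writeln(StringLength(Head), ' + ', StringLength(Tail));
end.
```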
function UnicodeCharToString(const C: TUnicodeChar): String; |
Express a single Unicode character code as a String that you can write. |
function UnicodeCharToReadableString(const C: TUnicodeChar): String; |
Like UnicodeCharToString, but if C is not a printable character (like ASCII control characters with code < 32), show it as '#' + character number.
Use this only for debugging, or to display error messages, because the output is not 100% unambiguous: if the original string contains a sequence like #xxx, we make no attempt to "quote" this sequence. Thus the output is ambiguous, both for human and machine processing. It is just "useful enough" for some cases of debugging output.
To have unambiguous output, use StringWithHtmlEntities. It uses HTML entity encoding and takes care to also quote the special '&'. Note that StringWithHtmlEntities also converts characters above 128, like Polish and Chinese, to numbers; whether this is more readable is up to your needs and depends on how you output this in practice. |
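The ambiguity mentioned above can be sketched as follows. The program name is hypothetical, and the comments assume that the "character number" is written in decimal, which the description does not state explicitly.

```pascal
program ReadableStringSketch;

{$ifdef FPC} {$mode objfpc}{$H+} {$endif}
{$ifdef MSWINDOWS} {$apptype CONSOLE} {$endif}

uses CastleUnicode;

begin
  { A printable character is returned as-is. }
  Writeln(UnicodeCharToReadableString(Ord('A')));

  { A control character (line feed, code 10) is shown as '#' + number. }
  Writeln(UnicodeCharToReadableString(10));

  { Three printable characters '#', '1', '0' are not quoted in any way,
    so (assuming decimal numbers) their combined output looks the same
    as the output for the single control character above. }
  Writeln(
    UnicodeCharToReadableString(Ord('#')) +
    UnicodeCharToReadableString(Ord('1')) +
    UnicodeCharToReadableString(Ord('0')));
end.
```

If the second and third lines print the same text, a reader of the log cannot tell which input produced it, which is why this routine is suggested only for debugging output.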
function StringWithHtmlEntities(const S: String): String; |
Convert all special Unicode characters in the given string to HTML entities. This is a helpful routine to visualize a string with any Unicode characters using simple ASCII.
"Special" Unicode characters are "anything outside of the safe ASCII range, which is between space and ASCII code 128".
The resulting string contains these special characters encoded as HTML entities that show the Unicode code point in hex.
Tip: You can check Unicode codes by going to e.g. https://codepoints.net/U+F3 for the code point U+F3.
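A minimal usage sketch, assuming a hypothetical program name and sample string. The exact text of the output is deliberately not stated here, because the description above does not show the precise entity format.

```pascal
program HtmlEntitiesSketch;

{$ifdef FPC} {$mode objfpc}{$H+} {$endif}
{$ifdef MSWINDOWS} {$apptype CONSOLE} {$endif}

uses CastleUnicode;

var
  S: String;
begin
  { Hypothetical sample: ASCII text with an ampersand, followed by
    the non-ASCII code point U+00F3. }
  S := 'a & ' + UnicodeCharToString($F3);

  { The result should contain only ASCII characters, with U+00F3
    (and, per UnicodeCharToReadableString's description, the '&')
    replaced by HTML entities. }
  Writeln(StringWithHtmlEntities(S));
end.
```

Because the result is plain ASCII, it can be written to a console or log file regardless of the terminal encoding.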
Types
TUnicodeChar = Cardinal; |
This item has no description. |
Generated by PasDoc 0.16.0-snapshot.