Unit al5strings

Description

Functions to integrate Pascal strings with Allegro's AL_STR. Also implements Allegro's Unicode support.

About string manipulation

By default, the Delphi RTL defines STRING as UNICODESTRING. Since Allegro expects ANSISTRING, you would need conversion functions such as UTF8ToString and UTF8Encode for things to work properly, which makes such code incompatible with Free Pascal.

This unit defines a collection of functions and procedures that work like the RTL string manipulation ones (i.e. those in the SysUtils and Strings units) but use the AL_STR type, ensuring your code will work in both Delphi and Free Pascal without changes. It also includes a few conversion functions in case you need them.
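
As a rough illustration of the conversion functions, the sketch below round-trips a native STRING through AL_STR. It assumes that the AL_STR type is made visible by a unit named al5base, which this page does not state.

```pascal
program ConvertSketch;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  al5base, al5strings; { al5base is assumed to declare AL_STR }

var
  Name: String;   { UnicodeString in Delphi, AnsiString in Free Pascal with $H+ }
  AlName: AL_STR;
begin
  Name := 'Allegro';
  { Native string to the type Allegro expects. }
  AlName := al_string_to_str (Name);
  { And back again, with no compiler-specific conversion calls. }
  Name := al_str_to_string (AlName);
  WriteLn (Name)
end.
```

The same source should compile with either compiler because the overloads of al_string_to_str cover SHORTSTRING, ANSISTRING and UNICODESTRING.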

About UTF-8 string routines

Some parts of the Allegro API, such as the font routines, expect Unicode strings encoded in UTF-8. The basic UTF-8 routines are provided to help you work with UTF-8 strings, but you are not required to use them.

Briefly, Unicode is a standard consisting of a large character set of over 100,000 characters, and rules, such as how to sort strings. A code point is the integer value of a character, but not all code points are characters, as some code points have other uses. Unlike legacy character sets, the set of code points is open ended and more are assigned with time.

Clearly it is impossible to represent each code point with an 8-bit byte (limited to 256 code points) or even a 16-bit integer (limited to 65536 code points). It is possible to store code points in 32-bit integers but it is space inefficient, and not actually that useful (at least, when handling the full complexity of Unicode; Allegro only does the very basics). There exist different Unicode Transformation Formats for encoding code points into smaller code units. The most important transformation formats are UTF-8 and UTF-16.

UTF-8 is a variable-length encoding which encodes each code point as one to four 8-bit bytes. UTF-8 has many nice properties, but the main advantages are that it is backwards compatible with C strings, and ASCII characters (code points in the range 0-127) are encoded in UTF-8 exactly as they would be in ASCII.

UTF-16 is another variable-length encoding, but encodes each code point as one or two 16-bit words. It is, of course, not compatible with traditional C strings. Allegro does not generally use UTF-16 strings.

Here is a diagram of the representation of the word "ål", with a NUL terminator, in both UTF-8 and UTF-16.

String            å               l               NUL
Code points       U+00E5 (229)    U+006C (108)    U+0000 (0)
UTF-8 bytes       0xC3, 0xA5      0x6C            0x00
UTF-16LE bytes    0xE5, 0x00      0x6C, 0x00      0x00, 0x00

You can see the aforementioned properties of UTF-8. The first code point U+00E5 ("å") is outside of the ASCII range (0-127) so is encoded to multiple code units – it requires two bytes. U+006C ("l") and U+0000 (NUL) both exist in the ASCII range so take exactly one byte each, as in a pure ASCII string. A zero byte never appears except to represent the NUL character, so many functions which expect C-style strings will work with UTF-8 strings without modification.
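
The byte values in the diagram can be reproduced by hand. The following is a minimal sketch in plain Pascal, independent of Allegro; EncodeUtf8 and TUtf8Buffer are hypothetical names introduced only for this illustration and are not part of al5strings.

```pascal
program Utf8EncodeSketch;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

type
  TUtf8Buffer = array [0..3] of Byte;

{ Hypothetical helper, not part of al5strings: encodes one code point
  into aBytes and returns the number of bytes used (0 if not encodable). }
function EncodeUtf8 (aCodePoint: LongWord; out aBytes: TUtf8Buffer): Integer;
begin
  FillChar (aBytes, SizeOf (aBytes), 0);
  if aCodePoint < $80 then
  begin
    { ASCII range: stored unchanged in a single byte. }
    aBytes[0] := Byte (aCodePoint);
    Result := 1
  end
  else if aCodePoint < $800 then
  begin
    aBytes[0] := Byte ($C0 or (aCodePoint shr 6));
    aBytes[1] := Byte ($80 or (aCodePoint and $3F));
    Result := 2
  end
  else if (aCodePoint >= $D800) and (aCodePoint <= $DFFF) then
    Result := 0 { surrogate values are reserved for UTF-16 }
  else if aCodePoint < $10000 then
  begin
    aBytes[0] := Byte ($E0 or (aCodePoint shr 12));
    aBytes[1] := Byte ($80 or ((aCodePoint shr 6) and $3F));
    aBytes[2] := Byte ($80 or (aCodePoint and $3F));
    Result := 3
  end
  else if aCodePoint <= $10FFFF then
  begin
    aBytes[0] := Byte ($F0 or (aCodePoint shr 18));
    aBytes[1] := Byte ($80 or ((aCodePoint shr 12) and $3F));
    aBytes[2] := Byte ($80 or ((aCodePoint shr 6) and $3F));
    aBytes[3] := Byte ($80 or (aCodePoint and $3F));
    Result := 4
  end
  else
    Result := 0 { beyond the Unicode range }
end;

var
  Buf: TUtf8Buffer;
  Count, Ndx: Integer;
begin
  Count := EncodeUtf8 ($E5, Buf); { U+00E5, the letter å: bytes $C3 $A5 }
  for Ndx := 0 to Count - 1 do
    Write (IntToHex (Buf[Ndx], 2), ' ');
  WriteLn
end.
```

In real code you would normally let Allegro do this work; al_utf8_width reports the encoded width of a code point and al_ustr_insert_chr performs the encoding.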

On the other hand, UTF-16 represents each code point by either one or two 16-bit code units (two or four bytes). The representation of each 16-bit code unit depends on the byte order; here we have demonstrated little endian.

Both UTF-8 and UTF-16 are self-synchronising. Starting from any offset within a string, it is efficient to find the beginning of the previous or next code point.
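
For UTF-8, self-synchronisation follows from the fact that continuation bytes always have the bit pattern 10xxxxxx. The sketch below shows the idea in plain Pascal; IsContinuation, CodePointStart and NextCodePointStart are hypothetical helpers written for this illustration, and they assume well-formed input.

```pascal
program Utf8SyncSketch;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

{ Hypothetical helpers, not part of al5strings. }

{ True for UTF-8 continuation bytes, which have the bit pattern 10xxxxxx. }
function IsContinuation (aByte: Byte): Boolean;
begin
  Result := (aByte and $C0) = $80
end;

{ Steps back from aPos to the first byte of the code point it falls in. }
function CodePointStart (const aData: array of Byte; aPos: Integer): Integer;
begin
  while (aPos > 0) and IsContinuation (aData[aPos]) do
    Dec (aPos);
  Result := aPos
end;

{ Steps forward from aPos to the first byte of the following code point,
  or to Length (aData) if there is none. }
function NextCodePointStart (const aData: array of Byte; aPos: Integer): Integer;
begin
  repeat
    Inc (aPos)
  until (aPos >= Length (aData)) or not IsContinuation (aData[aPos]);
  Result := aPos
end;

const
  Sample: array [0..2] of Byte = ($C3, $A5, $6C); { 'ål' encoded as UTF-8 }
begin
  WriteLn (CodePointStart (Sample, 1));     { offset 1 is inside å, which starts at 0 }
  WriteLn (NextCodePointStart (Sample, 0))  { l starts at offset 2 }
end.
```

Neither helper scans more than a few bytes, which is why stepping between code points is cheap regardless of the string length.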

Not all sequences of bytes or 16-bit words are valid UTF-8 and UTF-16 strings respectively. UTF-8 also has an additional problem of overlong forms, where a code point value is encoded using more bytes than is strictly necessary. This is invalid and needs to be guarded against.
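
One way to guard against overlong forms is to decode a sequence and then check that its length is the shortest one possible for the resulting value. The following is a sketch in plain Pascal, not Allegro's own implementation; MinimalWidth and DecodeUtf8 are hypothetical names used only here.

```pascal
program Utf8OverlongSketch;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

{ Hypothetical helpers, not part of al5strings. }

{ Shortest UTF-8 length for a code point, or 0 if it is out of range. }
function MinimalWidth (aCodePoint: LongWord): Integer;
begin
  if aCodePoint < $80 then Result := 1
  else if aCodePoint < $800 then Result := 2
  else if aCodePoint < $10000 then Result := 3
  else if aCodePoint <= $10FFFF then Result := 4
  else Result := 0
end;

{ Decodes the sequence starting at aPos. Returns its length in bytes, or 0
  if it is invalid; aCodePoint is only meaningful when the result is not 0. }
function DecodeUtf8 (const aData: array of Byte; aPos: Integer;
  out aCodePoint: LongWord): Integer;
var
  Lead: Byte;
  Len, Ndx: Integer;
begin
  Result := 0;
  aCodePoint := 0;
  if (aPos < 0) or (aPos > High (aData)) then Exit;
  Lead := aData[aPos];
  if Lead < $80 then begin Len := 1; aCodePoint := Lead end
  else if (Lead and $E0) = $C0 then begin Len := 2; aCodePoint := Lead and $1F end
  else if (Lead and $F0) = $E0 then begin Len := 3; aCodePoint := Lead and $0F end
  else if (Lead and $F8) = $F0 then begin Len := 4; aCodePoint := Lead and $07 end
  else Exit; { stray continuation byte or invalid lead byte }
  if aPos + Len - 1 > High (aData) then Exit; { truncated sequence }
  for Ndx := 1 to Len - 1 do
  begin
    if (aData[aPos + Ndx] and $C0) <> $80 then Exit; { not a continuation byte }
    aCodePoint := (aCodePoint shl 6) or (aData[aPos + Ndx] and $3F)
  end;
  { Overlong forms use more bytes than the minimum for their value. }
  if MinimalWidth (aCodePoint) <> Len then Exit;
  if (aCodePoint >= $D800) and (aCodePoint <= $DFFF) then Exit; { surrogates }
  Result := Len
end;

const
  Valid: array [0..1] of Byte = ($C3, $A5);    { U+00E5 }
  Overlong: array [0..1] of Byte = ($C0, $80); { U+0000 padded to two bytes }
var
  CodePoint: LongWord;
begin
  WriteLn (DecodeUtf8 (Valid, 0, CodePoint));    { accepted: length 2 }
  WriteLn (DecodeUtf8 (Overlong, 0, CodePoint))  { rejected: 0 }
end.
```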

In the "ustr" functions, be careful whether a function takes code unit (byte) or code point indices. In general, all position parameters are code unit offsets. This may be surprising, but it is required for good performance. (It also means some functions will work even if the strings do not contain UTF-8, since they only care about storing bytes, so you may actually store arbitrary data in ALLEGRO_USTRs.)

For actual text processing, where you want to specify positions with code point indices, you should use al_ustr_offset to find the code unit offset position. However, most of the time you would probably just work with byte offsets.
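
A sketch of that workflow, using only functions documented on this page: code point indices are translated with al_ustr_offset and the resulting byte offsets are passed to al_ref_ustr. It assumes that AL_INT and AL_STRptr are made visible by a unit named al5base, and that the Allegro library is available at run time.

```pascal
program OffsetSketch;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  al5base, al5strings; { al5base is assumed to declare AL_INT and AL_STRptr }

const
  { 'ål!' as raw UTF-8 bytes, so the source file encoding does not matter. }
  Raw: array [0..3] of Byte = ($C3, $A5, $6C, $21);
var
  us, Middle: ALLEGRO_USTRptr;
  Info: ALLEGRO_USTR_INFO;
  First, Last: AL_INT;
begin
  us := al_ustr_new_from_buffer (AL_STRptr (@Raw[0]), Length (Raw));
  try
    { Translate code point indices 1 and 2 into byte offsets. }
    First := al_ustr_offset (us, 1);
    Last := al_ustr_offset (us, 2);
    { Reference the byte interval [First, Last), i.e. the single letter l. }
    Middle := al_ref_ustr (Info, us, First, Last);
    WriteLn ('Byte interval: ', First, '..', Last);
    WriteLn ('Size in bytes: ', al_ustr_size (Middle))
  finally
    { Middle only references us, so only us is freed. }
    al_ustr_free (us)
  end
end.
```

Because å occupies two bytes, code point index 1 should map to byte offset 2 here, not 1.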

Overview

Functions and Procedures

function al_string_to_str (const aString: ShortString): AL_STR; overload; inline;
function al_string_to_str (const aString: AnsiString): AL_STR; overload; inline;
function al_string_to_str (const aString: UnicodeString): AL_STR; overload; inline;
function al_str_to_string (const aString: AL_STR): String; overload; inline;
function al_str_to_string (const aString: AL_STRptr): String; overload; inline;
function al_str_to_shortstring (const aString: AL_STR): ShortString; overload; inline;
function al_str_to_shortstring (const aString: AL_STRptr): ShortString; overload; inline;
function al_str_to_ansistring (const aString: AL_STR): AnsiString; overload; inline;
function al_str_to_ansistring (const aString: AL_STRptr): AnsiString; overload; inline;
function al_str_to_unicodestring (const aString: AL_STR): UnicodeString; overload; inline;
function al_str_to_unicodestring (const aString: AL_STRptr): UnicodeString; overload; inline;
function al_str_format (const aFmt: AL_STR; const aArgs : array of const) : AL_STR;
function al_ustr_new (const s: AL_STR): ALLEGRO_USTRptr; CDECL; external ALLEGRO_LIB_NAME;
function al_ustr_new_from_buffer (const s: AL_STRptr; size: AL_SIZE_T): ALLEGRO_USTRptr; CDECL; external ALLEGRO_LIB_NAME;
procedure al_ustr_free (us: ALLEGRO_USTRptr); CDECL; external ALLEGRO_LIB_NAME;
function al_cstr (const us: ALLEGRO_USTRptr): AL_STRptr; CDECL; external ALLEGRO_LIB_NAME;
procedure al_ustr_to_buffer (const us: ALLEGRO_USTRptr; buffer: AL_STRptr; size: AL_INT); CDECL; external ALLEGRO_LIB_NAME;
function al_cstr_dup (const us: ALLEGRO_USTRptr): AL_STRptr; CDECL; external ALLEGRO_LIB_NAME;
function al_ustr_dup (const us: ALLEGRO_USTRptr): ALLEGRO_USTRptr; CDECL; external ALLEGRO_LIB_NAME;
function al_ustr_dup_substr (const us: ALLEGRO_USTRptr; start_pos, end_pos: AL_INT): ALLEGRO_USTRptr; CDECL; external ALLEGRO_LIB_NAME;
function al_ustr_empty_string: ALLEGRO_USTRptr; CDECL; external ALLEGRO_LIB_NAME;
function al_ref_cstr (out info: ALLEGRO_USTR_INFO; const s: AL_STR): ALLEGRO_USTRptr; CDECL; external ALLEGRO_LIB_NAME;
function al_ref_buffer (out info: ALLEGRO_USTR_INFO; const s: AL_STRptr; size: AL_SIZE_T): ALLEGRO_USTRptr; CDECL; external ALLEGRO_LIB_NAME;
function al_ref_ustr (out info: ALLEGRO_USTR_INFO; const us: ALLEGRO_USTRptr; star_pos, end_pos: AL_INT): ALLEGRO_USTRptr; CDECL; external ALLEGRO_LIB_NAME;
function al_ustr_size (const us: ALLEGRO_USTRptr): AL_SIZE_T; CDECL; external ALLEGRO_LIB_NAME;
function al_ustr_length (const us: ALLEGRO_USTRptr): AL_SIZE_T; CDECL; external ALLEGRO_LIB_NAME;
function al_ustr_offset (const us: ALLEGRO_USTRptr;index: AL_INT): AL_INT; CDECL; external ALLEGRO_LIB_NAME;
function al_ustr_next (const us: ALLEGRO_USTRptr; var aPos: AL_INT): AL_BOOL; CDECL; external ALLEGRO_LIB_NAME;
function al_ustr_prev (const us: ALLEGRO_USTRptr; var aPos: AL_INT): AL_BOOL; CDECL; external ALLEGRO_LIB_NAME;
function al_ustr_insert_chr (us: ALLEGRO_USTRptr; aPos: AL_INT; c: AL_INT32) : AL_SIZE_T; CDECL; external ALLEGRO_LIB_NAME;
function al_ustr_remove_chr (us: ALLEGRO_USTRptr; apos: AL_INT): AL_BOOL; CDECL; external ALLEGRO_LIB_NAME;
function al_ustr_assign (us1: ALLEGRO_USTRptr; const us2: ALLEGRO_USTRptr): AL_BOOL; CDECL; external ALLEGRO_LIB_NAME;
function al_ustr_assign_cstr (us1: ALLEGRO_USTRptr; const s: AL_STR): AL_BOOL; CDECL; external ALLEGRO_LIB_NAME;
function al_ustr_equal (const us1, us2: ALLEGRO_USTRptr): AL_BOOL; CDECL; external ALLEGRO_LIB_NAME;
function al_ustr_compare (const u, v: ALLEGRO_USTRptr): AL_INT; CDECL; external ALLEGRO_LIB_NAME;
function al_ustr_ncompare (const u, v: ALLEGRO_USTRptr): AL_INT; CDECL; external ALLEGRO_LIB_NAME;
function al_utf8_width (c: AL_INT32): AL_SIZE_T; CDECL; external ALLEGRO_LIB_NAME;

Types

ALLEGRO_USTRptr = ^ALLEGRO_USTR;
ALLEGRO_USTR = _al_tagbstring;
ALLEGRO_USTR_INFOptr = ^ALLEGRO_USTR_INFO;
ALLEGRO_USTR_INFO = _al_tagbstring;

Description

Functions and Procedures

function al_string_to_str (const aString: ShortString): AL_STR; overload; inline;
 
function al_string_to_str (const aString: AnsiString): AL_STR; overload; inline;
 
function al_string_to_str (const aString: UnicodeString): AL_STR; overload; inline;

Converts Pascal strings to AL_STR.

function al_str_to_string (const aString: AL_STR): String; overload; inline;
 
function al_str_to_string (const aString: AL_STRptr): String; overload; inline;

Converts AL_STR or AL_STRptr to a STRING.

function al_str_to_shortstring (const aString: AL_STR): ShortString; overload; inline;
 
function al_str_to_shortstring (const aString: AL_STRptr): ShortString; overload; inline;

Converts AL_STR or AL_STRptr to a SHORTSTRING.

function al_str_to_ansistring (const aString: AL_STR): AnsiString; overload; inline;
 
function al_str_to_ansistring (const aString: AL_STRptr): AnsiString; overload; inline;

Converts AL_STR or AL_STRptr to an ANSISTRING.

function al_str_to_unicodestring (const aString: AL_STR): UnicodeString; overload; inline;
 
function al_str_to_unicodestring (const aString: AL_STRptr): UnicodeString; overload; inline;

Converts AL_STR or AL_STRptr to a UNICODESTRING.

function al_str_format (const aFmt: AL_STR; const aArgs : array of const) : AL_STR;

Formats a string with given arguments.

It works exactly like the RTL's SysUtils.Format, but uses AL_STR instead of STRING.
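
A short usage sketch, assuming the usual SysUtils.Format placeholders apply and that AL_STR is made visible by a unit named al5base (neither the unit name nor the sample values come from this page):

```pascal
program FormatSketch;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  al5base, al5strings; { al5base is assumed to declare AL_STR }

var
  Msg: AL_STR;
begin
  { Same placeholder syntax as SysUtils.Format: %d for integers, %s for strings. }
  Msg := al_str_format ('Score: %d, player: %s', [1200, 'Ana']);
  WriteLn (Msg)
end.
```

The result is already an AL_STR, so it can be passed to Allegro routines without a further conversion.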

function al_ustr_new (const s: AL_STR): ALLEGRO_USTRptr; CDECL; external ALLEGRO_LIB_NAME;

Creates a new string containing a copy of the C-style string s. The string must eventually be freed with al_ustr_free.
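
A minimal sketch of the create/use/free pattern, using only functions documented on this page and assuming the Allegro library is available at run time:

```pascal
program UstrLifetimeSketch;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  al5strings;

var
  us: ALLEGRO_USTRptr;
begin
  us := al_ustr_new ('Hello');
  try
    WriteLn ('Size in bytes: ', al_ustr_size (us));
    WriteLn ('Code points: ', al_ustr_length (us))
  finally
    { The string owns its storage, so it is released exactly once. }
    al_ustr_free (us)
  end
end.
```

Wrapping the work in try..finally keeps the string from leaking if an exception is raised in between.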

See also
al_ustr_new_from_buffer
Creates a new string containing a copy of the buffer pointed to by s of the given size in bytes.
al_ustr_assign
Overwrites the string us1 with another string us2.
al_ustr_dup
Returns a duplicate copy of a string.

function al_ustr_new_from_buffer (const s: AL_STRptr; size: AL_SIZE_T): ALLEGRO_USTRptr; CDECL; external ALLEGRO_LIB_NAME;

Creates a new string containing a copy of the buffer pointed to by s of the given size in bytes. The string must eventually be freed with al_ustr_free.

See also
al_ustr_new
Creates a new string containing a copy of the C-style string s.

procedure al_ustr_free (us: ALLEGRO_USTRptr); CDECL; external ALLEGRO_LIB_NAME;

Frees a previously allocated string. Does nothing if the argument is Nil.

See also
al_ustr_new
Creates a new string containing a copy of the C-style string s.
al_ustr_new_from_buffer
Creates a new string containing a copy of the buffer pointed to by s of the given size in bytes.

function al_cstr (const us: ALLEGRO_USTRptr): AL_STRptr; CDECL; external ALLEGRO_LIB_NAME;

Gets an AL_STRptr pointer to the data in a string. This pointer will only be valid while the ALLEGRO_USTR object is not modified and not destroyed. The pointer may be passed to functions expecting C-style strings, with the following caveats:

  • ALLEGRO_USTRs are allowed to contain embedded NUL ($00) bytes. That means al_ustr_size (u) and Length (al_cstr (u)) may not agree.

  • An ALLEGRO_USTR may be created in such a way that it is not NUL terminated. A string which is dynamically allocated will always be NUL terminated, but a string which references the middle of another string or region of memory will not be NUL terminated.

  • If the ALLEGRO_USTR references another string, the returned C string will point into the referenced string. Again, no NUL terminator will be added to the referenced string.
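
The first caveat can be observed with a string that contains an embedded NUL. This is a sketch using functions documented on this page plus StrLen from SysUtils; it assumes the Allegro library is available at run time.

```pascal
program CstrCaveatSketch;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, al5strings;

const
  { Five bytes with a NUL in the middle. }
  Raw: array [0..4] of AnsiChar = ('a', 'b', #0, 'c', 'd');
var
  us: ALLEGRO_USTRptr;
begin
  us := al_ustr_new_from_buffer (@Raw[0], Length (Raw));
  try
    { Allegro tracks the size itself, so all five bytes are counted. }
    WriteLn ('al_ustr_size: ', al_ustr_size (us));
    { A C-style scan stops at the first NUL and sees only the 'ab' prefix. }
    WriteLn ('StrLen: ', StrLen (al_cstr (us)))
  finally
    al_ustr_free (us)
  end
end.
```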

See also
al_ustr_to_buffer
Writes the contents of the string into a pre-allocated buffer of the given size in bytes.
al_cstr_dup
Creates a NUL ($00) terminated copy of the string.
al_ustr_assign_cstr
Overwrites the string us1 with the contents of the string s.

procedure al_ustr_to_buffer (const us: ALLEGRO_USTRptr; buffer: AL_STRptr; size: AL_INT); CDECL; external ALLEGRO_LIB_NAME;

Writes the contents of the string into a pre-allocated buffer of the given size in bytes. The result will always be NUL terminated, so a maximum of size - 1 bytes will be copied.

See also
al_cstr
Gets an AL_STRptr pointer to the data in a string.
al_cstr_dup
Creates a NUL ($00) terminated copy of the string.

function al_cstr_dup (const us: ALLEGRO_USTRptr): AL_STRptr; CDECL; external ALLEGRO_LIB_NAME;

Creates a NUL ($00) terminated copy of the string. Any embedded NUL bytes will still be present in the returned string. The new string must eventually be freed with al_free.

If an error occurs Nil is returned.

See also
al_cstr
Gets an AL_STRptr pointer to the data in a string.
al_ustr_to_buffer
Writes the contents of the string into a pre-allocated buffer of the given size in bytes.
al_free
Like FreeMem, releases the memory occupied by pointer p.

function al_ustr_dup (const us: ALLEGRO_USTRptr): ALLEGRO_USTRptr; CDECL; external ALLEGRO_LIB_NAME;

Returns a duplicate copy of a string. The new string will need to be freed with al_ustr_free.

See also
al_ustr_dup_substr
Returns a new copy of a string, containing its contents in the byte interval [start_pos, end_pos).
al_ustr_free
Frees a previously allocated string.

function al_ustr_dup_substr (const us: ALLEGRO_USTRptr; start_pos, end_pos: AL_INT): ALLEGRO_USTRptr; CDECL; external ALLEGRO_LIB_NAME;

Returns a new copy of a string, containing its contents in the byte interval [start_pos, end_pos). The new string will be NUL terminated and will need to be freed with al_ustr_free.

If necessary, use al_ustr_offset to find the byte offsets for a given code point that you are interested in.

Note

This function exists because of the way the C language works. I have not tested whether Pascal needs it. Future versions of Allegro.pas might not include this function, so don't use it unless you really need to (and tell me if you do, so I can remove this warning from the documentation).

See also
al_ustr_dup
Returns a duplicate copy of a string.
al_ustr_free
Frees a previously allocated string.

function al_ustr_empty_string: ALLEGRO_USTRptr; CDECL; external ALLEGRO_LIB_NAME;

Returns a pointer to a static empty string. The string is read only and must not be freed.

function al_ref_cstr (out info: ALLEGRO_USTR_INFO; const s: AL_STR): ALLEGRO_USTRptr; CDECL; external ALLEGRO_LIB_NAME;

Creates a string that references the storage of a C-style string. The information about the string (e.g. its size) is stored in the info parameter. The string will not have any other storage allocated of its own, so if you allocate the info structure on the stack then no explicit "free" operation is required.

The string is valid until the underlying C string disappears.

Example:

VAR
  Info: ALLEGRO_USTR_INFO;
  us: ALLEGRO_USTRptr;
BEGIN
  us := al_ref_cstr (Info, 'my string')
END;

See also
al_ref_buffer
Creates a string that references the storage of an underlying buffer.
al_ref_ustr
Creates a read-only string that references the storage of another ALLEGRO_USTR string.

function al_ref_buffer (out info: ALLEGRO_USTR_INFO; const s: AL_STRptr; size: AL_SIZE_T): ALLEGRO_USTRptr; CDECL; external ALLEGRO_LIB_NAME;

Creates a string that references the storage of an underlying buffer. The size of the buffer is given in bytes. You can use it to reference only part of a string or an arbitrary region of memory.

The string is valid while the underlying memory buffer is valid.

See also
al_ref_cstr
Creates a string that references the storage of a C-style string.
al_ref_ustr
Creates a read-only string that references the storage of another ALLEGRO_USTR string.

function al_ref_ustr (out info: ALLEGRO_USTR_INFO; const us: ALLEGRO_USTRptr; star_pos, end_pos: AL_INT): ALLEGRO_USTRptr; CDECL; external ALLEGRO_LIB_NAME;

Creates a read-only string that references the storage of another ALLEGRO_USTR string. The information about the string (e.g. its size) is stored in the structure pointed to by the info parameter. The new string will not have any other storage allocated of its own, so if you allocate the info structure on the stack then no explicit "free" operation is required.

The referenced interval is [start_pos, end_pos). Both are byte offsets.

The string is valid until the underlying string is modified or destroyed.

If you need a range of code points instead of bytes, use al_ustr_offset to find the byte offsets.

See also
al_ref_cstr
Creates a string that references the storage of a C-style string.
al_ref_buffer
Creates a string that references the storage of an underlying buffer.

function al_ustr_size (const us: ALLEGRO_USTRptr): AL_SIZE_T; CDECL; external ALLEGRO_LIB_NAME;

Returns the size of the string in bytes. This is equal to the number of code points in the string if the string is empty or contains only 7-bit ASCII characters.

See also
al_ustr_length
Returns the number of code points in the string.

function al_ustr_length (const us: ALLEGRO_USTRptr): AL_SIZE_T; CDECL; external ALLEGRO_LIB_NAME;

Returns the number of code points in the string.

See also
al_ustr_size
Returns the size of the string in bytes.
al_ustr_offset
Returns the byte offset (from the start of the string) of the code point at the specified index in the string.

function al_ustr_offset (const us: ALLEGRO_USTRptr;index: AL_INT): AL_INT; CDECL; external ALLEGRO_LIB_NAME;

Returns the byte offset (from the start of the string) of the code point at the specified index in the string. An index of zero returns the offset of the first code point of the string. If index is negative, it counts backward from the end of the string, so an index of -1 returns the offset of the last code point.

If the index is past the end of the string, returns the offset of the end of the string.

See also
al_ustr_length
Returns the number of code points in the string.

function al_ustr_next (const us: ALLEGRO_USTRptr; var aPos: AL_INT): AL_BOOL; CDECL; external ALLEGRO_LIB_NAME;

Finds the byte offset of the next code point in string, beginning at aPos. aPos does not have to be at the beginning of a code point.

This function just looks for an appropriate byte; it doesn't check if the found offset is the beginning of a valid code point. If you are working with possibly invalid UTF-8 strings then it could skip over some invalid bytes.

Returns

True on success, and aPos will be updated to the found offset. Otherwise returns False if aPos was already at the end of the string, and aPos is unmodified.
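
A sketch of walking a string with al_ustr_next, based on the return-value rules above. It assumes that AL_INT and AL_STRptr are made visible by a unit named al5base and that the Allegro library is available at run time.

```pascal
program NextSketch;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  al5base, al5strings; { al5base is assumed to declare AL_INT and AL_STRptr }

const
  { 'ål' as raw UTF-8 bytes, so the source file encoding does not matter. }
  Raw: array [0..2] of Byte = ($C3, $A5, $6C);
var
  us: ALLEGRO_USTRptr;
  Ofs: AL_INT;
  Count: Integer;
begin
  us := al_ustr_new_from_buffer (AL_STRptr (@Raw[0]), Length (Raw));
  try
    Ofs := 0;
    Count := 0;
    { Each successful call moves Ofs past one code point; the call made
      at the end of the string returns False and ends the loop. }
    while al_ustr_next (us, Ofs) do
      Inc (Count);
    WriteLn ('Steps taken: ', Count);
    WriteLn ('al_ustr_length: ', al_ustr_length (us))
  finally
    al_ustr_free (us)
  end
end.
```

For valid UTF-8 the two printed values should agree, since every step advances over exactly one code point.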

See also
al_ustr_prev
Finds the byte offset of the previous code point in string, before aPos.

function al_ustr_prev (const us: ALLEGRO_USTRptr; var aPos: AL_INT): AL_BOOL; CDECL; external ALLEGRO_LIB_NAME;

Finds the byte offset of the previous code point in string, before aPos. aPos does not have to be at the beginning of a code point.

This function just looks for an appropriate byte; it doesn't check if the found offset is the beginning of a valid code point. If you are working with possibly invalid UTF-8 strings then it could skip over some invalid bytes.

Returns

True on success, and aPos will be updated to the found offset. Otherwise returns False if aPos was already at the start of the string, and aPos is unmodified.

See also
al_ustr_next
Finds the byte offset of the next code point in string, beginning at aPos.

function al_ustr_insert_chr (us: ALLEGRO_USTRptr; aPos: AL_INT; c: AL_INT32) : AL_SIZE_T; CDECL; external ALLEGRO_LIB_NAME;

Inserts a code point into us beginning at byte offset aPos. aPos cannot be less than 0. If aPos is past the end of us then the space between the end of the string and aPos will be padded with NUL ($00) bytes.

Returns

The number of bytes inserted, or 0 on error.

See also
al_ustr_offset
Returns the byte offset (from the start of the string) of the code point at the specified index in the string.
al_ustr_remove_chr
Removes the code point beginning at byte offset apos.

function al_ustr_remove_chr (us: ALLEGRO_USTRptr; apos: AL_INT): AL_BOOL; CDECL; external ALLEGRO_LIB_NAME;

Removes the code point beginning at byte offset apos.

Use al_ustr_offset to find the byte offset for a code point index.

Returns

True on success. If apos is out of range or apos is not the beginning of a valid code point, returns False leaving the string unmodified.

See also
al_ustr_offset
Returns the byte offset (from the start of the string) of the code point at the specified index in the string.
al_ustr_insert_chr
Inserts a code point into us beginning at byte offset aPos.

function al_ustr_assign (us1: ALLEGRO_USTRptr; const us2: ALLEGRO_USTRptr): AL_BOOL; CDECL; external ALLEGRO_LIB_NAME;

Overwrites the string us1 with another string us2.

Returns

True on success, False on error.

See also
al_ustr_assign_cstr
Overwrites the string us1 with the contents of the string s.

function al_ustr_assign_cstr (us1: ALLEGRO_USTRptr; const s: AL_STR): AL_BOOL; CDECL; external ALLEGRO_LIB_NAME;

Overwrites the string us1 with the contents of the string s.

Returns

True on success, False on error.

See also
al_ustr_assign
Overwrites the string us1 with another string us2.

function al_ustr_equal (const us1, us2: ALLEGRO_USTRptr): AL_BOOL; CDECL; external ALLEGRO_LIB_NAME;

Returns True if the two strings are equal. This function is more efficient than al_ustr_compare so is preferable if ordering is not important.

See also
al_ustr_compare
This function compares u and v by code point values.

function al_ustr_compare (const u, v: ALLEGRO_USTRptr): AL_INT; CDECL; external ALLEGRO_LIB_NAME;

This function compares u and v by code point values. It returns zero if the strings are equal, a positive number if u comes after v, otherwise a negative number.

This does not take into account locale-specific sorting rules. For that you will need to use another library.

See also
al_ustr_ncompare
This function compares u and v by code point values.
al_ustr_equal
Returns True if the two strings are equal.

function al_ustr_ncompare (const u, v: ALLEGRO_USTRptr): AL_INT; CDECL; external ALLEGRO_LIB_NAME;

This function compares u and v by code point values. It returns zero if the strings are equal, a positive number if u comes after v, otherwise a negative number.

This does not take into account locale-specific sorting rules. For that you will need to use another library.

See also
al_ustr_compare
This function compares u and v by code point values.
al_ustr_equal
Returns True if the two strings are equal.

function al_utf8_width (c: AL_INT32): AL_SIZE_T; CDECL; external ALLEGRO_LIB_NAME;

Returns the number of bytes that would be occupied by the specified code point when encoded in UTF-8. This is between 1 and 4 bytes for legal code point values. Otherwise returns 0.

Types

ALLEGRO_USTRptr = ^ALLEGRO_USTR;

Pointer to ALLEGRO_USTR.

ALLEGRO_USTR = _al_tagbstring;

An opaque type representing a string. ALLEGRO_USTRs normally contain UTF-8 encoded strings, but they may be used to hold any byte sequences, including NUL ($00) bytes.

ALLEGRO_USTR_INFOptr = ^ALLEGRO_USTR_INFO;

Pointer to ALLEGRO_USTR_INFO.

ALLEGRO_USTR_INFO = _al_tagbstring;

A type that holds additional information for an ALLEGRO_USTR that references an external memory buffer.

See also
al_ref_cstr
Creates a string that references the storage of a C-style string.
al_ref_buffer
Creates a string that references the storage of an underlying buffer.
al_ref_ustr
Creates a read-only string that references the storage of another ALLEGRO_USTR string.

Generated by PasDoc 0.15.0. Generated on 2024-11-10 15:15:06.