Allegro.pas 5.2.0
Unit al5strings
Description
Functions to integrate Pascal String with Allegro AL_STR. Also implements Allegro's UNICODE support.
About string manipulation
By default, the Delphi RTL defines STRING as UNICODESTRING. Since Allegro expects ANSISTRING, you would have to use conversion functions such as UTF8ToString and UTF8Encode for things to work properly, which makes such code incompatible with Free Pascal.
This unit defines a collection of functions and procedures that work like the RTL string manipulation ones (i.e. the SysUtils and Strings units) but use the AL_STR type, ensuring your code will work in both Delphi and Free Pascal without changes. It also includes a few conversion functions in case you need them.
About UTF-8 string routines
Some parts of the Allegro API, such as the font routines, expect Unicode strings encoded in UTF-8. The UTF8 basic routines are provided to help you work with UTF-8 strings, however it does not mean you need to use them.
Briefly, Unicode is a standard consisting of a large character set of over 100,000 characters, and rules, such as how to sort strings. A code point is the integer value of a character, but not all code points are characters, as some code points have other uses. Unlike legacy character sets, the set of code points is open ended and more are assigned with time.
Clearly it is impossible to represent each code point with an 8-bit byte (limited to 256 code points) or even a 16-bit integer (limited to 65536 code points). It is possible to store code points in 32-bit integers but it is space inefficient, and not actually that useful (at least, when handling the full complexity of Unicode; Allegro only does the very basics). There exist different Unicode Transformation Formats for encoding code points into smaller code units. The most important transformation formats are UTF-8 and UTF-16.
UTF-8 is a variable-length encoding which encodes each code point to between one and four 8-bit bytes each. UTF-8 has many nice properties, but the main advantages are that it is backwards compatible with C strings, and ASCII characters (code points in the range 0-127) are encoded in UTF-8 exactly as they would be in ASCII.
UTF-16 is another variable-length encoding, but encodes each code point to one or two 16-bit words each. It is, of course, not compatible with traditional C strings. Allegro does not generally use UTF-16 strings.
Here is a diagram of the representation of the word "ål", with a NUL terminator, in both UTF-8 and UTF-16.
String         | å            | l            | NUL
Code points    | U+00E5 (229) | U+006C (108) | U+0000 (0)
UTF-8 bytes    | 0xC3, 0xA5   | 0x6C         | 0x00
UTF-16LE bytes | 0xE5, 0x00   | 0x6C, 0x00   | 0x00, 0x00
You can see the aforementioned properties of UTF-8. The first code point U+00E5 ("å") is outside of the ASCII range (0-127) so is encoded to multiple code units – it requires two bytes. U+006C ("l") and U+0000 (NUL) both exist in the ASCII range so take exactly one byte each, as in a pure ASCII string. A zero byte never appears except to represent the NUL character, so many functions which expect C-style strings will work with UTF-8 strings without modification.
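The UTF-8 bytes for these three code points can be reproduced with a few lines of plain Free Pascal. This is only a sketch of the encoding rule, not the code Allegro uses; EncodeUtf8Hex and HexByte are hypothetical helper names, and surrogate code points are not rejected to keep it short.

```pascal
PROGRAM Utf8EncodeDemo;
{$mode objfpc}{$H+}

USES
  SysUtils;

{ Formats one byte value as 0xNN. }
FUNCTION HexByte (aValue: LongInt): String;
BEGIN
  Result := '0x' + IntToHex (aValue, 2)
END;

{ Encodes a single code point as UTF-8 and returns the bytes as a
  comma separated hex list.  Returns an empty string for values
  outside the range 0..$10FFFF. }
FUNCTION EncodeUtf8Hex (aCodePoint: LongInt): String;
BEGIN
  IF (aCodePoint < 0) OR (aCodePoint > $10FFFF) THEN
    Result := ''
  ELSE IF aCodePoint <= $7F THEN
    { ASCII range: one byte, identical to the ASCII value. }
    Result := HexByte (aCodePoint)
  ELSE IF aCodePoint <= $7FF THEN
    { Two bytes: 110xxxxx 10xxxxxx. }
    Result := HexByte ($C0 OR (aCodePoint SHR 6)) + ', '
            + HexByte ($80 OR (aCodePoint AND $3F))
  ELSE IF aCodePoint <= $FFFF THEN
    { Three bytes: 1110xxxx 10xxxxxx 10xxxxxx. }
    Result := HexByte ($E0 OR (aCodePoint SHR 12)) + ', '
            + HexByte ($80 OR ((aCodePoint SHR 6) AND $3F)) + ', '
            + HexByte ($80 OR (aCodePoint AND $3F))
  ELSE
    { Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx. }
    Result := HexByte ($F0 OR (aCodePoint SHR 18)) + ', '
            + HexByte ($80 OR ((aCodePoint SHR 12) AND $3F)) + ', '
            + HexByte ($80 OR ((aCodePoint SHR 6) AND $3F)) + ', '
            + HexByte ($80 OR (aCodePoint AND $3F))
END;

BEGIN
  WriteLn (EncodeUtf8Hex ($E5));  { prints 0xC3, 0xA5 }
  WriteLn (EncodeUtf8Hex ($6C));  { prints 0x6C }
  WriteLn (EncodeUtf8Hex ($00))   { prints 0x00 }
END.
```

The continuation bytes always have their high bit set, which is why a zero byte can only ever stand for the NUL character.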
On the other hand, UTF-16 represents each code point by either one or two 16-bit code units (two or four bytes). The representation of each 16-bit code unit depends on the byte order; here we have demonstrated little endian.
Both UTF-8 and UTF-16 are self-synchronising. Starting from any offset within a string, it is efficient to find the beginning of the previous or next code point.
Not all sequences of bytes or 16-bit words are valid UTF-8 and UTF-16 strings respectively. UTF-8 also has an additional problem of overlong forms, where a code point value is encoded using more bytes than is strictly necessary. This is invalid and needs to be guarded against.
In the "ustr" functions, be careful whether a function takes code unit (byte) or code point indices. In general, all position parameters are in code unit offsets. This may be surprising, but if you think about it, it is required for good performance. (It also means some functions will work even if the strings do not contain UTF-8, since they only care about storing bytes, so you may actually store arbitrary data in ALLEGRO_USTRs.)
For actual text processing, where you want to specify positions with code point indices, you should use al_ustr_offset to find the code unit offset position. However, most of the time you would probably just work with byte offsets.
Functions and Procedures
function al_string_to_str (const aString: ShortString): AL_STR; overload; inline; |
|
function al_string_to_str (const aString: AnsiString): AL_STR; overload; inline; |
|
function al_string_to_str (const aString: UnicodeString): AL_STR; overload; inline; |
Converts Pascal strings to AL_STR.
|
function al_str_to_string (const aString: AL_STR): String; overload; inline; |
|
function al_str_to_string (const aString: AL_STRptr): String; overload; inline; |
Converts AL_STR or AL_STRptr to a STRING.
|
function al_str_to_shortstring (const aString: AL_STR): ShortString; overload; inline; |
|
function al_str_to_shortstring (const aString: AL_STRptr): ShortString; overload; inline; |
Converts AL_STR or AL_STRptr to a SHORTSTRING.
|
function al_str_to_ansistring (const aString: AL_STR): AnsiString; overload; inline; |
|
function al_str_to_ansistring (const aString: AL_STRptr): AnsiString; overload; inline; |
Converts AL_STR or AL_STRptr to an ANSISTRING.
|
function al_str_to_unicodestring (const aString: AL_STR): UnicodeString; overload; inline; |
|
function al_str_to_unicodestring (const aString: AL_STRptr): UnicodeString; overload; inline; |
Converts AL_STR or AL_STRptr to a UNICODESTRING.
|
function al_str_format (const aFmt: AL_STR; const aArgs : array of const) : AL_STR; |
Formats a string with given arguments.
It works exactly like RTL SysUtils.Format but using AL_STR instead of STRING.
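A minimal usage sketch, assuming the unit is available as al5strings and that the program is linked against Allegro. The format specifiers are the usual SysUtils.Format ones; the literal values are made up for illustration.

```pascal
PROGRAM FormatDemo;

USES
  al5strings;

BEGIN
  { Build an AL_STR with al_str_format, then convert it to a native
    STRING so WriteLn behaves the same in Delphi and Free Pascal. }
  WriteLn (al_str_to_string (al_str_format ('%s scored %d points', ['Ana', 42])))
END.
```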
|
function al_ustr_new (const s: AL_STR): ALLEGRO_USTRptr; CDECL; external ALLEGRO_LIB_NAME; |
Creates a new string containing a copy of the C-style string s. The string must eventually be freed with al_ustr_free.
See also
- al_ustr_new_from_buffer
- Creates a new string containing a copy of the buffer pointed to by s of the given size in bytes.
- al_ustr_assign
- Overwrites the string us1 with another string us2.
- al_ustr_dup
- Returns a duplicate copy of a string.
|
function al_ustr_new_from_buffer (const s: AL_STRptr; size: AL_SIZE_T): ALLEGRO_USTRptr; CDECL; external ALLEGRO_LIB_NAME; |
Creates a new string containing a copy of the buffer pointed to by s of the given size in bytes. The string must eventually be freed with al_ustr_free.
See also
- al_ustr_new
- Creates a new string containing a copy of the C-style string s.
|
procedure al_ustr_free (us: ALLEGRO_USTRptr); CDECL; external ALLEGRO_LIB_NAME; |
Frees a previously allocated string. Does nothing if the argument is Nil.
See also
- al_ustr_new
- Creates a new string containing a copy of the C-style string s.
- al_ustr_new_from_buffer
- Creates a new string containing a copy of the buffer pointed to by s of the given size in bytes.
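The create/use/free cycle described above might look like the following sketch. It assumes ALLEGRO_USTRptr is exported by al5strings and that the program is linked against Allegro; the Nil check is purely defensive, since this page does not say what al_ustr_new returns on failure.

```pascal
PROGRAM UstrLifecycle;

USES
  al5strings;

VAR
  us: ALLEGRO_USTRptr;
BEGIN
  { Allocate a new ALLEGRO_USTR holding a copy of the literal. }
  us := al_ustr_new ('Hello');
  IF us <> NIL THEN
  BEGIN
    { al_cstr gives access to the bytes; convert them for output. }
    WriteLn (al_str_to_string (al_cstr (us)));
    { Every string created with al_ustr_new must be released. }
    al_ustr_free (us)
  END
END.
```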
|
function al_cstr (const us: ALLEGRO_USTRptr): AL_STRptr; CDECL; external ALLEGRO_LIB_NAME; |
Gets an AL_STRptr pointer to the data in a string. This pointer will only be valid while the ALLEGRO_USTR object is not modified and not destroyed. The pointer may be passed to functions expecting C-style strings, with the following caveats:
- ALLEGRO_USTRs are allowed to contain embedded NUL ($00) bytes. That means al_ustr_size (u) and Length (al_cstr (u)) may not agree.
- An ALLEGRO_USTR may be created in such a way that it is not NUL terminated. A string which is dynamically allocated will always be NUL terminated, but a string which references the middle of another string or region of memory will not be NUL terminated.
- If the ALLEGRO_USTR references another string, the returned C string will point into the referenced string. Again, no NUL terminator will be added to the referenced string.
See also
- al_ustr_to_buffer
- Writes the contents of the string into a pre-allocated buffer of the given size in bytes.
- al_cstr_dup
- Creates a NUL ($00) terminated copy of the string.
- al_ustr_assign_cstr
- Overwrites the string us1 with the contents of the string s.
|
procedure al_ustr_to_buffer (const us: ALLEGRO_USTRptr; buffer: AL_STRptr; size: AL_INT); CDECL; external ALLEGRO_LIB_NAME; |
Writes the contents of the string into a pre-allocated buffer of the given size in bytes. The result will always be NUL terminated, so a maximum of size - 1 bytes will be copied.
See also
- al_cstr
- Gets an AL_STRptr pointer to the data in a string.
- al_cstr_dup
- Creates a NUL ($00) terminated copy of the string.
|
function al_cstr_dup (const us: ALLEGRO_USTRptr): AL_STRptr; CDECL; external ALLEGRO_LIB_NAME; |
Creates a NUL ($00) terminated copy of the string. Any embedded NUL bytes will still be present in the returned string. The new string must eventually be freed with al_free.
If an error occurs Nil is returned.
See also
- al_cstr
- Gets an AL_STRptr pointer to the data in a string.
- al_ustr_to_buffer
- Writes the contents of the string into a pre-allocated buffer of the given size in bytes.
- al_free
- Like FreeMem, releases the memory occupied by pointer p.
|
function al_ustr_dup (const us: ALLEGRO_USTRptr): ALLEGRO_USTRptr; CDECL; external ALLEGRO_LIB_NAME; |
Returns a duplicate copy of a string. The new string will need to be freed with al_ustr_free.
See also
- al_ustr_dup_substr
- Returns a new copy of a string, containing its contents in the byte interval [start_pos, end_pos).
- al_ustr_free
- Frees a previously allocated string.
|
function al_ustr_dup_substr (const us: ALLEGRO_USTRptr; start_pos, end_pos: AL_INT): ALLEGRO_USTRptr; CDECL; external ALLEGRO_LIB_NAME; |
Returns a new copy of a string, containing its contents in the byte interval [start_pos, end_pos). The new string will be NUL terminated and will need to be freed with al_ustr_free.
If necessary, use al_ustr_offset to find the byte offsets for a given code point that you are interested in.
Note
This exists because of the way the C language works. I have not tested whether Pascal needs this kind of thing. Future versions of Allegro.pas might not include this function, so don't use it unless you really need to (and tell me if you really need it, so that this warning can be removed from the documentation).
See also
- al_ustr_dup
- Returns a duplicate copy of a string.
- al_ustr_free
- Frees a previously allocated string.
|
function al_ustr_empty_string: ALLEGRO_USTRptr; CDECL; external ALLEGRO_LIB_NAME; |
Returns a pointer to a static empty string. The string is read only and must not be freed.
|
function al_ref_cstr (out info: ALLEGRO_USTR_INFO; const s: AL_STR): ALLEGRO_USTRptr; CDECL; external ALLEGRO_LIB_NAME; |
Creates a string that references the storage of a C-style string. The information about the string (e.g. its size) is stored in the info parameter. The string will not have any other storage allocated of its own, so if you allocate the info structure on the stack then no explicit "free" operation is required.
The string is valid until the underlying C string disappears.
Example:
  VAR
    Info: ALLEGRO_USTR_INFO;
    us: ALLEGRO_USTRptr;
  BEGIN
    { us only references the literal's storage, so no al_ustr_free call is needed. }
    us := al_ref_cstr (Info, 'my string')
  END;
See also
- al_ref_buffer
- Creates a string that references the storage of an underlying buffer.
- al_ref_ustr
- Creates a read-only string that references the storage of another ALLEGRO_USTR string.
|
function al_ref_buffer (out info: ALLEGRO_USTR_INFO; const s: AL_STRptr; size: AL_SIZE_T): ALLEGRO_USTRptr; CDECL; external ALLEGRO_LIB_NAME; |
Creates a string that references the storage of an underlying buffer. The size of the buffer is given in bytes. You can use it to reference only part of a string or an arbitrary region of memory.
The string is valid while the underlying memory buffer is valid.
See also
- al_ref_cstr
- Creates a string that references the storage of a C-style string.
- al_ref_ustr
- Creates a read-only string that references the storage of another ALLEGRO_USTR string.
|
function al_ref_ustr (out info: ALLEGRO_USTR_INFO; const us: ALLEGRO_USTRptr; star_pos, end_pos: AL_INT): ALLEGRO_USTRptr; CDECL; external ALLEGRO_LIB_NAME; |
Creates a read-only string that references the storage of another ALLEGRO_USTR string. The information about the string (e.g. its size) is stored in the structure pointed to by the info parameter. The new string will not have any other storage allocated of its own, so if you allocate the info structure on the stack then no explicit "free" operation is required.
The referenced interval is [start_pos, end_pos). Both are byte offsets.
The string is valid until the underlying string is modified or destroyed.
If you need a range of code-points instead of bytes, use al_ustr_offset to find the byte offsets.
See also
- al_ref_cstr
- Creates a string that references the storage of a C-style string.
- al_ref_buffer
- Creates a string that references the storage of an underlying buffer.
|
function al_ustr_size (const us: ALLEGRO_USTRptr): AL_SIZE_T; CDECL; external ALLEGRO_LIB_NAME; |
Returns the size of the string in bytes. This is equal to the number of code points in the string if the string is empty or contains only 7-bit ASCII characters.
See also
- al_ustr_length
- Returns the number of code points in the string.
|
function al_ustr_length (const us: ALLEGRO_USTRptr): AL_SIZE_T; CDECL; external ALLEGRO_LIB_NAME; |
Returns the number of code points in the string.
See also
- al_ustr_size
- Returns the size of the string in bytes.
- al_ustr_offset
- Returns the byte offset (from the start of the string) of the code point at the specified index in the string.
|
function al_ustr_offset (const us: ALLEGRO_USTRptr;index: AL_INT): AL_INT; CDECL; external ALLEGRO_LIB_NAME; |
Returns the byte offset (from the start of the string) of the code point at the specified index in the string. An index of zero returns the offset of the first code point of the string. If index is negative, it counts backward from the end of the string, so an index of -1 will return an offset to the last code point.
If the index is past the end of the string, returns the offset of the end of the string.
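The difference between byte size, code point count and byte offsets can be seen with the "ål" string from the unit description. This is a sketch under the same assumptions as before (al5strings exports ALLEGRO_USTRptr, the program links against Allegro); the string is written as explicit UTF-8 bytes so that the source file encoding does not matter under Free Pascal.

```pascal
PROGRAM UstrOffsets;

USES
  al5strings;

VAR
  us: ALLEGRO_USTRptr;
BEGIN
  { "ål" as raw UTF-8 bytes: $C3 $A5 for å, then a plain l. }
  us := al_ustr_new (#$C3#$A5'l');
  { Size counts bytes, length counts code points. }
  WriteLn ('Size in bytes:         ', al_ustr_size (us));
  WriteLn ('Length in code points: ', al_ustr_length (us));
  { Map code point indices to byte offsets. }
  WriteLn ('Offset of index 1:     ', al_ustr_offset (us, 1));
  WriteLn ('Offset of index -1:    ', al_ustr_offset (us, -1));
  al_ustr_free (us)
END.
```

Because å occupies two bytes, the second code point should start at byte offset 2, and that is the value to pass to functions that take byte positions.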
See also
- al_ustr_length
- Returns the number of code points in the string.
|
function al_ustr_next (const us: ALLEGRO_USTRptr; var aPos: AL_INT): AL_BOOL; CDECL; external ALLEGRO_LIB_NAME; |
Finds the byte offset of the next code point in string, beginning at aPos. aPos does not have to be at the beginning of a code point.
This function just looks for an appropriate byte; it doesn't check if the found offset is the beginning of a valid code point. If you are working with possibly invalid UTF-8 strings then it could skip over some invalid bytes.
Returns
True on success, and aPos will be updated to the found offset. Otherwise returns False if aPos was already at the end of the string, and aPos is unmodified.
See also
- al_ustr_prev
- Finds the byte offset of the previous code point in string, before aPos.
|
function al_ustr_prev (const us: ALLEGRO_USTRptr; var aPos: AL_INT): AL_BOOL; CDECL; external ALLEGRO_LIB_NAME; |
Finds the byte offset of the previous code point in string, before aPos. aPos does not have to be at the beginning of a code point.
This function just looks for an appropriate byte; it doesn't check if the found offset is the beginning of a valid code point. If you are working with possibly invalid UTF-8 strings then it could skip over some invalid bytes.
Returns
True on success, and aPos will be updated to the found offset. Otherwise returns False if aPos was already at the beginning of the string, and aPos is unmodified.
See also
- al_ustr_next
- Finds the byte offset of the next code point in string, beginning at aPos.
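Walking a string code point by code point with al_ustr_next might look like the sketch below. It assumes that AL_INT is declared in a unit named al5base, that AL_BOOL can be used directly as a WHILE condition, and that the program links against Allegro; none of these is confirmed by this page.

```pascal
PROGRAM UstrWalk;

USES
  al5base, al5strings;

VAR
  us: ALLEGRO_USTRptr;
  Offset: AL_INT;
BEGIN
  { "ål" as raw UTF-8 bytes. }
  us := al_ustr_new (#$C3#$A5'l');
  Offset := 0;
  { Each successful call moves Offset to the start of the next code
    point; the loop stops once Offset is already at the end. }
  WHILE al_ustr_next (us, Offset) DO
    WriteLn ('Next code point boundary at byte ', Offset);
  al_ustr_free (us)
END.
```

The same loop run backwards with al_ustr_prev would start from al_ustr_size (us) and stop once the offset reaches the start of the string.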
|
function al_ustr_insert_chr (us: ALLEGRO_USTRptr; aPos: AL_INT; c: AL_INT32) : AL_SIZE_T; CDECL; external ALLEGRO_LIB_NAME; |
Inserts a code point into us beginning at byte offset aPos. aPos cannot be less than 0. If aPos is past the end of us then the space between the end of the string and aPos will be padded with NUL ($00) bytes.
Returns
The number of bytes inserted, or 0 on error.
See also
- al_ustr_offset
- Returns the byte offset (from the start of the string) of the code point at the specified index in the string.
- al_ustr_remove_chr
- Removes the code point beginning at byte offset apos.
|
function al_ustr_remove_chr (us: ALLEGRO_USTRptr; apos: AL_INT): AL_BOOL; CDECL; external ALLEGRO_LIB_NAME; |
Removes the code point beginning at byte offset apos.
Use al_ustr_offset to find the byte offset for a code point index.
Returns
True on success. If apos is out of range or apos is not the beginning of a valid code point, returns False leaving the string unmodified.
See also
- al_ustr_offset
- Returns the byte offset (from the start of the string) of the code point at the specified index in the string.
- al_ustr_insert_chr
- Inserts a code point into us beginning at byte offset aPos.
|
function al_ustr_assign (us1: ALLEGRO_USTRptr; const us2: ALLEGRO_USTRptr): AL_BOOL; CDECL; external ALLEGRO_LIB_NAME; |
Overwrites the string us1 with another string us2.
Returns
True on success, False on error.
See also
- al_ustr_assign_cstr
- Overwrites the string us1 with the contents of the string s.
|
function al_ustr_assign_cstr (us1: ALLEGRO_USTRptr; const s: AL_STR): AL_BOOL; CDECL; external ALLEGRO_LIB_NAME; |
Overwrites the string us1 with the contents of the string s.
Returns
True on success, False on error.
See also
- al_ustr_assign
- Overwrites the string us1 with another string us2.
|
function al_ustr_equal (const us1, us2: ALLEGRO_USTRptr): AL_BOOL; CDECL; external ALLEGRO_LIB_NAME; |
Returns True if the two strings are equal. This function is more efficient than al_ustr_compare so is preferable if ordering is not important.
See also
- al_ustr_compare
- This function compares u and v by code point values.
|
function al_ustr_compare (const u, v: ALLEGRO_USTRptr): AL_INT; CDECL; external ALLEGRO_LIB_NAME; |
This function compares u and v by code point values. It returns zero if the strings are equal, a positive number if u comes after v, else a negative number.
This does not take into account locale-specific sorting rules. For that you will need to use another library.
See also
- al_ustr_ncompare
- This function compares u and v by code point values.
- al_ustr_equal
- Returns True if the two strings are equal.
|
function al_ustr_ncompare (const u, v: ALLEGRO_USTRptr): AL_INT; CDECL; external ALLEGRO_LIB_NAME; |
This function compares u and v by code point values. It returns zero if the strings are equal, a positive number if u comes after v, else a negative number.
This does not take into account locale-specific sorting rules. For that you will need to use another library.
See also
- al_ustr_compare
- This function compares u and v by code point values.
- al_ustr_equal
- Returns True if the two strings are equal.
|
function al_utf8_width (c: AL_INT32): AL_SIZE_T; CDECL; external ALLEGRO_LIB_NAME; |
Returns the number of bytes that would be occupied by the specified code point when encoded in UTF-8. This is between 1 and 4 bytes for legal code point values. Otherwise returns 0.
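The contract described above (1 to 4 bytes for legal code points, 0 otherwise) can be mirrored in plain Free Pascal, which is handy for checking expectations without linking Allegro. Utf8Width is a hypothetical stand-in written for this sketch, not the library routine, and it assumes that "legal" means the range 0..$10FFFF.

```pascal
PROGRAM Utf8WidthDemo;
{$mode objfpc}

{ Returns how many bytes the code point needs in UTF-8, or 0 if the
  value lies outside 0..$10FFFF. }
FUNCTION Utf8Width (aCodePoint: LongInt): Integer;
BEGIN
  IF aCodePoint < 0 THEN
    Result := 0
  ELSE IF aCodePoint <= $7F THEN
    Result := 1
  ELSE IF aCodePoint <= $7FF THEN
    Result := 2
  ELSE IF aCodePoint <= $FFFF THEN
    Result := 3
  ELSE IF aCodePoint <= $10FFFF THEN
    Result := 4
  ELSE
    Result := 0
END;

BEGIN
  WriteLn (Utf8Width ($6C));     { l: prints 1 }
  WriteLn (Utf8Width ($E5));     { å: prints 2 }
  WriteLn (Utf8Width ($20AC));   { euro sign: prints 3 }
  WriteLn (Utf8Width ($1F600));  { beyond the 16-bit range: prints 4 }
  WriteLn (Utf8Width ($110000))  { past the last code point: prints 0 }
END.
```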
|
Types
ALLEGRO_USTR = _al_tagbstring; |
An opaque type representing a string. ALLEGRO_USTRs normally contain UTF-8 encoded strings, but they may be used to hold any byte sequences, including NUL ($00) bytes.
|
ALLEGRO_USTR_INFO = _al_tagbstring; |
A type that holds additional information for an ALLEGRO_USTR that references an external memory buffer.
See also
- al_ref_cstr
- Creates a string that references the storage of a C-style string.
- al_ref_buffer
- Creates a string that references the storage of an underlying buffer.
- al_ref_ustr
- Creates a read-only string that references the storage of another ALLEGRO_USTR string.
|
Generated by PasDoc 0.15.0. Generated on 2024-11-10 15:15:06.
|