Stdlib.Uchar
Unicode characters.
The type for Unicode characters.
A value of this type represents a Unicode scalar value, which is an integer in the ranges 0x0000...0xD7FF or 0xE000...0x10FFFF.
bom is U+FEFF, the byte order mark (BOM) character.
rep is U+FFFD, the replacement character.
is_valid n is true if and only if n is a Unicode scalar value (i.e. in the ranges 0x0000...0xD7FF or 0xE000...0x10FFFF).
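As an illustration of the range check above, here is a minimal sketch that guards an integer-to-character conversion. The helper name uchar_of_int_opt is hypothetical and not part of the module; Uchar.of_int is the module's conversion function, which raises Invalid_argument on integers that are not scalar values.

```ocaml
(* Hypothetical helper: convert an integer to a Uchar.t only when it is
   a Unicode scalar value, instead of letting Uchar.of_int raise. *)
let uchar_of_int_opt (n : int) : Uchar.t option =
  if Uchar.is_valid n then Some (Uchar.of_int n) else None

let () =
  assert (Uchar.is_valid 0x0041);          (* LATIN CAPITAL LETTER A *)
  assert (Uchar.is_valid 0x10FFFF);        (* upper bound of the second range *)
  assert (not (Uchar.is_valid 0xD800));    (* surrogate, between the two ranges *)
  assert (not (Uchar.is_valid 0x110000));  (* above the last scalar value *)
  assert (uchar_of_int_opt 0xDFFF = None)
```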
The type for UTF decode results. Values of this type represent the result of a Unicode Transformation Format decoding attempt.
utf_decode_is_valid d is true if and only if d holds a valid decode.
utf_decode_uchar d is the Unicode character decoded by d if utf_decode_is_valid d is true, and Uchar.rep otherwise.
utf_decode_length d is the number of elements from the source that were consumed by the decode d. This is always strictly positive and less than or equal to 4. The kind of source elements depends on the actual decoder; for the decoders of the standard library this function always returns a length in bytes.
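The three accessors above are typically used together in a decoding loop. The following sketch assumes the standard library decoder String.get_utf_8_uchar (available alongside this API since OCaml 4.14); the function name decode_all is hypothetical. Each step advances by utf_decode_length, so malformed input still makes progress and yields Uchar.rep.

```ocaml
(* Walk a string and collect one (scalar value, bytes consumed, valid?)
   triple per decode. decode_all is an illustrative name only. *)
let decode_all (s : string) : (int * int * bool) list =
  let rec loop i acc =
    if i >= String.length s then List.rev acc
    else
      let d = String.get_utf_8_uchar s i in
      let n = Uchar.utf_decode_length d in   (* bytes, for stdlib decoders *)
      let u = Uchar.utf_decode_uchar d in    (* Uchar.rep if d is invalid *)
      loop (i + n) ((Uchar.to_int u, n, Uchar.utf_decode_is_valid d) :: acc)
  in
  loop 0 []

let () =
  (* "a" followed by U+00E9 encoded as the two bytes C3 A9. *)
  assert (decode_all "a\xC3\xA9" = [ (0x61, 1, true); (0xE9, 2, true) ]);
  (* 0xFF can never start a UTF-8 sequence. *)
  assert (decode_all "\xFF" = [ (0xFFFD, 1, false) ])
```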
utf_decode n u is a valid UTF decode for u that consumed n elements from the source for decoding. n must be positive and less than or equal to 4 (this is not checked by the module).
utf_decode_invalid n is an invalid UTF decode that consumed n elements from the source to reach the error. n must be positive and less than or equal to 4 (this is not checked by the module). The resulting decode has rep as the decoded Unicode character.
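These two constructors are intended for writing decoders outside the standard library. As a sketch only, here is a hypothetical UTF-32BE decoder over strings; the name get_utf_32be_uchar and its truncation policy (report the remaining bytes as one invalid decode) are assumptions made for illustration, not part of the module.

```ocaml
(* Hypothetical decoder: read one UTF-32BE code unit starting at byte i.
   Assumes 63-bit native ints so that the 32-bit value cannot wrap. *)
let get_utf_32be_uchar (s : string) (i : int) : Uchar.utf_decode =
  let len = String.length s in
  if i < 0 || i >= len then invalid_arg "get_utf_32be_uchar: index out of bounds";
  let remaining = len - i in
  if remaining < 4 then
    (* Truncated input: 1 to 3 bytes consumed, within the allowed range. *)
    Uchar.utf_decode_invalid remaining
  else
    let byte k = Char.code (String.get s (i + k)) in
    let n = (byte 0 lsl 24) lor (byte 1 lsl 16) lor (byte 2 lsl 8) lor byte 3 in
    if Uchar.is_valid n then Uchar.utf_decode 4 (Uchar.of_int n)
    else Uchar.utf_decode_invalid 4

let () =
  let ok = get_utf_32be_uchar "\x00\x00\x00\x41" 0 in
  assert (Uchar.utf_decode_is_valid ok);
  assert (Uchar.utf_decode_length ok = 4);
  assert (Uchar.to_int (Uchar.utf_decode_uchar ok) = 0x41);
  (* A surrogate is not a scalar value, so the decode is invalid. *)
  let bad = get_utf_32be_uchar "\x00\x00\xD8\x00" 0 in
  assert (not (Uchar.utf_decode_is_valid bad));
  assert (Uchar.equal (Uchar.utf_decode_uchar bad) Uchar.rep)
```

Because the module does not check n, the decoder itself has to keep the consumed length between 1 and 4, which is why an out-of-bounds index is rejected before any decode value is built.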
utf_8_byte_length u is the number of bytes needed to encode u in UTF-8.
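A short sketch of how this length relates to actual encoding: the helper encoded_length below is hypothetical and uses Buffer.add_utf_8_uchar from the standard library to encode a character, then compares the number of bytes written with utf_8_byte_length.

```ocaml
(* Hypothetical helper: encode u into a fresh buffer and count the bytes. *)
let encoded_length (u : Uchar.t) : int =
  let b = Buffer.create 4 in
  Buffer.add_utf_8_uchar b u;
  Buffer.length b

let () =
  assert (Uchar.utf_8_byte_length (Uchar.of_int 0x41) = 1);     (* U+0041 *)
  assert (Uchar.utf_8_byte_length (Uchar.of_int 0xE9) = 2);     (* U+00E9 *)
  assert (Uchar.utf_8_byte_length (Uchar.of_int 0x20AC) = 3);   (* U+20AC *)
  assert (Uchar.utf_8_byte_length (Uchar.of_int 0x1F600) = 4);  (* U+1F600 *)
  (* The predicted length matches what the encoder writes. *)
  List.iter
    (fun n ->
      let u = Uchar.of_int n in
      assert (encoded_length u = Uchar.utf_8_byte_length u))
    [ 0x41; 0xE9; 0x20AC; 0x1F600 ]
```

Knowing the length in advance is useful for sizing a Bytes or Buffer before encoding.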