# unicode.h

## Overview
API for converting between UTF-8 encoded text and UTF-16 or UTF-32. All strings in The Machinery are UTF-8 encoded, but UTF-16 and UTF-32 are sometimes needed to communicate with external APIs. For example, Windows uses UTF-16.
## Index
`struct tm_unicode_api`

UTF-32
`utf8_encode()`
`utf8_decode()`
`utf8_num_codepoints()`
`utf8_decode_n()`
`utf8_to_utf32()`
`utf8_to_utf32_n()`
`utf32_to_utf8()`
`utf32_to_utf8_n()`

UTF-16
`utf16_encode()`
`utf16_decode()`
`utf8_to_utf16()`
`utf8_to_utf16_n()`
`utf16_to_utf8()`
`utf16_to_utf8_n()`

`TM_UNICODE_API_NAME`
`tm_codepoint_to_utf8()`
## API
### `struct tm_unicode_api`

UTF-32

#### `utf8_encode()`

~~~c
uint8_t *(*utf8_encode)(uint8_t *utf8, uint32_t codepoint);
~~~

Encodes the `codepoint` as UTF-8 into `utf8` and returns a pointer to the position where the next codepoint should be written. `utf8` should have room for at least four bytes (the maximum size of a UTF-8 encoded codepoint).

#### `utf8_decode()`

~~~c
uint32_t (*utf8_decode)(const uint8_t **utf8);
~~~

Decodes and returns the first codepoint in the UTF-8 string `utf8`. The string pointer is advanced to point to the next codepoint in the string. Generates an error message if the string is not a valid UTF-8 string.

#### `utf8_num_codepoints()`

~~~c
uint32_t (*utf8_num_codepoints)(const uint8_t *utf8);
~~~

Returns the number of codepoints in `utf8`.

#### `utf8_decode_n()`

~~~c
uint32_t (*utf8_decode_n)(uint32_t *codepoints, uint32_t n, const uint8_t **utf8);
~~~

Decodes the first `n` codepoints in `utf8` to the `codepoints` buffer. If `utf8` contains fewer than `n` codepoints, decodes as many codepoints as there are in `utf8`. Returns the number of decoded codepoints. The `utf8` pointer is advanced to point beyond the last decoded codepoint.

#### `utf8_to_utf32()`

~~~c
uint32_t *(*utf8_to_utf32)(const uint8_t *utf8, struct tm_temp_allocator_i *ta);
~~~

Converts a UTF-8 encoded string to a UTF-32 encoded one, allocated with the supplied temp allocator. Generates an error message if the string is not a valid UTF-8 string.

#### `utf8_to_utf32_n()`

~~~c
uint32_t *(*utf8_to_utf32_n)(const uint8_t *utf8, uint32_t n, struct tm_temp_allocator_i *ta);
~~~

As `utf8_to_utf32()`, but uses an explicit length instead of a zero-terminated string. Note that the result string will still be zero-terminated.

#### `utf32_to_utf8()`

~~~c
uint8_t *(*utf32_to_utf8)(const uint32_t *utf32, struct tm_temp_allocator_i *ta);
~~~

Converts a UTF-32 encoded string to a UTF-8 encoded one, allocated with the specified temp allocator. Generates an error message if the data is outside the UTF-8 encoding range.

#### `utf32_to_utf8_n()`

~~~c
uint8_t *(*utf32_to_utf8_n)(const uint32_t *utf32, uint32_t n, struct tm_temp_allocator_i *ta);
~~~

As `utf32_to_utf8()`, but uses an explicit length instead of a zero-terminated string. Note that the result string will still be zero-terminated.
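
To make the pointer-advancing contracts of `utf8_encode()`, `utf8_decode()` and `utf8_num_codepoints()` concrete, here is a minimal standalone sketch of the underlying UTF-8 arithmetic. It is not The Machinery's implementation: the `sketch_` names are hypothetical, the decoder assumes well-formed input, and it asserts rather than reporting an error message.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical stand-in for `utf8_encode()`: writes `codepoint` at `utf8`
// (1 to 4 bytes) and returns a pointer just past the bytes written.
uint8_t *sketch_utf8_encode(uint8_t *utf8, uint32_t codepoint)
{
    // Assumption: invalid scalar values are rejected with an assert here.
    assert(codepoint <= 0x10FFFF && !(codepoint >= 0xD800 && codepoint <= 0xDFFF));
    if (codepoint < 0x80) {
        *utf8++ = (uint8_t)codepoint;
    } else if (codepoint < 0x800) {
        *utf8++ = (uint8_t)(0xC0 | (codepoint >> 6));
        *utf8++ = (uint8_t)(0x80 | (codepoint & 0x3F));
    } else if (codepoint < 0x10000) {
        *utf8++ = (uint8_t)(0xE0 | (codepoint >> 12));
        *utf8++ = (uint8_t)(0x80 | ((codepoint >> 6) & 0x3F));
        *utf8++ = (uint8_t)(0x80 | (codepoint & 0x3F));
    } else {
        *utf8++ = (uint8_t)(0xF0 | (codepoint >> 18));
        *utf8++ = (uint8_t)(0x80 | ((codepoint >> 12) & 0x3F));
        *utf8++ = (uint8_t)(0x80 | ((codepoint >> 6) & 0x3F));
        *utf8++ = (uint8_t)(0x80 | (codepoint & 0x3F));
    }
    return utf8;
}

// Hypothetical stand-in for `utf8_decode()`: returns the first codepoint and
// advances `*utf8` past it. Assumes the input is well-formed UTF-8.
uint32_t sketch_utf8_decode(const uint8_t **utf8)
{
    const uint8_t *p = *utf8;
    uint32_t cp = *p++;
    int extra = 0; // number of continuation bytes that follow the lead byte
    if (cp >= 0xF0) { cp &= 0x07; extra = 3; }
    else if (cp >= 0xE0) { cp &= 0x0F; extra = 2; }
    else if (cp >= 0xC0) { cp &= 0x1F; extra = 1; }
    while (extra-- > 0)
        cp = (cp << 6) | (uint32_t)(*p++ & 0x3F);
    *utf8 = p;
    return cp;
}

// Hypothetical stand-in for `utf8_num_codepoints()`: in well-formed UTF-8,
// every byte that is not a continuation byte (10xxxxxx) starts a codepoint.
uint32_t sketch_utf8_num_codepoints(const uint8_t *utf8)
{
    uint32_t n = 0;
    for (; *utf8; ++utf8)
        n += (*utf8 & 0xC0) != 0x80;
    return n;
}
```

Because the encoder returns the next write position, calls can be chained to build a string: pass the returned pointer to the next call and write a zero byte at the final position.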

UTF-16

#### `utf16_encode()`

~~~c
uint16_t *(*utf16_encode)(uint16_t *utf16, uint32_t codepoint);
~~~

Encodes the `codepoint` as UTF-16 into `utf16` and returns a pointer to the position where the next codepoint should be written. `utf16` should have room for at least two `uint16_t` (the maximum size of a UTF-16 encoded codepoint).

#### `utf16_decode()`

~~~c
uint32_t (*utf16_decode)(const uint16_t **utf16);
~~~

Decodes and returns the first codepoint in the UTF-16 string `utf16`. The string pointer is advanced to point to the next codepoint in the string.

#### `utf8_to_utf16()`

~~~c
uint16_t *(*utf8_to_utf16)(const uint8_t *utf8, struct tm_temp_allocator_i *ta);
~~~

Converts a UTF-8 encoded string to a UTF-16 encoded one, allocated with the supplied temp allocator. Generates an error message if the data is outside the UTF-8 encoding range.

#### `utf8_to_utf16_n()`

~~~c
uint16_t *(*utf8_to_utf16_n)(const uint8_t *utf8, uint32_t n, struct tm_temp_allocator_i *ta);
~~~

As `utf8_to_utf16()`, but uses an explicit length instead of a zero-terminated string. Note that the result string will still be zero-terminated.

#### `utf16_to_utf8()`

~~~c
uint8_t *(*utf16_to_utf8)(const uint16_t *utf16, struct tm_temp_allocator_i *ta);
~~~

Converts a UTF-16 encoded string to a UTF-8 encoded one, allocated with the specified temp allocator. Generates an error message if the string is not a valid UTF-16 string.

#### `utf16_to_utf8_n()`

~~~c
uint8_t *(*utf16_to_utf8_n)(const uint16_t *utf16, uint32_t n, struct tm_temp_allocator_i *ta);
~~~

As `utf16_to_utf8()`, but uses an explicit length instead of a zero-terminated string. Note that the result string will still be zero-terminated.
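
The two-`uint16_t` maximum mentioned for `utf16_encode()` comes from surrogate pairs: codepoints above U+FFFF are split across a high and a low surrogate. The following is a minimal standalone sketch of that arithmetic, not The Machinery's implementation. The `sketch_` names are hypothetical, and the decoder assumes well-formed input with no unpaired surrogates.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical stand-in for `utf16_encode()`: writes one or two code units
// at `utf16` and returns a pointer just past the units written.
uint16_t *sketch_utf16_encode(uint16_t *utf16, uint32_t codepoint)
{
    // Assumption: invalid scalar values are rejected with an assert here.
    assert(codepoint <= 0x10FFFF && !(codepoint >= 0xD800 && codepoint <= 0xDFFF));
    if (codepoint < 0x10000) {
        *utf16++ = (uint16_t)codepoint;
    } else {
        // Subtracting 0x10000 leaves a 20-bit value: the top 10 bits go into
        // the high surrogate, the bottom 10 bits into the low surrogate.
        const uint32_t v = codepoint - 0x10000;
        *utf16++ = (uint16_t)(0xD800 | (v >> 10));
        *utf16++ = (uint16_t)(0xDC00 | (v & 0x3FF));
    }
    return utf16;
}

// Hypothetical stand-in for `utf16_decode()`: returns the first codepoint and
// advances `*utf16` past it. Assumes every high surrogate is followed by a
// low surrogate.
uint32_t sketch_utf16_decode(const uint16_t **utf16)
{
    const uint16_t *p = *utf16;
    uint32_t cp = *p++;
    if (cp >= 0xD800 && cp <= 0xDBFF) {
        const uint32_t low = *p++;
        cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
    }
    *utf16 = p;
    return cp;
}
```

For example, U+1F600 encodes to the pair `0xD83D 0xDE00`, while any codepoint below U+10000 occupies a single code unit.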
### `TM_UNICODE_API_NAME`

~~~c
#define TM_UNICODE_API_NAME "tm_unicode_api"
~~~

### `tm_codepoint_to_utf8()`

~~~c
#define tm_codepoint_to_utf8(cp)
~~~

Returns a UTF-8 string representing the codepoint `cp`. The string is stack allocated.
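
The body of `tm_codepoint_to_utf8()` is not shown here, so the following only sketches one way a macro can return a stack-allocated string in C99: it passes a compound literal as scratch storage to an encoding helper. `sketch_codepoint_to_utf8()` and `sketch_encode_terminated()` are hypothetical names, and the `char *` result type is an assumption.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: encodes `cp` as UTF-8 into `buf` (which must hold at
// least 5 bytes), zero-terminates it, and returns `buf`.
char *sketch_encode_terminated(char *buf, uint32_t cp)
{
    // Assumption: out-of-range values are rejected with an assert here.
    assert(cp <= 0x10FFFF);
    unsigned char *p = (unsigned char *)buf;
    if (cp < 0x80) {
        *p++ = (unsigned char)cp;
    } else if (cp < 0x800) {
        *p++ = (unsigned char)(0xC0 | (cp >> 6));
        *p++ = (unsigned char)(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        *p++ = (unsigned char)(0xE0 | (cp >> 12));
        *p++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        *p++ = (unsigned char)(0x80 | (cp & 0x3F));
    } else {
        *p++ = (unsigned char)(0xF0 | (cp >> 18));
        *p++ = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        *p++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        *p++ = (unsigned char)(0x80 | (cp & 0x3F));
    }
    *p = 0;
    return buf;
}

// The compound literal `(char[5]){ 0 }` has automatic storage duration when
// the macro is used inside a function, so the returned string lives on the
// caller's stack and is valid only until the enclosing block ends.
#define sketch_codepoint_to_utf8(cp) sketch_encode_terminated((char[5]){ 0 }, (cp))
```

With a design like this, the result needs no allocator and no explicit free, but it must not be returned from the calling function or stored beyond the enclosing block.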