Unicode for C.
u8c Update comments; Fix u8c_decode_utf8_length not validating; Add attribute u8c_DEPRECATED; Deprecate u8c_encode_utf16 and u8c_encode_utf16_length as they're untested (this is not permanent); 2023-07-23 18:22:21 +02:00
u8c-check Rename source directory: src => source; Make changelog plain-text; Update CMake style; Rewrite in C99; Use separate CMake lists; Update copyright and license notices; Use header identifiers instead of keys for include guards; Use ifdef/ifndef; Remove top 'u8c' header (keep uninm); License under the LGPL; Bump required CMake version; Rename misc header to u8c; Remove assert; Remove attributes:; - u8c_attr_abitag; - u8c_attr_allocsz; - u8c_attr_artif; - u8c_attr_cold; - u8c_attr_fmt; - u8c_attr_malloc; - u8c_attr_hot; - u8c_attr_pure; - u8c_attr_retnonnull; - u8c_attr_sect; - u8c_attr_used; - u8c_attr_noderef; - u8c_attr_noesc; - u8c_attr_dup; Remove type constant macros; Remove our types; Remove endian-related facilities; Remove memory functions; Remove bytesz and dbg; Fix version number; Remove fmt header; Add new header 'format'; Add new header 'character'; Replace utf header with new 'format' and 'character' headers; Remove math header; Remove impl header; Remove cstr header; Remove arr header; Make functions non-constexpr; Update naming convention; Implement UTF-16 conversions; Split cnv into multiple functions:; - encode_utf8 (UTF-32 to UTF-8); - decode_utf8 (UTF-8 to UTF-32); - encode_utf16 (UTF-32 to UTF-16); - decode_utf16 (UTF-16 to UTF-32); Use caller-provided buffer in conversion functions; Rename u8c::isupper to u8c_is_majuscule; Rename u8c::islower to u8c_is_minuscule; Update code style; Change type of version constant (now uint_least32_t); Use Git tagging for versioning; Don't throw exceptions; Update warning flags; Update optimisation flags; Rename u8c::unimax to u8c_MAX_CODE_POINT; Remove u8c::uniblk; Clean up code; Don't define functions in headers; Rename u8c::isspace to u8c_is_whitespace; Add more characters to u8c_is_whitespace; Rename u8c::ispunct to u8c_is_punctuation; Add more characters to u8c_is_punctuation; Remove u8c::isalnum; Rename u8c::uninm to u8c_unicode_name; Use caller-provided buffer in u8c_unicode_name; Add constant for 
the maximum length of a Unicode identifier: u8c_MAXIMUM_NAME_LENGTH; Add functions for determining the length of encodings and decodings:; - u8c_decode_utf8_length; - u8c_encode_utf8_length; - u8c_encode_utf16_length; - u8c_decode_utf16_length; Rename u8c_attr_const to u8c_UNSEQUENCED; Rename u8c_attr_inline to u8c_ALWAYS_INLINE; Add new attribute u8c_NO_DISCARD; Validate encodings; Rework readme; Rename u8c::isdigit and u8x::isxdigit to is_numeric and is_hexadecimal_numeric; Rename u8c::isalpha to u8c_is_alphabetic; Rename u8c::iscntrl to u8c_is_control; Rename u8c::issurro to u8c_is_surrogate; Optimise code; Update gitignore; 2023-07-23 12:53:07 +02:00
.gitignore Same commit as u8c-check. 2023-07-23 12:53:07 +02:00
CHANGELOG.txt Same commit as u8c. 2023-07-23 18:22:21 +02:00
CMakeLists.txt Same commit as u8c-check. 2023-07-23 12:53:07 +02:00
COPYING Same commit as u8c-check. 2023-07-23 12:53:07 +02:00
COPYING.LESSER Same commit as u8c-check. 2023-07-23 12:53:07 +02:00
README.txt Same commit as u8c-check. 2023-07-23 12:53:07 +02:00
u8c.svg Version 23. 2021-09-14 15:22:39 +02:00
u8c_boxes.svg Version 17. 2021-06-15 20:40:03 +02:00
u8c_braille.svg Version 17. 2021-06-15 20:40:03 +02:00

U8C

Unicode for C.

- ABOUT

u8c is a C library that provides various Unicode-related functions. It is
written in C99 (the 1999 edition of C), with support for C++11.

- FEATURES

u8c supports conversions to and from the UTF-8 and UTF-16 formats. UTF-32 is
used as the intermediate format.
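
As an illustration of what using UTF-32 as the intermediate format means, the
sketch below encodes a single UTF-32 code point as UTF-8 and as UTF-16. It does
not call u8c, whose function signatures are not shown in this readme; the names
sketch_utf32_to_utf8 and sketch_utf32_to_utf16 are hypothetical helpers written
for this example only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of u8c): encodes one UTF-32 code point as
   UTF-8. Returns the number of bytes written (1 to 4), or 0 if the value is
   a surrogate or lies above U+10FFFF. */
size_t sketch_utf32_to_utf8(uint_least32_t cp, unsigned char out[4]) {
    if (cp <= 0x7Fu) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FFu) {
        out[0] = (unsigned char)(0xC0u | (cp >> 6));
        out[1] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 2;
    }
    if (cp >= 0xD800u && cp <= 0xDFFFu) {
        return 0; /* surrogates are not valid scalar values */
    }
    if (cp <= 0xFFFFu) {
        out[0] = (unsigned char)(0xE0u | (cp >> 12));
        out[1] = (unsigned char)(0x80u | ((cp >> 6) & 0x3Fu));
        out[2] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 3;
    }
    if (cp <= 0x10FFFFu) {
        out[0] = (unsigned char)(0xF0u | (cp >> 18));
        out[1] = (unsigned char)(0x80u | ((cp >> 12) & 0x3Fu));
        out[2] = (unsigned char)(0x80u | ((cp >> 6) & 0x3Fu));
        out[3] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 4;
    }
    return 0;
}

/* Hypothetical helper (not part of u8c): encodes one UTF-32 code point as
   UTF-16. Returns the number of 16-bit units written (1 or 2), or 0 if the
   value is a surrogate or lies above U+10FFFF. */
size_t sketch_utf32_to_utf16(uint_least32_t cp, uint_least16_t out[2]) {
    if (cp >= 0xD800u && cp <= 0xDFFFu) {
        return 0;
    }
    if (cp <= 0xFFFFu) {
        out[0] = (uint_least16_t)cp;
        return 1;
    }
    if (cp <= 0x10FFFFu) {
        cp -= 0x10000u; /* 20 bits remain, split into two halves of 10 */
        out[0] = (uint_least16_t)(0xD800u | (cp >> 10));   /* high surrogate */
        out[1] = (uint_least16_t)(0xDC00u | (cp & 0x3FFu)); /* low surrogate */
        return 2;
    }
    return 0;
}
```

For example, U+20AC EURO SIGN becomes the bytes E2 82 AC in UTF-8 and the single
unit 20AC in UTF-16, while U+1F600 needs four UTF-8 bytes and a surrogate pair.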

The encoding and decoding functions automatically replace invalid code
sequences with the replacement character: U+FFFD REPLACEMENT CHARACTER.
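
The replacement behaviour can be sketched as follows. This is a minimal,
standalone decoder written for this readme, not u8c's implementation: the name
sketch_decode_utf8 and the policy of consuming exactly one byte per invalid
sequence are assumptions, and u8c may resynchronise differently.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_REPLACEMENT 0xFFFDu /* U+FFFD REPLACEMENT CHARACTER */

/* Hypothetical helper (not part of u8c): decodes one code point from the
   first bytes of src. Stores the result in *out and returns the number of
   bytes consumed (0 only if len is 0). Any invalid, truncated, overlong,
   surrogate or out-of-range sequence yields U+FFFD and consumes one byte. */
size_t sketch_decode_utf8(const unsigned char *src, size_t len,
                          uint_least32_t *out) {
    size_t need;        /* number of continuation bytes expected */
    size_t i;
    uint_least32_t cp;  /* code point being assembled */
    uint_least32_t min; /* smallest value allowed for this length */

    if (len == 0) {
        return 0;
    }
    if (src[0] < 0x80u) {
        *out = src[0];
        return 1;
    } else if ((src[0] & 0xE0u) == 0xC0u) {
        need = 1; cp = src[0] & 0x1Fu; min = 0x80u;
    } else if ((src[0] & 0xF0u) == 0xE0u) {
        need = 2; cp = src[0] & 0x0Fu; min = 0x800u;
    } else if ((src[0] & 0xF8u) == 0xF0u) {
        need = 3; cp = src[0] & 0x07u; min = 0x10000u;
    } else {
        *out = SKETCH_REPLACEMENT; /* stray continuation or invalid lead */
        return 1;
    }
    if (len < need + 1) {
        *out = SKETCH_REPLACEMENT; /* sequence is cut off */
        return 1;
    }
    for (i = 1; i <= need; ++i) {
        if ((src[i] & 0xC0u) != 0x80u) {
            *out = SKETCH_REPLACEMENT; /* not a continuation byte */
            return 1;
        }
        cp = (cp << 6) | (src[i] & 0x3Fu);
    }
    if (cp < min || cp > 0x10FFFFu || (cp >= 0xD800u && cp <= 0xDFFFu)) {
        *out = SKETCH_REPLACEMENT; /* overlong, too large, or surrogate */
        return 1;
    }
    *out = cp;
    return need + 1;
}
```

Calling the helper in a loop and advancing by its return value decodes a whole
buffer; every byte that cannot start a valid sequence contributes one U+FFFD.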

Additionally, character trait functions can help determine the type of a code
point: alphabetic, control, numeric, punctuation, etc.
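
To give an idea of what such a trait function looks like, the sketch below
implements three simple predicates on UTF-32 code points. The sketch_ names are
hypothetical and the ranges are deliberately narrow (the hexadecimal check is
ASCII-only); the library's own functions may well cover more characters.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of u8c): true for the UTF-16 surrogate
   range U+D800 to U+DFFF. */
bool sketch_is_surrogate(uint_least32_t cp) {
    return cp >= 0xD800u && cp <= 0xDFFFu;
}

/* Hypothetical helper (not part of u8c): true for the C0 controls
   (U+0000 to U+001F), U+007F DELETE, and the C1 controls (U+0080 to
   U+009F). */
bool sketch_is_control(uint_least32_t cp) {
    return cp <= 0x1Fu || (cp >= 0x7Fu && cp <= 0x9Fu);
}

/* Hypothetical helper (not part of u8c): true for the ASCII hexadecimal
   digits 0-9, A-F and a-f only. */
bool sketch_is_ascii_hex_digit(uint_least32_t cp) {
    return (cp >= 0x30u && cp <= 0x39u)
        || (cp >= 0x41u && cp <= 0x46u)
        || (cp >= 0x61u && cp <= 0x66u);
}
```

Because the predicates take code points rather than bytes, UTF-8 or UTF-16 input
would first have to be decoded to UTF-32 before being classified.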

- INSTALLATION

A PKGBUILD is hosted on mandelbrot.dk at:
<https://mandelbrot.dk/pkgbuild_u8c>

- COMPILATION

u8c uses CMake as its build system. The flag 'U8C_CHECK' may be set to ON to
enable building of the check program.

- COPYRIGHT & LICENSE

Copyright 2021, 2023 Gabriel Bjørnager Jensen.

This program is free software: you can redistribute it and/or modify it under
the terms of the GNU Lesser General Public License as published by the Free
Software Foundation, either version 3 of the License, or (at your option) any
later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along
with this program. If not, see <https://www.gnu.org/licenses/>.