Trivial UTF-8 Manual

1 Introduction

Trivial UTF-8 is a small library for doing UTF-8-based input and output on a Lisp implementation that already supports Unicode, meaning that CHAR-CODE and CODE-CHAR deal with Unicode character codes.

The rationale for this library is that while Unicode-enabled implementations usually do provide some kind of interface for dealing with character encodings, these interfaces are typically not very flexible or uniform.

The Babel library solves a similar problem while understanding more encodings. Trivial UTF-8 was written before Babel existed, but for new projects you might be better off going with Babel. The one plus that Trivial UTF-8 has is that it doesn't depend on any other libraries.

2 Links and Systems

Here are the official repository and the HTML documentation for the latest version.

[system] "trivial-utf-8"
- Description: A small library for doing UTF-8-based input and output.
- Licence: ZLIB
- Author: Marijn Haverbeke <marijnh@gmail.com>
- Maintainer: Gábor Melis <mega@retes.hu>
- Homepage: https://common-lisp.net/project/trivial-utf-8/
- Bug tracker: https://gitlab.common-lisp.net/trivial-utf-8/trivial-utf-8/-/issues
- Source control: GIT
- Depends on: mgl-pax-bootstrap

3 Reference

[function] UTF-8-BYTE-LENGTH STRING

Calculate the number of bytes needed to encode STRING.
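As a rough illustration of why the byte count can differ from the character count, the sketch below compares LENGTH with UTF-8-BYTE-LENGTH. It assumes ASDF is available and can locate the system; the variable name *TEXT* is illustrative only.

```lisp
;; Assumption: ASDF is available and can locate the trivial-utf-8 system.
(asdf:load-system "trivial-utf-8")

;; "caf" followed by U+00E9, built with CODE-CHAR so the example does not
;; depend on the encoding of this source file. U+00E9 takes two bytes in UTF-8.
(defparameter *text*
  (concatenate 'string "caf" (string (code-char #xE9))))

(length *text*)                           ; => 4 (characters)
(trivial-utf-8:utf-8-byte-length *text*)  ; => 5 (bytes)
```

Knowing the byte length up front is useful when a buffer has to be sized before encoding.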

[function] STRING-TO-UTF-8-BYTES STRING &KEY NULL-TERMINATE

Convert STRING into an array of unsigned bytes containing its UTF-8 representation. If NULL-TERMINATE is true, add an extra 0 byte at the end.
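A minimal sketch of encoding, with and without the trailing 0 byte. It assumes the system has been made available to ASDF; the byte values in the comments follow from the UTF-8 encoding rules.

```lisp
;; Assumption: ASDF is available and can locate the trivial-utf-8 system.
(asdf:load-system "trivial-utf-8")

;; ASCII characters encode to one byte each.
(trivial-utf-8:string-to-utf-8-bytes "hi")
;; => a byte vector holding 104 105

;; With :NULL-TERMINATE, one extra 0 byte is appended at the end.
(trivial-utf-8:string-to-utf-8-bytes "hi" :null-terminate t)
;; => a byte vector holding 104 105 0

;; A non-ASCII character becomes a multi-byte group: U+00E9 is #xC3 #xA9.
(trivial-utf-8:string-to-utf-8-bytes (string (code-char #xE9)))
;; => a byte vector holding 195 169
```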

[function] UTF-8-GROUP-SIZE BYTE

Determine the number of bytes that are part of the character whose encoding starts with BYTE. May signal UTF-8-DECODING-ERROR.
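In UTF-8 the high bits of the first byte of a character determine how many bytes the character occupies. The sketch below queries the group size for one lead byte of each length; the specific byte values are chosen for illustration. A byte that cannot start a character, such as a continuation byte of the form 10xxxxxx, is the kind of input for which UTF-8-DECODING-ERROR would be expected.

```lisp
;; Assumption: ASDF is available and can locate the trivial-utf-8 system.
(asdf:load-system "trivial-utf-8")

;; Lead-byte bit patterns and the resulting group sizes.
(trivial-utf-8:utf-8-group-size #x68)  ; 0xxxxxxx => 1
(trivial-utf-8:utf-8-group-size #xC3)  ; 110xxxxx => 2
(trivial-utf-8:utf-8-group-size #xE2)  ; 1110xxxx => 3
(trivial-utf-8:utf-8-group-size #xF0)  ; 11110xxx => 4
```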

[function] UTF-8-BYTES-TO-STRING BYTES &KEY (START 0) (END (LENGTH BYTES))

Convert the subsequence of the array BYTES between START and END, which contains UTF-8 encoded characters, to a STRING. The element type of BYTES may be anything as long as it can be COERCEd into an (UNSIGNED-BYTE 8) array. May signal UTF-8-DECODING-ERROR.
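The following sketch decodes a whole byte array and then only a slice of it. It assumes that END is exclusive, which is consistent with its default of (LENGTH BYTES); the variable name *BYTES* is illustrative only.

```lisp
;; Assumption: ASDF is available and can locate the trivial-utf-8 system.
(asdf:load-system "trivial-utf-8")

;; UTF-8 bytes for "h", the two-byte group for U+00E9, then "llo".
(defparameter *bytes*
  (make-array 6 :element-type '(unsigned-byte 8)
                :initial-contents '(104 195 169 108 108 111)))

;; Decode the whole array: five characters from six bytes.
(trivial-utf-8:utf-8-bytes-to-string *bytes*)

;; Decode only the bytes at indices 1 and 2, i.e. the single character U+00E9.
(trivial-utf-8:utf-8-bytes-to-string *bytes* :start 1 :end 3)
```

START and END should fall on character boundaries; a slice that begins in the middle of a multi-byte group is not valid UTF-8 on its own.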

[function] READ-UTF-8-STRING INPUT &KEY NULL-TERMINATED STOP-AT-EOF (CHAR-LENGTH -1) (BYTE-LENGTH -1)

Read UTF-8 encoded data from INPUT, a byte stream, and construct a string with the characters found. When NULL-TERMINATED is true, stop reading at a null character. If STOP-AT-EOF is true, stop at END-OF-FILE without signalling an error. The CHAR-LENGTH and BYTE-LENGTH parameters specify the maximum number of characters or bytes to read, where -1 means no limit. May signal UTF-8-DECODING-ERROR.
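Standard Common Lisp has no in-memory byte stream, so the sketch below writes bytes to a temporary file and reads them back through a binary stream. READ-UTF-8-FROM-BYTES is a hypothetical helper written for this example, not part of the library, and it relies on UIOP:WITH-TEMPORARY-FILE, which ships with ASDF.

```lisp
;; Assumption: ASDF (and with it UIOP) is available and can locate the system.
(asdf:load-system "trivial-utf-8")

(defun read-utf-8-from-bytes (bytes &rest read-args)
  "Hypothetical helper: write BYTES to a temporary file, then decode that
file with READ-UTF-8-STRING, passing READ-ARGS through as keyword arguments."
  (uiop:with-temporary-file (:pathname path)
    (with-open-file (out path :direction :output
                              :element-type '(unsigned-byte 8)
                              :if-exists :supersede)
      (write-sequence (coerce bytes '(vector (unsigned-byte 8))) out))
    (with-open-file (in path :element-type '(unsigned-byte 8))
      (apply #'trivial-utf-8:read-utf-8-string in read-args))))

;; Bytes for "hi", a 0 byte, then "x".
;; Stop at the null character:
(read-utf-8-from-bytes #(104 105 0 120) :null-terminated t)  ; reads "hi"

;; Read at most one character:
(read-utf-8-from-bytes #(104 105 0 120) :char-length 1)      ; reads "h"

;; Read until the end of the stream without an END-OF-FILE error:
(read-utf-8-from-bytes #(104 105) :stop-at-eof t)            ; reads "hi"
```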

[condition] UTF-8-DECODING-ERROR SIMPLE-ERROR
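Because the condition is a subclass of SIMPLE-ERROR, it can be handled with the usual condition system tools. The sketch below wraps decoding in HANDLER-CASE; DECODE-OR-NIL is a hypothetical helper for illustration, and the byte #xFF is used because it can never start a UTF-8 character.

```lisp
;; Assumption: ASDF is available and can locate the trivial-utf-8 system.
(asdf:load-system "trivial-utf-8")

(defun decode-or-nil (bytes)
  "Hypothetical helper: return the decoded string, or NIL and the condition
object as a second value if BYTES is not valid UTF-8."
  (handler-case (trivial-utf-8:utf-8-bytes-to-string bytes)
    (trivial-utf-8:utf-8-decoding-error (condition)
      (values nil condition))))

;; Valid input decodes normally.
(decode-or-nil (make-array 2 :element-type '(unsigned-byte 8)
                             :initial-contents '(104 105)))

;; #xFF is not a valid lead byte, so the handler returns NIL instead.
(decode-or-nil (make-array 1 :element-type '(unsigned-byte 8)
                             :initial-contents '(255)))
```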

[in package TRIVIAL-UTF-8]

[generated by MGL-PAX]
