atomvm
diff --git a/‎CHANGELOG.md‎
Lines changed: 10 additions & 0 deletions b/‎CHANGELOG.md‎
diff --git a/‎UPDATING.md‎
Lines changed: 6 additions & 0 deletions b/‎UPDATING.md‎
diff --git a/‎doc/src/differences-with-beam.md‎
Lines changed: 104 additions & 5 deletions b/‎doc/src/differences-with-beam.md‎
diff --git a/‎doc/src/memory-management.md‎
Lines changed: 104 additions & 10 deletions b/‎doc/src/memory-management.md‎
diff --git a/‎doc/src/programmers-guide.md‎
Lines changed: 2 additions & 2 deletions b/‎doc/src/programmers-guide.md‎
diff --git a/‎libs/jit/include/jit.hrl‎
Lines changed: 4 additions & 0 deletions b/‎libs/jit/include/jit.hrl‎
@@ -59,6 +59,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Reimplemented `lists:keyfind`, `lists:keymember` and `lists:member` as NIFs
 - Added `AVM_PRINT_PROCESS_CRASH_DUMPS` option
 - Added `lists:ukeysort/2`
+- Added support for big integers up to 256-bit (sign + 256-bit magnitude)
+- Added support for big integers in `binary_to_term/1` and `term_to_binary/1,2`
 
 ### Changed
 
@@ -69,6 +71,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Entry point now is `init:boot/1` if it exists. It starts the kernel application and calls `start/0` from the
   identified startup module. Users who started kernel application (typically for distribution) must no longer
   do it. Starting `net_kernel` is still required.
+- All arithmetic operations (`+`, `-`, `*`, `div`, `rem`, `abs`, etc.) now support integers up to 256-bit
+- All bitwise operations (`band`, `bor`, `bxor`, `bnot`, `bsl`, `bsr`) now support integers up to 256-bit
+- Float conversion functions now support converting to/from big integers
+- `bsl` now properly checks for overflow
+- `binary_to_integer/1` no longer accepts binaries such as `<<"0xFF">>` or `<<"  123">>`
+- `binary_to_integer` and `list_to_integer` no longer raise an `overflow` error; they raise
+`badarg` instead.
 
 ### Fixed
 
@@ -79,6 +88,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - packbeam: fix memory leak preventing building with address sanitizer
 - Fixed a bug where empty atom could not be created on some platforms, thus breaking receiving a message for a registered process from an OTP node.
 - Fix a memory leak in distribution when a BEAM node would monitor a process by name.
+- Fix `list_to_integer`, which was likely buggy with integers close to `INT64_MAX`
 
 ## [0.6.7] - Unreleased
 
 
@@ -13,6 +13,12 @@ port socket driver, are also represented by a port and some matching code may ne
 `is_pid/1` to `is_port/1`.
 - Ports and pids can be registered. Function `globalcontext_get_registered_process` result now is
 a term that can be a `port()` or a `pid()`.
+- `bsl` (bitshift left) now checks for overflow. This shouldn't be a practical issue for existing
+code, since integers were limited to 64 bits; however, make sure to bitmask values before left
+bitshifts, e.g. `(16#FFFF band 16#F) bsl 252`.
+- `binary_to_integer` and `list_to_integer` no longer raise an `overflow` error; they instead
+raise `badarg` when trying to parse an integer that exceeds 256 bits. Update any relevant error
+handling code.
 
 ## v0.6.4 -> v0.6.5
 
 
@@ -39,11 +39,110 @@ AtomVM does not implement some key features of the BEAM. Some of these limitatio
 worked on and this list might be outdated. Do not hesitate to check GitHub issues or contact us
 when in doubt.
 
-### Wide precision integers
-
-AtomVM currently only supports 64 bits integers. This is being worked on. However, please note
-that AtomVM is unlikely to support arbitrary precision integers as libraries for such support
-usually are quite large.
+### Integer precision and overflow
+
+AtomVM supports integers up to 256-bit with an additional sign flag, while BEAM supports unlimited
+precision integers. This fundamental difference has several implications:
+
+#### Integer limits
+
+- **Maximum value**: `16#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF` (256
+ones, which equals `2^256 - 1`)
+- **Minimum value**: `-16#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF` (which
+equals `-(2^256 - 1)`)
+
+Note that AtomVM does not use two's complement for big integers. The sign is stored as a separate
+flag, which means `INTEGER_MAX = -INTEGER_MIN`.
+
+#### Overflow errors
+
+Unlike BEAM, AtomVM raises `overflow` errors when integer operations exceed 256-bit capacity:
+
+```erlang
+IntMax = 16#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,
+% The following will raise an overflow error on AtomVM, but succeeds on BEAM:
+Result = IntMax + 1  % overflow error
+
+% Also applies to subtraction and multiplication:
+-IntMax - 1  % overflow error
+IntMax * 2   % overflow error
+```
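The overflow condition can be pictured as a carry out of the most significant digit. The following C sketch is not AtomVM's implementation; it assumes a magnitude stored as eight little-endian `uint32_t` digits (8 × 32 = 256 bits), and the names `MAG_DIGITS` and `magnitude_add` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: 8 digits of 32 bits give a 256-bit magnitude. */
#define MAG_DIGITS 8

/*
 * Adds two 256-bit magnitudes digit by digit, least significant first.
 * Returns false when a carry leaves the top digit, i.e. the sum would
 * need a 257th bit (the situation the text describes as an overflow).
 */
static inline bool magnitude_add(const uint32_t a[MAG_DIGITS],
    const uint32_t b[MAG_DIGITS], uint32_t out[MAG_DIGITS])
{
    uint64_t carry = 0;
    for (size_t i = 0; i < MAG_DIGITS; i++) {
        uint64_t sum = (uint64_t) a[i] + b[i] + carry;
        out[i] = (uint32_t) sum; /* keep the low 32 bits */
        carry = sum >> 32;       /* 0 or 1 */
    }
    return carry == 0;
}
```

With all eight digits of `a` set to `0xFFFFFFFF` (the `IntMax` value above) and `b` equal to 1, the function returns `false`.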
+
+Handling overflows:
+
+```erlang
+safe_calc(MaybeOvfFun) ->
+    try MaybeOvfFun() of
+        I when is_integer(I) -> {ok, I}
+    catch
+        error:overflow -> {error, overflow}
+    end.
+
+% Returns `{ok, Result}`, where Result is a 255-bit integer
+safe_calc(fun() -> factorial(57) end).
+
+% Returns `{error, overflow}`, since 261-bit integers are not allowed
+safe_calc(fun() -> factorial(58) end).
+```
+
+Overflow can also occur with:
+- Bit shift left operations: `1 bsl 257` raises overflow (shifting beyond the 256-bit boundary).
+When shifting values with multiple set bits, mask first to prevent overflow: `16#FFFF bsl 252`
+would overflow, but `(16#FFFF band 16#F) bsl 252` succeeds
+- Float to integer conversions: `ceil/1`, `round/1`, etc. when the result exceeds 256 bits
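The `bsl` cases above follow from a simple bit count: a shift overflows when the number of bits in the operand plus the shift amount exceeds 256. A minimal C sketch of that check, restricted for brevity to operands that fit in 64 bits; `bit_length` and `bsl_would_overflow` are hypothetical names, not AtomVM functions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of bits needed to represent value (0 for value == 0). */
static inline unsigned bit_length(uint64_t value)
{
    unsigned bits = 0;
    while (value != 0) {
        bits++;
        value >>= 1;
    }
    return bits;
}

/* True when `magnitude bsl shift` would need more than 256 bits. */
static inline bool bsl_would_overflow(uint64_t magnitude, unsigned shift)
{
    if (magnitude == 0) {
        return false; /* zero stays zero in this sketch */
    }
    return (uint64_t) bit_length(magnitude) + shift > 256;
}
```

Here `bsl_would_overflow(0xFFFF, 252)` is true (16 + 252 = 268 bits), while `bsl_would_overflow(0xF, 252)` is false (4 + 252 = 256 bits).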
+
+Note: While BEAM raises `system_limit` error for operations like
+`1 bsl 2000000000000000000000000000000000`, AtomVM consistently uses `overflow` error for all
+integer capacity violations.
+
+Note: Integer literals larger than 256 bits in source code will compile successfully with
+Erlang/Elixir compilers, but the resulting BEAM files will fail to load on AtomVM. This also
+applies to compile-time constant expressions that evaluate to integers exceeding 256 bits, such as
+`1 bsl 300`. These expressions are evaluated by the compiler and stored as constants in the BEAM
+file, causing the same load-time failure. Always ensure that integer constants in your code are
+within AtomVM's supported range.
+
+Note: The `erlang:binary_to_term/1,2` function raises a `badarg` error when attempting to
+deserialize binary data containing an integer larger than 256 bits. This differs from BEAM, which
+can deserialize integers of any size. Applications that exchange serialized terms with BEAM nodes
+should be aware of this limitation.
+
+Note: String and binary conversion functions such as `erlang:binary_to_integer/1,2`,
+`erlang:list_to_integer/1,2`, and Elixir's `String.to_integer/1,2` raise a `badarg` error when the
+input represents an integer exceeding 256 bits. For example,
+`erlang:binary_to_integer(<<"10000000000000000000000000000000000000000000000000000000000000000">>, 16)`
+will fail with `badarg` on AtomVM, while it succeeds on BEAM. Applications parsing user input or
+external data should validate that numeric values fall within AtomVM's supported range.
+
+#### Bitwise operations edge cases
+
+The 256-bit limitation creates specific edge cases with bitwise operations that would require 257
+bits:
+
+On BEAM (unlimited precision), returns `-IntMax - 1` (requires 257 bits):
+
+```erlang
+1> IntMax = 16#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF.
+115792089237316195423570985008687907853269984665640564039457584007913129639935
+2> integer_to_binary(-1 bxor IntMax, 16).
+<<"-10000000000000000000000000000000000000000000000000000000000000000">>
+3> integer_to_binary(bnot IntMax, 16).
+<<"-10000000000000000000000000000000000000000000000000000000000000000">>
+```
+
+On AtomVM (256-bit limited), returns 0 (cannot represent 257th bit):
+
+```erlang
+1> IntMax = 16#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF.
+115792089237316195423570985008687907853269984665640564039457584007913129639935
+2> -1 bxor IntMax.
+0
+3> bnot IntMax.
+0
+```
+
+This occurs because the result would need a negative sign together with a 257th magnitude bit set
+to 1, which AtomVM cannot represent. Since `-0` is not allowed, the result is normalized to `0`.
 
 ### Bit syntax
 
 
@@ -180,7 +180,15 @@ loaded) a fixed size table.  Management of the global atom table is outside of t
 
 ### Integers
 
-An integer is represented as a single word, with the low-order 4 bits having the value `0xF` (`1111b`).  The high order word-size-6 bits are used to represent the integer value:
+AtomVM supports integers up to 256 bits with an additional sign bit stored outside the numeric
+payload. The representation strategy depends on the integer's size and uses canonicalization to
+ensure each value has exactly one representation.
+
+#### Immediate Integers
+
+Small integers are represented as a single word, with the low-order 4 bits having the value `0xF`
+(`1111b`).  The high order word-size-4 bits are used to represent the integer value using two's
+complement:
 
                                 |< 4>|
     +===========================+====+
@@ -189,11 +197,13 @@ An integer is represented as a single word, with the low-order 4 bits having the
     |                                |
     |<---------- word-size --------->|
 
-The magnitude of an integer is therefore limited to `2^{word-size - 4}` in an AtomVM program (e.g., on a 32-bit platform, `+- 134,217,728`).
+On 32-bit systems, immediate integers can represent signed values in the range `[-2^27, 2^27-1]` (28
+bits + 4-bit tag = 32 bits).
+On 64-bit systems, immediate integers can represent signed values in the range `[-2^59, 2^59-1]` (60
+bits + 4-bit tag = 64 bits).
 
-```{attention}
-Arbitrarily large integers (bignums) are not currently supported in AtomVM.
-```
+For integers outside these ranges, AtomVM uses boxed representations (see Boxed Integers section
+below).
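As an illustration of the tagging scheme, the sketch below encodes and decodes an immediate integer on a 64-bit word. The macro and function names are hypothetical and do not come from the AtomVM sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; a 64-bit word is assumed throughout. */
#define IMM_INT_TAG UINT64_C(0xF)
#define IMM_INT_TAG_BITS 4
#define IMM_INT_MIN (-(INT64_C(1) << 59))
#define IMM_INT_MAX ((INT64_C(1) << 59) - 1)

static inline bool fits_immediate(int64_t value)
{
    return value >= IMM_INT_MIN && value <= IMM_INT_MAX;
}

/* The caller is expected to check fits_immediate() first. */
static inline uint64_t encode_immediate(int64_t value)
{
    /* Shift as unsigned: left-shifting a negative signed value is undefined. */
    return ((uint64_t) value << IMM_INT_TAG_BITS) | IMM_INT_TAG;
}

static inline int64_t decode_immediate(uint64_t word)
{
    uint64_t payload = word >> IMM_INT_TAG_BITS; /* 60-bit two's complement */
    if (payload & (UINT64_C(1) << 59)) {
        payload |= UINT64_C(0xF) << 60; /* sign-extend to 64 bits */
    }
    return (int64_t) payload; /* assumes a two's complement platform */
}
```

Under these assumptions the value 3 is encoded as the word `0x3F`: the payload `3` shifted left by four bits, with the `1111b` tag in the low bits.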
 
 ### nil
 
@@ -242,6 +252,88 @@ A boxed term pointer is a single-word term that contains the address of the refe
 
 Because terms (and hence the heap) are always aligned on boundaries that are divisible by the word size, the low-order 2 bits of a term address are always 0.  Consequently, the high-order word-size - 2 (`1,073,741,824`, on a 32-bit platform) are sufficient to address any term address in the AtomVM address space, for 32-bit and greater machine architectures.
 
+### Boxed Integers
+
+AtomVM uses boxed integers for values that exceed the immediate integer range. There are two types
+of boxed integer representations: native integers (using int32_t or int64_t) and big integers (using
+arrays of uint32_t digits).
+
+#### Native Boxed Integers
+
+For integers that don't fit in immediate representation but can be stored in native C integer
+types, AtomVM uses boxed integers with two's complement encoding and a redundant sign bit in the
+header.
+
+**On 32-bit systems:**
+- Integers in range `[-2^31, -2^27-1] ∪ [2^27, 2^31-1]` are stored as boxed int32_t (single word
+payload)
+- Integers in range `[-2^63, -2^31-1] ∪ [2^31, 2^63-1]` are stored as boxed int64_t (two word
+payload)
+
+**On 64-bit systems:**
+- Integers in range `[-2^63, -2^59-1] ∪ [2^59, 2^63-1]` are stored as boxed int64_t (single word
+payload)
+
+The boxed header uses:
+- `0x8` (`001000b`) for positive integers (TERM_BOXED_POSITIVE_INTEGER)
+- `0xC` (`001100b`) for negative integers (TERM_BOXED_NEGATIVE_INTEGER)
+
+                              |< 6  >|
+    +=========================+======+
+    |    boxed-size (1 or 2)  |001X00| boxed[0] (X=0 for positive, X=1 for negative)
+    +-------------------------+------+
+    |     native integer value       | boxed[1] (int32_t or int64_t low word)
+    +--------------------------------+
+    |     high word (if int64_t on   | boxed[2] (32-bit systems only)
+    |         32-bit system)         |
+    +================================+
+    |                                |
+    |<---------- word-size --------->|
+
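The header layout above can be sketched in C as follows. This assumes a 64-bit system, where a boxed `int64_t` has a one-word payload; the tag values are the ones listed above, while `boxed_int64_write`, `boxed_size` and the macro names are hypothetical.

```c
#include <stdint.h>

#define BOXED_TAG_BITS 6
#define BOXED_POSITIVE_INTEGER_TAG UINT64_C(0x8) /* 001000b */
#define BOXED_NEGATIVE_INTEGER_TAG UINT64_C(0xC) /* 001100b */

/*
 * Writes a native boxed integer into `boxed` (2 words on a 64-bit system):
 * boxed[0] holds the payload size and tag, boxed[1] the two's complement value.
 */
static inline void boxed_int64_write(uint64_t boxed[2], int64_t value)
{
    uint64_t tag = value < 0 ? BOXED_NEGATIVE_INTEGER_TAG
                             : BOXED_POSITIVE_INTEGER_TAG;
    boxed[0] = (UINT64_C(1) << BOXED_TAG_BITS) | tag; /* boxed-size = 1 */
    boxed[1] = (uint64_t) value;
}

/* Payload size in words, read back from the header. */
static inline uint64_t boxed_size(const uint64_t *boxed)
{
    return boxed[0] >> BOXED_TAG_BITS;
}
```

In this sketch, boxing `2^60` yields the header `0x48` and boxing `-2^60` yields `0x4C`; only the sign bit of the tag differs.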
+#### Big Integers
+
+For integers beyond the native int64_t range (up to ±(2^256 - 1)), AtomVM uses an array of uint32_t
+digits representing the magnitude, with the sign stored as a flag in the boxed header. These big
+integers do NOT use two's complement encoding.
+
+The digits array:
+- Stores the absolute value of the integer
+- Uses little-endian ordering (digit[0] is least significant)
+- Omits leading zero digits to save space
+- Includes a dummy zero digit when necessary to avoid ambiguity with native boxed integers
+
+                              |< 6  >|
+    +=========================+======+
+    |    boxed-size (n)       |001X00| boxed[0] (X=0 for positive, X=1 for negative)
+    +-------------------------+------+
+    |         digit[0] (lsb)         | boxed[1] (uint32_t)
+    +--------------------------------+
+    |         digit[1]               | boxed[2] (uint32_t)
+    +--------------------------------+
+    |               ...              | ...
+    +--------------------------------+
+    |         digit[k-1] (msb)       | boxed[k] (uint32_t)
+    +--------------------------------+
+    |     0 (dummy digit if needed)  | boxed[n] (uint32_t)
+    +================================+
+    |                                |
+    |<---------- word-size --------->|
+
+**Canonicalization Rules:**
+- AtomVM ensures that integers are always stored in the most compact representation
+- Operations that produce results fitting in a smaller representation automatically convert to that
+representation
+- A dummy digit mechanism ensures that the smallest big integer always has more words than the
+largest native boxed integer. This is required when storing values such as `UINT64_MAX`
+(`0xFFFFFFFFFFFFFFFF`), which would need only 2 digits: the boxed-size field must still
+distinguish them from native boxed integers (such as `int64_t`)
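A rough sketch of the digit-count rules, under stated assumptions: the input is a little-endian array of `uint32_t` digits that may carry high-order zeros, and a single dummy digit is assumed to be enough to keep a big integer larger than a boxed `int64_t`. The function names and the constant `MIN_BIGINT_DIGITS` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 3 digits (96 bits) always occupy more words than an int64_t. */
#define MIN_BIGINT_DIGITS 3

/* Number of digits left after dropping high-order zero digits. */
static inline size_t significant_digits(const uint32_t *digits, size_t len)
{
    while (len > 0 && digits[len - 1] == 0) {
        len--;
    }
    return len;
}

/* Digits actually stored: significant digits plus a dummy zero if needed. */
static inline size_t stored_digits(const uint32_t *digits, size_t len)
{
    size_t count = significant_digits(digits, len);
    return count < MIN_BIGINT_DIGITS ? MIN_BIGINT_DIGITS : count;
}
```

For `UINT64_MAX` this gives 2 significant digits and 3 stored digits, whereas `2^100` has 4 significant digits and needs no padding in this sketch.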
+
+**Examples:**
+- The value 3 is always stored as an immediate integer (never as a boxed integer)
+- On a 64-bit system, 2^60 would be stored as a boxed int64_t, not as a big integer
+- The value 2^100 would be stored as a big integer with 4 uint32_t digits (plus potentially a dummy
+digit)
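The canonicalization rules can be sketched as a classifier that picks the most compact representation for a given sign and magnitude. The sketch assumes a 64-bit system and a little-endian `uint32_t` digit array as input; the enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum int_repr { REPR_IMMEDIATE, REPR_BOXED_INT64, REPR_BIG_INTEGER };

/* Picks the smallest representation on a 64-bit system. */
static inline enum int_repr classify_integer(bool negative,
    const uint32_t *digits, size_t len)
{
    while (len > 0 && digits[len - 1] == 0) {
        len--; /* ignore high-order zero digits */
    }
    if (len > 2) {
        return REPR_BIG_INTEGER; /* magnitude needs more than 64 bits */
    }
    uint64_t mag = 0;
    if (len > 0) {
        mag = digits[0];
    }
    if (len > 1) {
        mag |= (uint64_t) digits[1] << 32;
    }
    /* Two's complement ranges hold one extra value on the negative side. */
    uint64_t imm_limit = (UINT64_C(1) << 59) - (negative ? 0 : 1);
    uint64_t i64_limit = (UINT64_C(1) << 63) - (negative ? 0 : 1);
    if (mag <= imm_limit) {
        return REPR_IMMEDIATE;
    }
    if (mag <= i64_limit) {
        return REPR_BOXED_INT64;
    }
    return REPR_BIG_INTEGER;
}
```

This reproduces the examples above: 3 is classified as immediate, `2^60` as a boxed `int64_t`, and `2^100` as a big integer.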
+
 ### References
 
 A reference (e.g., created via [`erlang:make_ref/0`](./apidocs/erlang/estdlib/erlang.md#make_ref0)) stores a 64-bit incrementing counter value (a "ref tick").  On 64 bit machines, a Reference takes up two words -- the boxed header and the 64-bit value, which of course can fit in a single word.  On 32-bit platforms, the high-order 28 bits are stored in `boxed[1]`, and the low-order 32 bits are stored in `boxed[2]`:
@@ -278,7 +370,7 @@ Tuples are represented as boxed terms containing a boxed header (`boxed[0]`), a
 
 ### Maps
 
-Maps are represented as boxed terms containing a boxed header (`boxed[0]`), a type tag of `0x3C` (`111100b`), followed by:
+Maps are represented as boxed terms containing a boxed header (`boxed[0]`), a type tag of `0x2C` (`101100b`), followed by:
 
 * a term pointer to a tuple of arity `n` containing the keys in the map;
 * a sequence of `n`-many words, containing the values of the map corresponding (in order) to the keys in the reference tuple.
@@ -300,7 +392,7 @@ The keys and values are single word terms, i.e., either immediates or pointers t
     |                       ...
     |       |                         |< 6  >|
     |       +=========================+======+
-    |       |    boxed-size (n)       |111100| boxed[0]
+    |       |    boxed-size (n)       |101100| boxed[0]
     |       +-------------------------+------+
     +-----------------<   keys               | boxed[1]
             +--------------------------------+
@@ -446,7 +538,7 @@ to `nil`.
     some
     binary                            |< 6  >|
     ^       +=========================+======+
-    |       |    boxed-size (5)       |100100| boxed[0]
+    |       |    boxed-size (5)       |000100| boxed[0]
     |       +-------------------------+------+
     |       |      match-or-binary-ref       | boxed[1]
     |       +--------------------------------+
@@ -464,15 +556,15 @@ A reference to a reference-counted binary counts as a reference, in which case t
 
 #### Sub-Binaries
 
-Sub-binaries are represented as boxed terms containing a boxed header (`boxed[0]`), a type tag of `0x28` (`001000b`)
+Sub-binaries are represented as boxed terms containing a boxed header (`boxed[0]`), a type tag of `0x28` (`101000b`)
 
 A sub-binary is a boxed term that points to a reference-counted binary, recording the offset into the binary and the length (in bytes) of the sub-binary.  An invariant for this term is that the `offset + length` is always less than or equal to the length of the referenced binary.
 
         some
         refc
         binary                            |< 6  >|
         ^       +=========================+======+
-        |       |    boxed-size (3)       |001000| boxed[0]
+        |       |    boxed-size (3)       |101000| boxed[0]
         |       +-------------------------+------+
         |       |              len               | boxed[1]
         |       +--------------------------------+
@@ -630,6 +722,8 @@ A given process heap and stack occupy a single region of malloc'd memory, and it
 
 Terms stored in the stack, registers, and process dictionary are either single-word terms (like atoms or pids) or term references, i.e., single-word terms that point to boxed terms or list cells in the heap.  These terms constitute the "roots" of the memory graph of all "reachable" terms in the process.
 
+Boxed integers, including both native boxed integers and big integers, are simple blob structures that are copied as-is during garbage collection. They do not contain any pointers or addresses that need to be updated during the garbage collection process.
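Because the payload holds no pointers, copying a boxed integer reduces to a plain memory copy of the header plus `boxed-size` words. A minimal sketch, assuming the header layout described earlier (size stored in the bits above the 6-bit tag); `copy_boxed_integer` is a hypothetical name, not the function used by the AtomVM collector:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BOXED_TAG_BITS 6

/* Copies header and payload verbatim; returns the number of words copied. */
static inline size_t copy_boxed_integer(const uintptr_t *src, uintptr_t *dst)
{
    size_t words = (size_t) (src[0] >> BOXED_TAG_BITS) + 1; /* +1 for header */
    memcpy(dst, src, words * sizeof(uintptr_t));
    return words;
}
```

No word of the payload is inspected or rewritten, which is the difference from terms such as tuples, whose elements may themselves be pointers that need updating.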
+
 ### When does garbage collection happen?
 
 Garbage collection typically occurs as the result of a request for an allocation of a multi-word term in the heap (e.g., a tuple, list, or binary, among other types), and when there is currently insufficient space in the free space between the current heap and the current stack to accommodate the allocation.
 
@@ -19,7 +19,7 @@ Currently, AtomVM implements a strict subset of the BEAM instruction set.
 A high level overview of the supported language features include:
 
 * All the major Erlang types, including
-  * integers (with size limits)
+  * integers (up to 256-bit magnitude plus a separate sign)
   * floats
   * tuples
   * [lists](./apidocs/erlang/estdlib/lists.md)
@@ -740,7 +740,7 @@ The following Erlang type specification enumerates this type:
 Erlang/OTP uses the Christian epoch to count time units from year 0 in the Gregorian calendar.  For example, the value 0 in Gregorian seconds represents the date Jan 1, year 0, and midnight (UTC), or in Erlang terms, `{{0, 1, 1}, {0, 0, 0}}`.
 
 ```{attention}
-AtomVM is currently limited to representing integers in at most 64 bits, with one bit representing the sign bit.
+AtomVM is currently limited to representing time in at most 64 bits, with one bit representing the sign bit.
 However, even with this limitation, AtomVM is able to resolve microsecond values in the Gregorian calendar for over
 292,000 years, likely well past the likely lifetime of an AtomVM application (unless perhaps launched on a deep
 space probe).
 
@@ -20,6 +20,10 @@
 
 -define(JIT_FORMAT_VERSION, 1).
 
+% Before adding any new platform to the list below:
+% Is it 64-bit big endian? If so, the `put_digits` function in jit.erl must be updated to support
+% big endian platforms.
+
 -define(JIT_ARCH_X86_64, 1).
 -define(JIT_ARCH_AARCH64, 2).
 -define(JIT_ARCH_ARMV6M, 3).