Skip to content

Commit 8913eea

Browse files
committed
Typeset N* as N$^\star$
1 parent 45ae129 commit 8913eea

File tree

1 file changed

+31
-31
lines changed

1 file changed

+31
-31
lines changed

part2-nstar.tex

Lines changed: 31 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -14,12 +14,12 @@ \chapter{Introduction}\label{chap:nstar-abstract}
1414

1515
\vspace{\baselineskip}
1616

17-
N*'s goal is to assist users with a simple but powerful type system, yet still allowing for low-level data manipulation.
18-
But before even being a usable programming language, N* aims at being a compiler backend (much like for example LLVM), and is used that way in the Zilch project. Differences with other compiler backends are mostly the type-system, allowing the compilation of Zilch source code into type-safe instructions.
17+
N$^\star$'s goal is to assist users with a simple but powerful type system, yet still allowing for low-level data manipulation.
18+
But before even being a usable programming language, N$^\star$ aims at being a compiler backend (much like for example LLVM), and is used that way in the Zilch project. Differences with other compiler backends are mostly the type-system, allowing the compilation of Zilch source code into type-safe instructions.
1919

2020
\vspace{\baselineskip}
2121

22-
Because N* supports compiling to multiple architectures, using different grammars, describing N* will at first be platform-agnostic, treating common aspects between all CPU architectures, and then will be divided into multiple categories, explaining in more details some features on a per-architecture basis\footnote{Note that the target executable format (ELF, PE, \ldots) is also considered as an architecture-specific thing, but should not influence N* much.}.
22+
Because N$^\star$ supports compiling to multiple architectures, using different grammars, describing N$^\star$ will at first be platform-agnostic, treating common aspects between all CPU architectures, and then will be divided into multiple categories, explaining in more details some features on a per-architecture basis\footnote{Note that the target executable format (ELF, PE, \ldots) is also considered as an architecture-specific thing, but should not influence N$^\star$ much.}.
2323

2424
\section*{Small notes on the notation used in the next chapters}\label{sec:nstar-abstract-notation}
2525

@@ -60,17 +60,17 @@ \chapter{Non platform-specific features}\label{chap:nstar-common}
6060

6161
\section{Types}\label{sec:nstar-common-ts}
6262

63-
One of the differences between classical assembly languages and N* is its type system.
64-
Compared to other higher level programming languages like Java, C++, etc, N* has a very simple yet powerful and expressive enough type system.
63+
One of the differences between classical assembly languages and N$^\star$ is its type system.
64+
Compared to other higher level programming languages like Java, C++, etc, N$^\star$ has a very simple yet powerful and expressive enough type system.
6565

6666
In programming, types are used mostly to prove at compile-time that a given program should behave well if it type-checks. While this works for more elaborated programming languages like Haskell, Idris, etc, most type systems aren't expressive enough to absolutely guarantee that everything will work at run-time (in fact, there is no possible way of doing this, because for example a memory allocation may fail, and this cannot be checked at compile-time). However, we can try to guarantee as much as possible.
6767

68-
N* doesn't try to solve this issue, because it would be really hard to target a dependently typed assembly language from a non-dependently typed programming language. But where all used assembly languages do not even consider types (only numbers, in fact), N* embeds a powerful type system used to remove the possibility of bugs (like incorrect structure addresses passed as a parameter function, or incoherent types in some instructions).
68+
N$^\star$ doesn't try to solve this issue, because it would be really hard to target a dependently typed assembly language from a non-dependently typed programming language. But where all used assembly languages do not even consider types (only numbers, in fact), N$^\star$ embeds a powerful type system used to remove the possibility of bugs (like incorrect structure addresses passed as a parameter function, or incoherent types in some instructions).
6969

7070
\subsection{Kinds}\label{subsec:nstar-common-ts-kinds}
7171

7272
Kinds (also known as types of types) mainly serve the purpose of indicating type sizes.
73-
There are three type of kinds in N*:
73+
There are three type of kinds in N$^\star$:
7474
\begin{itemize}
7575
\item Stack kind, denotating stack-like types, which can be safely offsetted
7676
\item Kinds whose size is abstracted away, useful to ask for any sized type
@@ -91,7 +91,7 @@ \subsection{Kinds}\label{subsec:nstar-common-ts-kinds}
9191
\subsection{Integer types}\label{subsec:nstar-common-ts-integer}
9292

9393
Numbers are the building block of any assembly language. Most of data manipulated is manipulated as numbers, e.g.\ addresses, characters, strings, enumerations, etc.
94-
This is not the case in N*, where ``integer'' only really means ``number''.
94+
This is not the case in N$^\star$, where ``integer'' only really means ``number''.
9595
The syntax for the types of integers is given in Figure~\ref{fig:nstar-common-ts-integer-syntax}.
9696

9797
\begin{figure}[htb]
@@ -105,7 +105,7 @@ \subsection{Integer types}\label{subsec:nstar-common-ts-integer}
105105

106106
Integers have two varying parameters: their sign and sizes.
107107
According to the sign (i.e.\ signed or unsigned), some operations may not perform the same (for example \texttt{mul} does not behave the same).
108-
The size is nothing more than the number of bits occupied by the integer (in N*, those are restricted to powers of 2 between $8$ and $64$ included).
108+
The size is nothing more than the number of bits occupied by the integer (in N$^\star$, those are restricted to powers of 2 between $8$ and $64$ included).
109109
Most operations should perform the same no matter the integer size, however it is recommended to search in the target architecture manual for further reference.
110110

111111
Kinds of integers are written under the form of inference rules in Figure~\ref{fig:nstar-common-ts-integer-kindrules}.
@@ -168,7 +168,7 @@ \subsection{Other atomic types}\label{subsec:nstar-common-ts-otheratomic}
168168
In all assembly languages, characters are merely syntactic sugar for their ASCII code. This is how their are put in the machine code anyway, so it is not a huge problem (it might even not be at all).
169169

170170
Pointers, on the other hand, are unabstracted memory addresses.
171-
In N*, there are two types of pointers: data pointers and stack pointers.
171+
In N$^\star$, there are two types of pointers: data pointers and stack pointers.
172172
Stack pointers are covered in Subsection~\ref{subsec:nstar-common-ts-stack}~``\nameref{subsec:nstar-common-ts-stack}''.
173173
Data pointers simply represent an address where we know (or not, see the Subsection~\ref{subsec:nstar-common-unsafe-derefliteraladdr}~``\nameref{subsec:nstar-common-unsafe-derefliteraladdr}'') that there is a value of the given pointed type.
174174

@@ -279,7 +279,7 @@ \subsection{Context types}\label{subsec:nstar-common-ts-records}
279279

280280
\subsection{Stack types}\label{subsec:nstar-common-ts-stack}
281281

282-
There is one stack type in N*: the stack constructor \texttt{::}.
282+
There is one stack type in N$^\star$: the stack constructor \texttt{::}.
283283
Note that there is no ``empty stack'' type as would be the case with e.g.\ lists in Haskell.
284284
The reason is that it forces the developer to abstract the stack to be able to have a ``stack tail'' (the part of the stack on the right of the stack constructor \texttt{::}) at some point.\footnote{It also serves the purpose to ensure that we cannot construct a stack from nothing, and that it should always be given to us, to e.g.\ the \texttt{main} function.}
285285
An example is given in Listing~\ref{lst:nstar-common-ts-records-stackmask}.
@@ -379,7 +379,7 @@ \subsection{Union types}\label{subsec:nstar-common-ts-unions}
379379

380380
\section{Expressions}\label{sec:nstar-common-expressions}
381381

382-
There are four types of expressions in N*: immediate values, data labels, registers and pointer offsets.
382+
There are four types of expressions in N$^\star$: immediate values, data labels, registers and pointer offsets.
383383
All of these can be used as the source operand of the \texttt{mov} instruction.
384384

385385
\subsection{Immediate values}\label{subsec:nstar-common-expressions-immediate}
@@ -397,7 +397,7 @@ \subsection{Immediate values}\label{subsec:nstar-common-expressions-immediate}
397397
\scalebox{.5}{
398398
\includegraphics{nstar/exprs/imm-int-grammar}
399399
}
400-
\caption{Grammar for immediate values in N*.}
400+
\caption{Grammar for immediate values in N$^\star$.}
401401
\label{fig:nstar-common-expressions-immediate-grammar}
402402
\end{figure}
403403

@@ -421,7 +421,7 @@ \subsection{Immediate values}\label{subsec:nstar-common-expressions-immediate}
421421
\infer2{\Xi;\Gamma;\chi;\sigma;\epsilon\vdash^T_{code}$ i : t$}
422422
\end{prooftree}
423423

424-
\caption{Type inference rules for immediate values in N*.}
424+
\caption{Type inference rules for immediate values in N$^\star$.}
425425
\label{fig:nstar-common-expressions-immediate-typerules}
426426
\end{figure}
427427

@@ -430,7 +430,7 @@ \subsection{Labels}\label{subsec:nstar-common-expressions-labels}
430430
\subsubsection{Data labels}\label{subsubsec:nstar-common-expressions-labels-data}
431431

432432
Data labels are pointers to some piece of data (may it even be an array).
433-
They can be dereferenced and offset using the constructs given in N*.
433+
They can be dereferenced and offset using the constructs given in N$^\star$.
434434
Grammar and inference rules are given respectively in Figure~\ref{fig:nstar-common-expressions-labels-data-grammar} and Figure~\ref{fig:nstar-common-expressions-labels-data-typerules}.
435435

436436
\begin{figure}[H]
@@ -439,7 +439,7 @@ \subsubsection{Data labels}\label{subsubsec:nstar-common-expressions-labels-data
439439
\scalebox{.5}{
440440
\includegraphics{nstar/exprs/label-expr-grammar}
441441
}
442-
\caption{Grammar for data labels in N*.}
442+
\caption{Grammar for data labels in N$^\star$.}
443443
\label{fig:nstar-common-expressions-labels-data-grammar}
444444
\end{figure}
445445

@@ -451,7 +451,7 @@ \subsubsection{Data labels}\label{subsubsec:nstar-common-expressions-labels-data
451451
\infer1{\Xi;\Gamma;\chi;\sigma;\epsilon\vdash^T_{code}$ l : *t$}
452452
\end{prooftree}
453453

454-
\caption{Type inference rules for data labels in N*.}
454+
\caption{Type inference rules for data labels in N$^\star$.}
455455
\label{fig:nstar-common-expressions-labels-data-typerules}
456456
\end{figure}
457457

@@ -467,7 +467,7 @@ \subsubsection{Code labels}\label{subsubsec:nstar-common-expressions-labels-code
467467
\scalebox{.5}{
468468
\includegraphics{nstar/exprs/label-value-grammar}
469469
}
470-
\caption{Grammar for code labels in N*.}
470+
\caption{Grammar for code labels in N$^\star$.}
471471
\label{fig:nstar-common-expressions-labels-code-grammar}
472472
\end{figure}
473473

@@ -482,14 +482,14 @@ \subsubsection{Code labels}\label{subsubsec:nstar-common-expressions-labels-code
482482
\infer1{\Xi;\Gamma;\chi;\sigma;\epsilon\vdash^T_{code}$ l<$\vec{s}$> : $\forall$().ctx$}
483483
\end{prooftree}
484484

485-
\caption{Type inference rules for code labels in N*.}
485+
\caption{Type inference rules for code labels in N$^\star$.}
486486
\label{fig:nstar-common-expressions-labels-code-typerules}
487487
\end{figure}
488488

489489
\subsection{Registers}\label{subsec:nstar-common-expressions-registers}
490490

491-
Registers in N* are abstracted away from the target architectures.
492-
Instead of manipulating registers \texttt{\%eax} or \texttt{\%rsp} or \texttt{\$a2}, which all are platform-specific, there are 6 general purpose registers in N*, named \texttt{r0} to \texttt{r5}.
491+
Registers in N$^\star$ are abstracted away from the target architectures.
492+
Instead of manipulating registers \texttt{\%eax} or \texttt{\%rsp} or \texttt{\$a2}, which all are platform-specific, there are 6 general purpose registers in N$^\star$, named \texttt{r0} to \texttt{r5}.
493493
They can be 4 to 8 bytes large depending on the target architecture mappings.
494494

495495
Grammar for all those registers is given in Figure~\ref{fig:nstar-common-expressions-registers-grammar} and inference rules are given in Figure~\ref{fig:nstar-common-expressions-registers-typerules}.
@@ -501,7 +501,7 @@ \subsection{Registers}\label{subsec:nstar-common-expressions-registers}
501501
\includegraphics{nstar/exprs/register-grammar}
502502
}
503503

504-
\caption{Grammar for registers in N*.}
504+
\caption{Grammar for registers in N$^\star$.}
505505
\label{fig:nstar-common-expressions-registers-grammar}
506506
\end{figure}
507507

@@ -512,7 +512,7 @@ \subsection{Registers}\label{subsec:nstar-common-expressions-registers}
512512
\infer0{\Xi;\Gamma;\chi$, r : t$;\sigma;\epsilon\vdash^T_{code}$ r : t$}
513513
\end{prooftree}
514514

515-
\caption{Type inference rules for registers in N*}
515+
\caption{Type inference rules for registers in N$^\star$}
516516
\label{fig:nstar-common-expressions-registers-typerules}
517517
\end{figure}
518518

@@ -568,12 +568,12 @@ \subsection{Pointer offsets}\label{fig:nstar-common-expressions-pointeroffsets}
568568

569569
\section{File sections}\label{sec:nstar-common-sections}
570570

571-
Sections in N* serve the exact same purpose as in other assembly languages. They divide a file into multiple parts depending on what the semantics of the current section is supposed to be (code, data, etc).
571+
Sections in N$^\star$ serve the exact same purpose as in other assembly languages. They divide a file into multiple parts depending on what the semantics of the current section is supposed to be (code, data, etc).
572572
Section names obviously differ from one target format to another. As an example, the ``\texttt{.rela.dyn}'' section from the ELF format may not exist in the PE format.
573573

574-
N* tries to unify target formats section names (simplifying targetting N* as a compiler backend) by having a fixed set of section names, all with different meanings. While you can put anything anywhere in classical assembly languages, this is not the case in N*.
574+
N$^\star$ tries to unify target formats section names (simplifying targetting N$^\star$ as a compiler backend) by having a fixed set of section names, all with different meanings. While you can put anything anywhere in classical assembly languages, this is not the case in N$^\star$.
575575

576-
Sections in N* can be named ``\texttt{data}'', ``\texttt{code}'', ``\texttt{rodata}'', ``\texttt{udata}'' or ``\texttt{extern}''. Each of them has defined semantics as described below.
576+
Sections in N$^\star$ can be named ``\texttt{data}'', ``\texttt{code}'', ``\texttt{rodata}'', ``\texttt{udata}'' or ``\texttt{extern}''. Each of them has defined semantics as described below.
577577

578578
\subsection{The \texttt{code} section}\label{subsec:nstar-common-sections-code}
579579

@@ -734,12 +734,12 @@ \section{Constant values}\label{sec:nstar-common-constvalue}
734734

735735
\section{Restrictions applied to branching}\label{sec:nstar-common-bs-restrictions}
736736

737-
Branching ((un-)\ conditional jumps, calls, etc) is restricted in N* in order to prevent stack leaks, unknown caller address return or even missing return values.
738-
It also is restricted not to integrate some sort of scoping, which would add some (mostly useless) complexity when using/generating N*.
737+
Branching ((un-)\ conditional jumps, calls, etc) is restricted in N$^\star$ in order to prevent stack leaks, unknown caller address return or even missing return values.
738+
It also is restricted not to integrate some sort of scoping, which would add some (mostly useless) complexity when using/generating N$^\star$.
739739

740740
\subsection{Returning to a known code-space address}\label{subsec:nstar-common-bs-restrictions-ret}
741741

742-
N* offers what we call ``continuations'', which are a mean of easily controlling the instruction control flow of the program.
742+
N$^\star$ offers what we call ``continuations'', which are a mean of easily controlling the instruction control flow of the program.
743743
Thanks to that, this is actually easy to check whether we would be returning to some valid piece of code: the continuation part of th context must be bound to something of the form $\forall().ctx$.
744744
When anything else is in the continuation (in fact, it cannot be, because we do not allow overwriting the continuation --- which can be worked around by moving the continuation somewhere else before), it is strictly impossible to either \texttt{call} or \texttt{jmp} to a label, or \texttt{ret} from a piece of code, which would yield a very nasty undefined behavior at runtime in mainstream assembly languages.
745745

@@ -789,7 +789,7 @@ \subsection{Pointer offsetting}\label{subsec:nstar-common-unsafe-ptroffset}
789789
\texttt{Vec<T>} in Rust, \texttt{std::vector<T>} in C++, \texttt{ArrayList<T>} in Java, and many other vector types are also implemented in terms of a pointer to a chunk of memory, but with an added container size, allowing to safely access elements of the vector without going out of bounds.
790790
But in Rust, we still need to have some \texttt{unsafe} blocks in your code, in order to use the container.
791791

792-
There is the same problem in N*.
792+
There is the same problem in N$^\star$.
793793
Because there is no built-in array type, we have to rely on pointers to be able to achieve such thing, therefore needing a way to offset a pointer to access the various elements in the array.
794794
It would also be really hard to manipulate plain array types (because, for example, of the size to store with the type).
795795
As offsetting a pointer can lead to an invalid address dereferencing (or at least dereferencing an address which doesn't belong to the application memory, i.e.\ allocated by another process), it is considered an unsafe operation and therefore needs to be wrapped in an \texttt{unsafe} block.
@@ -1210,7 +1210,7 @@ \subsection{Register mappings}\label{subsec:nstar-specific-x86amd64-registers}
12101210

12111211
\begin{tabularx}{\textwidth}{Y Y Y}
12121212
\toprule
1213-
N* & x86 & amd64 \\
1213+
N$^\star$ & x86 & amd64 \\
12141214
\midrule
12151215
r0 & eax & rax \\
12161216
r1 & ecx & rcx \\

0 commit comments

Comments
 (0)