---
title: "Turbo-Basic XL and Atari BASIC parser tool"
author: https://github.com/dmsc/tbxl-parser
book: true
classoption: [oneside]
titlepage: true
titlepage-text-color: "FFFFFF"
titlepage-rule-color: "FFFFFF"
titlepage-rule-height: 0
titlepage-background: "expr.pdf"
...
# Turbo-Basic XL and Atari BASIC parser tool
This program parses and tokenizes a _Turbo-Basic XL_ or _Atari BASIC_ listing in
a flexible format and produces any of three outputs:
- A tokenized binary file, directly loadable in the original _Turbo-Basic XL_
(or _Atari BASIC_ if the `-A` option is given) interpreter. By default, this
mode replaces variable names with single letters; the `-f` option writes the
full variable names instead, and the `-x` option writes empty variable
names, so the program cannot be listed or edited.
This is the default operating mode, and can also be forced with the `-b`
command line switch.
- A minimized listing, replacing variable names with single letters, using
abbreviations, removing spaces and using Atari end of lines.
This mode is selected with the `-s` command line switch. Adding the `-f`
option keeps the names of variables with 2 or fewer characters.
- A pretty printed expanded listing, with one statement per line and
indentation, and standard ASCII line endings.
Note that this format can be read back again, but some statements are
transformed in the process, which can lead to problems with non-standard
`IF`/`THEN` constructs.
Currently, `IF`/`THEN` with statements after the `THEN` are converted to
multi-line `IF`/`ENDIF` statements.
This mode is selected with the `-l` command line switch.
## Example Programs
The following is an example of a simple program in free form:
```purebasic
' Example program
' One statement per line:
print "Hello All"
print "---------"
print "This is a heart: \00"
' Also, multiple statements per line:
for counter = 0 to 10 : ? "Iter: "; counter : next counter
' Line numbers
30
' And abbreviations:
g. 30
```
To generate a tokenized BAS file, loadable by _Turbo-Basic XL_, simply type:

    basicParser samples/sample-1.txt

This will generate a `sample-1.bas` file in the same folder.
If, on the other hand, you want a minimized listing file in ATASCII format (suitable
for `ENTER` into _Atari BASIC_), type:

    basicParser -s -A samples/sample-1.txt

This will generate a `sample-1.lst` file in the same folder.
More sample programs, located in the `samples` folder, illustrate
the free-form input format.
## Input listing format
The parser accepts standard listings for _Atari BASIC_ or _Turbo-Basic XL_
programs, with Atari or ASCII end of lines.
All the standard abbreviations available in the original interpreters are also
accepted.
As with _Turbo-Basic XL_, the input is case insensitive (uppercase, lowercase
and mixed case are all supported).
### Line numbers
You can omit line numbers; only lines that are the target of a `GOTO` / `GOSUB` /
`THEN` need them. If you use only labels, no line numbers are needed.
Also, a line number can appear alone on a line, for better readability.
### Comments
Comments can be started by `'` in addition to the _Turbo-Basic XL_ `.`, `--`
or `rem`. In the short listing and tokenized output formats, all comments are
removed unless the `-k` option is given.
All comment types are supported in _Atari BASIC_ mode.
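
As a small illustration of the comment styles listed above, the following sketch uses each one once. The statements are arbitrary, and the trailing `: '` form simply follows the pattern used by the other examples in this document.

```purebasic
' Apostrophe comment, an extension of this parser
. Short comment form from Turbo-Basic XL
--
rem Classic REM comment
print "Done" : ' Comment after a statement
```

In short listing or tokenized output, none of these comments survive unless `-k` is given, and even then only the standard styles are kept, not the `'` ones.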
### Special characters inside string constants
Inside strings, special characters can be specified by using a backslash
followed by a hexadecimal number in upper-case (e.g., `"\00\A0"` produces a
string with a "heart" and an inverse space "♥█"). This allows editing special
characters in any standard editor.
Note that to force a backslash before a valid hex number, you can use two
backslashes (e.g., ``"123\\456"`` produces ``123\456``).
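
The two rules above can be seen side by side in a short listing. This is only a sketch: the text around the escapes is arbitrary, and it assumes that each escape is a backslash followed by exactly two upper-case hexadecimal digits, as in the examples given above.

```purebasic
' "\00" is the heart character, "\A0" is an inverse space
print "\00 Lives: 3 \A0"
' The doubled backslash keeps the text literal, producing A\00B
print "A\\00B"
```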
### Extended string constants
There is support for extended strings, with embedded character names.
Extended strings start with `["` and end with `"]`, and can contain:
- Special characters with `{name}` or `{count*name}`, where `count` is a decimal
number and `name` is taken from the list:
`heart`, `rbranch`, `rline`, `tlcorner`, `lbranch`, `blcorner`, `udiag`,
`ddiag`, `rtriangle`, `brblock`, `ltriangle`, `trblock`, `tlblock`,
`tline`, `bline`, `blblock`, `clubs`, `brcorner`, `hline`, `cross`, `ball`,
`bbar`, `lline`, `bbranch`, `tbranch`, `lbar`, `trcorner`, `esc`, `up`,
`down`, `left`, `right`, `diamond`, `spade`, `vline`, `clr`,
`del`, `ins`, `tbar`, `rbar`, `eol`, `bell`.
- Inverse video characters surrounded by `~`.
- Multiple lines: you can terminate the string on a different line than the
start. Note that this will embed end-of-line characters in the string, so it
will only work in tokenized output, not short-listing output.
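
A short sketch of an extended string that combines the three features listed above. It assumes that plain text can be mixed freely with the `{name}` and `~` forms; the message text itself is arbitrary.

```purebasic
' Clear the screen, move down three rows, then print a heart,
' plain text and the word World in inverse video
? ["{clr}{3*down}{heart} Hello ~World~"]
' A string that spans two source lines (tokenized output only)
? ["Line one
Line two"]
```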
### Parameters and local variables for `PROC`
Arguments follow the `PROC` label after a comma, and local variables follow
after a semicolon:
```purebasic
D = 3
EXEC Testing, D+5, "Hello"
PRINT D
PROC Testing, A, B$(10); D
D = A + 1
PRINT D; " and "; B$
ENDPROC
```
As the example shows, string variables must include the dimensioned length,
as the parser adds a `DIM` at the start of the program to initialize them. The
dimensioned length must be an integer, a `$define` or a `%` number.
Also, setting the value of variable "D" inside the procedure does not alter
the value of the variable "D" outside the procedure.
The parser transforms this construct by creating new variables that hold the
parameters and local variables, so the resulting procedures don't support
recursion.
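
To make the recursion limitation concrete, the listing below is a hand-written approximation of what such a transformation could look like. It is not actual parser output: the variable names `TestingA`, `TestingB$` and `TestingD` are hypothetical, and the real generated names and statement order may differ.

```purebasic
' Hand-written approximation, not actual parser output.
' TestingA, TestingB$ and TestingD are hypothetical names.
DIM TestingB$(10)
D = 3
TestingA = D+5 : TestingB$ = "Hello" : EXEC Testing
PRINT D
PROC Testing
TestingD = TestingA + 1
PRINT TestingD; " and "; TestingB$
ENDPROC
```

Because each parameter and local lives in a single shared variable, a nested call to the same procedure would overwrite the values of the outer call.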
### Syntax from _Turbo-Basic XL_ in _Atari BASIC_
Some of the extra statements from _Turbo-Basic XL_ are supported even in _Atari
BASIC_ output mode; they are converted to equivalent forms:
- Multi-line `IF`/`ENDIF` statements are converted to `IF`/`THEN`.
- The `%0` to `%3` tokens are converted to the numbers 0 to 3.
- `PUT` without I/O channel is converted to `PUT #16`. This relies on a bug
in _Atari BASIC_ that makes I/O channel 16 equal to 0.
- Hexadecimal constants are converted to decimal constants.
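
As an illustration, the following input uses three of the constructs listed above and should be accepted when the `-A` option is given. The comments only restate the conversion rules; the exact form of the generated _Atari BASIC_ code is not shown here because it depends on the parser.

```purebasic
' Multi-line IF/ENDIF: rewritten by the parser as IF/THEN
' %1: written as the number 1
' PUT without a channel: written as PUT #16
IF X > %1
PUT 65
ENDIF
```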
### Parsing directives
There are also parsing *directives*, which consist of lines starting with a
dollar sign `$`. The available directives are documented below.
## Program Usage

    basicParser [options] [-o output] filenames

Options:
- `-n num` Sets the maximum line length before splitting lines to `num`.
Note that if a single statement is longer than this, the line
is output anyway.
The default is 120 characters (the standard Atari Editor limit).
- `-l` Output long (readable) listing, suitable for editing, with standard
end of lines and lowercase statements.
- `-s` Output a short, minimized listing, with ATASCII end of lines. The
default output file name is the same as input with `.lst` extension
added.
- `-b` Output a binary tokenized file instead of a listing. The default
output file name is the same as input with `.bas` extension added. Note
that this is the default behaviour.
- `-A` Accept (and produce) standard _Atari BASIC_ language, without the
extended statements and syntax. Note that some of the optimizations are
specific to _Turbo-Basic XL_ and won't run in this mode.
- `-x` In binary output mode, writes null variable names, making the program
unlistable. This option does nothing on listing output.
- `-f` In binary output mode, writes the full variable names, which eases
debugging the program. In short listing mode, keeps the names of
variables with less than two characters, renaming all longer or invalid
names.
- `-k` In binary output mode, keeps comments in the output. Note that only
standard comments are included, not new style (`'`) comments.
- `-a` In long output, replace Atari characters in comments with
approximating characters.
- `-v` Shows more parsing information, like the names of renamed variables.
(verbose mode)
- `-q` Don't show any parsing output, only errors. (quiet mode)
- `-o` Sets the output file name. By default, the output is the name of the
input with `.lst` (listing) or `.bas` (tokenized) extension. If the
given name starts with a dot, it is used as the output file name extension.
- `-c` Output to standard output instead of a file.
- `-O` Enables parser optimizations to produce smaller or faster code. Without
an argument, it enables all optimizations; an argument can be given,
similar to the `optimize` directive in the code (see below for the
possible options). The option can be specified multiple times; an
example for producing short listings is `-O -O -convert_percent -O
-const_replace`
- `-h` Shows help and exits.
## Parser directives
Directives add extra features to the parser, much like preprocessor directives
in C and C++. Directives start with a dollar sign as the first non-blank
character on a line, and continue up to the end of the line.
Below is a description of the available directives.
### `$options` directive.
The options directive alters the way the parsing is done, accepting a list of
comma-separated options, valid for the current file. Valid options:
- `mode=compatible`: Disable features to be more compatible with the
_Turbo-Basic XL_ parser.
- `mode=extended`: Makes the parser accept more extended features.
- `mode=default`: Returns the parser to the default mode.
- `optimize` or `+optimize`: Allows the parser to optimize the output to
produce smaller or faster code.
- `-optimize`: Disable the optimizations.
- `optimize=+`*suboption*: Enable the particular optimization option.
- `optimize=-`*suboption*: Disable the particular optimization option.
The optimization sub-options are:
- `const_folding`: Replace operations on constants with the result.
- `convert_percent`: Replace small integers with the `%*` equivalent. This is
only available in _Turbo-Basic XL_ mode.
- `commute`: Swap arguments to binary operations to minimize runtime.
- `line_numbers`: Remove all BASIC line numbers that are unused.
- `const_replace`: Replace repeated constant values (numeric or string) with
a variable initialized to the value. The initialization code is added
before any statement in the program, and tries to use the minimum number
of bytes possible.
- `fixed_vars`: This is the complement of the `const_replace` option: it tries to
identify variables with a fixed value in the whole program and removes the
variable. Use this optimization when converting original BASIC listings, as
reversing the constant replacement gives a simpler listing and allows
further optimizations to be applied. Note that currently this option can
produce bad results, as it does not follow the program flow and can't detect
if a variable is used before the first assignment, so it is not enabled by
default. You need to check each removed variable, as printed in the output
and in the comments in the resulting program.
- `then_goto`: Searches `IF` statements with `THEN GOTO` and removes the `GOTO`
statement, replacing with the line number alone.
Note: If the line number is not a constant, the resulting program will be
executed and listed correctly by both _Atari BASIC_ and _Turbo-Basic XL_, but
can't be entered because of an original parser limitation. Therefore, this
conversion is only done for constant values when the output is a short listing.
Example: `IF X THEN GOTO 100` becomes `IF X THEN 100`
- `if_goto`: Performs the same optimization as `then_goto`, but also replaces
instances of multi-line `IF` statements containing a `GOTO` with `THEN` and
the target line number.
This optimization is not enabled by default because it can produce larger
code by forcing a newline in the file.
Example:
```
IF X
GOTO 100
ENDIF
```
becomes
```
IF X THEN 100
```
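
As a rough illustration of two of the simpler sub-options, consider the input below. The variable names are arbitrary, and the comments describe what each optimization is allowed to do according to the list above; the exact output depends on which sub-options are active and on the output mode.

```purebasic
$options +optimize
' const_folding: the product can be computed at parse time,
' as if 960 had been written
screenBytes = 40 * 24
' convert_percent: in Turbo-Basic XL mode the 0 can be stored as %0
score = 0
```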
Note that options can be changed at any place in the file. This is an example
of changing the parser mode in the middle of the file:
```purebasic
' Example program using directives
$ options optimize, mode=default
error1 = 2
? error1 : ' This is parsed like Turbo-Basic XL, as ? ERR OR 1
$options mode = extended
? error1 : ' This is parsed as ? error1
Printa : ' This is a parsing error.
```
A good optimization mode for producing short listings is:
```
$options +optimize, optimize=-convert_percent-const_replace
```
The above line instructs the parser to avoid converting numbers to `%` values
and replacing constants, producing a smaller listing. Note that
replacement of constants can be beneficial, so try enabling the optimization
and running with the `-v` option to see which variables are good candidates for
replacement.
### `$define` directive.
This directive defines new symbols that are replaced at parsing time with the
values, like C macros.
Replacement names are prefixed by `@` to differentiate them from variables and,
as with variables, string defines end in `$`. The syntax of the directive is:

`$define` *defineName* `=` *value*

Keep in mind that the value is replaced each time the define is used, so it
is probably best to assign it to a variable instead if the value will be
used multiple times. You should also enable optimizations so that the usage is
simplified at parsing time.
This is an example usage of the `$define` directive:
```purebasic
' Example usage of defines
$options +optimize
$define Message$ = "Hello world!"
$define PCOLR0 = $2C0
print @Message$ : ' Replaced by: ? "Hello world!"
print len(@Message$) : ' Replaced by: ? 12
poke @PCOLR0+2, $1F : ' Replaced by: POKE 706,31
```
### `$incbin` directive.
This directive allows including data from a binary file in a new string
definition. The content of the file is read at parsing time and the full
content is stored in the define. The syntax of the directive is:

`$incbin` *defineName$* `, "`*fileName*`"` [ , *offset* [, *length* ] ]

The optional *offset* parameter specifies a starting offset in bytes for
the included data, and the optional *length* parameter specifies the number
of bytes to read. If *length* is not given, the file is read to the end.
This is an example usage of the `$incbin` directive:
```purebasic
$options +optimize
$incbin asmBin$, "myasm.bin"
asmRut = adr( @asmBin$ ) : ' Store address in variable to use multiple times.
? usr(asmRut, 1, 2) : ' Call routine. Should be relocatable and less than 242 bytes.
```
### `$incdata` directive.
This directive allows including data from a binary file in a `DATA` BASIC
statement. The content of the file is read at parsing time and the full content
is stored as is. The syntax of the directive is:

`$incdata` `"`*fileName*`"` [ , *offset* [, *length* ] ]

The optional *offset* parameter specifies a starting offset in bytes for the
included data, and the optional *length* parameter specifies the number of
bytes to read. If *length* is not given, the file is read to the end.
Note that you can use this directive to store arbitrary bytes inside the
statement, but BASIC parses the actual data at `READ` time.
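
A minimal usage sketch, assuming a file named `levels.bin` (a hypothetical name) that holds at least 168 bytes. The `DIM` and `READ` lines show one plausible way to consume the data and are not part of the directive itself.

```purebasic
' "levels.bin" is a hypothetical file name.
' Include 40 bytes, starting 128 bytes into the file, as a DATA statement
$incdata "levels.bin", 128, 40
DIM LEVEL$(40)
READ LEVEL$
```

Since BASIC parses the data at `READ` time, a comma or end-of-line byte inside the included range would end the item early, so this pattern only suits data that avoids those byte values.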
## Limitations and Incompatibilities
There are some incompatibilities in the way the source is interpreted with the
standard _Turbo-Basic XL_ and _Atari BASIC_ parsers:
- The ASCII LF character (hexadecimal $0A) is interpreted as end of line in
addition to the ATASCII EOL (hexadecimal $9B). This means that in `DATA`
statements and comments the LF character is not accepted.
- The parsing of special characters inside strings means that a valid hexadecimal
sequence (`\**`, with `*` a hexadecimal digit in uppercase) or two backslashes
are interpreted differently.
- Extra statements after an `IF`/`THEN`/`LineNumber` are converted to a comment,
with the exception of `DATA` statements.
In the original, those statements are never executed, so this is not a problem
with proper code.
- Any string is accepted as a variable name, even if it is already a statement,
function name or operator.
The following code is valid:
```purebasic
PRINTED = 0 : ' Invalid in Atari BASIC, as starts with "PRINT"
DONE = 3 : ' Invalid in Turbo-Basic XL, as starts with "DO"
```
This relaxed handling of variable naming creates an incompatibility, as the
first example above is parsed differently than in standard _Atari BASIC_,
where it means "`PRINT (ED = 0)`" instead of "`LET PRINTED = 0`".
Note that currently, even full statements are accepted as variable names,
but avoid using them as they could produce hard-to-understand errors.
- In long format listing output, `IF`/`THEN` are converted to `IF`/`ENDIF`
statements. This introduces an incompatibility with the following code:
```purebasic
FOR A = 0 TO 2
  ? "A="; A; " - ";
  IF A <> 0
    ? "1";
    IF A = 1 THEN ELSE
    ? "2";
  ENDIF
  ? " -"
NEXT A
```
This code should produce the following output:
```
A=0 - 2 -
A=1 - 1 -
A=2 - 12 -
```
After conversion, the `ELSE` is associated with the second `IF` instead
of the first, giving the wrong result.
- Parsing of `TIME$=` statement allows a space between `TIME$` and the equals
sign, but in _Turbo-Basic XL_ this gives an error.
## Compilation
To compile from source, you need `gawk` and `peg`. Both are available in any
recent Debian or Ubuntu Linux distro; install them with:

    apt-get install gawk peg

To compile, simply type `make` in the sources folder. A `build` folder will be
created with the executable program inside.