% $Id: gf.tex,v 3.0.1.2 1991/08/08 15:56:31 schrod Exp schrod $
%------------------------------------------------------------
% taken from GFtype 3.0

%
% definition of GF format
%	to be included
%
% [LaTeX with fileform]


% $Log: gf.tex,v $
% Revision 3.0.1.2  1991/08/08  15:56:31  schrod
% CHANGES BY DON HOSEK:
%  -- Inserted \subsection's.
%  -- Deleted WEB defines.
%  -- `e.g.' now in italics, to be consistent with the rest of the
%     standard.
%
% CHANGES BY JOACHIM SCHROD:
%  -- Changed \bigbreak between WEB sections to \medbreak.
%  -- Added + signs to length specifications in \cmd tags, to show that
%     the param is signed.
%  -- Make formulas look more `math-like' and less `Pascal-like.'
%
% Revision 3.0.1.1  1990/07/16  00:00:00  schrod
% changed \& to \res.
% appended \endinput.
%
% Revision 3.0  90/07/04  00:00:00  schrod
% extracted from GFtype 3.0
% 


\section{Generic Font File Format}
\label{gf-format}

\subsection{Introduction}

The most important output produced by a typical run of \MF\ is the
``generic font'' (\str{GF}) file that specifies the bit patterns of
the characters that have been drawn. The term {\sl generic\/}
indicates that this file format doesn't match the conventions of any
name-brand manufacturer; but it is easy to convert \str{GF} files to
the special format required by almost all digital phototypesetting
equipment. There's a strong analogy between the \str{DVI} files
written by \TeX\ and the \str{GF} files written by \MF; and, in fact,
the file formats have a lot in common.

A \str{GF} file is a stream of 8-bit bytes that may be regarded as a
series of commands in a machine-like language. The first byte of each
command is the operation code, and this code is followed by zero or
more bytes that provide parameters to the command. The parameters
themselves may consist of several consecutive bytes; for example, the
`\id{boc}' (beginning of character) command has six parameters, each
of which is four bytes long. Parameters are usually regarded as
nonnegative integers; but four-byte-long parameters can be either
positive or negative, hence they range in value from $-2^{31}$ to
$2^{31}-1$. As in \str{TFM} files, numbers that occupy more than one
byte position appear in big-endian order (most significant byte first), and negative numbers appear
in two's complement notation.
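
To make the byte conventions concrete, here is a minimal Python sketch of a
parameter decoder. The function name {\tt read\_param} and its arguments are
illustrative assumptions, not part of the \str{GF} definition; only the byte
order and the two's complement rule are taken from the text above.

\begin{verbatim}
```python
def read_param(data: bytes, pos: int, size: int, signed: bool = False) -> int:
    """Decode a size-byte big-endian parameter starting at data[pos]."""
    chunk = data[pos:pos + size]
    if len(chunk) != size:
        raise ValueError("parameter runs past the end of the data")
    # Only four-byte parameters can be negative; shorter ones are
    # always read as nonnegative, whatever the caller asks for.
    return int.from_bytes(chunk, "big", signed=signed and size == 4)
```
\end{verbatim}

For instance, the four bytes {\tt FF FF FF FF} decode to $-1$ when read as a
signed parameter, while the two bytes {\tt FF FF} always decode to 65535.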

A \str{GF} file consists of a ``preamble,'' followed by a sequence of
one or more ``characters,'' followed by a ``postamble.'' The preamble
is simply a \id{pre} command, with its parameters that introduce the
file; this must come first.  Each ``character'' consists of a
\id{boc} command, followed by any number of other commands that
specify ``black'' pixels, followed by an \id{eoc} command. The
characters appear in the order that \MF\ generated them. If we ignore
no-op commands (which are allowed between any two commands in the
file), each \id{eoc} command is immediately followed by a \id{boc}
command, or by a \id{post} command; in the latter case, there are no
more characters in the file, and the remaining bytes form the
postamble. Further details about the postamble will be explained
later.

Some parameters in \str{GF} commands are ``pointers.'' These are
four-byte quantities that give the location number of some other byte
in the file; the first file byte is number~0, then comes number~1,
and so on.

\medbreak

The \str{GF} format is intended to be both compact and easily
interpreted by a machine. Compactness is achieved by making most of
the information relative instead of absolute. When a \str{GF}-reading
program reads the commands for a character, it keeps track of two
quantities: (a)~the current column number,~$m$; and (b)~the current
row number,~$n$.  These are 32-bit signed integers, although most
actual font formats produced from \str{GF} files will need to curtail
this vast range because of practical limitations. (\MF\ output will
never allow $\vert m\vert$ or $\vert n\vert$ to get extremely large,
but the \str{GF} format tries to be more general.)

How do \str{GF}'s row and column numbers correspond to the
conventions of \TeX\ and \MF? Well, the ``reference point'' of a
character, in \TeX's view, is considered to be at the lower left
corner of the pixel in row~0 and column~0. This point is the
intersection of the baseline with the left edge of the type; it
corresponds to location $(0,0)$ in \MF\ programs. Thus the pixel in
\str{GF} row~0 and column~0 is \MF's unit square, comprising the
region of the plane whose coordinates both lie between 0 and~1. The
pixel in \str{GF} row~$n$ and column~$m$ consists of the points whose
\MF\ coordinates $(x,y)$ satisfy $m\le x\le m+1$ and $n\le y\le n+1$.
Negative values of $m$ and~$x$ correspond to columns of pixels {\sl
left\/} of the reference point; negative values of $n$ and~$y$
correspond to rows of pixels {\sl below\/} the baseline.

Besides $m$ and $n$, there's also a third aspect of the current
state, namely the \id{paint\_switch}, which is always either
\id{black} or \id{white}. Each \id{paint} command advances $m$ by a
specified amount~$d$, and blackens the intervening pixels if
$\id{paint\_switch}=\id{black}$; then the \id{paint\_switch} changes
to the opposite state. \str{GF}'s commands are designed so that $m$
will never decrease within a row, and $n$ will never increase within
a character; hence there is no way to whiten a pixel that has been
blackened.
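
The behavior of a \id{paint} command can be sketched in a few lines of
Python. The dictionary {\tt state}, the set {\tt pixels}, and the function
name {\tt paint} are hypothetical choices made for this illustration; a real
\str{GF} reader is free to represent its state differently.

\begin{verbatim}
```python
def paint(state: dict, d: int, pixels: set) -> None:
    """Apply one paint command of length d to the reader state."""
    if state["paint_switch"] == "black":
        # Blacken columns m .. m+d-1 of the current row n.
        for col in range(state["m"], state["m"] + d):
            pixels.add((col, state["n"]))
    state["m"] += d  # m advances even when nothing is blackened
    # The switch always flips to the opposite state afterwards.
    state["paint_switch"] = (
        "white" if state["paint_switch"] == "black" else "black"
    )

state = {"m": 0, "n": 0, "paint_switch": "white"}
pixels = set()
for d in (2, 3, 1):  # 2 white pixels, 3 black pixels, 1 white pixel
    paint(state, d, pixels)
```
\end{verbatim}

After the three commands in the loop, columns 2 through~4 of row~0 are black
and $m=6$.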


\subsection{Summary of {\tt GF} commands}

Here is a list of all the commands that may appear in a \str{GF}
file. Each command is specified by its symbolic name ({\it e.g.},
\id{boc}), its opcode byte ({\it e.g.}, 67), and its parameters (if any).
The parameters are followed by a bracketed number telling how many
bytes they occupy; for example, `$d[2]$' means that parameter $d$ is
two bytes long.

\cmd \id{paint\_0} 0,.
 This is a \id{paint} command with $d=0$; it does nothing but change
the \id{paint\_switch} from \id{black} to \id{white} or vice~versa.

\cmd \id{paint\_1} through \id{paint\_63} (opcodes 1 to 63),.
 These are \id{paint} commands with $d=1$ to~63, defined as follows:
If $\id{paint\_switch}=\id{black}$, blacken $d$~pixels of the current
row~$n$, in columns $m$ through $m+d-1$ inclusive. Then, in any case,
complement the \id{paint\_switch} and advance $m$ by~$d$.

\cmd \id{paint1} 64, d[1].
 This is a \id{paint} command with a specified value of~$d$; \MF\
uses it to paint when $64\le d<256$.

\cmd \id{paint2} 65, d[2].
 Same as \id{paint1}, but $d$~can be as high as~65535.

\cmd \id{paint3} 66, d[3].
 Same as \id{paint1}, but $d$~can be as high as $2^{24}-1$. \MF\
never needs this command, and it is hard to imagine anybody making
practical use of it; surely a more compact encoding will be desirable
when characters can be this large. But the command is there, anyway,
just in case.
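
The five kinds of \id{paint} opcodes differ only in where $d$ is stored, so
one small routine can decode all of them. The following Python sketch
assumes that the whole file is available as a {\tt bytes} object and that
the command is not truncated; the name {\tt decode\_paint} is hypothetical.

\begin{verbatim}
```python
def decode_paint(data: bytes, pos: int):
    """Return (d, next_pos) for the paint command at data[pos], or None."""
    op = data[pos]
    if op <= 63:                 # paint_0 .. paint_63: d is the opcode itself
        return op, pos + 1
    if 64 <= op <= 66:           # paint1, paint2, paint3
        size = op - 63           # 1, 2, or 3 parameter bytes
        d = int.from_bytes(data[pos + 1:pos + 1 + size], "big")
        return d, pos + 1 + size
    return None                  # some other command
```
\end{verbatim}

The value of $d$ returned here would then be handed to whatever routine
actually blackens pixels and flips the \id{paint\_switch}.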

\cmd \id{boc} 67, c[+4] p[+4] \id{min\_m}[+4] \id{max\_m}[+4]
 \id{min\_n}[+4] \id{max\_n}[+4].
 Beginning of a character: Here $c$ is the character code, and $p$
points to the previous character beginning (if any) for characters
having this code number modulo 256. (The pointer $p$ is $-1$ if
there was no prior character with an equivalent code.) The values of
registers $m$ and $n$ defined by the instructions that follow for
this character must satisfy $\id{min\_m}\le m\le \id{max\_m}$ and
$\id{min\_n}\le n\le \id{max\_n}$. (The values of \id{max\_m} and
\id{min\_n} need not be the tightest bounds possible.) When a
\str{GF}-reading program sees a \id{boc}, it can use \id{min\_m},
\id{max\_m}, \id{min\_n}, and \id{max\_n} to initialize the bounds of
an array. Then it sets $m\gets \id{min\_m}$, $n\gets \id{max\_n}$,
and $\id{paint\_switch}\gets \id{white}$.

\cmd \id{boc1} 68, c[1] \id{del\_m}[1] \id{max\_m}[1] \id{del\_n}[1]
 \id{max\_n}[1].
 Same as \id{boc}, but $p$ is assumed to be~$-1$; also
$\id{del\_m}=\id{max\_m}-\id{min\_m}$ and
$\id{del\_n}=\id{max\_n}-\id{min\_n}$ are given instead of
\id{min\_m} and \id{min\_n}. The one-byte parameters must be between
0 and 255, inclusive. \ (This abbreviated \id{boc} saves 19~bytes per
character, in common cases.)
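
Both forms of \id{boc} lead to the same information, as the following Python
sketch tries to show. The function name {\tt parse\_boc} and the layout of
the returned dictionary are assumptions made for illustration only; the
sketch also records the initial values of $m$, $n$, and \id{paint\_switch}
that a reader is supposed to establish.

\begin{verbatim}
```python
import struct

def parse_boc(data: bytes, pos: int) -> dict:
    """Parse a boc (67) or boc1 (68) command at data[pos]."""
    op = data[pos]
    if op == 67:
        # Six signed four-byte parameters, big-endian.
        c, p, min_m, max_m, min_n, max_n = struct.unpack_from(
            ">6i", data, pos + 1)
        size = 25
    elif op == 68:
        # Five unsigned one-byte parameters.
        c, del_m, max_m, del_n, max_n = struct.unpack_from(
            ">5B", data, pos + 1)
        p = -1                                   # no previous character
        min_m, min_n = max_m - del_m, max_n - del_n
        size = 6
    else:
        raise ValueError("not a boc command")
    return {"c": c, "p": p, "min_m": min_m, "max_m": max_m,
            "min_n": min_n, "max_n": max_n, "next": pos + size,
            # State prescribed at the start of every character:
            "m": min_m, "n": max_n, "paint_switch": "white"}
```
\end{verbatim}

The difference between the two values of {\tt size}, $25-6$, is the
19~bytes mentioned above.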

\cmd \id{eoc} 69,.
 End of character: All pixels blackened so far constitute the pattern
for this character. In particular, a completely blank character might
have \id{eoc} immediately following \id{boc}.

\cmd \id{skip0} 70,.
 Decrease $n$ by 1 and set $m\gets \id{min\_m}$,
$\id{paint\_switch}\gets \id{white}$. \ (This finishes one row and
begins another, ready to whiten the leftmost pixel in the new row.)

\cmd \id{skip1} 71, d[1].
 Decrease $n$ by $d+1$, set $m\gets \id{min\_m}$, and set
$\id{paint\_switch}\gets \id{white}$. This is a way to produce $d$
all-white rows.

\cmd \id{skip2} 72, d[2].
 Same as \id{skip1}, but $d$ can be as large as 65535.

\cmd \id{skip3} 73, d[3].
 Same as \id{skip1}, but $d$ can be as large as $2^{24}-1$. \MF\
obviously never needs this command.

\cmd \id{new\_row\_0} 74,.
 Decrease $n$ by 1 and set $m\gets \id{min\_m}$,
$\id{paint\_switch}\gets \id{black}$. \ (This finishes one row and
begins another, ready to {\sl blacken\/} the leftmost pixel in the
new row.)

\cmd \id{new\_row\_1} through \id{new\_row\_164} (opcodes 75 to 238),.
 Same as \id{new\_row\_0}, but with $m\gets \id{min\_m}+1$ through
$\id{min\_m}+164$, respectively.
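
The \id{skip} and \id{new\_row} families both start a new row and differ
only in how far $n$ drops, where $m$ starts, and how the
\id{paint\_switch} is set. Here is a hedged Python sketch; the function name
{\tt next\_row} and the dictionary {\tt state} are hypothetical, and the
caller is assumed to have decoded the parameter~$d$ already.

\begin{verbatim}
```python
def next_row(op: int, d: int, state: dict, min_m: int) -> None:
    """Apply skip0..skip3 (70-73) or new_row_0..new_row_164 (74-238)."""
    if 70 <= op <= 73:
        # skip0 carries no parameter, so the caller passes d = 0.
        state["n"] -= d + 1
        state["m"] = min_m
        state["paint_switch"] = "white"
    elif 74 <= op <= 238:
        state["n"] -= 1
        state["m"] = min_m + (op - 74)   # new_row_k begins k columns in
        state["paint_switch"] = "black"
    else:
        raise ValueError("not a skip or new_row opcode")
```
\end{verbatim}

With $\id{min\_m}=-2$ and $n=10$, for example, a \id{skip1} with $d=3$
leaves $n=6$ and $m=-2$, and a subsequent \id{new\_row\_2} leaves $n=5$ and
$m=0$ with the switch set to \id{black}.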

\cmd \id{xxx1} 239, k[1] x[k].
 This command is undefined in general; it functions as a $(k+2)$-byte
\id{no\_op} unless special \str{GF}-reading programs are being used.
\MF\ generates \id{xxx} commands when encountering a \res{special}
string; this occurs in the \str{GF} file only between characters,
after the preamble, and before the postamble. However, \id{xxx}
commands might appear anywhere in \str{GF} files generated by other
processors. It is recommended that $x$ be a string having the form of
a keyword followed by possible parameters relevant to that keyword.

\cmd \id{xxx2} 240, k[2] x[k].
 Like \id{xxx1}, but $0\le k<65536$.

\cmd \id{xxx3} 241, k[3] x[k].
 Like \id{xxx1}, but $0\le k<2^{24}$. \MF\ uses this when sending a
\res{special} string whose length exceeds~255.

\cmd \id{xxx4} 242, k[4] x[k].
 Like \id{xxx1}, but $k$ can be as large as $2^{31}-1$; $k$ mustn't be
negative.
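
A reader that does not understand a particular \id{xxx} command still has to
step over it correctly. The following Python sketch extracts the string~$x$
and the position of the next command; the name {\tt parse\_xxx} is
hypothetical, and the interpretation of the string is deliberately left to
the caller.

\begin{verbatim}
```python
def parse_xxx(data: bytes, pos: int):
    """Return (x, next_pos) for an xxx1..xxx4 command at data[pos]."""
    op = data[pos]
    if not 239 <= op <= 242:
        raise ValueError("not an xxx command")
    size = op - 238                       # the length field k is 1..4 bytes
    k = int.from_bytes(data[pos + 1:pos + 1 + size], "big")
    start = pos + 1 + size
    if start + k > len(data):
        raise ValueError("xxx string runs past the end of the data")
    return data[start:start + k], start + k
```
\end{verbatim}

For an \id{xxx1} whose string has five bytes, the position advances by
$5+2=7$ bytes, in agreement with the $(k+2)$-byte rule stated for
\id{xxx1}.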

\cmd \id{yyy} 243, y[+4].
 This command is undefined in general; it functions as a 5-byte
\id{no\_op} unless special \str{GF}-reading programs are being used.
\MF\ puts \id{scaled} numbers into \id{yyy}'s, as a result of
\res{numspecial} commands; the intent is to provide numeric parameters
to \id{xxx} commands that immediately precede.
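
Since a \id{scaled} number is an integer multiple of $2^{-16}$, the
parameter of a \id{yyy} command is easy to turn into an ordinary number. The
Python sketch below is only an illustration; the name {\tt parse\_yyy} is an
assumption, and a floating-point result is merely one convenient choice.

\begin{verbatim}
```python
def parse_yyy(data: bytes, pos: int):
    """Return (value, next_pos) for the yyy command at data[pos]."""
    if data[pos] != 243:
        raise ValueError("not a yyy command")
    y = int.from_bytes(data[pos + 1:pos + 5], "big", signed=True)
    return y / 65536, pos + 5     # scaled integer -> real number
```
\end{verbatim}

The bytes {\tt 00 01 80 00} therefore stand for the value $1.5$.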

\cmd \id{no\_op} 244,.
 No operation, do nothing. Any number of \id{no\_op}'s may occur
between \str{GF} commands, but a \id{no\_op} cannot be inserted
between a command and its parameters or between two parameters.

\cmd \id{char\_loc} 245, c[1] \id{dx}[+4] \id{dy}[+4] w[+4] p[+4].
 This command will appear only in the postamble, which will be
explained shortly.

\cmd \id{char\_loc0} 246, c[1] \id{dm}[1] w[+4] p[+4].
 Same as \id{char\_loc}, except that \id{dy} is assumed to be zero,
and the value of~\id{dx} is taken to be $65536\ast\id{dm}$, where
$0\le \id{dm}<256$.
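
The two locator commands can be decoded by a single routine that expands the
abbreviated form. This Python sketch makes no claim to be the way any
existing program does it; the name {\tt parse\_char\_loc} and the shape of
the result are hypothetical.

\begin{verbatim}
```python
import struct

def parse_char_loc(data: bytes, pos: int) -> dict:
    """Parse a char_loc (245) or char_loc0 (246) command at data[pos]."""
    op = data[pos]
    if op == 245:
        c, dx, dy, w, p = struct.unpack_from(">B4i", data, pos + 1)
        size = 18
    elif op == 246:
        c, dm, w, p = struct.unpack_from(">BB2i", data, pos + 1)
        dx, dy = 65536 * dm, 0        # dm is a whole number of pixels
        size = 11
    else:
        raise ValueError("not a character locator")
    return {"c": c, "dx": dx, "dy": dy, "w": w, "p": p,
            "next": pos + size}
```
\end{verbatim}

A \id{char\_loc0} with $\id{dm}=10$ thus yields $\id{dx}=655360$ and
$\id{dy}=0$.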

\cmd \id{pre} 247, i[1] k[1] x[k].
 Beginning of the preamble; this must come at the very beginning of
the file. Parameter $i$ is an identifying number for \str{GF} format,
currently 131. The other information is merely commentary; unlike the
contents of \id{xxx} commands, it is never given special interpretation. (Note that
\id{xxx} commands may immediately follow the preamble, before the
first \id{boc}.)
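
Checking the preamble is usually the first thing a \str{GF} reader does.
Here is a minimal Python sketch of that check, under the assumption that the
file has been read into a {\tt bytes} object; the name {\tt parse\_pre} is
hypothetical.

\begin{verbatim}
```python
def parse_pre(data: bytes):
    """Return (comment, next_pos) after validating the preamble."""
    if len(data) < 3 or data[0] != 247:
        raise ValueError("file does not begin with a pre command")
    i, k = data[1], data[2]
    if i != 131:
        raise ValueError("unexpected GF identification byte")
    if 3 + k > len(data):
        raise ValueError("preamble comment is truncated")
    return data[3:3 + k], 3 + k      # the comment x and what follows it
```
\end{verbatim}

The position returned is where the first \id{xxx}, \id{boc}, or \id{post}
command may be found.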

\cmd \id{post} 248,.
 Beginning of the postamble; see below.

\cmd \id{post\_post} 249,.
 Ending of the postamble; see below.

\smallskip

\noindent Commands 250--255 are undefined at the present time.


\subsection{The postamble}

The last character in a \str{GF} file is followed by `\id{post}';
this command introduces the postamble, which summarizes important
facts that \MF\ has accumulated. The postamble has the form
 %
 \begin{center}
 \begin{tabular}{l}
   $\id{post}\ p[4]\ \id{ds}[4]\ \id{cs}[4]\ \id{hppp}[4]\ \id{vppp}[4]\ 
         \id{min\_m}[+4]$\\
   \qquad $\id{max\_m}[+4]\ \id{min\_n}[+4]\ \id{max\_n}[+4]$\\
   $\langle\,$character locators$\,\rangle$\\
   $\id{post\_post}\ q[4]\ i[1]\ \hbox{223's}[\ge 4]$\\
 \end{tabular}
 \end{center}
%
 Here $p$ is a pointer to the byte following the final \id{eoc} in
the file (or to the byte following the preamble, if there are no
characters); it can be used to locate the beginning of \id{xxx}
commands that might have preceded the postamble. The \id{ds} and
\id{cs} parameters give the design size and check sum, respectively,
which are exactly the values put into the header of any \str{TFM}
file that shares information with this \str{GF} file. Parameters
\id{hppp} and \id{vppp} are the ratios of pixels per point,
horizontally and vertically, expressed as \id{scaled} integers (i.e.,
multiplied by $2^{16}$); they can be used to correlate the font with
specific device resolutions, magnifications, and ``at sizes.''  Then
come \id{min\_m}, \id{max\_m}, \id{min\_n}, and \id{max\_n}, which
bound the values that registers $m$ and~$n$ assume in all characters
in this \str{GF} file. (These bounds need not be the best possible;
\id{max\_m} and \id{min\_n} may, on the other hand, be tighter than
the similar bounds in \id{boc} commands. For example, some character
may have $\id{min\_n}=-100$ in its \id{boc}, but it might turn out
that $n$ never gets lower than $-50$ in any character; then
\id{min\_n} can have any value $\le -50$. If there are no characters
in the file, it's possible to have $\id{min\_m}>\id{max\_m}$ and/or
$\id{min\_n}>\id{max\_n}$.)
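
The fixed part of the postamble has nine four-byte parameters, which the
following Python sketch unpacks in one step. The name {\tt parse\_post} is
hypothetical, and treating the first five parameters as unsigned is an
assumption based on the absence of a $+$ sign in the table above. The
values \id{ds} and \id{cs} are returned unchanged, since their
interpretation belongs to the \str{TFM} format.

\begin{verbatim}
```python
import struct

def parse_post(data: bytes, pos: int) -> dict:
    """Parse the post command (248) and its nine parameters."""
    if data[pos] != 248:
        raise ValueError("not a post command")
    (p, ds, cs, hppp, vppp,
     min_m, max_m, min_n, max_n) = struct.unpack_from(">5I4i", data, pos + 1)
    return {"p": p, "ds": ds, "cs": cs,
            # scaled integers -> pixels per point
            "hppp": hppp / 65536, "vppp": vppp / 65536,
            "min_m": min_m, "max_m": max_m,
            "min_n": min_n, "max_n": max_n,
            "next": pos + 37}          # first character locator, if any
```
\end{verbatim}

The character locators, if there are any, begin at the position stored under
{\tt "next"}.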

\medbreak

Character locators are introduced by \id{char\_loc} commands, which
specify a character residue~$c$, character escapements
($\id{dx},\id{dy}$), a character width~$w$, and a pointer~$p$ to the
beginning of that character. (If two or more characters have the same
code~$c$ modulo 256, only the last will be indicated; the others can
be located by following backpointers. Characters whose codes differ
by a multiple of 256 are assumed to share the same font metric
information, hence the \str{TFM} file contains only residues of
character codes modulo~256. This convention is intended for East Asian
scripts, which have many character shapes but few distinct
widths.)

The character escapements ($\id{dx},\id{dy}$) are the values of \MF's
\res{chardx} and \res{chardy} parameters; they are in units of
\id{scaled} pixels; i.e., \id{dx} is in horizontal pixel units times
$2^{16}$, and \id{dy} is in vertical pixel units times $2^{16}$. 
This is the intended amount of displacement after typesetting the
character; for \str{DVI} files, \id{dy} should be zero, but other
document file formats allow nonzero vertical escapement.

The character width~$w$ duplicates the information in the \str{TFM}
file; it is $2^{24}$ times the ratio of the true width to the font's
design size.
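
The scale factors for $w$ and for the escapements are easy to confuse, so a
tiny Python illustration may help. Both function names are hypothetical, and
the design size is assumed to be supplied by the caller in whatever unit the
result should have (points, say).

\begin{verbatim}
```python
def char_width(w: int, design_size: float) -> float:
    """True width, in the same unit as design_size."""
    return w / 2**24 * design_size

def escapement_pixels(d: int) -> float:
    """Convert dx or dy from scaled pixels to pixels."""
    return d / 2**16
```
\end{verbatim}

With $w=2^{23}$ and a design size of 10 units, the width is 5~units; an
escapement $\id{dx}=655360$ is 10~pixels.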

The backpointer $p$ points to the character's \id{boc}, or to the
first of a sequence of consecutive \id{xxx} or \id{yyy} or
\id{no\_op} commands that immediately precede the \id{boc}, if such
commands exist; such ``special'' commands essentially belong to the
characters, while the special commands after the final character
belong to the postamble (i.e., to the font as a whole). This
convention about $p$ applies also to the backpointers in \id{boc}
commands, even though it wasn't explained in the description
of~\id{boc}.

Pointer $p$ might be $-1$ if the character exists in the \str{TFM}
file but not in the \str{GF} file. This unusual situation can arise
in \MF\ output if the user had $\id{proofing}<0$ when the character
was being shipped out, but then made $\id{proofing}\ge 0$ in order to
get a \str{GF} file.

\medbreak

The last part of the postamble, following the \id{post\_post} byte
that signifies the end of the character locators, contains $q$, a
pointer to the \id{post} command that started the postamble.  An
identification byte, $i$, comes next; this currently equals~131, as
in the preamble.

The $i$ byte is followed by four or more bytes that are all equal to
the decimal number 223 (i.e., \H{DF} in hexadecimal). \MF\ puts out
four to seven of these trailing bytes, until the total length of the
file is a multiple of four bytes, since this works out best on
machines that pack four bytes per word; but any number of 223's is
allowed, as long as there are at least four of them. In effect, 223
is a sort of signature that is added at the very end.
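
The number of trailing bytes that \MF\ is said to write follows from simple
arithmetic, sketched here in Python. The function name {\tt trailer} is
hypothetical; its argument is assumed to be the length of the file up to and
including the identification byte~$i$.

\begin{verbatim}
```python
def trailer(length_so_far: int) -> bytes:
    """Four to seven 223's, making the total length a multiple of 4."""
    pad = 4 + (-length_so_far) % 4
    return bytes([223]) * pad
```
\end{verbatim}

A reader should not rely on this count, since any number of 223's is legal
as long as there are at least four.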

This curious way to finish off a \str{GF} file makes it feasible for
\str{GF}-reading programs to find the postamble first, on most
computers, even though \MF\ wants to write the postamble last. Most
operating systems permit random access to individual words or bytes
of a file, so the \str{GF} reader can start at the end and skip
backwards over the 223's until finding the identification byte. Then
it can back up four bytes, read $q$, and move to byte $q$ of the
file. This byte should, of course, contain the value 248 (\id{post});
now the postamble can be read, so the \str{GF} reader can discover
all the information needed for individual characters.
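
The backward search just described might be sketched in Python as follows.
This is only an illustration under the assumption that the whole file is
held in a {\tt bytes} object; the name {\tt find\_postamble} is hypothetical,
and the extra check for the \id{post\_post} opcode is a precaution added
here, not something the procedure above demands.

\begin{verbatim}
```python
def find_postamble(data: bytes) -> int:
    """Return q, the position of the post command, or raise ValueError."""
    pos = len(data) - 1
    while pos >= 0 and data[pos] == 223:   # skip backwards over the 223's
        pos -= 1
    if len(data) - 1 - pos < 4:
        raise ValueError("fewer than four trailing 223 bytes")
    if pos < 5 or data[pos] != 131:
        raise ValueError("identification byte not found")
    q = int.from_bytes(data[pos - 4:pos], "big")   # back up four bytes
    if data[pos - 5] != 249:
        raise ValueError("post_post opcode not found before q")
    if q >= len(data) or data[q] != 248:
        raise ValueError("q does not point to a post command")
    return q
```
\end{verbatim}

Once $q$ is known, the postamble can be read forward from byte~$q$ and the
character locators can be collected.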


\endinput