% $Id: gf.tex,v 3.0.1.2 1991/08/08 15:56:31 schrod Exp schrod $
%------------------------------------------------------------
% taken from GFtype 3.0
%
% definition of GF format
% to be included
%
% [LaTeX with fileform]
% $Log: gf.tex,v $
% Revision 3.0.1.2 1991/08/08 15:56:31 schrod
% CHANGES BY DON HOSEK:
% -- Inserted \subsection's.
% -- Deleted WEB defines.
% -- `e.g.' now in italics, to be consistent with the rest of the
%    standard.
%
% CHANGES BY JOACHIM SCHROD:
% -- Changed \bigbreak between WEB sections to \medbreak.
% -- Added + signs to length specifications in \cmd tags, to show that
%    the param is signed.
% -- Make formulas look more `math-like' and less `Pascal-like.'
%
% Revision 3.0.1.1 1990/07/16 00:00:00 schrod
% changed \& to \res.
% appended \endinput.
%
% Revision 3.0 90/07/04 00:00:00 schrod
% extracted from GFtype 3.0
%

\section{Generic Font File Format}
\label{gf-format}

\subsection{Introduction}

The most important output produced by a typical run of \MF\ is the
``generic font'' (\str{GF}) file that specifies the bit patterns of the
characters that have been drawn. The term {\sl generic\/} indicates that
this file format doesn't match the conventions of any name-brand
manufacturer; but it is easy to convert \str{GF} files to the special
format required by almost all digital phototypesetting equipment.
There's a strong analogy between the \str{DVI} files written by \TeX\
and the \str{GF} files written by \MF; and, in fact, the file formats
have a lot in common.

A \str{GF} file is a stream of 8-bit bytes that may be regarded as a
series of commands in a machine-like language. The first byte of each
command is the operation code, and this code is followed by zero or more
bytes that provide parameters to the command. The parameters themselves
may consist of several consecutive bytes; for example, the `\id{boc}'
(beginning of character) command has six parameters, each of which is
four bytes long.
Parameters are usually regarded as nonnegative integers; but four-byte-long parameters can be either positive or negative, hence they range in value from $-2^{31}$ to $2^{31}-1$. As in \str{TFM} files, numbers that occupy more than one byte position appear in BigEndian order, and negative numbers appear in two's complement notation. A \str{GF} file consists of a ``preamble,'' followed by a sequence of one or more ``characters,'' followed by a ``postamble.'' The preamble is simply a \id{pre} command, with its parameters that introduce the file; this must come first. Each ``character'' consists of a \id{boc} command, followed by any number of other commands that specify ``black'' pixels, followed by an \id{eoc} command. The characters appear in the order that \MF\ generated them. If we ignore no-op commands (which are allowed between any two commands in the file), each \id{eoc} command is immediately followed by a \id{boc} command, or by a \id{post} command; in the latter case, there are no more characters in the file, and the remaining bytes form the postamble. Further details about the postamble will be explained later. Some parameters in \str{GF} commands are ``pointers.'' These are four-byte quantities that give the location number of some other byte in the file; the first file byte is number~0, then comes number~1, and so on. \medbreak The \str{GF} format is intended to be both compact and easily interpreted by a machine. Compactness is achieved by making most of the information relative instead of absolute. When a \str{GF}-reading program reads the commands for a character, it keeps track of two quantities: (a)~the current column number,~$m$; and (b)~the current row number,~$n$. These are 32-bit signed integers, although most actual font formats produced from \str{GF} files will need to curtail this vast range because of practical limitations. 
(\MF\ output will never allow $\vert m\vert$ or $\vert n\vert$ to get extremely large, but the \str{GF} format tries to be more general.) How do \str{GF}'s row and column numbers correspond to the conventions of \TeX\ and \MF? Well, the ``reference point'' of a character, in \TeX's view, is considered to be at the lower left corner of the pixel in row~0 and column~0. This point is the intersection of the baseline with the left edge of the type; it corresponds to location $(0,0)$ in \MF\ programs. Thus the pixel in \str{GF} row~0 and column~0 is \MF's unit square, comprising the region of the plane whose coordinates both lie between 0 and~1. The pixel in \str{GF} row~$n$ and column~$m$ consists of the points whose \MF\ coordinates $(x,y)$ satisfy $m\le x\le m+1$ and $n\le y\le n+1$. Negative values of $m$ and~$x$ correspond to columns of pixels {\sl left\/} of the reference point; negative values of $n$ and~$y$ correspond to rows of pixels {\sl below\/} the baseline. Besides $m$ and $n$, there's also a third aspect of the current state, namely the \id{paint\_switch}, which is always either \id{black} or \id{white}. Each \id{paint} command advances $m$ by a specified amount~$d$, and blackens the intervening pixels if $\id{paint\_switch}=\id{black}$; then the \id{paint\_switch} changes to the opposite state. \str{GF}'s commands are designed so that $m$ will never decrease within a row, and $n$ will never increase within a character; hence there is no way to whiten a pixel that has been blackened. \subsection{Summary of {\tt GF} commands} Here is a list of all the commands that may appear in a \str{GF} file. Each command is specified by its symbolic name ({\it e.g.}, \id{boc}), its opcode byte ({\it e.g.}, 67), and its parameters (if any). The parameters are followed by a bracketed number telling how many bytes they occupy; for example, `$d[2]$' means that parameter $d$ is two bytes long. \cmd \id{paint\_0} 0,. 
This is a \id{paint} command with $d=0$; it does nothing but change the \id{paint\_switch} from \id{black} to \id{white} or vice~versa. \cmd \id{paint\_1} through \id{paint\_63} (opcodes 1 to 63),. These are \id{paint} commands with $d=1$ to~63, defined as follows: If $\id{paint\_switch}=\id{black}$, blacken $d$~pixels of the current row~$n$, in columns $m$ through $m+d-1$ inclusive. Then, in any case, complement the \id{paint\_switch} and advance $m$ by~$d$. \cmd \id{paint1} 64, d[1]. This is a \id{paint} command with a specified value of~$d$; \MF\ uses it to paint when $64\le d<256$. \cmd \id{paint2} 65, d[2]. Same as \id{paint1}, but $d$~can be as high as~65535. \cmd \id{paint3} 66, d[3]. Same as \id{paint1}, but $d$~can be as high as $2^{24}-1$. \MF\ never needs this command, and it is hard to imagine anybody making practical use of it; surely a more compact encoding will be desirable when characters can be this large. But the command is there, anyway, just in case. \cmd \id{boc} 67, c[+4] p[+4] \id{min\_m}[+4] \id{max\_m}[+4] \id{min\_n}[+4] \id{max\_n}[+4]. Beginning of a character: Here $c$ is the character code, and $p$ points to the previous character beginning (if any) for characters having this code number modulo 256. (The pointer $p$ is $-1$ if there was no prior character with an equivalent code.) The values of registers $m$ and $n$ defined by the instructions that follow for this character must satisfy $\id{min\_m}\le m\le \id{max\_m}$ and $\id{min\_n}\le n\le \id{max\_n}$. (The values of \id{max\_m} and \id{min\_n} need not be the tightest bounds possible.) When a \str{GF}-reading program sees a \id{boc}, it can use \id{min\_m}, \id{max\_m}, \id{min\_n}, and \id{max\_n} to initialize the bounds of an array. Then it sets $m\gets \id{min\_m}$, $n\gets \id{max\_n}$, and $\id{paint\_switch}\gets \id{white}$. \cmd \id{boc1} 68, c[1] \id{del\_m}[1] \id{max\_m}[1] \id{del\_n}[1] \id{max\_n}[1]. 
Same as \id{boc}, but $p$ is assumed to be~$-1$; also $\id{del\_m}=\id{max\_m}-\id{min\_m}$ and $\id{del\_n}=\id{max\_n}-\id{min\_n}$ are given instead of \id{min\_m} and \id{min\_n}. The one-byte parameters must be between 0 and 255, inclusive. \ (This abbreviated \id{boc} saves 19~bytes per character, in common cases.) \cmd \id{eoc} 69,. End of character: All pixels blackened so far constitute the pattern for this character. In particular, a completely blank character might have \id{eoc} immediately following \id{boc}. \cmd \id{skip0} 70,. Decrease $n$ by 1 and set $m\gets \id{min\_m}$, $\id{paint\_switch}\gets \id{white}$. \ (This finishes one row and begins another, ready to whiten the leftmost pixel in the new row.) \cmd \id{skip1} 71, d[1]. Decrease $n$ by $d+1$, set $m\gets \id{min\_m}$, and set $\id{paint\_switch}\gets \id{white}$. This is a way to produce $d$ all-white rows. \cmd \id{skip2} 72, d[2]. Same as \id{skip1}, but $d$ can be as large as 65535. \cmd \id{skip3} 73, d[3]. Same as \id{skip1}, but $d$ can be as large as $2^{24}-1$. \MF\ obviously never needs this command. \cmd \id{new\_row\_0} 74,. Decrease $n$ by 1 and set $m\gets \id{min\_m}$, $\id{paint\_switch}\gets \id{black}$. \ (This finishes one row and begins another, ready to {\sl blacken\/} the leftmost pixel in the new row.) \cmd \id{new\_row\_1} through \id{new\_row\_164} (opcodes 75 to 238),. Same as \id{new\_row\_0}, but with $m\gets \id{min\_m}+1$ through $\id{min\_m}+164$, respectively. \cmd \id{xxx1} 239, k[1] x[k]. This command is undefined in general; it functions as a $(k+2)$-byte \id{no\_op} unless special \str{GF}-reading programs are being used. \MF\ generates \id{xxx} commands when encountering a \res{special} string; this occurs in the \str{GF} file only between characters, after the preamble, and before the postamble. However, \id{xxx} commands might appear anywhere in \str{GF} files generated by other processors. 
It is recommended that $x$ be a string having the form of a keyword followed by possible parameters relevant to that keyword. \cmd \id{xxx2} 240, k[2] x[k]. Like \id{xxx1}, but $0\le k<65536$. \cmd \id{xxx3} 241, k[3] x[k]. Like \id{xxx1}, but $0\le k<2^{24}$. \MF\ uses this when sending a \res{special} string whose length exceeds~255. \cmd \id{xxx4} 242, k[4] x[k]. Like \id{xxx1}, but $k$ can be ridiculously large; $k$ mustn't be negative. \cmd \id{yyy} 243, y[+4]. This command is undefined in general; it functions as a 5-byte \id{no\_op} unless special \str{GF}-reading programs are being used. \MF\ puts \id{scaled} numbers into \id{yyy}'s, as a result of \res{numspecial} commands; the intent is to provide numeric parameters to \id{xxx} commands that immediately precede. \cmd \id{no\_op} 244,. No operation, do nothing. Any number of \id{no\_op}'s may occur between \str{GF} commands, but a \id{no\_op} cannot be inserted between a command and its parameters or between two parameters. \cmd \id{char\_loc} 245, c[1] \id{dx}[+4] \id{dy}[+4] w[+4] p[+4]. This command will appear only in the postamble, which will be explained shortly. \cmd \id{char\_loc0} 246, c[1] \id{dm}[1] w[+4] p[+4]. Same as \id{char\_loc}, except that \id{dy} is assumed to be zero, and the value of~\id{dx} is taken to be $65536\ast\id{dm}$, where $0\le \id{dm}<256$. \cmd \id{pre} 247, i[1] k[1] x[k]. Beginning of the preamble; this must come at the very beginning of the file. Parameter $i$ is an identifying number for \str{GF} format, currently 131. The other information is merely commentary; it is not given special interpretation like \id{xxx} commands are. (Note that \id{xxx} commands may immediately follow the preamble, before the first \id{boc}.) \cmd \id{post} 248,. Beginning of the postamble, see below. \cmd \id{post\_post} 249,. Ending of the postamble, see below. \smallskip \noindent Commands 250--255 are undefined at the present time. 
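
\medbreak
The effect of the \id{paint}, \id{skip}, and \id{new\_row} commands on
$m$, $n$, and the \id{paint\_switch} can be traced with a short program.
The following Python sketch is not part of the \str{GF} definition; the
function name {\tt decode\_char}, its arguments, and the choice to
return each row as a string of {\tt *} and {\tt .} characters are
assumptions made purely for illustration. It expects the bytes that
follow a \id{boc} (whose bounds the caller has already read), trusts
its input, and does no bounds checking.

\begin{verbatim}
```python
def decode_char(data, min_m, max_m, min_n, max_n):
    """Decode the commands after a boc, up to eoc, into rows of text.

    Hypothetical helper: the first returned string is row max_n, the
    last is row min_n; '*' marks a black pixel and '.' a white one.
    """
    width = max_m - min_m + 1
    height = max_n - min_n + 1
    rows = [[False] * width for _ in range(height)]
    # State after boc: m = min_m, n = max_n, paint_switch = white.
    m, n, black = min_m, max_n, False
    i = 0
    while i < len(data):
        op = data[i]
        i += 1
        if op <= 66:                      # paint_0..paint_63, paint1..paint3
            if op <= 63:
                d = op
            else:
                k = op - 63               # 1, 2, or 3 parameter bytes
                d = int.from_bytes(data[i:i + k], "big")
                i += k
            if black:
                for col in range(m, m + d):
                    rows[max_n - n][col - min_m] = True
            m += d
            black = not black             # complement the paint_switch
        elif op == 69:                    # eoc
            return ["".join("*" if b else "." for b in row) for row in rows]
        elif 70 <= op <= 73:              # skip0..skip3
            k = op - 70                   # 0 to 3 parameter bytes
            d = int.from_bytes(data[i:i + k], "big")  # empty slice gives 0
            i += k
            n -= d + 1
            m, black = min_m, False
        elif 74 <= op <= 238:             # new_row_0..new_row_164
            n -= 1
            m, black = min_m + (op - 74), True
        elif 239 <= op <= 242:            # xxx1..xxx4: skip k and x[k]
            k = op - 238
            length = int.from_bytes(data[i:i + k], "big")
            i += k + length
        elif op == 243:                   # yyy: skip y[4]
            i += 4
        elif op == 244:                   # no_op
            pass
        else:
            raise ValueError(f"unexpected opcode {op} inside a character")
    raise ValueError("character data ended without eoc")
```
\end{verbatim}

For instance, with bounds $0\le m\le 3$ and $0\le n\le 1$, the five
bytes 1, 2, 75, 3, 69 (\id{paint\_1}, \id{paint\_2}, \id{new\_row\_1},
\id{paint\_3}, \id{eoc}) decode to the two rows {\tt .**.} and
{\tt .***}.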
\subsection{The postamble}

The last character in a \str{GF} file is followed by `\id{post}'; this
command introduces the postamble, which summarizes important facts that
\MF\ has accumulated. The postamble has the form
%
\begin{center}
\begin{tabular}{l}
$\id{post}\ p[4]\ \id{ds}[4]\ \id{cs}[4]\ \id{hppp}[4]\ \id{vppp}[4]\ \id{min\_m}[+4]$\\
\qquad $\id{max\_m}[+4]\ \id{min\_n}[+4]\ \id{max\_n}[+4]$\\
$\langle\,$character locators$\,\rangle$\\
$\id{post\_post}\ q[4]\ i[1]\ \hbox{223's}[\ge 4]$\\
\end{tabular}
\end{center}
%
Here $p$ is a pointer to the byte following the final \id{eoc} in the
file (or to the byte following the preamble, if there are no
characters); it can be used to locate the beginning of \id{xxx} commands
that might have preceded the postamble. The \id{ds} and \id{cs}
parameters give the design size and check sum, respectively, which are
exactly the values put into the header of any \str{TFM} file that shares
information with this \str{GF} file. Parameters \id{hppp} and \id{vppp}
are the ratios of pixels per point, horizontally and vertically,
expressed as \id{scaled} integers (i.e., multiplied by $2^{16}$); they
can be used to correlate the font with specific device resolutions,
magnifications, and ``at sizes.'' Then come \id{min\_m}, \id{max\_m},
\id{min\_n}, and \id{max\_n}, which bound the values that registers $m$
and~$n$ assume in all characters in this \str{GF} file. (These bounds
need not be the best possible; \id{max\_m} and \id{min\_n} may, on the
other hand, be tighter than the similar bounds in \id{boc} commands. For
example, some character may have $\id{min\_n}=-100$ in its \id{boc}, but
it might turn out that $n$ never gets lower than $-50$ in any character;
then \id{min\_n} can have any value $\le -50$. If there are no
characters in the file, it's possible to have
$\id{min\_m}>\id{max\_m}$ and/or $\id{min\_n}>\id{max\_n}$.)
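
\medbreak
As an illustration of the layout just given, the nine four-byte
parameters of \id{post} can be unpacked in one step. This is a hedged
Python sketch, not a reference implementation: the names
{\tt Postamble}, {\tt read\_post}, and {\tt pixels\_per\_inch} are
hypothetical, the first five parameters are read as unsigned and the
four bounds as signed (following the $[4]$ and $[+4]$ tags above), and
the conversion to pixels per inch assumes the usual 72.27 printer's
points to the inch.

\begin{verbatim}
```python
import struct
from typing import NamedTuple


class Postamble(NamedTuple):
    """The parameters of the post command (hypothetical container)."""
    p: int
    ds: int
    cs: int
    hppp: int
    vppp: int
    min_m: int
    max_m: int
    min_n: int
    max_n: int


def read_post(data: bytes, pos: int) -> Postamble:
    """Read the post command whose opcode byte is at data[pos]."""
    if data[pos] != 248:
        raise ValueError("byte at pos is not the post opcode (248)")
    # Big-endian: five unsigned and four signed 32-bit parameters.
    return Postamble(*struct.unpack_from(">5I4i", data, pos + 1))


def pixels_per_inch(ppp_scaled: int) -> float:
    """Convert hppp or vppp (pixels per point times 2**16) to pixels per inch.

    Assumes 72.27 points per inch.
    """
    return ppp_scaled / 65536 * 72.27
```
\end{verbatim}

A font made for a 300-pixel-per-inch device would carry an \id{hppp}
of about 272046, since $300/72.27\times2^{16}\approx272046.5$.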
\medbreak Character locators are introduced by \id{char\_loc} commands, which specify a character residue~$c$, character escapements ($\id{dx},\id{dy}$), a character width~$w$, and a pointer~$p$ to the beginning of that character. (If two or more characters have the same code~$c$ modulo 256, only the last will be indicated; the others can be located by following backpointers. Characters whose codes differ by a multiple of 256 are assumed to share the same font metric information, hence the \str{TFM} file contains only residues of character codes modulo~256. This convention is intended for oriental languages, when there are many character shapes but few distinct widths.) The character escapements ($\id{dx},\id{dy}$) are the values of \MF's \res{chardx} and \res{chardy} parameters; they are in units of \id{scaled} pixels; i.e., \id{dx} is in horizontal pixel units times $2^{16}$, and \id{dy} is in vertical pixel units times $2^{16}$. This is the intended amount of displacement after typesetting the character; for \str{DVI} files, \id{dy} should be zero, but other document file formats allow nonzero vertical escapement. The character width~$w$ duplicates the information in the \str{TFM} file; it is $2^{24}$ times the ratio of the true width to the font's design size. The backpointer $p$ points to the character's \id{boc}, or to the first of a sequence of consecutive \id{xxx} or \id{yyy} or \id{no\_op} commands that immediately precede the \id{boc}, if such commands exist; such ``special'' commands essentially belong to the characters, while the special commands after the final character belong to the postamble (i.e., to the font as a whole). This convention about $p$ applies also to the backpointers in \id{boc} commands, even though it wasn't explained in the description of~\id{boc}. Pointer $p$ might be $-1$ if the character exists in the \str{TFM} file but not in the \str{GF} file. 
This unusual situation can arise in \MF\ output if the user had $\id{proofing}<0$ when the character was being shipped out, but then made $\id{proofing}\ge 0$ in order to get a \str{GF} file. \medbreak The last part of the postamble, following the \id{post\_post} byte that signifies the end of the character locators, contains $q$, a pointer to the \id{post} command that started the postamble. An identification byte, $i$, comes next; this currently equals~131, as in the preamble. The $i$ byte is followed by four or more bytes that are all equal to the decimal number 223 (i.e., \H{DF} in hexadecimal). \MF\ puts out four to seven of these trailing bytes, until the total length of the file is a multiple of four bytes, since this works out best on machines that pack four bytes per word; but any number of 223's is allowed, as long as there are at least four of them. In effect, 223 is a sort of signature that is added at the very end. This curious way to finish off a \str{GF} file makes it feasible for \str{GF}-reading programs to find the postamble first, on most computers, even though \MF\ wants to write the postamble last. Most operating systems permit random access to individual words or bytes of a file, so the \str{GF} reader can start at the end and skip backwards over the 223's until finding the identification byte. Then it can back up four bytes, read $q$, and move to byte $q$ of the file. This byte should, of course, contain the value 248 (\id{post}); now the postamble can be read, so the \str{GF} reader can discover all the information needed for individual characters. \endinput
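
\medbreak
The backward search just described might be sketched as follows. The
Python function {\tt find\_post} is a hypothetical name, and the sketch
assumes the whole file has already been read into memory as a byte
string; a real \str{GF} reader would more likely seek within the file.
The extra check that \id{post\_post} (249) precedes $q$ is a sanity test
added here, not a requirement stated above.

\begin{verbatim}
```python
def find_post(data: bytes) -> int:
    """Return q, the location of the post command, by scanning from the end."""
    i = len(data) - 1
    # Skip backwards over the trailing 223's.
    while i >= 0 and data[i] == 223:
        i -= 1
    trailing = len(data) - 1 - i
    if trailing < 4:
        raise ValueError("fewer than four trailing 223 bytes")
    # data[i] should now be the identification byte.
    if i < 5 or data[i] != 131:
        raise ValueError("identification byte 131 not found")
    if data[i - 5] != 249:
        raise ValueError("post_post opcode (249) not found before q")
    # Back up four bytes and read the pointer q (big-endian).
    q = int.from_bytes(data[i - 4:i], "big")
    if q >= i - 5 or data[q] != 248:
        raise ValueError("q does not point to a post command (248)")
    return q
```
\end{verbatim}

Once $q$ is known, the reader can process the \id{post} parameters and
the character locators that follow byte~$q$.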