\chapter{Macros for typesetting chemical structure fragments} \label{ch:frags} \section{General organization of a structure macro} The macro facility in \TeX/\LaTeX\ was used to define mnemonics for typesetting frequently occurring structure fragments such as common ring structures and branching patterns. Chapter~\ref{ch:macros} describes the complete system of macros designed for this thesis. All of the macros are defined with the \LaTeX\ declaration \verb+\newcommand+ which has the format $$\hbox{\verb+\newcommand{\commandname}[n]{replacement text}+}$$ (\TeX\ calls these definitions macros, whereas \LaTeX\ just uses the more general word ``command.'') In the definition, n is an integer from 1 to 9 and gives the number of arguments if any are used. The arguments are represented in the replacement text by parameters of the form \#1, \#2 etc. A macroname, like the name of any control sequence in \TeX, can contain letters only, not numerals. Where it was considered important in this thesis to indicate a numbering scheme in a macroname, either the full word for the number or Roman numerals in lower case letters were used. Thus \verb+\hetisix+ is a mnemonic for a hetero sixring with one hetero atom, and \verb+\fuseiv+ indicates a fusing fragment with four atoms. --- In some macronames the mnemonic as such is preceded by ``c'' or ``chem,'' for ``chemistry.'' This was mainly done where the mnemonic was already used for a control sequence in \TeX\ or \LaTeX. Thus, chemical structure fragments that point ``right'' or ``left'' are drawn by the macros \verb+\cright+ and \verb+\cleft+ since \TeX\ employs control sequences \verb+\right+ and \verb+\left+ with different meanings. All the structure-drawing macro definitions in this thesis follow the same pattern in their organization. The actual structure-drawing \LaTeX\ code is preceded by a ``box'' declaration. The diagrams are produced in a box because it is sometimes desirable to move the diagram as a whole. In all but one of the macros, the box is the \LaTeX\ ``picture''; in the macro \verb+\tbranch+, which uses the tabbing environment, the box is the \LaTeX\ minipage. Box dimensions in the macros are made flexible through the use of global variables. --- The structure-drawing code itself consists of unconditional and of conditional statements. The macro arguments are used to vary parts of the structure diagram, such as substituents and multiple bonds. Those features of a structure macro that have not been discussed before are described in more detail in the following sections. \subsection{Box constructions with global variables} Integer variables such as the ones used here for box dimensions have to be stored in one of \TeX's 256 numerical registers and can be given symbolic names with \TeX's \verb+\newcount+ declaration (Knuth~84, pp.~118--121). The variables used in the macros of this thesis are defined and initialized in the macro \verb+\initial+ which should be part of the preamble of an input file containing chemistry typesetting using this system (see Figure~\ref{fg:preamble} for a summary of the preamble). A user can then change the variables by simple assignment, e. g. \verb+\xi=400+. When several variables have been changed, it is convenient to reset all of them, including the unitlength, to their initial values with the macro \verb+\reinit+. The role of the \LaTeX\ picture for line-drawing was discussed in Chapter~\ref{ch:txltx}, but the picture is also a box. As such it is processed in horizontal mode, as part of a line. Within a horizontal box, line breaks can never occur. The macros use variables for all numerical parameters in the picture declaration. Thus, the picture declaration in the macros has the form \\ $$\hbox{\verb+\begin{picture}(\pw,\pht)(\-xi,\-yi)+}$$ The picture width and height, \verb+\pw+ and \verb+\pht+, specify the nominal size used by \TeX\ to determine how much room to leave for the box. The diagram in the box can extend beyond these dimensions, but an adjoining box is typeset next to the preceding one according to the specified width. The user needs control over the picture width in cases where several such boxes are put on one line, especially for the horizontal connection of structure fragments. Control over the picture height is important because some chemical structures take up more vertical space than others. The picture width and height are initialized to 400 and 900 respectively, which is about $1.4\times 3.2\hbox{cm}$ with the unitlength of 0.1points. Variables are used for the coordinates of the lower left corner, \verb+\xi+ and \verb+\yi+, so that the user can change the placement of the diagram within the picture window. This is not often necessary for individual structures since they can be conveniently centered by the display mechanisms discussed later in this chapter; and also the whole picture can be shifted horizontally by adding horizontal space in front of it with the \verb+\hspace+ command. It was considered to be most convenient to put the minus signs in front of \verb+\xi+ and \verb+\yi+ in the declaration, since one thinks of the lower left corner of a coordinate system as having negative coordinates. With this declaration, an increase in the absolute \verb+\xi+ and \verb+\yi+ values shifts the diagram to the right and up. The coordinates \verb+\xi+ and \verb+\yi+ are initialized to 0 and 300 respectively, which places the coordinate origin about 1cm above the bottom of the picture window with the unitlength of 0.1points. The minipage, the box used in the macro \verb+\tbranch+, is a paragraph box, which allows line breaks. Only the width is specified for a paragraph box since the height is controlled by the number of lines that will be produced by a given amount of text. The variable used in this system for the width of paragraph boxes is \verb+\xbox+. The number value assigned to \verb+\xbox+ is interpreted as printer points. \subsection{Use of \TeX's conditional facility} \TeX's conditional facility is very similar to those of other high-level languages; it has the form: $$\hbox {\verb+\if+\(\langle\)condition\(\rangle\langle\)true text\(\rangle\) \verb+\else+\(\langle\)false text\(\rangle\)\verb+\fi+} $$ (Knuth~84, p.~207ff.). Nesting is possible. The \TeX\ \verb+\if+ primitive has over ten different forms for testing numbers, processing modes, or tokens. The form used in the chemical structure macros is \verb+\ifx+${\rm \langle token_{1}\rangle \langle token_{2}\rangle}$, which tests for the equality of the (character code, category code) pair of two tokens. In the structure macros, the \verb+\ifx+ tests the arguments. When the arguments are single characters, such as ``S'', ``D'', or ``C'' for single bond, double bond, and circle, respectively, the application is straightforward. The character ``Q'' is used as argument where ``no action'' --- no substituent, no additional bond --- is a desired option at a particular place in a structure diagram. Thus, the coding for ring positions where substituents are an option is: \begin{tabbing} move in some\= $\backslash $ifx\#nQ\= print the substituent \#n $\backslash $fi\+ \kill \verb+\ifx+\#nQ\> \\ \verb+\else+ \> draw a bond line \+ \\ print the substituent \#n \verb+\fi+ \end{tabbing} The parameter n represents the substituent formula. --- The explicit no-action symbol makes it possible to distinguish three different cases at a particular ring position: no action, just a bond line extending from the ring, and a bond line with a substituent at the position. As an example, the purine macro was used with an argument of Q for the 9-position in the left-hand diagram and with an empty set argument in the right-hand diagram: \[ \purine{Q}{D}{Q}{D}{Q}{$NH_{2}$}{Q}{D}{Q} \hspace{3cm}\purine{Q}{D}{Q}{D}{Q}{$NH_{2}$}{Q}{D}{} \] One would use the bond-line-only option in cases where another structure fragment in a picture is to be attached to the bond. --- The character Q was chosen because it is not part of any element symbol and is not commonly used as a structural symbol otherwise. When the parameter after the \verb+\ifx+ is substituted by a text string representing a multi-character substituent, \TeX\ actually compares the first character of the string with its second character, since these are the first two tokens encountered. Thus, a substituent that begins with two identical characters always makes the condition true. In such cases, one has to ensure that the string as a whole is compared by enclosing it in a box, {\it e.g.,\/} \verb+\mbox{$\rm NNHC_{6}H_{5}$}+. \TeX's conditional facility can also be used to impart some chemical intelligence to a macro by causing screen messages to be generated when the user supplies a combination of arguments that is chemically not possible. Such a combination would be a ring double bond (argument 7=``D'') and a circle denoting aromaticity (argument 9=``C'') for the carbon sixring. A section of code \begin{tabbing} move in \= $\backslash $ifx\#7D $\backslash $ifx\#9C \= $\backslash $message\{Error: $\ldots $\}\+ \kill \verb+\ifx+\#7D \verb+\ifx+\#9C \> \+ \\ \verb+\message{Error:+ $\ldots $\verb+}+ \- \\ \verb+\fi \fi+ \end{tabbing} will produce the message on the screen while the input file is processed by \TeX\ to give the DVI file. The user can then correct the mistake and reprocess the input file before sending the DVI file to the output device. Error messages were not placed into all macros, just into the \verb+\sixring+ macro to demonstrate this feature. In addition to the simple \verb+\if+ statements \TeX\ has an \verb+\ifcase+ construction of the form \\ \indent \verb+\ifcase+(number)(text for case 0) \verb+\or+ (text for case 1) \verb+\or+ $\ldots $ \\ \indent \ \ \ \verb+\or+ (text for case n) \verb+\else+ (text for all other cases) \verb+\fi+ \\ (Knuth~84, p.~210). When the \verb+\ifcase+ statement is used in a macro and the case number is passed as an argument, many different actions can be requested through one argument. For a larger number of cases, the \TeX\ code with \verb+\ifcase+ is somewhat more elegant than a series of individual \verb+\if+ statements. A suitable application for the chemical structure macros is the placement of double bonds in various positions of a structure, where the position number, or a numeric code for a combination of positions, is passed as the case number. One version of the sixring macro, \verb+\sixringb+, contains an \verb+\ifcase+ construct. \section{Use of structure macros} \subsection{Invoking the structure macros} In order to typeset a chemical structure through invoking one of the macros in this system, the user must know how the structure will be oriented on the page, besides knowing, of course, the function of each argument. The structures in this system can not be rotated; they are oriented according to common practices in chemistry, but occasionally the user will have to adapt a model structure to the given orientation. Chapter~\ref{ch:macros} shows for each macro a typical structure produced by it, to illustrate the orientation and the position numbers. In this section it will be demonstrated with two representative macros how this information is to be used. The macro \verb+\cright+ typesets structures or structure fragments of the general form $$\pht=600\cright{$R^{1}$}{S}{$Z$}{S}{$R^{5}$}{S}{$R^{7}$} \hspace{1cm} \mbox{.}$$ The arguments~1, 3, 5, and~7 are the substituents or groups $R^{1}$, $Z$, $R^{5}$, and $R^{7}$. Arguments 2, 4, and 6 trigger the drawing of bonds between $R^{1}$ and $Z$, between $Z$ and $R^{5}$, and between $Z$ and $R^{7}$, respectively. The bonds can be single or double bonds, from arguments ``S'' and ``D'' respectively, and the bond between $R^{1}$ and $Z$ does not have to be present. A typical structure is shown in Figure~\ref{fg:crightexample}; it was drawn using \verb+\cright{$CH_3$}{Q}{$CH$}{S}{$COOCH_3$}S{$COOCH_3$}+. Since the second argument for Figure+\ref{fg:crightexample} is ``Q,'' no bond is drawn between ${\rm R^{1}}$ and Z; and ${\rm R^{1}}$, which is in a \verb+\makebox+ (see Section~\ref{sc:fragments}), is moved next to Z. Figure~\ref{fg:crighttwo} shows two additional structures drawn with the \verb+\cright+ macro. \begin{figure}\centering \cright{$CH_{3}$}{Q}{$CH$}{S}{$COOCH_{3}$}{S}{$COOCH_{3}$} \caption{Figure drawn using \tt\char"5C{}cright} \label{fg:crightexample} \end{figure} \begin{figure} % figure 3.2 \hspace{3cm} \cright{$CH_{3}CH$}{D}{$C$}{S}{$CH_{3}$}{S}{$CH_{3}$} \hspace{3cm} \cright{$R$}{S}{$C$}{S}{$O^{-}$}{D}{${NH_{2}}^{+}$} \caption{Structures drawn with the {\tt\char"5C{}cright} macro} \label{fg:crighttwo} \end{figure} The macro \verb+\sixring+ typesets the very common carbon sixring. For this structure and all the ring structures, the user has to know how the system of macros assigns position numbers to the ring atoms. The assignment follows chemical nomenclature rules where applicable. However in the case of single-ring structures where all ring atoms are carbons, the assignment of position number~1 is arbitrary. In this system of macros, the sixring is numbered as follows: \pht=900 \[ \sixring{$R^{1}$}{$R^{2}$}{$R^{3}$}{$R^{4}$}{$R^{5}$}{$R^{6}$} {S}{S}{S} \hspace{2cm} \mbox{.} \] The first six arguments are the formulas for the optional substituents in the respective positions. The user has to refer to the position assignment to supply the text strings in the correct form, e. g. a sulfonic acid group for positions 4 and~5 would be typed in as $HO_{3}S$, whereas it would be $SO_{3}H$ for all other positions. The remaining three arguments can produce alternating ring double bonds, but each of them has a second function. Argument~7 can produce a second substituent at position~1, argument~8 an outside double bond with substituent in position~3, and argument~9 a circle inside the ring denoting aromaticity. Where arguments have two functions in this way, the different structural features generated are of course mutually exclusive chemically. Figure~\ref{fg:sixexample} shows a sixring structure typeset with the command \verb+\sixring{$OH$}{Q}{Q}{Q}{$NC$}{Q}{$CH_{3}$}{$NH$}{D}+. The substituent in position~3 is passed as argument~8 since it is not the regular single-bonded substituent represented by argument~3. \begin{figure}\centering \sixring{$OH$}{Q}{Q}{Q}{$NC$}{Q}{$CH_{3}$}{$NH$}{D} \caption{Structure typeset using \tt\char"5C{}sixring} \label{fg:sixexample} \end{figure} \subsection{Displaying macro-generated diagrams within a document} Since the macro-generated diagrams constitute \TeX\ boxes, the code for a diagram can be included anywhere in the input file and \TeX\ will try to find a place for the diagram as a whole in the line and on the page. The diagrams, however, take up more space horizontally and vertically than a box of text; therefore \TeX's line- and page breaking mechanisms would be strained, sometimes to such a degree that text squeezing or spreading would be apparent on the printed page. A convenient way to display one or several structures on one line by themselves and centered, is the math display environment, enclosed by \LaTeX\ with brackets in the form \verb+\[+$\ldots $\verb+\]+. A modification of this environment puts consecutive equation numbers at the right edge of the line. When using math display one has to remember that the only spacing in math mode is around math operators, space inserted by the user is ignored. Also, a new paragraph can not be started in math display. --- The diagrams in math display are printed at the place in the document where they are coded in the input file. Thus the display can still cause problems with pagebreaking, especially since the diagrams usually take up more vertical space than a math equation for which the environment is designed. The \LaTeX\ figure environment was specially designed for larger displays, such as the structure diagrams (Lamport 86, pp.~59, 60, 176, 177). A figure will usually not appear at the place in the document where the user has coded it; instead \LaTeX\ finds space for it on the current page or the next one in such a way that an overfull page is never produced. The user has some control over the placement of figures with optional parameters such as top or bottom of a page, but \LaTeX\ still makes the final decision, often producing surprising results. Since a figure is moved to a convenient place by \LaTeX\ , the figure is called a ``float.'' In addition to the automatic space-finding, the figure environment has the advantage that it makes captions possible. Furthermore, several displayed objects, each with a caption, can be included in one figure environment; Figures~\ref{fg:lactics} and~\ref{fg:lacticr} present an example, produced by the \LaTeX\ code in Figure~\ref{fg:lacticc} The code shows that each picture with its caption is enclosed in a \verb+\parbox+. \begin{figure} \parbox{.4\textwidth}{\centering \begin{picture}(\pw,\pht)(-\xi,-\yi) \put(90,0) {\circle{180}} \put(90,90) {\line(0,1) {70}} % behind and up \put(60,170) {$COOH$} \thicklines \put(30,10) {\line(-5,2) {140}} % in front \put(-415,30) {\makebox(300,87)[r]{$HO$}} % and left \put(150,10) {\line(5,2) {140}} % in front \put(300,30) {$H$} % and right \thinlines \put(90,-90) {\line(0,-1) {90}} % behind and \put(60,-260) {$CH_{3}$} % down \end{picture} \caption{${\rm (S)-}$lactic acid} } \label{fg:lactics} \hfill \parbox{.4\textwidth}{\centering \begin{picture}(\pw,\pht)(-\xi,-\yi) \put(90,0) {\circle{180}} \put(90,90) {\line(0,1) {70}} % behind and up \put(60,170) {$COOH$} \thicklines \put(30,10) {\line(-5,2) {140}} % in front \put(-415,30) {\makebox(300,87)[r]{$H$}} % and left \put(150,10) {\line(5,2) {140}} % in front \put(300,30) {$OH$} % and right \thinlines \put(90,-90) {\line(0,-1) {90}} % behind and \put(60,-260) {$CH_{3}$} % down \end{picture} \caption{${\rm (R)-}$lactic acid} } \label{fg:lacticr} \end{figure} \begin{figure}\centering \begin{verbatim} \begin{figure} \parbox{.4\textwidth}{\centering \ccirc{$COOH$}{$HO$}{$H$}{$CH_{3}$} %ccirc is a macro \caption{${\rm (S)-lactic acid}$}} \hfill \parbox{.4\textwidth}{\centering \ccirc{$COOH$}{$H$}{$OH$}{$CH_{3}$} \caption{${\rm (R)-lactic acid}$}} \end{figure} \end{verbatim} \caption{\LaTeX\ code for two captions in one figure} \label{fg:lacticc} \end{figure} It should also be mentioned that all the display mechanisms discussed above can be used in a two-column document style which is the format of most scientific chemistry journals. If the same structure is to be printed many times in a document, processing time can be saved by storing it in a \LaTeX\ \verb+\savebox+. The structure-drawing code then has to be processed once only. Applied to the chemistry macros, a statement $$\hbox{\verb+\savebox1{\macroname{arg+$_1$\verb+}{arg+$_2$\verb+}+}$$ will process the code and save the typeset structure. A \verb+\usebox1+ statement, enclosed in a math display or figure environment, is then placed into the input file whereever the structure is to be printed. A \verb+\savebox+ can also be given a symbolic name (Lamport~86, p.~101).