\chapter{Macros for typesetting chemical structure fragments}
\label{ch:frags}
\section{General organization of a structure macro}
 The macro facility in \TeX/\LaTeX\  was used to define mnemonics for
 typesetting frequently occurring structure fragments such as
 common ring structures and branching patterns.
 Chapter~\ref{ch:macros}
 describes the complete system of
 macros designed for this thesis. All of the macros
 are defined with the \LaTeX\  declaration \verb+\newcommand+ which
 has the format
 $$\hbox{\verb+\newcommand{\commandname}[n]{replacement text}+}$$
 (\TeX\  calls these definitions macros, whereas \LaTeX\  just uses the
 more general word ``command.'') In the definition, $n$ is an integer
 from 1 to 9 that gives the number of arguments; the optional
 \verb+[n]+ is omitted when the macro takes no arguments.
 The arguments are represented in the replacement text by
 parameters of the form \#1, \#2 etc.
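
 As a minimal sketch of this format, the following definition takes
 two arguments. The name \verb+\subst+ and its replacement text are
 hypothetical and serve only as illustration; they are not part of the
 system of macros described in this thesis.
 \begin{verbatim}
 % Hypothetical example: two arguments, referenced as #1 and #2.
 \newcommand{\subst}[2]{$#1_{#2}$}
 % Usage: \subst{CH}{3}  expands to  $CH_{3}$
 \end{verbatim}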
 
 A macroname, like the name of any control sequence in \TeX,
 can contain letters only, not numerals.  Where it was considered
 important in this thesis to indicate a numbering scheme in
 a macroname, either the full word for the number or Roman
 numerals in lower case letters were used. Thus \verb+\hetisix+
 is a mnemonic for a hetero sixring with one hetero atom, and
 \verb+\fuseiv+ indicates a fusing fragment with four atoms. ---
 In some macronames the mnemonic as such is preceded by ``c''
 or ``chem,'' for ``chemistry.'' This was mainly done where the
 mnemonic was already used for a control sequence in \TeX\
 or \LaTeX. Thus, chemical structure fragments that point
 ``right'' or ``left'' are drawn by the macros \verb+\cright+
 and \verb+\cleft+ since \TeX\  employs control sequences
 \verb+\right+ and \verb+\left+ with different meanings.
 
 All the structure-drawing macro definitions in this thesis
 follow the same pattern in their organization. The actual
 structure-drawing \LaTeX\  code is preceded by a ``box''
 declaration.
 The diagrams are produced in a box because it is sometimes
 desirable to move the diagram as a whole.
 In all but one of the macros, the box is the \LaTeX\
 ``picture''; in the macro \verb+\tbranch+,
 which uses the tabbing environment, the box is
 the \LaTeX\  minipage. Box dimensions in the
 macros are made flexible through the use of global variables.
 --- The structure-drawing code itself consists of unconditional
 and of conditional statements.  The macro arguments are used to
 vary parts of the structure diagram, such as substituents
 and multiple bonds. Those features of a structure macro
 that have not been discussed before are described in more
 detail in the following sections.
 
\subsection{Box constructions with global variables}
 Integer variables such as the ones used here for box dimensions
 have to be stored in one of \TeX's 256 numerical registers and can
 be given symbolic names with \TeX's \verb+\newcount+ declaration
 (Knuth~84, pp.~118--121). The variables used in the macros of
 this thesis are defined and initialized in the macro
 \verb+\initial+ which should be part of the preamble of an input
 file containing chemistry typesetting using this system (see
 Figure~\ref{fg:preamble}
 for a summary of the preamble). A user can then change the
 variables by simple assignment, e.g., \verb+\xi=400+.
 When several variables have been changed, it is convenient to
 reset all of them, including the unitlength, to their initial
 values with the macro \verb+\reinit+.
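
 The mechanism can be sketched as follows. The names \verb+\demow+,
 \verb+\demoh+, and \verb+\demoreset+ are hypothetical stand-ins; the
 actual definitions inside \verb+\initial+ and \verb+\reinit+ are not
 reproduced here and may differ.
 \begin{verbatim}
 % Hypothetical sketch of named integer variables with a reset macro.
 \newcount\demow   \newcount\demoh     % allocate two count registers
 \newcommand{\demoreset}{%
   \demow=400 \demoh=900               % initial width and height
   \setlength{\unitlength}{0.1pt}}     % initial unitlength
 \demoreset                            % initialize once in the preamble
 \demow=600                            % later: change by assignment
 \demoreset                            % restore all initial values
 \end{verbatim}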
 
 The role of the \LaTeX\  picture for line-drawing was discussed
 in Chapter~\ref{ch:txltx},
 but the picture is also a box.
 As such it is processed in horizontal mode, as part of a line.
 Within a horizontal box, line breaks can never occur.
 The macros use variables for all numerical
 parameters in the picture declaration.
 Thus, the picture declaration in the macros has the form
 $$\hbox{\verb+\begin{picture}(\pw,\pht)(-\xi,-\yi)+}$$
 The picture width and height, \verb+\pw+ and \verb+\pht+,
 specify the nominal size used by \TeX\  to determine how much
 room to leave for the box. The diagram in the box can extend
 beyond these dimensions, but an adjoining box is typeset next
 to the preceding one according to the specified width.
 The user needs control over the picture width in cases where
 several such boxes are put on one line, especially for the
 horizontal connection of structure fragments.
  Control over the picture height is important because
 some chemical structures take up more vertical space than
 others. The picture width and height are initialized to 400
 and 900, respectively, which is about $1.4\times 3.2$~cm with
 the unitlength of 0.1~points.
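 The conversion can be checked directly, using 1~in $=$ 72.27~points
 $=$ 2.54~cm:
 \[ 400 \times 0.1\,\hbox{pt} = 40\,\hbox{pt} \approx 1.41\,\hbox{cm},
    \qquad
    900 \times 0.1\,\hbox{pt} = 90\,\hbox{pt} \approx 3.16\,\hbox{cm}. \]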
 
 Variables are used for the coordinates of the
 lower left corner, \verb+\xi+ and \verb+\yi+, so that the
 user can change the placement of the diagram within the
 picture window. This is not often necessary for individual
 structures since they can be conveniently centered by
 the display mechanisms discussed later in this chapter;
 and also the whole picture can be shifted horizontally by
 adding horizontal space in front of it with the
 \verb+\hspace+ command.
 It was considered to be most convenient to put the minus
 signs in front of \verb+\xi+ and \verb+\yi+ in the declaration,
 since one thinks of the lower left corner of a coordinate
 system as having negative coordinates. With this declaration,
 an increase in the absolute \verb+\xi+ and \verb+\yi+ values
 shifts the diagram to the right and up. The coordinates
 \verb+\xi+ and \verb+\yi+ are initialized to 0 and 300,
 respectively, which places the coordinate origin about 1~cm
 above the bottom of the picture window with the unitlength
 of 0.1~points.
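
 With the initial values, the picture declaration is equivalent to the
 one in the following sketch, which is written with explicit numbers so
 that it can be tried on its own. The two lines drawn are illustrative
 only and do not come from any of the structure macros.
 \begin{verbatim}
 \setlength{\unitlength}{0.1pt}
 % width 400, height 900; lower left corner at (0,-300)
 \begin{picture}(400,900)(0,-300)
   \put(0,0){\line(1,0){400}}     % horizontal line through the origin
   \put(0,-300){\line(0,1){900}}  % left edge of the picture window
 \end{picture}
 \end{verbatim}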
 
 The minipage, the box used in the macro \verb+\tbranch+,
 is a paragraph box, which allows line breaks. Only the width
 is specified for a paragraph box since the height is
 controlled by the number of lines that will be produced by
 a given amount of text. The variable used in this system
 for the width of paragraph boxes is \verb+\xbox+. The number
 value assigned to \verb+\xbox+ is interpreted as printer
 points.
 
\subsection{Use of \TeX's conditional facility}
 \TeX's conditional facility is very similar to those of other
 high-level languages; it has the form:
 $$\hbox
 {\verb+\if+\(\langle\)condition\(\rangle\langle\)true text\(\rangle\)
  \verb+\else+\(\langle\)false text\(\rangle\)\verb+\fi+}
 $$
 (Knuth~84, p.~207ff.).  Nesting is possible.
 The \TeX\  \verb+\if+ primitive has over ten different forms
 for testing numbers, processing modes, or tokens.
 The form used in the chemical structure macros is
 \verb+\ifx+${\rm \langle token_{1}\rangle \langle
 token_{2}\rangle}$, which tests for the equality of the
 (character code, category code) pair of two tokens.
 
 In the structure macros, the \verb+\ifx+ tests the
 arguments. When the arguments are single characters,
 such as ``S'', ``D'', or ``C'' for single bond, double
 bond, and circle, respectively, the application is
 straightforward.  The character ``Q'' is used as argument
 where ``no action'' --- no substituent, no additional bond
 --- is a desired option at a particular place in a structure
 diagram. Thus, the coding for ring positions where substituents
 are an option is:
 
 \begin{tabbing}
  move in some\= $\backslash $ifx\#nQ\= print the substituent \#n
                                        $\backslash $fi\+ \kill
                 \verb+\ifx+\#nQ\>                           \\
                 \verb+\else+   \> draw a bond line \+  \\
                 print the substituent \#n \verb+\fi+
  \end{tabbing}
 
 The parameter n represents the substituent formula. ---
 The explicit no-action symbol makes it possible to distinguish
 three different cases at a particular ring position: no action,
 just a bond line extending from the ring, and a bond line with
 a substituent at the position. As an example, the purine macro
 was used with an argument of Q for the 9-position in the
 left-hand diagram and with an empty argument, \verb+{}+, in the
 right-hand diagram:
 \[ \purine{Q}{D}{Q}{D}{Q}{$NH_{2}$}{Q}{D}{Q}
    \hspace{3cm}\purine{Q}{D}{Q}{D}{Q}{$NH_{2}$}{Q}{D}{}  \]
 One would use the bond-line-only option in cases where another
 structure fragment in a picture is to be attached to the bond.
 --- The character Q was chosen because it is not part of any
 element symbol and is not commonly used as a structural symbol
 otherwise.
 
 When the parameter after the \verb+\ifx+ is substituted by a
 text string representing a multi-character substituent, \TeX\
 actually compares the first character of the string with its
 second character, since these are the first two tokens
 encountered. Thus, a substituent that begins with two identical
 characters always makes the condition true. In such cases,
 one has to ensure that the string as a whole is compared by
 enclosing it in a box, {\it e.g.,\/}
 \verb+\mbox{$\rm NNHC_{6}H_{5}$}+.
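
 The behavior described above can be reproduced with a small test
 macro. The macro \verb+\testsub+ is a hypothetical stand-in written
 for this illustration, not one of the structure macros; it prints
 text where a structure macro would draw.
 \begin{verbatim}
 \documentclass{article}
 % Hypothetical test macro: Q requests "no action".
 \newcommand{\testsub}[1]{\ifx#1Q no action\else substituent: #1\fi}
 \begin{document}
 \testsub{Q}\par          % Q is compared with Q: condition true
 \testsub{$NH_{2}$}\par   % $ is compared with N: condition false
 \testsub{OOH}\par        % O is compared with O: condition true,
                          %   although the argument is not Q
 \testsub{\mbox{OOH}}\par % \mbox is compared with {: condition false
 \end{document}
 \end{verbatim}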
 
 \TeX's conditional facility can also be used to impart some
 chemical intelligence to a macro by causing screen messages
 to be generated when the user supplies a combination of
 arguments that is chemically not possible. Such a combination
 would be a ring double bond (argument 7=``D'') and a
 circle denoting aromaticity (argument 9=``C'') for the
 carbon sixring. A section of code
 
 \begin{tabbing}
  move in \= $\backslash $ifx\#7D  $\backslash $ifx\#9C
          \= $\backslash $message\{Error: $\ldots $\}\+ \kill
          \verb+\ifx+\#7D \verb+\ifx+\#9C \> \+ \\
          \verb+\message{Error:+ $\ldots $\verb+}+ \- \\
          \verb+\fi \fi+
  \end{tabbing}
 
 will produce the message on the screen while the input
 file is processed by \TeX\  to give the DVI file. The user
 can then correct the mistake and reprocess the input file
 before sending the DVI file to the output device.
 Error messages were not placed into all macros, just into
 the \verb+\sixring+ macro to demonstrate this feature.
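
 A self-contained sketch of such a check is given below. The macro
 \verb+\checkring+ and the wording of its message are hypothetical;
 its two arguments merely stand in for arguments 7 and 9 of the
 sixring macro, whose actual code is not reproduced here.
 \begin{verbatim}
 % Hypothetical consistency check with nested \ifx tests.
 \newcommand{\checkring}[2]{%
   \ifx#1D\ifx#2C%
     \message{Error: ring double bond and circle both requested}%
   \fi\fi}
 % \checkring{D}{C} writes the message to the terminal and log file;
 % \checkring{S}{C} and \checkring{D}{Q} write nothing.
 \end{verbatim}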
 
 In addition to the simple \verb+\if+ statements \TeX\  has
 an \verb+\ifcase+ construction of the form \\
 \indent \verb+\ifcase+(number)(text for case 0) \verb+\or+
         (text for case 1) \verb+\or+ $\ldots $ \\
 \indent \ \ \ \verb+\or+ (text for case n) \verb+\else+
         (text for all other cases)  \verb+\fi+  \\
 (Knuth~84, p.~210). When the \verb+\ifcase+ statement
 is used in a macro and the case number is passed as an
 argument, many different actions can be requested
 through one argument. For a larger number of cases, the
 \TeX\  code with \verb+\ifcase+ is somewhat more elegant
 than a series of individual \verb+\if+ statements.
 A suitable application for the chemical structure
 macros is the placement of double bonds in various
 positions of a structure, where the position number,
 or a numeric code for a combination of positions,
 is passed as the case number.  One version of the
 sixring macro, \verb+\sixringb+, contains an
 \verb+\ifcase+ construct.
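
 The construction can be sketched with a hypothetical macro,
 \verb+\bondcase+, that prints a description instead of drawing bonds.
 The numeric codes are invented for this illustration and are not
 those used by \verb+\sixringb+.
 \begin{verbatim}
 % Hypothetical example: the argument selects one of several cases.
 \newcommand{\bondcase}[1]{%
   \ifcase#1 no double bond%       case 0
   \or one double bond%            case 1
   \or two double bonds%           case 2
   \else unknown code%             all other cases
   \fi}
 % \bondcase{0} gives "no double bond",
 % \bondcase{2} gives "two double bonds",
 % \bondcase{7} gives "unknown code".
 \end{verbatim}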
 
\section{Use of structure macros}
\subsection{Invoking the structure macros}
 In order to typeset a chemical structure through invoking one
 of the macros in this system, the user must know how the structure
 will be oriented on  the page, besides knowing, of course,
 the function of each argument. The structures in this system
 cannot be rotated; they are oriented according to common practices
 in chemistry, but occasionally the user will have to adapt
 a model structure to the given orientation. Chapter~\ref{ch:macros}
 shows
 for each macro a typical structure produced by it, to illustrate
 the orientation and the position numbers. In this section it
 will be demonstrated with two representative macros how this
 information is to be used.
 
 The macro \verb+\cright+ typesets structures or structure
 fragments of the general form
 $$\pht=600\cright{$R^{1}$}{S}{$Z$}{S}{$R^{5}$}{S}{$R^{7}$} \hspace{1cm}
    \mbox{.}$$
 The arguments~1, 3, 5, and~7 are the substituents or groups
 $R^{1}$, $Z$, $R^{5}$, and $R^{7}$. Arguments 2, 4, and 6 trigger
 the drawing of bonds between $R^{1}$ and $Z$, between $Z$ and $R^{5}$,
 and between $Z$ and $R^{7}$, respectively. The bonds can be single
 or double bonds, from arguments ``S'' and ``D'' respectively,
 and the bond between $R^{1}$ and $Z$ does not have to be present.
 A typical structure is shown in Figure~\ref{fg:crightexample};
 it was drawn using
 \verb+\cright{$CH_{3}$}{Q}{$CH$}{S}{$COOCH_{3}$}{S}{$COOCH_{3}$}+.
 Since the second argument for Figure~\ref{fg:crightexample}
 is ``Q,'' no bond is drawn between $R^{1}$ and $Z$; and
 $R^{1}$, which is in a \verb+\makebox+ (see Section~\ref{sc:fragments}),
 is moved next to $Z$. Figure~\ref{fg:crighttwo}
 shows two additional structures
 drawn with the \verb+\cright+ macro.
 
 \begin{figure}\centering
  \cright{$CH_{3}$}{Q}{$CH$}{S}{$COOCH_{3}$}{S}{$COOCH_{3}$}
  \caption{Figure drawn using {\tt\char"5C{}cright}}
\label{fg:crightexample}
  \end{figure}
 
 
 \begin{figure}                % figure 3.2
  \hspace{3cm}
  \cright{$CH_{3}CH$}{D}{$C$}{S}{$CH_{3}$}{S}{$CH_{3}$}
  \hspace{3cm}
  \cright{$R$}{S}{$C$}{S}{$O^{-}$}{D}{${NH_{2}}^{+}$}
  \caption{Structures drawn with the {\tt\char"5C{}cright} macro}
\label{fg:crighttwo}
 \end{figure}
 
 The macro \verb+\sixring+ typesets the very common carbon sixring.
 For this structure and all the ring structures, the user has to
 know how the system of macros assigns position numbers to the
 ring atoms. The assignment follows chemical nomenclature rules
 where applicable. However, in the case of single-ring structures
 where all ring atoms are carbons,
 the assignment of position
 number~1 is arbitrary. In this system of macros, the sixring is
 numbered as follows:
 \pht=900
 \[ \sixring{$R^{1}$}{$R^{2}$}{$R^{3}$}{$R^{4}$}{$R^{5}$}{$R^{6}$}
    {S}{S}{S} \hspace{2cm}  \mbox{.}  \]
 The first six arguments are the formulas for the optional
 substituents in the respective positions. The user has to
 refer to the position assignment to supply the text strings
 in the correct form; e.g., a sulfonic acid group for positions
 4 and~5 would be typed in as \verb+$HO_{3}S$+, whereas it would be
 \verb+$SO_{3}H$+ for all other positions.  The remaining three
 arguments can produce alternating ring double bonds, but each
 of them has a second function. Argument~7 can produce a second
 substituent at position~1, argument~8 an outside double bond
 with substituent in position~3, and argument~9 a circle inside
 the ring denoting aromaticity. Where arguments have two
 functions in this way, the different structural features
 generated are of course mutually exclusive chemically.
 Figure~\ref{fg:sixexample}
 shows a sixring structure typeset with the command
\verb+\sixring{$OH$}{Q}{Q}{Q}{$NC$}{Q}{$CH_{3}$}{$NH$}{D}+.
 The substituent in position~3 is passed
 as argument~8 since it is not the regular single-bonded
 substituent represented by argument~3.
 
 \begin{figure}\centering
  \sixring{$OH$}{Q}{Q}{Q}{$NC$}{Q}{$CH_{3}$}{$NH$}{D}
  \caption{Structure typeset using {\tt\char"5C{}sixring}}
\label{fg:sixexample}
 \end{figure}
 
\subsection{Displaying macro-generated diagrams
    within a document}
 Since the macro-generated diagrams constitute \TeX\  boxes, the
 code for a diagram can be included anywhere in the input file
 and \TeX\  will try to find a place for the diagram as a whole
 in the line and on the page. The diagrams, however, take up
 more space horizontally and vertically than a box of text;
 therefore \TeX's line- and page-breaking mechanisms can be
 strained, sometimes to such a degree that squeezed or
 stretched text is apparent on the printed page.
 
 A convenient way to display one or several structures centered on a
 line by themselves is the math display environment, which \LaTeX\
 delimits with \verb+\[+$\ldots $\verb+\]+.
 A modification of this environment puts
 consecutive equation numbers at the right edge of the line.
 When using math display one has to remember that the only
 automatic spacing in math mode is around math operators; spaces
 typed by the user are ignored, so horizontal space has to be
 requested explicitly, for example with \verb+\hspace+. Also, a new
 paragraph cannot be started in math display.
 The diagrams in math display are printed at the place in
 the document where they are coded in the input file.
 Thus the display can still cause problems with page breaking,
 especially since the diagrams usually take up more vertical
 space than a math equation for which the environment is
 designed.
 
 The \LaTeX\  figure environment was specially designed for larger
 displays, such as the structure diagrams (Lamport~86, pp.~59, 60,
 176, 177). A figure will usually not appear at the place in the
 document where the user has coded it; instead \LaTeX\  finds space
 for it on the current page or the next one in such a way that
 an overfull page is never produced. The user has some control
 over the placement of figures with optional parameters such as
 top or bottom of a page, but \LaTeX\  still makes the final
 decision, often producing surprising results.
 Since a figure is moved to a convenient place by
 \LaTeX, the figure is called a ``float.''  In addition to the
 automatic space-finding, the figure environment has the
 advantage that it makes captions possible. Furthermore,
 several displayed objects, each with a caption, can be included
 in one figure environment; Figures~\ref{fg:lactics} and~\ref{fg:lacticr}
 present an
 example, produced by the \LaTeX\  code in Figure~\ref{fg:lacticc}.
 The code shows that each picture with its caption is enclosed in
 a \verb+\parbox+.
 
 \begin{figure}
 \parbox{.4\textwidth}{\centering
 \begin{picture}(\pw,\pht)(-\xi,-\yi)
   \put(90,0)    {\circle{180}}
   \put(90,90)   {\line(0,1)   {70}}         % behind and up
   \put(60,170)  {$COOH$}
   \thicklines
   \put(30,10)   {\line(-5,2)  {140}}        % in front
   \put(-415,30) {\makebox(300,87)[r]{$HO$}} %  and left
   \put(150,10)  {\line(5,2)   {140}}        % in front
   \put(300,30)  {$H$}                       %  and right
   \thinlines
   \put(90,-90)  {\line(0,-1) {90}}          % behind and
   \put(60,-260) {$CH_{3}$}                  %  down
 \end{picture}
 \caption{${\rm (S)-}$lactic acid}\label{fg:lactics}}
 \hfill
 \parbox{.4\textwidth}{\centering
 \begin{picture}(\pw,\pht)(-\xi,-\yi)
   \put(90,0)    {\circle{180}}
   \put(90,90)   {\line(0,1)   {70}}         % behind and up
   \put(60,170)  {$COOH$}
   \thicklines
   \put(30,10)   {\line(-5,2)  {140}}        % in front
   \put(-415,30) {\makebox(300,87)[r]{$H$}}  %  and left
   \put(150,10)  {\line(5,2)   {140}}        % in front
   \put(300,30)  {$OH$}                      %  and right
   \thinlines
   \put(90,-90)  {\line(0,-1) {90}}          % behind and
   \put(60,-260) {$CH_{3}$}                  %  down
 \end{picture}
 \caption{${\rm (R)-}$lactic acid}\label{fg:lacticr}}
 \end{figure}
 
 \begin{figure}\centering
 \begin{verbatim}
    \begin{figure}
      \parbox{.4\textwidth}{\centering
        \ccirc{$COOH$}{$HO$}{$H$}{$CH_{3}$}
        %ccirc is a macro
      \caption{${\rm (S)-}$lactic acid}}
      \hfill
      \parbox{.4\textwidth}{\centering
        \ccirc{$COOH$}{$H$}{$OH$}{$CH_{3}$}
      \caption{${\rm (R)-}$lactic acid}}
    \end{figure}
 \end{verbatim}
 \caption{\LaTeX\  code for two captions in one figure}
 \label{fg:lacticc}
 \end{figure}
 
 It should also be mentioned that all the display mechanisms
 discussed above can be used in a two-column document style,
 which is the format of most scientific chemistry journals.
 
 If the same structure is to be printed many times in a
 document, processing time can be saved by storing it in a
 \LaTeX\  \verb+\savebox+. The structure-drawing code then has to
 be processed once only. Applied to the chemistry macros,
 a statement
 $$\hbox{\verb+\savebox1{\macroname{arg+$_1$\verb+}{arg+$_2$\verb+}}+}$$
 will process the code and save the typeset structure.
 A \verb+\usebox1+ statement, enclosed in a math display or
 figure environment, is then placed into the input file
 wherever the structure is to be printed. A \verb+\savebox+
 can also be given a symbolic name (Lamport~86, p.~101).
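
 A minimal sketch of the symbolic-name variant follows. The name
 \verb+\ringbox+ is hypothetical, and a framed piece of text stands in
 for a structure macro so that the example can be run on its own.
 \begin{verbatim}
 \documentclass{article}
 \newsavebox{\ringbox}          % declare a named box (hypothetical name)
 \begin{document}
 % Typeset the contents once and store the result.
 \savebox{\ringbox}{\fbox{stand-in for a structure}}
 \[ \usebox{\ringbox} \]        % first use
 Some text between the two displays.
 \[ \usebox{\ringbox} \]        % second use, no reprocessing
 \end{document}
 \end{verbatim}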