%			 SGML-TeX conference
%			     Aug 31, 1990
%			Groningen, Netherlands
%			----------------------

\centerline{\bf Buses and weirdness in Groningen}
\smallskip
\noindent
For their second full-day international meeting, the Nederlandstalige \TeX\
Gebruikersgroep (NTG) organised, in conjunction with the Dutch \SGML\
Users Group, a conference intended to focus interest on the use of \TeX\
and \SGML\ together.  On August 31st,
 approximately 100 delegates from both `camps'
attended the day's events in Groningen, with what seemed a good
balance between \SGML-ers and \TeX ies.
 
The only drawback was that most of the day was organised in two
parallel streams of presentations and these tended to be a `\TeX\
stream' and an `\SGML\ stream'; consequently, delegates often chose to
hear talks on the package with which they identified more closely  --
at least, this delegate did.  So, one came away with a still hazy
appreciation of the alternative approach to mark-up.  I hope that this
situation will improve in future and that staunch supporters of \SGML\
and of \TeX\ will each come to understand why their opposite numbers
are so excited by {\it their\/} favourite method of tackling the
mark-up problem.  With that said, you may forgive me if the following
account concentrates on \TeX-related issues.
Malcolm Clark, who was also at the conference, has added some accounts (in 
square brackets) where he went to a different presentation.
 
[There were some demonstrations from vendors, chiefly {\sl The 
Publisher\/} from both \TeX cel and MID Information, both running on Sun 
workstations. The local bookshop also had an impressive display of books 
for sale at the conference.]

The day started with an introduction by Kees van der~Laan, who, as
President of NTG, had been instrumental in organising the meeting,
with the assistance of members of the NTG and the \SGML-Holland users 
group.  In the opening talk, prior to
the parting of the ways into the parallel sessions, Prof.~Gerard
Kempen, of the University of Nijmegen, discussed {\it Language
technology and the future of text automation}.  This was an account of
some of the work in the Language Technology Project at Nijmegen, and
focussed on methods for enhancing the transportability of text between
systems including \SGML, \TeX\ and ODA\slash ODIF by separating content and
form, logical structure and layout.  The approach used employs
object-oriented methods for parsing the syntax of sentences.
Prof.~Kempen described the use of `Corrie', a module for checking and
correcting Dutch spelling, which is based on the syntactic and
morphological parsing of text.  Other modules used in the project
allow the inflection of known words and attempt to detect typos in
unknown words, by analysing spelling and missing letters.  Sentences
are also scanned for idioms.  In this way, parse trees are built which
describe the hierarchical structure of sentences, storing not the
words themselves but pointers to a standard lexicon.  In the current
implementation, it is possible to handle the re-arrangement of
subtrees graphically by means of a {\sc wimp} interface.  It is hoped to
develop a `grammar spreadsheet' to allow the automatic propagation of
word changes; for example, pluralising one word will result in
appropriate changes in other parts of the sentence.  The advantages of
the approach taken in the project are claimed to be easier editing,
enforcement of spelling standards, and applications including
semi-automatic translation into other languages as well as speech,
information retrieval, and hypertext.
 
As the first talk in the \TeX\ stream, Joop van~Gent, of the Institute
of Language Technology and AI, Tilburg, spoke on {\it Two Faces of
Text}, describing document retrieval and \TeX\null.  He described an
approach to the structure of published documents which considered two
types of structure: logical structure and typography.  While automatic
methods can be used for scanning the logical structure of a document,
human readers tend to use typographic cues in the typefaces and layout
of a document to locate information.  It is difficult to imitate this
search strategy in computer systems since it requires pattern
recognition on the typeset page image.  Only the typesetting software
can `know' the relationship between logical parts of a document and
the typeset output.  Work is \hbox{underway} to link these by storing the
positions of logical page-elements.  At Tilburg, this has been done by
adapting \TeX\ to produce a logical structure tree and predicates in a
format which can later be searched to locate parts of a document on
the typeset page.  The documents must be parsed while processing a
search query, but the query language, EL/DR, can be interpreted or
compiled by Prolog, allowing the use of natural language queries.
Some knowledge of the document structure, as nested lists or tree
structures with well-defined properties, is used to make basic
operations trivial.
 
The next speaker was Malcolm Clark, who addressed {\it Exchanging
documents: a \TeX nical perspective}, briefly reviewing document
`views' -- typographic versus logical structure; content, which may not
be orthogonal to the structure; and revisable versus non-revisable
documents -- and the requirements for document exchange, either within
or between groups, perhaps over heterogeneous networks.  In the \TeX\
community, the latter need is met to a greater or lesser extent by the
use of electronic mail as a transmission method, and Malcolm discussed
some of the problems which can arise.  With the new version of \TeX,
providing support for 8-bit character sets, source files for documents
must be somehow encoded into 7-bit data for transmission using {\it email{}},
and Malcolm advocated the provision of a 7-to-8-bit filter, written in
\WEB, as a standard tool in the \TeX\ toolbox.  This would enable the
exchange of \TeX\ source files, representing revisable documents, and
\DVI\ files, when compressed and encoded to \ASCII, as non-revisable
documents.
 
Since Malcolm had not used all of his allotted time, he was persuaded
by Kees van der~Laan briefly to discuss the use of graphics in \TeX,
to fill the remaining minutes before lunch.
 
Frank Mittelbach had been scheduled to conduct a `work lunch' session
on the new features in \LaTeX\ 3.0, but his talk was instead
re-arranged into a third mini-stream at the start of the afternoon
session.  I chose to attend that, since it is more relevant to my
work, thereby regretfully missing Amy Hendrickson's talk on \TeX\
macro techniques, which I look forward to hearing at the Cork meeting.
With Frank's talk over-running, I missed most of Victor Eijkhout's
discussion of {\it The document style designer as separate entity},
but heard Johannes Braams' description of {\it The Dutch national
\LaTeXsl\ effort}.  This included a discussion of replacements for the
{\tt article} and {\tt report} document styles to produce documents
with a more European appearance than the \LaTeX\ book.  A new
contribution from the Netherlands in this area is `Babel', a
multilingual document style for the standard \LaTeX\ styles.  This
replaces the language-specific headings, hyphenation patterns and
explicit hyphenation points with support for more than one language in
a document and allows easy switching between them.  Usable with
plain \TeX\ or \LaTeX, each language can have a separate option file.  When
used with \LaTeX, the language can be selected on the 
|\documentstyle| command.

[Sake Hogeveen, {\it Aspects of Scientific Publishing\/}, is an astronomer 
who also works with some publishing houses. He claimed to be presenting 
an introductory course in \TeX\ in five minutes and a \LaTeX\ 
introduction in six! This seemed a bold claim which I could not resist. 
Basically he tried to argue through the markup route from `genuine' copy 
editor markup through to \TeX\ typographic markup and then to structural 
markup. No great conceptual leaps there, but since it was addressed to an 
audience made up mainly of \SGML\ people -- that is, suits and ties -- it 
had a useful function. One of the things to be learned from this meeting 
was the \TeX\ world's almost complete ignorance of \SGML, and the 
\SGML\ world's equal ignorance of \TeX\ and \LaTeX. Neither 
position is helped by our inability to speak to one another at the same 
level. Sad. Sake contended that good typography supports structure. Few 
could argue with that. He also seemed to be advocating that we should be 
prepared to get our fingers dirty by changing style files. This seems 
like a bad idea to me. \LaTeX\ style files are tortuous to `amend': it 
often seems to me that this is a good thing. The average document 
producer is not a document designer, and his or her results can be 
appalling.]

[Victor Eijkhout \& Andries Lenstra described a system, written in \TeX, 
which allowed a `document style designer' to develop styles, without 
having to know \TeX. They use a fairly free syntax to define constructs,
placing the designer in a position between format writer and document 
producer. It was not at all clear to me whether this existed as a real 
product, or whether it was a set of stubs, full of good intentions. But 
even as a prototype it introduced many ideas which could be fruitful.]

   
A fascinating trip down memory lane was then given by Barbara Beeton,
who described ten years of {\it TUGboat production: \TeX, \LaTeXsl\ and
paste-up}, with a look at the pitfalls and rewards of editing TUG's
journal, interspersed with sideways glances at fragments of \TeX\
hagiography.  A rich goldmine for devising a \TeX\ trivia quiz!
 
The streams were finally drawn together for a talk by Manfred Kr\"uger
of MID, who discussed {\it \SGML\ and \TeX: Two core modules of
information processing}.  He highlighted the misunderstandings that
\SGML-ers and \TeX ies have of each other's worlds, claiming that the
question, `Which is better?', is not a useful comparison, since \SGML\
`does nothing', yet \TeX\ is `a solution for everything.'
Publishing \hbox{requires} the use of \SGML\ {\it and\/} \TeX\null.  A slightly
heated discussion between speaker and audience resulted when it was
suggested that the coupling of \SGML\ marked-up documents with the new
\LaTeX\ could be achieved by making \LaTeX\ macros read the \SGML\
reference syntax, since this could allow an interface for the
specification of processing instructions in a structured and standard
way (cf.~the results of the {\sc daphne} project).
 
At the conclusion of this talk, the faint-hearted departed for the bar
and the cocktail party.  However, those with a thirst for knowledge
who stayed were rewarded by a discussion of {\it Presentation rules
and rules for composition in the formatting of complex text} by
Richard Southall, now of Xerox EuroPARC\null.  Richard began by
demonstrating that both \SGML\ and \LaTeX\ have missed aspects of the
highest quality typesetting which can be achieved manually.  While
generalised markup can define `content objects', that the presentation
rules define the layout is only {\it sometimes\/} true.  He then took
examples from entries in {\it Math Reviews\/}, published by the AMS,
having been recently involved in a complete redesign of the layout and
typography.  The markup in the MR entries is almost exhaustive, yet
the entries before the redesign project show `weirdnesses' in the
typography since presentation rules are being applied to the whole
entry, both in the body of the review proper and in the bibliographic
details.  Former colleagues from Reading University made a
structural analysis of the whole journal, while Richard concentrated on the
structure of individual entries.  He found examples of poor spacing
in authors' and journal names in the bibliographic references, leaving
large spaces (`buses'\dots\ large enough to take a London omnibus)
which produce barely perceptible interruptions when a reader scans the
line.  Since these entries frequently contain abbreviations and
punctuation, he offered quotations from typographers' handbooks by
Tschichold, De~Vinne and others illustrating the presentation rules for
punctuated text.  These seemed to indicate that \TeX's so-called
`french spacing' should be applied in such circumstances, since
traditional spacing between sentences is that of normal word spacing.
Lastly, the concept of `compositional environments', in which a set of
presentation rules apply, was discussed.  In electronic formatting
systems, these seem to have first been supported in {\sl Scribe}, which
provided delimited areas for such rules.  In \TeX, there are two clear
compositional environments (modes): inline and display math.  In other
environments, implied or procedural rules are used, such as the
spacing in quoted quotes, for example.  The problems in the first
\TeX\ version of {\it Math Reviews\/} were mainly due to the
application of the same presentation rules over all the content
objects in each entry.  Once appropriate compositional environments
were introduced, it was interesting to see from the finished examples
that the entries were more readable, yet occupied less space.
Following a short question and answer session, and with our thirst for
knowledge, if not beer, now slaked, we retired to the bar to join the
end-of-conference festivities.
\author{David Osborne}
\endinput