21-Oct-88 05:55:25-PDT,41684;000000000000
Return-Path:
From: Peter King
Date: Fri, 21 Oct 88 13:18:50 BST
Subject: refer -> BiBTeX conversion

Following feedback from a number of users, in particular Jonathan Bowen,
I have modified and updated my 'ref2bib' script for conversion of refer
databases to BiBTeX format. The files can now be extracted under a name
of the user's choosing, to avoid clashes with Jonathan Bowen's 'ref2bib',
although 'ref2bib' remains the default name. The heuristics for assigning
the type of reference have been augmented, and a number of options are now
selectable at run time, including the number of authors, and the number of
letters taken from each author's name, to be used in generating keys.

Since the script is so long, it will probably be stored in the appropriate
archives, but I am happy to mail it to people who do not have easy access
to them.

Peter King, Computer Science Department   JANET: pjbk@uk.ac.hw.cs
Heriot-Watt University                    ARPA:  pjbk@cs.hw.ac.uk
79 Grassmarket, Edinburgh EH1 2HJ         or     pjbk%cs.hw.ac.uk@ucl-cs
Phone: (+44) 31 225 6465 Ext. 555         UUCP:  ..!ukc!cs.hw.ac.uk!pjbk

%%%
%%% This version also includes a bug fix that was submitted by Peter
%%% on 10/25/88.
Malcolm Brown, TeXhax moderator %%% -----cut here----------- export PATH || exec /bin/sh $0 $* : "This is a shar archive; use /bin/sh to extract" : "Extracted files will be owned by you, and will have" : "default permissions" : "to have a name other than the default 'ref2bib' give" : "the new name as a second argument to sh, after this shar file name" PRGRM=ref2bib if [ -n "$1" ] then PRGRM=$1 fi PATH=/bin:/usr/bin echo If this archive is complete, \"End of archive\" will appear at the end echo Extracting $PRGRM sed 's/^X//' <<'End-of-file' >$PRGRM X#!/bin/sh X# X# shell script to convert refer (or bib) databases to BiBTeX format X# X# Most of the shell script is based on that by Jonathan Bowen X# The awk and sed scripts are the work of Peter King X# with some ideas stolen from Jonathan Bowen X# X# XSEDFILE=${TMP-/tmp}/ref2b$$.sed XAWKFILE=${TMP-/tmp}/ref2b$$.awk XKEYFILE=${TMP-/tmp}/ref2b$$.key XNKEYFILE=${TMP-/tmp}/ref2b$$.nkey XAKEYFILE= XCAPSFILE= XPROGNAME=`basename $0` XNAUTHOR=3 XLAUTHOR=3 XERRFILE=${PROGNAME}.errs XDEFAULTWIDTH=72 XWIDTH=$DEFAULTWIDTH XNAMEDFILES=false XBIB=bib XDEBUG=false X XGEN=`date`" on "`hostname` XNAME=$BIB XSTDIN='' XNEWFILE="" X Xwhile expr X$1 : X'-' > /dev/null Xdo X case "$1" in X -|-0|-w) X WIDTH=-1 X ;; X -[1-9]|-[1-9][0-9]|-[1-9][0-9][0-9]) X WIDTH=`expr X"$1" : X'-\(.*\)'` X ;; X -c) : capitals file X CAPSFILE="$2" X shift X ;; X -d) : Enable debugging X DEBUG=true X ;; X -e) : error file X ERRFILE="$2" X shift X ;; X -kf) : key used file X AKEYFILE=$2 X shift X ;; X -k) : no of authors in key X NAUTHOR=$2 X shift X ;; X -l) : no of authors name letters to use X LAUTHOR=$2 X shift X ;; X -n) : Named output files X NAMEDFILES=true X ;; X -u|-U) X echo "Usage: $PROGNAME [ options ] [ file ... ] XConverts Unix \"refer\" format to \"BibTeX\" database format. 
X-c file Use file as source of protected words in titles
X-d enable debugging (default=$DEBUG)
X-e file error output in "file" (default=$ERRFILE)
X-kf file Use file to initialise keys used
X-k N Use N authors' names for key (default=$NAUTHOR)
X-l N Use N characters of author's name for key (default=$LAUTHOR)
X-n output to named files (ext \".$BIB\") (default=$NAMEDFILES)
X-w no maximum width
X-u display usage information
X-N maximum width of N characters (1-999) (default=$DEFAULTWIDTH)"
X exit 0
X ;;
X -*)
X echo "Usage: $PROGNAME [ -[width] ] [ file ... ]"
X exit 0
X ;;
X esac
X shift
Xdone
X
Xtrap "rm -f $SEDFILE $AWKFILE $KEYFILE $NKEYFILE ;exit" 0 1 2 3 15
X
Xcat /dev/null > $ERRFILE
Xcat /dev/null > $KEYFILE
X[ $AKEYFILE ] && [ -r $AKEYFILE ] && cat $AKEYFILE > $KEYFILE
X
X# although # introduces a comment to sed, this is undocumented,
X# and we might as well strip it first if the file is to be used many times
X# but this should work with cat replacing the sed -e here
X
Xsed -e '/^#/d' << 'ZZ' >$SEDFILE
X#
X# sed script to do some of the ref to bib database conversion
X#
X# written by Peter King, Heriot-Watt University
X# You may do anything you like with this code
X# EXCEPT claim that you wrote it
X#
X# remove as many redundant characters as possible
Xs/ / /g
Xs/ */ /g
X# convert dashes
Xs/ - / --- /g
Xs/ $//
X#
X# First alter the TeX special characters
X# but not the first %
Xs/\(.\)%/\1{\\%}/g
X#
Xs/[&#_{}]/{\\&}/g
X# you may want to leave dollars (especially if you have eqn in your refs)
Xs/\$/\\$/g
X# convert the special characters and accents from troff to BibTeX
X# assumes the accents are those of the Berkeley -ms with .AM
X#
X# convert any font rubbish (assumes that \fR is the default )
X/\\\\*f./{
X s/\\\\*f[I2]/{\\em /g
X s/\\\\*f[B3]/{\\bf /g
X s/\\\\*f[1PR]/}/g
X}
X/\\\\*[-*0('`^:~_/o,v".8?!QU ]/{
X s/\(.\)\\\\*\*'/{\\'\1}/g
X s/\(.\)\\\\*\*`/{\\`\1}/g
X s/\(.\)\\\\*\*^/{\\^\1}/g
X s/\(.\)\\\\*\*:/{\\"\1}/g
X s/\(.\)\\\\*\*~/{\\~\1}/g
X s/\(.\)\\\\*\*_/{\\=\1}/g X
s/\([oO]\)\\\\*\*\//{\\\1}/g X s/\([aA]\)\\\\*\*o/{\\\1\1}/g X s/\(.\)\\\\*\*,/{\\c{\1}}/g X s/\(.\)\\\\*\*v/{\\v{\1}}/g X s/\(.\)\\\\*\*"/{\\H{\1}}/g X s/\(.\)\\\\*\*\./{\\d{\1}}/g X s/\\\\*\*8/{\\ss}/g X s/\\\\*\*?/{?`}/g X s/\\\\*\*!/{!`}/g X s/\\\\*\*(P\([lL]\)/{\\\1}/g X s/\\\\*\*(ae\//{\\ae}/g X s/\\\\*\*(Ae\//{\\AE}/g X s/\\\\*\*(oe\//{\\oe}/g X s/\\\\*\*(Oe\//{\\OE}/g X# quotes X s/\\\\*\*Q/``/g X s/\\\\*\*U/''/g X s/\\\\*\*-/---/g X# convert \[space] to \0 for convenience X s/\\\\* /\\0/g X# \0 as space between surname de\0Souza etc. X s/\\\\*0\([a-z]*\)\\\\*0/\\0\1 /g X s/ \([a-z]*\)\\\\*0/ \1 /g X# but trap the ones that start with a capital letter and convert them to X# ties X s/\\\\*0/~/g X# X# now deal with special characters and Greek X s/\\\\*(hy/-/g X s/\\\\*(em/---/g X s/\\\\*(co/{\\copyright}/g X s/\\\\*(sc/{\\S}/g X s/\\\\*(if/$\\infty$/g X s/\\\\*(\*a/$\\alpha$/g X s/\\\\*(\*b/$\\beta$/g X s/\\\\*(\*g/$\\gamma$/g X s/\\\\*(\*d/$\\delta$/g X s/\\\\*(\*e/$\\epsilon$/g X s/\\\\*(\*z/$\\zeta$/g X s/\\\\*(\*y/$\\eta$/g X s/\\\\*(\*h/$\\theta$/g X s/\\\\*(\*i/$\\iota$/g X s/\\\\*(\*k/$\\kappa$/g X s/\\\\*(\*l/$\\lambda$/g X s/\\\\*(\*m/$\\mu$/g X s/\\\\*(\*n/$\\nu$/g X s/\\\\*(\*c/$\\xi$/g X s/\\\\*(\*o/$o$/g X s/\\\\*(\*p/$\\pi$/g X s/\\\\*(\*r/$\\rho$/g X s/\\\\*(\*s/$\\sigma$/g X s/\\\\*(\*t/$\\tau$/g X s/\\\\*(\*u/$\\upsilon$/g X s/\\\\*(\*f/$\\phi$/g X s/\\\\*(\*x/$\\chi$/g X s/\\\\*(\*q/$\\psi$/g X s/\\\\*(\*w/$\\omega$/g X s/\\\\*(\*A/A/g X s/\\\\*(\*B/B/g X s/\\\\*(\*G/$\\Gamma$/g X s/\\\\*(\*D/$\\Delta$/g X s/\\\\*(\*E/E/g X s/\\\\*(\*Z/Z/g X s/\\\\*(\*Y/H/g X s/\\\\*(\*H/$\\Theta$/g X s/\\\\*(\*I/I/g X s/\\\\*(\*K/K/g X s/\\\\*(\*L/$\\Lambda$/g X s/\\\\*(\*M/M/g X s/\\\\*(\*N/N/g X s/\\\\*(\*C/$\\Xi$/g X s/\\\\*(\*O/$O$/g X s/\\\\*(\*P/$\\Pi$/g X s/\\\\*(\*R/P/g X s/\\\\*(\*S/$\\Sigma$/g X s/\\\\*(\*T/T/g X s/\\\\*(\*U/$\\Upsilon$/g X s/\\\\*(\*F/$\\Phi$/g X s/\\\\*(\*X/X/g X s/\\\\*(\*Q/$\\Psi$/g X s/\\\\*(\*W/$\\Omega$/g X} X# Now trap 
title words that must be capitalised X/^%[^T]/b X# X# first all words that are all capitals (at least two consecutive) X# we need the slashes to allow for M/M/1 queues Xs;[A-Z][A-Z/][A-Z/0-9]*;{&};g X# and single letters (except A!) Xs/\([{ ]\)\([B-Z]\)\([ :\.,}]\)/\1{\2}\3/g Xs/ \([B-Z]\)$/ {\1}/ XZZ X X# Now append to this any proper name that might appear in titles X# Use sed to generate a sed script ! X# the input is a list of words which must remain capitalised X# one to a line X# a trailing * will be converted into a pattern at the end to get Markov, X# Markovian, etc. X Xcat - $CAPSFILE << ZZ | sed -e '/^#/d;s/\*$/[^ -,;]*/;s=^..*$=s/&/{\&}/g=' >> $SEDFILE X# File of Title information that must maintain its capitalised state X# X# some proper names X# X# first some mathematicians XAbel XBernoulli XBessel XBeta XBorel XCauchy XChurch XRosser XDedekind XDescartes XDirichlet XEuclid* XEuler XFibonacci XFermat XFourier XFresnel XFrobenius XPerron XGamma XGauss* XHadamard XHilbert XHorner XHolder XJacobi* XJensen XMarkov* XArnoldi XLaplace XLaguerre XLagrang* XLegendre XLeibnitz XLipschitz X# this is really Poincare (acute accent), but the accent processing will disrupt it XPoincar XHermite XRayleigh XRitz XRiemann X# this is really Rouche (acute accent), but the accent processing will disrupt it XRouch XStieltjes XStiener XSchwarz XWeibull XWald XKronecker XKarmarkar XKendall XDiophantine XDelbrouck XBayes* XSchafer XDempster XRunge XKutta XPollaczek XKhinchin XPalm XErlang XEngset XLittle's XKosten XGittins XFeller XCox* XPoisson XChapman XKolmogorov XSmirnov XWeiner XHopf XStirling X X# computing XBuzen XGordon XNewell XLemoine XPierce XJackson XNewhall XTuring XNorton XPetri XWilkinson XSkinner XHarrison XCambridge XEthernet XAloha X X# coding theory XHamming XHuffman XReed XShannon XSolomon XViterbi XZZ X X# strip out the comments from the AWKFILE, just to cut down on the character count X# that awk needs to read before starting. 
For a single file conversion X# this will probably make no difference X# but this should work with cat replacing the sed -e here X Xsed -e '/^[ ]*#/d' -e '/#/s/[ ]*#.*$//' << 'ZZ' > $AWKFILE X# X# awk script to convert refer (or bib) format databases X# to BiBTeX format. X# X# written by Peter King, Heriot-Watt University X# use freely, but don't claim that you wrote it X# X# Generates keys using authors names and year X# X# You may wish to alter treatment of key fields that are ignored X# such as %U %W %Y etc. X# X# NB because awk recognises end of line as a statement terminator X# you cannot reliably split the long lines in the script X# X# regular expressions should be sorted according to frequency X# so that minimal tests are made X# From tests in a local data base the order given appears quite good X# 2883 %A X# 1813 blank lines X# 1774 %T X# 1764 %D X# 1505 %P X# 1347 %J X# 1331 %V X# 1201 %N X# 773 .. continuation lines X# 501 %C X# 424 %I X# 192 %B X# 187 %E X# 92 %S X# 89 %R X# 33 %X X# 30 %K X# 16 %O X# XBEGIN { X # suffices for the key generation process X for(i=1;i<=26;i++) X addkey[i] = substr("abcdefghijklmnopqrstuvwxyz",i,1) X # standard BiBTeX types so that you can change the case to BOOK X # or book, as required. Not all are used X article = "Article" # the @ will be added later X book = "Book" X booklet = "Booklet" X inbook = "InBook" X incollection = "InCollection" X inproceedings = "InProceedings" X manual = "Manual" X mastersthesis = "MastersThesis" X misc = "Misc" X phdthesis = "PhDThesis" X proceedings = "Proceedings" X techreport = "TechReport" X unpublished = "Unpublished" X X bibtype = misc XZZ X X# the following 'echo' commands communicate shell variables to X# the awk script. 
X
Xecho >> $AWKFILE lkey = $LAUTHOR # number of characters used from authors to make key
Xecho >> $AWKFILE maxauthor = $NAUTHOR # maximum number of authors to use in
X # constructing key
Xecho >> $AWKFILE errfil = \"$ERRFILE\"
Xecho >> $AWKFILE keyfil = \"$KEYFILE\"
Xecho >> $AWKFILE nkeyfil = \"$NKEYFILE\"
Xecho >> $AWKFILE lwidth = $WIDTH
Xecho >> $AWKFILE print \"@Comment{ Database converted by $PROGNAME\\n\\t$GEN}\\n\"
X
Xsed -e '/^[ ]*#/d' -e '/#/s/[ ]*#.*$//' << 'ZZ' >> $AWKFILE
X rx = 1
X percent = 0 # not in a reference
X }
X
X/[^{$]\\/ || /^\\/ { # we've protected all the \ we introduced in {}
X err = 1
X print "Non translated \\ symbol : Reference " rx >> errfil
X print $0 >> errfil
X }
X
X/^%/ { # any % entry
X percent = 1 # in a reference
X entry = substr($0,4)
X pctchar = substr($0,2,1)
X }
X
X/^%[AQ]/ { # authors (multiple occurrences possible)
X A ++
X if ( $1 == "%A" ) authors[A] = entry
X else authors[A] = "{"entry"}" # corporate authors need protection!
X if (A> maxauthor) next
X # generate the key string
X ic = 0
X lc = 1
X if ( $1 == "%A" ) keyfield = $NF
X else keyfield = $2 # corporate authors use first word
X while(ic < lkey && lc <= length(keyfield) ){
X kc = substr( keyfield, lc, 1)
X if ( kc ~ /[a-zA-Z]/ ){
X keys = keys kc
X ic++
X if (ic==lkey) next
X }
X else if ( kc == "\\" ) lc ++
X lc ++
X }
X next
X }
X
X/^$/ { # blank line
X if (percent) {
X # end of a reference
X refs ++
X if (T==0) print "No title : Reference "refs" "keys >> errfil
X if (A==0) print "No author : Reference "refs" "keys >> errfil
X if (D==0) print "No date : Reference "refs" "keys >> errfil
X if ((!T)||(!A)||(!D))err=1
X
X
X # date processing
X nf = split(date,z)
X year = date
X kyear = year
X if(nf>1){
X xmonth = z[1]
X mm = 1
X i = 2
X while ( z[i] !~ /^[0-9][0-9][0-9][0-9]/ ) {
X xmonth = xmonth " " z[i]
X i++
X mm = 0
X }
X month = "{ " xmonth " }"
X
X year = z[i]
X kyear = year # in case there is any extraneous info
X i++
X while ( i <= nf ) {
X year = year " " z[i]
X i++
X }
X if((mm) && ( xmonth ~ /^[A-Za-z.]*$/ )){
X #if the month is only letters and .
X if(xmonth ~ /^Ja/) month = "jan"
X if(xmonth ~ /^Fe/) month = "feb"
X if(xmonth ~ /^Mar/) month = "mar"
X if(xmonth ~ /^Ap/) month = "apr"
X if(xmonth ~ /^May/) month = "may"
X if(xmonth ~ /^Jun/) month = "jun"
X if(xmonth ~ /^Jul/) month = "jul"
X if(xmonth ~ /^Aug/) month = "aug"
X if(xmonth ~ /^Se/) month = "sep"
X if(xmonth ~ /^O/) month = "oct"
X if(xmonth ~ /^N/) month = "nov"
X if(xmonth ~ /^D/) month = "dec"
X }
X }
X if ( year !~ /^[0-9][0-9]*$/ ) year = "{ " year " }"
X
X # sort out the editors
X if (E) {
X tew = split(alleditors,z)
X E = 1
X editor[E] = z[1]
X sc = " "
X i = 2
X while( i <= tew ){
X if ( z[i] == "and" ) {
X E++
X editor[E] = ""
X sc = ""
X }
X else {
X lastc = substr(z[i],length(z[i]),1)
X if(lastc == "," && z[i+1] !~ /^Jn*r\./ ) {
X editor[E] = editor[E] sc substr(z[i], 1, length(z[i])-1)
X if ( z[i+1] != "and" ) {
X E++
X editor[E] = ""
X sc = ""
X }
X }
X else {
X editor[E] = editor[E] sc z[i]
X sc = " "
X }
X }
X i++
X }
X }
X
X # classify the reference
X
X if (J) {
X #journal or conference
X
X if (C||I||((!V)&&(!N))) {
X # its a conference if there is a city, publisher
X # or no volume or issue number
X conf++
X bibtype = inproceedings
X jnl = "Booktitle"
X jtype = "Conference proceedings"
X }
X else {
X jour ++
X bibtype = article
X jnl = "Journal"
X jtype = "Journal"
X }
X
X if ( B||E||R||(!P)) {
X err=1
X if (B||E||R)
X print "Journal & book?: Reference "refs" "keys >> errfil
X if (!P) print "No page nos.? : Reference "refs" "keys >> errfil
X }
X if (err){
X print jtype " reference in error" >> errfil
X }
X }
X else
X if (B) {
X # article in book
X bibtype = incollection
X
X if (N||R||(!E)||(!I)||(!C)||(!P)||(V&&(!S))){
X err=1
X if (!E) print "No editor? Reference "refs" " keys >> errfil
X if (!I) print "No publisher? Reference "refs" " keys >> errfil
X if (!C) print "No city?
Reference "refs" " keys >> errfil X if (!P) print "No page nos.? Reference "refs" " keys >> errfil X if (V&&(!S)) print "Volume but no Series Reference "refs" " keys >> errfil X if (N) print "Issue no.? Reference "refs" " keys >> errfil X if (R) print "Report? Reference "refs" " keys >> errfil X X } X if (err) print "Article in book reference in error" >> errfil X } X else X if (R) { X #report X bibtype = techreport X X if (E||N){ X err=1 X if (N) print "Issue no.? Reference "refs" " keys >> errfil X if (E) print "Editor? Reference "refs" " keys >> errfil X } X if (err) print "Report reference in error" >> errfil X if ( report ~ /M.*[Tt]hesis/ ) bibtype = mastersthesis X if ( report ~ /M.*[Dd]issert/ ) bibtype = mastersthesis X if ( report ~ /D.*[Tt]hesis/ ) bibtype = phdthesis X if ( report ~ /D.*[Dd]issert/ ) bibtype = phdthesis X # Working papers and preprints are classified as techreports X # if ( report ~ /[Ww]orking/ ) bibtype = unpublished X # if ( report ~ /[Pp]reprint/ ) bibtype = unpublished X if ( report ~ /[Uu]npublish/ ) bibtype = unpublished X if ( report ~ /[Mm]anual/ ) bibtype = manual X X if (bibtype== unpublished ) { X if (other == "") other = report X else other = report " " other X R = 0 X report = "" X O = 1 X } X else X if (bibtype == manual ) { X TY = 1 X type = report X # the publisher is really the organisation X report = publisher X R = I X I = 0 X institute = "Organization" X } X else X if ( bibtype == techreport ) { X trw = split(report,z) X if ( trw == 1 ) { X N = 1 X number = z[1] X } X else { X if ( ( z[1] ~ /^Tech/ ) && ( z[2] ~ /^Rep/ ) ) X { X z[1] = "" X z[2] = "" X TY = 0 X number = z[3] X for(i=4;i<=trw;i++) X number = number " " z[i] X if (trw >=3) N = 1 X } X else { X type = z[1] X i = 2 X while( i <= trw && ( z[i] !~ /^[0-9A-Z-]*$/ ) ) { X type = type " " z[i] X i++ X } X if(i> errfil X if (N) print "Issue no.? Reference "refs" " keys >> errfil X if (E) print "Editor? 
Reference "refs" " keys >> errfil X if (V&&(!S)) print "Volume but no Series Reference "refs" " keys >> errfil X } X if (err) print "Book reference in error" >> errfil X X if (S && (( series ~ /[Tt]ech.*[Rr]eport/ ) || ( series ~ /[Tt]ech.*[Mm]ono/ ))){ X bibtype = techreport X type = series X TY = 0 X if ( series ~ /[Mm]ono/ ) TY = 1 X institute = "Institution" X R = I X I = 0 X report = publisher X publisher = "" X } X } X else { X bibtype = misc X err=1 X if( other ~ /[Uu]npublished/ ) bibtype = unpublished X print "Unclassified reference" >> errfil X } X X # generate key X if(keys == "") keys = "ANON" X keys = keys substr(kyear,3,2) X if(keyused[keys] >=1) { X key_suffix = keyused[keys]++ X keys = keys addkey[key_suffix] X } X else keyused[keys] = 1 X X if (err) { X print "Key: " keys >> errfil X if(A) for (i=1;i<=A;i++) X print "%A " authors[i] >> errfil X if(T) print "%T " title >> errfil X if(J) print "%J "journal >> errfil X if(B) print "%B "booktitle >> errfil X if(V) print "%V "volume >> errfil X if(N) print "%N "number >> errfil X if(I) print "%I "publisher >> errfil X if(C) print "%C "city >> errfil X if(E) for (i=1;i<=E;i++) X print "%E "editor[i] >> errfil X if(S) print "%S "series >> errfil X if(P) print "%P "pages >> errfil X if(R) print "%R "report >> errfil X if(D) print "%D "date >> errfil X if(O) print "%O "other >> errfil X print "" >> errfil X } X X if ( other ~ /[Uu]npublished/ ) bibtype = unpublished X if ( other ~ /[Ee]dition/ ) { X twc = split(other,z) X for(i=1;i<=twc;i++) X if(z[i] ~ /[Ee]dition/) { X edition = z[i-1] X z[i-1] = "" X z[i] = "" X } X other = z[1]; X for(i=2;i<=twc;i++) { X other = other " " z[i] X } X if(!(other ~ /[^ ]/ )) { X # no non space characters X other = "" X O = 0 X } X } X X if(O&&H) other = header " " other X else if(H) { X O = H X other = header X } X X if (J) { X X # substitute the journal abbreviations from the standard styles X X journal = "{ " journal " }" X # {acmcs} {"ACM Computing Surveys"} X if ( 
journal ~ /Comp.* Sur/ ) journal = "acmcs" X # {acta} {"Acta Informatica"} X if ( journal ~ /Acta Inf/ ) journal = "acta" X # {cacm} {"Communications of the ACM"} X if ( journal ~ /Com.* ACM/ ) journal = "cacm" X if ( journal ~ /CACM/ ) journal = "cacm" X # {ibmjrd} {"IBM Journal of Research and Development"} X if ( journal ~ /IBM J.*R.*D/ ) journal = "ibmjrd" X # {ibmsj} {"IBM Systems Journal"} X if ( journal ~ /IBM Sy.*J/ ) journal = "ibmsj" X # {ieeese} {"IEEE Transactions on Software Engineering"} X if ( journal ~ /IEEE Tran.*Soft.*Eng/ ) journal = "ieeese" X # {ieeetc} {"IEEE Transactions on Computers"} X if ( journal ~ /IEEE Tran.*Computers/ ) journal = "ieeetc" X # {ieeetcad} X if ( journal ~ /IEEE Tran.*Comp.*Desig/ ) journal = "ieeetcad" X # {ipl} {"Information Processing Letters"} X if ( journal ~ /Inf.*Proc.*Lett/ ) journal = "ipl" X # {jacm} {"Journal of the ACM"} X if ( journal ~ /Jou.* ACM/ ) journal = "jacm" X if ( journal ~ /JACM/ ) journal = "jacm" X # {jcss} {"Journal of Computer and System Sciences"} X if ( journal ~ /J.*Comp.*Sys.*Sc/ ) journal = "jcss" X # {scp} {"Science of Computer Programming"} X if ( journal ~ /Sc.*Comp.*Prog/ ) journal = "scp" X # {sicomp} {"SIAM Journal on Computing"} X if ( journal ~ /SIAM .*Comp/ ) journal = "sicomp" X # {tocs} {"ACM Transactions on Computer Systems"} X if ( journal ~ /ACM Tran.*Comp.*Sys/ ) journal = "tocs" X # {tods} {"ACM Transactions on Database Systems"} X if ( journal ~ /ACM Tran.*Data.*Sys/ ) journal = "tods" X # {tog} {"ACM Transactions on Graphics"} X if ( journal ~ /ACM Tran.*Grap/ ) journal = "tog" X # {toms} {"ACM Transactions on Mathematical Software"} X if ( journal ~ /ACM Tran.*Math.*Soft/ ) journal = "toms" X # {toois} {"ACM Transactions on Office Information Systems"} X if ( journal ~ /ACM Tran.*Off.*Inf.*Sys/ ) journal = "toois" X # {toplas} {"ACM Transactions on Programming Languages and Systems"} X if ( journal ~ /ACM Tran.*Prog.*Lan.*Sys/ ) journal = "toplas" X # {tcs} {"Theoretical 
Computer Science"} X if ( journal ~ /Th.*Comp.*Sci/ ) journal = "tcs" X X } X X if(lwidth>0) { # split lines of potential over length X # titles, book titles, notes, abstracts, addresses, X # journals ( which may be conference proceedings) and institutions X if(T){ X twc = split(title,z) X title = z[1]; X lt = length(z[1])+13+length("Title") X for(i=2;i<=twc;i++) { X if(lt + length(z[i]) >= lwidth) X {sc = "\n\t\t";lt = 15;} X else sc = " "; X title = title sc z[i] X lt += length(z[i]) + 1 X } X } X X if(J){ # it may be a conference X twc = split(journal,z) X journal = z[1]; X if(twc>1) { # if its 1 then we have an abbreviation X lt = length(z[1])+11+length(jnl) X for(i=2;i<=twc;i++) { X if(lt + length(z[i]) >= lwidth) X {sc = "\n\t\t";lt = 15;} X else sc = " "; X journal = journal sc z[i] X lt += length(z[i]) + 1 X } X } X } X X if(B){ X twc = split(booktitle,z) X booktitle = z[1]; X lt = length(z[1])+13+length("Booktitle") X for(i=2;i<=twc;i++) { X if(lt + length(z[i]) >= lwidth) X {sc = "\n\t\t";lt = 15;} X else sc = " "; X booktitle = booktitle sc z[i] X lt += length(z[i]) + 1 X } X } X X if(C){ X twc = split(city,z) X city = z[1]; X lt = length(z[1])+13+length("Address") X for(i=2;i<=twc;i++) { X if(lt + length(z[i]) >= lwidth) X {sc = "\n\t\t";lt = 15;} X else sc = " "; X city = city sc z[i] X lt += length(z[i]) + 1 X } X } X X if(O){ X twc = split(other,z) X other = z[1]; X lt = length(z[1])+13+length("Note") X for(i=2;i<=twc;i++) { X if(lt + length(z[i]) >= lwidth) X {sc = "\n\t\t";lt = 15;} X else sc = " "; X other = other sc z[i] X lt += length(z[i]) + 1 X } X } X X if(R){ X twc = split(report,z) X report = z[1]; X lt = length(z[1])+13+length( institute ) X for(i=2;i<=twc;i++) { X if(lt + length(z[i]) >= lwidth) X {sc = "\n\t\t";lt = 15;} X else sc = " "; X report = report sc z[i] X lt += length(z[i]) + 1 X } X } X X if(X){ X twc = split(abstr,z) X abstr = z[1]; X lt = length(z[1])+13+length("Annote") X for(i=2;i<=twc;i++) { X if(lt + length(z[i]) >= 
lwidth) X {sc = "\n\t\t";lt = 15;} X else sc = " "; X abstr = abstr sc z[i] X lt += length(z[i]) + 1 X } X } X } X X # fiddle fields that might contain only digits X if ( volume !~ /^ *[0-9][0-9]* *$/ ) volume = "{ " volume " }" X if ( number !~ /^ *[0-9][0-9]* *$/ ) number = "{ " number " }" X if ( pages !~ /^ *[0-9][0-9]* *$/ ) pages = "{ " pages " }" X # print the reference X X bibs[bibtype] ++; X printf "@%s{\t%s",bibtype,keys X if(A) { X printf ",\n\tAuthor = { %s",authors[1] X for(i=2;i<=A;i++) printf " and\n\t\t%s",authors[i] X printf " }" X } X if(TY) printf ",\n\tType = { %s }", type X if(T) printf ",\n\tTitle = { %s }", title X if(B) printf ",\n\tBooktitle = { %s }", booktitle X if(E) { X printf ",\n\tEditor = { %s", editor[1] X for(i=2;i<=E;i++) printf " and\n\t\t%s", editor[i] X printf " }" X } X if(J) printf ",\n\t%s = %s", jnl, journal X if(S) printf ",\n\tSeries = { %s }", series X if(V) printf ",\n\tVolume = %s", volume X if(N) printf ",\n\tNumber = %s", number X if(P) printf ",\n\tPages = %s", pages X if(O) printf ",\n\tNote = { %s }", other X if( edition != "" ) X printf ",\n\tEdition = { %s }", edition X if(R) printf ",\n\t%s = { %s }", institute, report X # Non standard fields start X if(G) printf ",\n\tGovernmentNo = { %s }", govtorder X if(M) printf ",\n\tBellLabsMemo = { %s }", bellmemo X # Non standard end X if(I) printf ",\n\tPublisher = { %s }", publisher X if(C) printf ",\n\tAddress = { %s }", city X if(month != "") printf ",\n\tMonth = %s", month X if(D) printf ",\n\tYear = %s", year X if(L) printf ",\n\tKey = { %s }", label X if(K) printf ",\n\tKeywords = { %s }", keywords X if(X) printf ",\n\tAnnote = { %s }", abstr X printf "\t}\n\n" X X # initialise for next reference X X A=0;B=0;C=0;D=0;E=0;F=0;G=0;H=0;I=0;J=0; X K=0;L=0;M=0;N=0;O=0;P=0;Q=0;R=0;S=0;T=0; X U=0;V=0;W=0;X=0;Y=0;Z=0; TY=0; X bibtype = misc X edition = "" X type = "" X institute = "" X booktitle = "" X title = "" X volume = "" X alleditors = "" X city = "" X date = "" X 
month = "" X xmonth = "" X kyear = "" X year = "" X publisher = "" X journal = "" X number = "" X other = "" X page = "" X report = "" X series = "" X abstr = "" X bellmemo = "" X govtorder = "" X keywords = "" X label = "" X keys = "" X toterr +=err X rx++ X } X else if (comment) print "\t}\n" X err = 0 X percent = 0 # not in a reference X comment = 0 # not in a comment X pctchar = "" X next X } X X/^%T/ { X T ++ X if (T>1) { X err=1 X print "Two titles: Reference " rx >> errfil X print title >> errfil X } X title = entry X next X } X X/^%D/ { X D ++ X if (D>1) { X err=1 X print "Two dates: Reference " rx >> errfil X print date >> errfil X } X if (($NF<1900)||($NF>=2000)) { X err=1 X print "Date error? : Reference " rx >> errfil X } X date = entry X next X } X X/^%P/ { X P ++ X if ( P>1 ) { X err=1 X print "Two page nos? : Reference " rx >> errfil X print pages >> errfil X } X pages = entry X next X } X X/^%J/ { X J ++ X if ( J>1 ) { X err=1 X print "Two journals: Reference " rx >> errfil X print journal >> errfil X } X journal = entry X next X } X X/^%V/ { X V ++ X if ( V>1 ) { X err=1 X print "Two volumes: Reference " rx >> errfil X print volume >> errfil X } X volume = entry X next X } X X/^%N/ { X N ++ X if ( N>1 ) { X err=1 X print "Two issue numbers: Reference " rx >> errfil X print number >> errfil X } X number = entry X next X } X X/^[^%]/ { # non-blank-non-% lines X X if(FILENAME == keyfil ) X keyused[$1] = $2 X else X if(percent) X { # in the references X X if( pctchar == "A") authors[A] = authors[A] " " $0 X if( pctchar == "B") booktitle = booktitle " " $0 X if( pctchar == "C") city = city " " $0 X if( pctchar == "D") date = date " " $0 X if( pctchar == "E") alleditors = alleditors " " $0 X if( pctchar == "G") govtorder = govtorder " " $0 X if( pctchar == "H") header = header " " $0 X if( pctchar == "I") publisher = publisher " " $0 X if( pctchar == "J") journal = journal " " $0 X if( pctchar == "K") keywords = keywords " " $0 X if( pctchar == "L") 
label = label " " $0 X if( pctchar == "M") bellmemo = bellmemo " " $0 X if( pctchar == "N") number = number " " $0 X if( pctchar == "O") other = other " " $0 X if( pctchar == "P") pages = pages " " $0 X if( pctchar == "Q") authors[A] = authors[A] " " $0 X if( pctchar == "R") report = report " " $0 X if( pctchar == "S") series = series " " $0 X if( pctchar == "T") title = title " " $0 X if( pctchar == "V") volume = volume " " $0 X if( pctchar == "X") abstr = abstr " " $0 X } X else { X if (!comment) print "@Comment{" X print $0 X comment = 1 X } X next X } X X/^%C/ { X C ++ X if ( C>1 ) { X err=1 X print "Two cities: Reference " rx >> errfil X print city >> errfil X } X city = entry X next X } X X/^%I/ { X I ++ X if ( I>1 ) { X err=1 X print "Two publishers: Reference " rx >> errfil X print publisher >> errfil X } X publisher = entry X next X } X X/^%B/ { X B ++ X if ( B>1 ) { X err=1 X print "Two books: Reference " rx >> errfil X print booktitle >> errfil X } X booktitle = entry X next X } X X/^%E/ { # this really deals with 'bib' format which allows multiple X # %E fields X # refer only allows one %E field X # we split it when the reference is printed X E++ X if ( alleditors == "" ) alleditors = entry X else alleditors = alleditors " and " entry X next X } X X/^%O/ { X O ++ X if ( O>1 ) { X err=1 X print "Two others: Reference " rx >> errfil X print other >> errfil X } X other = entry X next X } X X/^%H/ { X H ++ X if ( H>1 ) { X err=1 X print "Two headers: Reference " rx >> errfil X print header >> errfil X } X header = entry X next X } X X/^%S/ { X S ++ X if ( S>1 ) { X err=1 X print "Two series: Reference " rx >> errfil X print series >> errfil X } X series = entry X next X } X X/^%R/ { X R ++ X if ( R>1 ) { X err=1 X print "Two reports: Reference " rx >> errfil X print report >> errfil X } X report = entry X next X } X X/^%X/ { X X ++ X abstr = entry X if ( X>1 ) { X err=1 X print "Two abstracts: Reference " rx >> errfil X } X next X } X X/^%K/ { X K++ X if 
(K>1) { X err=1 X print "Two keywords: Reference " rx >> errfil X print keywords >> errfil X } X keywords = entry X next X } X X/^%L/ { X L++ X if (L>1) { X err=1 X print "Two labels: Reference " rx >> errfil X print label >> errfil X } X label = entry X next X } X X/^%G/ { X G++ X if (G>1) { X err=1 X print "Two Gov't order Nos: Reference " rx >> errfil X print govtorder >> errfil X } X govtorder = entry X next X } X X/^%M/ { X M++ X if (M>1) { X err=1 X print "Two Bell Labs Memo Nos: Reference " rx >> errfil X print bellmemo >> errfil X } X bellmemo = entry X next X } X X/^%/ { # should not get these X F ++ X print "Unexpected flag: Reference " rx >> errfil X print $0 >> errfil X err = 1 X next X } X XEND { X print refs " references" >> errfil X if(toterr) print toterr " erroneous" >> errfil X if(bibs[article]>0) print bibs[article], " journal articles" >> errfil X if(bibs[book]>0) print bibs[book], " books" >> errfil X if(bibs[booklet]>0) print bibs[booklet], " booklets" >> errfil X if(bibs[inbook]>0) print bibs[inbook], " book extracts" >> errfil X if(bibs[incollection]>0) print bibs[incollection], " book articles" >> errfil X if(bibs[inproceedings]>0) print bibs[inproceedings], " conference papers" >> errfil X if(bibs[manual]>0) print bibs[manual], " manuals" >> errfil X if(bibs[mastersthesis]>0) print bibs[mastersthesis], " Master's theses" >> errfil X if(bibs[misc]>0) print bibs[misc], " miscellaneous" >> errfil X if(bibs[phdthesis]>0) print bibs[phdthesis], " PhD theses" >> errfil X if(bibs[proceedings]>0) print bibs[proceedings], " conference proceedings" >> errfil X if(bibs[techreport]>0) print bibs[techreport], " technical reports" >> errfil X if(bibs[unpublished]>0) print bibs[unpublished], " unpublished papers" >> errfil X X for(k in keyused) print k, keyused[k] > nkeyfil X X } XZZ X X$DEBUG && echo "Generated: <$GEN>" 1>&2 X$DEBUG && echo "Width: <$WIDTH>" 1>&2 X$DEBUG && echo "Errors in: <$ERRFILE>" 1>&2 X$DEBUG && echo "Authors used in Keys: 
<$NAUTHOR>" 1>&2 X$DEBUG && echo "Characters/Author used in Keys: <$LAUTHOR>" 1>&2 X X# Process each file, or if none given, standard input Xfor FILE in ${*-$STDIN} Xdo X cp $KEYFILE $NKEYFILE X X# First set up shell variables as required X if [ "$FILE" = "$STDIN" ] X then X NEWFILE=$NAME.$BIB X else X if [ -r "$FILE" -a -f "$FILE" ] X then X NAME=`basename $FILE` X NEWFILE=$FILE.$BIB X else X NAME="" X echo "$PROGNAME: Can't read $FILE" 1>&2 X fi X fi X X# If all is OK, read input and terminate with a blank line. X if [ "$NAME" ] X then X if [ "$FILE" = "$STDIN" ] X then X# If no files given, read from standard input. X $DEBUG && echo "Reading from standard input" 1>&2 X cat X echo X else X $DEBUG && echo "Reading <$FILE>" 1>&2 X cat $FILE X echo X fi | X# do the conversions X sed -f $SEDFILE | X awk -f $AWKFILE $KEYFILE - | X# Finally, output to named files or standard output X if $NAMEDFILES X then X $DEBUG && echo "Output to <$NEWFILE>" 1>&2 X cat > $NEWFILE X else X cat X fi X fi X cp $NKEYFILE $KEYFILE Xdone X Xecho >> $ERRFILE Key Frequencies Xsort < $KEYFILE >> $ERRFILE X[ $AKEYFILE ] && [ -w $AKEYFILE ] && cat $KEYFILE > $AKEYFILE Xexit 0 End-of-file echo Extracting $PRGRM.1 sed -e 's/^X//' -e 's/prgrm/'$PRGRM'/' <<'End-of-file' >$PRGRM.1 X.TH prgrm 1-local X.SH NAME Xprgrm \- convert refer input files to bibtex .bib files X.SH SYNOPSIS X.B prgrm X[options ...] [files ...] X.br X.SH DESCRIPTION X.B prgrm Xreads the X.I files Xand produces a X.B bibtex Xreference list (a .bib file) on the standard output. XIf no files are given, prgrm reads Xstandard input. X.PP XA rudimentary attempt is made to convert X.I troff Xspecial characters and accents to the equivalent X.I TeX Xones. XThe file ``prgrm.errs'' contains complaints about references that were Xnot recognised, and other problems, as well as a summary of the Xnumber of conversions completed. 
X.PP
XSince
X.B refer
Xfiles are inherently unstructured (compared to
X.B bibtex )
X.B prgrm
Xonly does a passable job. In particular
X.B refer
Xdoesn't require a keyword, while
X.B bibtex
Xdoes.
X.B prgrm
Xgenerates one using the following procedure:
Xthe first three characters of the last names of the first three authors
Xare concatenated (preserving the capital letters), and the last two
Xdigits of the date are appended. If this key has already been used,
Xthen 'a', 'b', 'c', etc. are appended as needed.
XThere is an optional facility to start the key usage where it left off
Xin some previous conversion, by supplying a file containing the keys used.
X.PP
XJournal entries that appear to be in the standard bibliography style
Xfiles' list of @strings are converted.
XThe %D field is converted to month and year entries if there are two
Xfields; otherwise it is assumed to contain only the year.
XA large number of proper names, such as Hilbert, Turing, etc.,
Xwhich are often found in the titles of articles, are enclosed in braces
X{} to protect them. This treatment is also applied to any strings of
Xtwo or more consecutive capital letters, or to any isolated single
Xcapital letter (except A).
XThe user can supply an extra list of names to have their capitalisation
Xprotected.
X.PP
XTo determine the type of reference that the
X.B refer
Xentry is,
X.B prgrm
Xhas to do some ``calculated guessing''. The heuristic used
Xhere (again, in order of precedence) is:
X.PP
X1. If it has a journal entry (%J) then it's considered to
Xbe an @article, unless there is a city entry (%C) or a publisher entry
X(%I) as well, in which case it's treated as an @inproceedings.
XIf there is no volume or number (%V or %N) entry it will be considered a
Xconference proceedings.
X.PP
X2. If it has a book entry (%B) then it's considered to
Xbe an @incollection.
X.PP
X3. If it has a report entry (%R) then it's considered to
Xbe a @techreport.
XIf the %R field contains the word ``Dissertation'' or ``Thesis'',
Xthe classification will be @phdthesis or @mastersthesis.
XIf the %R field contains the word ``Manual'' then it will be classified
Xas a @manual, and if it contains ``Unpublished''
Xit will be classified as an @unpublished.
X.PP
X4. If it has an issuer entry (%I) then it's considered to
Xbe a @book.
X.PP
X5. Otherwise it's considered to be a @misc, or @unpublished.
XAll these entries are listed in the ``prgrm.errs'' file.
XThe decision to classify it as @unpublished is made if the
Xword ``unpublished'' appears in the %O field. This word is deleted.
X.PP
XQuite often
X.B prgrm
Xwill misguess and you will need to edit (by hand) the resulting .bib
Xfile.
X.PP
XAny fields that
X.B prgrm
Xdoesn't know about it will ignore (and complain about on stderr).
X.PP
XThe output is normally folded at word boundaries to ensure that
Xlines do not become too long on output.
XJournal entries that correspond to the abbreviations in the standard
Xbibliography styles are abbreviated, as are months mentioned in the date
Xfield.
X.PP
XNon-blank lines that appear outside a reference are accumulated
Xand printed as a @comment section. In fact BiBTeX would ignore them
Xas refer does, but identifying them separately seems cleaner, and might
Xmake prgrm suitable for converting refer bibliographies to scribe
Xformat.
X.SH OPTIONS
XThe following options are available:
X.TP 10
X.BI \- num
XSpecify a maximum width for the output.
XThe default is 72 characters.
XIf
X.I num
Xis omitted then lines are not folded and may be of any length.
X.TP 10
X.BI \-e " file"
XUse the next argument as the file name in which to print the errors
Xand summary of output.
X.TP 10
X.BI \-kf " file"
XUse the next argument as a file name to read the current state of key
Xusage from, and to save the key usage data at the end of the conversion.
XThis allows extra databases to be converted, without having to convert
Xall the old ones, and keeps the keys unique over all the databases.
X.TP 10
X.BI \-k " n"
XUse the next argument to decide how many authors' names to use in generating the
Xkey. (Default 3).
X.TP 10
X.BI \-l " n"
XUse the next argument to decide how many characters from each author's name
Xto use in generating the key. (Default 3).
X.TP 10
X.B \-n
XUse the name of the input file(s) to produce output file(s) with
Xthe same name and extension ``.bib'' rather than sending the output
Xto standard output.
X.TP 10
X.B \-u
XDisplay the usage of the command.
X.TP 10
X.B \-w
XDo not fold the output. Lines may be of any length.
X.SH ACKNOWLEDGMENT
XThis manual page is based on the manual page for
X.I r2bib ,
Xa program which performs a simpler version of the same conversion,
Xwritten by
XRusty Wright, Center For Music Experiment, University of California San
XDiego.
XThe options and their processing are based on the
X.I ref2bib
Xwritten independently by Jonathan Bowen of the Programming Research Group,
XOxford University.
XA number of the heuristics are also copied from Jonathan Bowen's
Xscript.
X.SH AUTHOR
XPeter King, Computer Science Department, Heriot-Watt University,
XEdinburgh.
X.SH BUGS
XImplemented as a
X.I sh(1)
Xscript, using
X.I sed(1)
Xand
X.I awk(1) .
XThis makes the conversion very slow, but also means that it is easily
Xmodified to alter the heuristics. In particular, the key generation
Xalgorithm is easily changed.
X
XThe heuristics for identifying theses, unpublished papers, etc. are
Xrather crude.
End-of-file
echo End of archive
exit 0
-------