Electronic Publishing -- Origination, Dissemination and Design
Sample Papers

Here are the abstracts of the seven sample papers freely available from the EP-odd PDF archive. Select the article title or author's name to download the paper.

Tools for printing indexes, Jon L. Bentley and Brian W. Kernighan

This paper describes a set of programs for processing and printing the index for a book or a manual. The input consists of lines containing index terms and page numbers. The programs collect multiple occurrences of the same terms, compress runs of page numbers, create permutations (e.g., `index, book' from `book index'), and sort them into proper alphabetic order. The programs can cope with embedded formatting commands (size and font changes, etc.), with roman numeral page numbers, and with see terms. The programs do not help with the original creation of index terms. The implementation runs on the UNIX operating system. It uses a long pipeline of short awk programs rather than a single program in a conventional language. This structure makes the programs easy to adapt or augment to meet special requirements that arise in different indexing styles. The programs were intended to be used with troff, but can be used with a formatter like TeX with minor changes. An appendix contains a complete listing of the programs, which total about 200 lines.
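
As a rough illustration of two of the steps named in this abstract (collecting repeated terms and compressing runs of page numbers), here is a minimal Python sketch. It is not the authors' awk pipeline. The tab-separated input format and the function names `compress_runs` and `build_index` are assumptions made for this example, and permutations, roman numerals, see terms and formatting commands are left out.

```python
from collections import defaultdict


def compress_runs(pages):
    """Collapse consecutive page numbers into ranges, e.g. 3, 4, 5 -> '3-5'."""
    parts = []
    start = prev = None
    for page in sorted(set(pages)):
        if start is None:
            start = prev = page
        elif page == prev + 1:
            prev = page  # still inside the current run
        else:
            parts.append(str(start) if start == prev else f"{start}-{prev}")
            start = prev = page
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ", ".join(parts)


def build_index(lines):
    """Group 'term<TAB>page' lines by term (assumed format) and sort them."""
    pages_by_term = defaultdict(list)
    for line in lines:
        term, page = line.rsplit("\t", 1)
        pages_by_term[term.strip()].append(int(page))
    # Case-insensitive sort stands in for "proper alphabetic order".
    ordered = sorted(pages_by_term.items(), key=lambda item: item[0].lower())
    return [f"{term}  {compress_runs(pages)}" for term, pages in ordered]


if __name__ == "__main__":
    sample = ["book index\t4", "awk\t10", "book index\t5", "book index\t7"]
    for entry in build_index(sample):
        print(entry)
```

A single script like this is easier to read in one place, but the abstract's point is that splitting the same steps into small pipeline stages makes each one easier to swap out for a different indexing style.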

A search strategy for large document bases, Dario Lucarella

In this paper, we emphasize the need to model the inherent uncertainty associated with the information retrieval process. Within this context, a search strategy is proposed for locating documents which are likely to be relevant to a given query. A notion of closeness between document(s) and query is introduced and the implementation of an improved algorithm for the identification of the closest document set is presented with emphasis on computational efficiency.

Paragraph-based nearest neighbour searching in full-text documents, Suliman Al-Hawamdeh and Peter Willett

This paper discusses the searching of full-text documents to identify paragraphs that are relevant to a user request. Given a natural language query statement, a nearest neighbour search involves ranking the paragraphs comprising a full-text document in order of descending similarity with the query, where the similarity for each paragraph is determined by the number of keyword stems that it has in common with the query. This approach is compared with the more conventional Boolean search which requires the user to specify the logical relationships between the query terms. Comparative searches using 130 queries and 20 full-text documents demonstrate the general effectiveness of the nearest neighbour model for paragraph-based searching. It is shown that the output from a nearest neighbour search can be used to guide a reader to the most appropriate segment of an online full-text document.
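
The ranking described in this abstract, where each paragraph is scored by the number of keyword stems it shares with the query, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word list, the crude suffix-stripping in `crude_stem` and the function names are invented for the example and are not the stemmer or the system used in the paper.

```python
import re

# Assumed, deliberately tiny stop-word list for the example.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with"}


def crude_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stems(text):
    """Return the set of stems of the non-stop-words in a piece of text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {crude_stem(word) for word in words if word not in STOP_WORDS}


def rank_paragraphs(query, paragraphs):
    """Return (score, paragraph_index) pairs in order of descending similarity.

    The score is the number of stems a paragraph shares with the query.
    Ties keep the original paragraph order.
    """
    query_stems = stems(query)
    scored = [(len(query_stems & stems(p)), i) for i, p in enumerate(paragraphs)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored


if __name__ == "__main__":
    document = [
        "Indexes are sorted alphabetically.",
        "A search ranks each document by shared text.",
        "Boolean searches need explicit operators.",
    ]
    for score, index in rank_paragraphs("searching full-text documents", document):
        print(score, document[index])
```

Unlike a Boolean search, the query here is plain text with no operators: the user never states how the terms relate, and the ranking alone decides which paragraph to show first.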

Automatically transforming regularly structured linear documents into Hypertext, Richard Furuta, Catherine Plaisant, and Ben Shneiderman

Fully automatic conversion of a paper-based document into hypertext can be achieved in many cases if the original document is naturally partitioned into a collection of small-sized pieces that are unambiguously and consistently structured. We describe the methodology that we have used successfully to design and implement several straightforward conversions from the original document's machine-readable markup.

Active Tioga documents: an exploration of two paradigms, Douglas B. Terry and Donald G. Baker

The advent of electronic media has changed the way we think about documents. Documents with illustrations, spreadsheets, and mathematical formulae have become commonplace, but documents with active components have been rare. This paper focuses on our extensions to the Tioga editor to support two very different styles of active documents. One paradigm involves dynamically computing, or at least transforming, the contents of a document as it is displayed. A second paradigm uses notifications of edits to a document to trigger activities. Document activities can include database queries, which are evaluated and placed in the document upon opening the document, or constraints between portions of a document, which are maintained as the user edits the document. The resulting active documents can be viewed, edited, filed, and mailed in the same way as regular documents, while retaining their activities.

Automatic structuring of text files, G. Salton, C. Buckley and J. Allan

In many practical information retrieval situations, it is necessary to process heterogeneous text databases that vary greatly in scope and coverage, and deal with many different subjects. In such an environment it is important to provide flexible access to individual text pieces, and to structure the collection so that related text elements are identified and appropriately linked.

Methods are described in this study for the automatic structuring of heterogeneous text collections, and the construction of browsing tools and access procedures that facilitate collection use. The proposed methods are illustrated by performing searches with a large automated

Journal publishing with Acrobat: the CAJUN project, Philip N. Smith, David F. Brailsford, David R. Evans, Leon Harrison, Steve G. Probets and Peter E. Sutton

The publication of material in `electronic form' should ideally preserve, in a unified document representation, all of the richness of the printed document while maintaining enough of its underlying structure to enable searching and other forms of semantic processing. Until recently it has been hard to find a document representation which combined these attributes and which also stood some chance of becoming a de facto multi-platform standard.