Indexer
by zweibieren
physpics.com
(Release V1.0)
Overview
Indexer displays the pages of a book and the index entries for each page. You use Indexer to add entries, delete entries, and create new terms that can be applied as entries to pages. Finally, the Create Index command combines the entries from all pages to produce the index.
Terminology: I try to consistently use "term" to refer to items that are available to appear in the index. A term that has been chosen for a page is an "entry"; it will actually appear in the index. For instance, one of the terms may be "labor party", and it might be added as an entry for page 20. In the final index, the entry for "labor party" will list 20 as one of its pages.
The index can be output as text or HTML. Here's a sample in HTML.
- labor force
  - composition of, 67, 94, 96, 103, 131, 188n8–9, 191n10
  - growth in, 103, 108, 110, 111, 117, 174, 177, 180
- labor party, xi, 14, 15, 20, 21, 122, 124-127, 184n3
- labor union
  - membership, 4, 15, 16, 35, 36, 52, 57, 59, 74, 81, 93, 94, 159, 189n21, 194n19
Starting Indexer
Indexer is installed as a shortcut in its installation directory, and the shortcut may also be placed on the desktop and in the Start menu. Click any one of these to start Indexer. Or drag a prepared .txt file or directory and drop it onto the shortcut. "Prepared" means that the .txt file has $@ page number lines and the directory has index terms in a file called indexterms.txt; see Getting Ready.
From a command line, start Indexer with the command
java -jar installdir/indexer.jar filename-optional
where installdir is the Indexer installation directory. Adding a file or directory name at the end of the line opens that file or switches to that directory. After the application starts, you can choose a chapter to work on with Open Chapter on the File menu. To enter demo mode directly, give the command
java -jar installdir/indexer.jar -demo
In this mode, Indexer edits a single built-in file and does not access the file system.
When Indexer starts, there will be a DOS command window for severe error messages (a stack trace). When an error occurs, please send the contents of this window to me, along with a note describing what caused the error.
Indexer remembers the directory you are working in by storing its name in ".Indexer.ini" in your home directory. Without a file or directory argument, Indexer starts in that directory. A properly set up directory for Indexer work has a file called indexterms.txt (see below).
If a directory does not have a file indexterms.txt, you are prompted to create one. If you decline, you can still work, but the entries you select are remembered only in the index files of the chapters you add them to, and any trigger phrases are lost.
"Indexer" Main Window
The main Indexer window names the current file and directory in the title bar.

From the left, the columns show the page number, the contents of the page, and the index terms that have been selected for that page. As each page was read in, Indexer scanned it for phrases (as given in indexterms.txt). In the picture, the phrases "race to the bottom" and "slavery" resulted in index entries of the same names. "Interstate competition" and "Levi" resulted in "labor costs, state" with subheading "interstate competition", and in "Levi, Margaret". The term "United States of America" was added with the Add Entry command. The phrase "labor costs" is red because that phrase ties to two different index terms. Neither was listed automatically, so you need to review red phrases to see whether any index terms should be added for that page.
The index entries for the "active" page are highlighted in yellow. Entries are added and removed there. Indexer makes sure that the yellow area belongs to one of the pages currently on screen; if the text is scrolled, the yellow moves to a visible page.
Command Buttons
The menu bar has three buttons for the commands of Indexer. They are used thus:
- Scroll the text to a page with the scrollbar, with Page Up/Page Down, or a line at a time with the arrow keys. The index entries for the "current page" are in yellow.
- Click on an entry in the yellow area and click Remove Entry. The entry
is removed.
- Click on a term in the Available Index Terms window and click Add Entry. The
term is added to the yellow area.
- Click Create new index term ... and you will be prompted for a new term to be added to the Available Index Terms window.
Commands can be invoked from menus, and also from the keyboard:
| Command | Keystrokes |
|---|---|
| Add Entry | + or Insert or Control-A |
| Remove Entry | - or Delete or Control-D |
| Create new index term ... | Control-N |
| Save entries | Control-S |
When prompted for a new index term, you can add either a new term or a cross reference. For adding a term you will see three fields: the trigger phrase, the main heading, and the subheading.
The trigger phrase is one or more words; I'll describe it a little later. The index term is whatever you put in the main heading field, together with the subheading field if you decide to have one. As long as the term does not duplicate an existing term, it is added at the end of the Available Index Terms window and immediately inserted in the indexterms.txt file.
The trigger phrase is for automatic initial creation of index entries. Each time a chapter is read in, it is scanned for instances of the trigger phrases, and any page with a match is initially given that index term. See the description of the main window.
Clicking the "Cross reference" tab at the top of the dialog box brings up the fields for entering a cross reference.
The "under" term is the term in the index where this cross reference will appear; the "See" term is the one that is referred to. The "under" term might be NEA and the "See" term "National Education Association (NEA)". Then the index would have the entries
National Education Association (NEA) 12, 20, 44
NEA. See National Education Association
(Note the special case for acronyms: the trailing "(NEA)" is stripped from the reference in the entry for NEA, but appears in the entry referred to.)
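A minimal sketch of the acronym rule, assuming the stripped part is a trailing parenthesized word that equals the "under" term. The function name see_reference is hypothetical; this illustrates the behavior described above and is not Indexer's own code:

```python
import re

def see_reference(under: str, see: str) -> str:
    """Format a main-heading cross reference such as 'NEA. See ...'.

    Assumed rule: if the See term ends in '(XXX)' and XXX equals the
    under term, that trailing acronym is dropped from the reference.
    """
    m = re.fullmatch(r"(.*?)\s*\(([^()]+)\)", see.strip())
    if m and m.group(2) == under.strip():
        see = m.group(1)
    return f"{under.strip()}. See {see.strip()}"
```

For example, see_reference("NEA", "National Education Association (NEA)") gives "NEA. See National Education Association", while a See term without a matching acronym is left unchanged.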
After adding terms, you may want to rescan the chapter text to find instances of their phrases. To do this, open the chapter again with Open Chapter from the File menu.
The File Menu
Open Chapter - Prompts for a new chapter and opens it. The file must be a text file and its name must end in .txt. It will be scanned for phrases, and those found will be colored blue (or red if multiple terms have matching phrases).
Save Entries - For chapter xxx.txt, this command creates file xxx-index.txt and stores all the index entries in it. It even remembers which entries you have deleted: the chapter is rescanned every time it is opened, but deleted entries do not come back. Entries are saved automatically when you open another chapter, when you exit the program, or when a five-minute timer fires.
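One way to picture why deleted entries stay deleted after a rescan: the saved file records both the entries you added by hand and the ones you deleted, and each fresh scan is combined with them. The sketch below works under that assumption only; the on-disk layout of xxx-index.txt is not described here, and index_file_for and merge_entries are hypothetical names:

```python
from pathlib import Path

def index_file_for(chapter: str) -> Path:
    """Name of the saved-entries file: xxx.txt -> xxx-index.txt."""
    path = Path(chapter)
    return path.with_name(path.stem + "-index.txt")

def merge_entries(scanned, added, deleted):
    """Entries for one page after the chapter is rescanned.

    scanned: terms the phrase scan found on the page
    added:   terms the user added by hand
    deleted: terms the user removed earlier; they stay removed even if the
             scan finds them again (assumes re-adding a term takes it out
             of this set)
    """
    return (set(scanned) | set(added)) - set(deleted)
```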
Create Index ... - You are
prompted with a list of all the ...-index.txt files in the current
directory. When you click "Index in text" or "Index in html", the
checked files are read, the entries are sorted, and an index is created
in index.txt or index.html, respectively.
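The merge-and-sort step can be sketched as follows. This assumes each page's entries are already loaded as strings of the form "main" or "main : sub" (the indexterms.txt notation), that roman-numeral front-matter pages sort before arabic pages, and that note locators such as 184n3 sort by their page number. The names are hypothetical, and the text layout only approximates the sample near the top of this file:

```python
import re
from collections import defaultdict

ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100}

def roman_value(numeral):
    """Value of a lowercase roman numeral such as 'xi' or 'iv'."""
    total = 0
    for here, after in zip(numeral, numeral[1:] + " "):
        value = ROMAN[here]
        # subtract when a smaller numeral precedes a larger one (iv = 4)
        total += -value if ROMAN.get(after, 0) > value else value
    return total

def page_key(page):
    """Roman front matter first, then by page number, then by note suffix."""
    if re.fullmatch(r"[ivxlc]+", page):
        return (0, roman_value(page), "")
    m = re.match(r"(\d+)(.*)", page)
    if m is None:
        return (2, 0, page)  # unrecognized locator: put it last
    return (1, int(m.group(1)), m.group(2))

def build_index(pages):
    """pages maps a page label ('xi', '20', '184n3') to that page's entries."""
    tree = defaultdict(lambda: defaultdict(set))
    for page, entries in pages.items():
        for entry in entries:
            main, _, sub = entry.partition(" : ")
            tree[main.strip()][sub.strip()].add(page)
    lines = []
    for main in sorted(tree, key=str.lower):
        subs = tree[main]
        own = subs.pop("", set())  # pages listed directly under the main heading
        head = main
        if own:
            head += ", " + ", ".join(sorted(own, key=page_key))
        lines.append(head)
        for sub in sorted(subs, key=str.lower):
            locs = ", ".join(sorted(subs[sub], key=page_key))
            lines.append(f"    {sub}, {locs}")
    return lines
```

Sorting headings with str.lower keeps "labor party" and "Labor Party" together; a real index may need finer rules (ignoring leading articles, letter-by-letter versus word-by-word order).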
New Terms Window - Opens a new instance of the Available Index Terms window. All such windows look and behave alike, except that they may be scrolled differently and each may have its own set of selected terms. The selection is visible only when the window has the input focus.
Exit - Indexer saves any entries and exits. For filename.txt, entries are saved to filename-index.txt. Entries are also saved automatically when you switch to another file, and every five minutes.
The Help Menu
About Indexer - Displays some
mildly useful information, especially the current directory and file
name. You should report the version number in error reports.
Help - Brings up a window
displaying this very file.
Enter demo mode - In demo
mode, Indexer works on a single built-in file and set of index terms.
Creating an index shows it on the screen instead of saving it to a
file. Things to try:
- Choose menu item File/Create Index and either html or text.
- See the nice index.
- Click "shadow" at the bottom of the terms list window.
- It turns blue.
- Click on a page in the main window.
- Its index entries turn yellow.
- Click the Add Entry button at the top.
- "shadow" gets added to the entries in the yellow area.
- Choose File/Create Index again.
- Now the index has an entry for "shadow" and a cross
reference to it.
If you add "dark" as a term on some page(s), more cross references will
appear. (Cross references do not appear unless the term they point at
has associated entries.)
Any term in the "Available Index Terms" window can be assigned to any page in the text. Scroll through the list and select a term; it turns blue. Click the Add Entry button, and that term becomes an entry for the current page. Select two or more consecutive terms; they turn blue. Click the Add Entry button, and they all become entries for the page.
If you want a new term, use the Create new index term ... button. If you want another copy of the entire window, use New Terms Window in the File menu. The contents of the window are derived from indexterms.txt in the same directory as the open chapter.
Indexer operates within a directory containing the document to be indexed and various index files. Indexer is distributed as a Java archive file, Indexer.jar. Put a copy of Indexer.jar into the directory for an index. Then click on that file in an Explorer window listing. The Indexer windows will appear, and you can work on the index.
Alternatively, you can run Indexer from a command line. Change to the directory containing the index files, then give the command:
java -jar pathname\Indexer.jar
where pathname is the directory that holds Indexer.jar.
A shortcut to Indexer.jar can be adapted by editing its properties. The target property should contain something like this:
"dir-j\Indexer.jar"
where dir-j is the path of the directory containing Indexer.jar, so the target is typically something like:
"C:\Program Files\Physpics\Indexer.jar"
Then you can drag and drop a file or directory onto the shortcut icon, and Indexer will open the first of the dropped files for editing.
Indexing is done in a directory devoted to that task. To begin, there must be a list of potential index terms in a file called indexterms.txt, and also one or more chapter text files whose names end with .txt. Later, there will be one name-index.txt file for each chapter, and final index files named index.txt and/or index.html.
Chapter text files: name.txt
Each section of the book (it doesn't have to be a chapter) needs to be in the Indexer directory as a text file. Each page of text must begin with a line containing "$@" and the page number:
$@1
Chapter 1.
Call me Ishmael. ...
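A minimal sketch of splitting a chapter file in this format into pages, assuming the marker line holds nothing but "$@" and the page label, and that any text before the first marker is ignored; split_pages is a hypothetical name:

```python
import re

# A page marker line: "$@" followed by the page label, e.g. "$@1" or "$@xi".
PAGE_MARK = re.compile(r"\$@\s*(\S+)\s*")

def split_pages(text):
    """Split chapter text into (page_label, page_text) pairs.

    Text before the first marker line is ignored (an assumption).
    """
    pages = []
    label, lines = None, []
    for line in text.splitlines():
        m = PAGE_MARK.fullmatch(line.strip())
        if m:
            if label is not None:
                pages.append((label, "\n".join(lines)))
            label, lines = m.group(1), []
        elif label is not None:
            lines.append(line)
    if label is not None:
        pages.append((label, "\n".join(lines)))
    return pages
```

Page labels are kept as strings so that front-matter pages such as "xi" work the same way as numbered pages.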
For my indexing task, I created these files by first saving each chapter as a text file from the original word-processing program. Then I went through the text with a text editor (WordPad would work) and added the page number lines. For a two-hundred-page book, this took an hour or two. {I did cheat a bit. I used XEmacs and a special-purpose macro so that a mouse click and a single keypress would enter each successive page number.
The macro is initialized to a starting page number with
\C-u number \C-x r n p
Then the next page number is added by typing \C-z, where that key is bound by running
(fset 'page-number
      [return return ?$ ?@ return
       left ?\C-x ?r ?+ ?p ?\C-x ?r ?g ?p ?\C-e])
(global-set-key [26] 'page-number)
}
Terms file: indexterms.txt
The meat of indexterms.txt is lines containing index terms. The simplest form of a term line is
<phrase> <WHITE> <term>
where <WHITE> is some combination of TABs and SPACEs. Since
<phrase> and <term> can each have spaces, <WHITE>
must be at least a TAB or two spaces. More TABs and SPACEs are okay.
When Indexer first looks at a chapter text, it scans for instances
of the <phrase>s. If a <phrase> is found, the corresponding
<term> is added as an entry for the page.
When a phrase is recognized in the text, it is colored red or blue. If blue, the corresponding term has been made an entry for the page; it can be deleted from the list of entries if it is not appropriate. The phrase is colored red if that phrase appears for more than one index term. For instance, in the first book indexed, the phrase Washington appeared for four index terms: the state of Washington; the city, Washington, D.C.; the president, George; and Tom, an author of a referenced work. So no entries for Washington were made automatically, and Washington was red everywhere it appeared.
Phrase words can contain only letters, hyphens, and apostrophes; other characters are ignored. The phrase can be omitted, in which case that term is never added to a page automatically by the initial scan. If the phrase is left out, there must be leading white space, as in
<WHITE> <term>
If an index category is subdivided, an index term may have subterms. These are written in indexterms.txt as lines of the form
<phrase> <WHITE> <term> <COLON> <subterm>
Here <COLON> is a SPACE, a colon, and a SPACE. The corresponding index entry will be
term
    subterm page numbers ...
As a convenience, the <term> may be omitted if it is the same as for the preceding line. These lines then have the form
<phrase> <WHITE> <COLON> <subterm>
Besides terms, indexterms.txt may contain blank lines and comment lines. Comments begin with "//". One comment line can have the form
// title: title words ...
When the index is generated in HTML, this book title will appear as the page title of the HTML page.
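The line formats above can be read with a small parser. This is a sketch, not Indexer's own code: it assumes each line is handled independently, skips .SEE. cross-reference lines (described below), and returns (phrase, term, subterm) triples with None for a missing phrase or subterm; parse_terms is a hypothetical name:

```python
import re

# A TAB or two SPACEs separates the phrase from the term.
SEPARATOR = re.compile(r"\t| {2}")
TITLE = re.compile(r"//\s*title:\s*(.*)")

def parse_terms(text):
    """Return (title, [(phrase, term, subterm), ...]) for indexterms.txt text."""
    title = None
    terms = []
    previous_term = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line.strip():
            continue                      # blank line
        if line.lstrip().startswith("//"):
            m = TITLE.match(line.strip())
            if m:
                title = m.group(1)        # "// title: ..." comment
            continue                      # any other comment
        if ".SEE." in line:
            continue                      # cross references are not handled here
        m = SEPARATOR.search(line)
        if m is None:
            raise ValueError(f"no TAB or double space in line: {raw!r}")
        phrase = line[:m.start()].strip() or None  # leading white space: no phrase
        rest = line[m.end():].strip()
        if rest.startswith(":"):
            # "<phrase> <WHITE> <COLON> <subterm>": reuse the previous line's term
            if previous_term is None:
                raise ValueError(f"subterm with no preceding term: {raw!r}")
            term, subterm = previous_term, rest[1:].strip()
        else:
            term, _, subterm = rest.partition(" : ")
            term, subterm = term.strip(), subterm.strip()
        previous_term = term
        terms.append((phrase, term, subterm or None))
    return title, terms
```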
The first book indexed had phrases for both "New York" and "New York Times". This works because the longest phrase found is the one used. But "York Times" would not work: the text "New York Times" is recognized as "New York" and not as an instance of "York Times".
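The scanning rules in this section (phrases made of letters, hyphens, and apostrophes; blue for a phrase tied to one term, red for a phrase tied to several; the longest phrase wins) can be sketched as a leftmost-longest match over the page's words. This is an illustration, not Indexer's source: case-insensitive matching is an assumption, as are the names scan_page and phrase_terms:

```python
import re

# Phrase words: letters, hyphens, and apostrophes; anything else separates words.
WORD = re.compile(r"[A-Za-z'-]+")

def scan_page(text, phrase_terms):
    """Leftmost-longest phrase scan of one page (case-insensitive: an assumption).

    phrase_terms maps a phrase to the set of index terms it triggers.
    Returns (entries, blue, red): terms added automatically, phrases tied to
    exactly one term, and phrases tied to several terms (no entry made).
    """
    table = {}
    for phrase, terms in phrase_terms.items():
        key = tuple(WORD.findall(phrase.lower()))
        table.setdefault(key, set()).update(terms)
    longest = max((len(key) for key in table), default=0)
    words = WORD.findall(text.lower())
    entries, blue, red = set(), [], []
    i = 0
    while i < len(words):
        # try the longest possible phrase starting at word i first
        for n in range(min(longest, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in table:
                terms = table[key]
                if len(terms) == 1:
                    entries |= terms
                    blue.append(" ".join(key))
                else:
                    red.append(" ".join(key))
                i += n  # matched words cannot start another phrase
                break
        else:
            i += 1  # no phrase starts at this word
    return entries, blue, red
```

Because matching proceeds left to right and consumes the words it matches, "New York" claims the word "York" before "York Times" can be tried, which is exactly the behavior described above.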
As new entries are added to indexterms.txt, it gets chaotic. Usually I re-sorted the entries alphabetically using Excel. A general programmatic scheme would certainly be possible and will be considered for the next version of Indexer.
Cross reference entries
Cross references are index entries that direct the reader to other index entries. They appear in the index as "see ..." and "see also ...", as in
NYT. See New York Times
race/ethnicity
    home ownership and, 25n3
    equality, struggle for (see racial equality, struggle for)
    political party polarization, 102-3 (see also polarization, racial)
    See also Eastern Europeans; Asians.
These are incorporated in indexterms.txt with lines of the form
index term .SEE. index term
where either index term may be just a main heading, or a main heading, " : ", and a subheading. For instance
NEA : members .SEE. National Education Association (NEA) : membership
which will appear in the index as
National Education Association (NEA)
    membership 23, 25, 167-71
NEA
    members (see National Education Association, membership)
Neither the term before nor the term after .SEE. can have an associated phrase. To assign a phrase, add another line that gives the phrase and its index term.
For the indexterms.txt line "xxx .SEE. yyy", the Available Index Terms
window will have a listing of yyy.
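A sketch of reading such lines, assuming the first " : " on each side of .SEE. separates the main heading from the subheading. It also applies the rule from the demo section that a cross reference appears only when the term it points at has entries. parse_see, active_references, and the pages_for mapping are hypothetical:

```python
def split_term(text):
    """Split 'main : sub' into (main, sub); sub is None for a bare heading."""
    main, _, sub = text.partition(" : ")
    return main.strip(), sub.strip() or None

def parse_see(line):
    """Parse an indexterms.txt line of the form 'under .SEE. target'."""
    under, sep, target = line.partition(".SEE.")
    if not sep:
        raise ValueError(f"not a cross reference: {line!r}")
    return split_term(under), split_term(target)

def active_references(refs, pages_for):
    """Keep only cross references whose target term has at least one page.

    pages_for maps a (main, sub) term to the set of pages it is an entry for.
    """
    return [(under, target) for under, target in refs if pages_for.get(target)]
```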
http://www.asindexing.org/site/checklist.shtml
American Society of Indexers

Indexing Evaluation Checklist
The Index is the KEY to the book
Is the index to your book or web site good enough for your readers? Here are some helpful insights for ensuring an excellent index.
"An index is not an outline, nor is it a concordance. It's an intelligently compiled list of topics covered in the work, prepared with the reader's needs in mind."
Reader Appropriateness
- Are the indexed terms appropriate for the intended
audience? For example: "heart attack" in a book for the general public,
"myocardial infarction" in a book for health professionals; "Taxus" in
a work for botanists or horticulturalists, "Yew" in a work for home
gardeners.
Main Headings
- Are the main headings relevant to the needs of the
reader? Are they pertinent, specific, comprehensive? Not too general
yet not too narrow? Not inane or improbable?
- Do main headings have not more than 5–7 locators
(page references)? If more, they should be broken down into subheadings.
Subheadings
- Are the subheadings useful? In the example below, (a) the page range is extensive, and (b) the subheading "problems with Republicans" may be too general:
    Roosevelt, Franklin
        problems with Republicans, 1–32
- Are subheadings concise, with the most important word at the beginning? For example, not:
    banks
        and relationship to Federal Reserve bank
  but
    banks
        Federal Reserve
            regulation
- Do entries avoid unnecessary words and phrases like "concerning" and "relating to", and a proliferation of prepositions and articles?
- Is the number of subheadings about right? More than
one column’s worth is probably too many.
- Are subheadings overanalyzed?
Could they be combined? For example, could "dimensions" be substituted
for "height," "width," and "length"? Or should some subheadings become
main headings with their own subheadings instead?
- Do subheadings have more than 5–7 locators? If more,
they should either be broken down into sub-subheadings or be changed to
main headings.
Double Postings
- For the reader’s convenience, many subheadings should
be double posted—that is, they should exist as main headings too. An
example: "Cats: Siamese" and "Siamese cats." Has this been done? Double
postings should, of course, have the same locators. Do they?
Locators (Page References)
- Are the locators accurate? Check a sample of entries
to see. Spot-check pagination for nonsense numbers where the hyphen or
en dash may be missing, such as 18693 for 186-93. Check that elision
(page ranges such as 186-93) is consistent.
- When locators include roman numerals or volume
numbers, does the typography make the usage clear?
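Part of this spot-check can be automated. The sketch below is only an illustration, and check_locators and its arguments are hypothetical: it flags plain page numbers beyond the book's last page (like 18693 typed for 186-93) and reports which range styles appear. A range counts as elided when its second number has fewer digits (186-93) and as full when it keeps all digits and starts at 100 or above (124-127); shorter ranges such as 12-14 look the same in either style and are not counted. Seeing both styles means elision is inconsistent.

```python
import re

# A page range such as 124-127 or 186-93 (hyphen or en dash).
RANGE = re.compile(r"(\d+)[-\u2013](\d+)")

def check_locators(locators, last_page):
    """Spot-check locator strings for an index.

    Returns (too_large, styles): plain page numbers beyond last_page, and the
    set of range styles seen ("full" for 124-127, "elided" for 186-93).
    """
    too_large, styles = [], set()
    for loc in locators:
        m = RANGE.fullmatch(loc)
        if m:
            start, end = m.groups()
            if len(end) < len(start):
                styles.add("elided")
            elif int(start) >= 100:
                styles.add("full")
        elif loc.isdigit() and int(loc) > last_page:
            too_large.append(loc)
    return too_large, styles
```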
Cross-References
- Have see and see also
cross-references been provided?
- A see should direct the reader to a
different term expressing the same concept, such as "Clemens, Samuel. See
Twain, Mark" or "aerobics see exercise".
- A see also should guide the reader from a complete entry to related entries for more and different information. Examples: "Mammals: 81, 85, 105; see also names of individual mammals" and "astronomy 12–14, 56, 68. See also galaxies; planets".
Length and Type
- Is the index length adequate for the complexity of
the book? An index should be 3–5% of the pages in the typical
nonfiction book, perhaps 5–8% for a history or biography, and more
(15–20%) for reference books.
- Is there a need for more than one type of index? For
example, in addition to the usual subject index, perhaps a separate
name or place index is called for. If so, is there one?
Format
- Is the type large enough to be easily read? Do the
index pages look open and not crowded?
- Are the main headings and subheadings (and
sub-subheadings if any) distinguished from each other?
- Is the organization—whether alphabetical,
chronological, or other—accurate, clear, and consistent?
- When an entry’s subheadings "turn a page," that is,
are continued from a right-hand page to a left-hand page, the main
heading should be repeated, followed by the word continued in
parentheses. Depending on the size of the pages, continued
headings might be appropriate for continuations from left to right
pages, or even from left to right columns. Are they present?
- Preferences for punctuation between main headings and
their subheadings and see and see also
cross-references will vary from publisher to publisher. This discussion
features several acceptable variants. The important thing is that the
punctuation style be clear to the reader and consistent. Is it?