
Indexer

by zweibieren
physpics.com
(Release V1.0)

Topics      

Overview
Starting Indexer
Indexer main window
Indexer terms window
Getting ready to start
Criteria for a good index

Overview

Indexer displays the pages of a book and the index entries for each page. You use Indexer to add entries, delete entries, and create new terms that can be applied as entries to pages. Finally, the Create Index command combines the entries from all pages to produce the index.

At the right is a sketch of the main Indexer window: page number to the left, text of the page in the middle, and applicable index entries on the right. Yellow highlights the current work area, where one or more entries are highlighted in blue. Those entries can be deleted by clicking the Remove Entry button.

Indexer main window (blurred)
Available index terms are in a separate window like the one at the right. One or more consecutive terms are highlighted in blue. These will be inserted in the yellow area if the Add Entry button is clicked. The Create new term... button prompts for a new term and adds it to the list. You can have multiple terms windows showing different parts of the set of terms.
Terms window (blurred)

Terminology: I try to consistently use "term" to refer to items that are available to appear in the index. A term that has been chosen for a page is an "entry;" it will actually appear in the index. For instance, one of the terms may be "labor party" and it might be added as an entry for page 20. In the final index, the entry for "labor party" will list 20 as one of the pages.

The index can be output as text or HTML. Here's a sample in HTML.


labor force
    composition of, 67, 94, 96, 103, 131, 188n8–9, 191n10
    growth in, 103, 108, 110, 111, 117, 174, 177, 180
labor party, xi, 14, 15, 20, 21, 122, 124-127, 184n3
labor union
    membership, 4, 15, 16, 35, 36, 52, 57, 59, 74, 81, 93, 94, 159, 189n21, 194n19

Starting Indexer

Indexer has a shortcut in its installation directory, and a shortcut may also be installed on the desktop. It may also be an item in the Start menu. Click any one of these to start Indexer, or drag a prepared .txt file or directory and drop it onto the shortcut. "Prepared" means that the .txt file has $@ page number lines and the directory has index terms in a file called indexterms.txt; see Getting Ready.

From a command line, start Indexer with the command
        java -jar installdir/indexer.jar  filename-optional
where installdir is the Indexer installation directory. Adding a file or directory name at the end of the line will open that file or switch to that directory. After the application starts, you can choose a chapter to work on with Open Chapter on the File menu. To enter demo mode directly, give the command
       java -jar installdir/indexer.jar  -demo
In this mode, Indexer edits a single built-in file and does not access the file system.

When Indexer starts, a DOS command window opens for severe error messages (stack traces). When an error occurs, please send me the contents of this window along with a note describing what caused the error.

Indexer remembers the directory you are working in by storing the name in ".Indexer.ini" in your home directory. Without a file or directory argument, Indexer starts in that directory. A properly set up directory for Indexer work has a file called indexterms.txt. (See below.)

If a directory does not have a file indexterms.txt, you are prompted to create one. If you decline, you can still work, but entries selected will be remembered only with the files you create index entries for. (And any phrases will be lost.)

"Indexer" Main Window

The main Indexer window names the current file and directory in the title bar:

the Indexer main window

From the left, the columns show the page number, the contents of the page, and the index terms that have been selected for that page. As the page was read in, Indexer scanned it for phrases (as given in indexterms.txt). In the picture, the phrases "race to the bottom" and "slavery" resulted in index entries of the same. "Interstate competition" and "Levi" resulted in "labor costs, state" subhead "interstate competition" and "Levi, Margaret." The term United States of America was added with the Add Entry command. The phrase "labor costs" is red because that phrase ties to two different index terms. Neither was automatically listed, so you need to review red phrases to see if any index terms should be added for that page.

The index entries on the "active" page are highlighted in yellow. Additions and removals of index entries occur there. Indexer makes sure that the yellow is associated with one of the pages currently on-screen. If the text is scrolled, the yellow moves to a visible page.

Command Buttons

The menu bar has three buttons for the commands of Indexer.  They are used thus:

Commands can be invoked from menus, and also from the keyboard:

Command                      Keystrokes
Add Entry                    + or Insert or Control-A
Remove Entry                 - or Delete or Control-D
Create new index term ...    Control-N
Save entries                 Control-S

When prompted for a new index term, you can add a new term or add a cross reference. For adding a term you will see three fields:

Adding an index term

The trigger phrase is one or more words; I'll describe this a little later. The index entry is whatever you put in the main heading field together with the sub heading field, if you decide to have one. As long as the entry does not duplicate an existing entry, it is also added at the end of the Available Index Terms window. It is also immediately inserted in the indexterms.txt file. The trigger phrase is for automatic initial creation of the index entries. Each time a chapter is read in, it is scanned for instances of the trigger phrases. Any page with a match is given that index term initially. See the description of the text window.

Clicking the "Cross reference" tab at the top of the dialog box brings up the fields for entering a cross reference:
Four fields for entering a cross-reference: the term where the reference will appear and the term it refers to.
The "under" term is the term in the index where this cross reference will appear; the "See" term is the one that is referred to. The "under" term might be NEA and the "See" term "National Education Association (NEA)". Then the index would have the entries
National Education Association (NEA) 12, 20, 44
NEA. See National Education Association
(Note the special case for acronyms. The trailing instance of "(NEA)" is stripped from the entry for NEA, but appears in the entry referred to.)

After adding terms, you may want to rescan the chapter text to find instances of their phrases. The way to do this is to open the chapter again with Open Chapter from the File menu.

The File Menu

Open Chapter - Prompts for a new chapter and opens it. The file must be a text file and its name must end in .txt. It will be scanned for phrases and those found will be colored blue (or red if multiple terms have matching phrases).

Save Entries - For chapter xxx.txt, this command creates file xxx-index.txt and stores into it all the index entries. It even remembers which entries you have deleted. The chapter is rescanned every time it is opened, but deleted entries do not come back. Entries are saved automatically when you open another chapter, when you exit the program, or when a five-minute timer fires.
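
The naming rule and the "deleted entries do not come back" behavior can be sketched in a few lines of Python. This is only an illustration, not Indexer's actual code: the function names are hypothetical, and since the layout of xxx-index.txt is not described here, the sketch works on in-memory sets rather than reading the file.

```python
from pathlib import Path


def index_file_for(chapter):
    """Map xxx.txt to xxx-index.txt in the same directory."""
    chapter = Path(chapter)
    return chapter.with_name(chapter.stem + "-index.txt")


def entries_after_rescan(scanned, saved, deleted):
    """Entries for one page after its chapter is reopened and rescanned.

    scanned: terms whose trigger phrases were found on the page
    saved:   entries stored by the last Save Entries
    deleted: entries the user removed earlier (assumed to be remembered)
    """
    # Rescanning may find a phrase again, but a remembered deletion wins.
    return (set(scanned) | set(saved)) - set(deleted)
```

For example, if the rescan finds "slavery" and "labor party" on a page but "slavery" was deleted earlier, only "labor party" (plus anything saved by hand) remains.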

Create Index ... - You are prompted with a list of all the ...-index.txt files in the current directory. When you click "Index in text" or "Index in html", the checked files are read, the entries are sorted, and an index is created in index.txt or index.html, respectively.
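
As a rough picture of what "the entries are sorted, and an index is created" means, here is a minimal Python sketch. It is not Indexer's implementation. It assumes entries arrive as (page, term, subterm) tuples, that page numbers are plain integers (no roman numerals, note references, or page ranges), and that subheadings are indented four spaces; build_index is a hypothetical name.

```python
from collections import defaultdict


def build_index(page_entries):
    """Combine (page, term, subterm_or_None) tuples into index text."""
    # term -> subterm (None for the main heading itself) -> set of pages
    index = defaultdict(lambda: defaultdict(set))
    for page, term, subterm in page_entries:
        index[term][subterm].add(page)

    lines = []
    for term in sorted(index, key=str.lower):
        subs = index[term]
        # Pages attached directly to the main heading follow it on one line.
        main_pages = sorted(subs.get(None, ()))
        lines.append(", ".join([term] + [str(p) for p in main_pages]))
        # Subheadings go on indented lines of their own.
        for sub in sorted((s for s in subs if s is not None), key=str.lower):
            pages = ", ".join(str(p) for p in sorted(subs[sub]))
            lines.append(f"    {sub}, {pages}")
    return "\n".join(lines)
```

Using a set per heading means a term entered twice for the same page is listed once.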

New Terms Window - A new instance of the Available Index Terms window is opened. All such windows look and behave alike, except that they may be scrolled differently and each may have its own set of selected entries. The selection is visible only when the window has the input focus.

Exit - Indexer saves any entries. For filename.txt, entries are saved to filename-index.txt. Entries are automatically saved when you switch to another file or exit the program. They are also saved every five minutes.

The Help Menu

About Indexer - Displays some mildly useful information, especially the current directory and file name. You should report the version number in error reports.

Help - Brings up a window displaying this very file.

Enter demo mode - In demo mode, Indexer works on a single built-in file and set of index terms. Creating an index shows it on the screen instead of saving it to a file. Things to try:
  • Choose menu item File/CreateIndex and either html or text. See the nice index.
  • Click "shadow" at the bottom of the terms list window. It turns blue.
  • Click on a page in the main window. Its index entries turn yellow.
  • Click the Add Entry button at the top. "shadow" gets added to the entries in the yellow area.
  • Choose File/CreateIndex again. Now the index has an entry for "shadow" and a cross reference to it.
  • If you add "dark" as a term on some page(s), more cross references will appear. (Cross references do not appear unless the term they point at has associated entries.)

"Available Index Terms" Window

Any term in the "Available Index Terms" window can be assigned to any page in the text. Scroll through the list. Select a term. It turns blue. Click the Add Entry button, and that term becomes an entry for the current page. Select two or more consecutive terms. They turn blue. Click the Add Entry button, and they all become entries for the page.

If you want a new term, use the Create new index term ... button. If you want another copy of the entire window, use New Terms Window in the File menu. The contents of the window are derived from indexterms.txt in the same directory as the open chapter.

Getting Ready to Use Indexer

Indexer operates within a directory containing the document to be indexed and various index files. Indexer is distributed as a java archive file, Indexer.jar. Put a copy of Indexer.jar into the directory for an index.  Then click on that file in a listing in an explorer window. The Indexer windows will appear and you can work on the index.

Alternatively, you can run Indexer from a command line. Change directory to the directory containing the index files. Then give the command:
   java -jar pathname\Indexer.jar

A shortcut to Indexer.jar can be adapted by editing its properties. The target property should have something like this:
      "dir-j\Indexer.jar"
Where dir-j is the path to Indexer.jar; typically something like:
      C:\Program Files\Physpics\Indexer.jar
Then you can drag and drop a file or directory onto the shortcut icon and it will open itself to edit the first of the dropped files.


Indexing is done in a directory devoted to that task. To begin, there must be a list of potential index entries in a file called indexterms.txt and also one or more chapter text files whose names end with .txt. Later, there will be one name-index.txt file for each chapter and final index files named index.txt and/or index.html.

Chapter text files: name.txt

Each section of the book (it doesn't have to be a chapter) needs to be in the Indexer directory as a text file.  Each page of text must begin with a line containing  "$@" and the page number:
    $@1
          Chapter 1.
          Call me Ishmael. ...
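
To show how the "$@" lines divide a chapter file into pages, here is a small Python sketch. It is an illustration only: split_pages is a hypothetical name, the marker is assumed to start its line (after optional white space), and any text before the first marker is assumed to be ignored. Page labels are kept as strings so that roman numerals such as xi still work.

```python
def split_pages(text):
    """Return a list of (page_label, page_text) pairs from chapter text."""
    pages = []
    label, body = None, []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("$@"):
            # A new page starts here; close off the previous one, if any.
            if label is not None:
                pages.append((label, "\n".join(body)))
            label, body = stripped[2:].strip(), []
        elif label is not None:
            body.append(line)
    if label is not None:
        pages.append((label, "\n".join(body)))
    return pages
```

A quick check of a prepared file is to run this over it and confirm that the labels come out in the expected order.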

For my indexing task, I created these files by first writing the chapter as a text file using the original word processing program. Then I went through the text with a text editor (WordPad would work) and added the page number lines. For a two hundred page book, this took an hour or two. {I did cheat a bit. I used XEmacs and a special purpose macro so a mouse click and a single keypress would enter each successive page number.
        The macro is initialized to a starting page number with
              \C-u number \C-x r n p
       Then the next page number is added  by typing \C-z where that key is bound by running
              (fset 'page-number
                 [return return ?$ ?@ return left ?\C-x ?r ?+ ?p  ?\C-x ?r ?g ?p  ?\C-e])
              (global-set-key [26] 'page-number)

}

Terms file: indexterms.txt

The meat of the indexterms.txt is lines containing index terms. The simplest form of a term line is
    <phrase> <WHITE> <term>
where <WHITE> is some combination of TABs and SPACEs. Since <phrase> and <term> can each have spaces, <WHITE> must be at least a TAB or two spaces. More TABs and SPACEs are okay.

When Indexer first looks at a chapter text, it scans for instances of the <phrase>s. If a <phrase> is found, the corresponding <term> is added as an entry for the page.

When a phrase is recognized in the text, it is colored red or blue. If blue, the corresponding term has been made an entry for the page. It can be deleted from the list of entries if it is not appropriate. The phrase is colored red if that phrase appears for more than one index term. For instance, in the first book indexed, the phrase Washington appeared for four index terms: the state of Washington, the city Washington, D. C., the president George, and Tom, an author of a referenced work. So no entries for Washington were made automatically and Washington was red everywhere it appeared.

Phrase words can contain only letters, hyphens, and apostrophes. Other characters are ignored.  The phrase can be omitted and then that term is never automatically added to a page by the initial scan. If the phrase is left out, there must be leading white space, as in
    <WHITE> <term>

If an index category is subdivided, an index term may have subterms. These are written in indexterms.txt as lines of the form
    <phrase> <WHITE> <term> <COLON> <subterm>
Here <COLON> is a SPACE, a colon, and a SPACE. The corresponding index entry will be
    term
       subterm page numbers ...

As a convenience, the <term> may be omitted if it is the same as for the preceding line. These lines would then have the form
    <phrase> <WHITE> <COLON> <subterm>

Besides terms, indexterms.txt may contain blank and comment lines. Comments begin with "//". One comment line can have the form
    // title: title words ...
When the index is generated in html, this book title will appear as the page title for the html page.
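
The line forms above can be read mechanically. The following Python sketch shows one way to do it; it is not the parser Indexer uses. The name parse_terms is hypothetical, lines containing .SEE. (cross references) are simply skipped, and a line with no white-space separator and no leading white space is skipped too, since that case is not specified.

```python
import re

# <WHITE>: at least one TAB or two SPACEs, plus any further TABs and SPACEs.
WHITE = re.compile(r"(?:\t| {2,})[ \t]*")


def parse_terms(lines):
    """Return (phrase, term, subterm) triples; phrase and subterm may be None."""
    result = []
    previous_term = None
    for raw in lines:
        line = raw.rstrip()
        if not line.strip() or line.lstrip().startswith("//"):
            continue  # blank line or comment (including the title comment)
        if ".SEE." in line:
            continue  # cross-reference line, out of scope for this sketch
        if line[0] in " \t":
            # Leading white space: the phrase was omitted.
            phrase, rest = None, line.strip()
        else:
            parts = WHITE.split(line, maxsplit=1)
            if len(parts) < 2:
                continue  # no separator found; unspecified form, skipped here
            phrase, rest = parts[0].strip(), parts[1].strip()
        if rest.startswith(":"):
            # <term> omitted: reuse the term from the preceding line.
            term, subterm = previous_term, rest[1:].strip()
        elif " : " in rest:
            term, subterm = (s.strip() for s in rest.split(" : ", 1))
        else:
            term, subterm = rest, None
        previous_term = term
        result.append((phrase, term, subterm))
    return result
```

Note that a single space inside a phrase or term never splits the line; only a TAB or a run of two or more spaces does.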


The first book indexed had phrases for both New York and New York Times. This works because the longest phrase found is the one used. But "York Times" would not work; the text "New York Times" is recognized as "New York" and not as an instance of "York Times".
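
The New York example, together with the red/blue rule described earlier, can be illustrated with a short Python sketch. It is my reading of the behavior, not the actual scanner: it assumes a left-to-right scan that takes the longest phrase starting at each word, exact-case matching, and ASCII letters only. The names scan_page and phrase_terms are hypothetical.

```python
import re

# Phrase words: letters, hyphens, and apostrophes; everything else is ignored.
WORD = re.compile(r"[A-Za-z'-]+")


def scan_page(text, phrase_terms):
    """Scan one page for trigger phrases.

    phrase_terms maps a phrase to the list of index terms that use it.
    Returns (entries, ambiguous): terms added automatically (blue phrases)
    and phrases tied to more than one term (red phrases, nothing added).
    """
    words = WORD.findall(text)
    phrases = {tuple(WORD.findall(p)): terms for p, terms in phrase_terms.items()}
    longest = max((len(p) for p in phrases), default=0)
    entries, ambiguous = [], []
    i = 0
    while i < len(words):
        # Try the longest candidate starting at word i first.
        for n in range(min(longest, len(words) - i), 0, -1):
            terms = phrases.get(tuple(words[i:i + n]))
            if terms:
                if len(terms) == 1:
                    if terms[0] not in entries:
                        entries.append(terms[0])
                else:
                    phrase = " ".join(words[i:i + n])
                    if phrase not in ambiguous:
                        ambiguous.append(phrase)
                i += n  # matched words are consumed, so "York Times" cannot match
                break
        else:
            i += 1
    return entries, ambiguous
```

With phrases "New York" and "York Times" only, the text "New York Times" yields just the "New York" term, which matches the behavior described above. Because punctuation is dropped before matching, this sketch would also match a phrase across a sentence boundary.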

As new entries are added to indexterms.txt, it gets chaotic. Usually I re-sorted the entries alphabetically using Excel. A general programmatic scheme would certainly be possible and will be considered for the next version of Indexer.

Cross reference entries

Cross references are index entries that direct the reader to look at other index entries. They appear in the index as "see ..." and "see also ...," as in
NYT. See New York Times
race/ethnicity
    home ownership and, 25n3
    equality, struggle for (see racial equality, struggle for)
    political party polarization, 102-3 (see also polarization, racial)
    See also Eastern Europeans; Asians.
These are incorporated in indexterms.txt with lines having the form
 index term .SEE. index term
where either index term may be just a main heading, or may be a main heading, " : ", and a sub heading.  For instance
NEA : members .SEE. National Education Association (NEA) : membership
which will generate in the index as
National Education Association (NEA)
    membership 23, 25, 167-71
NEA
    members (see National Education Association, membership)

Neither the term before nor the term after .SEE. can have an associated phrase. To assign a phrase, put in another line that gives the phrase and its index term.

For the indexterms.txt line "xxx .SEE. yyy", the Available Index Terms window will have a listing of yyy.
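
Here is a hedged Python sketch of how a .SEE. line might be split and turned into index text. The output layout and the acronym rule are inferred from the NEA examples in this document, not taken from Indexer's code, and parse_see_line and render_see are hypothetical names. The four-space indent for subheadings is also an assumption.

```python
def parse_see_line(line):
    """Split 'under .SEE. target' into two (main, sub_or_None) headings."""
    under, target = (part.strip() for part in line.split(".SEE.", 1))

    def heading(text):
        main, _, sub = text.partition(" : ")
        return main.strip(), (sub.strip() or None)

    return heading(under), heading(target)


def render_see(under, target):
    """Format a cross reference the way the examples in this document look."""
    under_main, under_sub = under
    target_main, target_sub = target
    # Acronym special case: drop a trailing "(NEA)" from the target when the
    # cross reference itself appears under NEA.
    suffix = f"({under_main})"
    if target_main.endswith(suffix):
        target_main = target_main[:-len(suffix)].rstrip()
    target_text = target_main if target_sub is None else f"{target_main}, {target_sub}"
    if under_sub is None:
        return f"{under_main}. See {target_text}"
    return f"{under_main}\n    {under_sub} (see {target_text})"
```

For the line "NEA .SEE. National Education Association (NEA)" this gives "NEA. See National Education Association". The sketch does not cover "see also" references or the rule that a cross reference is omitted when its target has no entries.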


Criteria for a Good Index

http://www.asindexing.org/site/checklist.shtml

American Society of Indexers
Indexing Evaluation Checklist

The Index is the KEY to the book

Is the index to your book or web site good enough for your readers?
Here are some helpful insights for ensuring an excellent index.

"An index is not an outline, nor is it a concordance. It's an intelligently compiled list of topics covered in the work, prepared with the reader's needs in mind."

Reader Appropriateness
  • Are the indexed terms appropriate for the intended audience? For example: "heart attack" in a book for the general public, "myocardial infarction" in a book for health professionals; "Taxus" in a work for botanists or horticulturalists, "Yew" in a work for home gardeners.
Main Headings
  • Are the main headings relevant to the needs of the reader? Are they pertinent, specific, comprehensive? Not too general yet not too narrow? Not inane or improbable?
  • Do main headings have not more than 5–7 locators (page references)? If more, they should be broken down into subheadings.
Subheadings
  • Are the subheadings useful? In the example below,
    a) the page range is extensive
    b) the subheading "problems with Republicans" may be too general
           Roosevelt, Franklin
              problems with Republicans, 1–32
  • Are subheadings concise, with the most important word at the beginning? For example, not:
          banks
             and relationship to Federal Reserve bank
    but
          banks
              Federal Reserve regulation
  • Do entries avoid unnecessary words and phrases like "concerning" and "relating to" and proliferation of prepositions and articles?
  • Is the number of subheadings about right? More than one column’s worth is probably too many.
  • Are subheadings overanalyzed? Could they be combined? For example, could "dimensions" be substituted for "height," "width," and "length"? Or should some subheadings become main headings with their own subheadings instead?
  • Do subheadings have more than 5–7 locators? If more, they should either be broken down into sub-subheadings or be changed to main headings.
Double Postings
  • For the reader’s convenience, many subheadings should be double posted—that is, they should exist as main headings too. An example: "Cats: Siamese" and "Siamese cats." Has this been done? Double postings should, of course, have the same locators. Do they?
Locators (Page References)
  • Are the locators accurate? Check a sample of entries to see. Spot-check pagination for nonsense numbers where the hyphen or en dash may be missing, such as 18693 for 186-93. Check that elision (page ranges such as 186-93) is consistent.
  • When locators include roman numerals or volume numbers, does the typography make the usage clear?
Cross-References
  • Have see and see also cross-references been provided?
  • A see should direct the reader to a different term expressing the same concept, such as "Clemens, Samuel. See Twain, Mark" or "aerobics see exercise".
  • A see also should guide the reader from a complete entry to the related entries for more and different information. Examples: "Mammals: 81, 85, 105; see also names of individual mammals" "astronomy 12–14, 56, 68. See also galaxies; planets"
Length and Type
  • Is the index length adequate for the complexity of the book? An index should be 3–5% of the pages in the typical nonfiction book, perhaps 5–8% for a history or biography, and more (15–20%) for reference books.
  • Is there a need for more than one type of index? For example, in addition to the usual subject index, perhaps a separate name or place index is called for. If so, is there one?
Format
  • Is the type large enough to be easily read? Do the index pages look open and not crowded?
  • Are the main headings and subheadings (and sub-subheadings if any) distinguished from each other?
  • Is the organization—whether alphabetical, chronological, or other—accurate, clear, and consistent?
  • When an entry’s subheadings "turn a page," that is, are continued from a right-hand page to a left-hand page, the main heading should be repeated, followed by the word continued in parentheses. Depending on the size of the pages, continued headings might be appropriate for continuations from left to right pages, or even from left to right columns. Are they present?
  • Preferences for punctuation between main headings and their subheadings and see and see also cross-references will vary from publisher to publisher. This discussion features several acceptable variants. The important thing is that the punctuation style be clear to the reader and consistent. Is it?