There are several kinds of incoming URL requests that need to be shunted off to alternative URLs. Most common are probes by webscum looking for weaknesses to exploit. Next are requests for pages that you have renamed or relocated. Finally, you may want to add a few to correct for observed common user errors. To check for remappings, URLhandler consults the database urlmap.sq3.

Each database row has three columns: old-URL, new-URL, and count. The count for a row is incremented each time the row matches an incoming request. The old-URL value is in lower case; incoming URLs are converted to lower case before querying the table. The new-URL can have these kinds of values: a directory (ends with a slash), a file (ends without a slash), an external URL (starts with "http://"), or "GONE". The algorithm matches successively shorter paths in the request, so a map entry for "/pictools/" will also match "/pictools/doc/" and "/pictools/doc/images/". If the request seems evil, map it to "/tarpit.php".
Logging errors - /admin/badlinks.xlsx
When /admin/command.php lists errors in its 403/4 Errors section, I copy them to the "observed errors" worksheet in /admin/badlinks.xlsx. (If you open badlinks.xlsx before mapdata.csv, you will see a dialog box about "links to other data sources"; you can click "Don't Update".) From time to time I look through the list to see whether any entries should be redirected to some other page. These I add to the "mapping" worksheet. When enough new map entries have accrued, I follow the steps below to update the database, and I add the first column of the new entries to the list in the "counts history" worksheet.
The sheet "Existing dirs
" should have a list of all existing directories. This is consulted among the checks conducted by the mapping
sheet. The "latest counts
" sheet records a history of the counts. Entries with very low, or unchanging, counts can be deleted.
Fetch Counts, Revise the Database and Rebuild it
The file /admin/mapdata.csv holds enough data to record the full state of the database. To fetch the counts from the server, the database is written out to mapdata.csv, which is then transferred to the local build machine. To rebuild the database, the new contents are placed in mapdata.csv, which is transferred to the server; the database is then rebuilt from it.
Fetching mapdata.csv is done from directory /admin on the local build machine. Open a shell and give the command "make fetchmapdata". This runs the script extractmapdata.csh on the server and then copies mapdata.csv to the local machine, where it is sorted. The full unsorted state is retained in the file mapdata-unsorted.csv, for consultation when recovering from incorrect changes.
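extractmapdata.csh is not reproduced here; as a rough illustration, the server-side extraction amounts to something like the following, in Python, with the same assumed table and column names as above.

    import csv, sqlite3

    # Server side: dump the current table, counts included, to mapdata.csv.
    con = sqlite3.connect("urlmap.sq3")
    with open("mapdata.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerows(con.execute(
            "SELECT old_url, new_url, count FROM urlmap"))
    con.close()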
Once mapdata.csv is on the local machine, I open it in Excel and then open badlinks.xlsx in that same instance of Excel. The current counts should then be appended to the "counts history" sheet. As noted in the instructions atop that page, insert a blank column before the Mapped Names column and then copy the "New Counts" column. Using the version of Paste that pastes only values, paste the counts into the newly inserted blank column.
After modifying the "mapping" worksheet, sort its table rows on column A. Inspect the "detect" columns for red entries and fix any errors. Finally, copy the first three columns to mapdata.csv.
When adding a row to "mapping", you also need to add the same old-URL value to the list in the "counts history" worksheet.
To then rebuild the database, I open a shell window, change to directory /admin, and issue the command "make rebuildurlmap". It transfers mapdata.csv to the server and then runs buildurlmap.csh on the server.
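Again only as a hedged sketch (buildurlmap.csh is not reproduced here, and the schema is my assumption), the rebuild amounts to recreating the table from the uploaded CSV:

    import csv, sqlite3

    # Server side: recreate the table from the freshly uploaded mapdata.csv.
    con = sqlite3.connect("urlmap.sq3")
    con.execute("DROP TABLE IF EXISTS urlmap")
    con.execute("CREATE TABLE urlmap"
                " (old_url TEXT PRIMARY KEY, new_url TEXT, count INTEGER)")
    with open("mapdata.csv", newline="") as f:
        con.executemany("INSERT INTO urlmap VALUES (?, ?, ?)", csv.reader(f))
    con.commit()
    con.close()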
When logged into the server, you can change directory to /admin and run the csh scripts directly, if you like. The commands are "csh extractmapdata.csh" and "csh buildurlmap.csh".