> pictools > doc > urlHandler
Pictools: How Things Work
V3.3 (435)
Handles requests for missing pages; strategies include the badlinks map and servepicture
Usage: in .htaccess, unsuccessful requests are redirected to this script

Arriving URL requests are processed by the server. Pictools uses mod_rewrite, but I ran into some deficiencies, so requests that do not map immediately to a page are routed to urlHandler.php. The original URL is fetched via $sp->request_uri(). The goal of the analysis is to choose one of these outcomes:
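The .htaccess routing described above might look roughly like this. This is an illustrative sketch, not the rules actually shipped with Pictools; only the destination script name comes from this document:

```apache
# Hypothetical sketch: route any request that does not resolve to an
# existing file or directory to urlHandler.php for analysis.
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ /urlHandler.php [L]
# urlHandler.php then recovers the original request via $sp->request_uri()
```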

  • serve an appropriate page
  • reply with a permanent redirect so that the browser will no longer request the initial URL (the mapping table is urlmap.sq3)
  • fail via forPhyspics.failpage
  • redirect to tarpit.php, a black hole; this is implemented as a SUCCEED with a revised URL of /tarpit.php
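These outcomes can be modeled as a small enumeration. The following is an illustrative Python sketch (the actual handler is PHP, and the type names here are invented for clarity):

```python
from dataclasses import dataclass
from enum import Enum, auto

class Outcome(Enum):
    SERVE = auto()     # serve an appropriate page (SUCCEED)
    REDIRECT = auto()  # permanent redirect; mapping table is urlmap.sq3
    FAIL = auto()      # fail via forPhyspics.failpage
    # TARPIT is not a separate outcome: it is a SERVE of /tarpit.php

@dataclass
class Result:
    outcome: Outcome
    url: str           # the (possibly revised) URL to serve or redirect to

def tarpit() -> Result:
    """The tarpit 'outcome' is a SUCCEED with a revised URL."""
    return Result(Outcome.SERVE, "/tarpit.php")
```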

As you add items to the site, you may find that some names are inaccessible. This happens for names listed in /admin/urlmap.xslx and for a few other names listed in urlHandler.php. The map or urlHandler must sometimes be revised to solve such problems.

These strategies are tried, in order:

  • rewrite "Cincinatti" to "Cincinnati" in current URL
  • redirect requests for /admin/ to TARPIT
  • send requests ending with xmlrpc to TARPIT
  • parse the URL into dir, file, and extension; if the parse fails, FAIL
  • if there is no file, set file to index.php (or an alternative, if one exists)
  • rewrite localcaptions.cap to index.php (for a directory built from a captions file segment)
  • if the current URL refers to an existing file, SUCCEED to it
  • if the URL, or some initial string of it, is in the remap database, set the current URL to the mapped value from the database (many malicious strings map to TARPIT). The database test increments a count of how often each entry is selected.
  • (at this point the code makes several tests to repair requests for pages that have moved. These are commented out for distribution.)
  • if the current URL corresponds to an existing file, SUCCEED to it
  • if the request is for an image in a segment directory, find the captions file and SUCCEED via servepicture.php
  • if the database check found a match, REDIRECT with the resulting url
  • if the request is for xxx/index.php, convert it to xxx/
  • FAIL

From time to time (daily), URL requests arrive for pages that do not exist. These may be from

  • user typos
  • web crawlers following obsolete links to files that have moved or vanished
  • buggy web crawlers
  • webscum seeking vulnerabilities

The known-to-be-evil requests are forwarded to the tarpit and ignored. Others are handled with box404 and logged for the administrator. The log is displayed when the administrator views command.php. When I see such errors, I process them as described in urlmap.

Testing urlHandler

The first column of worksheet 'Tests and Results' in badlinks.xslx is a list of URLs to visit for testing. The rest of the sheet holds predicted and observed results. Copy the first column to the server file /admin/urlTest.txt and then browse to /admin/testUrlHandler.php. The output will be the three columns seen in worksheet 'Tests and Results'. I generally compare the old and new results with cell formulas of the form =X1=Y1.

In urlTest.txt, blank lines are ignored and lines beginning with a hash are special. A #! line terminates the test. A ## line is ignored. Other hash lines are displayed in the output.
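The urlTest.txt line rules amount to a few lines of code. An illustrative Python sketch (the real harness is testUrlHandler.php, and the function name here is invented):

```python
def select_test_urls(lines):
    """Apply the urlTest.txt line rules: blank lines are skipped,
    '#!' ends the test, '##' lines are dropped, other hash lines
    pass through for display, and everything else is a URL to visit."""
    urls, display = [], []
    for raw in lines:
        line = raw.strip()
        if line == "":
            continue                  # blank lines are ignored
        if line.startswith("#!"):
            break                     # '#!' terminates the test
        if line.startswith("##"):
            continue                  # '##' lines are ignored
        if line.startswith("#"):
            display.append(line)      # other hash lines are displayed
            continue
        urls.append(line)             # a URL to test
    return urls, display
```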

Copyright © 2016 ZweiBieren, All rights reserved. Jul 17, 2016 18:07 GMT Page maintained by ZweiBieren