We'll need to move the account signup form onto its own page so that we don't need to have one of those images on every page -- alternatively, we could leave the form where it is, then add the challenge as a last step (probably moving the password part there, so we don't have to send it back to the browser for resubmission from the challenge page).
Agreed. We definitely need a separate registration page.
I also have the challenge of not having root on the webserver machine, and so not being able to easily install Image::Magick or File::Temp (I can work around those problems eventually, of course).
The simplest solution would be to convince the HE sysadmins to install the modules required by captchagen on the web server machine. However, if that's not possible, we could put captchagen on the mail server and link from the registration web page to the CAPTCHA image, but that leaves the problem of the quizzword (the word that the user has to read in the CAPTCHA), which is returned by the routine: how do we get it in this scenario? I see at least two ways:
1. A simple solution that comes to mind is to have the mail server export a certain directory through NFS and have the web server mount it. This way, the quizzword returned by captchagen can be saved in an NFS-exported local file and that file can be read by the web server. Not very clean, but extremely simple.
2. Barring that, we can have a simple service running on the mail server. The mail server does not export anything. When the registration routine running on the web server wants to validate the quizzword, it sends both the typed word and the hash or ID of the captcha to that service on the mail server. The service retrieves the correct info and replies VALID/INVALID. More complex.
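To give an idea of how small the second option could be, here is a rough sketch of the check the mail-server service would perform, written in Python purely for illustration. The function name check_quizzword, the in-memory dict, and the single-use behaviour are all my assumptions; nothing like this exists in captchagen or the SG code today.

```python
import hmac

def check_quizzword(captcha_id, typed_word, pending):
    """Answer one validation request with "VALID" or "INVALID".

    `pending` is a hypothetical store on the mail server that maps each
    captcha ID to the quizzword captchagen returned for it.
    """
    # pop() makes each captcha single-use, so a solved ID cannot be replayed.
    expected = pending.pop(captcha_id, None)
    if expected is None:
        return "INVALID"
    typed = typed_word.strip().lower()
    # compare_digest does a constant-time comparison of the two words.
    if hmac.compare_digest(typed.encode("utf-8"), expected.encode("utf-8")):
        return "VALID"
    return "INVALID"

# Example: the web server forwards the captcha ID and what the user typed.
pending = {"a1b2c3": "orange"}
print(check_quizzword("a1b2c3", " Orange ", pending))  # prints "VALID"
print(check_quizzword("a1b2c3", "orange", pending))    # prints "INVALID" (already used)
```

The quizzword never leaves the mail server in this arrangement; only the verdict does.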
As for your approach, I agree with one reservation (see below).
a) when the challenge page is created, it calls captchagen with a safe random filename (perhaps a hash of the proposed username) that creates the file in the sg web folder. It receives the word (probably lc()'s the word), and uses a private hash algorithm to create a hash that is included as a hidden input (I guess this could be the filename, for that matter) -- the image is included on the page
b) when the user submits the challenge form, the user typed word (probably lc()'ed) is hashed using the private algorithm to see if it matches the hidden input. If so, create the account.
Actually, we cannot send to the user a hash of the quizzword. Remember, some day, the SG code will be public, and everyone will be able to see what hash method you use for producing that hidden input. Then, considering that captchagen's dictionary has only 14,000 words, it will be very simple to hash all the dictionary words and compare their hashed value to the one sent by the registration page. And voila, Spammy got himself an SG registration script. Double plus ungood.
Incidentally, the attack described above is the reason why /etc/passwd on Unix doesn't contain hashed passwords anymore. Dictionary attacks on these hashes were way too common.
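To make the dictionary attack concrete, here is a short illustrative sketch in Python. SHA-256 stands in for whatever "private" hash would be used, and the three-word list stands in for captchagen's dictionary; both are assumptions for the example only.

```python
import hashlib

def hash_word(word):
    # Stand-in for the "private" hash; once the source is public, so is this.
    return hashlib.sha256(word.lower().encode("utf-8")).hexdigest()

def crack(hidden_value, dictionary):
    """Return the dictionary word whose hash equals hidden_value, or None."""
    for word in dictionary:
        if hash_word(word) == hidden_value:
            return word
    return None

# Tiny stand-in for captchagen's word list.
words = ["apple", "orange", "banana"]
leaked = hash_word("orange")   # what the hidden input would contain
print(crack(leaked, words))    # prints "orange"
```

With only 14,000 candidate words, the loop finishes almost instantly no matter which hash function is chosen.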
What we can do is send (e.g., as a hidden input) the name of the local server-side file that contains the quizzword. Since the name is unique and the file is short-lived, there is no risk of attack here.
So your approach becomes (diffs marked with *asterisks*):
a) when the challenge page is created, it *first verifies that the proposed user name is not already taken. Then it* calls captchagen with a safe random filename (perhaps a hash of the proposed username *and of the localtime() timestamp, or we can use File::Temp if available*) that creates the file in the sg web folder. It receives the quizzword (probably lc()'s the word), *saves the quizzword in a temp file and puts that temp file name in the web page as a hidden input* (I guess this could be the filename, for that matter) -- the image is included on the page
b) when the user submits the challenge form, the user typed word (probably lc()'ed) is *compared to the word in the server-side file. If they match,* create the account.
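For what it's worth, the server-side half of steps a) and b) could look roughly like the sketch below (Python, for illustration only). The names create_challenge and verify_challenge, the random hex token used as the file name, and the check on the token's format are my assumptions, not existing SG code; the format check is an extra precaution so that the hidden input cannot be made to name some other file.

```python
import os
import secrets
import tempfile

def create_challenge(quizzword, store_dir):
    """Save the lowercased quizzword server-side; return the hidden-input token."""
    token = secrets.token_hex(16)  # random name, reveals nothing about the word
    with open(os.path.join(store_dir, token), "w", encoding="utf-8") as fh:
        fh.write(quizzword.lower())
    return token

def verify_challenge(token, typed_word, store_dir):
    """Return True if typed_word matches the stored quizzword."""
    # Accept only names create_challenge could have produced.
    if len(token) != 32 or any(c not in "0123456789abcdef" for c in token):
        return False
    path = os.path.join(store_dir, token)
    try:
        with open(path, encoding="utf-8") as fh:
            expected = fh.read()
    except FileNotFoundError:
        return False
    os.remove(path)  # one attempt per challenge, so the file stays short-lived
    return typed_word.strip().lower() == expected

# Example run in a throwaway directory.
with tempfile.TemporaryDirectory() as store:
    token = create_challenge("Orange", store)
    print(verify_challenge(token, "ORANGE", store))  # prints True
    print(verify_challenge(token, "orange", store))  # prints False (file already removed)
```

Deleting the file on the first attempt, right or wrong, means a script cannot keep guessing against the same image.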
The determining factor will be whether or not you can run captchagen on the web server. Please let us know ASAP.