captchagen

Discussion re sg development. You don't have to be a developer.

Postby josh » Mon Jan 26, 2004 5:36 pm

so the "use case" goes like this:

1) user submits desired username & email to sg web server (not password, since it won't stick)

2) sg webserver validates username to make sure it's not in use -- if so, returns error (like now)

3) if username is available, sg server
a) gets a word from the dictionary
b) uses lwp to call gourmet, which will generate image and provide the filename
c) uses secret phrase, new username, and word to generate hash
d) makes a page with a form that includes the new username and email address as hidden inputs, includes the hash as a hidden input, shows the captchagen image, and provides displayed inputs for new password (twice) and word-in-the-image

4) user submits form, sg webserver validates that the two passwords match and then re-hashes secret phrase, username, and input word. It then compares the new hash with the input hash. If they match, it creates the account (otherwise it gives an error and re-displays the previous page with the form)


does that sound right? If so, it's cool because there's no persistence on the sg webserver until it creates the account.

Should the type-in-word be considered case sensitive? If not, we can lc() the relevant variables.
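
Steps 3c and 4 could be sketched roughly as follows. This is only an illustration: it assumes HMAC-SHA256 from the core Digest::SHA module in place of whatever hash the sg server really uses, and $SECRET, make_hash and check_hash are made-up names. The lc() call shows the case-insensitive variant.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

# Hypothetical server-side secret phrase; it is never sent to the browser.
my $SECRET = "replace-with-the-real-secret-phrase";

# Step 3c: hash of username and dictionary word, keyed by the secret phrase.
# lc() on the word makes the later comparison case-insensitive.
sub make_hash {
    my ($username, $word) = @_;
    return hmac_sha256_hex($username . "\0" . lc($word), $SECRET);
}

# Step 4: re-hash what the user typed and compare with the hidden-input hash.
sub check_hash {
    my ($username, $typed_word, $form_hash) = @_;
    return make_hash($username, $typed_word) eq $form_hash ? 1 : 0;
}

my $hash = make_hash("newuser", "Pickle");
print "right word: ", check_hash("newuser", "pickle", $hash), "\n";
print "wrong word: ", check_hash("newuser", "pickel", $hash), "\n";
```

Because the username goes into the hash too, a hash issued for one username cannot be replayed for another.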
josh
 
Posts: 1371
Joined: Fri Aug 29, 2003 2:28 pm

Almost that

Postby SysKoll » Mon Jan 26, 2004 5:46 pm

Josh,

That's almost it. The only trouble in your scenario is that we can have a potential race condition here if two users decide to register the same name simultaneously.

The remedy is to recheck the availability of the user name in stage 4, right when you create the account, and allow the user to pick a new name if it's already taken. Keep stage 2, since it will be the most common case of duplicate detection.

Corrected stage 4 is:

4) user submits form, sg webserver validates that the two passwords match and then re-hashes secret phrase, username, and input word. It then compares the new hash with the input hash. If they match, it attempts to create the account (otherwise it gives an error and re-displays the previous page with the form). If the account name is already taken, same as stage 2.
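
A rough sketch of "attempt to create" as a single atomic step. It assumes, purely for illustration, that an account is one file per username in a scratch directory; with a real database, a unique constraint on the username column would play the same role. create_account and $ACCOUNT_DIR are hypothetical names.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use File::Temp qw(tempdir);

# Hypothetical account store: one file per username in a scratch directory.
my $ACCOUNT_DIR = tempdir(CLEANUP => 1);

# Try to create the account. O_EXCL turns "check, then create" into one
# atomic operation, so only one of two simultaneous registrations succeeds.
# The username is assumed to have been validated already (no slashes etc.).
sub create_account {
    my ($username, $email) = @_;
    sysopen(my $fh, "$ACCOUNT_DIR/$username", O_WRONLY | O_CREAT | O_EXCL)
        or return 0;    # name already taken: handle it like stage 2
    print {$fh} "$email\n";
    close $fh;
    return 1;
}

print "first attempt:  ", create_account("alice", "a\@example.com"), "\n";
print "second attempt: ", create_account("alice", "b\@example.com"), "\n";
```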
-- SysKoll
SysKoll
 
Posts: 893
Joined: Thu Aug 28, 2003 9:24 pm

No need for LWP

Postby SysKoll » Mon Jan 26, 2004 6:17 pm

Josh,

Regarding point 3b above, no need for LWP, it's even simpler than that.

If you look at the server code, you'll notice that it doesn't try to be particularly clever at analyzing the HTTP request containing the quizword. All it looks for is the GET line. Then it waits for a blank line marking the end of the HTTP header. This allows testing from a browser. So in a Perl client, all you need is to send the correctly formed GET line and a blank line. Here is a code sample:

Code:
#! /usr/bin/perl -w

use strict;
use IO::Socket::INET;

# Remote host and port of the captcha server.
# Placeholder values: set them to match your installation.
my $RmtHost = "localhost";
my $RmtPort = 8080;

sub testcaptcha($) {
  my $quizword = shift(@_);  # Arg1: quizword
  my $socket = IO::Socket::INET
      ->new(PeerAddr => $RmtHost,
       PeerPort => $RmtPort,
       Proto => "tcp",
       Type => SOCK_STREAM)
      or die "testcaptcha: cannot connect to host $RmtHost port $RmtPort: $@\n";
  # Send the quizword to server - Note the blanks after
  # the HTTP/1.1 and the double \n
  print $socket "GET /q=$quizword HTTP/1.1   \n\n";
  # Send a blank line to tell server we're done
  print $socket "\n\n";

  # Now we can get the server's response
  my $response;
  {     # Put perl in slurp mode: read until the server closes the socket
     local $/;
     $response = <$socket>;
  }
  print "Server response is $response\n";
  close $socket;
  return;
}

# main
testcaptcha("sgrocks4242");


Then you can browse the image directory and display the file name contained in the server response. It should contain your "sgrocks4242" quizword.
-- SysKoll

Postby josh » Mon Jan 26, 2004 11:05 pm

OK - it's working (more or less) at:

http://spamgourmet.com/index2.pl?languageCode=ZZ

If you get a chance, try it out (this will probably result in creating a real new user). Rather than give instructions here, I'll be silent, and let you all form opinions of where the process falls short (it certainly does in terms of dialogs).

Josh

Postby maratheamit » Tue Jan 27, 2004 1:42 am

Just took a peek. The second page does need some explanation, but other than that I don't see any problems: I tried simulating the race condition SysKoll mentioned and one user was correctly asked to choose another name.

We should create a cron job on the captcha server to remove old image files. And change the secret phrase and captcha background every few weeks.
maratheamit
 
Posts: 82
Joined: Fri Aug 29, 2003 2:35 pm

Postby maratheamit » Tue Jan 27, 2004 1:49 am

I noticed that when the user makes a mistake in entering the captcha word the web server displays another image. My first reaction was that this is not very efficient: why not display the same image again?

But then, it could be that a particular captcha image is not very clear and for such cases it makes sense to generate a new image.

Seems to work for me

Postby SysKoll » Tue Jan 27, 2004 2:09 am

I tried the account creation on the URL you mentioned and it seems to work. I even tested trying to register an existing user name, and stages 2 and 4 (before and at the captcha) correctly give me an error.

The error message you get is not too visible, though. It's pretty short ("The username XXXX is taken. Please try another one.") and it's buried in the login sidebar. It'd be better to send the user to an intermediate page with a lone, visible "please choose a different user name" message and with an OK button sending you back to the non-logged-in page.

TODO: you might want to lowercase the user's input before hashing for added comfort. Right now it's case-sensitive.
-- SysKoll

Old image deletion is automatic

Postby SysKoll » Tue Jan 27, 2004 2:13 am

maratheamit wrote: We should create a cron job on the captcha server to remove old image files. And change the secret phrase and captcha background every few weeks.


Amit,

The cron job is not necessary. The captcha server runs a child process that cleans up old images automatically. The images stay 15 to 25 minutes on the disk, then they are deleted. That should be more than enough.
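
For illustration only, the cleanup pass could look something like the sketch below. It is not the actual captchagen code: remove_old_images is a made-up name, and the demo runs in a scratch directory with a 15-minute age limit.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Delete regular files in $dir whose modification time is more than
# $max_age seconds in the past. Returns the number of files removed.
sub remove_old_images {
    my ($dir, $max_age) = @_;
    my $cutoff  = time() - $max_age;
    my $removed = 0;
    opendir(my $dh, $dir) or die "cannot open $dir: $!\n";
    for my $name (readdir $dh) {
        my $path = "$dir/$name";
        next unless -f $path;
        my $mtime = (stat $path)[9];
        next unless defined $mtime && $mtime < $cutoff;
        $removed++ if unlink $path;
    }
    closedir $dh;
    return $removed;
}

# Demo in a scratch directory: one fresh file, one backdated by 20 minutes.
my $dir = tempdir(CLEANUP => 1);
for my $name ("fresh.png", "stale.png") {
    open(my $fh, ">", "$dir/$name") or die "cannot write $name: $!\n";
    close $fh;
}
my $old = time() - 20 * 60;
utime($old, $old, "$dir/stale.png");
print "removed ", remove_old_images($dir, 15 * 60), " file(s)\n";
```

A child process would simply call such a routine in a loop with a sleep between passes.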
-- SysKoll

Postby josh » Tue Jan 27, 2004 3:05 am

maratheamit wrote: But then, it could be that a particular captcha image is not very clear and for such cases it makes sense to generate a new image.


There was that reason, and also -- this didn't occur to me right away -- there's no way to know what the first word was because of the one-way hash.

Anyway, I'm open for suggestions for explanatory text -- remember it has to be translated into umpteen languages....

Postby SysKoll » Tue Jan 27, 2004 3:49 pm

josh wrote: Anyway, I'm open for suggestions for explanatory text -- remember it has to be translated into umpteen languages....


I'll create a new topic for that. This one should be reserved for the captcha programming per se.
-- SysKoll

Postby josh » Wed Jan 28, 2004 9:31 pm

you guys probably saw this:

http://boingboing.net/2004_01_01_archiv ... 8693964966

basically, the account creation script gets to the point where a captcha image is presented, then it puts a tag to the image on another website where free porn is offered to people who are willing to type the word in the image. Apparently, the delay is negligible :). Then the script continues with registration, using the newly acquired word.

I was thinking we could configure apache to reject requests that don't have the sg page as a referrer -- this could work, but would create problems for "privacy enhanced" browsers that don't send the referer info.

Very skeptical

Postby SysKoll » Wed Jan 28, 2004 10:53 pm

Josh,

I saw that, but I am highly skeptical this could work. What's the scenario here?

  1. Spammer script wants to register fake user, detects captcha
  2. Script posts captcha on porn web page, asks porn guzzler to type the word in order to see Hillary naked
  3. Porn guzzler types quizword
  4. Now porn web server has to go back to the script with the correct (hash,quizword) pair. How? I see three solutions:
    a. The script has shipped the whole captcha reference and the attached hidden fields to the porn web server. When the porn guzzler solves the captcha, the request goes directly to our SG registration. This can be defeated by checking the referer.
    b. The script listens to a socket to the porn web server. The web server has an exec that tells the script when its captcha has been solved.
    c. The porn web server maintains a database of solved captchas that the script polls.

4a can be defeated by the referer verification -- unless the porn web site acts as an intermediary and changes the referer field in the HTTP request header going to our site. As for 4b and 4c, they seem technically quite difficult. That's why I have my doubts.

But assume the script and the web server are really programmed to do that. The simplest countermeasure for any of the techniques above would be to set a max delay (say 3 mins) between the moment the image is displayed and the moment the quizword has to be typed in and sent. That would give the script far too small a window. I refuse to believe that there are that many people registering for free porn.

Of course, you don't want to put a cleartext timestamp cookie or hidden field in the query you send out (remember, you cannot trust the response's content because it might be a script). You have two solutions:
  • You put an encrypted timestamp in the cookie, but you'd have to make it safe: you have to encrypt (not hash) the time stamp so that you can decrypt it in the answer, and the encrypted string has to refer to the captcha so that the script cannot just respond with the timestamp of the most recent captcha.
  • Or you can just check that the captcha file is still there when you receive the answer. Remember, the captcha file is removed from the captcha server after about 15 mins as of today. That lifetime could be much shorter.
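
The second option might be sketched as below. This is a guess at the shape, not existing code: captcha_is_fresh is a made-up name, and it assumes the checking code can see the image directory locally and still knows the file name the captcha server returned. The extra age test gives the 3-minute limit even if the cleanup process runs less often.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# True if the captcha image still exists and is at most $max_age seconds
# old. $image_dir and $filename are hypothetical inputs.
sub captcha_is_fresh {
    my ($image_dir, $filename, $max_age) = @_;
    # Refuse anything that could escape the image directory.
    return 0 unless $filename =~ /\A[\w.-]+\z/ && $filename !~ /\A\./;
    my $mtime = (stat "$image_dir/$filename")[9];
    return 0 unless defined $mtime;    # file already cleaned up
    return (time() - $mtime) <= $max_age ? 1 : 0;
}

# Demo in a scratch directory with a 3-minute limit.
my $dir = tempdir(CLEANUP => 1);
open(my $fh, ">", "$dir/abc123.png") or die "cannot write image: $!\n";
close $fh;
print "just created:     ", captcha_is_fresh($dir, "abc123.png", 180), "\n";
my $old = time() - 600;
utime($old, $old, "$dir/abc123.png");
print "after 10 minutes: ", captcha_is_fresh($dir, "abc123.png", 180), "\n";
```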


Conclusion: I am skeptical this spammers' scheme could work, but if it does, there are solutions that would be very quick to code and deploy.

My recommendation: ignore this for now.
-- SysKoll

Postby maratheamit » Thu Jan 29, 2004 2:48 am

I second SysKoll's recommendation: do nothing for now.

But over the next few months we should try to put in place his suggestion of checking for the existence of the image file before allowing registration. It is a neat idea and will be very easy to implement.

Postby vellire » Fri Jan 30, 2004 7:08 pm

josh wrote: I was thinking we could configure apache to reject requests that don't have the sg page as a referrer -- this could work, but would create problems for "privacy enhanced" browsers that don't send the referer info.


<random comment>
So block anything other than sg with the exception of nullstrings?
</random comment>
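
In Perl terms that rule might look like the sketch below. referer_allowed is a made-up name, and the only host accepted is spamgourmet.com, taken from the test URL earlier in the thread.

```perl
use strict;
use warnings;

# Accept a request if the Referer header is absent or empty (privacy
# browsers), or if it points at the sg site. Anything else is rejected.
sub referer_allowed {
    my ($referer) = @_;
    return 1 if !defined $referer || $referer eq "";
    return $referer =~ m{\Ahttps?://(?:www\.)?spamgourmet\.com(?:[:/]|\z)}i ? 1 : 0;
}

for my $ref (undef, "", "http://spamgourmet.com/index2.pl", "http://example.com/page") {
    my $shown = defined $ref ? "'$ref'" : "(none)";
    print "$shown => ", referer_allowed($ref), "\n";
}
```

Note that a script can simply leave the header out, so this only stops the case where a real browser loads the form or image from somebody else's page.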
vellire
 
Posts: 27
Joined: Tue Jan 27, 2004 7:42 pm
