Avoiding automatic account generation

Posted: Fri Aug 29, 2003 1:43 pm
by Guest
By: syskoll
Avoiding automatic account generation
2003-02-11 14:50
If we authorize outgoing unsolicited messages to be sent from SG, we have to make sure we don't turn SG into a spammers' haven. There are two axes:

1. Throttling messages. Josh and I (Fred) agreed on the "generic throttling" algorithm, similar to basic TCP/IP bandwidth throttling. This very simple algorithm requires two new data elements per user.

2. Limiting account generation. Since Spammy can't use an SG account to spew spam (he'd quickly run into the enforced limit), he'll try to create multiple accounts from scripts. We need to stop him.
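
The thread does not spell out the "generic throttling" algorithm from item 1. One common reading is a token bucket, which needs exactly two stored values per user: the remaining allowance and the time it was last updated. A minimal Python sketch under that assumption (the names UserThrottle, allow_send, capacity and refill_per_sec are hypothetical, and the 500-per-day figure is only an example):

```python
from dataclasses import dataclass


@dataclass
class UserThrottle:
    """Two per-user values, as a token bucket would need (assumed design)."""
    tokens: float        # messages the user may still send right now
    last_update: float   # timestamp (seconds) of the last refill


def allow_send(state: UserThrottle, now: float,
               capacity: float = 500.0,
               refill_per_sec: float = 500.0 / 86400.0) -> bool:
    """Refill the bucket for the elapsed time, then try to spend one token."""
    elapsed = max(0.0, now - state.last_update)
    state.tokens = min(capacity, state.tokens + elapsed * refill_per_sec)
    state.last_update = now
    if state.tokens >= 1.0:
        state.tokens -= 1.0
        return True
    return False
```

With these defaults a user can send a burst of up to 500 messages, after which the allowance grows back at 500 per day.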

Yahoo had the same problem. Their solution is to ask the user to identify a word from a randomly chosen image made with a low-contrast font on a multicolored background to defeat OCR. We could do that or a variant at account creation time. Spammy would have to become a really good programmer to defeat that. He could still open a lot of accounts manually. But is this really an issue? With a limit of, say, 500 messages a day, even 1000 accounts will not allow any real amount of spam by spammers' standards, and these accounts would be flagged in SG's logs, allowing the admin to investigate.

Another idea is to ask the user to pay a very small sum through PayPal. This is nice because it allows a degree of traceability. Spammy would not risk using a credit card to open an account for spamming. On the other hand, asking for money creates a whole new slew of problems.

Finally, there is the idea of throttling outgoing email by IP address. Spamfarms often have their own domains with multiple addresses so that would not be sufficient. This would only stop the chickenboner in his trailer, not the really nefarious guys.

Other suggestions? Ideas?

-- SysKoll


By: maratheamit ( Amit Marathe )
RE: Avoiding automatic account generation
2003-02-13 14:30
I feel the PayPal solution would discourage users from trying out SG. So we should be adopting something along the lines of Yahoo. Are images like the ones Yahoo uses freely available? If not, we can go for a simpler option: ask the user to enter a specific number from a list, or ask them to add two numbers, ... This is not as user-friendly as identifying a number in an image but it would be better than nothing.

-- Amit


By: syskoll
Images to identify
2003-02-13 14:59
Yes, I am not too much in favor of a PayPal solution. I like the Yahoo image system. I suggest this:

- offline (once and for all, not on the fly), we run a script to generate a few hundred images with words in a low-contrast color scheme. We can use ImageMagick and/or GIMP's ScriptFu. I know how to do this in ImageMagick so I'll volunteer.

- we store the expected result somewhere as well. If img42.gif is generated with the word "penguin" then we have a content42.txt containing that word.

- we present a random image to the user, making sure the HTML does not give any indication of the chosen image name (we need a scrambled temp name)

- we check that he gives the correct answer.
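
The four steps above can be sketched roughly as follows. This is a sketch under assumptions, not SG code: ANSWERS, issue_challenge, image_for and check_answer are hypothetical names, the in-memory dict stands in for the content42.txt files, and the random token plays the role of the scrambled temp name. Making each token single-use is an extra assumption, not something the post asks for.

```python
import secrets

# Stand-in for the pre-generated pairs (img42.gif <-> content42.txt).
ANSWERS = {"img42.gif": "penguin", "img43.gif": "hello"}

# token -> real image name; kept on the server only.
_pending = {}


def issue_challenge():
    """Pick a random image and return an opaque token to put in the HTML."""
    image = secrets.choice(sorted(ANSWERS))
    token = secrets.token_hex(16)   # reveals nothing about the image name
    _pending[token] = image
    return token


def image_for(token):
    """Resolve a token to the image file to serve (None if unknown)."""
    return _pending.get(token)


def check_answer(token, typed):
    """Check the typed word; each token can be used only once."""
    image = _pending.pop(token, None)
    if image is None:
        return False
    return typed.strip().lower() == ANSWERS[image]
```

The sign-up page would reference the image only through the token, so a script scraping the HTML never sees img42.gif.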

Of course, Yahoo can come up with a patent for their little system. Sillier things have been patented. Josh, this is your department. What do you think?

-- SysKoll


By: maratheamit ( Amit Marathe )
RE: Images to identify
2003-02-13 16:39
I second Syskoll's idea above. And we can use one-way hash functions to avoid storing the words anywhere in cleartext, i.e. we associate each image with the MD5 hash of the word it contains (rather than the word itself). Checking the validity of the user response might be easier with this...

We would still need to generate scrambled file names to keep determined spammers from building a table of (filename, word) pairs.
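
A minimal sketch of the hashed-answer check, assuming Python's standard hashlib; HASHES, word_hash and answer_matches are hypothetical names. MD5 is used only because that is what this post proposes. Since the words come from a small dictionary, an unsalted hash can be reversed by hashing a word list, so a keyed hash with a server-side secret may be worth considering.

```python
import hashlib
import hmac


def word_hash(word: str) -> str:
    """MD5 hex digest of the normalised (trimmed, lower-cased) word."""
    return hashlib.md5(word.strip().lower().encode("utf-8")).hexdigest()


# Stored next to each image instead of the cleartext word (example data).
HASHES = {"img42.gif": word_hash("penguin")}


def answer_matches(image: str, typed: str) -> bool:
    """Hash the user's reply and compare it with the stored digest."""
    expected = HASHES.get(image)
    if expected is None:
        return False
    return hmac.compare_digest(word_hash(typed), expected)
```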

-- Amit


By: jqh1 ( Josiah Hamilton )
RE: Avoiding automatic account generation
2003-02-13 17:01
Sure - this would involve moving the quick account create form off the main page and creating a link to another main page view that would include the current sign-up parameters, the image, and a box to type in the image text.

As to the potential patent issue, I'll give it some thought (probably not much research, though) -- As you guys know, running a website at all is probably a patent infringement :)

I don't have too much file space on the server, but I probably have enough to store a meaningful number of images.


BTW, if you get a chance, check out the PC mag sg review at:

http://www.pcmag.com/article2/0,4149,849253,00.asp

They pretty much trashed it - it looks to me like they didn't spend enough time to understand all the features (e.g., they harp on the bogeyman possibility of hostile address creates without mentioning watchwords or prefix). If you have time, put in a user review (we're all users here, too, right?). I put one in that attempted to refute some of the claims, and I know at least one other user has put in a review -- I mentioned the matter in the "what's new" popup.

Anyway, back to the image thing, we could actually dynamically generate the images on the "graphs.spamgourmet.net" server (actually the test server). Using a one-way hash scheme could conceivably work even splitting it up this way. CPU's not much of an issue on the test server, so we wouldn't have that concern.




By: syskoll
Auto image generation with ImageMagick
2003-02-13 21:13
I re-read the man page for ImageMagick and I came up with a small script (a command actually) to embed a light blue word in a background image. Here, the background is backgnd.gif and the resulting image is result.gif. The whole thing is blurred on top of that.

It takes 0.87 seconds to do that on my 800 MHz machine.

convert -fill lightblue -affine 8,0,0,8,0,0 -font /usr/share/fonts/default/Type1/p052003l.pfb -draw "text 20,20 hello" -blur 2x3 backgnd.gif result.gif

The word "hello" in the result.gif image is easy to see for a human eye but would be very hard for an OCR program.

Try it. It's very easy to do with ImageMagick.
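
For the offline batch run discussed earlier in the thread, the command above could be wrapped in a small driver. This is a sketch only: the flags are copied from the command as posted and not checked against any particular ImageMagick version, and convert_argv, generate and the imgN.gif naming are hypothetical.

```python
import subprocess

FONT = "/usr/share/fonts/default/Type1/p052003l.pfb"  # path from the post


def convert_argv(word, background, out_path):
    """Build the argument list for one image, mirroring the command above."""
    return [
        "convert", "-fill", "lightblue",
        "-affine", "8,0,0,8,0,0",
        "-font", FONT,
        "-draw", f"text 20,20 {word}",
        "-blur", "2x3",
        background, out_path,
    ]


def generate(words, background="backgnd.gif", run=False):
    """Return (argv, image name, word) per word; run convert only if asked."""
    jobs = []
    for i, word in enumerate(words):
        out = f"img{i}.gif"
        argv = convert_argv(word, background, out)
        if run:
            subprocess.run(argv, check=True)  # requires ImageMagick installed
        jobs.append((argv, out, word))
    return jobs
```

Passing the arguments as a list avoids shell quoting problems with the -draw string.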

-- SysKoll


By: nobody ( Nobody/Anonymous )
RE: Avoiding automatic account generation
2003-02-18 11:52
AutoImage generation looks right.

By the way, Yahoo is not the only one using that trick. Altavista used it years ago (to prevent script submissions), PayPal is using it, Overture is using it (for login), and many more...

To the point of the question (preventing spammers), here are more ideas:

Time limiting (e.g. 15 emails per minute).
Limit the number of recipients per email (e.g. a total of 15 addresses in to/cc/bcc).
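
Both limits can be sketched in a few lines of Python. The figure 15 comes from this post; the names MAX_RECIPIENTS, MAX_PER_MINUTE, count_recipients and RateWindow are hypothetical, and a sliding one-minute window is only one possible reading of "time limiting".

```python
from collections import deque
from email.message import Message
from email.utils import getaddresses

MAX_RECIPIENTS = 15
MAX_PER_MINUTE = 15


def count_recipients(msg: Message) -> int:
    """Count addresses across the To, Cc and Bcc headers."""
    headers = []
    for name in ("To", "Cc", "Bcc"):
        headers.extend(msg.get_all(name, []))
    return len([addr for _, addr in getaddresses(headers) if addr])


class RateWindow:
    """Sliding window: at most `limit` sends in any `period` seconds."""

    def __init__(self, limit=MAX_PER_MINUTE, period=60.0):
        self.limit = limit
        self.period = period
        self.sent = deque()   # timestamps of recent accepted sends

    def allow(self, now: float) -> bool:
        # Forget sends that have fallen out of the window.
        while self.sent and now - self.sent[0] >= self.period:
            self.sent.popleft()
        if len(self.sent) >= self.limit:
            return False
        self.sent.append(now)
        return True
```

A message would be accepted only if count_recipients(msg) <= MAX_RECIPIENTS and the sender's RateWindow allows it.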

=Bob


By: syskoll
Image recognition as anti-script measure
2003-02-18 14:37

The funny thing is that the specific program used by Yahoo, called Gimpy, has often been used as a Turing test (only a human can answer). As such, it was an irresistible challenge for some academics, and it has been cracked:

http://www.berkeley.edu/news/media/rele ... gimpy.html

They say that they beat Gimpy in 83% of the cases.

Now, I doubt that spammers are ever going to deploy a Gimpy breaker, but just to be sure, I'll deploy my own version of Gimpy, which hasn't been subjected to any kind of research. Besides, Gimpy is based on The GIMP and Script-Fu, and as such is relatively heavy. The little ImageMagick trick I mentioned above is very lightweight.

Time/volume limitation per account is one of the two necessary components, of course (the other being account limitation).

-- SysKoll