pressure valve throttling

Discussion re sg development. You don't have to be a developer.

Postby josh » Sat Oct 18, 2003 2:42 am

OK - I just went over Amit's "throttling with a daemon" thread.

I like the job spool architecture, too. I can tell you that the file space part scares me, though, even under normal usage. It's my understanding that the volume of mail going through the system is fairly extraordinary, and I think there would be significant risks of running out of disk space, unless the daemon stayed pretty busy.

Now that I think about it, though, it sort of works this way now, but without the daemon part -- sendmail does the spooling if necessary ahead of executing the sg code, and the spool grows when things heat up, then clears later. There's no throttling there, though.

Throttling does take on a whole new form when a daemon is involved in any manner -- we might end up storing a pretty big hash in memory -- up to a theoretical limit of one entry per user. I guess that's not too bad, though.

If I could alter the subject slightly -- I'm on the verge of adding the much more tedious file/db throttling. The db columns are in place, and I've started introducing the variables to both the old and new code (a pain... I've been working on the Mail::Audit header thing, too, but to no avail).

Here's what I'm planning:

1) implement the database throttling as Syskoll previously described -- introduce an interval and maxcount to the config file (actually, a set for sending and for receiving), then maintain the database columns in the Users table, by updating the time to the current time() and setting the count to 0 if there's been a gap greater than the interval, or incrementing the count if not. Up front, we'd check to see if we're still in the interval and have exceeded the maxcount -- if so, we eat the message and probably go ahead and increment the count.

2) in addition to that, I'd probably like to specify a higher maxcount that will represent the pressure-valve threshold. If the count gets up this high, we shift the logic over to a file, including the username and the time() --- no need to count messages at this point, because we're basically under attack. Actually, we'd probably want to name the file as the username (if possible - there are some funny characters in some of the names -- we could use the lc() function and a lightweight alphanumeric hash to get around both case and character problems) and put the time() in the file -- we could avoid some contention this way by having different files for each user. Before we even create a database connection, we check for one of these files and see if the interval (actually, probably a second, greater interval) has expired. If not, we just exit - no database connection. If so, we delete the file and proceed. We'd probably need to use two directories for the files -- one for sending and one for receiving, so that the throttling could operate independently. I guess we could just prepend the files with s- and r- as well, though.
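
A minimal sketch of the database step in 1), written as a pure function so it can be tried without a database. The sub name `throttle_check` and its arguments are hypothetical; `$last` and `$count` stand in for the throttle columns in the Users table. It assumes the stored time is only rewritten when the window resets, and that "exceeded the maxcount" means the count has already reached maxcount.

```perl
use strict;
use warnings;

# Hypothetical helper: decide what to do with one message for one user.
# Returns (new_time, new_count, eat), where eat is 1 if the message is dropped.
sub throttle_check {
    my ($last, $count, $now, $interval, $maxcount) = @_;

    # Gap longer than the interval: start a fresh window with a count of 0,
    # as in the plan above.
    if ($now - $last > $interval) {
        return ($now, 0, 0);
    }

    # Still inside the interval: count the message either way, and eat it
    # once the count has reached maxcount (assumed reading of "exceeded").
    my $eat = $count >= $maxcount ? 1 : 0;
    return ($last, $count + 1, $eat);
}

# Example: interval of one hour, at most 10 messages per interval.
my ($time, $count, $eat) = throttle_check(1000, 10, 1500, 3600, 10);
print "eat=$eat count=$count\n";
```

The caller would write the returned time and count back to the user's row and drop the message when the third value is 1.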


Anyway, I think this would protect us from two of the major types of existing resource hogs: a) DOS attacks and b) gaming scripters who automate the creation of buzillions of addresses (amounts to the same thing as a DOS attack, but the motivation is different, I think).

The third type of resource hog are those damn Outlook viruses - they really get out of control, and they send big messages. It'd be nice to tackle these too (maybe throttling by subject line) but I don't want to overcomplicate things.


By virtue of being on the sending side, it would also open the way for "sending the first message" :mrgreen:



How does this sound?
josh
 
Posts: 1371
Joined: Fri Aug 29, 2003 2:28 pm

Postby maratheamit » Sat Oct 18, 2003 3:39 am

I don't think we need to worry that much about the disk space issue during normal operation. As Josh observed, the only difference with the job spool architecture is where the pending emails are stored (in the sendmail queue or elsewhere). The thing to be careful about is not to allow a DOS attack to fill up the filesystem.

As for what we cache in the daemon, that is entirely up to us. One entry per user would be manageable at current membership levels, but we can get a lot fancier by caching information for only the most "frequent users", whatever we define that to be. That way we can also cache more of the relevant information, which will avoid a lot of database round trips.

But coming back to the throttling, I don't see a problem with what Josh suggests above. What kind of throttling can we do in sendmail?
Some of what we want has to be in the perl script, but I remember sendmail has some UCE controls wherein it waits longer and longer (up to some limit) before accepting multiple emails on the same connection.
maratheamit
 
Posts: 82
Joined: Fri Aug 29, 2003 2:35 pm

Postby josh » Sat Oct 18, 2003 4:13 pm

[Actually - we could just stat() the pressure valve files to get the date/time -- we could go ahead and use the alphanumeric hash, but put the true username in the file so we can look and see which usernames are throttled.]
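
A rough sketch of the pressure-valve file check using stat(), under a few assumptions: the sub names, the directory argument and the `s`/`r` prefix are hypothetical, and MD5 from the core Digest::MD5 module stands in for the "lightweight alphanumeric hash", which the thread has not pinned down.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: valve file path for a user. lc() removes case
# differences and the hex digest avoids funny characters in usernames.
sub valve_path {
    my ($dir, $prefix, $username) = @_;
    return "$dir/$prefix-" . md5_hex(lc $username);
}

# Returns 1 if the user is still throttled (the caller exits before opening
# a database connection), 0 otherwise. An expired file is deleted.
sub valve_closed {
    my ($path, $valve_interval, $now) = @_;
    my @st = stat($path);
    return 0 unless @st;                  # no file: not throttled
    my $mtime = $st[9];                   # last modification time
    return 1 if $now - $mtime <= $valve_interval;
    unlink $path;                         # interval expired: clear and proceed
    return 0;
}

# Trip the valve: store the true username in the file so throttled users
# can be listed later.
sub valve_trip {
    my ($path, $username) = @_;
    open(my $fh, '>', $path) or die "cannot write $path: $!";
    print {$fh} "$username\n";
    close($fh) or die "cannot close $path: $!";
    return;
}
```

The mail script would call `valve_closed` first thing and exit on 1; `valve_trip` would be called from the database throttling code once the count passes the higher threshold.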

I'll bet there are lots of options with sendmail, and for our server, I'm in favor of using them all, since we can. I have to say I'm always happy to have things working at all :wink: and we probably really should get another test machine prior to experimenting. I'm afraid I'm stuck in the "magic sendmail.cf" phase of sendmail maturity (even though I've been operating it for 11 years :oops:) -- I finally get one that works and then I'm really scared to change it -- if we had a separate box, though, we could try all kinds of stuff (taking care to never create an open relay, of course), even other mail servers.
josh
 
Posts: 1371
Joined: Fri Aug 29, 2003 2:28 pm

Postby maratheamit » Sat Oct 18, 2003 6:34 pm

Take a look at the following sendmail options.
1. ConnectionRateThrottle
2. MaxMessageSize
3. MaxDaemonChildren
maratheamit
 
Posts: 82
Joined: Fri Aug 29, 2003 2:35 pm

Pressure throttling, file system hashes and DOS

Postby SysKoll » Sun Oct 19, 2003 2:47 am

Josh, Amit,

Good ideas. Some suggestions:

* DOS by scripted address creation: If you think this is ever going to be a problem, we can stop it by the same method we discussed for avoiding account creation by spammers: have a small picture with a word and ask the user to enter the word in the picture. This is very easy to code and I think we should deploy it ASAP. Let me know.

* File system containing time stamps: yes, it makes sense to use the file creation time as the time stamp and have the filename equal to the user name. I suggest going the whole hog with hashing, though:
- From the username, create an 8-byte hash with MD5 or another hash algorithm.
- Convert the hash into a hex string.
- Create a pathname that has one hex digit per subdir and the name of the user at the end. If the user is syskoll and the corresponding hash is 604FDB60A6464C9E, the pathname is
Code:
 6/0/4/F/D/B/6/0/A/6/4/6/4/C/9/E/syskoll


This guarantees both uniqueness and quick lookup without any risk from hash collisions (even if two users have the same hash, the usernames are different, hence the files are different).

The danger when you use a filesystem to map a database is that you depend on the filesystem filename lookup for your overall performance. Here, each subdir can have at most 16 entries (0..9, A..F), and there are 16 levels, so the absolute worst case for this method is 16*16 = 256 sequential directory entry reads, the same as finding a file in a directory with 256 entries -- not a big deal. And most dirs will have only very few entries.
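
The path construction above could be sketched like this. The sub name `throttle_path` is hypothetical, truncating an MD5 digest to its first 16 hex digits is just one way to get an 8-byte hash, and lowercasing the username first is an assumption.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: map a username to a nested relative path with one
# hex digit per directory level and the username as the final component.
sub throttle_path {
    my ($username) = @_;
    my $name = lc($username);
    # First 8 bytes of the MD5 digest, as 16 uppercase hex digits.
    my $hex = uc(substr(md5_hex($name), 0, 16));
    return join('/', split(//, $hex), $name);
}

print throttle_path('syskoll'), "\n";
```

A username containing a slash or other awkward characters would still need to be cleaned up before being used as the last path component.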
-- SysKoll
SysKoll
 
Posts: 893
Joined: Thu Aug 28, 2003 9:24 pm

Postby James Day » Sun Nov 09, 2003 9:40 pm

Does this throttling count all email or only that which isn't blocked?

I'm asking because I use SpamGourmet as a cutout address for my SpamPal plugins, knowing that if a spammer tries a trivial joe job it'll be aborted at the 20 message limit and won't do me any harm and the rest of my email via Spamgourmet will still get through.

For the more complicated generated email address case, is it practical to automatically add a prefix to the account and email the account holder to let them know that it's been done to stop the attack? That could let through mail to the old email addresses while blocking the generated ones, so it's less disruptive than a full block. This wouldn't help with lists of addresses which a spammer might set up in advance, though.

On the Outlook DDOS issue, I've found that SpamCop is pretty fast to block them. I also block mail directly from dialup-equivalent addresses. You probably wouldn't want to use these normally but if there's a particular problem happening, they can be helpful. The dialup-equivalent blocking is less risky than SpamCop, in my experience, but also less effective. The ones I have experience with are here. Using RFC-ignorant also helps, because compromised systems often don't have valid reverse DNS or have other RFC compliance issues.
James Day
 
Posts: 8
Joined: Sat Aug 30, 2003 1:44 pm

Postby SysKoll » Mon Nov 10, 2003 4:26 am

I think the throttling will count all email to stop attacks (mail bombing or Joe jobs) targeting a given account.
-- SysKoll
SysKoll
 
Posts: 893
Joined: Thu Aug 28, 2003 9:24 pm

Postby James Day » Wed Nov 12, 2003 5:55 am

Thanks. I'll switch the more likely joe job targets to a different SpamPal account to avoid them affecting the less public accounts.
James Day
 
Posts: 8
Joined: Sat Aug 30, 2003 1:44 pm

This is our next job

Postby SysKoll » Tue Feb 10, 2004 1:10 am

Josh, Amit,

Now that we have the first message interface, it's urgent to implement throttling, especially for sending messages. This will be our next top-most priority. Right?
-- SysKoll
SysKoll
 
Posts: 893
Joined: Thu Aug 28, 2003 9:24 pm

Postby josh » Tue Feb 10, 2004 6:41 am

I agree -- I've put in the database throttling, but we're missing the file based 'defcon 1' level of throttling. The db throttling should protect us from being exploited, but at the expense of the CPU during an attempt.

Syskoll - you'd mentioned the possibility of another server awhile back-- any chance that would be available for use as a test server? We could test to failure with it (more or less -- we'd probably want to use the machine itself to launch the "attack" to save bandwidth, and this would necessarily warp our results a bit, but so what...).
josh
 
Posts: 1371
Joined: Fri Aug 29, 2003 2:28 pm

Test server

Postby SysKoll » Tue Feb 10, 2004 3:46 pm

Josh,

I contacted the guy who made the offer. I'll keep you posted.
-- SysKoll
SysKoll
 
Posts: 893
Joined: Thu Aug 28, 2003 9:24 pm

Postby josh » Fri Feb 13, 2004 12:03 am

btw, I just added a feature that allows us to disable an account (Feature level 13) - when an account is disabled, it can't log in to the website or send or receive mail. I added a new dialog to the effect of 'this account has been disabled'

What about hooking up the website "send the first message" feature to the SendThrottle data in the database -- that is, getting a "send from" address would count as sending a message (yes, this would double up on the counts). If the limit is exceeded, we can either a) not give the redirect address or b) disable the account. Likewise, we could consider disabling the account in spameater if the sendthrottle count is exceeded.
josh
 
Posts: 1371
Joined: Fri Aug 29, 2003 2:28 pm

Account disabled

Postby SysKoll » Fri Feb 13, 2004 6:39 pm

What about hooking up the website "send the first message" feature to the SendThrottle data in the database -- that is, getting a "send from" address would count as sending a message (yes, this would double up on the counts).


What do you mean exactly by doubling up? You mean counting it twice?

I'd prefer to avoid permanently disabling the account. Rather, just display a message saying that the limit is exceeded and that he should try later. This is very efficient if we compute the throttling with a long time period, say 200 per two hours. What is a spammer going to do? Script-send 200 messages in 5 seconds, then wait two hours? Unlikely.

I also suggest that we log the "limit exceeded" throttling events so that we might decide to manually disable some abusers. If we see an abuse pattern, we can always implement something more drastic.
-- SysKoll
SysKoll
 
Posts: 893
Joined: Thu Aug 28, 2003 9:24 pm

Postby josh » Fri Feb 13, 2004 11:49 pm

That sounds good. By "doubling up" I meant advancing the counter once for getting the address and another time for sending the message.
josh
 
Posts: 1371
Joined: Fri Aug 29, 2003 2:28 pm

Finer throttling over time if needed

Postby SysKoll » Sat Feb 14, 2004 6:31 am

Well, considering that creating an email address causes some database activity, I think you're right: we should count it and limit it in order to prevent DoS.

Something else: you mentioned that the throttling method I proposed was somewhat crude. Say we allow N=100 messages over T= 1 hour. If I send 1 message at 3PM and 99 msgs at 3:55 PM, then at 4PM, the time is reset so I am allowed another 100 messages, for a total of 200 within an hour.

If this is an issue, there is a very simple fix: you split the time interval in k parts and reduce the counter down by one k-th (which is multiplying it by (k-1)/k if you prefer). This way, your counter is actually a summation function of message density per time unit. You can prove that this allows you to never send more than (k+1)N/k messages over any given time period.

An example to clear it up, with k=2, N=100, T=1 hour. When the time period (1 hour) has passed, you divide the current counter by 2 (or you multiply it by (2-1)/2, same thing) and you reset the time.

This way, if I send 100 messages during the first hour, the counter will be halved (to 50) after 1 hour. Then I can send another 50 during the following hour, and so on. If I wait four hours after sending 64 messages, my counter will be down to 4. In any 1-hour period, I can send at most (k+1)N/k = 3*100/2 = 150 messages, but this would average out over a 2-hour period. This method gives a smoother average for an insignificant change (halving the counter instead of resetting it at the end of a time period).

If you read the counter and you see that more than a period has passed since the last message, just reset the counter.

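Code:
 
  if ($throttleTime < $now - 2*$recthrottleinterval) {
      # More than two intervals of silence: start over.
      $throttleTime = $now;
      $throttleCount = 1;
  } elsif ($throttleTime < $now - $recthrottleinterval) {
      # One interval has elapsed: halve the counter instead of resetting it
      # (int() keeps the counter a whole number).
      $throttleTime = $now;
      $throttleCount = int($throttleCount/2);
  } else {
      # Still inside the current interval: count this message.
      $throttleCount++;
  }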
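
To see the halving rule at work, here is a small simulation of the k=2, N=100, T=1 hour example. The sub name `decay_update` is hypothetical; it applies the same three branches as the snippet above to explicit arguments, so it can be run outside the mail script.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the halving rule (k = 2).
# Returns the updated (time, count) pair for one incoming message.
sub decay_update {
    my ($time, $count, $now, $interval) = @_;
    if ($time < $now - 2 * $interval) {
        return ($now, 1);                 # long silence: start over
    }
    elsif ($time < $now - $interval) {
        return ($now, int($count / 2));   # one period elapsed: halve
    }
    return ($time, $count + 1);           # same period: count the message
}

my $interval = 3600;
my ($t, $c) = (0, 0);

# 100 messages during the first hour.
($t, $c) = decay_update($t, $c, 10, $interval) for 1 .. 100;
print "after first hour: $c\n";           # prints 100

# First message just after the hour boundary.
($t, $c) = decay_update($t, $c, 3601, $interval);
print "after rollover: $c\n";             # prints 50
```

With a maxcount of 100, the user would then have room for about 50 more messages in the second hour, as in the example above.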


Just a suggestion.
-- SysKoll
SysKoll
 
Posts: 893
Joined: Thu Aug 28, 2003 9:24 pm
