rollout and release



Postby Guest » Fri Aug 29, 2003 1:41 pm

By: jqh1 ( Josiah Hamilton )
rollout and release
2003-02-08 10:02
I briefly "went live" with the Mail::Audit version last night (finding and fixing a few other \015 problems buried in the libraries). After ironing out the deployment wrinkles, it worked fine. It's back off for now though -- it appeared that it was using about twice the CPU of the old code for each run that didn't involve forwarding a message -- I wasn't able to get an idea of the relative impact of forwarding messages. Since 9 out of 10 messages to the production box are eaten, it appeared we might see a spike in CPU usage.

I was using a very coarse tool to get my CPU usage figures: 'top' with a one second sample granularity. It appeared that hits for the old code were 5-7% CPU and hits for the new code were 11-13% CPU. Both lasted less than one second (relatively how much less, I don't know, and that would be relevant).

Anyway, I had been too busy to do much performance testing on the test box. I'm assuming that either a) the additional CPU usage was due to instantiating and parsing all the library (.pm, .al) files; or b) I was mistaken, and the usage wasn't really higher or that much higher.

In any event, I'll start working on release notes and packaging so we can get both the Mail::Audit version and the daemon version released. Meanwhile, I'll use the Bench package to try to get some better comparison numbers.
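One way to get per-run CPU figures finer than a one-second 'top' sample is to run the filter repeatedly as a child process and read the accumulated child CPU time. The sketch below is in Python rather than the project's Perl, is Unix-only (it relies on the standard `resource` module), and the command line in the usage comment is purely hypothetical.

```python
import resource
import subprocess
import sys

def cpu_seconds_per_run(cmd, runs=20, stdin_data=b""):
    """Average user+system CPU seconds consumed by `cmd` per run."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    for _ in range(runs):
        # Feed the same sample message on stdin each time and discard output.
        subprocess.run(cmd, input=stdin_data,
                       stdout=subprocess.DEVNULL, check=True)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    used = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return used / runs

# Hypothetical usage (script and sample file names are assumptions):
#   old = cpu_seconds_per_run(["perl", "old_filter.pl"], stdin_data=sample)
#   new = cpu_seconds_per_run(["perl", "new_filter.pl"], stdin_data=sample)
if __name__ == "__main__":
    print(cpu_seconds_per_run([sys.executable, "-c", "pass"], runs=5))
```

Averaging over many runs of the same sample message smooths out the noise that makes a single sub-second hit hard to compare.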

I probably mentioned that I'm not allowed to run daemons on the production server (this is a low rent operation, after all), but that shouldn't stop a release.

By: syskoll ( Fred )
RE: rollout and release
2003-02-10 13:15
>I probably mentioned that I'm not allowed to run
>daemons on the production server

Errr, no, I don't remember that. All the work I did is based on the assumption that you could launch spameater in the background and let it run, while the smaller forwarder is invoked on receiving each email.

Is that assumption false?

-- SysKoll

By: jqh1 ( Josiah Hamilton )
RE: rollout and release
2003-02-10 14:54
SysKoll - it's right that I can't run daemons in the current production environment. On the other hand, though, daemonizing the code is pretty much essential to interest third parties in running it, and at some point we'll be switching servers.

Also, as I'm pushing the envelope on the existing service, it may be that Hurricane *would* let me run a daemon (that is, make an exception to their policy) -- I'm not in the position to ask for favors just now though...

By: syskoll ( Fred )
Daemon, higher cost
2003-02-10 21:00

OK, I was a bit frightened for a moment.

Regarding the higher cost of a future SG server, I was thinking that we also have a problem with the identification of users who want to send unsolicited email. You proposed a throttle (see the throttle thread). We don't want spammers opening multiple accounts to bypass the throttling. We need a unique ID.

There is an easy solution to both: Request a very small amount ($1 or so) from the users, payable with PayPal. PayPal allows users to specify a confirmed name and address, as listed on the credit card linked to the PayPal account. So this would be a way to check the ID of the sender. Also, even $1 is too much for a spammer who would have to open hundreds, if not thousands, of accounts to get around the throttles.

Of course, I really hate the thought of begging for money. Of course, it would pay for the server... Maybe...

By: jqh1 ( Josiah Hamilton )
RE: rollout and release
2003-02-11 11:08
Yeah the new user exposure thing could be trouble. Getting the $1 would probably be the best way to handle that, but we've got all the usual reservations about charging money.

(BTW - sg costs me $10/month to run, not counting domain renewals -- in the two months since I put up the paypal donate page, I've gotten around $260 in donations, which is pretty much a 100% reimbursement for the 2 years of hosting costs)

Another approach to the new user problem would be to impose throttling on an IP address basis (and maybe an absolute basis, as well) for new user accounts. Since new users are created by the web code, this would have zero impact on the mailer.

Here are my current scattered thoughts on stateless/jobless throttling for sending mail (applies to new users as well):

First, I'm aware that this will sound a bit like your (SysKoll's) prescription for the eaten message log. If we add a column to the Users table, we can store a number/string that will contain enough info to do throttling if the time period and number of allowed messages are small enough (eg, 5 messages per minute rather than 200 per day). The value would be the time() (seconds since the epoch) representing the send time of the earliest tracked message, followed by an offset for each subsequently tracked message. If the time period is a minute, we can use two digits for the offsets. For clarity, I'll use a delimiter between the time and the offsets in the example below, but I don't think one is required. Assume we allow 5 sends per minute.

After the first message, the value will look like:

1044988266 (one value, no offsets)

10 seconds later, the next sent message comes through. We look at the value, and see that the current time is 10 seconds later. Since we're inside the minute window, we add the offset to the value:


5 seconds later, the next message comes. Still inside the window, so we add the next offset:


20 seconds later next message (still inside window):


5 seconds later next message (still inside):


5 seconds later next message comes. We're now 45 seconds from the start period with five messages already sent, so we drop the message and leave the value alone.

20 seconds later, the next message comes. That places us 5 seconds outside the original window, so we promote the first offset to be the start time, and add the next offset on the end (the new offset is 10, instead of 20, because we subtract the first offset from it):


40 seconds later, the next message comes through. That puts us a full 70 seconds ahead of the start time, so we lop the first two offsets, add the 25 seconds to the start time, subtract it from the 40 seconds and put a new 15 second offset at the end.


Two minutes go past and the next message comes through. Since we're fully outside the window, we replace everything with the new start time:


Is this making sense? I'd like to use a number field instead of a text field (on the theory that they're faster), so maybe we can subtract some large number from the time() (like 1 billion+) to keep the number of digits down to something that will comfortably fit in a number column. I haven't thought this through much yet, though -- I imagine there'd be a problem when we hit digit rolls in the diminished number.
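A minimal sketch of the start-time-plus-offsets idea, in Python rather than the project's Perl. It takes one consistent reading of the scheme, in which each offset is the gap in seconds since the previous tracked message, so the offsets it produces may differ from those in the walkthrough. The names `decode`, `encode` and `try_send` are hypothetical, and the value is kept as a (start, offsets) pair rather than a packed number.

```python
WINDOW = 60   # seconds in the throttling window
LIMIT = 5     # messages allowed per window

def decode(start, gaps):
    """Expand (start, gaps) into absolute send times."""
    times = [start]
    for gap in gaps:
        times.append(times[-1] + gap)
    return times

def encode(times):
    """Collapse absolute send times back into (start, gaps)."""
    return times[0], [b - a for a, b in zip(times, times[1:])]

def try_send(state, now):
    """state is None or (start, gaps). Returns (allowed, new_state)."""
    if state is None:
        return True, (now, [])
    # Keep only the tracked messages that still fall inside the window.
    times = [t for t in decode(*state) if now - t < WINDOW]
    if len(times) >= LIMIT:
        return False, state          # drop the message, leave the value alone
    times.append(now)
    return True, encode(times)

if __name__ == "__main__":
    t0 = 1044988266
    state = None
    for offset in (0, 10, 15, 35, 40, 45, 65, 105, 225):
        allowed, state = try_send(state, t0 + offset)
        print(offset, allowed, state)
```

Because every tracked message lies inside the same 60-second window, each gap stays below 60 and fits in two digits, which is what would make a packed numeric column possible.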


By: syskoll ( Fred )
Generic throttling algorithm
2003-02-11 12:01

Isn't this algorithm a bit complex? Generally, when you have a time-dependent resource, you should use the generic throttling algorithm. It works this way:

1. Define a frequency for your resource and a time period. Example: if your messages are throttled at 100 messages per hour, then define frequency F=100 and time period TP=1 hour.
2. At first use (in our case, first message sent), initialize a time-stamp T=CurrentTime and a counter C=F.
3. Each time an event occurs (here, the user wants to send a message), do the following:
3a. If C=0, deny the event (here, tell the user he cannot send any more messages).
3b. If C>0, decrement C and allow the event.
3c. Calculate D = CurrentTime - T. If D > TP, then set T=CurrentTime and C=F.

In our case, this means we need two fields per user row, T and C. We need two parameters in the code, F and TP (number of messages per time period).
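The steps above can be sketched as follows, in Python rather than the project's Perl. `UserRow` is a hypothetical stand-in for the two per-user columns T and C, and the values of F and TP are just the example figures. One deliberate deviation, labeled as such: the sketch checks whether the period has expired before deciding, whereas the steps as listed do that check last, which would deny the first message after an expired period whenever C had reached 0.

```python
from dataclasses import dataclass
from typing import Optional

F = 100       # messages allowed per time period (example value)
TP = 3600     # time period in seconds (1 hour)

@dataclass
class UserRow:
    """Hypothetical stand-in for the two per-user fields T and C."""
    t: Optional[float] = None   # T: start of the current period
    c: int = 0                  # C: messages still allowed in this period

def allow_send(row, now, f=F, tp=TP):
    """Return True if the user may send a message at time `now` (seconds)."""
    # Step 2 / step 3c: (re)start the period on first use or once it expired.
    if row.t is None or now - row.t > tp:
        row.t = now
        row.c = f
    # Step 3a: nothing left in this period, deny.
    if row.c == 0:
        return False
    # Step 3b: consume one message and allow.
    row.c -= 1
    return True

if __name__ == "__main__":
    row = UserRow()
    accepted = sum(allow_send(row, now=0) for _ in range(150))
    print(accepted, allow_send(row, now=TP + 1))
```

No periodic job is needed: the reset happens lazily, the next time the user sends after the period has run out.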

This is the same algorithm as for throttling a TCP/IP link. I hope it makes sense. Please do not hesitate to ask me to rephrase this if needed.

Also, please see my append in the "throttling" thread. I wrote:

"Assume you have a counter in the user row, initialized to, say, 500. You decrement it each time an unsolicited (non-reply) message is sent from SG. When the counter reaches 0, display a nice message such as "That was your last message for the day" (and probably log it too since it's a highly abnormal event). At 3AM every day, run a job that locks every user row, resets the counter to 500, unlocks the row and so on for the whole table. Shouldn't hamper the regular SG operations."

This suggestion is of course *bad* because you'd need to run a special extra job to reset the counters. The generic throttling algorithm works much better, which is why I recommend it.

Let me know what you think.

-- SysKoll

By: jqh1 ( Josiah Hamilton )
RE: rollout and release
2003-02-11 13:39
That does seem better. The window doesn't "roll forward" as it does above, but now that I think about it, that wouldn't really be necessary... Using the TCP/IP-style algorithm, it would be possible to send more than the max within the span of one period (say the rate is 100 messages per hour: the user sends one message at 1:00, which sets the timestamp; then at 1:59 the user sends 99 messages, which is OK; then at 2:00 the user sends 100 more, which results in 199 messages in two minutes) -- the net effect is pretty much the same, though.
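The boundary case can be checked with a small simulation, sketched here in Python under a few assumptions: the throttle resets lazily when a message arrives after the period has expired, the test is the strict D > TP, and so the second burst is placed one second past 2:00. Times are seconds after 1:00, and the `state` dict is a hypothetical stand-in for the T and C fields.

```python
F, TP = 100, 3600  # 100 messages per hour, as in the example

def allow_send(state, now):
    """state holds 't' (period start) and 'c' (messages left)."""
    if state.get("t") is None or now - state["t"] > TP:
        state["t"], state["c"] = now, F
    if state["c"] == 0:
        return False
    state["c"] -= 1
    return True

state = {}
sent = []  # send times of accepted messages, in seconds after 1:00
# One message at 1:00, 99 at 1:59, then 100 more just after 2:00.
for now, attempts in [(0, 1), (59 * 60, 99), (60 * 60 + 1, 100)]:
    for _ in range(attempts):
        if allow_send(state, now):
            sent.append(now)

burst = [t for t in sent if t >= 59 * 60]
print(len(sent), len(burst))  # prints "200 199"
```

All 200 attempts are accepted, and 199 of them land between 1:59 and just after 2:00.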


By: syskoll ( Fred )
RE: rollout and release
2003-02-11 14:27

Yes, you're right. The throughput of the resource is averaged over two time periods. Hadn't thought of that. As you pointed out, it doesn't really matter. If you're adamant about limiting your users to C messages per time period TP, then allow them C/2 per TP/2 and you're guaranteed the average max will be C per TP.
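A rough check of the halving idea, sketched in Python with hypothetical helper names. It measures only the burst around a single period boundary, assuming a lazily reset throttle with the strict D > TP test. A sliding window of length TP can still overlap more than two half-periods, so this bounds the boundary burst rather than every possible hour.

```python
def make_throttle(f, tp):
    """Return an allow(now) function enforcing f events per period tp."""
    state = {"t": None, "c": 0}

    def allow(now):
        if state["t"] is None or now - state["t"] > tp:
            state["t"], state["c"] = now, f
        if state["c"] == 0:
            return False
        state["c"] -= 1
        return True

    return allow

def boundary_burst(f, tp):
    """Messages accepted when a user floods just before and just after a reset."""
    allow = make_throttle(f, tp)
    allow(0)                                               # opens the first period
    accepted = sum(allow(tp - 1) for _ in range(2 * f))    # flood at the end of it
    accepted += sum(allow(tp + 1) for _ in range(2 * f))   # flood right after reset
    return accepted

if __name__ == "__main__":
    print(boundary_burst(100, 3600))  # full limit over the full period
    print(boundary_burst(50, 1800))   # C/2 per TP/2
```

With 100 per hour the two-second burst reaches 199 messages; with 50 per half hour it is capped at 99, which stays under the intended 100.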

Now that we have an algorithm, it's time to think about how to ensure spammers can't open multiple accounts quickly. I've opened a separate thread called "avoiding automatic account generation".
