bbs.spamgourmet.com

by **Guest** » Fri Aug 29, 2003 1:48 pm

By: maratheamit ( Amit Marathe )
Throttling with a daemon
2003-08-25 15:18
A while back when SG was running on a shared host it wasn't possible to run daemons (explicitly) and we therefore came up with the idea of launching the daemon through the .forward file with a small script. With the new server, I think a better architecture would be as follows:

1. Each email received by SG gets written to a unique file in some directory (let's call it maildir). This could be done either by the MTA (sendmail) or by a low-footprint program we specify in .forward.

2. A daemon runs constantly in the background and wakes up to perform an iteration every few minutes. On each iteration it goes through all the files in maildir and processes each one according to the SG algorithm (deleting the file at the end).

With this design, we can implement throttling by simply maintaining counters in the daemon, without any database changes. For example, if we run the daemon every 10 minutes and program it to process only 10 messages per user in each iteration, we get a limit of 60 messages per user per hour (6 iterations x 10 messages). The advantage would be that we won't have to discard any messages: any ignored email can be considered in the next iteration, provided no new email has arrived for the same user. And the eaten message log can be implemented in the filesystem by just moving the maildir file to another directory (possibly overwriting an older log in the process).
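The per-user cap described above can be sketched roughly as follows. This is only an illustration, not SG code: the record layout (`user`, `file`), the name `select_batch`, and the oldest-first ordering are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical per-iteration cap, taken from the example in the post.
my $MAX_PER_USER = 10;

# select_batch(\@messages) takes message records of the assumed form
# { user => ..., file => ... }, oldest first, and splits them into the
# ones to process now and the ones to leave in maildir for a later pass.
sub select_batch {
    my ($messages) = @_;
    my (%count, @process, @deferred);
    for my $msg (@$messages) {
        if (++$count{ $msg->{user} } <= $MAX_PER_USER) {
            push @process, $msg;
        } else {
            push @deferred, $msg;    # stays on disk, nothing is discarded
        }
    }
    return (\@process, \@deferred);
}

# Example: 12 queued messages for alice, 1 for bob.
my @queue;
push @queue, { user => 'alice', file => "a$_" } for 1 .. 12;
push @queue, { user => 'bob', file => 'b1' };

my ($now, $later) = select_batch(\@queue);
printf "process %d, defer %d\n", scalar @$now, scalar @$later;
```

Because the counters are rebuilt on every iteration, nothing has to be stored between runs; the deferred files simply remain in maildir.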

There are a few issues to consider (e.g. a DoS attack filling up the filesystem) and many details to iron out, but let me know your feedback on the overall idea.

-- Amit

By: syskoll ( Fred )
RE: Throttling with a daemon
2003-08-26 21:03
Amit,

What you propose here is called a job spool. It's the same as a print spool in the old lpd structure where files to be printed are written in a print queue directory and then processed by the print daemon.

It works reliably, and the algorithm is well known (just look at the Linux or BSD code for lpd). The lpd algorithm is to create a file that contains the job (here, the email to process) and another file that is just a job descriptor (e.g., the name of the email file). The daemon waits for this descriptor file to show up in its queue dir. It cannot wait on the job file itself: since that file is potentially large, it could still be incomplete when the lpd daemon wakes up. Remember this, it's important for later.
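A minimal sketch of the job file plus descriptor idea, loosely following the description above rather than the actual lpd source. The `.job` and `.desc` extensions and the function names `submit_job` and `pending_jobs` are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumed layout: <id>.job holds the email, <id>.desc names the job file.
sub submit_job {
    my ($dir, $id, $body) = @_;
    open my $job, '>', "$dir/$id.job" or die "open job: $!";
    print {$job} $body;
    close $job or die "close job: $!";
    # The descriptor is created only after the job file is complete and closed.
    open my $desc, '>', "$dir/$id.desc" or die "open desc: $!";
    print {$desc} "$id.job\n";
    close $desc or die "close desc: $!";
}

# The daemon side looks only at descriptors, never at bare job files.
sub pending_jobs {
    my ($dir) = @_;
    opendir my $dh, $dir or die "opendir $dir: $!";
    my @descs = sort grep { /\.desc\z/ } readdir $dh;
    closedir $dh;
    my @jobs;
    for my $desc (@descs) {
        open my $fh, '<', "$dir/$desc" or next;
        chomp(my $name = <$fh> // '');
        close $fh;
        push @jobs, "$dir/$name" if length $name;    # skip empty descriptors
    }
    return @jobs;
}

my $dir = tempdir(CLEANUP => 1);
submit_job($dir, 'msg1', "Subject: test\n\nhello\n");
print "$_\n" for pending_jobs($dir);
```

The descriptor is tiny, so the window in which it can be seen half-written is small, but it is not zero; this sketch only skips descriptors that are still empty.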

But it should be noted that going through the file system is always slower than going through a Unix pipe. Whenever there is a way to use a pipe, we should do it.

It all boils down to what architecture we want here. With the architecture you propose, we'd have one instance of the spameater daemon. Right now, we have one spameater instance per received email, and many can be active at the same time. One daemon instance is certainly more CPU-efficient.

So I agree on the one-daemon architecture. Now, do we use your job spool? Do we use a TCP/IP socket like the forwarder.pl I wrote? It's a matter of taste. I am partial to my code :-) But the performance of your solution might actually be better: forwarder.pl needs to fork a perl process for each message, while your solution just needs to create a couple of files.

PROs and CONs:

forwarder.pl:
PRO: uses sockets, faster than creating files
CON: needs to fork a Perl process for each incoming email

job spool:
PRO: doesn't need to fork a Perl process
CON 1: uses the file system, slower than pipes
CON 2: limited to what you can do in a .forward

That last point might be crucial. I know the .forward syntax to save the incoming email into a job file, but I don't know how to create a job descriptor file from a .forward. For this, I'd need to fork a process, which entirely negates the Pro of the job spool method.

Comments, please?

-- SysKoll

By: maratheamit ( Amit Marathe )
RE: Throttling with a daemon
2003-08-27 10:06
It does boil down to a matter of taste. I would tend to go with the job spool approach because it is more modular and efficient. Modular because accepting an email is only loosely coupled to processing it (with the forwarder.pl design you have to make contact with the daemon each time you receive a message and also handle the case where the daemon is unreachable for some reason). Efficient in the sense that the daemon does not have to wake up to handle every message received (rather, it can process the accumulated messages as a batch on each iteration).

There are downsides as SysKoll points out: file I/O is less efficient than communicating directly over a socket and the MTA has to have the ability to create unique files in an atomic fashion (i.e. partially written files should not be visible).

Since the forwarder.pl code is already written it makes sense to put that into production first. I wanted to open a discussion on what is a good long-term design.

-- Amit

By: syskoll ( Fred )
Job spool a better long-term architecture
2003-08-27 12:25
Amit,

I agree with the efficiency points you make. In the long run, I think that a job spool architecture would minimize the CPU consumption.

To implement the job spool, the main problem is finding a way to keep partially written email files from being processed by the spameater daemon, as you point out. How can we do that?

The only clean method I can see is the algorithm used by the lpd print spool daemon. It requires either a program to be invoked by the .forward file, or a special modification of the MTA. If we want SG to be adopted widely, we don't want MTA modifications. An SG site should be able to run with a standard MTA.

So we're back to invoking a program from the .forward file. Note, however, that it can be a short C (not Perl) program. We'd still have to fork a process, but we wouldn't have to spend time on loading and compiling a Perl file.

So unless you find a better way, we can base the future design on a job spool containing the email file and with a small C program atomically creating a job description file for the spameater daemon.

Comments?

by **josh** » Fri Aug 29, 2003 8:44 pm

Does anybody know of something I can install that will redirect port 25 to one of the high ports?

I can devote an IP address to development on the new server -- if I can start up something like that, we can bind the high port and go nuts without worrying about messing up the service.

I've got stunnel, which does what I'm talking about except that the bound port has ssl -- is there anything that'll just do it raw?

by **SysKoll** » Thu Sep 04, 2003 4:40 pm

I think that forwarding a port can be done by ssh. The syntax is:
ssh -D port localhost
To quote the man page of ssh:

-D port Enable dynamic application-level port forwarding.

I did a ssh -D 25 localhost and then I telnetted from localhost into port localhost:25. Here is what I see in netstat:

```
# netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 localhost.localdoma:ssh localhost.localdo:36452 ESTABLISHED
tcp        0      0 localhost.localdom:smtp localhost.localdo:36453 ESTABLISHED
tcp        0      0 localhost.localdo:36453 localhost.localdom:smtp ESTABLISHED
tcp        0      0 localhost.localdo:36452 localhost.localdoma:ssh ESTABLISHED
```

Is that what you wanted?

by **josh** » Sat Oct 18, 2003 2:00 am

wow - that last post of mine was offtopic. I'm going to start a new thread.

by **SysKoll** » Thu Jan 01, 2004 7:22 pm

The messages from user ritz don't seem to make any sense. Is this a spamming attempt or a mistake?

Josh, if you have a chance, please remove the offending messages.

by **maratheamit** » Thu May 27, 2004 1:47 pm

I have had success in implementing the spool architecture on the test server. The details are as follows:
- a small perl program invoked from .forward writes each incoming message to a unique file in a spool directory.
- spameater.pl is called by cron every 10 minutes to process all files in the spool directory.

Should we try this out in production (during a low activity period) to get an idea of the performance benefits?

by **SysKoll** » Thu May 27, 2004 2:48 pm

I like it. Reduce the cron delay to 3 mins and make sure that the version of spameater.pl spawned by cron checks that it's the only instance (you don't want to have 2 of them running).
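One common way to do the single-instance check is a non-blocking `flock` on a lock file. The sketch below is an assumption about how it could be done, not what spameater.pl actually does; the lock file name and the function `acquire_single_instance_lock` are hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempdir);

# Try to take an exclusive, non-blocking lock on $lockfile.
# Returns the open handle on success (keep it alive for the whole run),
# or undef if another instance already holds the lock.
sub acquire_single_instance_lock {
    my ($lockfile) = @_;
    open my $fh, '>>', $lockfile or die "Cannot open $lockfile: $!";
    return flock($fh, LOCK_EX | LOCK_NB) ? $fh : undef;
}

# Hypothetical location; a real setup would use a fixed, well-known path.
my $lockfile = tempdir(CLEANUP => 1) . '/spameater.lock';

my $lock = acquire_single_instance_lock($lockfile);
unless ($lock) {
    print "Another instance is running, exiting\n";
    exit 0;
}
print "Lock acquired, processing spool\n";
# ... process the spool here; the lock goes away when the process exits.
```

Since the operating system drops the lock when the process ends, a crashed run should not leave a stale lock behind, which is the usual weakness of a plain PID file.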

What algorithm did you use for the spooling?

by **maratheamit** » Thu May 27, 2004 3:59 pm

I am using the File::Temp library to create the unique filenames in the spool directory. After copying the STDIN to this file I rename the file. spameater.pl looks only at renamed files (they have a particular extension) to avoid any race conditions.
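A rough sketch of that write-then-rename approach, for readers who want to see the shape of it. It is not the code running on the test server: the `.msg` extension, the `tmpXXXXXXXX` template, and the function names are assumptions.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile tempdir);

# spool_message($spool_dir, $in_fh) copies a message from $in_fh into a
# uniquely named temp file inside $spool_dir, then renames it so that it
# carries the extension the reader looks for (".msg" is a hypothetical choice).
sub spool_message {
    my ($spool_dir, $in_fh) = @_;
    my ($out, $tmpname) = tempfile('tmpXXXXXXXX', DIR => $spool_dir);
    print {$out} $_ while <$in_fh>;
    close $out or die "close $tmpname: $!";
    my $final = "$tmpname.msg";
    # rename() within one filesystem is atomic, so the reader sees either
    # no ".msg" file or a complete one.
    rename $tmpname, $final or die "rename $tmpname: $!";
    return $final;
}

# Reader side: only files with the final extension are considered.
sub ready_messages {
    my ($spool_dir) = @_;
    opendir my $dh, $spool_dir or die "opendir $spool_dir: $!";
    my @ready = sort grep { /\.msg\z/ } readdir $dh;
    closedir $dh;
    return map { "$spool_dir/$_" } @ready;
}

my $spool = tempdir(CLEANUP => 1);
open my $in, '<', \"Subject: test\n\nhello\n" or die "open: $!";
my $path = spool_message($spool, $in);
print "spooled: $_\n" for ready_messages($spool);
```

In the .forward program the input handle would be `\*STDIN`; the in-memory handle here just keeps the example self-contained.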

by **SysKoll** » Thu May 27, 2004 6:04 pm

Seems pretty good to me.

Are you using some logging?

I am using something relatively primitive but quite helpful:

```perl
#####################################
## PrintTrace - Prints a message if the trace level is high enough
## Args:
##   $1 - trace level
##   $2 - string, printed if global $TraceLvl >= $1
## String is preceded with one '*' per trace level and the local time
## Uses: $TraceLvl
sub PrintTrace {
    my ($lvl, $str, $tm);
    $lvl = shift;
    $str = shift;
    return unless $TraceLvl >= $lvl;
    $tm = localtime;
    print '*' x $lvl, " $tm $str\n";
}
```

To use this routine, you set $TraceLvl in your main (I set it to a value passed as a command line arg) and then call PrintTrace with the appropriate trace level (higher levels for more detail).

E.g., if $TraceLvl >= 1,

```perl
PrintTrace(1, "Starting program");
```

will print

```
* Thu May 27 13:50:15 2004 Starting program
```

whereas, if $TraceLvl >= 3,

```perl
PrintTrace(3, "Created temp file " . $tmpname);
```

will print

```
*** Thu May 27 13:51:26 2004 Created temp file tmpb7c5kLSu9
```

During debug, you can turn on more detailed traces by feeding a higher trace level (e.g., a $TraceLvl of 5 will print all trace messages up to level 5). And of course, during production, set the trace level to zero to disable these messages.

by **josh** » Thu May 27, 2004 7:05 pm

Mail::Spamgourmet::Config has a method called "debug" in it now -- more primitive than anything you all are talking about. We can rework that one if needed.... If we put the tracelevel as the second arg, it'll be backward compatible.
