Throttling with a daemon


Throttling with a daemon

Postby Guest » Fri Aug 29, 2003 1:48 pm

By: maratheamit ( Amit Marathe )
Throttling with a daemon
2003-08-25 15:18
A while back when SG was running on a shared host it wasn't possible to run daemons (explicitly) and we therefore came up with the idea of launching the daemon through the .forward file with a small script. With the new server, I think a better architecture would be as follows:

1. Each email received by SG gets written to a unique file in some directory (let's call it maildir). This could be done either by the MTA (sendmail) or by a low-footprint program we specify in .forward.

2. A daemon runs constantly in the background and wakes up to perform an iteration every few minutes. On each iteration it goes through all the files in maildir and processes each one according to the SG algorithm (deleting the file at the end).

With this design, we can implement throttling simply by maintaining counters in the daemon, without any database changes. For example, if we run the daemon every 10 minutes and program it to process only 10 messages per user in each iteration, we get a limit of 60 messages per user per hour. The advantage would be that we won't have to discard any messages: any ignored email can be considered in the next iteration, provided no new email has arrived for the same user. And the eaten message log can be implemented in the filesystem by just moving the maildir file to another directory (possibly overwriting an older log in the process).

There are a few issues to consider (e.g. a DoS attack filling up the filesystem) and many details to iron out, but let me know your feedback on the overall idea.

-- Amit


By: syskoll ( Fred )
RE: Throttling with a daemon
2003-08-26 21:03
Amit,

What you propose here is called a job spool. It's the same as a print spool in the old lpd structure where files to be printed are written in a print queue directory and then processed by the print daemon.

It works reliably, and the algorithm is well known (just look at the Linux or BSD code for lpd). The lpd algorithm is to create a file that contains the job (here, the email to process) and another file that is just a job descriptor (e.g., the name of the email file). The daemon waits for this descriptor file to show up in its queue dir. It cannot wait on the job file itself: since it's a potentially large file, it could still be incomplete when the lpd daemon wakes up. Remember this, it's important for later.

But it should be noted that going through the file system is always slower than going through a Unix pipe. Whenever there is a way to use a pipe, we should do it.

It all boils down to what architecture we want here. With the architecture you propose, we'd have one instance of the spameater daemon. Right now, we have one spameater instance per received email, and many can be active at the same time. One daemon instance is certainly more CPU-efficient.

So I agree on the one-daemon architecture. Now, do we use your job spool? Do we use a TCP/IP socket like the forwarder.pl I wrote? It's a matter of taste. I am partial to my code :-). But the performance of your solution might actually be better: forwarder.pl needs to fork a perl process for each message, while your solution just needs to create a couple of files.

PROs and CONs:

forwarder.pl:
PRO: uses sockets, faster than creating files
CON: needs to fork a Perl process for each incoming email

job spool:
PRO: no need to fork a Perl process
CON: 1. uses the file system, slower than pipes
2. limited to what you can do in a .forward

That last point might be crucial. I know the .forward syntax to save the incoming email into a job file, but I don't know how to create a job descriptor file from a .forward. For this, I'd need to fork a process, which entirely negates the Pro of the job spool method.
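For reference, the .forward form alluded to above: a quoted pipe entry hands each incoming message to a program (the path below is purely hypothetical), while a plain absolute path would instead append messages to that file:

```
"|/usr/local/bin/sg-spool"
```

Either way, the pipe form is exactly the per-message fork that the job-spool method was supposed to avoid.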

Comments, please?

-- SysKoll


By: maratheamit ( Amit Marathe )
RE: Throttling with a daemon
2003-08-27 10:06
It does boil down to a matter of taste. I would tend to go with the job spool approach because it is more modular and efficient. Modular because accepting an email is only loosely coupled to processing it (with the forwarder.pl design you have to contact the daemon each time you receive a message, and also handle the case where the daemon is unreachable for some reason). Efficient in the sense that the daemon does not have to wake up for every message received; rather, it can process the accumulated messages as a batch on each iteration.

There are downsides as SysKoll points out: file I/O is less efficient than communicating directly over a socket and the MTA has to have the ability to create unique files in an atomic fashion (i.e. partially written files should not be visible).

Since the forwarder.pl code is already written it makes sense to put that into production first. I wanted to open a discussion on what is a good long-term design.

-- Amit


By: syskoll ( Fred )
Job spool a better long-term architecture
2003-08-27 12:25
Amit,

I agree with the efficiency points you make. In the long run, I think that a job spool architecture would minimize the CPU consumption.

To implement the job spool, the main problem is finding a way to keep partially written email files from being processed by the spameater daemon, as you point out. How can we do that?

The only clean method I can see is the algorithm used by the lpd print spool daemon. It requires either a program to be invoked by the .forward file, or a special modification of the MTA. If we want SG to be adopted widely, we don't want MTA modifications. An SG site should be able to run with a standard MTA.

So we're back to invoking a program from the .forward file. Note, however, that it can be a short C (not Perl) program. We'd still have to fork a process, but we wouldn't have to spend time on loading and compiling a Perl file.

So unless you find a better way, we can base the future design on a job spool containing the email file and with a small C program atomically creating a job description file for the spameater daemon.

Comments?
Guest
 

port redirection

Postby josh » Fri Aug 29, 2003 8:44 pm

Does anybody know of something I can install that will redirect port 25 to one of the high ports?

I can devote an IP address to development on the new server -- if I can start up something like that, we can bind the high port and go nuts without worrying about messing up the service.

I've got stunnel, which does what I'm talking about except that the bound port has ssl -- is there anything that'll just do it raw?
josh
 
Posts: 1371
Joined: Fri Aug 29, 2003 2:28 pm

Forwarding port 25

Postby SysKoll » Thu Sep 04, 2003 4:40 pm

I think that forwarding a port can be done by ssh. The syntax is:
ssh -D port localhost. To quote the man page of ssh:

-D port Enable dynamic application-level port forwarding.

I did ssh -D 25 localhost and then telnetted from localhost into port 25. Here is what I see in netstat:

Code: Select all

# netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State     
tcp        0      0 localhost.localdoma:ssh localhost.localdo:36452 ESTABLISHED
tcp        0      0 localhost.localdom:smtp localhost.localdo:36453  ESTABLISHED
tcp        0      0 localhost.localdo:36453 localhost.localdom:smtp ESTABLISHED
tcp        0      0 localhost.localdo:36452 localhost.localdoma:ssh ESTABLISHED


Is that what you wanted?
-- SysKoll
SysKoll
 
Posts: 893
Joined: Thu Aug 28, 2003 9:24 pm

Postby josh » Sat Oct 18, 2003 2:00 am

wow - that last post of mine was offtopic. I'm going to start a new thread.

Is user ritz spamming us?

Postby SysKoll » Thu Jan 01, 2004 7:22 pm

The messages from user ritz don't seem to make any sense. Is this a spamming attempt or a mistake?

Josh, if you have a chance, please remove the offending messages.
-- SysKoll

Postby maratheamit » Thu May 27, 2004 1:47 pm

I have had success in implementing the spool architecture on the test server. The details are as follows:
- a small Perl program invoked from .forward writes each incoming message to a unique file in a spool directory.
- spameater.pl is called by cron every 10 minutes to process all files in the spool directory.

Should we try this out in production (during a low activity period) to get an idea of the performance benefits?
maratheamit
 
Posts: 82
Joined: Fri Aug 29, 2003 2:35 pm

Postby SysKoll » Thu May 27, 2004 2:48 pm

I like it. Reduce the cron delay to 3 minutes, and make sure that the version of spameater.pl spawned by cron checks that it's the only running instance (you don't want two of them at once).

What algorithm did you use for the spooling?
-- SysKoll

Postby maratheamit » Thu May 27, 2004 3:59 pm

I am using the File::Temp library to create unique filenames in the spool directory. After copying STDIN to this file, I rename it. spameater.pl looks only at renamed files (they have a particular extension), which avoids any race conditions.
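The write-then-rename trick described above could be sketched as follows (Python for illustration; the project code uses Perl's File::Temp, and the .msg extension is an assumption, not necessarily what spameater.pl expects):

```python
import os
import tempfile

READY_EXT = ".msg"   # assumed extension marking a fully written message

def spool_message(spool_dir, data):
    """Write to a uniquely named temp file, then rename it into place.
    rename() is atomic within a filesystem, so a reader scanning for
    READY_EXT can never observe a partially written message."""
    fd, tmp_path = tempfile.mkstemp(dir=spool_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    final_path = tmp_path + READY_EXT
    os.rename(tmp_path, final_path)
    return final_path

def ready_messages(spool_dir):
    """Only renamed (complete) files are visible to the consumer."""
    return sorted(n for n in os.listdir(spool_dir) if n.endswith(READY_EXT))
```

This is the same idea as lpd's descriptor file, collapsed into a single atomic rename instead of a second file.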

Postby SysKoll » Thu May 27, 2004 6:04 pm

Seems pretty good to me.

Are you using some logging?

I am using something relatively primitive but quite helpful:

Code: Select all
#####################################
## PrintTrace - Prints a message if the trace level is high enough
## Args:
## $1 - trace level
## $2 - string, printed if global $TraceLvl >= $1
## String is preceded with one '*' per trace level and the local time
## Uses: $TraceLvl
sub PrintTrace {
  my $lvl = shift;
  my $str = shift;
  return unless $TraceLvl >= $lvl;
  my $tm = localtime;
  print '*' x $lvl, " $tm $str\n";
}


To use this routine, set $TraceLvl in your main program (I set it to a value passed as a command-line argument), then call PrintTrace with the appropriate trace level (higher levels for more detail).

For example, if $TraceLvl >= 1,
Code: Select all
PrintTrace(1, "Starting program");

will print

* Thu May 27 13:50:15 2004 Starting program


whereas, if $TraceLvl >= 3,
Code: Select all
PrintTrace(3, "Created temp file " . $tmpname);

will print

*** Thu May 27 13:51:26 2004 Created temp file tmpb7c5kLSu9

During debugging, you can turn on more detailed traces by passing a higher trace level (e.g., $TraceLvl = 5 will print all trace messages up to level 5). And of course, during production, set the trace level to zero to disable these messages.
-- SysKoll

Postby josh » Thu May 27, 2004 7:05 pm

Mail::Spamgourmet::Config has a method called "debug" in it now -- more primitive than anything you all are talking about. We can rework it if needed. If we put the trace level as the second argument, it'll be backward compatible.
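The backward-compatible signature could work via a defaulted second argument, sketched here in Python for illustration (the actual method is Perl in Mail::Spamgourmet::Config; names and default values are illustrative):

```python
TRACE_LEVEL = 1   # set to 0 in production to silence everything

def debug(msg, level=1):
    """Existing one-argument callers keep working unchanged; new callers
    may pass a trace level as the second argument, mirroring PrintTrace."""
    if TRACE_LEVEL < level:
        return None
    line = "*" * level + " " + msg
    print(line)
    return line
```

Old call sites like debug("starting") behave exactly as before, while debug("detail", 3) only fires when the trace level is raised.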

