header parsing, Mail::Audit, etc.

Discussion re sg development. You don't have to be a developer.

Post by josh » Sat Oct 11, 2003 1:43 am

OK - the reported bug about the About.com newsletters not working with trusted senders exposed the fact that my parsing code wasn't handling two-line headers. It does now, in a crufty way, but that in turn exposed the fact that my parsing code sucks (strictly speaking, it's not even parsing). That again raises the question of why I'm in the parsing-code business in the first place, since it's been done so well so many times before.
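
The two-line headers mentioned here are what the mail RFCs call folded headers: a continuation line starts with a space or tab. As a rough illustration only (this is not the project's Perl code), a shell sketch that joins continuation lines might look like the following. The function name unfold_headers is made up for the example, and collapsing the leading whitespace to a single space is a simplification.

```shell
# Sketch: join folded ("two-line") headers into single logical lines.
# unfold_headers is a hypothetical helper, not part of the project's code.
# It reads a message on stdin and prints only the header block.
unfold_headers() {
  awk '
    /^$/     { exit }                                  # blank line ends the headers
    /^[ \t]/ { sub(/^[ \t]+/, " "); line = line $0; next }  # continuation line
             { if (line != "") print line; line = $0 } # a new header starts
    END      { if (line != "") print line }            # flush the last header
  '
}
```

Fed a message whose Subject header is split over two lines, this prints the Subject as one line and stops at the first blank line, so the body is never touched.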

Amit pretty much took care of the problem by introducing the Mail::Audit code, but the additional modules more than doubled the CPU usage, so I backed it out. We've definitely got more CPU to burn now, but I've still been too scared to turn it back on. Maybe there's a way to cut down on the amount of code that gets loaded -- I really doubt that the Mail::Audit code we're *using* is causing the CPU load. I'll bet instead it's just the dependencies and all the compilation that's required.

In another vein, there may be an alternative approach altogether -- I was reading the tmda.net setup instructions and noticed this:

MTA Configuration (Sendmail)
Gripes

1. The main difficulty with using TMDA under Sendmail, is that Sendmail does not set any of the required environment variables as the other supported MTAs do. In fact, as hard as this is to believe, Sendmail does not provide any envelope recipient information to commands run from a .forward file. In today's complex mail environments, full envelope information in user-space is essential to support advanced applications such as TMDA. It's not an accident why all the other supported MTAs do this; it's a good idea!

If you are stuck with Sendmail and find this deficiency as troubling as I do, you might consider sending a feature request to the Sendmail developers asking that the full envelope information be made available to .forward invoked programs in the form of environment variables (SENDER, RECIPIENT, and EXTENSION). This should be done regardless of whether the Sendmail installation has procmail configured as its local mailer or not.


I feel sort of remiss in all this -- what they're saying, of course, is that the other supported mail servers *do* supply environment variables that indicate the sender and the recipient (yes, it's late 2003, and we've been doing this for three years :oops: ). Anyway, it might be good to look at this, too.
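
To make the idea concrete, here is a minimal shell sketch of a delivery script that prefers the envelope variables when the MTA sets them and otherwise falls back to header parsing. SENDER and RECIPIENT are the names from the TMDA text quoted above; the function name envelope_source and the two labels it prints are invented for this example.

```shell
# Sketch: decide where the envelope information should come from.
# envelope_source is a hypothetical helper; SENDER and RECIPIENT are the
# variable names from the quoted TMDA instructions.
envelope_source() {
  if [ -n "${SENDER:-}" ] && [ -n "${RECIPIENT:-}" ]; then
    echo "env"       # the MTA supplied the envelope, no header parsing needed
  else
    echo "headers"   # e.g. Sendmail via .forward: fall back to the headers
  fi
}
```

Under an MTA that exports the variables this prints env; under Sendmail, where they are unset, it prints headers.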
josh
 
Posts: 1371
Joined: Fri Aug 29, 2003 2:28 pm

CPU concerns will go away with daemonized version

Post by SysKoll » Sat Oct 11, 2003 2:07 am

Josh,

I think you're right to blame the large CPU increase on the extra loads and compiles generated by the Mail::Audit environment.

But once we finally use the daemonized version of spameater, we won't have to reload and recompile the program for every email. So chances are the CPU consumption will go way down, allowing us to use Mail::Audit without fear.

So I recommend putting Amit's code into production ASAP, then applying my modifications so that spameater is daemonized.
-- SysKoll
SysKoll
 
Posts: 893
Joined: Thu Aug 28, 2003 9:24 pm

Post by josh » Sat Oct 11, 2003 6:22 pm

I'm having trouble with the stats and Mail::Audit. The code is set up to update the counts (stats) after the mail has been sent (a poor man's transaction), but I think that update isn't being reached with Mail::Audit:

Code:
  # the following method is final and the program will exit on its completion
  $message->resend($rcpt);


I had switched over last night without knowing this, and there was a support request this morning pointing out that the number of messages shown as forwarded today was zero.

I temporarily went back to the old code and the count resumed. I've switched back to testing the Mail::Audit (you guys email me and I'll hook you up with the new testing env).

Amit - is there any way to keep the execution alive without hacking the Mail::Audit files? I'm thinking the hard exit might be a problem for the daemonization effort, too.

Avoiding exits in Mail::Audit

Post by SysKoll » Sat Oct 11, 2003 6:36 pm

Josh,

josh wrote:
Amit - is there any way to keep the execution alive without hacking the Mail::Audit files? I'm thinking the hard exit might be a problem for the daemonization effort, too.


From the Mail::Audit manual:
Usually, the delivery methods accept, pipe, and resend are final; Mail::Audit will terminate when they are done. If you specify noexit => 1, Mail::Audit will not exit after completing the above actions, but continue running your script.


That's what is needed here.
-- SysKoll

Performance in Mail::Audit

Post by SysKoll » Sat Oct 11, 2003 6:45 pm

From http://search.cpan.org/~simon/Mail-Audit-2.1/Audit.pm:

By default, MIME messages are automatically recognized and parsed. This is potentially expensive; if you don't want MIME parsing, use the nomime option.


So make sure the nomime option is turned on. I don't see why SG should parse MIME content.
-- SysKoll

Post by maratheamit » Sat Oct 11, 2003 7:08 pm

I don't think the noexit option is going to solve the problem. When I made the Mail::Audit changes I remember observing that setCount was always the last routine to be called. So I moved setCount before sendMail (which remained final as per default). There is still an exit after the sendMail call just in case.

What did I miss when making the change?
maratheamit
 
Posts: 82
Joined: Fri Aug 29, 2003 2:35 pm

Post by josh » Sat Oct 11, 2003 10:15 pm

Oh - I merged the new and old code and kept the older sequence (count after send). The noexit option does seem to work. Was there something else about it?

I really like having the count happen after the send, so that if there's a fatal error on send, the count doesn't get updated. This makes it easier to provide support for people who aren't receiving mail properly.
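
The ordering described here can be sketched in a few lines of shell. This is only an illustration of the sequencing, not the project's Perl: send_mail and update_count are stand-ins invented for the example (they are not the real sendMail and setCount routines), and the stand-in "send" simply fails when no recipient is given.

```shell
# Sketch of the "count only after a successful send" ordering.
# send_mail and update_count are stand-ins invented for this example.
count=0

send_mail() {
  # Stand-in: pretend delivery fails when no recipient is given.
  [ -n "$1" ]
}

update_count() {
  count=$((count + 1))
}

deliver_and_count() {
  # If the send fails, return early so the count is never touched.
  send_mail "$1" || return 1
  update_count
}
```

With this shape, a failed send leaves the count unchanged, which is the property that makes the stats usable for support. It only works if control actually returns after the send, which is why a final, exiting delivery call gets in the way.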

Anyway, it's looking good so far...

Post by maratheamit » Sat Oct 11, 2003 11:08 pm

I take it that the about.com multiple line headers are not a problem for Mail::Audit. If they are, that can be the impetus to move to a different MTA altogether (as Josh was suggesting).

Post by josh » Sat Oct 11, 2003 11:26 pm

Yeah, I'll know in the next couple of days whether Mail::Audit can handle multi-line headers -- I think we're pretty certain it can.

I noticed it's keeping a log file in /tmp -- it's up to about 17 megs so far. Not bad, especially for now, but we might want to turn it off at some point. One oft-repeated entry in the log is:

180(new): message is MIME, but 'nomime' option was set.

That's probably just a warning or info-level message, I suppose. I doubt it'd be rejecting MIME messages.

BTW, CPU has *not* been a problem so far. It does seem to spike up higher, but the overall load average is staying the same. I don't believe that was true for the old server.

I'd love to use a lighter-weight MTA than sendmail -- I get a near heart attack every time I have to mess with sendmail, which is more often than I'd like recently with the vulnerabilities that have come up. At the same time, any work on the mail server gives me heartburn, and it is working for now....

I suppose the question remains whether Mail::Audit would use the environment variables if they were there or go ahead and do its own header parsing anyway. I'd think the latter, since we tend to think of it as a predictable module to be used elsewhere. Maybe there's a way to configure it to use the env vars (or we could add one).

Anyway, assuming all continues to be smooth, I'm all for putting off the mail server question so we can knock out the daemonization and the throttling stuff, followed by the long sought-after features such as sending mail.

Post by SysKoll » Sun Oct 12, 2003 7:32 pm

josh wrote:
I noticed it's keeping a log file in /tmp -- it's up to about 17 megs so far. Not bad, especially for now, but we might want to turn it off at some point.


What log level did you set? There are two choices here: suppress the logs altogether (undefine $loglevel or set it to zero), or write a cron job that archives the current log every day and deletes the old archives after a few days. I support the latter because you might want to keep logs to check for abuse. I am volunteering to write this cron job if needed. I assume you have root access to your server and hence authority to use cron.

josh wrote:
That's probably just a warning, or info level message, I suppose. I doubt it'd be rejecting mime messages.


I just sent myself a message with a MIME attachment and it's fine.

So please let us know if you want to keep the logs and install that cron job.
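
For what it's worth, the archive half of such a cron job could be as small as the sketch below. It assumes the job runs once a day and that copying and then truncating the live log is acceptable; lines written between the copy and the truncate can be lost. The name archive_log is made up, and the log path is whatever the caller passes in.

```shell
# Sketch: archive the current log under a dated name, then empty it.
# archive_log is a hypothetical helper; the path comes from the caller.
archive_log() {
  log="$1"
  # Skip missing or empty logs so no empty archives are created.
  [ -s "$log" ] || return 0
  stamp=$(date +%Y-%m-%d)
  # Copy, then truncate in place, so the log keeps its name and inode.
  cp "$log" "$log.$stamp" && : > "$log"
}
```

Running it twice on the same day overwrites that day's archive, so it is meant to be scheduled once daily.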
-- SysKoll

Post by maratheamit » Sun Oct 12, 2003 9:54 pm

I think the current log level is 3. Or at least, that's what it is on the test server. We should try setting it to 1 or 2 before deciding to suppress all logging.

There is also a logfile option to the Mail::Audit constructor that we can use to rotate logs on a daily basis. Cron can then be used to delete all log files older than 7 days (say).
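
One way to read the daily-rotation idea is that the caller builds a date-stamped log path and hands it to the constructor, so a new file starts each day without any renaming. A small shell sketch of building such a path follows; daily_log_path, the directory and the prefix are all placeholders chosen for the example.

```shell
# Sketch: build a per-day log file name.
# daily_log_path is a hypothetical helper.
daily_log_path() {
  # $1 = directory, $2 = file name prefix
  printf '%s/%s-%s.log\n' "$1" "$2" "$(date +%Y-%m-%d)"
}
```

Because the date is part of the name, old files share a common prefix and can be matched with a single pattern when it is time to delete them.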

Deleting Mail::Audit log files

Post by SysKoll » Sun Oct 12, 2003 11:01 pm

maratheamit wrote:
There is also a logfile option to the Mail::Audit constructor that we can use to rotate logs on a daily basis. Cron can then be used to delete all log files older than 7 days (say).


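Excellent idea. Then the cron job can simply be the following (the +7 means "more than 7 days old"; a bare 7 would match only files that are exactly 7 days old):
Code:
find /tmp/path/whatever* -mtime +7 -exec rm {} \;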


where /tmp/path/whatever* is the path prefix matching all the Mail::Audit log files.
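
A slightly more defensive variant, wrapped in a function so it can be tried on a scratch directory first, might look like this. It assumes a find that supports -maxdepth (GNU and BSD find do), and purge_old_logs is an invented name. With find, -mtime +7 selects files last modified more than seven days ago, whereas -mtime 7 selects only those exactly seven days old.

```shell
# Sketch: delete log files older than a given number of days.
# purge_old_logs is a hypothetical helper; -maxdepth needs GNU or BSD find.
purge_old_logs() {
  # $1 = path prefix of the log files, $2 = age in days (default 7)
  dir=$(dirname "$1")
  base=$(basename "$1")
  # Only regular files directly in $dir whose names start with the prefix.
  find "$dir" -maxdepth 1 -type f -name "$base*" -mtime +"${2:-7}" -exec rm -f {} \;
}
```

Restricting the match to regular files directly under the log directory keeps the job from wandering into subdirectories of /tmp that happen to share the prefix.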
-- SysKoll

