messages to multiple sg recipients

Post by josh » Mon Apr 18, 2005 11:14 pm

Everyone,

I thought we had a problem when one message was addressed to more than one disposable address -- the code would get confused and only forward the message to the first one (if any).

I put a new (highly reworked) version of the code in a beta directory:

You can try it if you like -- use one of these domains for a disposable address:

beta.spamgourmet.net
beta.xoxy.net
beta.recursor.net
beta.antichef.net

and the new code will be invoked. (Don't use them for anything permanent, as we'll probably retire these domains when we move the new code into "production".)

But when I send a test message from Gmail to multiple sg accounts, it arrives as multiple messages -- if there are two To: addresses and one CC: address, the code gets invoked three times (not once, as I had assumed), and I receive *nine* messages (three for each address).

Does anyone remember how the original problem was happening? *Was* it happening?

This beta also matches exclusive senders against everyone in the To: and CC: lists, and includes support for date-expiring disposable addresses (e.g., word.2005-04-28.user@beta.spamgourmet.net).
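To make the date-expiring format concrete, here is a minimal Python sketch of how such an address could be checked. It assumes the date is one dot-separated field of the local part, as the example above suggests, and that the address stays valid through the expiry date itself. The names `parse_expiry` and `is_expired` are made up for illustration and are not taken from the sg code.

```python
import re
from datetime import date
from typing import Optional

# Only accept a strict YYYY-MM-DD field (assumed format, per the example address).
DATE_FIELD = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_expiry(local_part: str) -> Optional[date]:
    """Return the first valid YYYY-MM-DD field in the local part, or None."""
    for field in local_part.split("."):
        if DATE_FIELD.fullmatch(field):
            try:
                return date.fromisoformat(field)
            except ValueError:
                continue  # looks like a date but is not one, e.g. 2005-02-30
    return None


def is_expired(address: str, today: date) -> bool:
    """True if the address carries an expiry date that is already past."""
    local_part = address.split("@", 1)[0]
    expiry = parse_expiry(local_part)
    # Assumption: the address is still usable on the expiry date itself.
    return expiry is not None and today > expiry
```

For example, `is_expired("word.2005-04-28.user@beta.spamgourmet.net", date(2005, 4, 29))` would report the address as expired, while an address with no date field never expires under this sketch.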
josh
 
Posts: 1371
Joined: Fri Aug 29, 2003 2:28 pm

Post by josh » Tue Apr 19, 2005 4:46 pm

OK -- testing with Syskoll, it appears that:

1) some mail servers decide to send one message for multiple recipients if they're on the same server (or perhaps if they have the same domain -- we didn't try with different domains). If one of the recipients is BCC'ed and the others aren't, you don't get the BCC address in the message at all (we haven't tested with the BCC address being the only recipient).

2) other mail servers choose to send a separate message for each recipient (BCC or otherwise). The messages all have the same message ID, though. In this case, the BCC recipient is evident in the message sent for it, but absent in the others. The beta implementation sends multiple messages in this case.
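For the combined-delivery case, the only recipients a handler can recover from the message itself are the ones listed in the To: and Cc: headers. A rough Python sketch of that lookup follows. The function name `visible_sg_recipients` and the `SG_DOMAINS` set are assumptions for illustration (the set just reuses the beta domains from this thread), and this is not how the sg code is actually written.

```python
from email import message_from_string
from email.utils import getaddresses
from typing import List

# Hypothetical domain list, reusing the beta domains mentioned in this thread.
SG_DOMAINS = {
    "beta.spamgourmet.net",
    "beta.xoxy.net",
    "beta.recursor.net",
    "beta.antichef.net",
}


def visible_sg_recipients(raw_message: str) -> List[str]:
    """Return the sg addresses found in To: and Cc:, in order, without repeats."""
    msg = message_from_string(raw_message)
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    found: List[str] = []
    for _name, addr in getaddresses(headers):
        addr = addr.lower()
        domain = addr.rpartition("@")[2]
        if domain in SG_DOMAINS and addr not in found:
            found.append(addr)
    # A Bcc'ed recipient never shows up here: it is simply not in the headers.
    return found
```

Anything delivered by BCC would have to come from the envelope rather than the headers, which is exactly the information that is missing on a combined delivery.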

In the first case, sg handles all the to: and cc: recipients once, but misses the bcc: altogether -- the production code would probably just get the first to: recipient. In the second case, the beta code blasts out a bunch of duplicate messages (and decreases the counts for each) -- the production code would deliver just one message to each recipient, including the bcc: recipient.

Ugh...

We have steadfastly avoided keeping message-level state across code invocations because it would probably require a lot of resources. If there were some very fast way to keep state (handled message IDs) for, say, 5 minutes (with garbage collection), that would make the beta code handle situation 2) correctly (and, as noted, it already handles situation 1) better than production, although not perfectly).

As for the absence of bcc: recipient data on a combined delivery, we may be doomed, because sendmail just isn't providing that info.

Post by josh » Tue Apr 19, 2005 11:41 pm

I added a new class called MessageIdChecker that attempts to maintain message ID state for 10 minutes from the first message it receives. So far, it's making the Gmail scenario (separate messages for each recipient) work without duplicate messages. It may not be able to handle the volume without tweaking (or at all?), though.

It takes the message ID and makes a filename out of it (in the /tmp/spamgourmet directory) -- it takes the first 6 characters of the ID and makes a directory out of that, then puts the file in there, to keep the files separated into multiple directories (so the file system won't thrash as much when it scans). It has a cleanup routine that fires at startup and keeps another state file to record when the last cleanup occurred. If the state file is less than 10 minutes old, it skips the cleanup. If it's older, it deletes directories that are more than 10 minutes old. I'm thinking we need to use fewer characters for the directories (like 2) and then check the age of the files. Any way you look at it, this "garbage collection" is not fun under a heavy load.
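The scheme described above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the actual MessageIdChecker: the class name `FileMessageIdChecker`, the method `first_sighting`, and the `.last_cleanup` state-file name are invented here, the message ID is percent-encoded to get a safe filename, and the marker file is created with O_EXCL so that two handler processes racing on the same ID cannot both think they were first.

```python
import os
import shutil
import time
from urllib.parse import quote


class FileMessageIdChecker:
    """Remember message IDs as marker files, bucketed by an ID prefix."""

    def __init__(self, root="/tmp/spamgourmet", prefix_len=6, ttl=600):
        self.root = root
        self.prefix_len = prefix_len
        self.ttl = ttl  # seconds; 10 minutes as described in the post
        os.makedirs(root, exist_ok=True)
        self._cleanup_if_due()

    def first_sighting(self, message_id: str) -> bool:
        """Record message_id; return True only the first time it is seen."""
        if not message_id:
            return True  # nothing to deduplicate on
        # Assumption: percent-encode the ID (dots too) to get a safe filename.
        name = quote(message_id, safe="").replace(".", "%2E")
        bucket = os.path.join(self.root, name[: self.prefix_len])
        os.makedirs(bucket, exist_ok=True)
        try:
            # O_EXCL makes "check and mark" a single atomic step.
            fd = os.open(os.path.join(bucket, name),
                         os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        os.close(fd)
        return True

    def _cleanup_if_due(self):
        """Delete stale bucket directories, at most once per ttl."""
        state = os.path.join(self.root, ".last_cleanup")
        now = time.time()
        try:
            if now - os.path.getmtime(state) < self.ttl:
                return  # a cleanup ran recently; skip
        except FileNotFoundError:
            pass  # no cleanup has ever run
        for entry in os.scandir(self.root):
            if entry.is_dir() and now - entry.stat().st_mtime > self.ttl:
                shutil.rmtree(entry.path, ignore_errors=True)
        with open(state, "w"):
            pass  # touch the state file to record this cleanup
```

A handler would call `first_sighting(msg_id)` once per invocation and skip forwarding when it returns False. Note that a directory's modification time is refreshed whenever a file is added, so a busy bucket never looks stale to this cleanup. That is the same weakness behind the idea of checking the age of individual files.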


Anyway, this is one approach. The logic for handling BCC messages is fairly subtle, but it's working. If we were to go with a daemon for the mailhandler, we could use an in-memory structure for this instead of files -- we wouldn't have to change the mail handler, only the MessageIdChecker implementation.
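As a rough idea of what the in-memory variant might look like inside a long-running daemon, here is a small Python sketch. The class name `InMemoryMessageIdChecker`, the `first_sighting` method, and the injectable `clock` parameter are assumptions made for illustration. It covers only duplicate detection, not the BCC logic, and it is not thread-safe, so a threaded daemon would need a lock around it.

```python
import time
from collections import OrderedDict


class InMemoryMessageIdChecker:
    """Remember message IDs in memory for ttl seconds after first sighting."""

    def __init__(self, ttl=600.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so the expiry logic can be tested
        # message_id -> time first seen; insertion order is oldest first
        self._seen = OrderedDict()

    def first_sighting(self, message_id):
        """Record message_id; return True only the first time it is seen."""
        now = self.clock()
        self._evict(now)
        if message_id in self._seen:
            return False
        self._seen[message_id] = now
        return True

    def _evict(self, now):
        """Drop entries whose ttl has passed, starting from the oldest."""
        while self._seen:
            oldest_id, seen_at = next(iter(self._seen.items()))
            if now - seen_at < self.ttl:
                break  # everything after this entry is newer still
            del self._seen[oldest_id]
```

Because entries are only ever appended with a non-decreasing timestamp, the garbage collection is just a pop from the front of the dictionary on each call, with no directory scans involved.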