Eaten message log

Postby Guest » Fri Aug 29, 2003 1:38 pm

By: maratheamit ( Amit Marathe )
Eaten message log
2002-12-09 08:51
Josh, what kind of disk space limitations are you running under at the SG hosting site? I was thinking of storing the last n eaten/forwarded messages on disk. This will be much faster than storing them in the database (which led to performance problems in the past). Another advantage would be that we could store the entire message rather than being restricted to a few hundred bytes per message.

The only concern is that we should not exceed our disk quota because of this. But if we implement this feature on a per user basis and keep it turned off by default we should stay well within the quota. In addition to allowing interested people to see where spam is coming from it can be a valuable debugging aid for us.

What do you think?
Amit


By: syskoll ( Fred )
Eaten message log storage
2002-12-09 17:52
Amit,

Good idea. How do you want to store the eaten message logs (EML)? As a flat file for each user? Limiting the length of the file to x bytes per message (e.g. 1K) and max N messages (e.g. N=20) would give, with these values, 20 KB per user. For 20,000 users, that's 400 MB. Josh, do we have that much space on the machine?

Also, how do you store and retrieve 20,000 files in a Unix filesystem? A simpleminded approach would be to compute an MD5-like hash on the user name, split it into bytes, and use these values as subdirectory names. E.g., if the 4-byte hash for user joe6pack is 5e76BF2D, put his EML file in EMLs/5e/76/BF/2D/joe6pack.txt. This ensures that each hash-named directory in the tree has at most 256 subdirectories, well within the performance limits of the Linux FS.
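
A minimal sketch of this directory scheme, in Python for illustration only (the SG backend is Perl). The function name eml_path, the root directory default, and the choice of the first 4 bytes of a real MD5 digest are assumptions; the hash 5e76BF2D above is just an example value, not the actual digest for joe6pack.

```python
import hashlib
import os

def eml_path(username, root="EMLs", levels=4):
    """Return the sharded path of a user's EML file (hypothetical helper).

    The first `levels` bytes of the MD5 digest of the user name become
    nested two-hex-digit directory names, so each hash-named directory
    holds at most 256 subdirectories.
    """
    digest = hashlib.md5(username.encode("utf-8")).hexdigest()
    # Two hex digits per byte: "5e76bf2d..." -> ["5e", "76", "bf", "2d"]
    parts = [digest[2 * i:2 * i + 2] for i in range(levels)]
    return os.path.join(root, *parts, username + ".txt")

print(eml_path("joe6pack"))
```

The sketch assumes user names are safe to use as file names; anything containing a path separator would have to be escaped first.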

Also, when we code the routines that save and display the EML, we could read the /proc/loadavg file (I hope this Linux version has it) and check the average CPU load. If the load is too high, we disable the routines (return immediately with a specific error code or something). This way, we avoid a CPU overload disaster.
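
The load check could look roughly like this. It is a Python sketch rather than the Perl the backend would use; the function names and the threshold of 4.0 are made up for illustration. The first field of /proc/loadavg is the 1-minute load average.

```python
def parse_loadavg(text):
    """Return the 1-minute load average from the content of /proc/loadavg."""
    # The file looks like: "0.20 0.18 0.12 1/80 11206"
    return float(text.split()[0])

def load_too_high(threshold=4.0, path="/proc/loadavg"):
    """Return True when the 1-minute load exceeds `threshold` (assumed value)."""
    try:
        with open(path) as f:
            return parse_loadavg(f.read()) > threshold
    except (OSError, ValueError, IndexError):
        # File missing or unparsable: this sketch lets the EML routines run.
        # Returning True here instead would be the more cautious choice.
        return False
```

The save and display routines would call load_too_high() first and return their error code right away when it is true.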

Your thoughts?

-- SysKoll


By: maratheamit ( Amit Marathe )
RE: Eaten message log
2002-12-09 19:00
I am envisaging a directory per user containing n flat files each containing an eaten message. We can compute a hash of the username and use it to create parent directories so that any single directory does not contain more than a few hundred entries.

Keeping each EML in a separate file (rather than having one file per user) simplifies the management of these logs. For example, when writing a new EML, the Perl code can simply overwrite the oldest file in the user's directory.
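
As a rough illustration of the overwrite-the-oldest idea (Python here, not the actual Perl; the em00.txt style file names, the 20-file limit and the 4096-byte cap are assumptions):

```python
import os
import tempfile

def write_eml(user_dir, message, max_files=20, max_bytes=4096):
    """Store `message` (bytes) in the user's directory, reusing the oldest slot."""
    os.makedirs(user_dir, exist_ok=True)
    slots = [os.path.join(user_dir, "em%02d.txt" % i) for i in range(max_files)]
    # Fill unused slots first; once all exist, overwrite the one with the
    # oldest modification time (ties are resolved by slot order).
    free = [p for p in slots if not os.path.exists(p)]
    target = free[0] if free else min(slots, key=os.path.getmtime)
    with open(target, "wb") as f:
        f.write(message[:max_bytes])  # keep only the head of the message
    return target

# Small demo: five messages into three slots never creates more than three files.
with tempfile.TemporaryDirectory() as demo_dir:
    for n in range(5):
        write_eml(demo_dir, b"message %d" % n, max_files=3)
    kept = sorted(os.listdir(demo_dir))
print(kept)
```

Relying on modification times keeps the code stateless, at the cost of depending on the filesystem's timestamp resolution when several messages arrive within the same instant.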

Space will get problematic if all users turn on the EML feature: your back-of-the-envelope calculation shows that at current usage we would need 400MB to store 20 logs. While we can play tricks like truncating the logs or storing only 10 of them, I suspect we still would not have enough disk space to accommodate all the current users.
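
For reference, the arithmetic behind that estimate, as a small Python sketch (the function name is made up; the inputs are the 20,000 users, 20 logs and 1 KB per log discussed in this thread, and the result ignores filesystem block overhead):

```python
def storage_needed(users, logs_per_user, bytes_per_log):
    """Total bytes needed if every user keeps the full number of logs."""
    return users * logs_per_user * bytes_per_log

MIB = 1024 * 1024
full = storage_needed(20000, 20, 1024)   # 20 logs of 1 KB each
half = storage_needed(20000, 10, 1024)   # the "only 10 logs" variant
print(full / MIB, half / MIB)            # roughly 390 and 195 MiB
```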

But I think we should still go ahead and implement this idea in the backend spameater.pl. We don't have to expose it in the UI just yet, and until we do, it can serve as a debug tool for developers.

-- Amit


By: syskoll ( Fred )
RE: Eaten message log
2002-12-09 21:08
Amit,

Keeping N separate files (N being the number of eaten messages per user) makes sense for speeding up the program. Since the unit of disk space is one page (4K), we could decide to store the first 4 KB of the N most recent EMs.
There would be little sense in storing less.

As for storing the full message, that would expose the system to a DoS caused by a single mailbomb saturating the file system where the EMs are kept.

I agree we can already start the implementation of the EML, even if we can only deploy a limited test version of it on the SG server. We need some elements from you or Josh if you have shell access to the SG server:
1) What's the typical CPU load on the SG machine in normal use right now? (Result of the "uptime" command)
2) What's the disk space we can use, at least for test? (Result of the "df" command)

Thanks,

--SysKoll


By: maratheamit ( Amit Marathe )
RE: Eaten message log
2002-12-11 07:37
Good point about how the system can become vulnerable to a DoS attack if we store the entire message. I hadn't thought about that...

It might be better to hold off on making any changes in CVS related to this project till we have tested and deployed the Mail::Audit version of SG. Otherwise it will become difficult to trace the cause of the bugs uncovered in testing.

-- Amit


By: syskoll ( Fred )
EML with space constraints
2002-12-11 10:23
I was looking at an older project where I had both tight disk space and extensive logging requirements. Just like the EML function on SG.

Our space constraints might well preclude the use of one file per eaten message. One file is 4K minimum.

The way I solved that is the following:
1) Have 2 logfiles, A and B, of a few K each, sized so that each user can have a pair of logs without saturating the system. The eaten messages (EM) will be stored in these files.
2) Decide on a maximum length that will be stored for each eaten message (say, between 1K and 4K).
3) Append that many bytes of the EM to the current logfile.
4) When the current logfile is full, truncate the other one to zero and start appending to it instead.
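
The four steps above can be sketched as follows. This is a Python illustration, not code from the older project or from SG; the class name PairLog, the file names A.log and B.log, and the default sizes are assumptions. The sketch keeps track of the current logfile in memory only and writes records without any separator, both of which a real implementation would have to address.

```python
import os
import tempfile

class PairLog:
    """Two bounded logfiles, A and B, for one user (hypothetical names)."""

    def __init__(self, user_dir, log_size=8192, em_size=1024):
        os.makedirs(user_dir, exist_ok=True)
        self.paths = [os.path.join(user_dir, "A.log"),
                      os.path.join(user_dir, "B.log")]
        self.log_size = log_size   # step 1: maximum size of each logfile
        self.em_size = em_size     # step 2: bytes kept per eaten message
        self.active = 0            # index of the current logfile, not persisted

    def append(self, message):
        record = message[:self.em_size]          # step 3: keep only the head
        path = self.paths[self.active]
        used = os.path.getsize(path) if os.path.exists(path) else 0
        if used + len(record) > self.log_size:   # step 4: current file is full
            self.active = 1 - self.active
            path = self.paths[self.active]
            open(path, "wb").close()             # truncate the other file
        with open(path, "ab") as f:
            f.write(record)

    def sizes(self):
        """Return the current sizes of A and B in bytes."""
        return [os.path.getsize(p) if os.path.exists(p) else 0
                for p in self.paths]

with tempfile.TemporaryDirectory() as demo_dir:
    log = PairLog(demo_dir)
    for _ in range(9):
        log.append(b"x" * 2000)   # each message is cut to 1024 bytes
    demo_sizes = log.sizes()
print(demo_sizes)
```

Because the full file is only truncated when the other one fills up, the pair always holds at least one file's worth of recent messages.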

If we have 320 megs on the server and 20,000 users, that's 16K per user, or 2 log files of 8 K each. If we store the first 1024 bytes of each EM, that's enough to see at least the last 8 (and up to the last 16) EMs.
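
The same budget worked out in a few lines of Python (the function name is made up; the 320 MB, 20,000 users and 1024 bytes per EM are the figures above, with 1 MB taken as 1024 * 1024 bytes):

```python
def eml_budget(disk_bytes, users, em_bytes):
    """Split a disk budget into per-user space, per-logfile space and EM counts."""
    per_user = disk_bytes // users        # bytes available to each user
    per_log = per_user // 2               # two logfiles, A and B
    per_log_ems = per_log // em_bytes     # EMs that fit in one logfile
    return per_user, per_log, per_log_ems, 2 * per_log_ems

per_user, per_log, min_ems, max_ems = eml_budget(320 * 1024 * 1024, 20000, 1024)
print(per_user, per_log, min_ems, max_ems)
```

The last two values are the 8 and 16 EMs mentioned above: one full logfile, or both logfiles full.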

Of course that's still way too much for our small SG host!

The benefit of doing this is that it's easier to accommodate tight disks than with 1 file for each EM.

Your thoughts?

-- SysKoll