Here's what's going on now and how it relates to delayed messages and the queues:
While we were completely down, mail servers that wanted to send mail to spamgourmet couldn't, so presumably they queued it (some may have given up right away -- we can't know) and started retrying at some regular interval.
When we came back up, the server was, of course, hit with a huge number of connection requests -- more than it could handle. Remote servers that managed to connect got their mail into our server. Those that couldn't kept trying. As time has worn on, the number of requests is only now consistently falling below our ability to handle them.
If you were following the news here, you'll know that some sg users had been scripting addresses, and so had caused a really high load. This, along with our normal traffic, compounded while we were down, and no doubt continued to compound (albeit at a slower rate than our deliveries) while the server was maxed out and rejecting connections.
For years, our ratio of eaten to delivered mail was about 10 to 1. Recently, it's been more like 10 to 2.5, which is unhealthy for us at the volume we're handling. I'm not sure if this is because more users have figured out how to make addresses forward permanently (there's a reason we discouraged this) or if it's something else, but the ratio has changed.
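To put the ratio change in perspective, here is a small arithmetic sketch in Python. It only restates the two ratios quoted above as a share of all incoming mail. The function name is made up for illustration, and nothing here reflects actual spamgourmet code or real traffic counts.

```python
def delivered_share(eaten: float, delivered: float) -> float:
    """Fraction of all incoming messages that end up being forwarded."""
    return delivered / (eaten + delivered)

old = delivered_share(10, 1)     # the long-standing 10 to 1 ratio
new = delivered_share(10, 2.5)   # the recent 10 to 2.5 ratio

print(f"old delivered share: {old:.1%}")   # about 9.1%
print(f"new delivered share: {new:.1%}")   # 20.0%
print(f"relative increase:   {new / old:.1f}x")   # 2.2x
```

In other words, roughly one in eleven incoming messages used to be forwarded, and now it is about one in five, so the delivery side is doing more than twice the work for the same amount of incoming mail.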
As of now, we're still spiking up to the point where we reject connections, but the share of time we spend in that state has gotten small and will hopefully drop to zero soon. We have also taken some steps to reduce the effect of the scripted addresses, and this is helping too.
No doubt some mail servers have tried at all the wrong times and have given up trying to deliver messages. Others are likely still trying -- for these, messages are "en route" but not yet in our queue.
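The dynamics described above (a backlog of retrying servers on top of normal traffic, with a fixed ceiling on how many connections we can accept) can be illustrated with a toy model. This is only a sketch: every number in it is hypothetical, the "tick" is an arbitrary time unit, and real mail servers retry on their own schedules rather than every tick.

```python
def simulate_backlog(backlog, arrivals, capacity, give_up_rate, ticks):
    """Toy model of a server draining a retry backlog.

    backlog      -- connection attempts still waiting from the outage (hypothetical)
    arrivals     -- new attempts per tick from normal traffic (hypothetical)
    capacity     -- attempts the server can accept per tick (hypothetical)
    give_up_rate -- fraction of rejected attempts whose sender stops retrying
    Returns a list of (demand, accepted, abandoned) tuples, one per tick.
    """
    history = []
    for _ in range(ticks):
        demand = backlog + arrivals           # everyone knocking this tick
        accepted = min(demand, capacity)      # what we can actually take
        rejected = demand - accepted          # connections turned away
        abandoned = rejected * give_up_rate   # senders that give up for good
        backlog = rejected - abandoned        # the rest retry next tick
        history.append((demand, accepted, abandoned))
    return history

# Hypothetical numbers: capacity only slightly above normal traffic,
# so the backlog drains slowly even though nobody gives up.
history = simulate_backlog(backlog=1000, arrivals=80, capacity=100,
                           give_up_rate=0.0, ticks=60)
first_clear = next(i for i, (demand, accepted, _) in enumerate(history)
                   if demand <= accepted)
print(f"rejections stop at tick {first_clear + 1}")
```

The point of the sketch is that when capacity is only a little above normal traffic, even a modest backlog takes a long time to clear, and any sender that gives up in the meantime means lost mail.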
What can we learn from this? 1) our service is pretty much at capacity, and any downtime (and the resulting build-up of to-be-delivered mail) pushes it over the edge into a delay condition, and 2) we probably need to figure out how to restore our normal ratio of eaten to delivered messages -- this *may* have to involve a change to the exclusive sender logic.
Issue number one could probably be solved with better hardware, but as you all know we don't have the budget for that -- what we can do is refactor the software to run more efficiently. From here on, all our spamgourmet efforts will be directed at that.