Mail Sorting Strategies

Think for a moment about how you read mail. If you're like me, there are certain people whose messages you want to read right away and others whose messages you'd rather not read at all. If you are subscribed to a mailing list or two, you might want to group those messages together. Some messages you'll want to keep for years and still be able to find after a few minutes of searching. Some messages are from colleagues and others are from friends. Your work life and home life are separate, so why aren't your work and home mail? If you recognize these small frustrations with email, you, my friend, need automated mail sorting.

Most mail programs offer the option (if you can find it) to sort messages into different mailboxes based on information found in email headers. For instance, a really simple strategy would be to sort mail into mailboxes based on the From header. If I sent you a message, your mail program might create a box named "Jon Ericson" and move my messages there. Or you could sort everyone who sends you mail into the "Friends" and "Foes" mailboxes. As a matter of fact, most of my sorting rules make use of the From header, because I can normally predict what sort of things a person will mail me about.
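A minimal sketch of the Friends and Foes idea, assuming Gnus with fancy mail splitting. The two addresses are invented for illustration; only the mailbox names come from the paragraph above.

```elisp
;; Sketch only: both addresses are hypothetical placeholders.
(setq nnmail-split-methods 'nnmail-split-fancy)
(setq nnmail-split-fancy
      '(| (from "pat@example\\.org" "Friends")    ; someone you like
          (from "nemesis@example\\.com" "Foes")   ; someone you don't
          "INBOX"))                               ; everyone else
```

The leading `|` tells Gnus to stop at the first rule that matches, so the final "INBOX" only catches mail that none of the From rules claimed.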

If you want to keep mailing list messages together, the From header is unreliable and awkward. That's because list server software preserves the From header when it forwards mail to its list, so every message still appears to come from its individual author. Fortunately, the software normally adds other headers, both to allow easy sorting and as a sort of signature. A fairly reliable thing to look for is the address of the list in the Sender header.
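Here is one way the Sender lookup could be sketched in plain Emacs Lisp. The two list addresses are taken from the split rules later on this page; the variable and function names are hypothetical.

```elisp
;; Hypothetical table mapping a Sender regexp to a mailbox name.
(defvar my-list-senders
  '(("owner-advanced-sl@multimanpublishing\\.com" . "ASL")
    ("cygwin-announce-owner@cygwin\\.com" . "Cygwin"))
  "Alist of (SENDER-REGEXP . MAILBOX) for known mailing lists.")

(defun my-list-mailbox (sender)
  "Return the mailbox for the Sender header value SENDER.
Return nil when SENDER is nil or matches no known list."
  (let ((case-fold-search t))           ; addresses are case-insensitive here
    (and sender
         ;; `assoc-default' calls (string-match REGEXP SENDER) for each entry
         ;; and returns the cdr of the first entry that matches.
         (assoc-default sender my-list-senders #'string-match))))
```

Note that the From header never enters into it: whoever wrote the message, the list's own address in Sender decides the mailbox.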

So far we've looked at "whitelist" strategies: sorting messages from people you want to hear from. Most people are driven to mail sorting because they want to "blacklist" certain types of messages. Most of us would like to never again see the get-rich-quick schemes, Viagra and diploma sales pitches, invitations to visit pornographic web pages, and who-knows-what-they-say foreign language emails. The From header is rarely useful for this, since spammers tend to use some bogus Hotmail, Yahoo! or AOL address.

In the epic battle between large ISPs (who get the majority of complaints) and the spammers (who are attracted to the huge number of accounts), victory goes to the innovator. When ISPs began blocking messages sent with identical Subject headers to thousands of subscribers, spammers began adding random numbers, pushed to the end of the subject by a run of spaces. So a simple blacklist rule is to junk subjects with more than 2 spaces in a row.
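The space-padding rule is easy to express as a regular expression. This is a small sketch in Emacs Lisp with a hypothetical function name; "more than 2 spaces in a row" is read as three or more.

```elisp
(defun my-padded-subject-p (subject)
  "Return t if SUBJECT contains more than two spaces in a row."
  ;; " \\{3,\\}" is a space repeated three or more times.
  (and (string-match-p " \\{3,\\}" subject) t))

;; Example calls:
;; (my-padded-subject-p "Make money fast        83921")
;; (my-padded-subject-p "Lunch on  Friday?")
```

The first call has a long run of spaces before the number and is flagged; the second has only a double space and is left alone.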

Of course you can also censor based on certain words in the Subject. My earliest spam filter junked any message with the string 'sex' in the Subject. Naturally I lost a message a few days later because one of my colleagues misspelled "consecutive". The first lesson to learn is that you should never throw mail away without glancing over it first. For this reason, my blacklisted mail goes to the Junk mailbox, not to the trash.

The second lesson is to be sure to filter on whole words, not substrings. Maybe I don't want to read what strangers might have to say about sex, but perhaps I would be interested in messages about sextants. (On the other hand, I suppose I don't care about sextons.)
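To make the difference concrete, here is a sketch of the two kinds of test side by side. Both function names are hypothetical; the word version relies on the Emacs regexp markers `\<` and `\>`, which match only at the start and end of a word.

```elisp
(defun my-contains-string-p (string text)
  "Return t if STRING occurs anywhere in TEXT, even inside a longer word."
  (let ((case-fold-search t))
    (and (string-match-p (regexp-quote string) text) t)))

(defun my-mentions-word-p (word text)
  "Return t only if WORD occurs in TEXT as a whole word."
  (let ((case-fold-search t))
    (and (string-match-p (concat "\\<" (regexp-quote word) "\\>") text) t)))

;; Example calls:
;; (my-contains-string-p "sex" "Used sextants for sale")  ; substring test
;; (my-mentions-word-p   "sex" "Used sextants for sale")  ; whole-word test
```

The substring test flags the sextant message; the whole-word test lets it through and still catches a subject in which the word stands alone.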

Perhaps the most important lesson, though, is to whitelist aggressively before you blacklist. You know from deduction that emails from coworkers are usually about work. You know from induction that emails about sex are usually spam. So your whitelist rules tend to be more certain than your blacklist rules, and they should get the first look at each message.
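The ordering can be sketched as a list of rules tried from top to bottom, where the first match wins. Everything here is hypothetical (the rule table, the function name, and the work.example domain); it only illustrates why the whitelist rule has to come first.

```elisp
;; Each rule is (HEADER REGEXP MAILBOX).  Order matters.
(defvar my-split-rules
  '(("From"    "@work\\.example" "Work")    ; whitelist: coworkers
    ("Subject" "\\<sex\\>"       "Junk"))   ; blacklist: likely spam
  "Hypothetical sorting rules, most trusted first.")

(defun my-classify (headers)
  "Return a mailbox name for HEADERS, an alist of (NAME . VALUE) strings.
The first matching rule in `my-split-rules' wins; the default is \"INBOX\"."
  (let ((case-fold-search t))
    (catch 'filed
      (dolist (rule my-split-rules)
        (let ((value (cdr (assoc (nth 0 rule) headers))))
          (when (and value (string-match-p (nth 1 rule) value))
            (throw 'filed (nth 2 rule)))))
      "INBOX")))

;; Example call: a coworker writing about a sensitive topic.
;; (my-classify '(("From" . "boss@work.example")
;;                ("Subject" . "Sex discrimination training")))
```

Because the From rule is tried first, the coworker's message lands in "Work" even though its subject would also trip the blacklist rule. Swap the two rules and it would go to "Junk" instead.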

Here's the code (Emacs Lisp) that splits my mail:

(setq nnmail-split-methods 'bbdb/gnus-split-method)
(setq nnmail-split-fancy
      '(| (: nnmail-split-fancy-with-parent)
          (from mail "INBOX")
          ("sender" "owner-advanced-sl@multimanpublishing.com" "ASL")
          ("sender" "cygwin-announce-owner@cygwin.com" "Cygwin")
          ("sender" "xemacs-announce-admin@xemacs.org" "xemacs")
          ("mailing-list" "perl.org" "perl")
          (to "all.personnel@list.jpl.nasa.gov" "NASA")
          ("mailing-list" "swish" "swish")
          ("content-type" "text/html\\|big5\\|gb2312\\|ks_c_.*" "Junk")
          ("subject" ".*[1²°¶÷].*\\| [0-9A-Z]\\|panties" "Junk")
          ("subject" "earn" "Junk")
          ("message-id" "tw" "Junk")
          (from "jpl.nasa.gov" "NASA")
          (from "raytheon.com" "Raytheon")
          ("precedence" "bulk" "Junk")
          (to "Ericson\\|jlericson" "INBOX")
          (to "jlericson@yahoo.com" "INBOX")
          ; ("x-mindspring-loop" "jlericson@yahoo.com" "INBOX")
          "Junk"))

The first line sorts mail from people I know using the Insidious Big Brother Database (BBDB). It's basically an address book that links with Gnus (the news and mail reader for Emacs).

(setq nnmail-split-methods 'bbdb/gnus-split-method)

Then, after keeping replies with their parent messages and sending mailer-daemon notices to my inbox, I sort mailing lists:

(setq nnmail-split-fancy
      '(| (: nnmail-split-fancy-with-parent)
          (from mail "INBOX")
          ("sender" "owner-advanced-sl@multimanpublishing.com" "ASL")
          ("sender" "cygwin-announce-owner@cygwin.com" "Cygwin")
          ("sender" "xemacs-announce-admin@xemacs.org" "xemacs")
          ("mailing-list" "perl.org" "perl")
          (to "all.personnel@list.jpl.nasa.gov" "NASA")
          ("mailing-list" "swish" "swish")

Next, get rid of likely spam, file work mail, keep anything addressed to me, and junk the rest:

          ("content-type" "text/html\\|big5\\|gb2312\\|ks_c_.*" "Junk")
          ("subject" ".*[1²°¶÷].*\\| [0-9A-Z]\\|panties" "Junk")
          ("subject" "earn" "Junk")
          ("message-id" "tw" "Junk")
          (from "jpl.nasa.gov" "NASA")
          (from "raytheon.com" "Raytheon")
          ("precedence" "bulk" "Junk")
          (to "Ericson\\|jlericson" "INBOX")
          (to "jlericson@yahoo.com" "INBOX")
          ; ("x-mindspring-loop" "jlericson@yahoo.com" "INBOX")
          "Junk"))


Jon Ericson
Last modified: Mon Feb 23 18:07:17 PST 2004