Taking control of your spam: Part 1 - Sorting by spam level
Written by Kevin Dommer
Although cPanel allows some basic configuration of SpamAssassin to help you handle your spam, it is a bit limited. I’ve seen some cPanel demos around
the web and some seem to have features and options available that others do not. I’m not sure about the discrepancy, but this page aims to help other
cPanel users take better control of their spam. For example, I have seen a “Spam Box” option in some cPanel demos but I don’t have this option in my
cPanel with my hosting company. But not to worry, because you can still set up a “spam box”, and that is just one of the ways you can help sort your spam.
While the instructions on this page work with my particular cPanel setup, I imagine they should work with most cPanel setups offered by various hosting
companies.
I am going to try to write this in as clear a way as possible, but there is only so much you can do with a technical document and I am not a technical writer.
Before I begin, the examples I will give are just a guideline. There are several ways to handle spam. Some ways are easier, some are
harder. I’ve been using SpamAssassin for around 9 years at the time of this writing and this is how I have come to prefer handling my spam. Over the years I
have gathered bits of information from various online sources on how to do some of these things, and some of it I have come up with myself through
experimentation. You can follow this to the letter or change things a bit to suit your own needs or tastes.
This tutorial is presented with some assumptions as listed below. I am not going to go into too much detail about these things because they are something
that should be understood to a certain degree if you decide you want to follow this tutorial. In some cases my instructions may be enough even if you are
treading in new territory but prior knowledge or experience in these things may help.
ASSUMPTIONS:
1) You know what SpamAssassin is and have a basic idea of how it works (rule-based spam filtering which assigns a spam score to every email it scans).
2) You are familiar with cPanel and know how to get around in it.
3) You know how to access and edit plain text files on your site (whether through cPanel’s File Manager or by using your favorite FTP application) and also
how to adjust file permissions (CHMOD). These will come into play more for bayes training.
4) You know how to set up email accounts on your site.
5) You know how to access the email in your accounts (both from your PC and through Webmail).
6) You understand that this is merely a guide and no guarantees are made as to the results you will receive. Incorrect or careless settings could possibly
result in missing emails! I assume no liability for any problems or damages that may arise from following this guide.
7) The settings I use are for a single domain (no sub-domain) and although we have several email addresses, they are all for a single household and
therefore we have no issues with privacy in the sense that we don’t mind if someone else accidentally sees an email not intended for them. (This can
happen when going through a “spam box” and when training SpamAssassin’s Bayesian database in part 2). There are steps you can take to minimize this,
but it is likely to happen at some point. This tutorial is not meant for resellers or managers of websites with various clients who have individual (private)
email accounts through your website. I will include some additional info on how to maintain a bit of privacy while still training your bayes database.
8) Advanced settings and adjustments like this are outside of the normal scope of most web host providers’ support. While they MIGHT help with minor
issues, they should not be expected to do so. You may try our forum here if you’d like more help with anything on this page.
Setting up your multi-level spam handling
The goals:
* We will designate a minimum score for SpamAssassin to mark an email as spam.
* We will designate a slightly higher score and set up a filter to send those messages to another mailbox (a “spam box”) so they do not come into
your inbox every time you check your email.
* We will designate an even higher score and set up another filter to send those messages to yet another mailbox, with the eventual goal being
that those very high scoring spam mails will be deleted without you ever having to look at them (optional but recommended after taking some
time to verify your settings).
How To:
1) Set the number of hits required before a mail is considered spam. I believe the default is 5. Over time you may want to adjust this number up or down.
For our purposes here, this number is not critical, but ideally it will only flag a message as spam if it really is in fact spam.
a) In cPanel, click on the SpamAssassin shortcut
b) Click the ‘Configure SpamAssassin’ button
c) For required_score, enter 5
d) Click ‘Save’ at the bottom to save the changes
2) Back in the SpamAssassin configuration page you will notice that there is a Spam Auto Delete option (and you may have a Spam Box option depending on
your host provider). While at some point the plan is to use these features, I do not recommend setting them here! We will do this manually later as it will
give us more control, and even after that I do not recommend changing the settings here because it can cause problems with the filters we will create. In
short, do not change any of these ‘Spam Auto Delete’ or ‘Spam Box’ settings here. With that warning out of the way, let’s set up some new email accounts
and some account-level mail filters!
a) Back at your main cPanel page, click on the Email Accounts shortcut
b) Create a new email account called spam (you can use any name you want, but I will refer to it as spam in this tutorial). This is where our mid-range
scoring spam will get redirected to.
c) Create another new email account called spam2 (you can use any name you want, but I will refer to it as spam2 in this tutorial). This is where our high
scoring spam will get redirected to, and the goal is to eventually delete this account and have the emails being sent here get immediately deleted instead.
Emails scoring this high are always spam, so straight to the trash they will go (eventually)!
d) Back at your main cPanel page, click on the Account Level Filtering shortcut
e) NOTE: If you already have other account level filters in place, you will need to decide in what order you want them processed. The order they are
listed in the filter list is the order that they are processed when your mail comes in. Incorrect filter order can cause unexpected results! I have no way of
knowing what other filters you might have so use your best judgment, but generally speaking I would think that you would want these new spam filters
first. We will assign numbers to the beginning of our new filters to help remind us what order they need to be in (very important). cPanel always puts the
last filter you edit at the end of the list. If they end up out of order, edit the filter you want to move down and activate it again to send it to the end of the list.
f) Click the ‘Create a New Filter’ button. This will be our filter for very high spam. Initially you want to set this pretty high (we will start with 15), but you
will bring this down quite a bit over time.
*) Filter Name: #1: Spam High
*) Rules: [Spam Bar] [Contains] +++++++++++++++ (note that is 15 + signs)
*) Actions: [Deliver to folder]
*) Click the dropdown box that appears, click the + sign next to your domain name, then click on spam2. The box should then say /yoursite.com/spam2
*) Click the [+] button off to the right to add another action.
*) For the second action, choose [Stop Processing Rules]. If you don’t do this, then high spam will be caught again in the next filter and routed to your
mid-range spam box rather than the high spam box. We don’t want that to happen!
*) Click Activate to activate the filter then click to go back to the main filter page.
g) Click the ‘Create a New Filter’ button again. This will be our filter for mid-range spam. This should be a number higher than the minimum spam score
you set earlier, but not too much higher. If you used the default of 5 earlier, maybe set this to 7 (+++++++). The idea is that any spam below this number
(spam score of 6.9 or lower) will go to your inbox as normal because it may not really be spam and you don’t want to miss it. Spam with a score between
this number and your “high” number is most likely spam but we can’t really be sure, so we will redirect it to our mid-range spam box and check it
periodically.
*) Filter Name: #2: Spam Mid
*) Rules: [Spam Bar] [Contains] +++++++ (note that is 7 + signs)
*) Actions: [Deliver to folder]
*) Click the dropdown box that appears, click the + sign next to your domain name, then click on spam. The box should then say /yoursite.com/spam
*) Click Activate to activate the filter then click to go back to the main filter page.
h) Now back on your main filter page, be sure that it lists your two new filters and that they are in the correct order. Remember that if you edit one, it
may change the order on you. Edit the other one and activate again to bring it down to the bottom. #1 (Spam High) should be first in the list and #2 (Spam
Mid) should be second.
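To see why the filter order and the [Stop Processing Rules] action matter, here is a minimal Python sketch of the two filters. It is only a model for illustration, not how cPanel actually runs its filters. The names spam_bar, FILTERS and matching_filters are made up for this example, and it assumes the Spam Bar holds one + sign for each whole point of the spam score.

```python
def spam_bar(score):
    """Build a Spam Bar string: one '+' per whole point of the score.

    Assumption for this sketch: scores below 1 simply give an empty bar.
    """
    return "+" * max(0, int(score))


# Each filter: (name, text the Spam Bar must contain, folder, stop processing?)
FILTERS = [
    ("#1: Spam High", "+" * 15, "spam2", True),
    ("#2: Spam Mid", "+" * 7, "spam", False),
]


def matching_filters(score, filters=FILTERS):
    """Return the names of the filters that act on a message, in list order."""
    bar = spam_bar(score)
    fired = []
    for name, needle, folder, stop in filters:
        if needle in bar:  # the "Contains" rule
            fired.append(name)
            if stop:  # "Stop Processing Rules": later filters never run
                break
    return fired


print(matching_filters(16.2))  # ['#1: Spam High']
print(matching_filters(7.4))   # ['#2: Spam Mid']
print(matching_filters(6.9))   # []

# With the filters in the wrong order, the mid-range filter acts on high spam first.
print(matching_filters(16.2, list(reversed(FILTERS))))
# ['#2: Spam Mid', '#1: Spam High']
```

The last call shows the catch: a bar of 16 + signs also "contains" 7 + signs, so the mid-range filter matches high-scoring spam too unless the high filter runs first and stops processing.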
You might be thinking: So what did I just do and what is going to happen to all of my email? Here’s a quick breakdown:
1) All emails determined to be “ham” (not spam) will be delivered to whichever mailbox they were originally intended for. Email sent to
you@yourdomain.com will still arrive in your inbox. Email sent to otheryou@yourdomain.com will still arrive in that inbox.
2) Any emails that SpamAssassin flags as spam with a score below the number of + signs you designated in your second filter (mid-range spam) will still
arrive in their originally intended mailbox as mentioned above, with the exception that the subject line will be modified to say that it is spam. Don’t panic
if this happens to a legitimate email. We can train SpamAssassin later, and so long as it is a relatively low spam score there really is no harm done anyway.
After all, the email still arrived in your inbox, right?
3) Any emails that SpamAssassin flags as spam with a score at or above the number of + signs you designated in your second filter and below the number of
+ signs you designated in your first filter will be routed to your spam mailbox. This puts it all in a handy spot that you can check periodically to make sure
you didn’t miss out on an email that was incorrectly flagged as spam for some reason. Should you ever find a legitimate email here you can easily forward it
to its original recipient (i.e. you) through webmail or however you access this mailbox and it should then arrive in your normal inbox. We will also use the
email here to train SpamAssassin in the next part of this tutorial. You can add this account to your normal email client (such as Outlook Express) if you'd like
to check it regularly, or check it only through webmail (which is what I do) so you don't have to look at it every day.
4) Any emails that SpamAssassin flags as spam with a score at or above the number of + signs you designated in your first filter will be routed to your spam2
mailbox. These will be high-scoring spam and as already mentioned, eventually the goal is to delete these without ever seeing them. Again, you can add
this account to your normal email client (such as Outlook Express) if you'd like to check it regularly, but emails making it to this mailbox are almost always
spam so there really is no need to check them regularly.
5) Note that SpamAssassin only examines messages below a certain size in order to prevent it from choking on large emails and slowing the server down (I
don’t remember the exact size). What this means is that emails with large attachments or lots of images don’t get scanned by SpamAssassin at all. Because
of this, some spam can slip right through with ease (this also applies to emails from people in your blacklist). Fortunately, very little spam mail is ever larger
than the imposed size limit.
So what kind of immediate results can you expect from all of this? Well, I almost guarantee that you will continue to receive spam in your inbox when you
check your email. Some should be properly flagged as spam, some may be flagged as spam but it is really a legitimate email, some spam will not be flagged
as spam at all. In other words, you likely won’t see much of an immediate change. This is where the tweaking begins!
In my examples above, I purposely suggested very conservative numbers for your required_score as well as the required spam levels for your two email
filters. This is to prevent you from missing any important emails, but as a result, your multi-level spam handling will not be very effective until you tweak
the numbers, so read on as we get into that. After careful testing of my PERSONAL spam situation, extensive bayes training and tweaking of scores assigned
to specific SpamAssassin tests, my personal settings are: required_score 3.7, Mid-range spam score (minimum score to get placed into my spam box) is 5,
and all emails with a score of 8 and higher get deleted. Do not use these settings for yourself! You really must take your time and do lots of bayes training
before you can begin to tighten things down like this.
When email is flagged as spam, you can see what kind of score it got by examining the email headers. Using this information, you can then further tweak
your required_score number as well as adjust the levels at which your spam gets sorted to your other mailboxes. You may choose to also set up your email
client to check messages in your spam and spam2 accounts, but I prefer to do that through webmail. Initially you will probably find much more spam in your
normal email account’s inbox than in the spam and spam2 accounts because the scores required for making it to the spam and spam2 account are pretty
high.
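If you save a message's full source to a file, a few lines of Python can pull the score out of the headers for you. This is a rough sketch only. It assumes your server adds an X-Spam-Status header in the usual SpamAssassin layout (score=, required=, tests=), which may differ on your host. The sample message and the function name spam_details are invented for this example.

```python
import re
from email import message_from_string

# Made-up sample message; real headers on your server may look different.
SAMPLE = """\
From: someone@example.com
To: you@yourdomain.com
Subject: ***SPAM*** Cheap watches
X-Spam-Status: Yes, score=7.3 required=5.0 tests=BAYES_99,HTML_MESSAGE,RCVD_IN_BL_SPAMCOP_NET autolearn=no
X-Spam-Bar: +++++++

Message body goes here.
"""


def spam_details(raw_message):
    """Return the score, required score and tests hit, or None if not scanned."""
    msg = message_from_string(raw_message)
    # Collapse any folded header lines into single spaces.
    status = " ".join(msg.get("X-Spam-Status", "").split())
    score = re.search(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)", status)
    if score is None:
        return None
    required = re.search(r"\brequired=(-?\d+(?:\.\d+)?)", status)
    tests = re.search(r"\btests=(\S+)", status)
    return {
        "score": float(score.group(1)),
        "required": float(required.group(1)) if required else None,
        "tests": tests.group(1).split(",") if tests else [],
    }


print(spam_details(SAMPLE))
```

The list of tests is the useful part for tweaking: it tells you which SpamAssassin tests contributed to the score of that particular message.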
You can also tweak scores for certain SpamAssassin tests, which will help increase the effectiveness of your multi-level filtering. For example, a lot of
spam I receive gets points added for a test called “RCVD_IN_BL_SPAMCOP_NET”. This particular SpamAssassin test looks at a public internet blacklist to see
if the email is from a known spammer. While I suppose anything is possible, a hit on this test almost guarantees that it is spam. cPanel allows you to tweak scores for
tests and it is easy to do. I forget what the original score for this particular test is, but I have increased it in my SpamAssassin configuration file. Here’s how:
From your main cPanel screen, click on the SpamAssassin icon then choose ‘Configure SpamAssassin’. There should be some blank boxes next to labels
called score. For this example, type (or copy & paste) the following into that box:
RCVD_IN_BL_SPAMCOP_NET 3.5
Save your changes, then go back to your configuration and you should see your new test score has been set. The next time an email comes in and gets a hit
on that test, it will now get 3.5 points added to the score. Obviously this increases the likelihood that this spam mail will be bumped up into your mid-
range spam box. There are several other tests that I have adjusted the scores on, including Bayes tests, but the points you assign to these tests should
always be slowly tweaked over time. While a hit on the above test just about ALWAYS indicates spam, that is not true of every test: just because a test gets a hit on one
piece of spam does not mean that a hit ALWAYS indicates spam. It is also a good idea (before you begin to get more aggressive with your spam filtering) to
utilize the whitelist option in SpamAssassin to whitelist all of your friends and other important email addresses that you want to prevent from getting
flagged as spam. Whitelisting an address automatically assigns a score of -100 to the email, thus eliminating the possibility of a false-positive. You can
easily add email addresses to your whitelist and blacklist through the ‘Configure SpamAssassin’ page in cPanel.
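The arithmetic behind this is simple: an email's spam score is the sum of the scores of every test it hits. The Python sketch below shows how raising one test's score, or whitelisting the sender, changes the total. The starting scores in DEFAULT_SCORES are made-up numbers for illustration (real defaults vary between SpamAssassin versions), and total_score is a hypothetical helper, not part of SpamAssassin.

```python
# Hypothetical starting scores; the real defaults depend on your SpamAssassin version.
DEFAULT_SCORES = {
    "RCVD_IN_BL_SPAMCOP_NET": 1.5,
    "HTML_MESSAGE": 0.5,
    "MISSING_DATE": 3.0,
}


def total_score(tests_hit, overrides=None, whitelisted=False):
    """Add up the scores of the tests a message hit."""
    scores = {**DEFAULT_SCORES, **(overrides or {})}
    total = sum(scores[name] for name in tests_hit)
    if whitelisted:
        total -= 100  # the whitelist adjustment described above
    return round(total, 1)


hits = ["RCVD_IN_BL_SPAMCOP_NET", "HTML_MESSAGE", "MISSING_DATE"]

print(total_score(hits))                                     # 5.0
print(total_score(hits, {"RCVD_IN_BL_SPAMCOP_NET": 3.5}))    # 7.0
print(total_score(hits, whitelisted=True))                   # -95.0
```

With these example numbers, the message is flagged as spam at 5.0 but stays in the inbox. After the score change it reaches 7.0, which is enough for the 7 + signs of the mid-range filter.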
As each day passes, you will get ham and spam coming in (just as you already have been). Your job now is to look at all of them and see what kind of scores
your spam is getting and what kind of scores your ham (non-spam) is getting. Then SLOWLY adjust your required_score as well as the number of + signs that
determine how to route the spam with your filters. It is important to resist the urge to use aggressive numbers right away as this will only lead to increased
false-positives. My personal goal is all legitimate email and little to no spam making it to my normal inbox, fewer than 20-25 spam messages making it to my mid-range
spam box (per WEEK) with no false-positives, and all the rest making it to my high spam box (which is actually just deleted now). After a couple of months
of tweaking and BAYES TRAINING, I have reached that goal. I will cover bayes training in another installment. It is a bit more complicated and involved than
what we’ve covered here but the rewards can be quite worth it.
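One way to keep yourself honest while tweaking is to jot down the scores you see on real ham and real spam, then check what a candidate threshold would have done to them before you change anything. Here is a small Python sketch of that idea. The score lists are invented sample data and review_threshold is a hypothetical helper, so substitute the scores from your own mail.

```python
def review_threshold(ham_scores, spam_scores, threshold):
    """Count what a threshold would catch, based on scores you have observed."""
    false_positives = sum(1 for s in ham_scores if s >= threshold)
    caught = sum(1 for s in spam_scores if s >= threshold)
    return {
        "false_positives": false_positives,
        "spam_caught": caught,
        "spam_missed": len(spam_scores) - caught,
    }


# Invented sample data; use the scores from your own email headers.
ham = [0.0, 1.2, 2.4, 4.1]
spam = [3.9, 5.6, 6.8, 7.2, 9.5, 16.0]

for threshold in (7, 6, 5, 4):
    print(threshold, review_threshold(ham, spam, threshold))
```

In this made-up data, lowering the threshold from 7 to 5 catches more spam without touching any ham, while going down to 4 would start catching a legitimate email. That is the point at which to stop and do more training instead.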
Earlier I mentioned that the eventual goal for the high scoring spam was to get rid of it altogether and never see it. Also as mentioned, I am at that point
now, but you must give it time and wait until you are CERTAIN that nothing legitimate ever makes it to that last high spam box. Once you are sure of that,
you can make these final changes below. This will cause all of these high-scoring spam emails to get deleted immediately. Warning: You will never see
them and there is no way to ever get them back. If you later decide you do not want to automatically delete high scoring spam anymore, change the rules
back to how you had them set before (as described above).
1) From your main cPanel screen, go into Account Level Filtering and click ‘Edit’ next to your first (#1 Spam High) filter.
2) For the first action (Deliver to folder), change it to [Discard Message] and click Activate. Be sure to leave the [Stop Processing Rules] action in place.
3) Go back to the main filters page. You will notice that cPanel has now moved your “#1” filter below #2. As mentioned earlier, that’s not what we want and
that will allow all those high spams into your mid-range spam box. To fix this problem, click on ‘Edit’ next to the “#2” filter to bring up the filter’s settings.
Click Activate and go back to your filter page. They should now be in the correct order.
4) If you are sure you will never want to keep that high scoring spam again, go ahead and delete the spam2 email account.
I should also mention that no matter how you do it (whether directly through your favorite email client or Webmail), you might want to periodically empty
the mail out of your spam and spam2 mailboxes, especially if you are concerned about a mailbox size quota. If you are not doing any bayes training, there
is no need to keep this extra spam at all once you are done checking for false-positives and determining the scores and if/how you want to tweak your
settings. If you will do bayes training, you will want to hang on to them in order to feed them into SpamAssassin. On that note, there is a setting you can
adjust in SpamAssassin to automatically learn spam over x score. This is what I do with the really high spam that my filter deletes. I never see it so I can’t
use it to manually train SpamAssassin, but I don’t need to because SpamAssassin automatically learns it as spam for me as soon as it comes in!
See part 2 right below for information on Bayes Training in SpamAssassin. This allows you to teach SpamAssassin what is legitimate email (“ham”) and what
is spam. It takes a while before the bayes filter kicks in (SpamAssassin does not use bayes tests until it has learned at least 200 ham and 200 spam
messages), but once it does, SpamAssassin’s accuracy goes up pretty quickly. What’s more, you can assign higher scores to bayes tests, such as “BAYES_99”,
which means that SpamAssassin is 99% sure that the email is spam based on bayes testing. Armed with that, you can assign it a higher score and get it out of
your inbox (and possibly even have your #1 filter (Spam High) delete it automatically should you so choose).