K9 - Whitelist/Blacklist Filtering


K9's advantage as a spam email filtering program is that it does not need to maintain a list of constantly updated rules. However, simple rules that match email content can still be useful, especially while K9 is new and in its initial learning phase, or when you want to ensure that email from a certain person is always classified as non-spam.
Rules are stored in two text files, which can be edited directly through the file system or, more conveniently, by enabling the whitelist or blacklist on the Advanced page and clicking its Edit button.
To make creating filter list rules even easier, you can right-click any message in a message window (e.g. Recent Emails), select Whitelist or Blacklist from the menu, and add the required rules by following the directions.
Whitelist rules always take precedence over blacklist rules. When a match occurs, the email is given the lowest (0%) possible spam ranking if a whitelist rule matched, or the highest (100%) if a blacklist rule matched. A match made with a list rule effectively overrules the statistical analysis (the K9 databases will still be updated), so you should be sure that any rules you create are not prone to misidentifying messages. If you create rules in the blacklist file, it is advisable to also create whitelist rules to ensure you never mark a friend's email as spam.
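
The precedence described above can be sketched in a few lines of Python. This is only an illustration of the stated behaviour, not K9's actual code; the function name and the 0 to 100 score scale are assumptions made for the example.

```python
def final_score(statistical_score, whitelist_hit, blacklist_hit):
    """Return a spam ranking from 0.0 (not spam) to 100.0 (spam)."""
    # The whitelist is checked first, so it wins when both lists match.
    if whitelist_hit:
        return 0.0
    if blacklist_hit:
        return 100.0
    # No list rule matched: fall back to the statistical result.
    return statistical_score


print(final_score(87.5, whitelist_hit=True, blacklist_hit=True))  # 0.0
```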

It should also be noted that if you re-classify an email that was identified by the word/phrase matching lists, K9 will re-classify it, but if you Score the email again it will still show 0% or 100%, depending on the list rule. This is because, as previously mentioned, the word/phrase rules override the statistical analysis. When an email has been misidentified like this, review your rule lists and remove or change the matching phrase so that it does not occur again.


A line in the whitelist or blacklist looks like this:

Subject nocase contains :university diploma

The line comprises a keyword (in this case Subject), followed by any optional modifiers separated by one or more spaces, followed by a colon character. Immediately following the colon is the word or phrase you are searching for. Do not place a space after the colon unless you intend it to be part of the word or phrase you are searching for. If you have any doubts about the correct syntax of a rule when editing the file directly, it is highly recommended that you use the right-click context menu (see above) instead, where the hard work is done for you automatically.
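
To make the layout of a rule line concrete, here is a minimal Python sketch that splits a line at its first colon. It is not K9's parser; the function name is hypothetical, and lower-casing the keyword and modifiers is an assumption based on the mixed capitalisation used on this page.

```python
def parse_rule(line):
    """Split a rule line into (keyword, modifiers, phrase)."""
    line = line.rstrip("\r\n")  # drop the line ending, keep trailing spaces
    # Everything before the FIRST colon is the keyword plus modifiers;
    # everything after it, spaces included, is the search phrase.
    head, colon, phrase = line.partition(":")
    words = head.split()
    if not colon or not words:
        raise ValueError("a rule needs a keyword and a colon")
    keyword = words[0].lower()
    modifiers = [w.lower() for w in words[1:]]
    return keyword, modifiers, phrase


print(parse_rule("Subject nocase contains :university diploma"))
```

Note that a later colon, as in Header case :FROM: "Microsoft, stays part of the phrase because only the first colon is treated as the separator.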

Keywords are used to narrow down the particular area within an email that you want to search in. K9 employs 8 special keywords:

Any
Look in the entire email, including the headers.

Header
Only look in the headers section.

Subject
Only look in the Subject header line.

From
Only look in the From header line.

To
Only look in the To header line.

Cc
Only look in the Cc header line.

Bcc
Only look in the Bcc header line.

Body
Look in the body of the email only. The headers will not be searched.
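
The keywords above could map onto parts of a raw message roughly as follows. This Python sketch uses the standard email module; the function name is hypothetical, and whether K9 searches only the header value (as here) or also the header name is an assumption.

```python
from email import message_from_string


def search_area(raw_email, keyword):
    """Return the text that a given keyword would search."""
    text = raw_email.replace("\r\n", "\n")
    # Headers and body are separated by the first blank line.
    headers, _, body = text.partition("\n\n")
    keyword = keyword.lower()
    if keyword == "any":
        return text
    if keyword == "header":
        return headers
    if keyword == "body":
        return body
    if keyword in ("subject", "from", "to", "cc", "bcc"):
        msg = message_from_string(text)
        # A missing header yields an empty search area.
        return str(msg.get(keyword, ""))
    raise ValueError("unknown keyword: " + keyword)


raw = "From: Fred <fred@hotmail.com>\nSubject: Hello\n\nClick here\n"
print(search_area(raw, "Subject"))
```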


Keyword modifiers are used to specify exactly how the match is made.

Contains
A match is made if the word/phrase is found anywhere in the searched area. This is the default action.

Equals
The searched area must match the word/phrase in its entirety.

Starts
A match is made if the searched area starts with the given word/phrase.

Ends
A match is made if the searched area ends with the given word/phrase.

Matches
A match is made if the regular expression matches the searched text. See below to learn how to enable and use regular expressions.

Case
The case (upper or lower) of the word/phrase must match exactly.

Nocase
The case (upper or lower) of the word/phrase does not matter. This is the default modifier.

Not
Negates the logic of the test.
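
As a rough illustration of how these modifiers combine, the following Python sketch applies them to a piece of text. It is an assumption-laden model, not K9's implementation: the function name is hypothetical, and Python's re module stands in for PCRE.

```python
import re

TESTS = ("contains", "equals", "starts", "ends", "matches")


def rule_matches(area, modifiers, phrase):
    """Apply a rule's modifiers and phrase to the searched area."""
    mods = [m.lower() for m in modifiers]
    case_sensitive = "case" in mods  # nocase is the default
    # "contains" is the default test when none is given.
    test = next((m for m in mods if m in TESTS), "contains")
    if test == "matches":
        flags = 0 if case_sensitive else re.IGNORECASE
        hit = re.search(phrase, area, flags) is not None
    else:
        if not case_sensitive:
            area, phrase = area.lower(), phrase.lower()
        if test == "contains":
            hit = phrase in area
        elif test == "equals":
            hit = area == phrase
        elif test == "starts":
            hit = area.startswith(phrase)
        else:
            hit = area.endswith(phrase)
    # "not" flips the result of whichever test was used.
    return hit != ("not" in mods)


print(rule_matches("Get a University Diploma", ["nocase", "contains"], "university diploma"))  # True
```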


Here are some examples.

Any :sildenafil
Matches the word sildenafil anywhere in the entire email and does not care about the case of the word.

Body case :CLICK HERE
Matches the phrase CLICK HERE if it occurs in the email body (not the headers) and only if it is an exact case match. Be aware that searching for long phrases in the body of an email may not trigger when expected because phrases in the email may be broken across paragraphs or separated by HTML comments or other disguising factors. When in doubt do not add the rule since more than likely the email will be correctly scored by the statistical analysis alone.

From :<big@boss.com>
Matches the word <big@boss.com> if it is contained in the email header From field.

Header :X-YahooFilteredBulk
Matches the word X-YahooFilteredBulk if it is found anywhere within the email headers section, regardless of case.

Subject : Z
Matches the letter Z if it is found anywhere in the email header Subject field. The space after the colon is part of the search phrase, so the Z must be preceded by a space.

Header case :FROM: "Microsoft
Matches the phrase FROM: "Microsoft if it is found anywhere within the email headers section, but only if the case is the same.

Subject :
This one is not so obvious from looking at it here, but following the colon I've placed 5 space characters. This rule will therefore match any Subject line that contains 5 or more consecutive spaces - a common trait of spam.

Header case not contains :To:
Matches if the header section of the email does not contain the exact word To:

Finally, you can add comments to the filter list files so that you can remember what the rules are used for in case they aren't obvious. To enter a comment simply begin the line with a # character, like this:

# This is a whitelist rule to always accept emails from Fred.
From :fred@hotmail.com
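
A program reading such a file would skip the comment lines before interpreting the rules. The Python sketch below shows one way to do that; the function name is hypothetical, and ignoring blank lines is an assumption rather than documented K9 behaviour.

```python
def load_rules(text):
    """Return the rule lines from a filter list, skipping comments."""
    rules = []
    for line in text.splitlines():
        # A comment is a line that begins with the # character.
        if line.startswith("#") or not line.strip():
            continue
        rules.append(line)
    return rules


sample = (
    "# This is a whitelist rule to always accept emails from Fred.\n"
    "From :fred@hotmail.com\n"
)
print(load_rules(sample))  # ['From :fred@hotmail.com']
```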


Regular Expressions


Regular expressions in K9 are made possible through the use of a free third-party library called PCRE. Before you can use regular expressions with K9, you must download the PCRE package and place the pcre.dll file in K9's directory. First, go to the page below and download the Binaries package. You can choose either the Setup version or the ZIP version.

http://gnuwin32.sourceforge.net/packages/pcre.htm

If you chose the ZIP package, open the ZIP file, extract the file pcre.dll from the bin directory, and place it in K9's directory. K9 is usually installed in C:\Program Files\KeirNet\K9.

Note: newer versions of PCRE have had the name of the pcre.dll changed with the addition of a number at the end, for example pcre3.dll. Simply rename the file to pcre.dll and K9 will recognize it.

If you chose the Setup package, run the program to install the package, then locate the directory where it was installed. By default this is C:\Program Files\GnuWin32. Copy the file pcre.dll from the bin directory at this location and place it in K9's directory. K9 is usually installed in C:\Program Files\KeirNet\K9.

Once the pcre.dll file has been placed in K9's directory, re-start K9 and it will then be able to use full regular expressions through the "matches" modifier in the whitelist and blacklist files. You can quickly test whether regexes are working by clicking the Test Regular Expression... button on the Advanced page.

A regular expression, or simply "regex", is a "formula" for identifying a text pattern within a block of text. Regular expressions can help when there are several possibilities for the type of text string you are looking for. For example, if you wanted to know whether the text string viagra was contained in the subject line of an email, you could enter the K9 rule

Subject contains :viagra

This would even account for the word being in uppercase or mixed case, since the default modifier is to ignore case. However, what if the email subject contained the word V1AGRA (the letter 'I' replaced with the digit '1')? To you and me it is quite obvious that the word is still meant to be VIAGRA, but the rule we created above wouldn't spot it. We could add another rule to look for the word "V1AGRA", but then maybe the word was spelled "V1AGR@". We would end up creating possibly dozens of rules to try to identify all the possible variations. This is the beauty of regular expressions! A very simple regex to catch the first example would be

Subject matches :v[i1]agra

This simply says match a "v" followed by either an "i" or a "1" followed by the letters "agra".
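
If you want to experiment with a pattern outside K9, a quick check like the following Python sketch may help. Python's re module is not PCRE, but this simple pattern uses only syntax the two share; the sample subject lines are made up for the example.

```python
import re

# Same pattern as the rule above; IGNORECASE mirrors K9's nocase default.
pattern = re.compile(r"v[i1]agra", re.IGNORECASE)

subjects = ["Cheap VIAGRA", "Cheap V1AGRA", "Cheap V1AGR@", "Meeting agenda"]
hits = [s for s in subjects if pattern.search(s)]

print(hits)  # ['Cheap VIAGRA', 'Cheap V1AGRA']
```

The "V1AGR@" variant slips through because this simple pattern still requires a literal "a" at the end.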

I cannot attempt to teach you how to create regular expressions. Some can be simple like the one just shown while others can appear to be tremendously complicated and hard to understand. I suggest you do a Google search for regular expressions to find out more.

If you are interested, here's a selection of regular expressions to catch many common spam word variations and harmful attachments. You can copy and paste this section of text directly into your blacklist.txt file if you like, although be warned that if you expect to receive any of these words in your regular non-spam email they will be blacklisted, so use at your own discretion.

# Fake Microsoft attachments
Any nocase matches :(?s)From:[\s]*[\S]*\@microsoft.com\r\n.*\[Attachment\!

# Xanax
Any nocase matches :x.{0,2}[a@].{0,2}n.{0,2}[a@].{0,2}x
Any nocase matches :[a@].{0,2}x.{0,2}[a@].{0,2}n.{0,2}x

# Cialis
Any nocase matches :c.{0,2}[\|li1\!].{0,2}[a@].{0,2}[\|li1\!].{0,2}[\|li1\!].{0,2}s

# P*nis
Any nocase matches :p[\W_]{0,2}[e3][\W_]{0,2}[n][\W_]{0,2}[\|li1\!][\W_]{0,2}[s5]

# Viagra
Any nocase matches :v.{0,2}[\|li1\!].{0,2}[a@].{0,2}g.{0,2}r.{0,2}[a@]

# Common executable attachments
Body nocase matches :\[Attachment!(exe|bat|cmd|pif|scr|com):
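
Before pasting a pattern into your blacklist, it can be worth trying it against a few sample strings. The Python sketch below does this for the Viagra pattern from the list above, using Python's re module as a stand-in for PCRE (an assumption; the two engines agree on the syntax used here). The sample strings are invented for the example.

```python
import re

# The part of the Viagra rule that follows the colon.
viagra = re.compile(
    r"v.{0,2}[\|li1\!].{0,2}[a@].{0,2}g.{0,2}r.{0,2}[a@]",
    re.IGNORECASE,
)

samples = ["V1AGR@", "v-i-a-g-r-a", "Olivia Gray", "Meeting agenda"]
flagged = [s for s in samples if viagra.search(s)]

print(flagged)  # ['V1AGR@', 'v-i-a-g-r-a', 'Olivia Gray']
```

The name "Olivia Gray" is caught because the pattern allows up to two arbitrary characters between letters, which is exactly the kind of false match the warning above is about.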


Regular expression support is provided by the PCRE library package, which is open source software, written by Philip Hazel, and copyright by the University of Cambridge, England.

Information about PCRE can be found at http://www.pcre.org/

Source code can be found at ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/