[ Home | What's New | Software | Links | About | Forum | Email | Donate ]
K9's advantage as a spam email filtering program is that it doesn't need to maintain a list of constantly updated rules. However, there are occasions when simple rules that match email content can be useful, especially while K9 is in its initial learning phase, or if you want to ensure that email from a certain person is always classified as non-spam.
Whitelist rules always take precedence over blacklist rules. When a rule matches, the email is given the lowest (0%) or highest (100%) possible spam ranking, depending on whether the match was made by the whitelist or the blacklist respectively. A match made with a list rule effectively overrules the statistical analysis (the K9 databases will still be updated), so you should be sure that any rules you create are not prone to mis-identifying messages. If you create rules in the blacklist file, it is advisable to also create whitelist rules to ensure you never mark a friend's email as spam.
It should also be noted that if you re-classify an email that was identified using the word/phrase matching lists, K9 will re-classify it, but if you Score the email again it will still show 0% or 100% depending on the list rule. This is because, as mentioned above, the word/phrase rules override the statistical rules. If an email has been mis-identified in this way, you should review your rule lists and remove or change the matching phrase so that it does not happen again.
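The precedence described above can be pictured with a short Python sketch. This is not K9's own code: the function names, the idea of a rule as a simple predicate, and the example address are all assumptions made for illustration. It only shows the order of the checks: whitelist first, then blacklist, and the statistical score only when neither list matches.

```python
from typing import Callable, Iterable, Optional

# Hypothetical: a "rule" here is any function that says whether an email matches.
Rule = Callable[[str], bool]


def list_override(email: str,
                  whitelist: Iterable[Rule],
                  blacklist: Iterable[Rule]) -> Optional[float]:
    """Return 0.0 for a whitelist match, 100.0 for a blacklist match, else None."""
    # The whitelist is checked first, so it wins when both lists match.
    if any(rule(email) for rule in whitelist):
        return 0.0
    if any(rule(email) for rule in blacklist):
        return 100.0
    return None


def final_score(email: str,
                whitelist: Iterable[Rule],
                blacklist: Iterable[Rule],
                statistical_score: float) -> float:
    """A list match overrides the statistical score; otherwise keep it."""
    override = list_override(email, whitelist, blacklist)
    return statistical_score if override is None else override


# Illustrative rules only (the address is made up).
whitelist = [lambda text: "fred@example.com" in text.lower()]
blacklist = [lambda text: "university diploma" in text.lower()]

email = "From: fred@example.com\nSubject: my university diploma arrived"
print(final_score(email, whitelist, blacklist, 87.5))
```

In this example both lists match the same email, and the whitelist decides the result, which is why pairing blacklist rules with whitelist rules for known senders protects their mail.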
A line in the whitelist or blacklist looks like this:
Subject nocase contains :university diploma
The line comprises a keyword (in this case Subject) followed by several optional modifiers separated with one or more spaces, followed by a colon character. Immediately following the colon is the word or phrase you are searching for. Do not place a space after the colon unless you intend that to be part of the word or phrase you are searching for. If you have any doubts about the correct syntax of a rule when editing the file directly it is highly recommended that you instead use the right-click context menu (see above) where the hard work is done for you automatically.
Keywords are used to narrow down the particular area within an email that you want to search in. K9 employs 6 special keywords:
Keyword modifiers are used to specify exactly how the match is made.
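To make the line format concrete, here is a minimal Python sketch of how such a line could be split into its parts. It is an illustration under assumptions, not K9's actual parser: the Rule class and parse_rule function are hypothetical names, and the sketch does not check the keyword or modifiers against K9's real lists. It relies only on what is described in this document: the keyword comes first, modifiers follow separated by spaces, everything after the first colon is the search phrase (spaces included), and a line beginning with # is a comment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    keyword: str          # e.g. "Subject"
    modifiers: List[str]  # e.g. ["nocase", "contains"]
    phrase: str           # everything after the first colon, kept verbatim


def parse_rule(line: str) -> Optional[Rule]:
    """Split one list-file line into keyword, modifiers and phrase.

    Returns None for blank lines and for comment lines starting with '#'.
    """
    line = line.rstrip("\r\n")
    if not line.strip() or line.startswith("#"):
        return None
    # Only the FIRST colon separates the rule head from the phrase,
    # so phrases may themselves contain colons.
    head, colon, phrase = line.partition(":")
    tokens = head.split()  # one or more spaces between tokens
    if not colon or not tokens:
        raise ValueError(f"not a valid rule line: {line!r}")
    return Rule(keyword=tokens[0], modifiers=tokens[1:], phrase=phrase)


print(parse_rule("Subject nocase contains :university diploma"))
print(parse_rule("Subject contains : university diploma"))  # phrase keeps the leading space
print(parse_rule("# a comment line"))
```

The second call shows why a space after the colon matters: it becomes part of the phrase being searched for.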
Here are some examples.
Any :sildenafil
Body case :CLICK HERE
From :<big@boss.com>
Header :X-YahooFilteredBulk
Subject :à
Header case :FROM: "Microsoft
Subject :
Header case not contains :To:
Finally, you can add comments to the filter list files so that you can remember what the rules are for in case they aren't obvious. To enter a comment, simply begin the line with a # character, like this:
# This is a whitelist rule to always accept emails from Fred.
Regular Expressions
Regular expressions in K9 are made possible through the use of a free third-party library called PCRE. Before you can use regular expressions with K9 you must download the PCRE package and place the pcre.dll file in K9's directory. First, go to the following page and download the Binaries package. You can choose either the Setup version or the ZIP version.
http://gnuwin32.sourceforge.net/packages/pcre.htm
If you chose the ZIP package, open the ZIP file, extract the file pcre.dll from the bin directory and place it in K9's directory. K9 is usually installed in C:\Program Files\KeirNet\K9.
If you chose the Setup package, run the program to install the package, then locate the directory where it was installed. By default this is C:\Program Files\GnuWin32. Copy the file pcre.dll from the bin directory at this location and place it in K9's directory.
Note: newer versions of PCRE have had the name of the pcre.dll file changed with the addition of a number at the end, for example pcre3.dll. Simply rename the file to pcre.dll and K9 will recognise it.
Once the pcre.dll file has been placed in K9's directory, restart K9 and it will be able to use full regular expressions through the "matches" modifier in the whitelist and blacklist files. You can quickly test whether regexes are working by clicking the Test Regular Expression... button on the Advanced page.
A regular expression, or simply "regex", is a "formula" for identifying a text pattern within a block of text. Regular expressions can help when there are several possible forms of the text string you are looking for.
For example, if you wanted to know whether the text string viagra was contained in the subject line of an email you could enter the K9 rule
Subject contains :viagra
This would even account for the word being in uppercase or mixed case, since the default modifier is to ignore case. However, what if the email subject contained the word V1AGRA (the letter 'I' replaced with the digit '1')? To you and me it is quite obvious that the word is still meant to be VIAGRA, but the rule we created above wouldn't spot it. We could add another rule to look for the word "V1AGRA", but then maybe the word is spelled "V1AGR@". We end up creating possibly dozens of rules to try to identify all the possible variations. This is where regular expressions show their worth. A very simple regex to catch the first example would be
Subject matches :v[i1]agra
This simply says match a "v" followed by either an "i" or a "1", followed by the letters "agra".
I cannot attempt to teach you how to create regular expressions. Some can be simple like the one just shown, while others can appear tremendously complicated and hard to understand. I suggest you do a Google search for regular expressions to find out more.
If you are interested, here's a selection of regular expressions to catch many common spam word variations and harmful attachments. You can copy and paste this section of text directly into your blacklist.txt file if you like, but be warned that if you expect to receive any of these words in your regular non-spam email, those messages will be blacklisted, so use at your own discretion.
# Fake Microsoft attachments
Any nocase matches :(?s)From:[\s]*[\S]*\@microsoft\.com\r\n.*\[Attachment\!
# Xanax
Any nocase matches :x.{0,2}[a@].{0,2}n.{0,2}[a@].{0,2}x
Any nocase matches :[a@].{0,2}x.{0,2}[a@].{0,2}n.{0,2}x
# Cialis
Any nocase matches :c.{0,2}[\|li1í\!].{0,2}[a@].{0,2}[\|li1í\!].{0,2}[\|li1í\!].{0,2}s
# P*nis
Any nocase matches :p[\W_]{0,2}[e3][\W_]{0,2}[n][\W_]{0,2}[\|li1í\!][\W_]{0,2}[s5]
# Viagra
Any nocase matches :v.{0,2}[\|li1í\!].{0,2}[a@].{0,2}g.{0,2}r.{0,2}[a@]
# Common executable attachments
Body nocase matches :\[Attachment!(exe|bat|cmd|pif|scr|com):
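If you would like to try patterns like these before adding them to a list file, the following Python sketch compares a plain "contains" check with the simple v[i1]agra regex and with the looser Viagra pattern from the list above. It uses Python's built-in re module as a stand-in for PCRE, which is an assumption: the two engines agree on the constructs used here, but this is not how K9 itself runs the rules. The sample subject lines and the check function are made up for illustration.

```python
import re

# Plain substring test, standing in for "Subject contains :viagra" (case ignored).
PLAIN = "viagra"

# Stand-in for "Subject matches :v[i1]agra".
SIMPLE = re.compile(r"v[i1]agra", re.IGNORECASE)

# Stand-in for the looser Viagra rule from the list above.
LOOSE = re.compile(
    r"v.{0,2}[\|li1í\!].{0,2}[a@].{0,2}g.{0,2}r.{0,2}[a@]",
    re.IGNORECASE,
)


def check(subject: str) -> dict:
    """Report which of the three approaches matches the subject line."""
    return {
        "contains": PLAIN in subject.lower(),
        "simple": SIMPLE.search(subject) is not None,
        "loose": LOOSE.search(subject) is not None,
    }


# Hypothetical subject lines for illustration.
for subject in [
    "Cheap VIAGRA today",
    "Cheap V1AGRA today",
    "Cheap V1AGR@ today",
    "v-i-a-g-r-a",
    "Meeting agenda",
    "Olivia grabbed lunch",
]:
    print(f"{subject!r}: {check(subject)}")
```

The last subject line is worth noting: the loose pattern matches "via gra" inside "Olivia grabbed", because each .{0,2} gap may also swallow a space. This is the kind of false positive the warning above refers to, so test permissive patterns against your normal mail before relying on them.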
Regular expression support is provided by the PCRE library package, which is open source software, written by Philip Hazel, and copyright by the University of Cambridge, England.