K9 - Advanced Training

The steps outlined here are not required to begin using K9, but they give it a head start in its learning process. You can skip this entire section if you are unsure how to perform the tasks here.

Initial Feeding

When you run K9 for the first time you will notice that it creates 3 sub-directories (Spam, Good and Recent) under an Emails directory, found where you installed K9.

Gather all of the spam emails you can find and copy them into the Spam directory. K9 recognizes any text file that contains the entire plain text version of an email, including headers, such as those exported from Outlook Express (*.eml), which you can drag and drop directly from OE into the directory. Do not place proprietary binary email format files here. K9 won't understand them; they must be plain text files.
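If you have a large pile of exported files and are not sure which ones are plain text, a small script can pre-filter them before you copy them in. The following is only a sketch in Python, not part of K9: the function names, the list of headers and the "no NUL bytes" test are assumptions chosen for illustration, and K9's own checks may differ.

```python
import shutil
from email import message_from_bytes
from pathlib import Path

# Assumption: a file counts as a plain text email if it has no NUL bytes
# and at least one of these common headers can be parsed from it.
COMMON_HEADERS = ("From", "To", "Subject", "Date", "Received")


def looks_like_plain_text_email(raw: bytes) -> bool:
    """Heuristic check only; it does not reproduce K9's own parsing."""
    if b"\x00" in raw:  # binary mail store formats usually contain NUL bytes
        return False
    message = message_from_bytes(raw)
    return any(message.get(name) is not None for name in COMMON_HEADERS)


def copy_plain_text_emails(source_dir: Path, target_dir: Path) -> list:
    """Copy files that pass the heuristic into target_dir; return their names."""
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source_dir.iterdir()):
        if path.is_file() and looks_like_plain_text_email(path.read_bytes()):
            shutil.copy2(path, target_dir / path.name)
            copied.append(path.name)
    return copied
```

You would call copy_plain_text_emails with the folder holding your exported files and the Spam directory under K9's Emails directory. Files that fail the check are simply left where they are.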

Grab all of your known "Good" emails and copy them into the Good directory, similar to above.

Note: It isn't necessary to keep all of these emails in these folders for K9 to work but it is advisable to build up a good collection for initial training or for rebuilding the word databases at some point if that becomes necessary.

Basic Training

Now that you have given K9 some initial information to work with it needs to build the Spam and Good word databases from the files you provided.

Go to the Statistics tab and click the Rebuild the word databases... button. This may take several seconds. This normally only needs to be done once when you are teaching the program, but it can also be useful to rebuild your databases at a future date from your collection of emails if you have made many mistakes when re-classifying emails.

Next, you need to correct any mistakes K9 has made when analyzing the emails you provided in the previous steps.

Select the Storage Area tab. The 3 leftmost buttons on the toolbar correspond to the 3 email directories. Whenever you move between these folders by clicking any of the 3 folder buttons on the toolbar, the program will reload the emails.

Select the Spam folder. Once the emails have loaded click the Score button on the toolbar. Depending on how many emails you have in this folder it may take several seconds to process them all. When scoring is complete the Spam % column (rightmost) will have automatically been sorted to show the least spam-like emails at the top. You can re-sort any column by clicking the column header.

Any emails above the default 50% are considered spam and will be shown as such in the 1st column. If the program has mis-identified any of the spam emails as good they will be shown in blue.
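The scoring view described above boils down to a threshold test plus a sort. The sketch below, in Python, illustrates that logic only. The names label and review_folder and the sample filenames are hypothetical, and the scores are made-up inputs rather than anything K9 computed.

```python
SPAM_THRESHOLD = 50.0  # default cut-off quoted in the text above


def label(spam_percent, threshold=SPAM_THRESHOLD):
    """Emails scoring above the threshold are Spam; everything else is Good."""
    return "Spam" if spam_percent > threshold else "Good"


def review_folder(scores, folder):
    """Sort least spam-like first and flag emails that disagree with the folder.

    scores: dict mapping a (hypothetical) filename to its Spam % score.
    folder: "Spam" or "Good", the folder being reviewed.
    Returns rows of (filename, score, label, needs_attention).
    """
    rows = sorted(scores.items(), key=lambda item: item[1])
    return [(name, pct, label(pct), label(pct) != folder) for name, pct in rows]


# Example with invented scores for three files sitting in the Spam folder.
for row in review_folder({"a.txt": 97.0, "b.txt": 12.5, "c.txt": 50.0}, "Spam"):
    print(row)
```

Rows whose last field is True correspond to the emails K9 would highlight in the Spam folder, that is, the ones you would select and re-classify.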

Highlight the mis-classified emails (multi-select is done by holding CTRL or SHIFT down when selecting) that need to be re-classified as Spam and click the Spam button.

Similarly, if K9 has identified any accidentally included Good emails as Spam, you can select them and click the Good button to re-classify them as Good. You can make existing Spam even "Spammier" or "Good" emails even "Gooder" by re-classifying them repeatedly. Be aware that re-classifying an email can affect how all other emails are scored, so before you finish with the folder, click the Score button again to verify that all emails are scored as expected. Re-classify and repeat as necessary.

The Organize toolbar button moves all messages into their correct folders. Clicking Organize from any folder view moves Good emails to the Good folder and Spam emails to the Spam folder. After the final scoring within a folder view, once everything is classified as you like, click Organize to move any Good emails in your Spam folder out into the Good folder.

Now do the same thing for the Good folder. Any emails that are rated as spam in the Good folder will be colored red. Your aim is to re-classify until all genuinely good emails are Good.

The Recent folder is where all emails are placed by the K9 proxy server as email is read in and classified when you check your email with your email program. Messages from the most recent download session are highlighted in bold.

That completes basic training.

If you mess things up, go to the Statistics tab and rebuild the word databases.

You will notice that K9 has renamed the files you placed in its email directories. This is to detect duplicates. The filename is a hexadecimal representation of the file's CRC.
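To see why CRC-based names expose duplicates, consider the following sketch in Python. It assumes the common CRC-32 checksum and an 8-digit upper-case name. The text above only says "CRC", so the exact variant and the name format K9 uses are assumptions here, and crc_hex_name is a hypothetical helper.

```python
import zlib


def crc_hex_name(data: bytes) -> str:
    """Return an 8-digit hexadecimal CRC-32 of the data (assumed format)."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08X")


# Two files with identical contents get the same name, so the second copy
# is recognizable as a duplicate of the first.
first = b"Subject: offer\r\n\r\nbuy now\r\n"
second = b"Subject: offer\r\n\r\nbuy now\r\n"
print(crc_hex_name(first) == crc_hex_name(second))  # prints True
```

A checksum like this is short, so two different emails can occasionally share a name. It is a cheap duplicate check, not a guarantee.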

Further Training

If you want to add to K9's knowledge base of Spam and Good words without physically copying files into the directories it uses, you can drop files from the filesystem onto the Spam and Good folder windows in the Storage Area. K9 will automatically parse through every file and folder that you drop onto its window and add the words contained in each file to the associated word database. For example, if you had a directory on your disk containing all known Spam emails, you could select them all in the Windows Explorer shell and drop them directly into K9's Storage Area - Spam folder.
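Conceptually, dropping files and folders onto a folder window amounts to walking everything that was dropped and counting the words found. The Python sketch below illustrates that idea only. The word pattern, the lower-casing, the Counter used as a "word database" and the name add_path_to_database are all assumptions, not a description of how K9 stores or tokenizes words.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: a "word" is a run of letters, digits, $, ' or -.
WORD_RE = re.compile(r"[A-Za-z0-9$'-]+")


def add_path_to_database(path: Path, database: Counter) -> int:
    """Add the words of a dropped file, or of every file under a dropped
    folder, to the database. Returns the number of files processed."""
    if path.is_file():
        files = [path]
    else:
        files = [p for p in path.rglob("*") if p.is_file()]
    for file in files:
        # latin-1 maps every byte to a character, so decoding never fails.
        text = file.read_text(encoding="latin-1")
        database.update(word.lower() for word in WORD_RE.findall(text))
    return len(files)
```

Keeping one Counter for Spam and another for Good would mirror the two word databases mentioned earlier, with each drop target feeding its own counter.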