Project documentation

  • A set of Frequently Asked Questions (FAQ).
  • The SpamBayes wiki exists to let the users and developers of SpamBayes cooperate to develop documentation, share tips and recipes, and generally help each other out. It would be great to see documentation improvements, hints and tips, scripts and recipes, and anything else (related to SpamBayes) that takes your fancy added here.
  • Instructions on installing SpamBayes and integrating it into your mail system.
  • The Outlook plugin includes an "About" file and a "Troubleshooting Guide" that can be accessed via the toolbar. (Note that the online documentation is always for the latest source version, and so might not correspond exactly with the version you are using. Always start with the documentation that came with the version you installed.)
  • The README-DEVEL.txt file -- Information that should be of use to people planning on developing code based on SpamBayes.
  • The TESTING.txt file -- Clues about the practice of statistical testing, adapted from Tim's comments on python-dev.
  • There are also a vast number of clues and notes scattered as block comments through the code.

Search the mailing lists

A quick-and-dirty Google search interface for the mailing list archives. Add your search terms to the ones already in the search box.


Glossary

A useful(?) glossary of terminology

Bayesian
A form of statistical analysis used, in some form, in Paul Graham's initial "A Plan for Spam" approach. Now used as a kind of catch-all term for this class of filters, no doubt horrifying statisticians everywhere.
corpus
In this context, a body of messages, usually a training database. (The plural is corpora.)
false negative
A spam that's incorrectly classified as ham. Also abbreviated as "fn" or "FN".
false positive
A ham that's incorrectly classified as spam. Also abbreviated as "fp" or "FP".
ham
The opposite of spam; not necessarily email that you want or that you asked for, just anything that's not unsolicited bulk email. The term is also used for an email message that SpamBayes classified as good email. That doesn't mean the message really is good, just that it looked like good email based on the evidence provided to the classifier. (See also: spam, unsure.)
hapax, hapax legomenon
A word or form occurring only once in a document or corpus. (The plural is hapax legomena.)
spam
Broadly speaking, any email that's not wanted by the end-user. More specifically: unsolicited bulk email; email that you do not want and did not ask for, and that was sent to a whole bunch of people by automated means at the same time it was sent to you. This definition deliberately excludes viruses and those stupid jokes sent to you by your Aunt Tillie. The term is also used for an email message that SpamBayes classified as bad email. That doesn't mean the message really is bad, just that it looked like bad email based on the evidence provided to the classifier. (See also: ham, unsure.)
training
The process of feeding SpamBayes some sample spam and ham messages, to teach it what to look for.
unsure
An email message that could not reliably be classified as either ham or spam.
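
Several of the glossary terms above (false positive, false negative, unsure, hapax) can be made concrete with a short Python sketch. This is only an illustration, not the SpamBayes API: the function names, the two cutoff values, and the whitespace-based word splitting are all assumptions made for the example. It takes messages whose true label is known, along with a spam score between 0 and 1, and tallies the outcomes; a second helper finds the hapaxes in a small corpus.

```python
from collections import Counter

# Hypothetical cutoffs, chosen only for this illustration.
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


def classify(score, ham_cutoff=HAM_CUTOFF, spam_cutoff=SPAM_CUTOFF):
    """Map a spam score in [0, 1] to 'ham', 'unsure', or 'spam'."""
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


def tally(results):
    """Count outcomes for (true_label, score) pairs.

    true_label is 'ham' or 'spam'; the returned Counter uses the
    keys 'correct', 'fp', 'fn', and 'unsure'.
    """
    counts = Counter()
    for true_label, score in results:
        verdict = classify(score)
        if verdict == "unsure":
            counts["unsure"] += 1
        elif true_label == "ham" and verdict == "spam":
            counts["fp"] += 1  # ham incorrectly classified as spam
        elif true_label == "spam" and verdict == "ham":
            counts["fn"] += 1  # spam incorrectly classified as ham
        else:
            counts["correct"] += 1
    return counts


def hapaxes(corpus):
    """Return the set of words occurring exactly once across the corpus.

    Words are split on whitespace and lowercased, which is a
    simplification of real tokenizing.
    """
    word_counts = Counter(
        word for message in corpus for word in message.lower().split()
    )
    return {word for word, n in word_counts.items() if n == 1}
```

For example, `tally([("ham", 0.95), ("spam", 0.10)])` records one false positive and one false negative, and `hapaxes(["buy cheap pills", "cheap lunch today"])` leaves out "cheap" because it appears twice.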