Re: new spam filter - "scoring.txt" yEnc (alt.binaries.beatles)

2018/01/05 22:02

scoring.txt
Intro to Xnews' score file.

Xnews happily steals the scoring system concept from slrn.  Basically, each article is assigned a score from -9999 to 9999.  By default, an article has score 0 (no score, or neutral).  Articles with score of -9999 or less are killed (you'll never see them, unless you turn off hard kill, in which case they show up with a red X icon).  Articles with score of 9999 or more are considered "important" and flagged with a yellow ! icon.

Xnews scores articles using rules you specify in the score file score.ini.  This file is a plain text file which you can edit with any text editor (Notepad will do). Spaces, tabs, and blank lines are ignored except in keywords and regular expressions.  The score file consists of sections (like ini files).  Each section begins with a section header

[group-regular-expression]

that is, a regular expression surrounded by [ ].  The header indicates which groups this section applies to.  For example,

[.]

indicates this section applies to all groups.

[^alt\.binaries]

indicates this section applies to all groups starting with 'alt.binaries'.

[babylon|trek]

indicates this section applies to all groups containing the word 'babylon' or 'trek'.

If group-regular-expression is preceded by a tilde (~), then the meaning is inverted.  For example, [~babylon|trek] applies to all groups that do NOT contain either 'babylon' or 'trek'.
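
To make the header matching concrete, here is a rough Python sketch of how a section header could be tested against a group name.  It is only an illustration: the function name section_applies is made up, and Python's re module is standing in for whatever regular expression engine Xnews actually uses, so the two may differ on fancier patterns.

```python
import re

def section_applies(header: str, group: str) -> bool:
    """Return True if a header such as "[^alt\\.binaries]" covers `group`."""
    pattern = header.strip()[1:-1]        # drop the surrounding [ ]
    inverted = pattern.startswith("~")    # a leading tilde inverts the match
    if inverted:
        pattern = pattern[1:]
    # Unanchored, case-insensitive search (an assumption based on the text).
    found = re.search(pattern, group, re.IGNORECASE) is not None
    return found != inverted

print(section_applies("[^alt\\.binaries]", "alt.binaries.beatles"))  # True
print(section_applies("[babylon|trek]", "rec.arts.startrek.misc"))   # True
print(section_applies("[~babylon|trek]", "comp.lang.c"))             # True
```

Note that [babylon|trek] matches anywhere in the group name, which is why 'startrek' counts.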

Within each section, you have one or more scoring rules.  Each rule begins like so:

Score: number

that is, the keyword 'Score', followed by ':', followed by one or more spaces, then an integer from -9999 to 9999.

You define each rule by specifying one or more of these headers and the expression to match, like so

keyword: expression-to-match

where keyword is one of:

Message-ID, Subject, From, Xref, Lines, References

Here's an example:

[.]
Score: -9999
Subject: \$\$\$+

Score: 9999
From: luu.*tran

The two rules above apply to all groups.  The first assigns a score of -9999 (i.e., kills) any article which has three or more consecutive $ in the subject.  The second marks as important any article whose author is yours truly (ah, ego runs amok!).

If you precede keyword with a ~, then the meaning is inverted.  So

Score: -1000
~Subject: mulder|sculley

subtracts 1000 from the score of any article that does NOT mention mulder or sculley in the subject.

When the keyword is Lines, the expression-to-match is an integer threshold, and the test passes when the article has more than that many lines.  For example,

[~binaries]
  Score: -9999
  Lines: 1000
  ~Subject: faq|rfd|rfc

kills any article outside a binaries group that has more than 1000 lines AND is not a FAQ, RFC, or RFD.

On the other hand

[binaries]
  Score: -9999
  ~Lines: 50

kills any article in a binaries group with 50 or fewer lines.
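
The Lines test and its inverted form can be summed up in a few lines of Python.  This is a sketch of the behavior as the two examples describe it ("more than N" for Lines, "N or fewer" for ~Lines); the function name lines_test is hypothetical, and the exact treatment of an article with exactly N lines is inferred from those examples rather than confirmed.

```python
def lines_test(threshold: int, article_lines: int, inverted: bool = False) -> bool:
    """Model of 'Lines: N' (more than N lines) and '~Lines: N' (N or fewer)."""
    passes = article_lines > threshold
    # The tilde flips the result of the plain test.
    return passes != inverted

print(lines_test(1000, 1500))               # True: 'Lines: 1000' fires
print(lines_test(50, 50, inverted=True))    # True: '~Lines: 50' fires
print(lines_test(50, 51, inverted=True))    # False
```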

By the way, I'm only indenting these lines to make them more readable; it makes no difference otherwise.  The only important thing is that each of the lines above must appear on a single line by itself.  Also, you can add comments by putting a percent sign (%) at the beginning of the line.

By default, each individual test must pass in order for the scoring rule to apply, i.e., boolean AND is the default.  If you want to use boolean OR instead, add an extra : after the Score keyword.  For example,

[^comp\.]
  Score:: -9999
  Lines: 1000
  Xref: advocacy
  From: beavis
  From: butthead

kills any article posted in a comp group which has more than 1000 lines, or is crossposted to advocacy, or is from beavis or butthead.
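
The difference between Score: and Score:: comes down to all-versus-any over the individual tests.  A minimal Python sketch, assuming each test is a simple regular expression on a header (the Lines test is left out to keep it short); rule_fires and the dictionary layout are invented for illustration and say nothing about how Xnews is implemented:

```python
import re

def rule_fires(tests, article, use_or=False):
    """tests: (header, regex) pairs; article: dict mapping header -> value."""
    results = [
        re.search(regex, article.get(header, ""), re.IGNORECASE) is not None
        for header, regex in tests
    ]
    # 'Score::' means any test may pass; plain 'Score:' requires all of them.
    return any(results) if use_or else all(results)

tests = [("Xref", "advocacy"), ("From", "beavis"), ("From", "butthead")]
article = {"From": "Beavis <b@example.invalid>", "Xref": "news comp.lang.c:123"}

print(rule_fires(tests, article, use_or=True))   # True: the From test matches
print(rule_fires(tests, article, use_or=False))  # False: not every test matches
```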

Note that all keywords are case insensitive.  So too are all regular expressions, so you don't need to write [Ff][Oo][Oo], just foo will do.  If you want the expression-to-match to be case sensitive, then put an equal sign instead of a : after the keyword, e.g.

[.]
  Score: -9999
  ~Subject= .*[a-z]

kills anything that does not contain at least one lower case letter in the subject.
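
Here is a small Python sketch of how a single test line could be read, covering the ~ prefix and the : versus = separator.  The name header_test and the parsing regex are my own assumptions, and Python's re is again only a stand-in for the real engine.

```python
import re

def header_test(line: str, value: str) -> bool:
    """Evaluate one test line such as '~Subject= .*[a-z]' against a header value."""
    m = re.match(r"\s*(~?)([A-Za-z-]+)\s*([:=])\s*(.*)", line)
    if m is None:
        raise ValueError("not a test line: " + line)
    inverted, _keyword, separator, expression = m.groups()
    # '=' requests a case-sensitive match; ':' is case insensitive.
    flags = 0 if separator == "=" else re.IGNORECASE
    matched = re.search(expression, value, flags) is not None
    return matched != bool(inverted)

print(header_test("~Subject= .*[a-z]", "MAKE MONEY FAST"))  # True: no lower case
print(header_test("Subject: foo", "FOO bar"))               # True
print(header_test("Subject= foo", "FOO bar"))               # False
```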

Sections and rules within each section are applied in the order that they appear in the score file.  Each article starts out with a score of zero, then for each rule that it passes, its score is incremented or decremented by the rule's score value, except when the rule's score value is -9999 or 9999, in which case the program assigns that score and stops any further testing.  If you want to assign a single score other than 9999 or -9999, then put an equal sign in front of the score value.  For example

Score: =500
From: joey

means "if an article was posted by joey, then assign it a score of 500 and don't bother looking at any other rule."
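
The accumulation logic can be sketched in Python like this.  It takes only the rules an article has already passed, in file order; total_score and the (value, is_absolute) tuples are hypothetical, and the sketch does not try to guess what happens if a running total drifts past 9999 on its own.

```python
def total_score(fired_rules):
    """fired_rules: (value, is_absolute) pairs for passed rules, in file order."""
    score = 0
    for value, is_absolute in fired_rules:
        # 'Score: =N' and the two extreme values assign the score and stop.
        if is_absolute or value in (-9999, 9999):
            return value
        score += value
    return score

print(total_score([(100, False), (-30, False)]))               # 70
print(total_score([(100, False), (500, True), (-30, False)]))  # 500
print(total_score([(200, False), (-9999, False), (50, False)]))  # -9999
```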

You can add an expiration date to each rule, after which time the rule will no longer fire.  The format is

Expires: date

The line, if present, must appear immediately after a Score: line.  For example

% this rule expires at the end of the millennium
% (okay, the year 1999 anyway :)

Score: 2000
Expires: 12/31/1999
....

The date format is locale-dependent.  It's mm/dd/yyyy for those in the US and dd/mm/yyyy for those in the UK and elsewhere.
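
Because the same string can mean two different dates, it helps to see the check spelled out.  A Python sketch, with the locale reduced to a day_first flag; rule_expired is a made-up name, and treating the rule as still live on the expiry date itself is my assumption.

```python
from datetime import date, datetime

def rule_expired(expires: str, today: date, day_first: bool = False) -> bool:
    """Return True once `today` is past the date on an 'Expires:' line."""
    # mm/dd/yyyy for US-style locales, dd/mm/yyyy otherwise.
    fmt = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    expiry = datetime.strptime(expires, fmt).date()
    return today > expiry

print(rule_expired("12/31/1999", date(2000, 1, 1)))                  # True
print(rule_expired("12/31/1999", date(1999, 12, 31)))                # False
print(rule_expired("31/12/1999", date(2000, 1, 1), day_first=True))  # True
```

A date like 12/31/1999 only parses one way, but 01/02/2000 is January 2 or February 1 depending on the locale, so it pays to check which format your system uses.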

If you have an empty section, then scoring for groups matching that section will halt at that point.  For example, if you want to apply some scores to binaries groups, but not to binaries discussion groups, you can do this

[binaries.*\.d$]
% this section is left intentionally empty.  It prevents the scores below from
% being applied to binaries groups ending in .d

[binaries]
Score: 2000
......
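
The empty-section trick can be modeled in Python as a loop that stops at the first matching section with no rules.  This is a sketch of the behavior described above, not of Xnews itself; applicable_sections and the list-of-pairs layout are assumptions.

```python
import re

def applicable_sections(sections, group):
    """sections: (header_regex, rules) pairs in file order."""
    selected = []
    for pattern, rules in sections:
        if re.search(pattern, group, re.IGNORECASE) is None:
            continue            # section does not cover this group
        if not rules:
            break               # empty matching section: stop scoring here
        selected.append(rules)
    return selected

sections = [
    (r"binaries.*\.d$", []),            # intentionally empty
    (r"binaries", ["Score: 2000"]),
]

print(applicable_sections(sections, "alt.binaries.pictures.d"))  # []
print(applicable_sections(sections, "alt.binaries.pictures"))    # [['Score: 2000']]
```

Order matters here: the empty section only shields the .d groups because it comes before the [binaries] section.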

Xnews' score file format is very similar to slrn's, with (at least) these differences: 1) regular expressions are case insensitive in Xnews; 2) slrn treats section headers as wildcard expressions while Xnews treats them as full regular expressions; 3) Xnews does NOT allow scoring on any header other than those mentioned, namely, Message-ID, From, Subject, Xref, Lines, and References; and 4) as far as I know, slrn doesn't use the empty section as a means to stop evaluation.

You can read an intro to slrn score file at

http://kwaziwai.cc.columbia.edu/acis/tutor/slrnscore-txt.html

There's also a slrn score file FAQ at

http://kwaziwai.cc.columbia.edu/acis/tutor/slrnscorefile-faq.html

You can incorporate ideas in the FAQ to make your own score file, taking into account the differences stated above.

Luu Tran
Feb 23, 1999
