Discussion:
yEnc-encoded articles in newsgroups
Julien ÉLIE
2024-03-28 08:05:39 UTC
Hi all,

I've noticed yEnc-encoded articles in some newsgroups of the Big-Eight
(have a look at soc.culture.french for instance). Examples:
<17ba4ef578674e9c$60891$141478$***@news.vipernews.com>
<1O6IN.329500$***@fx09.ams1>

Wouldn't it be worthwhile having NoCeM notices of type "binary" or the like
to help clean these unwanted articles out of non-binary newsgroups?
Naturally, other kinds of "binary" stuff could also be in these notices,
and not only yEnc.

Just asking, in case a current NoCeM issuer would be interested in
adding such filters. (I'm not going to send NoCeM notices.)
--
Julien ÉLIE

"The friends of truth are those who seek it, not those who boast of
having found it." (Condorcet)
Marco Moock
2024-03-28 11:05:16 UTC
Post by Julien ÉLIE
Wouldn't it be worthwhile having NoCeM notices of type "binary" or
the like to help clean these unwanted articles out of non-binary
newsgroups? Naturally, other kinds of "binary" stuff could also be in
these notices, and not only yEnc.
Sounds good.
Post by Julien ÉLIE
Just asking, in case a current NoCeM issuer would be interested in
adding such filters. (I'm not going to send NoCeM notices.)
Why don't you send NoCeM messages?
Do you filter yenc out?
If so, implementing NoCeM shouldn't be that much work.
Julien ÉLIE
2024-03-28 13:00:57 UTC
Hi Marco
Post by Marco Moock
Post by Julien ÉLIE
Just asking, in case a current NoCeM issuer would be interested in
adding such filters. (I'm not going to send NoCeM notices.)
Why don't you send NoCeM messages?
I'm just not keen on doing that; I already have enough other tasks to
do, and don't want to add yet another one, especially when there already
are lots of experts here in this newsgroup :)
Post by Marco Moock
Do you filter yenc out?
Yes, I don't want yEnc articles (neither in nor out) but unfortunately I
see some that pass local filters. That's why I thought that dedicated
NoCeM notices for binaries would be interesting: it is easier for news
admins to just rely on NoCeM notices than to keep their filters
up-to-date (for filters still maintained upstream) or locally adjust
rules and keep an eye on how well they perform.
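
For illustration, the kind of local check discussed here could be sketched as follows. This is only a minimal Python sketch, not what Cleanfeed or pyClean actually do. It assumes a yEnc payload can be recognised by an "=ybegin" header line carrying line=, size= and name= fields, followed later by an "=yend" trailer line. The function name looks_like_yenc is made up for the example.

```python
import re

# Assumption: a yEnc header line starts with "=ybegin " and carries
# line=, size= and name= fields; the trailer starts with "=yend ".
YBEGIN = re.compile(r"^=ybegin .*\bline=\d+ .*\bsize=\d+ .*\bname=")
YEND = re.compile(r"^=yend .*\bsize=\d+")


def looks_like_yenc(body):
    """Return True if the body has a yEnc header followed later by a yEnc trailer."""
    seen_begin = False
    for line in body.splitlines():
        if not seen_begin:
            # Still looking for the header line.
            seen_begin = bool(YBEGIN.match(line))
        elif YEND.match(line):
            # Header already seen, and now the trailer: treat as yEnc.
            return True
    return False
```

Requiring both header and trailer avoids flagging a text article that merely mentions "=ybegin" in a sentence, at the cost of missing truncated posts.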
--
Julien ÉLIE

"Hasten to live well, and consider that each day is in itself a
life." (Seneca)
Retro Guy
2024-03-28 13:03:38 UTC
Post by Julien ÉLIE
Hi all,
I've noticed yEnc-encoded articles in some newsgroups of the Big-Eight
(have a look at soc.culture.french for instance).
Wouldn't it be worthwhile having NoCeM notices of type "binary" or the like
to help clean these unwanted articles out of non-binary newsgroups?
Naturally, other kinds of "binary" stuff could also be in these notices,
and not only yEnc.
Just asking, in case a current NoCeM issuer would be interested in
adding such filters. (I'm not going to send NoCeM notices.)
That looks pretty easy to filter out, but I'm not seeing these on my servers due to another "feature" of the articles. I'm happy to add filtering for yenc as I don't serve binary groups on my servers, so this would only check text newsgroups.

I'll get on that in a few days, but I'll check here first in case someone has reasons that I should not do so.
--
Retro Guy
Ray Banana
2024-03-28 13:24:06 UTC
Post by Retro Guy
Post by Julien ÉLIE
Just asking, in case a current NoCeM issuer would be interested in
adding such filters. (I'm not going to send NoCeM notices.)
That looks pretty easy to filter out, but I'm not seeing these on my
servers due to another "feature" of the articles. I'm happy to add
filtering for yenc as I don't serve binary groups on my servers, so
this would only check text newsgroups.
Same here. These articles never make it to the binary filter and if they
do, they get rejected by cleanfeed.local (with a somewhat more
sophisticated yEnc filter). Should be doable over the holidays, will
probably use a separate type like "binary" rather than "spam" or "bot".
Post by Retro Guy
I'll get on that in a few days, but I'll check here first in case
someone has reasons that I should not do so.
Let's go belt and suspenders. Better safe than sorry ;-)
--
Пу́тін — хуйло́
https://www.eternal-september.org
Adam H. Kerman
2024-03-28 13:37:29 UTC
Post by Julien ÉLIE
Hi all,
I've noticed yEnc-encoded articles in some newsgroups of the Big-Eight
(have a look at soc.culture.french for instance).
Wouldn't it be worthwhile having NoCeM notices of type "binary" or the like
to help clean these unwanted articles out of non-binary newsgroups?
Naturally, other kinds of "binary" stuff could also be in these notices,
and not only yEnc.
Just asking, in case a current NoCeM issuer would be interested in
adding such filters. (I'm not going to send NoCeM notices.)
I don't understand. Isn't misplaced binary content addressed at the
Cleanfeed filter?
Jesse Rehmer
2024-03-28 18:48:27 UTC
Post by Adam H. Kerman
Post by Julien ÉLIE
Hi all,
I've noticed yEnc-encoded articles in some newsgroups of the Big-Eight
(have a look at soc.culture.french for instance).
Wouldn't it be worthwhile having NoCeM notices of type "binary" or the like
to help clean these unwanted articles out of non-binary newsgroups?
Naturally, other kinds of "binary" stuff could also be in these notices,
and not only yEnc.
Just asking, in case a current NoCeM issuer would be interested in
adding such filters. (I'm not going to send NoCeM notices.)
I don't understand. Isn't misplaced binary content addressed at the
Cleanfeed filter?
Cleanfeed and pyClean's binary filters are far from perfect. I've used both at
the same time and some still get through where they should not.
llp
2024-03-29 21:41:32 UTC
Post by Julien ÉLIE
Hi all,
I've noticed yEnc-encoded articles in some newsgroups of the Big-Eight (have
a look at soc.culture.french for instance).
Wouldn't it be worthwhile having NoCeM notices of type "binary" or the like
to help clean these unwanted articles out of non-binary newsgroups?
Naturally, other kinds of "binary" stuff could also be in these notices, and
not only yEnc.
Just asking, in case a current NoCeM issuer would be interested in adding
such filters. (I'm not going to send NoCeM notices.)
I don't have these articles on my server.
--
Admin of news.usenet.ovh
Retro Guy
2024-03-30 15:27:54 UTC
Post by llp
Post by Julien ÉLIE
Hi all,
I've noticed yEnc-encoded articles in some newsgroups of the Big-Eight (have
a look at soc.culture.french for instance).
Wouldn't it be worthwhile having NoCeM notices of type "binary" or the like
to help clean these unwanted articles out of non-binary newsgroups?
Naturally, other kinds of "binary" stuff could also be in these notices, and
not only yEnc.
Just asking, in case a current NoCeM issuer would be interested in adding
such filters. (I'm not going to send NoCeM notices.)
I don't have these articles on my server.
Same here. After looking deeper, these seem mostly in groups that my
servers do not carry, and if they are carried, the articles are filtered by
cleanfeed (before spamassassin in my setup).

Seeing that Ray seems to carry these groups, and looks like he's doing a
great job identifying the articles, I'm going to delay diving into this
issue. Maybe take some time to work with Perl without tearing my hair out
first :)
Ray Banana
2024-03-30 16:58:41 UTC
Post by Retro Guy
Same here. After looking deeper, these seem mostly in groups that my
servers do not carry, and if they are carried, the articles are filtered by
cleanfeed (before spamassassin in my setup).
That is also the case here. I just added a check for binary articles to
filter_first (before all tests) to add the articles to the NoCeM queue
and then continue with the normal cleanfeed processing. I have, however,
added a filter to eliminate the most obvious bogus group names like "a.b.something".
Post by Retro Guy
Seeing that Ray seems to carry these groups, and looks like he's doing a
great job identifying the articles, I'm going to delay diving into this
issue. Maybe take some time to work with Perl without tearing my hair out
first :)
;-)

PS: You seem to have an apprentice spam boy on i2pn2: <uu8uit$3h91i$***@i2pn2.org>
--
Пу́тін — хуйло́
https://www.eternal-september.org
Retro Guy
2024-03-30 19:49:27 UTC
Post by Ray Banana
Post by Retro Guy
Same here. After looking deeper, these seem mostly in groups that my
servers do not carry, and if they are carried, the articles are filtered by
cleanfeed (before spamassassin in my setup).
That is also the case here. I just added a check for binary articles to
filter_first (before all tests) to add the articles to the NoCeM queue
and then continue with the normal cleanfeed processing. I have, however,
added a filter to eliminate the most obvious bogus group names like "a.b.something".
That's a good idea, I may do so. I'm doing some testing on a test inn(stall) so I can feel free to break it if necessary :)
Post by Ray Banana
Post by Retro Guy
Seeing that Ray seems to carry these groups, and looks like he's doing a
great job identifying the articles, I'm going to delay diving into this
issue. Maybe take some time to work with Perl without tearing my hair out
first :)
;-)
I'm not a fan of DO, IF. I much prefer IF, THEN. Maybe it's just the author of cleanfeed that prefers that order of doing things. I'm the one not very well schooled in Perl so I don't really have any room to talk.
Post by Ray Banana
PS: You seem to have an apprentice spam boy on i2pn2: <uu8uit$3h91i$***@i2pn2.org>
He needs to try harder, lol, I'll keep an eye on it. I've found a few since 22 Feb and have sent some NoCeM notices for them, but not too bad so far.

I have things set up so it's easy for me to review the first few posts of any new user without having to wade through all the regular users' posts. I do this at least once per day.
--
Retro Guy
Julien ÉLIE
2024-04-02 07:25:07 UTC
Hi Wolfgang,
Post by Ray Banana
I just added a check for binary articles to
filter_first (before all tests) to add the articles to the NoCeM queue
and then continue with the normal cleanfeed processing. I have, however,
added a filter to eliminate the most obvious bogus group names like "a.b.something".
Thanks a lot!
I see that the binary spam coming from vipernews.com is caught, that's
great!
Incidentally, in your NoCeM notices, wouldn't it be useful to list all
the newsgroups they are sent to? Only the first one is currently
written whereas they could for instance be written on subsequent lines
starting with whitespace, or on the same line. (I agree it would lead
to more lengthy messages or lines.)


I think some newsgroups should be marked as allowing binaries or HTML.
<CAOLa=ZSo7ngBUxkfR+EEojhr4a-mM+3=f-***@mail.gmail.com> in
linux.kernel.git was caught in the Bot-misplaced_binary filter but looks
like a valid article.

As for <XMJON.158253$***@fx06.iad> in alt.binaries.clip-art,
which was only posted to that newsgroup, maybe it should be considered
valid as posted in a newsgroup with a "binaries" component.

Thanks again for your work and involvement in fighting spam.
--
Julien ÉLIE

"Hasten to live well, and consider that each day is in itself a
life." (Seneca)
Ray Banana
2024-04-02 09:52:18 UTC
Post by Julien ÉLIE
Incidentally, in your NoCeM notices, wouldn't it be useful to list all
the newsgroups they are sent to? Only the first one is currently
written whereas they could for instance be written on subsequent lines
starting with whitespace, or on the same line. (I agree it would lead
to more lengthy messages or lines.)
I'm using News::Article::NoCeM from CPAN to generate NoCeM messages and
it puts each additional newsgroup on a separate line starting with a TAB
and ending with CRLF, which led to people (wrongly) complaining about the
structure of my messages. Currently, I'm testing a patch for
News::Article::NoCeM that will put all newsgroups on the same line as the
M-ID with a TAB between the M-ID and the first group name and a blank
between the individual group names.
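
To make the two layouts concrete, here is a small Python sketch. It is not the News::Article::NoCeM code (which is Perl) and it only covers the Message-ID lines of a notice body, not the surrounding headers or the PGP signature. The function names format_entry and parse_entries and the sample Message-IDs are hypothetical.

```python
def format_entry(message_id, newsgroups):
    """One-line layout: Message-ID, a TAB, then the group names separated by blanks."""
    return message_id + "\t" + " ".join(newsgroups)


def parse_entries(lines):
    """Read entries in either layout into {message_id: [newsgroups]}.

    A line starting with whitespace is treated as a continuation line
    that adds more newsgroups to the previous Message-ID.
    """
    entries = {}
    current = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip empty lines
        if line[0] in " \t":
            # Continuation line: more groups for the previous Message-ID.
            if current is not None:
                entries[current].extend(line.split())
            continue
        fields = line.split()
        current = fields[0]
        entries.setdefault(current, []).extend(fields[1:])
    return entries
```

A reader written this way accepts both the continuation-line layout and the one-line layout, so switching the generator does not break it.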
Post by Julien ÉLIE
I think some newsgroups should be marked as allowing binaries or HTML.
<CAOLa=ZSo7ngBUxkfR+EEojhr4a-mM+3=f-***@mail.gmail.com>
in linux.kernel.git was caught in the Bot-misplaced_binary filter but
looks like a valid article.
My filter makes use of the is_binary () function in Cleanfeed, which in
turn relies on some configuration variables. The problem in the case of
the linux.kernel.git messages is that some of them have a Content-Type
of multipart/mixed with the PGP signature included as a Base64 encoded
attachment.
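
The false positive can be illustrated with a short Python sketch that walks the MIME parts of an article and ignores detached PGP signatures. It does not reproduce Cleanfeed's is_binary(). The function name has_base64_binary, the HARMLESS_TYPES set and the max_lines threshold of 10 are assumptions chosen for the example.

```python
from email import message_from_string

# Assumption for this sketch: parts of these types are never counted as binaries.
HARMLESS_TYPES = {"application/pgp-signature"}


def has_base64_binary(article, max_lines=10):
    """Return True if a MIME part, other than a harmless one, is a long Base64 blob."""
    msg = message_from_string(article)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        if part.get_content_type() in HARMLESS_TYPES:
            continue  # e.g. a PGP signature attached to a patch
        encoding = (part.get("Content-Transfer-Encoding") or "").strip().lower()
        if encoding != "base64":
            continue
        payload = part.get_payload()
        encoded_lines = [l for l in payload.splitlines() if l.strip()]
        if len(encoded_lines) > max_lines:
            return True
    return False
```

Checking each part separately, rather than counting Base64-looking lines over the whole body, is what lets a signed patch through while still catching a real attachment.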
Post by Julien ÉLIE
As for <XMJON.158253$***@fx06.iad> in alt.binaries.clip-art,
which was only posted to that newsgroup, maybe it should be considered
valid as posted in a newsgroup with a "binaries" component.
Groups with "binaries" in the group name should already be excluded from
the binary filter, will double-check this.
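
A minimal Python sketch of such an exclusion, assuming the rule is "a group allows binaries if one of its dot-separated components is exactly binaries" and that a crosspost is misplaced as soon as one target group does not allow them. Both rules and both function names are assumptions for the example, not the actual filter.

```python
def allows_binaries(group):
    """True if any dot-separated component of the group name is 'binaries'."""
    return "binaries" in group.lower().split(".")


def is_misplaced(newsgroups):
    """A binary article is misplaced if at least one target group does not allow binaries."""
    return any(not allows_binaries(g) for g in newsgroups)
```

Matching on whole components rather than on a substring keeps a group such as a hypothetical alt.nonbinaries.talk from being exempted by accident.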

Thanks for your feedback. It is much appreciated.
--
Пу́тін — хуйло́
https://www.eternal-september.org
Julien ÉLIE
2024-04-02 11:31:46 UTC
Hi Wolfgang,
Post by Ray Banana
I'm using News::Article::NoCeM from CPAN to generate NoCeM messages and
it puts each additional newsgroup on a separate line starting with a TAB
and ending with CRLF, which led to people (wrongly) complaining about the
structure of my messages. Currently, I'm testing a patch for
News::Article::NoCeM that will put all newsgroups on the same line as the
M-ID with a TAB between the M-ID and the first group name and a blank
between the individual group names.
Sounds great with a one-line list of newsgroups, separated with a space,
thanks.

FYI, it will be useful with the perl-nocem program shipped with the next
release of INN (2.7.2) as I have added the possibility to only process a
subset of Message-IDs within a notice, according to specific rules by
the news admin (sort of a local function called like in
cleanfeed.local). Having the whole list of newsgroups will permit for
instance to process Message-IDs of articles posted to a newsgroup
actually carried by the server. Or more complex cases like processing
NoCeM notices for only a subset of newsgroups (if someone does not want
to cancel anything in some newsgroups) or not taking into account
notices from "john" or of a given type, except for a subset of newsgroups.
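
The kind of local rule described above can be sketched in Python as a simple predicate. This is not the perl-nocem interface (which is Perl and documented in its manual page). The function should_process, the pattern lists and the issuer and type names are all made up for the example.

```python
from fnmatch import fnmatchcase

# All values below are hypothetical and only serve the example.
CARRIED = ["comp.*", "news.*", "soc.culture.french"]  # groups carried locally
IGNORED_ISSUERS = {"john"}                            # issuers not honoured by default
IGNORED_TYPES = {"site"}                              # notice types not honoured by default
EXCEPTIONS = ["news.admin.*"]                         # groups where they are honoured anyway


def matches(group, patterns):
    """True if the group name matches at least one wildcard pattern."""
    return any(fnmatchcase(group, p) for p in patterns)


def should_process(issuer, notice_type, newsgroups):
    """Decide whether one Message-ID entry of a notice should be acted on."""
    carried = [g for g in newsgroups if matches(g, CARRIED)]
    if not carried:
        return False  # none of the listed groups is carried here
    if issuer in IGNORED_ISSUERS or notice_type in IGNORED_TYPES:
        # Ignored by default, except in the listed subset of groups.
        return any(matches(g, EXCEPTIONS) for g in carried)
    return True
```

Such a predicate only works well when the notice lists every newsgroup of each article, which is why the one-line list of groups matters here.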
Post by Ray Banana
Post by Julien ÉLIE
I think some newsgroups should be marked as allowing binaries or HTML.
<CAOLa=ZSo7ngBUxkfR+EEojhr4a-mM+3=f-***@mail.gmail.com>
in linux.kernel.git was caught in the Bot-misplaced_binary filter but
looks like a valid article.
My filter makes use of the is_binary () function in Cleanfeed, which in
turn relies on some configuration variables. The problem in the case of
the linux.kernel.git messages is that some of them have a Content-Type
of multipart/mixed with the PGP signature included as a Base64 encoded
attachment.
Is it an issue to open upstream to Cleanfeed, to fix the is_binary()
function?
Or do you have a lower max_base64_lines default value, which makes it
match PGP signatures?
--
Julien ÉLIE

"All mushrooms are edible. Some, only once."
Ray Banana
2024-04-03 12:02:35 UTC
Thus spake Julien ÉLIE <***@nom-de-mon-site.com.invalid>
[...]
Post by Julien ÉLIE
Sounds great with a one-line list of newsgroups, separated with a
space, thanks.
Done now.
Post by Julien ÉLIE
FYI, it will be useful with the perl-nocem program shipped with the
next release of INN (2.7.2) as I have added the possibility to only
process a subset of Message-IDs within a notice, according to specific
rules by the news admin (sort of a local function called like in
cleanfeed.local). Having the whole list of newsgroups will permit for
instance to process Message-IDs of articles posted to a newsgroup
actually carried by the server. Or more complex cases like processing
NoCeM notices for only a subset of newsgroups (if someone does not
want to cancel anything in some newsgroups) or not taking into account
notices from "john" or of a given type, except for a subset of
newsgroups.
Is that the -i option in perl-nocem (I'm using INN 2.8 snapshots)?
Post by Julien ÉLIE
Post by Ray Banana
Post by Julien ÉLIE
I think some newsgroups should be marked as allowing binaries or HTML.
<CAOLa=ZSo7ngBUxkfR+EEojhr4a-mM+3=f-***@mail.gmail.com>
in linux.kernel.git was caught in the Bot-misplaced_binary filter but
looks like a valid article.
My filter makes use of the is_binary () function in Cleanfeed, which in
turn relies on some configuration variables. The problem in the case of
the linux.kernel.git messages is that some of them have a Content-Type
of multipart/mixed with the PGP signature included as a Base64 encoded
attachment.
Is it an issue to open upstream to Cleanfeed, to fix the is_binary()
function?
Cleanfeed from Github does not handle Content-Type: multipart/mixed
except for HTML, so it was my own fault, obviously. Quick fix applied
now, is_binary() still misses lots of binary attachments encapsulated in
separate entities.
I think I will make Cleanfeed more Mime-aware (MIME::Parser) and add
local config variables for allowed/disallowed mime types when I find the time.
--
Пу́тін — хуйло́
https://www.eternal-september.org
Julien ÉLIE
2024-04-04 11:55:22 UTC
Hi Wolfgang,
Post by Ray Banana
Post by Julien ÉLIE
Sounds great with a one-line list of newsgroups, separated with a
space, thanks.
Done now.
Thanks. I'll also add support for that in perl-nocem as its legacy
behaviour is to only take into account the first newsgroup in such a
list. (It already coped with the syntax with several continuation lines.)
Post by Ray Banana
Post by Julien ÉLIE
FYI, it will be useful with the perl-nocem program shipped with the
next release of INN (2.7.2) as I have added the possibility to only
process a subset of Message-IDs within a notice, according to specific
rules by the news admin (sort of a local function called like in
cleanfeed.local). Having the whole list of newsgroups will permit for
instance to process Message-IDs of articles posted to a newsgroup
actually carried by the server. Or more complex cases like processing
NoCeM notices for only a subset of newsgroups (if someone does not
want to cancel anything in some newsgroups) or not taking into account
notices from "john" or of a given type, except for a subset of newsgroups.
Is that the -i option in perl-nocem (I'm using INN 2.8 snapshots)?
Exactly. There's an example of how to use it in the manual page.

Before the final release, I plan on adding two other features: a flag to
save nocemized articles (like what saveart() does in Cleanfeed), and a
flag to activate in daily Usenet reports the mention of notices which
were unprocessed. This way, a news admin will have a way to find out
possible new issuers or types.
Do you see other things which would be worthwhile having in perl-nocem
while I'm working on it?
Post by Ray Banana
Post by Julien ÉLIE
Is it an issue to open upstream to Cleanfeed, to fix the is_binary()
function?
Cleanfeed from Github does not handle Content-Type: multipart/mixed
except for HTML, so it was my own fault, obviously. Quick fix applied
now, is_binary() still misses lots of binary attachments encapsulated in
separate entities.
I think I will make Cleanfeed more Mime-aware (MIME::Parser) and add
local config variables for allowed/disallowed mime types when I find the time.
Thanks for your work, that sounds like a great move!
--
Julien ÉLIE

"Stop pushing at the back!"
"Not so fast up front!" (Asterix)