Re: to rat1964, and anyone interested in parsing older headers (alt.binaries.documentaries)

2018/08/23 04:14

Wow...
That was an education...

Thank you for taking the time to enlighten me.
I had drafted a few more questions but you answered them all and threw
some fresh fuel on the fire!

I'll do my best to digest it all, and I'll remind myself to lend a hand when I can.
Thanks again for sharing your knowledge so freely!

You're a beacon in the smoke-show!

Cheers!

On Fri, 17 Aug 2018 10:29:32 -0400, m2 <m2@somewhere.com> wrote:

>On Fri, 17 Aug 2018 01:59:17 -0700, rat1964 <rat1964@gmail.com> wrote:
>
>>To m2:
>>I'm getting good retention back to mid-2016 but not much prior to
>>that.
>>Can you clarify which news server you're referring to?
>>(Forte?...)
>
>Yes, correct.  Forte is the company that developed Agent, and also
>provides a usenet service.  Here is their website for this:
>
>http://www.forteinc.com/apn/index.php
>
>You say you are using Agent 2.0?  In any case, it is easy to check
>exactly how far back your news provider's retention goes with Agent.
>
>But first, let me explain a thing or two.  There are lots of little
>issues with Agent, and one of them has to do with downloading new
>headers.  Long ago I noticed that sometimes I get new headers in a few
>seconds, and other times it seems to take a very, very long time.  The
>difference is whether or not you still have existing headers in the
>group when you click to get new ones.  If the group is completely
>empty in Agent, it gets them in a flash.  If you have some headers
>still in the group, it can take a VERY LONG time to get new ones.  If
>you don't believe me, try it yourself.  It has to do with editing the
>data files associated with the group.  Empty data files update fast,
>because Agent doesn't have to merge the new headers into the existing
>ones, which can take hours if the data files are gigabytes in size.
>
>Anyhow, my point is that, as a general rule, you should always
>process ALL the headers you want before getting any new ones, so the
>group is completely empty when you start.  There are 3 ways to get
>headers with Agent:  Get New, Get All, and Get Sample.  To check your
>retention depth, you simply click "Get All..." and wait perhaps a
>minute, watching the process from the Task Manager tab (Tools menu).
>You need to wait until Agent has started the download.  You can tell,
>because there will be a progress bar.  Once the progress bar appears,
>wait a few seconds or a minute, just long enough to get headers
>starting from the oldest.  Stop the task, then check the group, sorted
>by date, and you'll see exactly how old the oldest headers are.
>
>If you want to parse the entire backlog of headers, first of all,
>don't try to do it all in one pass.  That was my mistake the first
>time I did this.  It is much easier if you do it in manageable slices,
>but you will need to keep a log to remember what you have covered and
>where to start the next slice.  This is when you would use "Get
>Sample" for the headers, where you need to do some backward math to
>calculate the number of days needed to slightly overlap what you got
>before.  It helps to write down the number of days each time, and
>also to sort by date after each download of headers, so you can write
>down where to begin the next slice.  I overlap at least one day, maybe
>a couple, to pick up parts that were incomplete in the previous slice.
>It's a learning process, but it isn't very hard once you try it.
>
>So, you start with "Get All" on the first pass, and let it run for
>only a minute or two.  Process all those headers first, downloading
>all you may want, deleting garbage or things you don't want, and
>adding kill filters for bulk spammers.  By adding the kill filters
>early, each new slice of headers you get will be cleaner.
>
>I learned a new trick the last time I did this, with the xvid group.
>Downloading can take some time, so the trick is not to download from
>the group you are parsing, but rather to save the headers you want
>into an NZB file (keystroke: Alt-F, I, Z).  I have a separate
>folder/tab dedicated to NZB files, so I can download what I want on
>the side and continue parsing the headers in the group of interest.
>Or, you can just save the NZBs in a folder for possible use later, or
>to search for content.  I still have gigabytes of NZB data from xvid
>that I search first if I'm looking for something.
>
>You only do "Get All Headers" on the first pass, then do "Get
>Sample" for the rest, and again, don't wait for the progress bar to
>move very much before stopping it.  The reason I suggest this is that
>the more headers you have in a particular group, the slower and more
>difficult it becomes to move around inside Agent and process them.  A
>million headers at one time is not fun.
>
>FYI, I mostly sort by Author when doing this, as the majority of
>postings are done by a small set of folks, so you can quickly save the
>NZBs of the major uploaders.  I do have to shift between Author sort
>and Date sort, and I want to mention another Agent bug.  There is a
>checkbox in the Lines tab for threaded sort.  Something is wrong with
>how it works, which is why I recommend you change back to Author sort
>before saving NZB files or doing much else: sometimes Agent hides
>things inside the threads in ways that make no sense, and downloads
>can end up missing pieces.  Just be aware that this is yet another bug
>that will never be fixed, and it's better to uncheck it if you sort by
>date.  It turns off by itself when sorted by Author.
>
>There are terabytes of great content in the a.b.documentaries group,
>at least on Forte's news server, and it's not that hard to extract it
>with Agent.  Other tools may do the same, I'm just not familiar with
>them.
>
>Hope I caught all the typos, but ask if something is confusing.
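The "backward math" for Get Sample slices can be sketched in Python as below. This is a sketch under stated assumptions, not something Agent provides: it assumes, hypothetically, that Get Sample asks for a number of days counted back from today and, like Get All, starts from the oldest headers in that window, so check how your version of Agent behaves. The function name `sample_days` and its parameters are illustrative only.

```python
from datetime import date


def sample_days(today: date, last_covered: date, overlap_days: int = 1) -> int:
    """Days to request for the next "Get Sample" slice (hypothetical helper).

    Assumes Get Sample takes a day count reaching back from `today`.
    `last_covered` is the newest header date noted from the previous slice;
    `overlap_days` reaches back a little further to pick up parts that were
    still incomplete at the end of that slice.
    """
    if overlap_days < 0:
        raise ValueError("overlap_days must be non-negative")
    if last_covered > today:
        raise ValueError("last_covered cannot be later than today")
    # Whole days between the two dates, plus the overlap margin.
    return (today - last_covered).days + overlap_days


# Example: previous slice ended on 15 June 2016, today is 23 August 2018,
# with one day of overlap.
print(sample_days(date(2018, 8, 23), date(2016, 6, 15), overlap_days=1))  # 800
```

Writing the returned number and the newest date of each slice into your log gives you the `last_covered` value for the next call.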
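m2 also mentions searching saved NZB files first when looking for something. NZB files are XML: each `<file>` element carries `poster` and `subject` attributes, normally in the `http://www.newzbin.com/DTD/2003/nzb` namespace. The Python sketch below does a case-insensitive substring search on those subjects across a folder of `*.nzb` files. The function names and the folder layout are hypothetical, and this is not a feature of Agent.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Iterator, Tuple


def _local(tag: str) -> str:
    # Strip an XML namespace such as {http://www.newzbin.com/DTD/2003/nzb}.
    return tag.rsplit("}", 1)[-1]


def matches_in_nzb(data: bytes, term: str) -> Iterator[Tuple[str, str]]:
    """Yield (poster, subject) for each <file> whose subject contains term."""
    root = ET.fromstring(data)
    needle = term.lower()
    for elem in root.iter():
        # Match <file> with or without the NZB namespace.
        if _local(elem.tag) == "file":
            subject = elem.get("subject", "")
            if needle in subject.lower():
                yield elem.get("poster", ""), subject


def search_folder(folder: str, term: str) -> Iterator[Tuple[str, str, str]]:
    """Search every *.nzb file in folder, skipping files that are not valid XML."""
    for path in sorted(Path(folder).glob("*.nzb")):
        try:
            hits = list(matches_in_nzb(path.read_bytes(), term))
        except ET.ParseError:
            continue
        for poster, subject in hits:
            yield path.name, poster, subject
```

For example, `for name, poster, subject in search_folder("nzb", "ocean"): print(name, subject)` would list every matching post across the saved NZBs in a folder named `nzb`. Reading raw bytes lets the XML parser honour each file's own encoding declaration.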
