Author Topic: NSP Article Size Limits  (Read 7277 times)

Isis

  • Just Arrived
  • *
  • Posts: 4
NSP Article Size Limits
« on: 30 April 2011, 00:14:06 »
In a discussion regarding the limits various NSPs place on the size of the articles they will accept, Red Dwarf made the following comment:
 
"I don't know the reason for this limit in PowerPost. But I do have a question for you. You don't care if some servers refuse your posts because the individual articles are too large? That might limit the audience for your posts."
 
It is important to realize that once an article has been posted, no one will be prevented from downloading it because of its size, regardless of which NSP they might be using.
 
More importantly, there is a direct relationship between the size of the articles and the impact a post will have on the group to which it is posted. Newsgroups only exist on your computer; they are the headers that have been downloaded, while the content remains on the NSP.
 
Think of a newsgroup as a giant, open-ended NZB file to which many contribute. The amount of data your post occupies within that group is generally the size of your NZB for that post. Thus, the smaller your NZB file, the smaller the negative impact your post will have on that group.
 
So if you triple your article size, your post will require only 1/3 of the space in any particular group. My settings result in posts that are about 20% of the size of the Usenet average.
 
That means that a group that has about 2 GB of headers and is very sluggish (long refresh times) could contain the same amount of content, but be only 400 MB in size.
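For the curious, the arithmetic behind this can be sketched in a few lines of Python. The 128 payload bytes per yEnc line and the per-article header size are ballpark assumptions of mine, not measurements:

# Rough sketch of the header-volume argument (Python).
# Assumptions (ballpark, not measured): each article contributes
# roughly 130 bytes of header data to the group, and one yEnc line
# carries 128 payload bytes.

BYTES_PER_LINE = 128   # payload bytes per encoded yEnc line
HEADER_BYTES = 130     # assumed header data per article

def header_volume(content_bytes, lines_per_article):
    """Header data a post of content_bytes adds to a group."""
    article_bytes = lines_per_article * BYTES_PER_LINE
    articles = content_bytes / article_bytes
    return articles * HEADER_BYTES

content = 5 * 10**9  # a 5 GB post
for lines in (1950, 10000, 30000):
    mb = header_volume(content, lines) / 1e6
    print(f"yEnc {lines}: {mb:.2f} MB of headers in the group")

# Tripling the article size cuts the article count, and therefore the
# header volume, to one third.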
 
Since August of 2008, newsgroup retention has been growing in real time. Article sizes have not kept pace.
 
Isis
 
« Last Edit: 30 April 2011, 00:21:53 by Isis »

Red Dwarf

  • Forum expert
  • *****
  • Posts: 2339
Re: NSP Article Size Limits
« Reply #1 on: 30 April 2011, 11:48:22 »
If the server software refuses an article because it does not support articles of that size, you can't download that particular article from that particular server, right? You're saying NSP servers have to accept every article from their peers and/or the backbone? I'd say a server can't exceed the limits of its own software, just like Powerpost apparently can't retain the settings you wish to use. If you post an article of 30000 lines on the Giganews servers, is that article available to the users of Astraweb, which you say does not support posting articles of that size? If the answer to that question is yes, then I suppose it does not have a large impact on the users. On the other hand, if I can't download a post at all, the impact is huge.

There is a large impact on the number of headers, I will concede that. That might make a big difference to the servers/index sites/NZBs, but the impact on us users is minor, I think. It might be different if you limited the length of an article to 100 lines. :d When talking about consuming only 20% of the space, you're talking about headers, not about the number of lines needed for the actual content of a post.
« Last Edit: 30 April 2011, 11:50:28 by Red Dwarf »

jaapf

  • Forum expert
  • *****
  • Posts: 3040
Re: NSP Article Size Limits
« Reply #2 on: 01 May 2011, 12:39:18 »
Hello Isis,
As Red Dwarf already pointed out, you are making a large error in your thinking.

The article size is the size including the header. The header of a Usenet article is everything between the first character and the first empty line, and contains info like the title, group, sender, encoding, etc.
After the first empty line follows the message itself. I don't have exact statistics to confirm this, but I suspect a typical header is somewhere between 128 and 256 bytes, which roughly translates to 1-2 lines.


In a posting using 3000 lines this will be only 2/3000 ≈ 0.07%.
Changing the number of lines from 3000 to 9000 will then only reduce the overhead to 2/9000 ≈ 0.02%, which is almost nothing.
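To make the header/body split concrete, here is a minimal Python sketch. The sample article text is invented for illustration; real headers carry more fields:

# Minimal sketch (Python): split a raw Usenet article at the first
# empty line and compute the header overhead for a 3000-line article.
# The sample article below is invented for illustration.

raw_article = (
    "From: poster@example.invalid\r\n"
    "Newsgroups: alt.binaries.example\r\n"
    "Subject: holiday.mkv (01/42)\r\n"
    "Message-ID: <abc123@example.invalid>\r\n"
    "\r\n"
    "=ybegin part=1 total=42 line=128 size=5000000 name=holiday.mkv\r\n"
    # ...thousands of encoded lines would follow here...
)

header, _, body = raw_article.partition("\r\n\r\n")
header_lines = header.count("\r\n") + 1
total_lines = 3000  # a 3000-line article, as in the example above
print(f"header overhead: {header_lines / total_lines:.2%}")  # ~0.13%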
Another consideration is why servers limit the number of lines. To understand that, you need to know the background of Usenet. Originally Usenet was only meant for exchanging (text) messages, not binaries. A binary posting is nothing less than a text representation of code. The file is posted as text (hence the line terminology). And for a text message to be 9000 lines long is just ridiculous. It does not reflect the original purpose of Usenet.



You are comparing a Usenet group to an open-ended NZB file to which many contribute. Although the comparison could be made, again you are making a large mistake about what Usenet is. Usenet is much older than NZB, so your comparison is the wrong way around. NZBs are used to make the misuse (I call it that because Usenet was not originally intended for binary postings) easier for the novice user who has no understanding of the underlying concept (read: file sharer).



I will not make a classification of right or wrong, but I am afraid that in the end NZB technology will contribute largely to killing Usenet, because it makes illegal postings available to a large group of people, thus making it interesting for copyright holders to pursue.

Isis

  • Just Arrived
  • *
  • Posts: 4
Re: NSP Article Size Limits
« Reply #3 on: 01 May 2011, 19:19:25 »
Hello jaapf,
 
You state: "As Red Dwarf already pointed out, you are making a large error in your thinking."

Well, Isis certainly doesn't want that, now does She? (Thinking errors are among the very worst kind!)
 
You go on: "The article size is the size including the header. The header of a Usenet article is everything between the first character and the first empty line, and contains info like the title, group, sender, encoding, etc.
After the first empty line follows the message itself. I don't have exact statistics to confirm this, but I suspect a typical header is somewhere between 128 and 256 bytes, which roughly translates to 1-2 lines."

I am using the term 'header' to include everything that is in the Powerpost header, e.g., filename, file count, file total, and comments. The header is what you download when you subscribe to a newsgroup; the content, whether MIME or binary, remains at the NSP where it was originally posted.
 
You continue: "In a posting using 3000 lines this will be only 2/3000 ≈ 0.07%.
Changing the number of lines from 3000 to 9000 will then only reduce the overhead to 2/9000 ≈ 0.02%, which is almost nothing.
Another consideration is why servers limit the number of lines. To understand that, you need to know the background of Usenet. Originally Usenet was only meant for exchanging (text) messages, not binaries. A binary posting is nothing less than a text representation of code. The file is posted as text (hence the line terminology). And for a text message to be 9000 lines long is just ridiculous. It does not reflect the original purpose of Usenet."

Now who said anything about text messages? I would imagine that most text messages would be posted using a newsreader. Powerpost is a posting application for binaries.
 
I love to vacation, and I always have my videocam at the ready. When I return, I have many hours of my adventures on video that I am anxious to share with my friends on Usenet.
 
Let's say I have 5 GB of such adventures. If I post at yEnc 30000 (~3.84 MB Articles) there will be about 1300 Articles in my post.
 
Let's say I'm cheap, and instead of Giganews, I have Astraweb. Now I must post at about yEnc 10000 (~1.28 MB Articles) and there will be about 3900 Articles in my post.
 
Each Article has a header that must be downloaded when you subscribe to a newsgroup. You don't see them all, because the newsreader consolidates the 'Articles' into 'Parts'. The (*/100) after each Part indicates that it consists of 100 Articles. You can see them all if you use your newsreader to split a Part into its component Articles.
 
So going from 30000 yEnc to 10000 yEnc raises the number of Articles from about 1300 to about 3900. That's 2600 more server confirmations you must wait for!
 
So regardless of how large you claim a header to be, having three times as many is going to take three times more room. Quite elementary, I would think.
 
The Astraweb example is not even the worst case: the 1950 yEnc default in Powerpost results in an Article size of only about 250 KB. In the above example we would be at roughly 20000 Articles! That would be an extra 18700 server confirmations I would need to wait for, and those headers would take 15 TIMES MORE ROOM ON THE NEWSGROUP!!!
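For anyone who wants to verify the arithmetic, a small Python sketch, assuming the 128 payload bytes per yEnc line used throughout this thread:

# Sketch of the article-count arithmetic (Python), assuming 128
# payload bytes per yEnc line.

POST_SIZE = 5 * 10**9  # 5 GB of vacation video

for lines in (30000, 10000, 1950):
    article_size = lines * 128           # bytes per article
    articles = POST_SIZE / article_size  # articles (= confirmations)
    print(f"yEnc {lines}: {article_size / 1e6:.2f} MB articles, "
          f"~{articles:,.0f} articles in the post")

# yEnc 30000: 3.84 MB articles, ~1,302 articles
# yEnc 10000: 1.28 MB articles, ~3,906 articles
# yEnc 1950:  0.25 MB articles, ~20,032 articles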
 
Then you begin to editorialize: "You are comparing a Usenet group to an open-ended NZB file to which many contribute. Although the comparison could be made, again you are making a large mistake about what Usenet is. Usenet is much older than NZB, so your comparison is the wrong way around. NZBs are used to make the misuse (I call it that because Usenet was not originally intended for binary postings) easier for the novice user who has no understanding of the underlying concept (read: file sharer)."

Well, if you really feel that binaries don't belong on Usenet, then you must be very upset that Binaries4All would have any association with Camelsystem's Powerpost. After all, Powerpost isn't for MIME!
 
So to reiterate: a newsgroup that contains 2 GB of header data at a line average of 6000 yEnc could be reduced to 400 MB at a line average of 30000 yEnc.
 
QED
 
Isis
 

jaapf

  • Forum expert
  • *****
  • Posts: 3040
Re: NSP Article Size Limits
« Reply #4 on: 01 May 2011, 21:07:02 »
Isis,


An article with n lines will have an article size of 128 * n bytes.
The total number of lines will be your 5 GB divided by 128. The number of articles needed will be the total number of lines divided by the number of lines per article.

The total number of lines of a posting will only be affected by the header overhead (whether it's posted with Powerpost or any other program), the header being everything that's not the encoded part (yEnc, MIME, or any other encoding). Like I stated, this is a very small percentage.


About text messages: the nature of Usenet is that anything posted on Usenet (be it readable text or a binary posting) is posted as text. If you don't believe me: download an article without decoding it and view it in Notepad.


About your claim about size: I am sorry to disappoint you. I do know how Usenet works and how newsreaders work (I date from the very early years of Usenet, from before the sophisticated newsreaders there are today; hell, in the beginning we had to decode files manually!).
You are right about the number of confirmations. You are right about the storage needed to store the headers. However, confirmation time may affect a poster (server time), but it does not affect the needed storage. Yes, more storage is needed to store more headers (and more headers need to be downloaded), but it does not affect the storage needed for the complete posting. You will still need about 5 GB of server storage whether you use 20,000 or 40,000 articles (the 40,000 articles will need about 2.5-5 MB extra storage server-side).
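To put rough numbers on this, a quick Python sketch, using my own ballpark of 128-256 bytes of header data per article:

# Sketch of the storage argument (Python). Assumption: each article
# carries roughly 128-256 bytes of header data.

CONTENT = 5 * 10**9  # the 5 GB posting, headers excluded

for articles in (20_000, 40_000):
    for header_bytes in (128, 256):
        overhead = articles * header_bytes
        total = CONTENT + overhead
        print(f"{articles} articles, {header_bytes} B headers: "
              f"{overhead / 1e6:.2f} MB overhead, {total / 1e9:.3f} GB total")

# Going from 20,000 to 40,000 articles adds only ~2.5-5 MB of headers
# to ~5 GB of storage: a rounding error on the complete posting.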


If you want, you can compare it like this: how much more storage will you need if you make your RARs 25 MB instead of 50 MB? Not a whole lot. Only the RAR overhead.


Finally, I never said that binary postings don't belong on Usenet. I only stated that Usenet was not designed for them and that the limit on article size originates from the original use. I certainly have no problem at all with binary postings themselves; we started this site years ago to make Usenet, and binary postings in particular, better known and more easily accessible to the general public!


Although the original standard does not dictate a maximum number of lines (see: http://www.freesoft.org/CIE/RFC/Orig/rfc1036.txt), whether you like it or not, it is currently a fact that not many Usenet servers support large article sizes (and throughout history Giganews has been an odd player in the field, not fully following NNTP standards and/or agreements). When not all servers propagate large article sizes, your large posting will only be available on the servers that do.


NZBs and indexing sites were invented because of the large number of headers in groups. Since they exist, you don't need to download headers in a conventional newsreader if you don't want to. You can simply search online at an indexing site to find what you are looking for.


To summarize: you are right if you only take header data into account. If you look at the total storage a posting needs, the extra header data is not that much, and if you use NZBs it is not relevant at all anyway.


Isis

  • Just Arrived
  • *
  • Posts: 4
Re: NSP Article Size Limits
« Reply #5 on: 01 May 2011, 23:51:18 »
Hello jaapf,
 
Frankly, I am more than a bit stunned that you're not comprehending this.
 
Let's take it again from the top:
 
"Isis,

An article with n lines will have an article size of 128 * n bytes.
The total number of lines will be your 5 GB divided by 128. The number of articles needed will be the total number of lines divided by the number of lines per article."
 
Yes, of course. I thought I made that quite clear when I stated that 10000 yEnc is a 1.28 MB Article & 30000 yEnc is a 3.84 MB Article.
 
jaapf: "The total number of lines of a posting will only be affected by the header overhead (whether it's posted with power post or any other program), the header being everything that's not the encoded part (yenc, mime, or any other encoding). Like i stated, this is very small percentage."
 
Here's where I think you're going off-track. We are not talking about the size of the post. We are talking about THE SIZE OF THE NEWSGROUP!
 
All a newsgroup has is article headers. Headers with the same name are grouped by the newsreader, followed by (*/100) or similar.
 
All of the newsgroups on my computer go back to August 2008, provided they were in existence back then. The .d groups go back much further, of course.
 
The largest newsgroup on my computer is 4.15 GB. All it contains is article headers. There is no content, since that is flushed daily on Forte Agent via Purge & Compact.
 
For a binary post of any given size, the larger the article size, the fewer the articles in the post. And the fewer the articles, the fewer the article headers.
 
This means that a group can initially be downloaded faster, and subsequent header refreshes are much faster. Doubling the size of a group from, say, 500 MB to 1 GB more than doubles the header refresh time, due to the way sorting algorithms scale.
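The sorting claim can be eyeballed with the standard n log n cost model for comparison sorts; this is a simplification, since newsreaders may index headers differently:

# Toy model (Python): comparison-sort cost grows as n*log(n), so
# doubling the header count more than doubles the work.
import math

def sort_cost(m):
    return m * math.log2(m)  # simplified comparison-sort cost model

n = 5_000_000  # headers before doubling (arbitrary example)
print(sort_cost(2 * n) / sort_cost(n))  # ~2.09 for this n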
 
If the difference weren't so substantial I wouldn't even mention it, but when nearly all binary newsgroups could be 80% smaller while still holding the same amount of content, I feel it is more than worth mentioning.
 
The group I mentioned, 4.15 GB in size, contains 13.8 TB of content. Studies I have conducted indicate that a poster using 10000 yEnc lines (1.28 MB articles) will post about 1 MB of article header data to a group in order to address 10 GB of content. I can address that same 10 GB with about 260 KB. Most posters will require 2-4 MB.
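The ratio is easy to reproduce in Python. I assume ~128 bytes of header data per article here; treat the output as ballpark, since the exact per-article header size varies:

# Sketch (Python): header data needed to address 10 GB of content.
# Assumption: ~128 bytes of header data per article.

HEADER_BYTES = 128
CONTENT = 10 * 10**9  # 10 GB of content to address

for article_mb in (1.28, 3.84):
    articles = CONTENT / (article_mb * 1e6)
    kb = articles * HEADER_BYTES / 1e3
    print(f"{article_mb} MB articles: ~{kb:.0f} KB of header data")

# 1.28 MB articles: ~1000 KB (about 1 MB, as stated above)
# 3.84 MB articles: ~333 KB (of the same order as the 260 KB figure)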
 
Btw, that same 4.15 GB group was probably less than 1 GB in August 2008. This assumes that posting traffic was similar and so its increased size is solely the result of going from 200 days retention to more than 900 days.
 
jaapf: "About text-message: the nature of usenet is that anyting posted on usenet (being readable text or a binary posting) is posted as text. If you don't believe me: download a article without decoding it and view it in notepad."
 
Of course. I read NZBs all the time. (Did you know my NZBs are only 1/3 the size of whoever comes in second?)
 
jaapf: "When not all servers propagate large article sizes your large posting will be only available on server that do."
 
It's not a matter of propagation; the limit is on accepting articles for storage. There isn't much difference between an NSP and a search engine, except that the search engine won't accept your posts for storage.
 
Normally BinSearch takes 30-40 minutes to go through a refresh cycle. Just think how fast it could be if it had 80% fewer headers to sort through!
 
As a last thought: I have been posting 3.84 MB articles for several years. Many, like you, complain that they are too large and that they won't be able to download them. But then they go ahead and download them anyway. (And I post VERY few PAR2s!)
 
So that's the news from here. Looks like you'll go your way, and Isis will stay on track with what actually works.
 
Isis
 

jaapf

  • Forum expert
  • *****
  • Posts: 3040
Re: NSP Article Size Limits
« Reply #6 on: 02 May 2011, 07:51:10 »
Isis,
I can see where you are going wrong now.
The size of the newsgroup is the size of all postings combined, not just the headers of those postings.


Like I stated in my previous post, if you only take the headers into account, you are right. If you are talking about a complete newsgroup, you are wrong.


If your settings are working for you, that's fine. And if your posting server (Giganews) accepts large articles, you should have no problem. The number of PARs needed is not related to article size, but rather to the server someone downloads from.


Almost all newsreaders (Forte Agent included) only download header information initially (thank god!). Only when you select the articles you really want are the full bodies downloaded and decoded. I see you only take headers into account, not the full article. The group itself, however, does contain the complete posting (whether or not your provider uses an indexed header-only server is not relevant).


That it saves you local storage for the headers is clear. But you are hosting an index of articles, not the newsgroup itself. Once no server hosts the files themselves any longer, you will never be able to retrieve them again.


That it saves time retrieving headers when there are fewer headers is evident. That's why indexing (NZB) sites have popped up. I don't disagree with you that the exploding amount of content, combined with the very large retention offered nowadays, will most definitely cause more and more problems now and in the near future. Again, that is a problem caused by the design of Usenet and its principal use.


So it's a question of terminology. A Usenet group is a group containing posts, both headers and bodies, period.


I don't like throwing dates around, but if you are interested: I've been using Usenet since 2000. At the time we had 14k4 modems, and broadband internet was only just becoming available to the general public (be it with magnificent speeds of 256/64 kbit :d ).


When PAR2 arrived we were very happy, because it meant we no longer had to download a full PAR for each file that was missing just one segment! I think you can understand the time it took just to download an extra 5 or 10 MB on a posting (fortunately no DVDs back then!).

boolean

  • Just Arrived
  • *
  • Posts: 1
Re: NSP Article Size Limits
« Reply #7 on: 01 October 2014, 03:59:22 »

Normally I would bash people like you around, but since I like your dedication to Binaries4All, I'll be mild with you. Nonetheless, as a moderator you should be ashamed of not understanding the basic principles of Usenet. I stumbled upon this thread while searching for the largest article size accepted by common NSPs today, and found your input very disturbing.


I can confirm Isis is 100% right, and there isn't another way to explain it to you. He already expressed himself in such a way that a three-year-old would be able to understand the logic behind it, so I don't see how you fail to comprehend. You should make this thread a sticky (and remove your posts) to help other people understand yEnc encoding. This is by far the most accessible explanation I've come across.


If you're still not convinced, please make a support ticket at Astraweb and I'll make sure you get a lengthy explanation. ;-)