Comments on Abaababa: Will Wikipedia collapse under its own weight?
Feed author: semmelweis. Last updated: 2010-03-10T16:55:20.036+01:00.

brammo (2008-03-16T14:07:00.000+01:00):
I wonder how much space a git repo of it would take.

semmelweis (2008-02-19T23:52:00.000+01:00):
So we agree that the worst-case space complexity is quadratic.

In practice, that quadratic behaviour occurs during the initial period of an article's life, while it is still growing.

Keeping the history of an article whose size has stabilized at n bytes still costs storage proportional to n times the elapsed time, because random mutations (trolls) get reverted but still cost O(n) bytes each.

The current Wikipedia storage scheme is not sustainable. They would have to store diffs, plus a full snapshot now and then to speed up computing differences between arbitrary revisions, or build on a proper RCS.

tutufan (2008-02-19T20:33:00.000+01:00):
Okay, I see what you're doing. Under your assumptions, storage is indeed quadratic in the number of revisions.

You're assuming, though, that the size of a page grows linearly in the number of revisions. I don't see that as a realistic assumption over the long term (e.g., 100 years). I would assume that an article on, say, Millard Fillmore would reach a fleshed-out size fairly early and then grow slowly from that point, if at all. I'd assume a constant size, or perhaps at most logarithmic growth.

semmelweis (2008-02-19T19:18:00.000+01:00):
OK, so I have to get formal.
Let p_1, p_2, ..., p_m be m revisions of a page. The size of the i-th revision is |p_i|. Assume p_{i+1} is obtained by adding one letter to p_i, and that |p_1| is 1. Hence the size of the i-th revision is i. However, the space required to store the first i revisions under the current Wikipedia scheme is 1 + 2 + ... + i = i(i+1)/2, which is O(i^2).
Note that user input is, save for copy/paste, proportional to the number of keypresses.

tutufan (2008-02-19T19:11:00.000+01:00):
Hmm--it appears that one of us doesn't understand your example. :-)

1 revision, 10 megabytes
2 revisions, 20 megabytes
3 revisions, 30 megabytes

This is a linear relation.

semmelweis (2008-02-19T19:04:00.000+01:00):
> If Wikipedia stores one snapshot per revision, then its storage requirements would be linear in the number of revisions, not quadratic.

Nope, because the stored size of a snapshot is not constant: the whole page is stored, not the diff. Hence if you change one letter on a 10 megabyte page, you add 10 megabytes.

tutufan (2008-02-19T18:57:00.000+01:00):
If Wikipedia stores one snapshot per revision, then its storage requirements would be *linear* in the number of revisions, not quadratic.

(One might consider the overall requirement to be "quadratic", if one assumes that aggregate page size will grow without bound, but that's a different thing.)
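
The two storage models argued over in this thread can be compared numerically. The following is only a rough sketch under the thread's own simplifying assumptions: one model stores a full copy of the page per revision, the other stores the first revision in full plus a small delta per later revision. The function names and the `delta_cost` parameter are hypothetical, introduced for illustration, and do not describe how Wikipedia or any revision control system actually stores data.

```python
def snapshot_storage(sizes):
    """Total bytes if every revision is stored as a full copy of the page."""
    return sum(sizes)


def delta_storage(sizes, delta_cost=1):
    """Total bytes if the first revision is stored in full and every later
    revision is stored as a delta of `delta_cost` bytes.

    The fixed per-delta cost is an assumption made for illustration only.
    """
    if not sizes:
        return 0
    return sizes[0] + delta_cost * (len(sizes) - 1)


def growing_page(m):
    """Sizes |p_1|..|p_m| when |p_1| = 1 and each revision adds one letter."""
    return list(range(1, m + 1))


def stable_page(n, m):
    """Sizes of m revisions of a page that stays at n bytes,
    e.g. vandalism followed by reverts."""
    return [n] * m


if __name__ == "__main__":
    # Growing page: full snapshots cost m(m+1)/2 bytes, deltas cost m bytes.
    for m in (10, 100, 1000):
        sizes = growing_page(m)
        print(m, snapshot_storage(sizes), delta_storage(sizes))
    # prints:
    # 10 55 10
    # 100 5050 100
    # 1000 500500 1000
```

Both sides of the argument show up here. For a page of constant size, `snapshot_storage(stable_page(n, m))` is n times m, which is linear in the number of revisions, as in the 10, 20, 30 megabyte example. For a page that grows by one letter per revision, the same function gives m(m+1)/2, which is quadratic.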