I assume that the initial example mixes up a linear with an exponential increase.
The Scenario of a Pandemic Spread of the Coronavirus SARS-CoV-2 is Based on a Statistical Fallacy
Declaration of conflicts of interest: The author declares that he has no conflict of interest.
Corresponding author: email@example.com
Lead author country: Germany
Lead author institution: University of Regensburg
Ethics statement: All data is publicly available; no data on participants were collected by the author.
This article makes strong accusations against one of the leading German institutes, the Robert Koch Institute. Therefore, I think it is of the utmost importance that the methodology used is either well established and supported by citations or at least sound in all of its details. There is already a big problem with the example that the author claims gives evidence of a statistical fallacy. This example is itself a statistical fallacy and compares completely unrelated things. First of all, the example does not extend much further than the series 1, 2, 4: to be precise, continuing the series to 1, 2, 4, 8, 16 with the methodology used, 16 out of 10 eggs would be found. This, however, is not the only problem with the example. Even assuming there were enough eggs to find, the described situation does not transfer to the case of coronavirus testing. It hides the underlying assumption that the only deciding factor for the number of new cases on a given day is the ratio of positive tests to total tests done. This is absurd, because it implies that if we doubled the number of tests at any point, we would find double the number of positive cases.
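The arithmetic behind the "16 out of 10 eggs" remark can be checked with a minimal sketch. It assumes, as the comment does, a pool of 10 hidden eggs and a number of finds that doubles every day; the function names are hypothetical and only serve the illustration.

```python
def doubling_finds(days, first_day=1):
    """Finds per day if the count simply doubles every day (1, 2, 4, ...)."""
    return [first_day * 2 ** d for d in range(days)]


def first_impossible_day(eggs_hidden, days):
    """Return (day, finds) for the first day on which the doubling series
    exceeds the number of hidden eggs, or None if that never happens."""
    for day, finds in enumerate(doubling_finds(days), start=1):
        if finds > eggs_hidden:
            return day, finds
    return None


EGGS_HIDDEN = 10  # assumption: pool size taken from "16 out of 10 eggs"

print(doubling_finds(5))                     # [1, 2, 4, 8, 16]
print(first_impossible_day(EGGS_HIDDEN, 5))  # (5, 16)
```

Under these assumptions the series already breaks down on the fifth day, which is why the example cannot be extended much beyond 1, 2, 4.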
The author makes strong assumptions: (1) that the samples are from a homogeneous population and (2) that only the number of tests matters. (1) is obviously wrong if you look at the situation in many countries with clusters. Without knowledge of the spatial distribution of the tests, any analysis on an aggregated level carries huge uncertainties. (2) The number of tests was limited in the beginning, but with growing laboratory capacity, it is rather the health care infrastructure (test centres, doctors) that matters. Even in areas with an increasing number of infected people, this infrastructure would need to grow as much as the number of tests. Without showing that medical staff were regionally redistributed to the clusters and that tests were then primarily taken in these clusters, the "fallacy" claimed by the author rather backfires as a "fallacy of spatial stationarity", well known in many disciplines, but perhaps not that much in psychology. Finally, it is surprising that a psychologist ignores the possibility that, irrespective of testing and associated trends, infections already decreased before the lockdown because people changed their behaviour before the states acted. Overall, the paper seems to be a rather one-sided interpretation that ignores contrasting evidence and interpretations. He assumes that the virus is evenly
While I agree that the reported numbers of newly detected infections do not exactly reflect the real number of new infections, I disagree that the article's method could help to get a clearer view of the situation. _____ 1. The problem of too few tests in the article's example would exist only as long as there are not enough test kits to test every single suspected case. "The more you test, the more you find" only works until you do enough tests to find every infected person in your sample. Once you have, doing more tests would not find more cases, unless the test itself produces a certain rate of false positives. The PCR tests used are not known for doing so. In terms of the article's example: if the children are allowed to search for Easter eggs in the garden on the fourth day for as long as they want, they will not find more than 10 eggs (unless 10 eggs had been hidden per day [= 40] and seven had already been found; then they would find at most 33). But not a single additional egg, no matter how many children search for how long. However, the charts in Fig. 1 show for every single day that there were more tests than finds. So the presented data supports my point here and disproves the article's theory of misinterpretation. "More here and more there" is a correlation, but no proof of causality.
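The saturation argument in point 1 can be sketched in a few lines. The figures (10 eggs per day over 4 days, 7 already found) are the ones used in the comment; `search_capacity` is a hypothetical stand-in for how many eggs the children could collect if the supply were unlimited.

```python
def eggs_found(search_capacity, eggs_remaining):
    """Finds are limited by the search effort AND by what is actually there."""
    return min(search_capacity, eggs_remaining)


EGGS_PER_DAY = 10
DAYS = 4
ALREADY_FOUND = 7
remaining = EGGS_PER_DAY * DAYS - ALREADY_FOUND  # 40 - 7 = 33

# Increasing the search effort beyond the remaining eggs changes nothing.
for capacity in (10, 33, 100, 1000):
    print(capacity, eggs_found(capacity, remaining))
```

Beyond a capacity of 33, the number of finds stays at 33: more searching yields more finds only while the search effort is the limiting factor.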
2. Dividing the number of finds by the total number of conducted tests would only work with representative samples. But the number of test kits is limited, and so is the capacity to perform mass testing. Every day, the persons being tested are (mostly) different people from different groups, tested for different reasons. So this method cannot work. Example: Covid-19 has reached Daisy Town, and the number of infections rises daily. (a) We start by testing coughing people who go to the doctor, (b) the next day we add five times as many people who are exposed to a high risk of infection, (c) and on the third day we test everybody else in Daisy Town. We would find an increasing number of infected people, we would see "more tests, more finds", while the quotient (daily finds divided by tests) nose-dives dramatically. According to the article's method we would say: "New infections are decreasing in Daisy Town", while they are actually growing. _ _ _ _ The suggested "estimated true courses of new infections" do NOT show real proportions in the population (nor changes in them) but only the daily ratio of "finds to total". So the offered "quotient method" is NOT useful for estimating the infected part of a population, nor for showing changes in it. But it seems to be a popular trick for playing the problem down.
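Daisy Town can be put into numbers as a rough sketch. Every figure below is invented for illustration (group sizes, positives per group, number of infected people in town); none comes from the article or from real data. The quotient is read here as finds divided by tests, which is the reading under which it falls.

```python
# All figures are hypothetical; each group maps to (tests, positive finds).
DAYS = [
    {"infected_in_town": 300,
     "groups": {"coughing": (100, 30)}},
    {"infected_in_town": 360,
     "groups": {"coughing": (100, 36), "high_risk": (500, 50)}},
    {"infected_in_town": 432,
     "groups": {"coughing": (100, 43), "high_risk": (500, 60),
                "everyone_else": (9400, 94)}},
]


def daily_summary(day):
    """Return (tests, finds, share of positive tests) for one day."""
    tests = sum(t for t, _ in day["groups"].values())
    finds = sum(f for _, f in day["groups"].values())
    return tests, finds, finds / tests


summaries = [daily_summary(day) for day in DAYS]

for n, (day, (tests, finds, share)) in enumerate(zip(DAYS, summaries), start=1):
    print(f"day {n}: infected={day['infected_in_town']} "
          f"tests={tests} finds={finds} share={share:.3f}")
```

In this sketch the number of infected people and the number of finds both rise from day to day, while the share of positive tests drops from 0.300 to about 0.020, simply because the tested groups change. Read as a trend in infections, the falling quotient would point in the wrong direction.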
3. Comparing newly reported infections with the death toll is a difficult task, and not only because of the time lag between death and report (and therefore the need for exact times of death for each country compared). Different people suffer for different lengths of time before they die. The red lines in Fig. 4 indicate that either the article's "theoretically expected course of the number of deaths" is wrong in all six cases, or the respective death count is. Well, for Germany I can report that the number of Covid-associated deaths does not distinguish whether someone who had tested positive for the virus was killed directly by it, whether it just helped other diseases to accomplish their mission of shortening a lifetime a little bit faster, or whether a killer's knife pierced the victim's heart. So one could argue that the number of deaths might be too high for this or that reason. But this question misses the point when it comes to comparing the ups and downs of courses over a timespan. The reason for the shown similarities between the "infection" and "death" courses is simple: the later you test, the more similar the courses you will get. Where people stay untested until they are hospitalized (or found dead), we will see courses that are very close together. What Fig. 4 really shows is how difficult it is to estimate the real "case fatality rate" (CFR) with too little data. As shown, comparing newly reported infections and reported deaths does not help to assess the current situation. Furthermore, politicians do not pay much attention to the death count, but to newly reported infections instead. After all, the main problem with Covid-19 is not graveyard space; hospital beds, intensive care units and ventilators are. So, when the "death toll" isn't decisive, why does the article include this part?
4. Result. The main thesis of the article is: "The scenario of a pandemic spread of the coronavirus SARS-CoV-2 is based on a statistical fallacy". Accordingly, the figures for both new infections and deaths would paint an incorrect picture due to a statistical fallacy. Nevertheless, it would be possible to get a realistic picture of the situation by converting the reported numbers of tests and finds. However, the article does not convincingly present any statistical error, nor the alleged statistical fallacy. The offered calculation model (finds per test) cannot provide a realistic picture of the situation. As shown by my Daisy Town example, it can even suggest the opposite of the actual development. So the article does not support its thesis conclusively and does not provide a better model.
5. Opinion. Alleged statistical errors and an unsuitable calculation method, combined with vague doubts about the way in which fatalities are counted, seem intended to create confusion and raise doubts with seemingly "scientific" reasoning backed up by spurious arguments. This impression is matched by the fact that the author of this article has published a similar article (in German and with further arguments whose evidence refutes his own theses) in several online media. Both of these publications state that this article was "submitted for publication to a professional journal and has already appeared as an un-reviewed preprint". The corresponding link refers to this page. I have the strong impression that the purpose of the publication here is merely to give both versions of the article a "scientific" touch. _____ Finally, I apologize for my poor English. I'm not a native speaker.  https://scilogs.spektrum.de/menschen-bilder/von-der-fehlenden-wissenschaftlichen-begruendung-der-corona-massnahmen/  https://www.heise.de/tp/features/Von-der-fehlenden-wissenschaftlichen-Begruendung-der-Corona-Massnahmen-4709563.html?seite=all
I thank everybody very much for the comments. I have uploaded an updated version of the manuscript in which the statistical background is discussed in more detail in the introduction (the updated version should be made available by the administrator soon). As pointed out in the comments, the method used to correct for the increase in the number of tests reliably estimates the true increase in new infections only if several preconditions are met. These preconditions are now discussed in detail in the introduction (see page 7, 2nd paragraph, continued on the following pages). In fact, as discussed there, these preconditions all seem to be met in the case of testing for the coronavirus. I look forward to further comments!