AUTHOR=Trafimow David , Amrhein Valentin , Areshenkoff Corson N. , Barrera-Causil Carlos J. , Beh Eric J. , Bilgiç Yusuf K. , Bono Roser , Bradley Michael T. , Briggs William M. , Cepeda-Freyre Héctor A. , Chaigneau Sergio E. , Ciocca Daniel R. , Correa Juan C. , Cousineau Denis , de Boer Michiel R. , Dhar Subhra S. , Dolgov Igor , Gómez-Benito Juana , Grendar Marian , Grice James W. , Guerrero-Gimenez Martin E. , Gutiérrez Andrés , Huedo-Medina Tania B. , Jaffe Klaus , Janyan Armina , Karimnezhad Ali , Korner-Nievergelt Fränzi , Kosugi Koji , Lachmair Martin , Ledesma Rubén D. , Limongi Roberto , Liuzza Marco T. , Lombardo Rosaria , Marks Michael J. , Meinlschmidt Gunther , Nalborczyk Ladislas , Nguyen Hung T. , Ospina Raydonal , Perezgonzalez Jose D. , Pfister Roland , Rahona Juan J. , Rodríguez-Medina David A. , Romão Xavier , Ruiz-Fernández Susana , Suarez Isabel , Tegethoff Marion , Tejo Mauricio , van de Schoot Rens , Vankov Ivan I. , Velasco-Forero Santiago , Wang Tonghui , Yamada Yuki , Zoppino Felipe C. M. , Marmolejo-Ramos Fernando TITLE=Manipulating the Alpha Level Cannot Cure Significance Testing JOURNAL=Frontiers in Psychology VOLUME=9 YEAR=2018 URL=https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2018.00699 DOI=10.3389/fpsyg.2018.00699 ISSN=1664-1078 ABSTRACT=

We argue that making accept/reject decisions on scientific hypotheses, including under a recent proposal to change the canonical alpha level from 0.05 to 0.005, is deleterious to making new discoveries and to the progress of science. Given that both blanket and variable alpha levels are problematic, it is sensible to dispense with significance testing altogether. There are alternatives that address study design and sample size much more directly than significance testing does, but none of these statistical tools should be treated as a new magic method that gives clear-cut, mechanical answers. Inference should not be based on single studies at all, but on cumulative evidence from multiple independent studies. When evaluating the strength of the evidence, we should consider, for example, auxiliary assumptions, the strength of the experimental design, and implications for applications. Boiling all this down to a binary decision based on a p-value threshold of 0.05, 0.01, 0.005, or anything else is not acceptable.