This is a draft of a response to a discussion in the Psychological Methods Discussion Group on Facebook.
The reason regression analyses aren’t a useful tool to determine the relative relevance of each behavioral determinant has three components.
[ Note: this is a first draft, a preprint of a blog post so to speak 🙂 ]
A recent 72-author preprint proposed to recalibrate when we award the qualitative label ‘significant’ in research in psychology (and other fields) such that more evidence is required before that label is used. In other words, the paper proposes that researchers have to be a bit more certain of their case before proclaiming that they have found a new effect.
The paper met with resistance, and although any proposal for change usually does, what’s interesting is that in this case, the resistance came in part from researchers involved in Open Science (the umbrella term for the movement to mature science through openness, collaboration and accountability). Since these researchers often fight for improved research practices ‘at all costs’, this resistance seems odd.
Thus ensued the Alpha Wars.
[Image by Silver Blue, https://flickr.com/photos/cblue98/]
[These are some thoughts that I’ll eventually work into a paper, so it may be a bit rough/drafty]
Psychology is characterized by an interesting paradox. On the one hand, it’s a very popular topic. After all, everybody’s a person, and the most important influences in most people’s worlds are other people. Who doesn’t love learning about oneself, one’s loved ones, one’s boss, and the leaders of one’s country? People are endlessly complex, so psychology and psychological research provide a veritable fount of knowledge.
On the other hand, that complexity of human psychology is tenaciously denied. It is almost as if that complexity is seen rather like a spiritual entity, safe to invoke whenever it’s convenient to stare in wonder at the awesome quirks of nature and never-ending weirdness of people, but blissfully disregarded whenever it is threatening or gets in the way of day-to-day activities.
[ UPDATE: a commentary based on this blog post has now been published in the Journal of Informetrics at http://www.sciencedirect.com/science/article/pii/S1751157717302365 ]
Recently, a preprint was posted on arXiv to explore the question “Can the Journal Impact Factor Be Used as a Criterion for the Selection of Junior Researchers?”. The abstract concludes as follows:
The results of the study indicate that the JIF (in its normalized variant) is able to discriminate between researchers who published papers later on with a citation impact above or below average in a field and publication year – not only in the short term, but also in the long term. However, the low to medium effect sizes of the results also indicate that the JIF (in its normalized variant) should not be used as the sole criterion for identifying later success: other criteria, such as the novelty and significance of the specific research, academic distinctions, and the reputation of previous institutions, should also be considered.
In this post, I aim to explain why this is wrong (and, moreover, how following this recommendation may retard scientific progress), and I have a go at establishing a common sense framework for researcher selection that might work.
Wow, good question and points!!!
Based on a PsyArXiv preprint with the admittedly slightly provocative title “Why most experiments in psychology failed: sample sizes required for randomization to generate equivalent groups as a partial solution to the replication crisis” a modest debate erupted on Facebook (see here; you need to be in the PsychMAP group to access the link, though) and Twitter (see here, here, and here) regarding randomization.
John Myles White was nice enough to produce a blog post with an example of why Covariate-Based Diagnostics for Randomized Experiments are Often Misleading (check out his blog; he has other nice entries, e.g. about why you should always report confidence intervals over point estimates).
I completely agree with the example he provides (except that where he says ‘large, finite population of N people’ I assume he means ‘large, finite sample of N people drawn from an infinite population’). This is what puzzled me about the whole discussion. I agreed with (almost) all of the arguments provided, but only a minority of them seemed to concern the paper. So either I’m still missing something, or, as Matt Moehr ventured, we’re talking about different things.
So, hoping to get to the bottom of this, I’ll also provide an example. It probably won’t be as fancy as John’s example, but I have to work with what I have 🙂
In statistics, one of the first distributions that one learns about is usually the normal distribution. Not only because it’s pretty, but also because it’s ubiquitous.
In addition, the normal distribution is often the reference that is used when discussing other distributions: right skewed is skewed to the right compared to the normal distribution; when looking at kurtosis, a leptokurtic distribution is relatively spiky compared to the normal distribution; and unimodality is considered the norm, too.
There exist quantitative representations of skewness, kurtosis, and modality (the dip test), and each of these can be tested against a null hypothesis, where the null hypothesis is (almost) always that the skewness, kurtosis, or dip test value of the distribution is equal to that of a normal distribution.
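As a rough illustration of what such quantitative representations look like, the sketch below computes moment-based skewness and excess kurtosis for a simulated sample and compares each to the value expected under normality (0 for both, since a normal distribution has a kurtosis of 3). The function names, the simulated data, and the large-sample standard errors √(6/n) and √(24/n) are assumptions made for this example; they are an approximation, not the procedure of any particular statistics package.

```python
import math
import random


def central_moment(xs, k):
    """k-th central moment of the sample (dividing by n)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def skewness(xs):
    """Moment-based skewness; 0 for a symmetric distribution."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Kurtosis minus 3, so that a normal distribution scores 0."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0


def z_against_normal(xs):
    """Approximate z-scores under the null of normality.

    Uses the large-sample standard errors sqrt(6/n) and sqrt(24/n),
    which are only rough approximations for small samples.
    """
    n = len(xs)
    z_skew = skewness(xs) / math.sqrt(6.0 / n)
    z_kurt = excess_kurtosis(xs) / math.sqrt(24.0 / n)
    return z_skew, z_kurt


rng = random.Random(1)  # fixed seed so the example is reproducible
normal_sample = [rng.gauss(0, 1) for _ in range(2000)]
skewed_sample = [rng.expovariate(1.0) for _ in range(2000)]  # right skewed

for label, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    z_skew, z_kurt = z_against_normal(sample)
    print(f"{label}: skewness z = {z_skew:.2f}, excess kurtosis z = {z_kurt:.2f}")
```

Large absolute z-scores suggest that the sample's shape differs from that of a normal distribution; whether that difference matters for a given analysis is a separate question.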
In addition, some statistical tests require that the sampling distribution of the relevant statistic is approximately normal (e.g. the t-test), and some require an even more elusive assumption called multivariate normality.
Perhaps all these bits of knowledge mesh together in people’s minds, or perhaps there’s another explanation; but for some reason, many researchers and almost all students operate on the assumption that their data have to be normally distributed. If they are not, they often resort to, for example, converting their data into categorical variables or transforming the data.
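The difference between the distribution of the data and the sampling distribution of a statistic can be made concrete with a small simulation. The sketch below is an illustration under assumed settings (an exponential population, samples of 50, and 2000 replications; none of these come from the post): it draws strongly right-skewed scores and then looks at the skewness of the means of repeated samples.

```python
import random
import statistics


def skewness(xs):
    """Moment-based skewness; 0 for a symmetric distribution."""
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - mean) / sd) ** 3 for x in xs) / len(xs)


rng = random.Random(42)  # fixed seed so the example is reproducible

# Raw data: individual scores from a strongly right-skewed population.
raw_scores = [rng.expovariate(1.0) for _ in range(5000)]

# Sampling distribution: the mean of each of 2000 samples of 50 scores.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(50)])
    for _ in range(2000)
]

print(f"skewness of raw scores:   {skewness(raw_scores):.2f}")
print(f"skewness of sample means: {skewness(sample_means):.2f}")
```

In this setup the raw scores are far from normal, while the means of repeated samples are much closer to symmetric, which is the property that a test on means relies on.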
[ primary audience: behavior change intervention developers ]
Threatening communication is a popular behavior change method, used on tobacco packaging, to promote seatbelt use, and to discourage substance use. However, much research also suggests that it is not the best weapon of choice when the goal is to really change behavior, or even when the goal is to raise awareness or educate people.
How is that paradox possible? This blog post will answer that question.
This post is a response to “One-sided tests: Efficient and Underused”, a post by Daniel Lakens, whom I greatly respect and with whom, apparently up until now, I always vehemently agreed. This post is therefore partly an opportunity for him and others to explain where I’m wrong; so, dear reader, if you would take the time to point that out, I would be most grateful. Alternatively, telling me I’m right is also very much appreciated of course 🙂 In any case, if you haven’t done so yet, please read Daniel’s post first (also, see below this post for an update with more links and the origin of this discussion).