
What is p-hacking?

May 24, 2016

Even peer-reviewed literature has its flaws. Authors may tweak data or analyses until results fall within the “significant” window of p<.05. This p-value is the conventional line below which results are considered unlikely to have arisen by chance alone.

Ways to p-hack include:

Excluding certain groups. To counteract this, always look for intention-to-treat results, which include patients lost to follow-up, not only the patients who made it to the end of the study.
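A quick sketch of why intention-to-treat matters, with invented numbers: dropouts count as non-responders in an intention-to-treat analysis, but silently vanish from a “completers only” analysis, inflating the apparent response rate.

```python
# Made-up trial: 100 enrolled, 30 dropped out, 50 of the 70 completers responded.
enrolled = 100
completed = 70
responders_among_completers = 50

# "Completers only" (per-protocol) quietly ignores the 30 dropouts.
per_protocol_rate = responders_among_completers / completed   # ~71%

# Intention-to-treat keeps every enrolled patient in the denominator.
itt_rate = responders_among_completers / enrolled             # 50%

print(round(per_protocol_rate, 2), itt_rate)
```

Same trial, same responders; only the denominator changed, and the drug looks substantially better in the completers-only view.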

Running an experiment for a large number of outcomes, but reporting only those that are significant. To counteract this, always check the “boring” materials-and-methods section and look for pre-specified “primary efficacy” or “primary safety” outcomes. This is easier for authors to get away with in vitro and animal studies, since no institutional review board reviews the study methods for these. This may sometimes explain why drugs or “natural products” that kill cancer in test tubes and rats don’t end up making it through clinical trials. Just look at the diminishing science pipeline and you can see how an investigator’s subconscious bias can affect studies. Those p-values may vanish quickly in the light of large-scale, prospective, randomized controlled clinical trials.

An example of this may be frankincense, which has been touted by some as the new cancer cure. Although in vitro results seemed promising, a recent randomized controlled clinical trial was terminated early when results clearly showed no added benefit from adding Boswellia serrata to standard-of-care treatment for glioblastoma multiforme.

Here, one paper uses a microarray to screen for components that might reach significance at p<.05. The authors use only 3 samples per group, and the paper focuses only on the “positive” results. The results look promising, but they definitely need to be replicated with more than 3 samples, and a control needs to be added.
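A rough simulation of that screening setup: test many “genes” with only 3 samples per group and no real differences anywhere, and see how many still cross p<.05 by luck. This uses a simple two-group z-test for illustration; real microarray pipelines differ, and the numbers here are invented.

```python
import math
import random

def two_sided_p(group_a, group_b, sigma=1.0):
    """Normal-approximation p-value for a difference in means (known sigma)."""
    n = len(group_a)
    diff = sum(group_a) / n - sum(group_b) / n
    z = diff / (sigma * math.sqrt(2 / n))
    return math.erfc(abs(z) / math.sqrt(2))

random.seed(1)
hits = 0
n_genes = 1000
for _ in range(n_genes):
    a = [random.gauss(0, 1) for _ in range(3)]  # no true effect anywhere
    b = [random.gauss(0, 1) for _ in range(3)]
    if two_sided_p(a, b) < 0.05:
        hits += 1

# Roughly 5% of the null genes still look "significant" by chance.
print(hits, "of", n_genes, "null genes crossed p < .05")
```

A paper that reports only those hits, without the full list of what was screened, is reporting noise.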
Another reason for p-value mistakes is simply bad statistics. So always look for the raw data: how many people/lab rats/samples out of the total saw the effect? Does it intuitively look right to you that this was enough to be “significant”?
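That intuition check can be made concrete with an exact calculation. Suppose (invented numbers) 3 of 5 treated subjects responded versus 1 of 5 controls: a “tripling” of the response rate, but a Fisher-style hypergeometric calculation shows the difference is nowhere near significant.

```python
import math

def hypergeom_pmf(k, K, n, N):
    """P(exactly k responders land in the treated group), given K total
    responders among N subjects with n assigned to treatment."""
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

# 10 subjects, 5 treated, 4 total responders (3 treated + 1 control).
# One-sided p: chance of 3 or more treated responders by luck alone.
p = sum(hypergeom_pmf(k, K=4, n=5, N=10) for k in range(3, 5))
print(round(p, 2))  # 0.26 — far from significant
```

With counts this small, a dramatic-sounding ratio is entirely compatible with chance.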

Very small sample sizes (<50) and very large ones (>10,000) can sometimes skew the p-value, even if the study is otherwise sound.
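The large-sample side of this is easy to demonstrate: the same trivially small difference in means, run through the same z-test, flips from unremarkable to “significant” purely because the sample grew. The effect size and sigma here are made up for illustration.

```python
import math

def p_for_mean_diff(diff, sigma, n_per_group):
    """Two-sided z-test p-value for a difference in group means."""
    z = diff / (sigma * math.sqrt(2 / n_per_group))
    return math.erfc(abs(z) / math.sqrt(2))

tiny_effect = 0.02  # 2% of a standard deviation: clinically negligible

print(p_for_mean_diff(tiny_effect, 1.0, 50))       # nowhere near .05
print(p_for_mean_diff(tiny_effect, 1.0, 100_000))  # comfortably "significant"
```

This is why a significant p-value in a huge study still needs an effect size attached before it means anything clinically.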

*Bottom line: a p-value of <.001 is stronger evidence than p<.05.*

I suggest also looking for effect sizes, such as relative risk (RR): at 1 the risks are equal; below 1, the intervention is reducing risk; above 1, the control group has the lower risk.
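A minimal sketch of that reading rule, with made-up counts: relative risk is just the event rate in the treated group divided by the event rate in the controls.

```python
def relative_risk(events_treated, n_treated, events_control, n_control):
    """RR = risk in treated / risk in controls."""
    return (events_treated / n_treated) / (events_control / n_control)

# Invented example: 10 of 100 treated patients had the event vs 20 of 100 controls.
rr = relative_risk(10, 100, 20, 100)
print(rr)  # 0.5 — below 1, so the intervention halved the risk
```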

And check the confidence intervals: the narrower, the better.
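To see why narrower is better, here is a sketch of a 95% confidence interval for a relative risk (the standard log-RR method), run on the same invented event rates at two study sizes. In the small study the interval crosses 1, so "no difference" cannot be ruled out; in the larger one it tightens around the estimate.

```python
import math

def rr_ci(events_t, n_t, events_c, n_c, z=1.96):
    """95% CI for relative risk via the log method."""
    rr = (events_t / n_t) / (events_c / n_c)
    se = math.sqrt(1/events_t - 1/n_t + 1/events_c - 1/n_c)  # SE of ln(RR)
    return (rr * math.exp(-z * se), rr * math.exp(z * se))

# Same 10% vs 20% event rates, small study vs 10x larger study:
print(rr_ci(10, 100, 20, 100))      # wide interval that crosses 1
print(rr_ci(100, 1000, 200, 1000))  # much narrower, clearly below 1
```

Both studies estimate the same RR of 0.5; only the larger one pins it down.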

I do want to add that a proper subgroup analysis is valid; certain drugs work better in a subgroup of patients with a specific genetic makeup or biomarker. But again, these results in particular must be repeated in a full-scale prospective randomized clinical trial with that subgroup as the primary population. A great example of doing this correctly is Abraxane’s efficacy in a very specific type of non-small-cell lung cancer, squamous cell lung cancer. Celgene noted increased efficacy in this subgroup, as seen in the Abraxane prescribing information, and they are pursuing larger clinical trials to determine the true extent of this difference. Kudos to them for investigating it!
