How to Spot Bad Science #5: Know What Good Science is #3

So, a few days ago I listed six elements of good science, and more recently I described why both Large-N and Time are important and how small Ns and short time horizons make it harder to say anything about causality. Today I want to discuss the role of randomness in setting up an experiment.

  • Large Sample Sizes
  • Random Selection
  • Random Assignment
  • Placebo
  • Blindedness
  • Time

Basically, the point of randomness is that there are a lot of things that we won’t be measuring in any given experiment and many times we won’t be wholly aware of what some of those things are. What?!?! Let me explain.

In a typical story on a scientific discovery, it is pretty common to read something like “The researchers controlled for sex, educational level, income, and race.” What this means is that the scientists observed and noted these characteristics and then used a statistical procedure to systematically remove any effect of those factors from the analysis. One, not entirely accurate, way to think about it is like this: Let’s assume that what the scientist wants to know is the effect of a medicine on sex drive. The hypothesis is something like, “If we give subjects Libidinex, then they will report more interest in sex.” But let’s assume we know from a previous survey that both men and young people already report more interest in sex than the “average person.” So the scientists would want to make sure there were men and women in both the experimental and the control groups, and that there were young adults, adults, and older adults in both groups as well. Then they could make sure that the effect of the drug was not limited to simply those groups that were already more prone to be interested in sex. Basically, the scientist can look at all the known groups separately to see if the drug has an effect above and beyond any effect of the known variables.
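Here’s a minimal sketch of that idea in Python (mine, with made-up data for the hypothetical Libidinex example, not anything from a real study). In its crudest form, “controlling for” sex and age group just means comparing treated and untreated subjects within each known subgroup:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000

# Simulated subjects: sex and age group influence baseline interest,
# and the (hypothetical) drug adds a fixed effect on top of that.
df = pd.DataFrame({
    "sex": rng.choice(["M", "F"], n),
    "age_group": rng.choice(["young", "adult", "older"], n),
    "treated": rng.integers(0, 2, n),   # 1 = got Libidinex, 0 = did not
})
baseline = 5 + 1.5 * (df["sex"] == "M") + 2.0 * (df["age_group"] == "young")
df["interest"] = baseline + 1.0 * df["treated"] + rng.normal(0, 1, n)

# "Controlling for" sex and age here just means looking at the
# treated-vs-untreated difference *within* each known subgroup.
effect_by_stratum = (
    df.groupby(["sex", "age_group", "treated"])["interest"].mean()
      .unstack("treated")
)
effect_by_stratum["treatment_effect"] = effect_by_stratum[1] - effect_by_stratum[0]
print(effect_by_stratum)
```

In this toy setup the drug’s effect shows up in every subgroup, which is exactly the kind of check the real statistical procedures are doing in a more sophisticated way.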

The problem of course is that we cannot measure everything. What if the drug only works if you take it precisely two hours after waking up, or what if it works best on people who live at higher altitudes? What if it works best on humans who have a DNA combination that, to date, is completely unknown to have an effect on this kind of medicine? Scientists pick the usual suspects that we know tend to exert an influence on the type of intervention they want to learn about. But they can’t know or test all the other factors that might also affect the results.

That’s where randomness comes in.

There are two places randomness shows up in an experiment. First there is just picking who is going to be in the experiment altogether. This is random selection. And then there’s randomly assigning those folks into the various test groups: the control group and the experimental group(s).

The second, random assignment, is more common than the first. To do true random selection, you would literally have to have some method of choosing random people from a population and forcing them to participate. This does happen, but it’s pretty rare. Typically, some method of approximating randomness is concocted (e.g., we will ask every 6th person that passes us by to take part in our survey) and a response is determined ahead of time for what happens when a person so chosen refuses to take part. The fact is, however, that you can never get rid of the possibility that the people who choose to take part systematically differ (somehow) from those who choose not to. This problem is compounded when recruitment consists of an ad in a paper or posters hung up around campus. In that case you have narrowed your participants to people who both (1) read that paper/live or work on campus and (2) agree to participate. This is even further confounded if your subjects are people with a specific health or behavioral issue, in which case you further limit the participants to (3) people who know they have the condition and (4), where severity acts as an inducement to participate, possibly only the worst cases (which can bias your findings, showing better results than the intervention would produce in “the average” person with that condition).

Random assignment takes care of some of the effects of a lack of random selection. It can’t entirely erase the selection bias, but if we think of all the factors that might drive someone to participate, the least we can do is make sure those get spread around evenly into all the required test groups. And again, random assignment is the method of trying to equally distribute all those factors we don’t know to measure (as well as the ones we do know to measure).
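To make that concrete, here’s a toy sketch of my own (invented numbers): random assignment is just shuffling the recruits and splitting the list, and the payoff is that even a factor nobody thought to measure ends up roughly evenly represented in both groups.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Pretend each recruit has an unmeasured trait we never asked about,
# e.g., whether they live at high altitude (30% of the pool does).
high_altitude = rng.random(n) < 0.30

# Random assignment: shuffle the participant indices and split them in half.
order = rng.permutation(n)
experimental, control = order[: n // 2], order[n // 2 :]

print("high-altitude share, experimental:", high_altitude[experimental].mean())
print("high-altitude share, control:     ", high_altitude[control].mean())
# Both shares land near 0.30 even though nobody measured the trait on
# purpose -- that is the work the randomness is doing for us.
```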

Where random assignment is often lacking is in the clinical applications of medical interventions. To give you an example from my field:

  • 95% of smokers that try to quit will try to quit without any help at all. 5% of them will be successful.
  • 5% of smokers that try to quit will use nicotine replacement therapy (NRT) to help them (like the nicotine gum or patch). Of these, about 25% will successfully quit.

Now the question before us is this: is there some systematic difference between those that choose to use NRT and those that do not? For example: NRT costs money, and we know that there is a correlation between income levels and smoking rates. We also know that lower income folks have a harder time quitting. So perhaps the use of NRT is proxying for income to some degree, and that makes NRT look more effective than it would be if we just gave it (for free) to the 95% of people that would otherwise try to quit unaided. Maybe the use of medicine is selecting out certain at-risk minority groups that distrust Western medicine. Perhaps it’s selecting out older folks who have dental problems (the gum is hard to chew) but who may be more addicted to nicotine (because of how long they’ve been smoking). Maybe using NRT is something that more educated people know about–or maybe the instructions are hard to read (they are) and so less literate individuals shy away from it. Any of these factors is possible. And we can answer this question if we simply (1) randomly assign (2) smokers (3) who are seeking help (4) to one of two groups: one (4a) where they get the NRT or another (4b) where they get a placebo. This random assignment can help us determine if NRT “works” on “everyone” or just those who are inclined to use this kind of help. [It does.]

The general idea behind randomness (and you can get your PhD in studying how randomness works, so consider this a really basic intro)…the general idea is that if you get two large enough groups constituted of people selected randomly, they will resemble each other in just about every factor you can think of–and the assumption then is that they also resemble each other in all the ways you can’t think of. So if an intervention has an effect on the experimental group, it can be assumed it would have a measurable effect on any group of random people. This is what we call generalizability…or external validity. Together, these two common terms do a pretty good job of explaining what you can do if you have random selection and random assignment–and a pretty good idea of what you cannot do if you don’t: you can generalize the findings to groups of people outside the test group.

Basically, most experiments are missing one or both of these features. So any conclusions from them should be limited to precisely the groups they studied, i.e., freshman and sophomore men and women at a university in one of the Rust Belt states. If young, moderately educated, mostly white college students from middle class Midwestern families lose weight when they eat chocolate, it probably shouldn’t be interpreted as a rule for a 45-year-old black man from northeast Mississippi to follow.

And a key point is that this problem does not go away with large Ns. It’s often the case that really large Ns happen because of very successful efforts to achieve randomness, but this is not logically necessary. If a random survey were conducted among participants in the Million Man March, the N could be very large, and the respondents could be “random” in the sense that each individual was chosen by a very specific algorithm designed to achieve random participation. But in that case, the locale itself has already done some selection for you. You would not necessarily want to use that survey to predict anything about the general population. Similarly, a very large survey of the “Residents of Minneapolis/St. Paul” will only reliably be able to tell you what the residents of the Twin Cities feel on the topics questioned. Residency in a town has already done some of the selection for you, and N doesn’t matter even if you achieved a 100% response rate; even if you asked the same questions for 30 years.

 

How to Spot Bad Science #4: Know What Good Science is #2

Yesterday, I listed 6 criteria that make good science:

  • Large Sample Sizes
  • Random Selection
  • Random Assignment
  • Placebo
  • Blindedness
  • Time

Beginning today, I want to run through them briefly to explain why each is important, beginning with large sample sizes and time. I’m moving time up because it’s easy to discuss, practically self-explanatory. I’m also moving it up because Random Selection and Random Assignment should be discussed together as should Placebo and Blindedness.

Sidenote: I am skipping a discussion on the notion of “correlation does not equal causality.” This is extremely important to know, but in my opinion, follows the discussion of these six criteria.

Large-n

The sample size should be large. In scientific studies the sample size is referred to as “n.” So if there were 60 people in the study we would say that “n=60.” Studies with lots of observations are called “large-n studies” and those with few observations are called “small-n studies.” I only bring that up because at one point I will exhort you to “get the article” and you will see “n” show up in the abstract.

When I wrote the bit about sample sizes yesterday, I was careful to say “less than 20 is almost always problematic.” When the effect size is large, you can get away with smaller sample sizes. But sometimes authors don’t report effect sizes, and they often don’t provide enough detail to calculate them on your own. More importantly, who has the time? As a general rule of thumb, most effect sizes are pretty small and require a lot of observations before you can make any firm pronouncements about the association. So when you see a small sample size (and less than 20 is a good rule of thumb here), it is almost always grounds to say that causality could not be determined from that study.
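If you’re curious where rules of thumb like that come from, power calculations relate effect size to the sample size needed to reliably detect it. Here is a rough sketch using the statsmodels library; the effect sizes are just illustrative values I’ve chosen, not anything from a particular study:

```python
# A rough power calculation: how many subjects per group are needed to
# detect an effect with 80% power at the usual 5% significance level?
from statsmodels.stats.power import TTestIndPower

power_analysis = TTestIndPower()
for effect_size in (0.8, 0.5, 0.2):   # Cohen's d: large, medium, small
    n_per_group = power_analysis.solve_power(
        effect_size=effect_size, alpha=0.05, power=0.80
    )
    print(f"effect size {effect_size}: about {n_per_group:.0f} subjects per group")
# Large effects can get by with roughly 26 per group; small effects need
# closer to 400 per group.
```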

A small sample doesn’t rule causality out. Theoretically, you could have an experimental group of 1 and a control group of 1 and observe a difference that is related to your intervention. The problem is being sure. Which leads me to the other main reason you want to see studies with larger ns. With smaller sample sizes, there’s a greater chance that what is observed is related to some unobserved variable and not the intervention itself. The idea behind large sample sizes is that the scientist is able to eliminate (or diminish) the role of chance. If I am the experimental subject in a group of one and my friend Joe is the control group, and you give him a placebo and you give me a pill that’s supposed to ruin my ability to concentrate, and then I spend the night drinking–well–I may have trouble focusing the next day; and the drug may be the cause, but how would we know? The more people in the group, the less it matters if some of them don’t follow the directions exactly or have some genetic condition that affects the results. And if both groups are randomly selected, it’s more likely that such differences will show up in both groups and cancel each other out.
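To put a number on how much mischief chance can make in tiny groups, here is a quick simulation of my own in which the “drug” does nothing at all, and yet small groups routinely show what looks like a sizable difference:

```python
import numpy as np

rng = np.random.default_rng(1)

def apparent_effect(n_per_group, trials=10_000):
    """Fraction of no-effect experiments whose group means differ by a lot."""
    drug = rng.normal(0, 1, size=(trials, n_per_group))
    placebo = rng.normal(0, 1, size=(trials, n_per_group))
    gap = np.abs(drug.mean(axis=1) - placebo.mean(axis=1))
    return (gap > 0.5).mean()   # "a lot" = half a standard deviation

for n in (2, 10, 50, 200):
    print(f"n = {n:>3} per group: {apparent_effect(n):.1%} of null experiments "
          "show a gap > 0.5 SD")
# With 2 per group, most null experiments look like something happened;
# with 200 per group, essentially none do.
```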

Time

 

I cut nearly 500 words on time even after saying that it’s “practically self-explanatory.” Here’s the briefer summary: We have to remember that time is always a confounding variable. If that makes intuitive sense to you, then you can stop reading. If it doesn’t, and it didn’t to me at first, then read on.

Sick people, in time, frequently get better. Systems, even in absence of an intervention, change. People mature and change their minds about stuff…frequently in ways that we would call “better” in regard to their ability and willingness to make “good” decisions, for example. Elements decay. That’s one reason time is important.

We also need to follow up on our interventions to determine if they have any real, lasting effect. Many interventions will demonstrate short-term benefits, but no lasting effect. Or worse, many interventions with short-term benefits have unexpected, long-term harms. We need to know this before we start touting the next “miracle cure.” More than likely, we just don’t know, because long-term follow-up is expensive and logistically difficult.

Finally, and somewhat related to the first point, timing matters. In public health, applying an intervention just before a group of people would naturally start getting better has been called “riding the epi curve.” It looks like the intervention made everybody better, but in reality people were getting better anyway. Think of a massive surge in flu vaccinations in February, just as flu cases hit their maximum. There was nowhere to go but down. Separating the effect of the flu vaccine from the natural decline in cases is hard to do.
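A toy illustration of riding the epi curve (the numbers are invented for the sketch): cases peak and decline on their own, so an intervention timed at the peak looks effective even though, in this model, it does nothing at all.

```python
import numpy as np

# Weekly flu cases over a season that peak in week 10 and decline on their own.
weeks = np.arange(20)
cases = 1000 * np.exp(-0.5 * ((weeks - 10) / 4.0) ** 2)   # simple bell-shaped season

push_week = 10   # a big vaccine push launched right at the peak
for w in range(push_week, push_week + 5):
    print(f"week {w}: {cases[w]:5.0f} cases")
# Cases drop week after week following the push -- but this model contains
# no vaccine effect at all; the decline is just the epidemic's natural shape.
```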

 

Time is a little strange as one of the six criteria. The other five are classic considerations of experimental design. Time is a constraint often forced on the experimenter from the outside, e.g., “The grant will fund us for two years; what can we do in two years?” or “We want to know how many freshmen given this intervention will go on to college; so the experiment runs for six weeks one year, with follow-up 3-4 years later.” But as a reader who wants to know how important an intervention is likely to be, the question of time is just as important as the others.

(whew! Shaved about 200 words out of there.)

Tomorrow we move on to the role of randomness in experimental design.

How to Spot Bad Science #3: Know What Good Science Is #1

The goal of science is to try and prove which things cause other things to happen. I wouldn’t use that as an answer for a short essay question, but it will suffice for the duration of this post. Now, I’ve blogged before that I don’t think the randomized controlled trial should be held up as “the gold standard,” mostly because an RCT is not always feasible. It also creates an impression that any RCT should be treated as “good science,” when, in fact, most RCTs fail to meet the gold standard in design, even more fail during execution, and more still fail during analysis. You don’t get a free pass just because you designed an RCT.

That said, at least a cursory understanding of what an RCT is and why it is largely considered to be “the gold standard” of the scientific enterprise is a great place to begin developing critical reading skills in regard to understanding published science.

Although it chronologically comes first, and a good understanding of what a hypothesis is, how one is created, and its role in study design is an important part of scientific literacy, I’m going to skip the hypothesis for now, because the biggest bang for your buck comes in understanding the characteristics of an experiment after the hypothesis and the measure of its effects have been decided.

The gold standard experiment will have the following characteristics.

  • Large Sample Sizes: There are formulas for experimenters to use to help them determine how big a sample needs to be to be able to reasonably capture the effects of an intervention. We won’t go into those. It will suffice to say only that bigger is better and less than 20 is almost always problematic.
  • Random Selection: The subjects of the experiment will have been randomly selected from the widest possible version of the population the scientist wants to say something about.
  • Random Assignment: Once a random group of people have been chosen to participate in the experiment, then they must be randomly assigned to receive the intervention or something else (or in some cases, varying doses of the intervention).
  • Placebo: The “control group,” the group that does not receive the intervention, could receive nothing at all, but it’s far better to receive a fake version of the intervention.
  • Blindedness: At the very least, the subjects should not know if they are receiving the placebo or the real thing. But since we’re talking about ideal experiments, the person that engages with the subjects should also not be privy to this information. If neither the subject nor the experimenter knows, this is double blind. And double blind is where the real action is at.
  • Time: The experiment should continue taking measurements for as long as possible, even after the intervention has ended.

Most studies will fail to achieve at least one of these six criteria. It is almost impossible, for example, to use random selection, especially in nutrition or pharmaceutical tests. And most experiments will have woefully short time horizons. Some experiments are difficult to design a placebo for. For example, if the hypothesis is “Students who are smacked hard across the face just prior to an exam will perform less well than students who are not slapped,” there is no placebo slap. In cases where placebos are hard to develop, blindedness also frequently suffers. An experimenter likely knows whom she has slapped. And in many medical contexts, random assignment is often unachievable. As we move deeper into “clinic-based evidence,” I see more and more interventions that lack several of these features.

Tomorrow, I will discuss in slightly more detail why failing to implement each of these, regardless of the reason, limits the conclusions that can be derived from an experiment. For now, simply be on the lookout for these characteristics and if you see them missing know that causality remains a question and any expression of certainty should be scaled back.

How to Spot Bad Science #2: It isn’t about the Headlines.

The most famous tool at this point for helping lay readers understand the science they’re reading and evaluate whether it’s good or not is this one. It starts with this advice: Be wary of any article with a “sensationalized headline.” This is a fairly common warning. This article in Wired advises readers to “Ignore headline grabbers” (#3). The tenth piece of advice in this article says to beware of “Sensation!” While not strictly about headlines, this article on a fitness website tells its readers to be wary of any science reporting that seems to be offering its readers “a magic bullet.” And this one tells science journalists to focus on evidence, not on “rhetoric or narrative.”

This advice is bad advice for three reasons. It’s not “bad advice” in the sense that following it will end badly. I’m saying it’s bad advice in the sense that you can’t actually follow it. It’s circular logic; it begs the question. It attempts to cure a basic human survival mechanism. And it doesn’t constitute actionable advice–it doesn’t provide the reader anything to do.

Imagine this: Imagine a headline that screams,

Extract from newly discovered plant shown to eliminate 95% of malignant tumors!

That’s pretty sensational. But what if it were also true? I read a book about knights once when I was in elementary school and it described the knight’s code. A knight was not permitted to brag, said this book, but a knight could truthfully describe any of his own exploits, provided they were true. It ain’t braggin’ if it’s true. In other words a headline is only “sensational” if we already know that what it’s claiming isn’t quite as…well…sensational…as the headline has led us to believe.

In addition to question begging, this particular advice does something that I think deserves being called out. It places too much emphasis on “the system.” The fact is that we’re humans. We’re highly evolved biological machines that require energy to do things. It is in our basic construction that we seek to make things easier–to require less energy. Scientists are looking for magic bullets, press releases tout them, and journalists regurgitate them because that’s what you’re interested in. Even those of us who have trained ourselves to look sideways at “magic bullet”-style claims, those of us who are immediately suspicious of “too good to be true” deals, are still nevertheless intrigued by them. Sometimes we just want to be the one to find the hole that exposes the fraud, but there’s always a part of us that hopes that we can’t find it, that this one is the real deal.

Finally, this piece of advice is non-actionable. Let us assume you have read the above headline and determined that it is sensationalized and should be considered suspect. In what way is it suspect? Is the plant not newly discovered? Does it decrease the size of 95% of tumors but not actually eliminate them? Is it 85% and not 95%? Is it 95% of all tumors, which includes benign tumors as well, and less than 95% of malignant ones? Do the tumors have to be caught at a preternaturally early stage, well before most screening tests would identify them? Is the extract dangerous to the point of toxicity, meaning it is useless as medicine? The questions go on and on. Being told to be wary offers no advice on how you can still read the article and get some use out of it. And, in fact, the advice “Be wary of non-sensationalized headlines” is just as true and just as useless.

From here on I will offer actionable advice on how not to be duped by bad science and/or bad science reporting.

How to Spot Bad Science #1: Science is Hard!

I read science all day most days. I read scientific studies across an array of fields. For my day job I read articles from psychology, biology, chemistry, genetics, public health, medicine, pharmaceutical research, and public policy. My educational background is almost exclusively from the “soft sciences”: political science, sociology, criminal justice, international relations, and economics (mostly in the sense of “political economics”). And among my hobbies is studying beer, where I tend to gravitate toward the psychological aspects of tasting and marketing, evolutionary studies of yeast strains, basic beer chemistry, and market analyses. I have read a lot of different types of science and I’ve become literate in various scientific idioms, including the methods used and what the limitations are of those methods.

That’s all prelude to this:

We live in a transforming world. Science is gaining a level of popular attention that in many ways rivals what was happening during the Scientific Revolution. Of course, a lot has changed in the last 400 or so years. There’s a lot more science now, a lot of new methods, and a full understanding of all that’s happening is further out of reach than ever. We require good curators, editors, and popular writers to tell us what is important and why. At the same time, we need them to place the newly gained knowledge into the context of what has happened before and what the likely implications of that knowledge are. But this is extremely difficult to do! In addition to the intellectual strains it places on our science journalists, science stories don’t always cater to the needs (or desires) of profit-seeking newspapers and websites or enrollment-seeking universities (which write the press releases that science journalists use as a resource). And the limited impact of many new discoveries doesn’t really serve the interests of grant- and tenure-seeking scientists.

Basically, reading science is a difficult but not impossible skill to learn. But because it is difficult–both knowing how to read it, and also knowing what to read in the first place–many people will elect to get their science from newspapers, the evening news, and blogs like IFLScience, which pick which studies to highlight and also summarize them in a language that’s easy and fun to read. Because not all scientists are equally good (or honest) at presenting their findings, because press release writers are incentivized to sensationalize their organization’s contribution to the literature, and because news outlets are incentivized to grab readers’ attention, reading science news also requires a specific skill. However, many people are not aware this skill is required and thus have never developed it.

Thankfully, there are tools out there to help concerned readers develop the skills they need to get the most from science reporting.

Unfortunately, they are all wrong. Over the next few days (weeks?) I’m going to walk you through the real knowledge and skills you need to be a truly scientifically literate human being in the 21st century.

Left-Wing Media Bias?

I have been following the science of determining (or disproving) the “left wing bias” of the media for a long time. I have seen lots and lots of articles on the subject with many, many charts and graphs in them. And they all basically look like the one here [PDF]. I’m linking to this one because it’s new and…if you’re following this whole “Lying Liar Political Scientist Published an Article with Lies” story, it has the benefit of being relevant to that discussion as well.

If you are following the whole LaCour fiasco, read the whole thing. Basically, LaCour probably faked data in a second article as well (and this essay is written by the guy that LaCour jocked the data from). If you’re only interested in the “media’s left wing bias” angle, then direct your gaze to page 5. What that chart is telling you is that, while there does seem to be an apparent “leftish slant,” this is far from certain and could actually be flat wrong.

  1. Most of the studied channels appear to be in between the moderate-left hump and the non-partisan 0.
  2. There are two channels to the left of moderate left and no channels to the right of moderate right.

However, and this is critical, with the exception of Lou Dobbs’s show, all the confidence intervals (Bayesian credible intervals) overlap 0…which means the shows aren’t statistically distinguishable from “centrist”…and some of them could be “conservative.” Those CIs are huuuuuugggeeee. They basically run the entire gamut of the base dataset. I’ll also mention that most of the CIs (all but three) extend most of their width to the right. I think that’s difficult to interpret, but the consistency of this right skew across channels implies it means something.
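If the mechanics of “the interval overlaps zero” are unfamiliar, here’s a tiny illustration with invented numbers (a plain normal-approximation interval rather than the paper’s Bayesian credible interval, but the overlap check works the same way):

```python
import numpy as np

# Invented "slant" scores for one show's segments: negative = left, positive = right.
rng = np.random.default_rng(7)
slant_scores = rng.normal(loc=-0.15, scale=1.0, size=40)

mean = slant_scores.mean()
sem = slant_scores.std(ddof=1) / np.sqrt(len(slant_scores))
low, high = mean - 1.96 * sem, mean + 1.96 * sem   # rough 95% interval

print(f"estimated slant: {mean:+.2f}, 95% interval: ({low:+.2f}, {high:+.2f})")
if low <= 0 <= high:
    print("Interval overlaps zero: can't rule out 'centrist' for this show.")
else:
    print("Interval excludes zero: the show measurably leans one way.")
```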

Some of this is a limitation to the types of methodologies available to this kind of research. But some of this is likely due to the fact that…well….most channels aren’t interested in 100% alienating 50% of their potential audience and thus…at the very least…attempt to appear neutral if not being, in fact, neutral. In the words of the author:

In the replication, only one show, Lou Dobbs Tonight, has a credible interval that does not overlap zero. Applying LaCour’s criterion, Lou Dobbs Tonight would be classified as “conservative news,” while all the remaining shows would be classified as “centrist news.”

CBD Maybe not a Miracle Cure for Childhood Seizure Disorders

One of the stories that got a lot of media time during the fight to make recreational marijuana a legal reality was that of Haleigh–a young girl who, prior to adopting a CBD regimen to treat her seizures, was having 200 episodes a day and now has closer to 10. That is a huge success. We read a lot out here about all the families that are moving to Colorado to obtain marijuana for their children’s seizure treatment. I would say that, aside from treating chronic pain…especially chronic pain in association with chemotherapy…marijuana’s miraculous power in treating seizures in children is the driving narrative of why marijuana should be rescheduled, why every state without a medical marijuana law should get one, and why recreational marijuana should be far more widespread.

I have no doubts that Haleigh’s story is true. In fact, a recent study (abstract here) found that 33% of parents of children taking CBD reported their children’s seizures dropped by half. That’s huge.

However, that same study found that 44% of children taking CBD were suffering negative health outcomes from the treatment, including (and this is important) increased seizures.

In a more objective measure of seizure-related brain health, only 3 of the children in the study showed any actual improvement (which indicates that some portion of those 33% of children who saw their seizures decrease by half may have been over-reporting the benefit, or may be operating under some placebo effect).

In any case, those last two points are crucial and they will no doubt be completely, and tragically, overlooked.

Medicine is weird; it works for whom it works. The way we determine if a medicine works is to get two groups. Individuals are randomly assigned to one of the two groups. One group receives the treatment, the other receives a placebo. If possible, even the administrators don’t know which individuals are receiving the medication and which the placebo. Then we look for improvements in both groups. For various reasons, some people in both groups will get worse, some people will get better, and some people may see no change in their condition at all. Ideally we will see more people get better in the medication group and fewer people getting worse. In either case, the differences observed within each group will be compared to the differences observed in the other group to determine if those changes are “statistically significant.” That’s a real rough description of the “double blind, randomized, placebo-controlled trial.” Basically, if the improvements in the medication group are better than the natural improvements seen in the control group, researchers conclude that the medicine “worked.”
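Here’s a stripped-down sketch of that comparison (my own toy numbers, not the CBD study’s data): simulate a treated group and a placebo group, compare them at the group level, and notice that even when the medicine “works” on average, plenty of treated individuals still get worse.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 200

# Change in seizure frequency (negative = improvement). The medicine helps
# on average, but individual responses vary a lot.
placebo = rng.normal(loc=0.0, scale=2.0, size=n)
treated = rng.normal(loc=-1.0, scale=2.0, size=n)

t_stat, p_value = stats.ttest_ind(treated, placebo)
print(f"mean change, treated: {treated.mean():+.2f}   placebo: {placebo.mean():+.2f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")   # group level: the medicine 'worked'

worse_on_medicine = (treated > 0).mean()
print(f"share of treated patients who got worse anyway: {worse_on_medicine:.0%}")
```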

But something peculiar happened up there. Something that researchers know about but that doesn’t always get translated out: the medicine didn’t seem to have any effect at all on some people. And worse, some got worse.

Some of those differences might be chalked up to an inability to precisely measure changes in the condition. We tend to think of illness as a thing you either have or you don’t, rather than thinking of it in terms of how much of it you have. So it might be that very minor improvements in condition were beneath a threshold where those improvements were observed and reported to researchers. It’s also possible that declining condition in the medicine group would have been more pronounced had the medicine not been present.

But it’s also entirely possible that the medicine helped some people in the medicine group and harmed other people in the medicine group. Humans are all different. Diseases manifest differently in different people, and different people react differently to medicines. So when we claim that a medicine “worked,” it’s not always entirely clear what is meant. The medicine works for whom it works and not on anyone else. Ideally, we would give a medicine that works only to the people it works on…and maybe even prescribe a medicine that “doesn’t work” to the people it works on as well, even if it doesn’t “work” any better than chance at the group level.

I’m not saying that the story linked here proves that marijuana is a crappy medicine for children with seizures. It’s entirely possible that marijuana is very good for some children, and maybe not quite as good, and possibly harmful, for others. The problem is that the rhetoric of these “miraculous” cures is problematic. Parents desperate to find any cure for their children may overlook problems marijuana causes in their child, or they may see signs of improvement that don’t exist. And worse, they may forego other, effective treatments while hoping that the marijuana miracle works out.

I think this is an area where we definitely need more research. And marijuana is cheap and, compared to many traditional treatments, safe. So I’m glad that it’s available for parents to try. But I also wish that the conversation we were having didn’t involve the word “miracle” quite so often, and that the potential for no effect, or bad outcomes, was more appreciated.

Where’s the Rest of The Rest of the Story?

Generally I am a fan of Dr. Siegel, but I do not always agree with him. Here is one example. I am not a fan of the kind of analysis offered in yesterday’s post.

Siegel has picked an enemy, The Campaign for Tobacco-Free Kids, and he grabs hold of a recent press release of theirs in which they encourage states to raise the minimum age to buy tobacco products to 21. The central component of his rant is that the Campaign supports this law even though it was the Campaign itself that helped prevent the FDA from gaining that power. He calls them disingenuous. That’s it. A charge of hypocrisy and nothing more substantive.

There are two things to point out here:
  1. Either raising the minimum age to buy tobacco products is a good policy or a bad one. There are good arguments on both sides, but Siegel presents none of them. He seems to support it, so he should ally himself with organizations that also support it, regardless of how they felt about it last week, last year, or ten or twenty years ago. When organizations or individuals come around to your way of thinking, that should be applauded, not used as another opportunity to vilify them. This raises the specter of ulterior motives on Siegel’s part, which makes his stance appear to be the disingenuous one. I’m not making any accusations here; I wouldn’t know what accusations to make. I normally consider Siegel an authentic and committed voice in tobacco control, but this post doesn’t feel like that.
  2. If it is a good policy, that does not imply that it is equally good if implemented by either the states or the federal government. In a federalized system like the US has, one could easily advocate for all 50 states adopting a policy but combat the federal government adopting that same policy or overseeing its administration. There’s no hypocrisy there. I don’t know if the Campaign’s stance was that this would be better as a state decision or not–it doesn’t really make much sense in this particular instance to support state-level decision making over federal action–but it’s possible, and without exploring that possibility, Siegel performs a disservice.

Siegel always calls the second half of his posts “The Rest of the Story,” but I think this is an instance where The Rest of the Story needs its own The Rest of the Story.

Is Red Wine the Miracle Fat-Burning Cure You’ve Been Waiting For?

There is a debate among scientists, researchers, and journalists that goes something like this:

  • Scientific community: We need better science reporters.
  • Science journalists: Don’t blame us, science is pretty freaking tough and we rely on good press release writers to give us the skinny.

I think there’s some truth to both of these statements, but they are almost certainly incomplete. A recent flurry of news stories on the benefits of wine drinking is a good example of where both explanations fall short.

In a recent press release from Oregon State University they claim: “The findings suggest that consuming dark-colored grapes, whether eating them or drinking juice or wine, might help people better manage obesity and related metabolic disorders such as fatty liver.” Which, of course, earned OSU and the scientists in question many accolades and clogged up my Twitter and Facebook feeds for days with people claiming they can finally start skipping the gym.

But is that what they said? While the headline writers seemed to concentrate on the “fat burning” sentence above, the reporters were pretty keen on adding this line from later in the press release: “‘These plant chemicals are not a weight-loss miracle,’ cautions [study team member Neil] Shay. ‘We didn’t find, and we didn’t expect to, that these compounds would improve body weight.’”

I have a tendency myself to take science reporters to task, but here’s a good example of science reporters more or less getting it right. Of course, if the story had been truly understood from the get-go, it’s hard to see it getting much coverage. So it’s clear that something went wrong, but what? And what has been people’s takeaway? Most of my friends took to Facebook and Twitter in jest, but it was a jest that relied on a misunderstanding of the truth of that first sentence. Can we say that the reporters were “at fault” for this misunderstanding?

A part of me says, sure. If a failure to communicate is large and consistent, then it must be the communicator’s fault to some degree. On the other hand, the reporting seems accurate enough that the misreading seems willful, and no reporter should be held accountable for willful misreading. And in this case, even the press release was not explicitly overblown. Shay’s cautionary quote, the corrective to the “fat burning” sentence above it, was drawn directly from the press release.

In this case, there was potentially an effort on the part of the press release writer to deliberately persuade reporters to lead with the “fat burning” bit–and to use that in the headlines as well. And this is what seems to have occurred, deliberate or not. (It was deliberate.) In this case then, maybe there’s some room for criticizing the reporters for falling prey to their incentive to make a splash with the headline–an incentive played to by the press release author.

To say “we need better science journalists” in an instance like this is to say, “we need people who work for newspapers to not care about the incentives of the newspaper business.” It’s true, but it’s also an extremely infantile wish. Most of science isn’t truly headline worthy, in the sense that newspapers look for headlines. To cover science at all, we must expect errors like this. It’s up to non-reporters to carry the weight of explaining what is going on and to spread the word. Speaking of which…

Thomas Lumley’s post on Stats Chat can help shed some light on what was actually found in this study.

How Many is That? Pot Smokers in Colorado

These two sentences appear back to back in a WaPo article on a report on marijuana usage in Colorado. Both are inaccurate out of context.

Adult residents either smoke pot (relatively) few times a month or nearly every day—there are few in the middle.

More than half of all adult resident users consume the drug in some form fewer than six times a month.

The author, Niraj Chokshi, knows, but does not say in these sentences, that only 9% of Coloradans had used marijuana, in any form, at least 12 times in the last year. He cites this statistic in the paragraph just above the first sentence. But since sentences are supposed to capture a complete idea, leaving out the statistic that places the idea in context is almost certain to confuse some folks.

I don’t think Chokshi means to do this. But taken out of context these sentences imply:

  1. All adult residents consume the drug, either a few times a month or nearly every day.
  2. More than half of all adult residents consume marijuana fewer than six times a month; the rest use more often.

In reality, the relevant universe here is the set of individuals who are

  1. adult
  2. residents of Colorado
  3. and who use marijuana at least 12 times per year

So not “more than half” of our more than 5 million residents, but “more than half” of 9% of that number. Using just Chokshi’s out-of-context sentences, you might think there are millions of pot smokers in Colorado. There are, in fact (according to the cited report’s very good methodology, btw), just under 500,000 smokers. So the “more than half” here is talking about a couple hundred thousand smokers. That’s not nothing, but it ain’t millions either.
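Spelled out (using the article’s roughly 9% figure and the state’s roughly 5 million residents; both are rounded approximations):

```python
# Rough arithmetic behind the "more than half" claim.
residents = 5_000_000                       # roughly; the post's "more than 5 million"
regular_users = 0.09 * residents            # ~9% used marijuana 12+ times a year
light_users = 0.5 * regular_users           # "more than half" of those users

print(f"users the statistic covers:        {regular_users:,.0f}")   # ~450,000
print(f"'more than half' of them is about: {light_users:,.0f}")     # ~225,000
# A couple hundred thousand people -- not "more than half" of 5 million.
```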

It’s an unforced error and it makes this issue harder to understand given the limited amount of cognitive power people can be expected to expend on any one article.

The important part of this report, btw, is that the few who smoke every day or most days (which includes nearly all of the *medical marijuana* folks) account for nearly 70% of all the pot consumed in Colorado. Chokshi doesn’t miss this point, but he does bury it a little.