The goal is to raise TWO-HUNDRED AMERICAN DOLLARS between now and December 31. If I do, I will make a mess of my beard and likely mine or someone else’s house by traipsing around on New Year’s Day with a Glitter Beard.
I can hear you asking, “How can I make a shiny mess of some random Denver bar/One of Jim’s Friend’s Houses?” And the answer is “It’s EASY!!!”
- Go to THIS LINK and hit the Donate Now button on the right.
- Pick an amount.
- In the Comments section, please reference “glitter” or “glitterbeard” in some way. I plan on having several fundraising things happening at once and to make sure your funds go to the appropriate thing, I’ll need *some* way of knowing where you want your funds to go.
- Funds raised between today (11-23) and December 30th and labeled with any reference to glitter, will be applied to the $200 goal.
First to donate $50 or more gets to pick the color. (After that, larger donations can outbid for color. For example, first $50 bidder calls “gold,” a later bidder at $51 can choose “red” instead and, if not outbid, red it stays.) Put the color in the comments as well.
To get a taste of what this entails, check out the video below.
Can I share something with you? I am a bad swimmer. I mean really bad. Full disclosure: for someone who works out as much as I do, rides my bike and hikes as much as I do, and runs as much as I do, I am a miserable athlete altogether. But I am the most miserable of all the miserable swimmers out there.
Let me prove it to you.
Some of you have heard this story, but it’s a good one and everyone likes a story that makes a pompous, intellectual dilettante like myself look foolish. So heal yourself with laughter at my expense.
My very first triathlon, in fact my very first “competitive” event of any sort, was in 2010. I had just recently started dating the wonderful human who is now my wife. She is an anxious sort who likes running for its therapeutic effects (sort of, I’ll leave her to define her relationship with running). I don’t remember what possessed her to do a triathlon, but I do know that I like to think of myself as the kind of person that runs triathlons. I was 35 years old.
Thirty-five, for those of you who are math deficient, is one year younger than 36…which is the beginning of the second age bracket in the Mighty Mississinewa Sprint Triathlon. So when I showed up on race day I was pleased to find out I would be a member of the first wave: 18-35 year olds and those competing at the “elite” level.
Oh. I should mention that I worked out extremely hard so that I would not qualify for “Clydesdale” status. What is a “Clydesdale?” you ask. That’s the name for somebody who is racing at over 200 pounds. You can put “Clydesdale” on your race status so that those who see your abysmal ranking will know to convert their snorts of derision into sympathetic sighs. I weighed 198 on race day.
The way the waves work is that one cadre of, in my case, extremely fit young men and seasoned racers (and me) enter the Mississinewa Reservoir at the sound of a starting pistol. Every few minutes, the starting pistol cracks again and a new wave of progressively less competitive swimmers enters the reservoir. A sprint triathlon is not very long. The swim is 500 yards (for MMS), or .4k if you’re metric. Point Four K. The average swimmer should be able to complete POINT EIGHT K in less than 14 minutes. So that means, worst-case scenario, I should have heard my own wave’s pistol, the pistol for the wave behind, and the wave behind them just as I was about to exit the lake. Which makes that silver-capped matron who swam over my sinking body 20 minutes later…disconcerting.
I don’t know how many waves there were. But I know there were at least 2 waves of men behind me and at least 3 waves of women. I know that the last wave of women were women over 50, denoted by (and I am not making this up) silver swim caps.
That’s right, readers. The average swimmer…not the elite crowd with which my nearly horse-sized body entered the lake …should swim that length in, at most, 14 minutes. But I was nearly drowned by the forceful strokes of a woman who entered the lake no less than 10 minutes later and who caught up with me, presumably no earlier than 9 minutes after that. Indeed, later records show that I finished the race in a staggering 21:54. Which is a score so miserable that I came in last, not only in my own group of the elite and young, but also in the group of all males 30-44. In fact, dear readers, in a field of 197 racers, men and women of all ages, I was 192nd. The only people to come in behind me were a 43 year old, a 49 year old, two 53 year olds, and a 55 year old. The guy (or gal) in 191st place was 60.
I also want to take a moment to reiterate. Some woman in a silver cap swam over my body. I had gone into a backstroke because, basically, after 21 minutes in the pool at 490 yards (give or take) my body just couldn’t take it anymore. I needed to rest and another human being swam over me like I was …jeez…I don’t really know…you don’t really swim on top of anything the way she swam on top of me.
I went under. I literally thought that this was how I was going to die: 10 yards from shore, in a reservoir in northern Indiana, during a race I clearly had no business in.
I’m not sure if I fully appreciate the distance between “humbling” and “humiliating,” but I got closer to understanding that day.
Ladies and Gentlemen, dear readers, I need what Team-in-Training is offering. No joke. And Team-in-Training needs your support. How about following this here hyperlink and helping me raise money to end blood cancers (and also not ruin an otherwise beautiful bay with my drowned body in April).
So, a few days ago, I listed six elements of good science and then more recently I described why both Large-N and Time were important and how having small-Ns and short time horizons made making statements about causality more difficult. Today I want to discuss the role of randomness in setting up an experiment.
- Large Sample Sizes
- Random Selection
- Random Assignment
Basically, the point of randomness is that there are a lot of things that we won’t be measuring in any given experiment and many times we won’t be wholly aware of what some of those things are. What?!?! Let me explain.
In a typical story on a scientific discovery, it is pretty common to read something like “The researchers controlled for sex, educational level, income, and race.” What this means is that scientists observed and noted these characteristics and then used a statistical procedure to systematically erase any effect of those factors from the analysis. One, not entirely accurate, way to think about it is like this: Let’s assume that what the scientist wants to know is the effect of a medicine on sex drive. The hypothesis is something like, “If we give subjects Libidinex, then they will report more interest in sex.” But let’s assume we know from a previous survey that both men and young people already report more interest in sex than the “average person.” So the scientists would want to make sure there were men and women in both the experimental and the control groups and that there were young adults, adults, and older adults in both groups as well. Then they could make sure that the effect of the drug was not limited to simply those groups that were more prone to be interested in sex. Basically, the scientist can look at all the known groups separately to see if the drug has an effect above and beyond any effect of the known variables.
The problem of course is that we cannot measure everything. What if the drug only works if you take it precisely two hours after waking up, or what if it works best on people who live at higher altitudes? What if it’s best on humans who have a DNA combination that, to date, is completely unknown to have an effect on this kind of medicine? Scientists pick the usual suspects that we know tend to exert an influence on the type of intervention they want to learn about. But they can’t know or test all the other factors that might also affect the results.
That’s where randomness comes in.
There are two places randomness shows up in an experiment. First there is just picking who is going to be in the experiment altogether. This is random selection. And then there’s randomly assigning those folks into the various test groups: the control group and the experimental group(s).
The second, random assignment, is more common than the first. To do true random selection, you would literally have to have some method of choosing random people from a population and forcing them to participate. This does happen, but it’s pretty rare. Typically, some method of ensuring randomness is concocted (e.g., we will ask every 6th person that passes us by to take part in our survey) and then a response is determined ahead of time to handle what happens when a person so chosen refuses to take part. The fact is, however, that it will always be possible that the people who choose to take part systematically differ (somehow) from those that choose not to. This problem is compounded when recruitment consists of an ad in a paper or posters hung up around campus. In that case you have narrowed your participants to people who both (1) read that paper/live or work on campus and (2) agree to participate. This is even further confounded if your subjects are people with a specific health or behavioral issue, in which case you even further limit the participants to (3) people who know they have the condition and (4) people whose condition is severe enough to act as an inducement to sign up, so you may only test the worst cases (which can influence your findings by displaying better results than they would have on “the average” person with that condition).
Random assignment takes care of some of the effects of lack of random selection. It can’t entirely erase the selection bias, but if we think of all the factors that might drive someone to participate, the least we can do is make sure those get spread around evenly into all the required test groups. And again, random assignment is the method of trying to equally distribute all those factors we don’t know to measure (as well as the ones we do know to measure).
Where random assignment is often lacking is in the clinical applications of medical interventions. To give you an example from my field:
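If you want to see that "spread around evenly" claim in action, here is a minimal sketch. The three traits and their prevalences are made up by me (they echo the early-riser, altitude, and unknown-DNA examples above), and the function name is hypothetical. The point is only that the code shuffling people into groups never looks at the traits, yet the groups come out looking alike.

```python
import random

def assign_and_compare(n_people=10_000, seed=0):
    """Randomly split people into two groups and report how common each
    hidden trait is in each group. All prevalences are invented."""
    rng = random.Random(seed)
    # Hypothetical traits the experimenter never measures.
    prevalence = {"early_riser": 0.30, "high_altitude": 0.08, "unknown_gene": 0.02}
    people = [
        {trait: rng.random() < p for trait, p in prevalence.items()}
        for _ in range(n_people)
    ]

    rng.shuffle(people)                      # random assignment: traits are ignored
    half = n_people // 2
    groups = {"treatment": people[:half], "control": people[half:]}

    # Share of each group that carries each trait.
    return {
        trait: {
            name: sum(person[trait] for person in members) / len(members)
            for name, members in groups.items()
        }
        for trait in prevalence
    }

if __name__ == "__main__":
    for trait, shares in assign_and_compare().items():
        print(trait, {name: round(share, 3) for name, share in shares.items()})
```

Try shrinking `n_people` to 20 and rerunning with a few different seeds. The groups stop resembling each other, which is one reason small studies are shaky.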
- 95% of smokers that try to quit, will try to quit without any help at all. 5% of them will be successful.
- 5% of smokers that try to quit will use nicotine replacement therapy to help them (like the nicotine gum or patch). Of these about 25% will successfully quit.
Now the question before us is this: is there some systematic difference between those that choose to use NRT and those that do not? For example: NRT costs money and we know that there is a correlation between income levels and smoking rates. We also know that lower income folks have a harder time quitting. So perhaps the use of NRT is proxying for income to some degree and it makes it look like NRT is more effective than it would be if we just gave it (for free) to the 95% of people that would otherwise try to quit unaided. Maybe the use of medicine is selecting out certain at-risk minority groups that distrust Western medicine. Perhaps it’s selecting out older folks who have dental problems (the gum is hard to chew) but who may be more addicted to nicotine (because of how long they’ve been smoking). Maybe using NRT is something that more educated people know about–or maybe the instructions are hard to read (they are) and so less literate individuals shy away from it. Any of these factors are possible. And we can answer this question if we simply (1) randomly assign (2) smokers (3) who are seeking help (4) to one of two groups: one (4a) where they get the NRT or another (4b) where they get a placebo. This random assignment can help us determine if NRT “works” on “everyone” or just those who are inclined to use this kind of help. [It does.]
The general idea behind randomness (and you can get your PhD in studying how randomness works, so consider this a really basic intro) …the general idea is that if you get two large enough groups constituted of people selected randomly, they will resemble each other in just about every factor you can think of–and the assumption then is that they also resemble each other in all the ways you can’t think of. So if an intervention has an effect on the experimental group, it can be assumed it would have a measurable effect on any group of random people. This is what we call generalizability…or external validity. The combination of these two common terms does a pretty good job of explaining what you can do if you have random selection and random assignment–and a pretty good idea of what you cannot do if you don’t: You can generalize the findings to groups of people outside the test group.
Basically most experiments are missing one or both of these features. So any conclusions from them should be limited to precisely the groups they studied, i.e., freshmen and sophomores, men and women, at university, in one of the Rust Belt states. If young, moderately educated, mostly white adolescents from middle class Midwestern families lose weight when they eat chocolate, it probably shouldn’t be interpreted as a rule for a 45 year old black man from northeast Mississippi to follow.
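To make the "proxying for income" worry concrete, here is a toy simulation. Every number in it is invented by me for illustration and none of it comes from a real NRT study: a hidden advantage raises both the chance of choosing NRT and the chance of quitting, and the "true" NRT benefit is set to 5 percentage points. The sketch compares the quit-rate gap when people pick NRT themselves with the gap when a coin flip decides.

```python
import random

def simulate_quitting(n=100_000, true_effect=0.05, seed=1):
    """Return (self-selected gap, randomized gap) in quit rates, NRT minus no NRT.

    All probabilities are made up for illustration, not taken from any study.
    """
    rng = random.Random(seed)
    # counts[uses_nrt] = [number who quit, number of people]
    self_selected = {True: [0, 0], False: [0, 0]}
    randomized = {True: [0, 0], False: [0, 0]}

    for _ in range(n):
        # Hidden advantage (income, education, ...) that nobody records.
        advantaged = rng.random() < 0.30
        base_quit = 0.15 if advantaged else 0.05

        # World 1: people decide for themselves; the advantaged opt in more often.
        chooses_nrt = rng.random() < (0.15 if advantaged else 0.01)
        # World 2: a coin flip decides, regardless of who they are.
        assigned_nrt = rng.random() < 0.5

        for counts, uses_nrt in ((self_selected, chooses_nrt),
                                 (randomized, assigned_nrt)):
            quit_chance = base_quit + (true_effect if uses_nrt else 0.0)
            counts[uses_nrt][0] += int(rng.random() < quit_chance)
            counts[uses_nrt][1] += 1

    def gap(counts):
        return counts[True][0] / counts[True][1] - counts[False][0] / counts[False][1]

    return gap(self_selected), gap(randomized)

if __name__ == "__main__":
    chosen, assigned = simulate_quitting()
    print(f"gap when people choose NRT:     {chosen:.3f}")
    print(f"gap when NRT is assigned by coin: {assigned:.3f}")
```

Under these assumptions the self-selected gap comes out roughly twice the true effect, while the randomized gap lands close to the 5 points that were baked in. The medicine did not change between the two worlds; only who ended up taking it did.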
And a key point is that this problem does not go away with large-Ns. It’s often the case that really large Ns happened because of very successful efforts to achieve randomness, but this is not logically necessary. If a random survey was conducted among participants in the Million Man March the N could be very large, and they could be “random” in the sense that each individual was chosen from a very specific algorithm designed to achieve random participation. But in that case, the locale itself has already done some selection for you. You would not necessarily want to use that survey to predict anything about the general population. Similarly a very large survey of the “Residents of Minneapolis/St. Paul” will only reliably be able to tell you what the residents of the Twin Cities feel on the topics questioned. Residency in a town has already done some of the selection for you and N doesn’t matter even if you achieved a 100% response rate; even if you asked the same questions for 30 years.
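Here is a small sketch of that point, using a made-up population in which 10% of people live in one city and city residents answer a yes/no question differently from everyone else. The percentages and the function name are my own assumptions. It compares a census of the city (huge N, 100% response) with a random sample of just 500 people from the whole population.

```python
import random

def survey_demo(seed=2):
    """Compare a huge one-city survey with a small random sample.

    Returns (true rate, city-only rate, city-only N, random-sample rate).
    The population and its opinions are entirely hypothetical.
    """
    rng = random.Random(seed)
    population = []
    for _ in range(200_000):
        in_city = rng.random() < 0.10
        # Assumed: 70% of city residents say yes, 40% of everyone else.
        says_yes = rng.random() < (0.70 if in_city else 0.40)
        population.append((in_city, says_yes))

    truth = sum(yes for _, yes in population) / len(population)

    # Enormous N, perfect response rate, but only one locale.
    city_only = [yes for in_city, yes in population if in_city]
    city_rate = sum(city_only) / len(city_only)

    # Tiny N by comparison, but drawn at random from everybody.
    sample = rng.sample(population, 500)
    sample_rate = sum(yes for _, yes in sample) / len(sample)

    return truth, city_rate, len(city_only), sample_rate

if __name__ == "__main__":
    truth, city_rate, city_n, sample_rate = survey_demo()
    print(f"whole population:         {truth:.3f}")
    print(f"city census (N={city_n}): {city_rate:.3f}")
    print(f"random sample (N=500):    {sample_rate:.3f}")
```

The city census is a perfectly good description of the city. It is just the wrong tool for saying anything about everybody else, and no amount of extra N fixes that.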
Yesterday, I listed 6 criteria that make good science:
- Large Sample Sizes
- Random Selection
- Random Assignment
Beginning today, I want to run through them briefly to explain why each is important, beginning with large sample sizes and time. I’m moving time up because it’s easy to discuss, practically self-explanatory. I’m also moving it up because Random Selection and Random Assignment should be discussed together as should Placebo and Blindedness.
Sidenote: I am skipping a discussion on the notion of “correlation does not equal causality.” This is extremely important to know, but in my opinion, follows the discussion of these six criteria.
The sample size should be large. In scientific studies the sample size is referred to as “n.” So if there were 60 people in the study we would say that “n=60.” Studies with lots of observations are called “large-n studies” and those with few observations are called “small-n studies.” I only bring that up because at one point I will exhort you to “get the article” and you will see “n” show up in the abstract.
After I wrote the bit about sample sizes yesterday I had to be careful to say “less than 20 is almost always problematic.” When the effect size is large, you can get away with smaller sample sizes. But sometimes authors don’t report effect sizes, and they often don’t provide enough detail to calculate them on your own. More importantly, who has the time? As a general rule of thumb, most effect sizes are pretty small and require a lot of observations to make any firm pronouncements on the association. So when you see a small sample size (and less than 20 is a good rule of thumb here) it is almost always grounds to say that causality could not be determined from that study.
It doesn’t rule it out. Theoretically, you could have an experimental group of 1 and a control group of 1 and observe a difference that is related to your intervention. The problem is being sure. Which leads me to the other main reason you want to see studies with larger ns. With smaller sample sizes, there’s a greater chance that what is observed is related to some unobserved variable and not the intervention itself. The idea behind large sample sizes is that the scientist is able to eliminate (or diminish) the role of chance. If I am the experimental subject in a group of one and my friend Joe is the control group and you give him a placebo and you give me a pill that’s supposed to ruin my ability to concentrate and then I spend the night drinking–well–I may have trouble focusing the next day; and the drug may be the cause, but how would we know? The more people in the group, the less it matters if some of them don’t follow the directions exactly or have some genetic condition that affects the results. And if both groups are randomly selected the more likely it is that such differences will show up in both groups and cancel each other out.
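A quick way to get a feel for the role of chance is to simulate studies where the pill does nothing at all and count how often the two groups still end up looking different. This is a rough sketch under my own assumptions: outcomes are drawn from a standard bell curve, and "looking different" means the group averages are at least half a standard deviation apart. The function name and thresholds are illustrative only.

```python
import random

def chance_of_big_gap(n_per_group, gap=0.5, trials=2000, seed=3):
    """Fraction of do-nothing experiments in which the two group averages
    differ by at least `gap`, purely by luck of the draw."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Both groups come from the same distribution: no real effect exists.
        pill = [rng.gauss(0, 1) for _ in range(n_per_group)]
        placebo = [rng.gauss(0, 1) for _ in range(n_per_group)]
        difference = sum(pill) / n_per_group - sum(placebo) / n_per_group
        if abs(difference) >= gap:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    for n in (5, 20, 200):
        print(f"n = {n:>3} per group: {chance_of_big_gap(n):.1%} of null studies show a 'big' gap")
```

With a handful of people per group, a sizable fake "effect" turns up a large share of the time. With a couple hundred per group it essentially never does.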
I cut nearly 500 words on time even after saying that it’s “practically self-explanatory.” Here’s the briefer summary: We have to remember that time is always a confounding variable. If that makes intuitive sense to you, then you can stop reading. If it doesn’t, and it didn’t to me at first, then read on.
Sick people, in time, frequently get better. Systems, even in absence of an intervention, change. People mature and change their minds about stuff…frequently in ways that we would call “better” in regard to their ability and willingness to make “good” decisions, for example. Elements decay. That’s one reason time is important.
We also need to make follow-ups on our interventions to determine if they have any real, lasting effect. Many interventions will demonstrate short term benefits, but no lasting effect. Or worse, many interventions with short-term benefits have unexpected, long-term harms. We need to know this before we start touting the next “miracle cure.” More than likely, we just don’t know because long-term follow-ups are expensive and logistically difficult.
Finally, and somewhat related to the first point, timing matters. In public health, when an intervention is applied just before a group of people would naturally start getting better, it’s been called “riding the epi curve.” It looks like the intervention made everybody better, but in reality people were getting better anyway. Think of a massive surge in applications of the flu vaccine in February just as flu cases hit their maximum. There was nowhere to go but down. Separating the effect of the flu vaccine from the natural decline in cases is hard to do.
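To see why a before-and-after comparison is so easy to misread here, consider a sketch with two hypothetical towns whose case counts follow the exact same made-up bell-shaped curve. One town launches a campaign at the peak, and in this toy model the campaign is assumed to do nothing whatsoever. The curve shape and all numbers are my own invention.

```python
import math

def natural_curve(weeks=20, peak_week=10, peak_cases=1000, width=3.0):
    """Hypothetical weekly case counts that rise and fall on their own."""
    return [
        round(peak_cases * math.exp(-((week - peak_week) ** 2) / (2 * width ** 2)))
        for week in range(weeks)
    ]

def percent_drop(cases, start_week, weeks_later=4):
    """Percent decline in weekly cases between start_week and a later week."""
    return 100 * (1 - cases[start_week + weeks_later] / cases[start_week])

# Campaign launched at the peak (week 10), assumed here to have zero effect.
campaign_town = natural_curve()
# Same natural curve, no campaign at all.
comparison_town = natural_curve()

if __name__ == "__main__":
    print(f"campaign town drop:   {percent_drop(campaign_town, 10):.0f}%")
    print(f"comparison town drop: {percent_drop(comparison_town, 10):.0f}%")
```

Looking only at the campaign town, cases fall by more than half after the launch, which looks like a triumph. The comparison town shows the identical drop, which is the only way you would know the campaign deserved none of the credit.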
Time is a little strange as one of the six criteria. The other five are classic considerations of experimental design. Time is a constraint often forced on the experimenter from the outside, e.g., “The grant will fund us for two years; what can we do in two years?” or “We want to know how many freshmen given this intervention will go on to college; so the experiment runs for six weeks one year with follow-up 3-4 years later.” But as a reader who wants to know how important an intervention is likely to be, the question of time is just as important as the others.
(whew! Shaved about 200 words out of there.)
Tomorrow we move on to the role of randomness in experimental design.
The goal of science is to try and prove which things cause other things to happen. I wouldn’t use that as an answer for a short essay question, but it will suffice for the duration of this post. Now, I’ve blogged before that I don’t think the randomized controlled trial should be held up as “the gold standard” mostly because an RCT is not always feasible. And, it also creates an impression that any RCT should be treated as “good science,” when, in fact most RCTs fail to meet the gold standard in design, even more fail during execution, and more still fail during analysis. You don’t get a free pass just because you designed an RCT.
That said, at least a cursory understanding of what an RCT is and why it is largely considered to be “the gold standard” of the scientific enterprise, is a great place to begin developing critical reading skills in regard to understanding published science.
A good understanding of what a hypothesis is, how one is created, and its role in the study design is an important part of scientific literacy, and chronologically it comes first. I’m going to skip it for now, though, because the biggest bang for your buck comes in understanding the characteristics of an experiment after the hypothesis and the measure of its effects have been decided.
The gold standard experiment will have the following characteristics.
- Large Sample Sizes: There are formulas for experimenters to use to help them determine how big a sample needs to be to be able to reasonably capture the effects of an intervention. We won’t go into those. It will suffice to say only that bigger is better and less than 20 is almost always problematic.
- Random Selection: The subjects of the experiment will have been randomly selected from the widest possible version of the population the scientist wants to say something about.
- Random Assignment: Once a random group of people have been chosen to participate in the experiment, then they must be randomly assigned to receive the intervention or something else (or in some cases, varying doses of the intervention).
- Placebo: The “control group,” the group that does not receive the intervention, could receive nothing at all, but it’s far better to receive a fake version of the intervention.
- Blindedness: At the very least, the subjects should not know if they are receiving the placebo or the real thing. But since we’re talking about ideal experiments, the person that engages with the subjects should also not be privy to this information. If neither the subject nor the experimenter knows, this is double blind. And double blind is where the real action is at.
- Time: The experiment should continue taking measurements for as long as possible even after the experiment has ended.
Most studies will fail to achieve at least one of these six criteria. It is almost impossible, for example, to use random selection, especially in nutrition or pharmaceutical tests. And most experiments will have woefully short time horizons. Some experiments are difficult to design a placebo for. For example, take the hypothesis “Students who are smacked hard across the face just prior to an exam will perform less well than students who are not slapped.” There is no placebo slap. In cases where placebos are hard to develop, blindedness also frequently suffers. An experimenter likely knows whom she has slapped. And in many medical contexts, random assignment is often unachievable. As we move deeper into “clinic-based evidence” I see more and more interventions that lack several of these features.
Tomorrow, I will discuss in slightly more detail why failing to implement each of these, regardless of the reason, limits the conclusions that can be derived from an experiment. For now, simply be on the lookout for these characteristics and if you see them missing know that causality remains a question and any expression of certainty should be scaled back.
The most famous tool at this point for helping lay readers understand the science they’re reading and evaluate whether it’s good or not is this one. It starts with this advice: Be wary of any article with a “sensationalized headline.” This is a fairly common warning. This article in Wired advises readers to “Ignore headline grabbers” (#3). The tenth piece of advice in this article says to be aware of “Sensation!” While not strictly about headlines, this article on a fitness website, tells its readers to be aware of any science reporting that seems to be giving its readers “a magic bullet.” And this one tells science journalists to focus on evidence not on “rhetoric or narrative.”
This advice is bad advice for three reasons. It’s not “bad advice” in the sense that following it will end badly. I’m saying it’s bad advice in the sense that you can’t actually follow it. It’s circular logic; it begs the question. It attempts to cure a basic human survival mechanism. And it doesn’t constitute actionable advice–it doesn’t provide the reader anything to do.
Imagine this: Imagine a headline that screams,
Extract from newly discovered plant shown to eliminate 95% of malignant tumors!
That’s pretty sensational. But what if it were also true? I read a book about knights once when I was in elementary school and it described the knight’s code. A knight was not permitted to brag, said this book, but a knight could truthfully describe any of his own exploits, provided they were true. It ain’t braggin’ if it’s true. In other words a headline is only “sensational” if we already know that what it’s claiming isn’t quite as…well…sensational…as the headline has led us to believe.
In addition to question begging, this particular advice does something that I think deserves being called out. It places too much emphasis on “the system.” The fact is that we’re humans. We’re highly evolved biological machines that require energy to do things. It is in our basic construction that we seek to make things easier–to require less energy. Scientists are looking for magic bullets, press releases tout them, journalists regurgitate them because that’s what you’re interested in. Even those of us who have trained ourselves to look sideways at “magic bullet”-style claims, those of us that are immediately suspicious of “too good to be true” deals are still nevertheless intrigued by them. Sometimes we just want to be the one to find the hole that exposes the fraud, but there’s always a part of us that hopes that we can’t find it, that this one is the real deal.
Finally, this piece of advice is non-actionable. Let us assume you have read the above headline and determined that it is sensationalized and should be considered suspect. In what way is it suspect? Is the plant not newly discovered? Does it decrease the size of 95% of tumors but not actually eliminate them? Is it 85% and not 95%? Is it 95% of all tumors which includes benign tumors as well and less than 95% malignant ones? Do the tumors have to be caught at a preternaturally early stage well before most screening tests would identify the tumors? Is the extract dangerous to the point of toxicity meaning it is useless as medicine? The questions go on and on. The fact that you should be wary offers no advice on how you can still read the article and get some use out of it. And, in fact, the advice “Be wary of non-sensationalized headlines” is just as true and just as useless.
From here on I will offer actionable advice on how not to be duped by bad science and/or bad science reporting.
I read science all day most days. I read scientific studies across an array of fields. For my day job I read articles from psychology, biology, chemistry, genetics, public health, medicine, pharmaceutical research, and public policy. My educational background is almost exclusively from the “soft sciences”: political science, sociology, criminal justice, international relations, and economics (mostly in the sense of “political economics”). And among my hobbies is studying beer, where I tend to gravitate toward the psychological aspects of tasting and marketing, evolutionary studies of yeast strains, basic beer chemistry, and market analyses. I have read a lot of different types of science and I’ve become literate in various scientific idioms, including the methods used and what the limitations are of those methods.
That’s all prelude to this:
We live in a transforming world. Science is gaining a level of popular attention that in many ways rivals what was happening during the Scientific Revolution. Of course a lot has changed in the last 400 or so years. There’s a lot more science now, a lot of new methods, and full understanding of all that’s happening is more impossible now than ever. We require good curators, editors, and popular writers to tell us what is important and why. At the same time we need them to place the newly gained knowledge into the context of what has happened before and what the likely implications of that knowledge are. But this is extremely difficult to do! In addition to the intellectual strains it places on our science journalists, science stories don’t always cater to the needs (or desires) of profit-seeking newspapers and websites or enrollment seeking universities (which write the press releases that science journalists use as a resource). And, the limited impact of many new discoveries doesn’t really serve the interest of grant- and tenure-seeking scientists.
Basically, reading science is a difficult but not impossible skill to learn. But because it is difficult–both knowing how to read it, and also knowing what to read in the first place–many people will elect to get their science from newspapers, evening news, and blogs like IFLScience which pick which studies to highlight and also summarize them in a language that’s easy and fun to read. Because not all scientists are equally good (or honest) at presenting their findings and because press release writers are incentivized to sensationalize their organization’s contribution to the literature, and because news outlets are incentivized to get readers’ attention, reading science news also requires a specific skill. However, many people are not aware this skill is required and thus have never developed it.
Thankfully, there are tools out there to help concerned readers develop the skills they need to get the most from science reporting.
Unfortunately, they are all wrong. Over the next few days (weeks?) I’m going to walk you through the real knowledge and skills you need to be a truly scientifically literate human being in the 21st century.
I have been following the science of determining (or disproving) the “left wing bias” of the media for a long time. I have seen lots and lots of articles on the subject with many many charts and graphs in them. And they all basically look like the one here [PDF]. I’m linking to this one because it’s new and…if you’re following this whole “Lying Liar Political Scientist Published an Article with Lies” story, it has the benefit of being relevant to that discussion as well.
If you are following the whole LaCour fiasco, read the whole thing. Basically, LaCour probably faked data on a 2nd article as well (and this essay is written by the guy that LaCour jocked the data from). If you’re only interested in the “media’s left wing bias” angle, then direct your gaze to page 5. What that chart is telling you is that, while there does seem to be an apparent “leftish slant” this is far from certain and could actually be flat wrong.
- Most of the studied channels appear to be in between the moderate left hump and non-partisan 0.
- There are two channels to the left of moderate left and no channels to the right of moderate right.
However, and this is critical, with the exception of Lou Dobbs’s show, all the confidence intervals (Bayesian credible intervals) overlap 0…which means they aren’t statistically different from “centrist”…and some of them could be “conservative.” Those CIs are huuuuuugggeeee. They basically run the entire gamut of the base dataset. I’ll also mention that most of the CIs (all but three) run most of their difference to the right. I think that’s difficult to interpret, but the consistency of this right skew across channels implies meaningfulness.
Some of this is a limitation to the types of methodologies available to this kind of research. But some of this is likely due to the fact that…well….most channels aren’t interested in 100% alienating 50% of their potential audience and thus…at the very least…attempt to appear neutral if not being, in fact, neutral. In the words of the author:
In the replication, only one show, Lou Dobbs Tonight, has a credible interval that does not overlap zero. Applying LaCour’s criterion, Lou Dobbs Tonight would be classified as “conservative news,” while all the remaining shows would be classified as “centrist news.”
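If it helps to see the "does the interval overlap zero" rule spelled out, here is a minimal sketch. The sign convention (negative means left, positive means right), the function name, and every number in it are my own assumptions for illustration; none of them come from the replication itself.

```python
def classify_show(lower, upper):
    """Label a show from the bounds of its credible interval.

    Assumed convention (not from the paper): negative scores lean left,
    positive scores lean right, and 0 is non-partisan.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if lower > 0:
        return "conservative"  # whole interval sits to the right of zero
    if upper < 0:
        return "liberal"       # whole interval sits to the left of zero
    return "centrist"          # interval overlaps zero


# Made-up intervals for made-up shows, purely to exercise the rule.
hypothetical_shows = {
    "Show A": (-0.9, 0.4),
    "Show B": (-0.2, 1.1),
    "Show C": (0.1, 1.5),
}

for name, (lower, upper) in hypothetical_shows.items():
    print(name, classify_show(lower, upper))
```

Notice that a show whose point estimate leans left still comes out "centrist" the moment its interval crosses zero, which is the whole point of the passage above.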
One of the stories that got a lot of media time during the fight to make recreational marijuana a legal reality was the one of Haleigh–a young girl who, prior to adopting a CBD regimen to treat her seizures, was having 200 episodes a day and now is closer to 10. That is a huge success. We read a lot out here about all the families that are moving to Colorado to obtain marijuana for their children’s seizure treatment. I would say that, aside from treating chronic pain…especially chronic pain in association with chemotherapy…marijuana’s miraculous power in treating seizures in children is the driving narrative of why marijuana should be rescheduled, why every state without a medical marijuana law should get one, and why recreational marijuana should be far more widespread.
However, that same study found that 44% of children taking CBD were suffering negative health outcomes from the treatment including (and this is important) increased seizures.
In a more objective measure of seizure-related brain health, only 3 of the children in the study showed any actual improvement (which indicates that some portion of those 33% of children who saw their seizures decrease by half may have been over-reporting the benefit, or may be operating under some placebo effect).
In any case, those last two points are crucial and they will no doubt be completely, and tragically, overlooked.
Medicine is weird; it works for whom it works. The way we determine if a medicine works is to get two groups. Individuals are randomly assigned to one of the two groups. One group receives the treatment, the other receives a placebo. If possible, even the administrators don’t know which individuals are receiving the medication and which the placebo. Then we look for improvements in both groups. For various reasons, some people in both groups will get worse and some people will get better and some people may see no change in their condition at all. Ideally we will see more people getting better in the medication group and fewer people getting worse. In either case, the changes observed within each group will be compared to the changes observed in the other group to determine if the differences are “statistically significant.” That’s a real rough description of the “double-blind, randomized, placebo-controlled trial.” Basically if the improvements in the medication group are better than the natural improvements seen in the control group, researchers conclude that the medicine “worked.”
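For the curious, the comparison described above can be roughed out in a few lines of Python. This is a toy sketch under my own assumptions: the outcomes are invented (1 means "improved," 0 means "didn’t"), the function names are hypothetical, and I’m using a simple permutation test as a stand-in for whatever statistics a real trial would run.

```python
import random


def randomize(participants, rng):
    """Randomly split participants into a treatment group and a placebo group."""
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def improvement_rate(outcomes):
    """Share of a group that improved (outcomes are 1 = improved, 0 = not)."""
    return sum(outcomes) / len(outcomes)


def permutation_p_value(treated, control, rng, n_permutations=2000):
    """Return (observed difference in improvement rates, two-sided p-value).

    The p-value is the share of random relabelings whose difference is at
    least as large as the one actually observed.
    """
    observed = improvement_rate(treated) - improvement_rate(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = improvement_rate(pooled[:n_treated]) - improvement_rate(pooled[n_treated:])
        if abs(diff) >= abs(observed) - 1e-12:  # small tolerance for float rounding
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / n_permutations


rng = random.Random(2015)  # fixed seed so the sketch is repeatable

# Step 1: random assignment of 100 made-up participant IDs.
medicine_ids, placebo_ids = randomize(range(100), rng)

# Step 2: invented outcomes, 30 of 50 improve on medicine, 20 of 50 on placebo.
medicine_outcomes = [1] * 30 + [0] * 20
placebo_outcomes = [1] * 20 + [0] * 30

# Step 3: compare the groups.
difference, p_value = permutation_p_value(medicine_outcomes, placebo_outcomes, rng)
print("difference in improvement rates:", difference)
print("permutation p-value:", p_value)
```

Even in this made-up example, 20 people in the medicine group didn’t improve, and the group-level comparison has nothing to say about which individuals those are.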
But something peculiar happened up there. Something that researchers know about but that doesn’t always get translated out. The medicine didn’t seem to have any effect at all on some people. And worse, some got worse.
Some of those differences might be chalked up to an inability to precisely measure changes in the condition. We tend to think of illness as a thing you either have or you don’t, rather than thinking of it in terms of how much of it you have. So it might be that very minor improvements in condition were beneath a threshold where those improvements were observed and reported to researchers. It’s also possible that declining condition in the medicine group would have been more pronounced had the medicine not been present.
But it’s also entirely possible that the medicine helped some people in the medicine group and harmed other people in the medicine group. Humans are all different. Diseases manifest differently in different people, medicines react differently to different people. So when we claim that a medicine “worked” it’s not always entirely clear what is meant. So the medicine works for whom it works and not on anyone else. Ideally, we would give medicine that works only to people that it works on…and maybe even prescribe medicine that “doesn’t work” to the people it works on as well even if it doesn’t “work” any better than chance at the group level.
I’m not saying that the story linked here proves that marijuana is a crappy medicine for children with seizures. It’s entirely possible that marijuana is very good for some children, and maybe not quite as good and possibly harmful for others. The problem is the rhetoric of these “miraculous” cures. Parents desperate to find any cure for their children may overlook problems with marijuana in their child; they may see signs of improvement that don’t exist. And worse, they may forgo other, effective treatments while hoping that the marijuana miracle works out.
I think this is an area where we definitely need more research. And marijuana is cheap and, compared to many traditional treatments, safe. So I’m glad that it’s available for parents to try. But I also wish that the conversation we were having didn’t involve the word “miracle” quite so often. And that the potential for no- or bad outcomes was more appreciated.