Practical Stats: 2013

Sunday, September 15, 2013

Under what conditions can you claim that you have found a cause and effect relationship between variables?

The answer to this question involves two issues: (a) the level of the study, and (b) the "causality requirements," as I'll call them. My source is Oleckno, William A., Essential Epidemiology: Principles and Applications (2002), Long Grove, Illinois, Waveland Press, Inc. Although Oleckno wrote about epidemiology, the same principles hold for any statistical analysis related to a phenomenon among people. In other words, epidemiology can be understood quite broadly.

A. The Level of the Study

Here is a ranked list of the major types of studies in descending order of likelihood that the results might demonstrate a causal association:

Randomized Controlled Trial
Randomized Community Trial
Prospective Cohort Study
Retrospective Cohort Study
Case-Control Study
Cross-sectional Study
Ecological Study
Descriptive Study

The higher up your study type is on the list, the more likely it is that the relationship you have found will later prove to be a cause-effect relationship: later meaning after you have met the causality requirements described below.

B. Causality Requirements: Criteria for Assessing Causation

All of the following criteria must be met before you can write that your variables have a causal relationship:

Correct temporal sequence (exposure to the independent variable must precede incidence of or change to the dependent variable)
Strength of association (high relative risk or rate ratio, high r or tau)
Consistency of association (requires follow-up studies or studies performed by other researchers that show the same or similar results)
Dose-response relationship (more exposure to the independent variable leads to greater incidence, relative risk, or odds ratio for the dependent variable)
Biological plausibility (Does the association make sense? Biological includes psychological)
Experimental evidence (For instance, if experiments have shown that microwaves affect living tissue, then you have a stronger chance of proving that microwaves caused some sort of physical outcome among your study participants should you find one)
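For readers who want to see the numbers behind "strength of association" and "dose-response," here is a minimal sketch in Python. The counts are invented purely for illustration, and the function names are my own, not Oleckno's.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk of the outcome among the exposed divided by risk among the unexposed."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


def odds_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Odds of the outcome among the exposed divided by odds among the unexposed."""
    odds_exposed = exposed_cases / (exposed_total - exposed_cases)
    odds_unexposed = unexposed_cases / (unexposed_total - unexposed_cases)
    return odds_exposed / odds_unexposed


# Hypothetical counts: 20 of 100 exposed people and 10 of 100 unexposed
# people developed the outcome.
rr = relative_risk(20, 100, 10, 100)
or_ = odds_ratio(20, 100, 10, 100)
print(f"relative risk: {rr:.2f}")
print(f"odds ratio:    {or_:.2f}")

# Dose-response: does the risk rise as exposure rises?
# (Hypothetical risks, listed from lowest to highest exposure.)
risk_by_dose = {"none": 10 / 100, "low": 15 / 100, "high": 30 / 100}
risks = list(risk_by_dose.values())
rises_with_dose = all(lower < higher for lower, higher in zip(risks, risks[1:]))
print(f"risk rises with dose: {rises_with_dose}")
```

A high ratio or a rising pattern satisfies only one criterion each; the others still have to be met.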

Consider both A and B when you begin to write that people should change their behavior based on your study results. You can still make recommendations, but it will be important that you note that you have not proven a cause-effect relationship unless your study (a) is a high-level test and (b) met all of the above causality requirements.

Tuesday, June 25, 2013

The Disastrous Effects of Forgetting that Correlation is not Causation

At Anovisions, we look at relationships in data all the time. One constant concern is how we describe those relationships. The fact that a variable "predicts" an "outcome" variable in a regression reflects only how we set up the regression. We could swap the predictor and the outcome variable in the equation and view the same relationship from the other end, only this time the original "outcome" variable would be the "predictor." The two terms are mathematical in nature and they do not reflect cause and effect.
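Here is a small sketch of that swap in Python, using made-up numbers and helper functions I wrote for this illustration. The correlation is identical in both directions, and the two regression slopes multiply to r squared, so neither direction is privileged by the math.

```python
from statistics import mean


def covariance(x, y):
    """Sample covariance of two equal-length lists."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def slope(predictor, outcome):
    """Least-squares slope when regressing outcome on predictor."""
    return covariance(predictor, outcome) / covariance(predictor, predictor)


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    return covariance(x, y) / (covariance(x, x) * covariance(y, y)) ** 0.5


# Hypothetical data, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.5, 6.9]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)
slope_y_on_x = slope(x, y)  # x "predicts" y
slope_x_on_y = slope(y, x)  # y "predicts" x

print(f"r(x, y) = {r_xy:.4f}, r(y, x) = {r_yx:.4f}")
print(f"slope y~x = {slope_y_on_x:.4f}, slope x~y = {slope_x_on_y:.4f}")
print(f"product of slopes = {slope_y_on_x * slope_x_on_y:.4f}, r squared = {r_xy ** 2:.4f}")
```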

Cause and effect is extremely hard to prove. All we can do is identify that relationships exist, at least initially. A long way down the road from an initial study, after scientific findings from all sorts of other experiments have validated our efforts, after scientific theory has provided a reasonable model for understanding the relationships in question, after the same findings have been repeated in a large variety of populations—only then can we say, "This causes that." But the smartest among us will continue to keep the question open, assuming nothing.

For more information about how we can help interpret your results, visit us at our website, www.anovisions.com.

Tuesday, June 18, 2013

Ode to Those Who Think Sideways

For Dr. William Jefferys

Shall I introduce you to Bayes?

His inferences always amaze.

It's not what you think—

without a stiff drink—

But it will end your frequentist phase.

No, really, it is what you think

—unless to you orange is pink.

For without a good prior

You'll never get hired

And maybe should find a good shrink.

If Bayesian stats make you cuss,

Or your Bayesian book you did toss,

Please don't give up hope:

If you really can't cope,

Just try this new book by Laplace.

Monday, June 17, 2013

I got data! . . . Now what?

So you're a small business, maybe with an online store. You made a beautiful website with Google Sites or another hosting service like GoDaddy and tied it to Google Analytics (or Omniture, WebTrends, CoreMetrics, or VisualSciences), because that's what everybody does. Now you have all kinds of charts and graphs and maps and even this thing called a "site overlay." These charts tell you about the behavior and location of the people who come to your website. The charts and numbers might even reveal seasonal trends in certain activities or products on your site, except you have no clue how to interpret the data.

Let's say you are getting thousands of hits on your website but you don't know what that means for sales. Is there a correlation? Is the web traffic causing the sale or is the sale causing the web traffic? If you were to contract us at Anovisions we might look at your data and run a Pearson's (or Spearman's) correlation analysis followed by a regression analysis that would tell us that web traffic is a driver of sales and it makes sense to invest more in web advertising. Or not. This inexpensive service pays for itself very quickly as you implement the step-by-step plan we provide for you.
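To give a feel for the first step, here is a rough sketch of Pearson's and Spearman's correlations in plain Python. The weekly numbers are hypothetical and the helper functions are written just for this example; it is not a description of our actual workflow. Spearman's correlation is simply Pearson's correlation computed on ranks, which is why it picks up a steady rise even when the relationship is not a straight line.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical weekly figures: sales rise with visits, but not in a straight line.
weekly_visits = [100, 200, 300, 400, 500, 600]
weekly_sales = [1, 2, 4, 8, 16, 32]

pearson = pearson_r(weekly_visits, weekly_sales)
spearman = spearman_rho(weekly_visits, weekly_sales)
print(f"Pearson r:    {pearson:.3f}")
print(f"Spearman rho: {spearman:.3f}")
```

Neither number says which variable drives the other; that question has to be answered by the study design.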

Check out www.anovisions.com or email sheila@anovisions.com to learn more.

Sunday, June 2, 2013

Lies, damn lies, and statistics

How much can you trust the studies that you hear about in the news every day? If somebody says, "A new study shows that weight gain causes earlier deaths," should you believe it?

Here are a few rules of thumb to follow when deciding what to believe:

If the word "cause" is anywhere in the sentence, delete it from your memory. It is very hard to prove cause and effect. As a matter of fact, weight gain does not "cause" earlier death. The study that hit the presses years ago and made us all afraid has been debunked because of some interesting choices in the study design. I won't go into details: This is Practical Stats, after all.
If the reporter (or whoever) uses the words "associated" or "correlated," and then goes on to conclude that you should behave differently, scoff and ignore the story. What many people who repeat study findings simply don't know (or acknowledge) is that weight gain and early death, for instance, might both be caused by a third phenomenon, such as exposure to cheap unhealthy food. It may not be the weight gain that is "causing" early death, but rather the chemical processes set off by certain types of diets. That chemical process may (may) cause both weight gain and early death. So don't rush out to go on a diet if you gain a few pounds before you talk to your doctor about it.
Remember that all studies are biased by the cultures in which they are produced. For example, Americans are committed as a nation to early independence for children and to a certain style of education. So when you hear someone say that "enrolling children in day care centers early is linked to better adaptation to school" you might conclude that you need to enroll your children right away in a day care center so they will behave well in school. But let's think this through. First, that little phrase "is linked to" implies that the one (day care) causes the other (better behavior in school). And second, is better behavior in school the end game for your children? Behind the entire "finding" is the hidden implication that behaving well in school is the best possible thing, that behaving well in school predicts that your children will lead happy lives (or do we prefer "successful lives"?). Note that school is nothing like life, where you are exposed to people of various ages and must get along with them every day, where you have the choice of working in a lab or at home with your computer, where you can make money leading mountain climbing expeditions or fixing what's broke, where fewer and fewer of us spend our days in the factories after which the American school system was modeled, and (this is the important bit) where your skills at being in a family matter more than anything else. Ask the question, "Has anybody studied the link between early time at home and happiness in family life in adulthood?" Look for the cultural bias when you hear about a study. Ask yourself what your goals are and then figure out whether the hidden cultural bias in a finding applies to you. Then follow your heart, not the numbers, and do the right thing for yourself and those you love.
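To make the second rule of thumb concrete, here is a small simulation in Python. Everything in it is invented: a hidden "diet" factor feeds into both a weight-gain score and a health-risk score, and neither score affects the other. The two scores still come out clearly correlated, and the correlation all but disappears once the hidden factor is accounted for. This is a sketch under those stated assumptions, not a model of any real study.

```python
import random
from statistics import mean


def covariance(x, y):
    """Sample covariance of two equal-length lists."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    return covariance(x, y) / (covariance(x, x) * covariance(y, y)) ** 0.5


def residuals(outcome, predictor):
    """What is left of outcome after a straight-line fit on predictor."""
    b = covariance(predictor, outcome) / covariance(predictor, predictor)
    a = mean(outcome) - b * mean(predictor)
    return [o - (a + b * p) for o, p in zip(outcome, predictor)]


random.seed(42)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical hidden third factor.
diet = [random.gauss(0, 1) for _ in range(n)]
# Each score depends on diet plus its own noise, never on the other score.
weight_gain = [d + random.gauss(0, 1) for d in diet]
health_risk = [d + random.gauss(0, 1) for d in diet]

raw = pearson_r(weight_gain, health_risk)
adjusted = pearson_r(residuals(weight_gain, diet), residuals(health_risk, diet))

print(f"correlation ignoring diet:      {raw:.2f}")
print(f"correlation after removing diet: {adjusted:.2f}")
```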

Wednesday, May 29, 2013

APA Heading 3 Doesn't Hang Out by Itself

A lot of people do this:

Here is my heading 3 just like APA tells me to have it (Really?).

And here is the beginning of the next paragraph, and I go on thinking that I've correctly used APA's 3rd level heading—but I haven't.

You should do this:

Here is my heading 3 just like APA tells me to have it. And here is the beginning of my paragraph, and I go on feeling rather smug because this is the right way and I know it.

Here is an illustration straight from the sample paper that the APA editors have kindly provided for us.

Useful Sources about First Person and APA 6th (Or: You and I are not the only ones)

I'm out to change the world about this subject because I'm ridiculously committed to accuracy and clarity in writing--not to mention I don't like being put to sleep by an over-reliance on the passive voice. So if you still have doubts, check out these links:

More from Anovisions on the Subject

More from an APA Editor on the Subject

The Other Post on the Same Subject

Monday, May 27, 2013

The Slaughter of Good Ideas (Warning! A bit crotchety)

I am amazed and saddened by the number of studies out there based on analysis that doesn't fit the data. It's like watching the slaughter of an army of good ideas, a slaughter that could have been avoided if only the messenger hadn't taken a wrong turn.

Every single statistical test is based on assumptions. This means that, when you write your proposal, you need one simple little clause and an extra (rather wordy) sentence for each analysis you plan to do. Instead of writing

"I will analyze the data using a linear regression analysis."

you should write

"I will analyze the data using a linear regression analysis if the data meet the assumptions for that test. If they do not meet the assumptions, then I will either transform the data so that they meet the assumptions or perform an equivalent non-parametric test, a test that is robust to unequal variances, or a test that otherwise fits the data."

"Non-parametric" is short for "test that works even on data that are not normal." And yes, you can quote me.

If you don't have these bits of writing in your proposal and your data doesn't meet the assumptions for your chosen analysis, then you are in trouble. Some committees actually require students to run a test that shouldn't be run simply because they said they would, resulting in the student not being able to answer the research questions. I have seen this happen. You may not be allowed to use the appropriate non-parametric equivalent of your chosen analysis. That means that you won't be able to find an effect that may actually be there. It also means that somewhere down the line, like when the dean's office reviews the dissertation, or when you want to become a famous researcher, someone may take a look at your dissertation and find out that the data didn't meet the requirements for the analysis you did—and disrespect you.

So safeguard yourself. If you care about your future career, if you want to play with the big, famous, knowledgeable, ethical researchers out there in the real world, talk about assumptions in your proposal and provide a backup plan in case they are not met.

Later, I'll provide a list of tests and the assumptions upon which they rest. You can also find such lists if you search the internet for them.

First Person in APA Style: Can we use it? Yes I can!

Most people think that using the first person is anathema in academic writing and substitute awkward anthropomorphisms ("the study seeks to show..."), passively constructed sentences ("the study was designed to..."), or, worst of all, deceptive sleights of word reminiscent of the Wizard of Oz ("The researcher designed the study to...").

The APA 6th edition style guide's editors are practical people who eschew anthropomorphisms, unnecessarily passive wording, and writers who pretend they aren't there. Rather than commit any of these sins of style, just come out with it and say, "I designed the study to accomplish..." or "I selected a sample consisting of..." What you don't want to do is make the study all about you. Don't say, "Next I discovered that the t test was the best approach, and when I saw the results, I decided to ..." This isn't about you. It's about the research—the research that you did.

So be brave, take a deep breath, and tell it like it is.

Wednesday, May 15, 2013

Assumptions make an ??? of you and me—or not

Each statistical test rests on assumptions. For example, a t test rests on normality in the underlying population, similar variance in both groups, and independence of both groups.

What?

Okay, here are the steps to take to make sure your t test is valid.

Generate a histogram for the dependent variable (height, weight, or whatever). Does it look relatively normal? Good. Does it look like it comes from a normally distributed underlying population, or do you have prior knowledge or proof in the literature that it does? Also good. Does it look flat or in some other way non-normal? You need to look at another form of the t test that is "robust" to non-normality, or in other words, accurate even though the data are not normal. I'll write about tests of this kind later.
Provided your histogram was normal, go ahead and run the t test. Among the output tables is one that has a few columns for "Levene's Test." Check the significance of the Levene's test. If p > .05 (Do notice the direction of the sign. This is one of the few tests where having p < .05 is bad. You want a p that is GREATER THAN .05), then you have equal enough variances between the two groups. Either way, SPSS helpfully provides you with the results for two types of t test: one that works if the variances are not equal, and one that works only if the variances are equal. Pick the most appropriate one, and there is the result for your t test. I could go on and on about these two different types of t test, but that wouldn't be practical stats.
Check that your two groups are independent of each other. This assumption should probably be #1, but like most people, I don't think about it much. In fact, if you don't have independent groups, your t test might not even work. It's very important. Take the situation where a single teacher is answering survey questions about classrooms with only boys and classrooms with boys and girls. For each question, they slide a bar to a point between "total chaos" and "orderly studious behavior." The two bars are stacked on top of each other, the one for boys-only classrooms first and the one for mixed-gender classrooms second. Can you do a t test comparing mean total answer values for the boys-only classrooms and mean total answer values for the mixed-gender classrooms? Are the two groups (boys vs. boys and girls) independent of each other in this methodology? No. Not at all. They are linked by the fact that the same teacher is answering both questions, and further linked by being presented in a format that encourages comparison between them. Fortunately, you have an option: the paired (or dependent) t test, comparing each teacher's answer for boys with the same teacher's answer for boys and girls. That's what I would use in most situations where the two groups were not independent of each other.
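The steps above end in a choice among three flavors of t test. Here is a minimal sketch in Python of the three test statistics, using invented ratings from six hypothetical teachers. It computes only t and the degrees of freedom; turning those into a p value needs a t distribution from a statistics package, which I leave out here.

```python
from math import sqrt
from statistics import mean, stdev, variance


def pooled_t(a, b):
    """Independent-groups t test that assumes equal variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


def welch_t(a, b):
    """Independent-groups t test that does not assume equal variances."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


def paired_t(a, b):
    """Paired (dependent) t test on the within-pair differences."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical slider totals; position i in both lists is the same teacher.
boys_only = [40, 55, 62, 48, 70, 58]
mixed = [46, 60, 65, 55, 74, 66]

t_pooled, df_pooled = pooled_t(boys_only, mixed)
t_welch, df_welch = welch_t(boys_only, mixed)
t_paired, df_paired = paired_t(boys_only, mixed)

print(f"pooled t = {t_pooled:.2f} (df = {df_pooled})")
print(f"Welch t  = {t_welch:.2f} (df = {df_welch:.2f})")
print(f"paired t = {t_paired:.2f} (df = {df_paired})")
```

In this made-up data every teacher rates the mixed classroom a little higher. The paired test sees that consistent difference, while the independent tests lose it in the spread between teachers.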

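A trick: if you don't have independence among three or more groups and you had really, really wanted to do an ANOVA, the ordinary one-way ANOVA is out. One option is to do several paired t tests, comparing each group with each group (and adjusting for the multiple comparisons). A repeated-measures ANOVA is another.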

So when you write your proposal, do add a clause somewhere to explain that if the assumptions for your chosen method are not met, you will substitute an appropriate equivalent analysis. You will save yourself from numerous headaches.

Please check out our website at www.anovisions.com for more help.

Monday, May 13, 2013

How many groups? t test vs. ANOVA

The rule is simple, on its face: If you have 3 or more groups that you want to compare for a certain outcome (like height, weight, body mass index, score, depth, or other such continuous or scalar variable), you use the ANOVA. If you have only two groups, or a bunch of groups but you want to compare them by twos, then use a t test.

That simple rule of thumb can save you a lot of time. The information in the next entry will help you hone this idea.
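Here is a short sketch in Python of why the rule hangs together. The heights are hypothetical and the functions are my own illustration. With three groups you get one F statistic from the ANOVA. With only two groups, the ANOVA's F is exactly the square of the equal-variance t statistic, so the two tests agree.

```python
from math import sqrt
from statistics import mean, variance


def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


def pooled_t(a, b):
    """Independent-groups t statistic that assumes equal variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


# Hypothetical heights in centimeters for three groups.
group_a = [160, 165, 170, 175]
group_b = [168, 172, 176, 180]
group_c = [158, 162, 166, 170]

f3, df_b3, df_w3 = one_way_anova_f([group_a, group_b, group_c])
print(f"three groups: F({df_b3}, {df_w3}) = {f3:.3f}")

f2, df_b2, df_w2 = one_way_anova_f([group_a, group_b])
t2 = pooled_t(group_a, group_b)
print(f"two groups:   F({df_b2}, {df_w2}) = {f2:.3f}, t squared = {t2 ** 2:.3f}")
```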