Monday, June 25, 2018

But Wait a Sec . . . Is that really the Best Way?

A Less-than-Ideal Example

After I wrote the previous blog post, I had a good think about a couple of things. One was that my situation with Anovisions occasionally provides less-than-ideal entry points into studies. I often enter a study long after the people who decided to hire a statistician have made many of the critical first decisions about data gathering, data type, and even research questions. My methods for getting my head around a study, which I started to describe in the previous blog post, are therefore rather ad hoc and perhaps not the best methods for researchers who have more control over their methods.

At this point in the conversation, therefore, I'm going to go in a couple of different directions. First, let's look at a process that is more or less ideal for a data explorer. After that, let's go back to how to manage when you are the statistics expert entering the middle of an exploration already begun by people who are savvy in their businesses but not necessarily trained as data scientists or statisticians.

An Ideal Example

In this example, you have just been handed data and you have been asked to see what you can learn from it. It's a data scientist's dream: you can follow a process that allows you to find whatever the data has to offer.

Choose Your Tools

Roger Peng, PhD, of Johns Hopkins is an articulate proponent of the R programming language.
We have our pick of tools, so let's use R to do the analysis. It's powerful, flexible, free, open-source data analysis software, and I love it. So does Roger Peng, PhD, of Johns Hopkins University, and he explains why in this video, which introduces a really fun course on R programming that you might want to take someday. I did, and it was excellent.

Installing R and RStudio

To learn how to install R and RStudio on your own computer so you can follow along with the data analysis step by step, please see this video.
To learn how to install R and RStudio on a Mac, see this video. Dr. Peng talks fast, so you may need to pause frequently as you follow along on your computer.

We will continue in this vein for our next few posts.

This time I'll let somebody else do the talking.

Every now and again you come across an answer to a question that is so clear that you want to bookmark it, so you can find it again if something like the original question comes up. That just happened to me. I was googling "How are linear regression models, ANOVAs, and ANCOVAs related?" because it has been a long time since I had to do those equations by hand, and all I remembered was that I learned them together because they relied on the same basic formulae. What those formulae were, I had completely forgotten.

Thanks, SAS, SPSS, Python, and R, for doing all the work for me and letting my brain atrophy.

Here is an answer to my question that is so good that I'm giving it space on my blog. Thanks, Karen Grace-Martin, for tapping this out so the rest of us could have the benefit of your explanation.
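The heart of that answer can be checked numerically: a one-way ANOVA is a linear regression on dummy-coded group variables. The sketch below is an illustration written for this post, not taken from Karen Grace-Martin's answer. It computes the ANOVA F statistic from the classic sums of squares, then computes it again by fitting an ordinary least-squares regression with an intercept and 0/1 dummy variables, using only the Python standard library. The scores are made up.

```python
"""One-way ANOVA computed two ways: classic sums of squares vs. a
dummy-coded linear regression. The data are made up for illustration."""

# Hypothetical scores for three groups (illustrative only).
groups = {
    "A": [4.0, 5.0, 6.0, 5.5],
    "B": [6.5, 7.0, 8.0, 7.5],
    "C": [5.0, 6.0, 5.5, 6.5],
}


def anova_f(groups):
    """Classic one-way ANOVA F: between-group mean square over within-group mean square."""
    all_y = [y for ys in groups.values() for y in ys]
    n, k = len(all_y), len(groups)
    grand = sum(all_y) / n
    ss_between = sum(len(ys) * (sum(ys) / len(ys) - grand) ** 2 for ys in groups.values())
    ss_within = sum((y - sum(ys) / len(ys)) ** 2 for ys in groups.values() for y in ys)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


def regression_f(groups):
    """The same F from OLS on an intercept plus k-1 dummy variables,
    compared against the intercept-only model."""
    labels = list(groups)
    rows, y = [], []
    for label, ys in groups.items():
        for value in ys:
            # Intercept column, then one 0/1 dummy per non-reference group.
            rows.append([1.0] + [1.0 if label == other else 0.0 for other in labels[1:]])
            y.append(value)
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    n = len(y)
    mean_y = sum(y) / n
    sse_full = sum((v - f) ** 2 for v, f in zip(y, fitted))
    sse_null = sum((v - mean_y) ** 2 for v in y)  # intercept-only model
    return ((sse_null - sse_full) / (p - 1)) / (sse_full / (n - p))


if __name__ == "__main__":
    print(f"ANOVA F:      {anova_f(groups):.4f}")
    print(f"Regression F: {regression_f(groups):.4f}")
```

The two F values agree because the dummy-coded regression's fitted values are exactly the group means. Adding a continuous covariate as one more column of the design matrix turns the same regression into an ANCOVA.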

Sunday, April 1, 2018

Designing a Study: First Steps with our Horse and Cart

The following is a completely fake study that I've made up out of thin air.

Let's say we have 38 teachers and 400 adult students in a military school that teaches survival skills in hot combat situations. We want the students to learn everything they can in a very short amount of time (training soldiers is expensive), and we have begun to wonder whether some of the school's pedagogical methods are detracting from or adding to how much and how quickly the trainees learn.

Here are some of our questions:
  • Do students learn better in mixed experience-level classes than they do in classes where all the students are at the same level?
  • Do students learn better in classes with certain types of activities?
  • Do teachers who have seen the video How to Avoid Death by PowerPoint tend to produce better-performing students?
  • At what time of day do students learn best?
How exactly do you go about answering these questions? The world is at your fingertips. You can create surveys for teachers and surveys for students; you can access any data source collected by the institution, including grades, instructor comments about students, student evaluations of instructors, and (I am making this up) injury and death rates for previous students.

This may seem like a morass of information to slog through for quite a complicated study, and it is. I have come across numerous data sets, sets of questions, and study designs in my work for Anovisions. Over the years, I have developed a method for getting my head around a study like this, and you might find it useful as you take your study from research question to final design. 

Here are the first steps:
  1. Find out what constraints the entity paying for the study has in terms of time and money. What are the deadlines? How much of your analysis time can they afford? In this case, we have three months and a budget that rules out any qualitative (read: "expensive and time consuming") analysis.
  2. Define the main research question(s). In this case, the main research question is How can we change our pedagogical methods to produce better outcomes for our students? All possible answers should be explored, including the answer, "None of your considered changes will help much." 
  3. Find out about any variables your client may be considering. In this case, our client is not thinking, "Hm, I wonder about this variable or that variable?" Clients rarely think like that. But they often have excellent questions that you can convert into variables. Here, a few variables come to mind:
    • Grades (we'll call this a scalar variable for now)
    • Survival time in the field (scalar)
    • Number of injuries in training (scalar)
    • Number of injuries in the field (scalar)
    • Student experience level (scalar or categorical, to be decided once our plan is further developed)
    • Instructor status regarding How to Avoid Death by PowerPoint (binary, Y/N)
    • Types of class activities (categorical)
    • Instructor evaluations (we'll need to find a way to take lots of string data and compress it to a numerical value, so let's call this scalar for now)
    • Time of day for class
    • . . . 
"But wait," you say. "What kind of study are you doing? A randomized trial? A case study? Survival analysis, even? No, a cohort study, or--yes!  It's a cross-sectional study, right?"

"Ah," say I. "You have recently been to graduate school."

Actually, no, I don't say that, at least not out loud. I say, "Don't put the cart in front of the horse," which adage, though frequently used and perhaps stale, is difficult to replace with a more modern pithy saying while retaining the same wisdom. What I mean is, if you try to define the study type before you thoroughly examine all the client's questions and available data, you will never be able to deliver what the client needs (in the cart, ha ha). 

At this point in your process, it is essential that you engage in one of the most overlooked and important steps in any study design.

You walk away, take a break, have a coffee, call someone you love, and otherwise stop thinking about it for a few minutes or even overnight (the process of sleep does wonders for analytical tasks). When you come back, you will perform better for having taken your break.

So now I'll take mine, and we'll continue this subject in the next blog post. Don't forget to sign up for notifications so you don't miss it. 

Friday, March 9, 2018

Cool Transcription Tool that's Free AND Reliable!

I don't usually do free ads for folks, but I just have to mention oTranscribe, which I've been using for the past couple of weeks to transcribe an interview for a qualitative analysis. It's free to use oTranscribe on a small scale (I'm doing a single interview lasting an hour and 45 minutes).

oTranscribe is one of those tools that seems to have read your mind about what you need. They save your work for you so that all you have to do is go to "" and there is your project, right where you left off. If you're paranoid about saving things, like I am, they have an export function. I have multiple copies of the transcription that I've saved all along in my own folder, but that's just doubling up, because oTranscribe has also saved each of my copies in its "History" function. They do not store your audio recording, so you have to upload that each time, but I don't want them storing my audios anyway. oTranscribe also has keys for starting and stopping (the same key, a toggle, which goes back one second each time you pause), for slowing the playback, for speeding it up, for dropping in a time stamp, and many other nice features that you may or may not need.

The feature I like best is that I can re-map my function keys. Instead of using F1 to slow down playback, I use down arrow. Instead of F2 for "go back a few seconds," I use left arrow. It's incredibly easy to use.

I highly recommend this product. You can find it at

Friday, October 10, 2014

How to Get the Most from Your Statistics Professional

Here are a few tips for making the most of your hard-earned (or painfully borrowed) money when you hire a statistics or editing professional to help you with your project:

  • Communicate via brief emails when you can, phone if you must. Almost without exception, phone calls take more time than is necessary to answer brief questions. The emails are also a great reference when you need to prepare for a presentation or defense.
  • Once you have turned over your project for editing or statistics recommendations, stop working on it. Controlling document versions is a great way to make sure your professional doesn't have to backtrack or spend precious time integrating your work with theirs.
  • Answer questions quickly so the project doesn't lose momentum.
  • Make sure due dates for specific tasks are clearly delineated. If you are using Basecamp or something like it, put your deadlines on the project's calendar yourself and invite your consultant so they can see your expectations. 

Friday, January 3, 2014

Common Errors People Make in their Research: Power

About 80% of what crosses my desk for editing contains either no power analysis or the wrong type of power analysis.

What's power? Put simply, it's the probability of finding something that's actually there.
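Power can be made concrete with a quick simulation: invent a world where the effect really exists, run many fake studies in it, and count how often a significance test detects the effect. This is a minimal sketch in Python, assuming a two-group comparison with a known standard deviation (a z-test, chosen only to keep the code short); the effect of 0.5, standard deviation of 1, and 64 subjects per group are arbitrary illustrative values.

```python
import random
from statistics import NormalDist, mean

_STD_NORMAL = NormalDist()


def analytic_power(effect, sd, n, alpha=0.05):
    """Exact power of a two-sided, two-sample z-test with known sd."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect / (sd * (2 / n) ** 0.5)
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def simulated_power(effect, sd, n, alpha=0.05, trials=2000, seed=1):
    """Estimate power by simulation: draw many fake studies in which the
    effect really exists and count how often the test detects it."""
    rng = random.Random(seed)
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    se = sd * (2 / n) ** 0.5  # standard error of the difference in means
    detected = 0
    for _ in range(trials):
        control = [rng.gauss(0.0, sd) for _ in range(n)]
        treated = [rng.gauss(effect, sd) for _ in range(n)]
        z = (mean(treated) - mean(control)) / se
        if abs(z) > z_crit:
            detected += 1
    return detected / trials


if __name__ == "__main__":
    print("Simulated power:", simulated_power(0.5, 1.0, 64))
    print("Analytic power: ", round(analytic_power(0.5, 1.0, 64), 3))
```

With these settings both numbers land near 0.8; shrink n and both drop, which is exactly the "low N, may have missed something" situation described below.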

Why should you publish a power analysis? Because if you have found nothing in your analysis (p is too high for you to claim that you have a statistically significant finding), you can evaluate the power and respond accordingly:

  • If power was low, explain that, due to a low N, you may have missed something that is actually there
  • If you had low power in your research, use this point in the discussion section to call for more research into your area, especially if p was less than, say, .25.
  • If power was high, use it to fortify your claim that the absence of a meaningful effect is itself an important finding.
Does power relate to the entire study or to a particular analysis? The latter.

How do I figure out power before I begin? Do an a priori power analysis.

How do I figure it out after I've finished? Not surprisingly, do a post hoc power analysis.

What the heck does that mean? "A priori" = "before." "Post hoc" = "after."

How are they different?  Usually if you are doing an a priori analysis you are trying to figure out how many subjects you need. And usually if you are doing a post hoc analysis you are trying to figure out, given the number of subjects you had, how much power you had. I usually include both.
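Both calculations can be sketched for the common case of comparing two group means with effect size d (Cohen's d). This is a minimal Python sketch using the normal approximation, not G*Power's method: G*Power works from the exact t distribution, so its sample sizes can come out slightly larger than these. The function names are made up for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def n_per_group(d, alpha=0.05, power=0.80):
    """A priori: approximate subjects per group needed to detect effect size d
    with a two-sided, two-sample test (normal approximation)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)


def achieved_power(d, n, alpha=0.05):
    """Post hoc: approximate power of a two-sided, two-sample test with
    n subjects per group and effect size d (normal approximation)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n / 2)  # expected z statistic under the alternative
    return _STD_NORMAL.cdf(shift - z_alpha) + _STD_NORMAL.cdf(-shift - z_alpha)


if __name__ == "__main__":
    print("A priori n per group for d = 0.5:", n_per_group(0.5))
    print("Post hoc power with 30 per group:", round(achieved_power(0.5, 30), 3))
```

The first function answers "how many subjects do I need?"; the second answers "given the subjects I had, how much power did I have?" Note that a post hoc calculation still needs an effect size to plug in, so report which one you used.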

What's the best way to do a power analysis? Download G*Power. It's my favorite tool for power analysis. 

As always, if you need help with your power analysis, contact us at 802-382-7349 or visit us online at  

Sunday, September 15, 2013

Under what conditions can you claim that you have found a cause and effect relationship between variables?

The answer to this question involves two issues: (a) the level of the study, and (b) the "causality requirements," as I'll call them. My source is Oleckno, William A., Essential Epidemiology: Principles and Applications (2002), Long Grove, Illinois, Waveland Press, Inc. Although Oleckno wrote about epidemiology, the same principles hold for any statistical analysis related to a phenomenon among people. In other words, epidemiology can be understood quite broadly.

A. The Level of the Study

Here is a ranked list of the major types of studies in descending order of likelihood that the results might demonstrate a causal association:

  1. Randomized Controlled Trial 
  2. Randomized Community Trial 
  3. Prospective Cohort Study 
  4. Retrospective Cohort Study 
  5. Case-Control Study 
  6. Cross-sectional Study 
  7. Ecological Study 
  8. Descriptive Study 
The higher up your study type is on the list, the more likely it is that the relationship you have found will later prove to be a cause-effect relationship ("later" meaning after you have met the causality requirements described below).

B. Causality Requirements: Criteria for Assessing Causation 

All of the following criteria must be met before you can write that your variables have a causal relationship:

  1. Correct temporal sequence (exposure to the independent variable must precede incidence of or change to the dependent variable) 
  2. Strength of association (high relative risk or rate ratio, high r or tau). 
  3. Consistency of association (requires follow-up studies or studies performed by other researchers that show the same or similar results)
  4. Dose-response relationship (more exposure to the independent variable leads to greater incidence, relative risk, or odds ratio for the dependent variable) 
  5. Biological plausibility (Does the association make sense? Biological includes psychological) 
  6. Experimental evidence (For instance, if experiments have shown that microwaves affect living tissue, then you have a stronger chance of proving that microwaves caused some sort of physical outcome among your study participants should you find one) 
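Two of these criteria, strength of association (2) and dose-response (4), come down to simple arithmetic on counts. This is a minimal Python sketch with made-up counts: it computes a relative risk and an odds ratio for a 2x2 table, then relative risks at increasing exposure levels with a rough check that they rise steadily. Passing these checks does not by itself establish causation; the other criteria still apply.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a, b = exposed cases and non-cases; c, d = unexposed cases and non-cases."""
    return (a * d) / (b * c)


def dose_response(levels):
    """Relative risk at each exposure level versus the lowest level, plus whether
    the risks rise monotonically (a rough check of the dose-response criterion).

    `levels` is a list of (label, cases, total), ordered from least to most exposure."""
    _, ref_cases, ref_total = levels[0]
    ratios = [(label, relative_risk(cases, total, ref_cases, ref_total))
              for label, cases, total in levels]
    values = [rr for _, rr in ratios]
    monotonic = all(later > earlier for earlier, later in zip(values, values[1:]))
    return ratios, monotonic


if __name__ == "__main__":
    # Made-up counts: 30 of 100 exposed and 10 of 100 unexposed had the outcome.
    print("Relative risk:", relative_risk(30, 100, 10, 100))
    print("Odds ratio:   ", odds_ratio(30, 70, 10, 90))
    ratios, monotonic = dose_response([("none", 10, 200), ("low", 18, 200), ("high", 30, 200)])
    print(ratios, "rising with dose:", monotonic)
```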
Consider both A and B when you begin to write that people should change their behavior based on your study results. You can still make recommendations, but it is important to note that you have not proven a cause-effect relationship unless your study (a) is a high-level design and (b) meets all of the above causality requirements.