Tuesday, May 20, 2014

Migration of posts to QS-2

Current and future datapede posts will appear on the Quantitative Scientific Solutions, LLC (QS-2) website:

www.QS-2.com


"Quantitative Scientific Solutions LLC is a technical consulting and data analytics firm based in Washington, D.C.


QS-2 believes that complicated problems require innovative ideas and fresh strategies, and leverages strong technical capabilities to provide comprehensive and creative solutions to our client’s most challenging needs.


Building off of our deep expertise in the technical consulting sector, hard scientific and data driven approaches are at the heart of every service that we provide to our clients. QS-2 services some of the world’s leading institutions, working together to provide support and guidance in addressing the most interesting and challenging problems they face. We are focused on innovation, and work with our clients in both the development and utilization of advanced, disruptive technologies."


Posts will also appear on the QS-2 Facebook Page:

https://www.facebook.com/quantitativescientific

Thursday, March 27, 2014

Part 1z. The P-value: A surviving 'mosquito'


Extremists do not see the world in black or white.

Prior datapede posts have included discussions on the P-value: its origin, its inconvenient marriage with hypothesis testing, and its misconceptions.

This week, I read a paper titled "Scientific method: Statistical errors". This article sheds light on how the "P-value was never meant to be used the way it's used today" and how we should be very aware of the limits of conventional statistics.

The following points are a 'selective' summary mixed with additional details:
  1. On reproducibility: Reproducibility is like a ghost that keeps coming back to haunt you. Most published findings have been shown to be false, and scientists who have tried to reproduce results have found it immensely challenging. The article opens with the example of a psychology student who wanted to test the hypothesis that extremists quite literally see the world in black and white. With the initial data the P-value was < 0.01, very significant; upon replication with additional data it changed dramatically to 0.59. The investigators ended up not publishing their findings, and instead wrote an article about Scientific Utopia.
  2. On Fisher: Fisher really did not intend for the P-value to be a definitive test; it was just one part of a "non-numerical" process. While Fisher and Neyman feuded as rivals, the P-value was inconveniently married to hypothesis testing by others who blended the two approaches into a single working mechanism for scientists.
  3. On confusion: Yes, the P-value can be confusing. A significant P-value can cloud our thinking: we simply get too excited and forget to look at the actual effect size. Does that P < 0.05 really matter when the effect size is small? The author gives the example of a study which concluded that the "internet is changing the dynamics and outcomes of marriage itself". The study showed that those who meet their spouses online are less likely to divorce and more likely to report high marital satisfaction (with very significant P-values, of course). However, the effect size was very small: happiness, for example, barely moved from 5.48 to 5.64. So do not sign up for match.com thinking that you may be happier with your spouse. (A short simulation after this list illustrates how a large sample can make a trivial effect "significant".)
  4. On the future: The future should hold a change of culture. In the interim, the following measures may help: (i) always report the effect size and confidence interval, because the P-value conveys neither; (ii) take advantage of Bayes' rule; (iii) disclose all of your methods in the paper: the assumptions, the manipulations, and all the measures; (iv) use two-stage analysis, now known as 'preregistered replication'. In the first, exploratory stage, investigators perform their study and preregister, in a public database, their ideas and how they plan to confirm the findings. The second stage is performing the replication study and publishing it alongside the exploratory study. I really hope this becomes the norm.
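
To make point 3 concrete, here is a minimal Python simulation sketch. The group means, standard deviation, and sample sizes are made-up numbers loosely inspired by the marriage study, not its actual data; the point is only how a large sample turns a trivial difference into a tiny P-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Made-up numbers: two groups whose true means differ by only
# 0.16 points on the satisfaction scale, as in 5.48 vs 5.64.
n = 10_000                              # a very large sample per group
online = rng.normal(5.64, 1.0, n)       # met spouse online (hypothetical)
offline = rng.normal(5.48, 1.0, n)      # met spouse offline (hypothetical)

t_stat, p_value = stats.ttest_ind(online, offline)
pooled_sd = np.sqrt((online.var(ddof=1) + offline.var(ddof=1)) / 2)
cohens_d = (online.mean() - offline.mean()) / pooled_sd

print(f"P-value:   {p_value:.2e}")      # vanishingly small ('very significant')
print(f"Cohen's d: {cohens_d:.2f}")     # yet a tiny effect (~0.16)
```

With 10,000 people per group the P-value is astronomically small, yet Cohen's d is about 0.16, a difference few people would notice in daily life. The P-value answers "is there an effect?", not "how big is it?".
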
------------------------------------------------------------------------------------------------------------
References and additional reading:  
Nuzzo R. Scientific method: statistical errors. Nature. 2014 Feb 13;506(7487):150-2.
Nosek BA, Spies JR, Motyl M. Scientific utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspect Psychol Sci. 2012;7(6):615-31.
Cacioppo JT, Cacioppo S, Gonzaga GC, Ogburn EL, VanderWeele TJ. Marital satisfaction and break-ups differ across on-line and off-line meeting venues. Proc Natl Acad Sci USA. 2013;110(25):10135-40.

Monday, February 17, 2014

The color green: Where incidence meets prevalence

Figure 1. Money Flow
Does Figure 1 (right) look familiar?
Most of us should relate to it as a continuously recurring event.

A similar image (with pebbles in place of dollars) has been used to describe incidence and prevalence and the relationship between the two. Incidence measures the frequency of new events (such as the onset of illness), while prevalence measures the proportion of people who have the illness right now.

Prevalence and incidence are linked through duration: prevalence approximates incidence when the duration of disease is short.

Prevalence ≈ (incidence rate) × (average duration of illness).

So, if the duration of disease is short (like the common cold), prevalence approximates the incidence rate. Specifically, in a steady state, the inflow of disease approximates the outflow. Outflow is usually due to two main causes: death or cure.
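
A quick worked example (in Python, with made-up numbers) shows how duration drives the gap between the two measures:

```python
# Steady-state relationship: prevalence ≈ incidence rate × average duration.
# All numbers below are illustrative assumptions, not real data.

cold_incidence = 2.0           # new colds per person-year
cold_duration = 1.0 / 52       # an average cold lasts about one week (in years)
print(f"Common cold point prevalence: {cold_incidence * cold_duration:.1%}")        # ~3.8%

chronic_incidence = 0.005      # 0.5% of the population newly diagnosed per year
chronic_duration = 20.0        # a chronic illness lasting ~20 years
print(f"Chronic disease prevalence:   {chronic_incidence * chronic_duration:.1%}")  # ~10.0%
```

For the short-lived cold, prevalence stays close to the incidence flow; for the chronic disease, prevalence accumulates to twenty times the annual incidence.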

Now let's apply this concept to the flow of money into individual bank accounts, where, after some pondering, the prevalence-approximates-incidence concept turns out to be surprisingly applicable.

Figure 2. Bank account before payday
Most people live paycheck to paycheck, and most of our account balances (including mine) look like Figure 2.

Using the following assumptions:

Dollar incidence: the number of new dollars entering your balance on payday (the analogue of new cases of disease). An incidence rate is usually calculated as the number of new cases within a specified time divided by the population at risk. I am not sure what the dollars "at risk" would be here, or how to even think about computing them, so calculating a true rate would be quite challenging; for the sake of this analogy, let's call the new dollars incidence (the unit of observation is the individual account).

Dollar prevalence: how much your bank account holds at this moment. Prevalence is usually calculated by comparing the number of people who have a condition with the total number of people studied. Again, I will not dwell on whether a true proportion can be calculated here; the unit of observation is the individual bank account, so let us call the amount of dollars right now prevalence.

Applying the prevalence-incidence formula above, inflow is your paycheck being posted, and outflow is, again, due to two main causes: expenses or investments. As such, a labeled version of Figure 1 would look like Figure 3:

Figure 3.



Given that expenses strike the day your paycheck arrives, if not the very next day, the duration of a dollar in your account is really short. In the individual bank account of the average person, dollar prevalence would therefore approximate dollar incidence at that point in time.

Is it true that "it doesn't matter how fast color travels, it is how fast you can see it"? I am not sure about the source of this saying, but green, I think, is very fast.



Thursday, November 21, 2013

The Intraclass Correlation Coefficient (ICC) and the Fisher Z Transformation for the 95% Confidence Interval


Case Scenario
Suppose that 10 subjects have eyelid measurements from three different observers (for a total of 30 measurements). The observers used a new computerized technique to acquire eyelid images and later measured the parameters using ImageJ software. What would you do to test the reliability of the measurements, and how would you go about performing this analysis? Please note that the dependent variable (eyelid measurement) is continuous. This is a real example.

Approach
The best measure of reliability for continuous data is the Intraclass Correlation Coefficient (ICC). The ICC informs you about inter-rater reliability: the higher the ICC, the less unique information each additional rater provides. A minimal computational sketch follows.
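
Since the post stops short of the computation itself, here is one way it might be carried out in Python: a one-way random-effects ICC built from the ANOVA mean squares, followed by an approximate 95% confidence interval via the Fisher z (atanh) transformation. The data below are simulated stand-ins for the real eyelid measurements, and the 1/sqrt(n − 3) standard error is the Pearson-style approximation; exact ICC intervals are usually derived from the F distribution.

```python
import numpy as np

def icc_oneway(X):
    """One-way random-effects ICC(1,1) from an (n subjects x k raters) matrix."""
    n, k = X.shape
    grand = X.mean()
    subj_means = X.mean(axis=1)
    ms_between = k * ((subj_means - grand) ** 2).sum() / (n - 1)       # between-subject MS
    ms_within = ((X - subj_means[:, None]) ** 2).sum() / (n * (k - 1)) # within-subject MS
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

rng = np.random.default_rng(0)
true_scores = rng.normal(10.0, 2.0, size=10)                   # 10 subjects (simulated)
X = true_scores[:, None] + rng.normal(0.0, 0.5, size=(10, 3))  # 3 observers each

icc = icc_oneway(X)

# Approximate 95% CI via the Fisher z (atanh) transformation, using the
# Pearson-style SE = 1/sqrt(n - 3); exact ICC intervals use the F distribution.
n_subjects = X.shape[0]
z = np.arctanh(icc)
se = 1.0 / np.sqrt(n_subjects - 3)
lo, hi = np.tanh(z - 1.96 * se), np.tanh(z + 1.96 * se)
print(f"ICC = {icc:.3f}, approximate 95% CI ({lo:.3f}, {hi:.3f})")
```

A useful property of the transformation: because the interval is built on the z scale and mapped back through tanh, it always stays inside (−1, 1), which a naive ICC ± 1.96·SE interval would not guarantee.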


Friday, September 20, 2013

P-value: A double-edged sword? (Part 1)

On Tuesday, July 23rd, I posted a rough schematic of a 0.05 P-value cutoff point on a curve and a hypothesis testing table, with the title and legend "marriage of inconvenience". This is Part 1 of a potential series of narratives introducing some of the reasons why this phrase may be a good description of the relationship between the two, and what the alternatives are. When I searched for whether "marriage of inconvenience" had been applied in some other context, I found that indeed it has. In 1963, Lionel Gelber (a Canadian diplomat who wrote about foreign affairs) wrote an essay describing the relationship between Europe and the US as a marriage of inconvenience, "a union in which partners who are incompatible in many respects yet are welded indissolubly together". Thinking about it, this applies well to the relationship between P-values and hypothesis testing, especially in biomedical research.

One of the very first questions reviewers of biomedical research papers ask is whether your results are statistically significant (usually against the fixed P-value cutoff of 0.05). [I actually find this peculiar, especially when the big picture is missed.] What does that really mean, and how important is it to differentiate between statistical significance and clinical/meaningful significance, or even the significance of the research question asked? This topic is by all means old (and I mean very old): writings on the misconceptions of P-values have appeared in the literature for decades, and people still either do not believe them or simply do not know what the alternative is.

The P-value is a double-edged sword: great to have, but potentially a tricky problem if not interpreted properly (which is the case most of the time). An article by Steven Goodman in 2008 lists twelve misconceptions of the P-value (calling them a Dirty Dozen, listed below), and I agree they are "dirty"! While I thought I would be the first to compare statistical testing using P-values, hypothesis testing (null and alternative) with fixed error probabilities, and posterior probabilities, I am not. James O. Berger published an article in Statistical Science titled "Could Fisher, Jeffreys, and Neyman Have Agreed on Testing?" discussing these methods. Several other articles explain in detail the confusion that the P-value has generated in significance and hypothesis testing. Different articles blame different people for the fixed value of 0.05 (Lehmann 1993, Berger 2003). Regardless of who came up with it, it is important to understand the uses of P-values and the available alternatives. The question becomes: what would a well-intentioned researcher do? Is it a mixed approach of P-values and/or Type I and Type II errors and/or Bayesian measures? Would different methods be used in different contexts (e.g., Type I and Type II errors for screening)?
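
One concrete bridge between these approaches, developed in the Berger line of work cited below, is the minimum Bayes factor bound −e·p·ln(p): for a given P-value, it caps how strongly the data can possibly argue against the null. A short Python sketch (the 50:50 prior odds are my own illustrative assumption):

```python
import math

def min_bayes_factor(p):
    """Sellke-Bayarri-Berger lower bound: -e * p * ln(p), valid for p < 1/e."""
    return -math.e * p * math.log(p)

for p in (0.05, 0.01, 0.001):
    bf = min_bayes_factor(p)
    # With 50:50 prior odds on the null (an illustrative assumption),
    # the smallest achievable posterior probability of the null is:
    post_null = bf / (1 + bf)
    print(f"p = {p:<5}  min Bayes factor = {bf:.3f}  P(null | data) >= {post_null:.2f}")
```

Even at P = 0.05, the null hypothesis retains at least roughly a 29% posterior probability under equal prior odds, nowhere near the 5% suggested by misconception 1 below.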

 ------------------------------------------------------------------------------------------------------
Twelve P-value misconceptions (taken from Table 1 in Goodman S, 2008):

  1. If P = 0.05, the null hypothesis has only a 5% chance of being true.
  2. A nonsignificant difference (e.g., P > 0.05) means there is no difference between groups.
  3. A statistically significant finding is clinically important.
  4. Studies with P-values on opposite sides of 0.05 are conflicting.
  5. Studies with the same P-value provide the same evidence against the null hypothesis.
  6. P = 0.05 means that we have observed data that would occur only 5% of the time under the null hypothesis.
  7. P = 0.05 and P < 0.05 mean the same thing.
  8. P-values are properly written as inequalities (e.g., “P < 0.02” when P = 0.015).
  9. P = 0.05 means that if you reject the null hypothesis, the probability of a type I error is only 5%.
  10. With a P = 0.05 threshold for significance, the chance of a type I error will be 5%.
  11. You should use a one-sided P-value when you don’t care about a result in one direction, or a difference in that direction is impossible.
  12. A scientific conclusion or treatment policy should be based on whether or not the P-value is significant.
-------------------------------------------------------------------------------------------------------------
Readings:

Gelber L. A marriage of inconvenience. Foreign Affairs. January 1963. http://www.foreignaffairs.com/articles/23478/lionel-gelber/a-marriage-of-inconvenience (last accessed 9/20/2013).
 
Goodman S. A dirty dozen: twelve p-value misconceptions. Semin Hematol. 2008 Jul;45(3):135-40.

Lehmann EL. The Fisher, Neyman-Pearson Theories of Testing Hypotheses: One Theory or Two? J Am Stat Assoc. 1993 Dec;88(424):1242-9.

Berger JO. Could Fisher, Jeffreys and Neyman Have Agreed on Testing? Statistical Science. 2003;18(1):1-32. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.167.4064&rep=rep1&type=pdf (last accessed 9/20/2013).

Wednesday, August 21, 2013

"It's all about the content. Get better content or use sparklines". Edward Tufte

Yesterday I went to Edward Tufte's one-day course on data visualization: Presenting data and information (http://www.edwardtufte.com/tufte/courses).

Below are some rough take-home notes I took:
  • New methods of presenting will be sans PowerPoint. Provide readings, discuss, and explain with visuals as you go along.
  • Get better content
  • Use sparklines. Sparklines are data-words: data-intense, design-simple, word-sized graphics. They have applications for financial and economic data, tracking changes over time. Sparklines also reduce recency bias and may aid better decision making (this was one of the first times I had heard of a bias termed "recency"). You can easily create sparklines in Excel; the sparkline feature was added in Microsoft Excel 2010. I created a sparkline of my satisfaction with Tufte's style and lecture over the course of the day (Figure).
    Data points varied over the course of the day. The first data point was 8.89 at 10:15 am and the last was 9.99 at 4:15 pm. The lowest point is highlighted in red because the course stopped and I had to go find food. The highest point is highlighted in green because the course ended two minutes early, and satisfaction was high given the feeling of wanting more. Galileo and Euclid were also mentioned, which contributed to higher satisfaction given my fascination with both. Note that this is just an example to illustrate a sparkline; satisfaction is subjective. Also, Tufte strongly encouraged links to raw data, so I am pasting the original table at the end. (For non-Excel users, a small code sketch follows these notes.)
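
For readers outside Excel, here is a minimal matplotlib sketch of the same idea: a word-sized line with the minimum marked in red and the maximum in green, all chart furniture stripped away. Only the first (8.89) and last (9.99) values come from my notes; the rest are made up for illustration.

```python
import matplotlib.pyplot as plt

# Satisfaction readings over the course day; only the first and last
# values come from the post, the rest are illustrative stand-ins.
values = [8.89, 9.20, 9.40, 8.50, 9.10, 9.60, 9.30, 9.70, 9.99]

fig, ax = plt.subplots(figsize=(2.0, 0.4))        # word-sized, per Tufte
ax.plot(values, color="black", linewidth=0.8)

lo = values.index(min(values))                    # lunch-break dip
hi = values.index(max(values))                    # early-finish high
ax.plot(lo, values[lo], "o", color="red", markersize=3)
ax.plot(hi, values[hi], "o", color="green", markersize=3)

ax.axis("off")                                    # strip all chart furniture
plt.savefig("sparkline.png", dpi=200, bbox_inches="tight")
```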



Monday, August 19, 2013

Datapede leisure time



These past two weeks were spent with family and friends. During this leisure time, the first Datapede tote bag was developed at T-Gallery, a custom T-shirt store in Greenwich Village in New York. I recommend it if you are ever interested in designing t-shirts and/or tote bags.

On another note, tomorrow I will be attending a one-day course with Edward Tufte on the visual display of quantitative information (http://www.edwardtufte.com/tufte/courses). Edward Tufte is a statistician and professor emeritus of statistics, political science, and computer science at Yale, and a pioneer in data visualization. I will be sharing take-home notes from the course. Additionally, upcoming posts on Datapede will be about:

P-value: A double-edged sword?
More of the confidence interval

I have also received some requests to discuss more epidemiological principles; I will weave those into future posts.