Monday, May 20, 2013

A most unusual paper

In 1974, the Journal of Applied Behavior Analysis published a most unusual manuscript.  The journal received the manuscript on 25 October 1973, and published it without revision.  The manuscript contained not a single word of text, except for the title, name of the author, his affiliation, the subtitle "References", and a brief acknowledgement.  

There were no equations, no figures, and no references.  Essentially, the manuscript was a blank page, authored by Dennis Upper of the Veterans Administration Hospital in Brockton, Massachusetts.  

When the manuscript was published, the journal also published a reviewer's comments.  Here is what the reviewer had to say: 

"I have studied this manuscript very carefully with lemon juice and X-rays and have not detected a single flaw in either design or writing style.  I suggest it be published without revision.  Clearly it is the most concise manuscript I have ever seen --- yet it contains sufficient detail to allow other investigators to replicate Dr. Upper's failure.  In comparison with the other manuscripts I get from you containing all that complicated detail, this one was a pleasure to examine.  Surely we can find a place for this paper in the Journal --- perhaps on the edge of a blank page."


The paper was titled: "The unsuccessful self-treatment of a case of writer's block".  Since publication, it has been cited 29 times.

D Upper (1974) The unsuccessful self-treatment of a case of writer's block.  Journal of Applied Behavior Analysis 7(3):497.

Saturday, April 6, 2013

A brief history of the impossible


Some years back I sat with my son as he did his math homework.  Looking over his shoulder, I saw that he was working on the function 
After he finished plotting it, I thought, let’s try another one, 
I began by plotting the function for various integer values of x and got the black dots in the graph below.  Then, naively, I connected the dots:


Looking at the plot, I realized that this could not possibly be right.  How could this function cross the x-axis?  That would mean that there are some values of x for which f(x) = 0.  But there were no such values.  After all, you could not raise a number to a power and get zero.  So what’s going on?

What is going on is that our function 
spends most of its time outside of my piece of paper, in an ‘imaginary’ world that lies above and below the plane of my paper.  The points that I had plotted in the figure (the black dots) are the points that I can see when this function crosses the plane of the paper.  The rest of the time the function is outside my plane.  Here is what our function really looks like:


In the above figure, the plane of the paper is colored green.  Our function is like a corkscrew, winding itself around the x-axis.  I wondered, how did humans discover that in addition to the “real” world (the plane of my paper), there must also exist an “imaginary” world?  What was the origin of the idea of imaginary numbers?

In a wonderful little book called “An imaginary tale: the story of square root of -1”, Paul Nahin recounts the journey.  Surprisingly, the discovery has little to do with quadratic equations, and everything to do with cubics.

Roots of equations
In the 16th century (and for a century or two after that), mathematicians were very much concerned with the geometric meaning of equations.  So if you asked one what is the root of the following equation
he or she would think about it in terms of the function 
and ask where this function crosses the x-axis.  Here is what this function looks like:




Our quadratic function never crosses the x-axis, and so our 16th century mathematician would respond by saying that the equation
 
is impossible because
 
never touches the x-axis.  That would be the end of the conversation.  Indeed, as Nahin explains, this is why the origin of imaginary numbers did not start with quadratic equations.  Rather, “impossible” numbers like
$\sqrt{-1}$
had their origin in cubic equations.

Depressed cubics
Scipione del Ferro was a 16th century Italian mathematician working on cubic equations of the form:
(1)  $x^3 + px = q$
These are called depressed cubics because they are missing the quadratic ($x^2$) term.  His objective was to find the roots of this equation, which translates into finding the value or values of x for which this equation is true.  This means finding the value of x for which the function
$f(x) = x^3 + px - q$
crosses the x-axis.  A cubic function will always have at least one location at which it will cross the x-axis, so del Ferro knew that there must exist at least one value of x for which this equation is true.

He started by assuming that the solution could be written as the sum of two numbers, u and v:
$x = u + v$
If we put this into our cubic equation we get:
(2)  $(u + v)^3 + p(u + v) = q$
Expanding it we have:
(3)  $u^3 + v^3 + (3uv + p)(u + v) = q$
We can pick u and v arbitrarily (as long as x = u + v), and so del Ferro picked u and v such that
$3uv + p = 0$
This implies that the term multiplying (u + v) in Eq. (3) is zero, and so we have:
(4)  $u^3 + v^3 = q$
To solve the above equation set
$v = -\frac{p}{3u}$
and so we have
$u^3 - \frac{p^3}{27u^3} = q$
Del Ferro knew how to solve quadratic equations.  Multiplying by $u^3$ gives a quadratic in $u^3$.  We have
$u^6 - q u^3 - \frac{p^3}{27} = 0$
which means that:
(5)  $u = \sqrt[3]{\frac{q}{2} \pm \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}}$
From Eq. (4) we had
$v^3 = q - u^3$
and so
(6)  $v = \sqrt[3]{\frac{q}{2} \mp \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}}$

The way to understand Eqs. (5) and (6) is as follows: u can take on two values, one given by the plus term, and the other given by the minus term.  When u is given by the plus term, v is given by the minus term, and so on.  Now if we write the solution x = u + v, we end up with the expression:
(8)  $x = \sqrt[3]{\frac{q}{2} + \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}} + \sqrt[3]{\frac{q}{2} - \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}}$
When p and q are positive, the second term on the right side of Eq. (8) is the third root of a negative number, which can be uncomfortable to deal with, and so let us re-write it by noting that
$\sqrt[3]{-a} = -\sqrt[3]{a}$
Using this we can re-write Eq. (8) as:
(9)  $x = \sqrt[3]{\frac{q}{2} + \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}} - \sqrt[3]{-\frac{q}{2} + \sqrt{\frac{q^2}{4} + \frac{p^3}{27}}}$
Del Ferro had found a solution to a cubic, something that had eluded mathematicians for 2000 years, ever since Babylonian times.  

This was a remarkable achievement indeed.  However, del Ferro knew that in his equation lay a deep mystery: when p and q were both positive his equation gave the correct answer, but when one or the other was negative, his equation could give an impossible answer.  He did not know why this formula seemed to fail in some cases.  This, it turns out, is the key mystery that led to the discovery of imaginary numbers.

The impossible equation        
Consider the cubic equation
When we plot this equation, we have:  



The equation crosses the x-axis at x=2, and so 2 is one of the roots of this equation (indeed, 2 is the only real root).  Using del Ferro’s formula (Eq. 9) and a calculator we find that the rather hairy calculation produces an answer that is, remarkably, exactly 2.  So far so good.
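The calculator check can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: it takes the depressed cubic in the form x^3 + px = q, evaluates the closed-form root as the difference of two real third roots, and uses x^3 + 6x = 20 as the test case.  That cubic is my choice of example (a classic one whose only real root is 2), not necessarily the one plotted above, and the function names are made up for this sketch.

```python
import math

def real_cbrt(a):
    """Real third root that also works for negative inputs."""
    return math.copysign(abs(a) ** (1.0 / 3.0), a)

def depressed_cubic_root(p, q):
    """One real root of x^3 + p*x = q from the closed-form expression.

    Only handles the case where the quantity under the square root is
    non-negative; otherwise the formula needs square roots of negative numbers.
    """
    disc = q * q / 4.0 + p ** 3 / 27.0
    if disc < 0:
        raise ValueError("square root of a negative number required")
    s = math.sqrt(disc)
    return real_cbrt(q / 2.0 + s) - real_cbrt(-q / 2.0 + s)

# Assumed example: x^3 + 6x = 20
x = depressed_cubic_root(6, 20)
print(x, x ** 3 + 6 * x)  # root, and the left side evaluated at that root
```

Up to floating-point rounding, the printed root is 2 and the left side evaluates to 20.  Calling the same function with p = -15 and q = 4 raises the error, because the quantity under the square root is negative.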

Next, let us try the cubic $x^3 - 15x = 4$.
When we plot this equation, we have:  





We see that x=4 is a solution.  In fact, our cubic crosses the x-axis three times, and one of those times is at x=4 (this cubic has three real solutions).  But now let us try del Ferro’s formula.  From Eq. (8) we have:
(10)  $x = \sqrt[3]{2 + \sqrt{-121}} + \sqrt[3]{2 - \sqrt{-121}}$
But if del Ferro’s formula is correct, then the following must be true:
(11)  $\sqrt[3]{2 + \sqrt{-121}} + \sqrt[3]{2 - \sqrt{-121}} = 4$


And so we arrive at the mystery: we know that x=4 is a solution to this cubic, and we know that del Ferro’s formula is correct.  Yet, when we use it, we get what appears to be an impossible equation (Eq. 11): we have two instances of a square root of a negative number, which in del Ferro’s time were thought to be meaningless, and yet when the two terms are added, they produce a real number!  How could that be?

It took another 50 years of thinking, and the result was a book entitled Algebra (1572), by Rafael Bombelli, a mathematician who received no college education.  He was the first to see that Eq. (11) required the existence of a whole new set of numbers, called imaginary numbers.

He proposed that perhaps Eq. (11) is true because each of the third roots produces something that is partly real and partly imaginary, and the sum causes the two imaginary parts to cancel, leaving only a real part.  That is, he proposed that:
(12)  $\sqrt[3]{2 + \sqrt{-121}} = a + b\sqrt{-1}, \quad \sqrt[3]{2 - \sqrt{-121}} = a - b\sqrt{-1}$

We proceed by cubing the two sides of Eq. (12):
(13)  $2 + \sqrt{-121} = (a + b\sqrt{-1})^3 = (a^3 - 3ab^2) + (3a^2 b - b^3)\sqrt{-1}$
To solve for a and b, we set:
(14)  $a^3 - 3ab^2 = 2, \quad 3a^2 b - b^3 = 11$
And we find that the solution is a = 2, and b = 1.  So Bombelli showed that:
(15)  $\sqrt[3]{2 \pm \sqrt{-121}} = 2 \pm \sqrt{-1}$
And therefore Eq. (11) is true because the imaginary parts of the third roots cancel, leaving a real number. 

The origin of imaginary numbers was in cubic equations.  These equations always have at least one real root, clearly crossing the x-axis, yet del Ferro’s equation that was supposed to give that root instead gave an expression that included square roots of negative numbers.  Bombelli showed that those “impossible numbers” were things that could be handled by the introduction of what we now call imaginary numbers.  For that accomplishment, there is a crater on the moon named after Rafael Bombelli.
 

Sunday, February 17, 2013

Painful memories, and effortful actions

How does the brain evaluate a painful episode?  When you look back at an unpleasant episode of your life, how does your impression of it now relate to the actual experience that you had during the episode?

Surprisingly, when we recall a painful experience we seem not to evaluate it based on its duration, or its temporal integral, or its mean pain.  That is, it does not matter very much if one experience was on average more painful than another, nor does it matter that one experience was longer than another.  Rather, we seem to evaluate the totality of a painful experience using two factors: magnitude of the peak of the pain, and the magnitude of the pain as the episode ended.  Here, I will describe the basic experiments that led to these ideas, and then suggest a new interpretation of rather puzzling results regarding how the brain evaluates effort in simple motor control tasks.

Cold water bath

In 1993, Kahneman and colleagues asked 32 volunteers at the University of California, Berkeley to put both their hands in a cold water bath for 5 seconds.  Next, one hand was chosen at random and placed in cold water for 60 seconds (or 90).  After a brief rest period, the other hand was placed in cold water for 90 seconds (or 60).  In these two episodes the temperature of the water was the same for the first 60 seconds (21 degrees Centigrade).  However, in the last 30 seconds of the 90 second episode, the temperature was increased by 1.1 degrees.  So in the 90 second episode one hand always experienced a longer period of discomfort, but the episode for that hand ended with slightly warmer water. 

During the time that their hand was in water the subjects used their other hand to adjust a knob to continuously indicate their discomfort.  As you would expect, the discomfort increased immediately as the hand was placed in the cold water, reached a peak at around 60 seconds, and then declined for the next 30 seconds. 

After the two episodes were completed, the subjects were told that they would need to put their hand in cold water one more time but that they could choose which episode they wanted.  The main dependent variable was the subject’s choice for this third episode.  Logically, no one should pick the episode that lasted 90 seconds.  But remarkably, most subjects (22 of 32, 69%) preferred to repeat the longer episode.  Indeed, most subjects indicated that the longer episode had caused less overall discomfort!

This suggested that when people evaluate painful episodes, what matters is not the duration, but rather the magnitude of the pain as the episode ended.  However, a potential confound with the cold water experiment is that we know that memory fades with time, and so perhaps evaluating the pain of an episode relies more on the ending because the memory of the early parts has faded.  Perhaps if the subjects were asked to remember the episode a few days later, they would not recall it the same way as a few minutes after the end of the episode.  Was this temporal decay the reason for the seemingly illogical choice?  To test for this, Kahneman and colleagues performed a new experiment.

The perceived pain of a medical procedure

Redelmeier and Kahneman (1996) asked patients who were undergoing colonoscopy (n=154) or lithotripsy (a procedure to destroy hardened masses, n=133) to give an assessment of their pain by pointing to a scale at one minute intervals.  The colonoscopy lasted from 4 to 67 minutes, and the lithotripsy lasted from 18 to 51 minutes.  One hour after the procedure the patients were asked to judge the total amount of pain experienced using the same scale. 

To check for reliability of the evaluations, some of the patients were asked to recall the experience 6 months (colonoscopy) or 1 year (lithotripsy) later and again evaluate the total pain.  The retrospective ratings at 6 months and 1 year were correlated with the ratings at 1 hour (r=0.77 and r=0.54 for the two groups, respectively).  For the colonoscopy group the ratings at 6 months had the same mean as at 1 hour; for the lithotripsy group the average ratings at 1 year were 15% higher than at 1 hour. 

In the colonoscopy procedure the pain intensity was higher at start than at end, whereas in the lithotripsy procedure pain intensity was low in the first few minutes and ended higher. 

Having collected these data, the investigators asked what aspect of the painful experience was a predictor of the immediate ratings at 1 hour, or the follow-up ratings at 6 months or 1 year.  Duration of the procedure was not a predictor of the immediate or follow-up ratings.  Rather, peak pain was the most powerful predictor of both ratings (r=0.6 for each), and end pain was the second most powerful predictor (r=0.4 for each).  These correlations held for both of the procedures.  The combination of the two factors increased the correlations to about 0.67 and 0.65 for immediate and follow-up ratings.

So people’s impression of the relative pain they endured during an episode remained fairly consistent at 1 hour and at many months after the episode.  Their impressions were predicted by two aspects of their actual experience: magnitude of the peak of the pain, and magnitude of the end pain.  Duration of the episode played little or no role.  

When we remember a painful episode, the most salient aspects of that episode seem to be the peak of the pain, and how it ended.  To improve our perception of a difficult episode, it may be more beneficial to prolong it and gradually reduce the pain, rather than shorten it and abruptly end the pain. 

Perception of effort

This idea of peak-end perception of pain may help us understand a rather puzzling result in the field of motor control.  One of the fundamental questions in motor control is how the brain evaluates effort.  The variables of interest are force and time, and the question is with regard to our perception of effort as a function of these variables.

In 2004, Konrad Kording, Daniel Wolpert, and colleagues performed an experiment in which volunteers held a robotic arm and experienced a sinusoidal-like force profile of peak F and duration T.   Next, they experienced another force pattern of peak F’ and duration T’.  They then asked their volunteers which force they would like to experience again.  They were told that they should choose the force that required the least effort.  In this way, the investigators estimated indifference curves, i.e., curves along which the subjects were indifferent to changes in peak force and duration.

The rather unexpected result was that as the duration of a force pattern increased (beyond about 200ms), the indifference curve also increased.  This means that given a choice between some peak force and short duration, vs. the same peak force and longer duration, the subjects picked the longer duration!  

How could a longer duration of an effortful task be preferable to a shorter duration?

A close look at how the force patterns were produced provides a possible answer.  The forces were sinusoidal with a period that depended on T.  So as the duration increased, the rate at which the force changed decreased.  This means that for a longer duration force, the forces gradually came to an end, whereas for a short duration force, the forces rapidly came to an end.  People preferred the gradually ending force, despite the fact that they would be producing the forces for a longer amount of time.

The peak-end hypothesis of pain perception may have relevance to how the brain measures effort.

Acknowledgements: I am grateful to Alaa Ahmed of the University of Colorado for discussions regarding these ideas.

References

Kahneman D, Fredrickson BL, Schreiber CA, and Redelmeier DA (1993) When more pain is preferred to less: adding a better end. Psychological Science 4:401-405.
Kording KP, Fukunaga I, Howard IS, Ingram JN, and Wolpert DM (2004) A neuroeconomics approach to inferring utility functions in sensorimotor control.  PLoS Biology 2:e330.
Redelmeier DA, and Kahneman D (1996) Patients’ memories of painful medical treatments: real-time and retrospective evaluations of two minimally invasive procedures.  Pain 66:3-8.

Monday, January 21, 2013

Why are gun rights proponents more politically active?


In January of 2013, about a month after the horrific shootings of children in Newtown, Connecticut, the Pew Research Center released a survey of gun-related political leanings of people in America.  They first asked the respondents to identify themselves as either gun rights proponents, or gun control proponents.  They then asked the respondents questions about their political activity: did they contribute money to organizations that took a position on gun policy?  Had they contacted a public official to express an opinion on gun policy?  Had they signed a petition on gun policy?  Etc.  The results indicated that those who prioritized gun rights were 1.7 times more likely to have been politically active (i.e., participated in one or more of these activities) than those who prioritized gun control.  Why should gun rights advocates be almost twice as likely to be politically active as gun control advocates?

To understand this behavior, it is useful to consider how the human brain makes choices when faced with gains and losses. 

In 1990, Kahneman and colleagues performed an experiment in which they selected some participants and gave them a coffee mug as a gift.  They then asked them to name the minimum price at which they would be willing to sell the mug.  These participants asked for about $7.  They then took another group of participants, showed them the same mug, and asked how much they would be willing to pay to own it.  This group offered around $3.  Knetsch (1989) found that people who are given a chocolate bar want $1.83 to sell it, but will pay only $0.90 to buy it.  The difference in the two prices is explained by loss aversion: the sellers evaluate the choice of giving up something that they already own by viewing it as a psychological loss.  In order to compensate for that loss, they request a lot of money.  Buyers, on the other hand, evaluate the choice as a psychological gain.  They are willing to pay much less for the pleasure that they perceive in owning it. 

In general, the pleasure that you feel if someone were to give you an item tends to be much less than the pain you feel if you were to own that item and then lose it.  This is called the endowment effect. 

Carmon and Ariely (2000) explain this behavior by suggesting that when faced with loss of something (e.g., selling), people focus on their sentiment toward surrendering the item (and not the money that they are gaining), whereas when faced with gain of something (e.g., buying), people focus on their sentiment toward what they forgo (typically money, and not the item they are gaining).

Now let us consider the question of why gun rights proponents are more politically active than gun control proponents.  The current political climate is one in which the President and the Congress are considering laws that would limit gun rights.  This is viewed as a loss to gun rights proponents.  In contrast, the same laws are viewed as a gain for gun control advocates. 

The gun rights proponents (but not the gun control proponents) are under the influence of the endowment effect because if the proposed laws are enacted, it would result in a loss of what they already ‘own’.  For them, the proposed laws carry a negative psychological value.  If we could generalize from behavioral economics literature, we would speculate that this negative value is about twice as large as the positive psychological value that would be gained from the perspective of gun control proponents.  This may be the reason why the gun rights proponents are about twice as likely to be politically active as the gun control proponents. 

The deeper idea is that any change from the status quo will meet with much stronger resistance by those who view the change as a loss, as compared to the enthusiasm that it fosters in those who view the change as a gain.

References
Carmon Z and Ariely D (2000) Focusing on the forgone: How value can appear so different to buyers and sellers.  Journal of Consumer Research 27:360-370.
Kahneman D, Knetsch J, and Thaler R (1990) Experimental tests of the endowment effect and the Coase theorem.  Journal of Political Economy 98:1325-1348.
Knetsch J. (1989) The endowment effect and evidence for nonreversible indifference curves. American Economic Review 79:1277-1284.

Thursday, January 10, 2013

How to find an outlier


How do we know when a data point is an outlier?  Take a look at the figure below.  It represents 15 data points that were gathered in some experiment.  Would you say that the left-most point is an outlier? 


Maybe the instrument that collected this data point had a malfunction, or maybe the subject that produced that data did not follow the instructions.  If we have no other information than the data, how would we decide?

When we say a data point is an outlier, we are saying that it is unlikely that it was generated by the same process that generated the rest of our data.  For example, if we assume that our data was generated by a random process with a Gaussian distribution, then there is only about a 0.13% chance that we would collect a data point that is more than 3 standard deviations above the mean (and the same chance for one that far below it).  So what we need to do is try to estimate the standard deviation of the underlying process that generated the data.  Here I will review two approaches, and then show how successful they are in labeling outliers.

Median Absolute Deviation (MAD)                                          
Hampel (1974) suggested that we begin with finding the median of the data set.  


Next, we make a new data set consisting of the distance (this is a positive number) between each data point and the median.  Finally, we find the median of the new data set.  That is, we compute the following:

MAD = b median( abs(x – median(x) ) )

If we set b=1.4826, then MAD is an estimate of the standard deviation of our data set, assuming that the true underlying data came from a Gaussian distribution.  For our data set above, here is the estimate of the standard deviation, centered on the median:



Based on MAD estimate of the standard deviation, we would say that the left-most data point is indeed more than 3 estimated standard deviations (MADs) from our estimate of the mean (the median). 

So a typical approach is to label as ‘outlier’ a data point that is farther than 3 times the MAD (the estimated standard deviation) from the median of the data.  That is, compute the following for each data point:

abs(x – median(x) ) / MAD

Label as ‘outlier’ the data points for which this measure gives you a number greater than 3.   But how good is this method?  To check it, I did the following experiment.  I generated data sets drawn from a normal distribution with a constant mean and standard deviation, and then computed the probability of a false positive, that is, I computed how likely it was that a point would be labeled as outlier by MAD, when in fact it was less than 3 standard deviations from the mean.  Here is the resulting probability, plotted as a function of the data size:


The above plot shows that when the data set is small (say 10 data points), about 20% of the data points that the algorithm picks as outliers are in fact within 3 standard deviations of the mean.  As the data set grows larger, the probability of false positives declines and the algorithm does better.  But even for a data set of size 20, there is a better than 15% chance that the bad data point is in fact not bad.
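The MAD rule can be sketched in a few lines of Python.  This is a minimal illustration rather than the code behind the plot: the function names and the sample values are made up, and it assumes the data are not so tied that the MAD is zero (which would cause a division by zero).

```python
from statistics import median

B = 1.4826  # scale factor from the text, for Gaussian data

def mad_sigma(data):
    """Estimate the standard deviation as b * median(|x - median(x)|)."""
    center = median(data)
    return B * median([abs(x - center) for x in data])

def mad_outliers(data, threshold=3.0):
    """Return the points farther than `threshold` MADs from the median."""
    center = median(data)
    sigma = mad_sigma(data)
    return [x for x in data if abs(x - center) / sigma > threshold]

# Made-up sample with one suspicious point on the left
sample = [2.0, 9.1, 9.4, 9.8, 10.0, 10.1, 10.3, 10.6, 11.0]
print(mad_sigma(sample))
print(mad_outliers(sample))
```

For this sample the median is 10.0 and the only point flagged is 2.0, which sits about 9 estimated standard deviations from the median.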


Median Deviation of the Medians (MDM)

Rousseeuw and Croux (1993) suggested a method that, as we will see, is better.  For each data point xi, we find the distance to all other data points and find the resulting median.  We do this for all data points and we get n medians.  Now we find the median of this new data set:

MDM = c median_i( median_j( abs(xi – xj) ) )

If we set c=1.1926, then MDM is a robust estimate of the standard deviation of the data set, assuming that the true underlying data came from a Gaussian distribution.  For our data set above, here is the estimate of the standard deviation:


To check how this method compares with MAD, I generated data sets drawn from a normal distribution with a constant mean and standard deviation, and then computed the probability of a false positive, that is, I computed how likely it was that a point would be labeled as outlier by MDM, when in fact it was less than 3 standard deviations from the mean.  Here is the resulting probability, plotted as a function of the data size:


The above plot shows that regardless of the size of the data (here ranging from 6 data points to 20), a data point that MDM labels as an outlier has about a 9% chance of being a false positive, i.e., not an outlier.  For small data sets, MDM is two to three times better than MAD.
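Here is a rough Python sketch of the MDM estimate, together with a small simulation in the spirit of the experiment described above.  Several details are my assumptions and not taken from the original analysis: the inner median excludes the point itself, points are flagged when they lie more than 3 MDMs from the median, the simulated data are standard normal, and the function names, trial count, and seed are arbitrary.

```python
import random
from statistics import median

C = 1.1926  # scale factor from the text, for Gaussian data

def mdm_sigma(data):
    """Estimate the standard deviation as c * median_i(median_j |xi - xj|)."""
    inner = []
    for i, xi in enumerate(data):
        # Distances from point i to every other point (j != i is an assumption)
        dists = [abs(xi - xj) for j, xj in enumerate(data) if j != i]
        inner.append(median(dists))
    return C * median(inner)

def false_positive_rate(n, trials=2000, seed=0):
    """Fraction of flagged points that are truly within 3 SD of the mean."""
    rng = random.Random(seed)
    flagged = 0
    false_pos = 0
    for _ in range(trials):
        data = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true mean 0, SD 1
        center = median(data)
        sigma = mdm_sigma(data)
        for x in data:
            if abs(x - center) / sigma > 3.0:
                flagged += 1
                if abs(x) < 3.0:  # labeled an outlier, but actually ordinary
                    false_pos += 1
    return false_pos / flagged if flagged else float("nan")

print(mdm_sigma([1, 2, 3, 4, 100]))
print(false_positive_rate(10))
```

The simulated rate depends on the assumptions listed above, so it should not be expected to reproduce the plotted values exactly.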

References
Hampel FR (1974) The influence curve and its role in robust estimation. Journal of the American Statistical Association 69:383-393.
Rousseeuw PJ, Croux C (1993) Alternatives to the median absolute deviation. Journal of the American Statistical Association 88:1273-1283.


Saturday, December 22, 2012

Tehran cemetery

The transition from the West to the Middle East begins at a European airport.  In the waiting area for the flight to Tehran, a young woman stands up and holds her hands close to her face, appearing to be reading a small, imaginary book.  She bows her head slightly, and then kneels to the ground, bringing her head down to the cold stone floor.   The ritual lasts no more than a couple of minutes, and after she finishes, she simply sits down and carries on a conversation with her fellow traveler.  But then someone else, over at the corner of the room, stands up and starts the ritual, facing exactly the same direction, holding up the same imaginary book.  

It’s time for afternoon prayers, and the devout do not need anything other than their faith to perform it.

Tehran is a densely populated, sprawling city that sits on the edge of a mountain range.  In the winter, when the westerly wind blows away the brown smog, the city is spectacular: a pearl necklace of snow-covered, jagged granite rises up toward the clouds, seemingly a few feet away from the tall apartments on the northern edge of the city.

The only reasonable method of transportation is the metro: a clean, modern system that is slowly growing.  At the last stop on the southernmost point of Line 1, something delightful awaits the traveler.  As you near the exit turnstiles, a few people are standing with baskets or boxes full of cookies or sweets, giving them away for free.  They wait until all their food is gone before they go in to catch the train.  This is the stop for Beheshte-Zahra, the city’s main cemetery for its millions of inhabitants.  In the Iranian tradition, visiting your loved ones at the cemetery is a ritual, filled with compassion and giving to strangers; you offer food to a stranger, and silently ask for a prayer for your departed. 

The cemetery itself is a checkerboard of graves, marked only with black rectangular granite or white marble tombstones, lying flat on the ground, with the face of the departed chiseled in the stone.  The tombstones are works of art commissioned by ordinary people, each piece using calligraphy to describe the departed, often including a few tearful lines of poetry to measure the loss.

Many of the tombstones have only the top half marked, leaving one half unfinished.  Here is a wife or a husband, awaiting their mate.

People are grieving, of course, but there is a sense of shared pain, as all have brought something to give, making a friend for a moment, receiving a smile, a nod of the head, a few words of comfort.  Some even bring small stoves and make traditional soups (in winter) near the grave that they have come to visit.  You see the elderly woman making the soup, and the young boy walking with small bowls and spoons, offering it to strangers.


Friday, November 23, 2012

Choosing sides


On Thanksgiving, after the turkey, homemade fruit salad, mango salsa, cheesecake, and blackberry pie, about fourteen of us, two families, head to an indoor volleyball court.  The young ones, from 12 to 20 something, take off early to buy a volleyball, which turns out to be a little difficult.  The older ones, the two dads (me and my friend), join them a little later. 

On the court, after a couple of hours of fun where teams are randomly put together, we come to the final game, where the two dads pick their teams.  Your competitive nature takes over your brain, and you pick the best player among the ones waiting to be called, not thinking that these are your kids, and you are choosing sides.  

On game point, my friend’s daughter, who is on my team, serves a good ball, but they return it strong, and our back center puts it in the net.  We all think that the point is over, but my son digs it out of the net, and another of my friend’s daughters puts it over for a win. 

Explosion of laughter and cheering.                                                                                  

My friend’s daughter yells out to him: "You should have picked me, Dad!"