I have just finished a wonderful book, Reading in the Brain: The New Science of How We Read by Stanislas Dehaene, which ostensibly is about the origins and the neural underpinnings of the human ability to read. Dehaene not only takes on the conventional wisdom that reading is a sequential, letter-by-letter process, but he also unwittingly sheds light on the foundations of many universally accepted usability practices.
The big picture
The major takeaway from the book is that, contrary to conventional wisdom, reading is not like a standard string search algorithm that one might program into a computer (e.g. to look up the word “time”, one might start with all words that begin with t, narrow down to words containing i as the second letter, and so on until there is only one word left).
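To make that conventional picture concrete, here is a minimal sketch of the letter-by-letter narrowing described above. The function name and the tiny word list are my own inventions for illustration; nothing here comes from the book.

```python
def sequential_lookup(target, lexicon):
    """Narrow the candidate words one letter position at a time, left to right.

    Returns the list of surviving candidates after each letter is checked.
    """
    candidates = list(lexicon)
    steps = []
    for position, letter in enumerate(target):
        candidates = [
            word for word in candidates
            if len(word) > position and word[position] == letter
        ]
        steps.append(candidates)
    return steps


# Hypothetical mini-lexicon, chosen only so the narrowing is easy to follow.
lexicon = ["time", "tame", "tide", "dime", "lime", "tile"]
steps = sequential_lookup("time", lexicon)
for position, survivors in enumerate(steps, start=1):
    print(f"after letter {position}: {survivors}")
```

Each pass depends on the one before it, so nothing about the third letter can be used until the first two have been checked. That strict ordering is the assumption the book argues against.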
Instead, reading is a massively parallel and hierarchical process whereby neurons at the bottom of the pyramid are allowed to vote “yes” or “no” to a neuron one level up the hierarchy. This aggregation neuron tallies the votes of all neurons that report to it, dutifully passing its findings further up the pyramid.
So how does this work in practice? Let’s take our example of the word “time”.
At the very bottom of the pyramid are neurons which respond to very basic elements in an image: vertical lines, horizontal lines, diagonal lines, intersections, right angles, etc. For our ‘t’ in “time”, there is perhaps a neuron somewhere which detects the intersection of the vertical line and horizontal line in the letter ‘t’. This neuron then proceeds to vote “yes” in favor of every possible letter that contains this intersection. It also votes “no” for letters which do not contain this intersection, like ‘s’.
The aggregation neuron in charge of the letter ‘t’ receives votes from below, tallies them, and then sends a vote to the next level in the pyramid that the word probably contains a ‘t’. Meanwhile, this higher neuron is simultaneously receiving “no” votes for letters like ‘c’, ‘y’, and ‘u’. Eventually, the tallied votes for ‘t’, ‘i’, ‘m’, and ‘e’ win the election, and the chaotic process very quickly converges on the right word – somewhere near the top of the pyramid.
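The voting pyramid can be sketched as a toy program. This is only my rough reading of the idea, not a model from the book: the feature table, the word list, and the +1/-1 scoring are all assumptions I made up for illustration, and the loops run one after another only because this is ordinary Python, whereas the brain does the tallying in parallel.

```python
# Toy feature table: an assumption for illustration, not taken from the book.
LETTER_FEATURES = {
    "t": {"vertical", "horizontal", "crossing"},
    "i": {"vertical", "dot"},
    "m": {"vertical", "arch"},
    "e": {"horizontal", "curve"},
    "d": {"vertical", "bowl"},
    "l": {"vertical"},
    "s": {"curve"},
}

# Hypothetical mini-lexicon of word "neurons".
LEXICON = ["time", "tide", "tile", "lime", "dime"]


def letter_votes(detected_features):
    """Letter level: every detected feature votes +1 ("yes") for letters
    that contain it and -1 ("no") for letters that do not."""
    return {
        letter: sum(1 if f in features else -1 for f in detected_features)
        for letter, features in LETTER_FEATURES.items()
    }


def word_votes(features_by_position, lexicon):
    """Word level: each word adds up the tallies of its own letters."""
    tallies = [letter_votes(detected) for detected in features_by_position]
    return {
        word: sum(tallies[pos][letter] for pos, letter in enumerate(word))
        for word in lexicon
        if len(word) == len(tallies)
    }


# Simulate the bottom layer: the features "seen" at each position of "time".
seen = [LETTER_FEATURES[ch] for ch in "time"]
scores = word_votes(seen, LEXICON)
winner = max(scores, key=scores.get)
print(scores)
print("winner:", winner)
```

Notice that no step ever asks "is the first letter a t?". Every candidate word collects votes for every position at once, and near misses like "tide" still score well, which hints at why the real system can tolerate a smudged or missing letter.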
What this means for usability
I am certainly not suggesting that these new findings will turn the field of usability on its head. However, they do shed some light on some long-held usability best practices.
The brain’s visual system is biased to the right of one’s focus
Most of the text processing capabilities of the brain reside in the left hemisphere. This means that visual stimuli appearing on the right side of the visual field have a decided advantage.
In fact, information on the left side of the visual field has to enter the right hemisphere of the brain and be transferred through the corpus callosum to the left hemisphere – through two centimeters of callosal cable. As a result, words on the left side of the visual field are recognized more slowly and are subject to more errors.
In the English-speaking world, our left-to-right reading style seems to be well adapted to this limitation. However, I think this idea could be generalized to more than just words.
It seems to follow that not only are pages optimally processed from left to right, but that actions regarding the current area of focus are best placed on the right. This approach could certainly apply to context menus, action buttons, and error states.
Word familiarity matters, length doesn’t
Because of the massively parallel operation of the brain’s neural network, the brain processes all of the letters in a word at once, and common words receive stronger votes in the voting process.
This finding holds for words of up to eight letters; beyond that, presumably the eye has to move forward to process the rest of the letters. Therefore, it is often better to use a longer and more familiar word than a shorter and less common one.
Words are not recognized by their shape
The image from the retina is initially processed in what is called the “letterbox” area of the left hemisphere, and it turns out that this area processes uppercase and lowercase letters in parallel. Indeed, according to the book, the small difference in reading performance for the sentences “It was a dark and stormy night” and “It WaS a DaRk AnD sToRmY nIgHt” is due simply to the fact that the alternating capitals are less familiar than ordinary mixed case.
It is not immediately clear how far this finding extends into font choices. How unusual must a font be before a lack of familiarity slows reading speed? The book suggests that our text processing system is highly tolerant of errors, giving examples of situations where transposed letters, missing letters, or misshapen letters have no effect on reading performance.
So it seems usability best practices were right for the wrong reason. Yes, it is slightly slower to read all capital letters, but it is only because we see mixed case more often.
And what about images?
It turns out that the brain uses similar mechanisms to recognize images, and to some degree text processing is just a very specialized case of the brain’s natural ability to recognize shapes and patterns in a visual field.
The brain only stores a simplified version of images
The brain is not a video recorder, meticulously documenting every detail of an image.
In fact, it stores only a very abstract generalization of an image. For example, the same neurons that recognize a shape as a person will also fire when presented with a circle with a smaller circle placed on top. Likewise, an apple can be recognized either by a full representation or simply a circle with a line sticking out of it.
So what exactly serves as the essence of an image?
Just as the brain uses line intersections to differentiate letters (a ‘t’ shaped intersection versus an ‘L’ shaped one), it also uses intersections to store and represent images. These different intersection types, called proto-letters, are the basic “alphabet” of image recognition.
For instance, the ‘T’ intersection is one of the most common intersections found in nature, often when one object is placed in front of another.
Not only does the brain use these intersections to store objects, but it is almost impossible to store the image without them, as you can see in the following illustration:
Mirror perception simplifies image recognition
The human brain is hardwired for left-right invariance, but is sensitive to vertical orientation.
From the standpoint of evolution, it is easy to see why this might be so. As Dehaene states, “A Tiger’s left and right sides are equally threatening… But the beast is less of a threat upside down than right side up!”
The upshot of this is that despite what you might think, you probably have no idea of the orientation of everyday things. (Quick, does the Firefox icon flow clockwise or counterclockwise?)
It seems that any left-right distinctions that we do make (like left and right arrows) have to be specifically learned.
Recently viewed images are recalled faster
Researchers have found that images that have been seen before are recalled much more quickly than new images, an effect called image priming. This effect is seen even weeks after the original image was displayed.
As User Experience practitioners, we are always learning rules. Sometimes we are fortunate enough to know the ‘why’ behind these rules, but often we do not.
Especially now that eye tracking and A/B testing are widely used, it is becoming commonplace to improve engagement or conversion without touching the ‘why’ question at all.
But answering the ‘why’ question is important.
All of the A/B testing in the world will just find local maximums. Getting in the vicinity of the global maximum for a design is unfortunately still more of an art than a science. And it’s harder to get there when all we have is intuition, experience, and inference.
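Here is a small sketch of what I mean by a local maximum. It treats repeated A/B testing as simple hill climbing: test the current design against a slightly tweaked challenger and keep whichever converts better. The conversion curve, the "button width" variable, and the numbers are entirely made up for illustration.

```python
def conversions_per_thousand(button_width):
    """Hypothetical conversion curve: a small peak at width 20 and a
    higher peak at width 80, with a valley in between."""
    small_peak = 30 - abs(button_width - 20)
    big_peak = 60 - abs(button_width - 80)
    return max(small_peak, big_peak, 0)


def ab_test_climb(start, step=5):
    """Keep testing the current design against its two nearest variants
    and adopt a challenger only when it strictly beats the current one."""
    current = start
    while True:
        challengers = [current - step, current + step]
        best = max(challengers, key=conversions_per_thousand)
        if conversions_per_thousand(best) <= conversions_per_thousand(current):
            return current  # no small tweak helps any more
        current = best


stuck = ab_test_climb(10)
lucky = ab_test_climb(60)
print(stuck, conversions_per_thousand(stuck))
print(lucky, conversions_per_thousand(lucky))
```

Starting from a width of 10, the tests settle on 20 and stop, because every small change from there looks worse, even though the design at 80 converts twice as well. Only a starting point that is already near the better peak gets there, and choosing that starting point is the part that still rests on intuition.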
That’s why I welcome all of these nuggets we’re getting from neuroscience. Right now, it’s just a bunch of puzzle pieces, but I hope more discoveries from neuroscience will help us put the pieces in place.