Uncharted: Big Data As A Lens On Human Culture

“One of the most exciting developments from the world of ideas in decades, presented with panache by two frighteningly brilliant, endearingly unpretentious, and endlessly creative young scientists.” – Steven Pinker, author of The Better Angels of Our Nature
Our society has gone from writing snippets of information by hand to generating a vast flood of 1s and 0s that record almost every aspect of our lives: who we know, what we do, where we go, what we buy, and who we love. This year, the world will generate 5 zettabytes of data. (That’s a five with twenty-one zeros after it.) Big data is revolutionizing the sciences, transforming the humanities, and renegotiating the boundary between industry and the ivory tower. What is emerging is a new way of understanding our world, our past, and possibly, our future.
In Uncharted, Erez Aiden and Jean-Baptiste Michel tell the story of how they tapped into this sea of information to create a new kind of telescope: a tool that, instead of uncovering the motions of distant stars, charts trends in human history across the centuries. By teaming up with Google, they were able to analyze the text of millions of books. The result was a new field of research and a scientific tool, the Google Ngram Viewer, so groundbreaking that its public release made the front page of The New York Times, The Wall Street Journal, and The Boston Globe, and so addictive that Mother Jones called it “the greatest timewaster in the history of the internet.” Using this scope, Aiden and Michel, along with millions of users worldwide, are beginning to see answers to a dizzying array of once intractable questions. How quickly does technology spread? Do we talk less about God today? When did people start “having sex” instead of “making love”? At what age do the most famous people become famous? How fast does grammar change? Which writers had their works most effectively censored by the Nazis? When did the spelling “donut” start replacing the venerable “doughnut”? Can we predict the future of human history? Who is better known: Bill Clinton or the rutabaga?
All over the world, new scopes are popping up, using big data to quantify the human experience at the grandest scales possible. Yet dangers lurk in this ocean of 1s and 0s: threats to privacy and the specter of ubiquitous government surveillance. Aiden and Michel take readers on a voyage through these uncharted waters.

I was lucky enough to read Aiden & Michel's original study, "Quantitative Analysis of Culture Using Millions of Digitized Books," when it appeared in Science on 14 January 2011. It was an astonishing piece of scholarship, one of the rare papers that divides an entire branch of human learning into "before" and "after." I felt the hair on the back of my neck rise as I read it. In essence, they mined the Google Books database to answer concrete questions about linguistics, culture, politics, even topics such as the nature of fame and the pace of propagation of new technologies. It was a tour de force.
The title of this book, "Big Data as a Lens on Human Culture," suggests that it will be a general text on Big Data, but it is not. It covers only this body of work by these two researchers and their assistants.
The book repeats the contents of that 2011 article, explaining the results for the general public and adding some discussion of the origins of the work and the researchers' thoughts about the future. In the process, they expand the original piece, which was about six pages long excluding notes, to about 220 pages. Some of the new material is fun; I got a kick out of the story about a romance novel that had been alphabetized and the information that could still be gleaned from it. Other parts seem like padding; who cares about the history of lexical concordances?
It's a shame that Aiden & Michel wrote this book themselves; the same material coming from a third party would not have seemed so self-congratulatory and, sometimes, smug.

Uncharted: Big Data as a Lens on Human Culture is a fun look at a pretty amazing research project. Starting as graduate students, authors Erez Aiden and Jean-Baptiste Michel wanted to use big data to answer interesting questions. What started out as a simple research question ended up jump-starting the authors' careers and an entirely new way to look at big data.
They came up with an idea to make a tool that could query Google's digitized library in order to determine word frequencies. Using the tool they invented, called the Google Ngram Viewer, they have been able to answer interesting questions that relate to word frequencies, explore how language changes over time, assess the adoption of new technology, assess fame, and conjecture as to how the answers to the questions they pose reflect on the prevailing culture.
Although the idea is simple in concept, it wasn't so simple in execution. They had to wiggle their way into the Googleverse to get permission to use the database, write a lot of code, and iron out certain legal/copyright problems. But once all this was done, the magic began.
I won't go into detail about their findings, but suffice it to say, they not only created the Ngram Viewer but used it intelligently to come to some very interesting (and often humorous) conclusions. Their analogy of the Ngram Viewer as a modern equivalent of Galileo's telescope is an apt one. Without the telescope, Galileo couldn't have made some of his most important astronomical observations. Without the Ngram Viewer, it would be nearly impossible to look at things like the transformation of irregular verbs over time or to get a good idea of when writers really started to refer to The United States in the singular (the results are surprising).

Two young research scientists from Harvard University, Erez Aiden and Jean-Baptiste Michel, teamed up with Google in 2010 to create the Ngram Viewer. It sifts through millions of digitized books and charts the frequency with which words have been used. On the day that the Ngram Viewer debuted, more than one million queries were run through it. Some consider it to be at the center of a major revolution.
In an interview with Studio 360's Kurt Andersen, Aiden and Michel said how pleased they are that the new technology can open up academic research to the "independently curious."
"It's good that a tool that's at the leading edge of science can generate so much enthusiasm in the general public." Michel cautions, however, "it's inevitable that a tool like that will generate a large number of discussions that are actually irrelevant or that are flat-out wrong . . . it's still important that bona fide experts are the ones interpreting the research." [1]
In their new book Uncharted: Big Data as a Lens on Human Culture, however, they are nowhere near so humble about the so-called "big data revolution," nor are they convinced about the value of "bona fide experts."
"At its core, this big data revolution is about how humans create and preserve a historical record of their activities. Its consequences will transform how we look at ourselves. It will enable the creation of new scopes that make it possible for our society to more effectively probe its own nature. Big data is going to change the humanities, transform the social sciences, and renegotiate the relationship between the world of commerce and the ivory tower.

