Of Borges and Big Data, Or: Is Big Data Too Big?
…In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province. In time, those Unconscionable Maps no longer satisfied, and the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it. The following Generations, who were not so fond of the Study of Cartography as their Forebears had been, saw that that vast Map was Useless, and not without some Pitilessness was it, that they delivered it up to the Inclemencies of Sun and Winters. In the Deserts of the West, still today, there are Tattered Ruins of that Map, inhabited by Animals and Beggars; in all the Land there is no other Relic of the Disciplines of Geography.
— Suarez Miranda, Viajes de varones prudentes, Libro IV, Cap. XLV, Lerida, 1658
Jorge Luis Borges, Collected Fictions, translated by Andrew Hurley.
The above paragraph is the entire “fiction” by Borges entitled On Exactitude in Science. I was reminded of this piece when I read a recent GigaOm post positing (and I paraphrase here) that big data is too big: data is being generated and recorded faster than we can ever hope to analyze and understand it, so it is useless, and amassing more of it is a fool’s errand. The post concludes: “We . . . lack an understanding of how to automate the interpretation of data in an intelligent way, and will do so for the foreseeable future.”
In Borges’ fiction, the map that describes the entirety of the empire by becoming coextensive with the empire is recognized to be useless. What good is a map that is the size of the empire? (Can we distinguish the empire from the map? Reality from The Matrix?)
Big data is useless only if we cannot understand it, if we cannot extract knowledge, and even wisdom, from the data. Big data is not useless to us if we can analyze it as it is generated, if not faster. And we are now able to do so. Unsupervised learning allows us to direct our machines to learn without any predetermined perspective, even adding data as it is generated to the data sets being analyzed. The knowledge yielded by unsupervised deep learning can then be applied to new data, new situations, new patients. That is how artificial intelligence can aid in diagnosing and treating disease. We should no longer need to wait seventeen years for conclusions drawn in medical journals to become accepted clinical practice.
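For readers who want something concrete, here is a toy sketch of learning from unlabeled data as it arrives and then applying what was learned to a new case. It uses sequential (online) k-means, a much simpler technique than the deep learning discussed above, and everything in it is an assumption made for illustration: the class name OnlineKMeans, the two-number "readings," and the choice to seed the two cluster centers from the first two readings.

```python
def nearest(centroids, point):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((c - p) ** 2 for c, p in zip(centroids[i], point)),
    )


class OnlineKMeans:
    """Sequential k-means: each centroid is the running mean of its points."""

    def __init__(self, initial_centroids):
        self.centroids = [list(c) for c in initial_centroids]
        self.counts = [0] * len(self.centroids)

    def learn_one(self, point):
        # Assign the new point to its nearest centroid, then nudge that
        # centroid toward the point by 1/n (an incremental mean update).
        i = nearest(self.centroids, point)
        self.counts[i] += 1
        rate = 1.0 / self.counts[i]
        self.centroids[i] = [
            c + rate * (p - c) for c, p in zip(self.centroids[i], point)
        ]

    def predict(self, point):
        return nearest(self.centroids, point)


# Hypothetical stream of unlabeled readings drawn from two loose groups.
stream = []
for k in range(50):
    jitter = (k % 5) * 0.1
    stream.append((1.0 + jitter, 1.0 - jitter))
    stream.append((8.0 - jitter, 9.0 + jitter))

# Assumption: seed the centroids with the first two readings.
model = OnlineKMeans(initial_centroids=stream[:2])
for reading in stream:  # data is folded in one reading at a time
    model.learn_one(reading)

# Apply what was learned to readings the model has never seen.
print(model.predict((1.1, 0.9)), model.predict((7.9, 9.1)))  # prints: 0 1
```

No labels are supplied at any point; the groups emerge from the data, and each new reading updates the model without reprocessing the earlier ones. A real clinical system would of course need far richer models and validation than this sketch suggests.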
Google Brain is taking this approach to other problems, and may yet turn its attention to health care. One interesting note in the recent piece on Google Brain and AI in the NY Times is that Google developed a new chip to aid in its machine learning calculations that is capable of faster, but less precise, computation. We are more interested in the gestalt drawn from a massive volume of computations, which will be more generalizable, than in precise results from a handful, which are more liable to yield the familiar comment: “further study is needed.” These results, which are drawn from the application of machine learning algorithms in unsupervised learning, inform the development of ever more useful AI algorithms that can build on a mass of data that until recently would indeed have been effectively useless, just as the laboriously detailed map, coextensive with the empire, is ultimately useless and is properly abandoned.
While the map lies in tatters, the pace of change accelerates; automation of data interpretation is just around the corner.
This article was originally published on HealthBlawg and is republished here with permission.