The Data Doctrine: More’s Law

Last Updated: March 22, 2018, 14:47 IST

As data gathering goes into overdrive, India could lead the quest to make the whole greater than the sum of bits and bytes

ONE OF MY best memories of my dozen-year career at Intel was being given a coffee table book, One Digital Day (1998), by Rick Smolan in collaboration with Intel. I had my then bosses Andy Grove, Gordon Moore and Craig Barrett sign the inside cover with comments. In his usual style, Andy demanded his 'moving pictures within bound covers' with a smiley. That was the true Andy style of encouraging you not to rest on your laurels but to always push forward.

In this book, we explored a world that ranged from a soldier seeing his newborn via video conferencing and conservation biologists in South Africa studying data from cheetahs implanted with microchips, all the way to Singaporeans having their favourite fruit, durian, delivered home instead of carrying it in a car where its smell would linger for days. It was the decade in which analog data started becoming digital. I remember a meeting where top Hollywood directors stormed out saying they would never make their content digital as it made it easy to copy. Not too long after that, content did go digital and DVDs came into being.

What we call 'Big Data' is the accumulation of all the digital data that we have been collecting. To understand its true potential, one needs to grasp its evolution. Gordon Moore famously said, "If the auto industry advanced as rapidly as the semiconductor industry, a Rolls Royce would get half a million miles per gallon, and it would be cheaper to throw it away than to park it." True to his prediction, the performance of microprocessors increased, prices fell, server farms sprouted with sumptuous storage space and the analog content that became digital found a permanent home in cyberspace.

Fast forward to the first decade of the new millennium: Google, Facebook, YouTube and Netflix became household names, and Marc Andreessen introduced us to the concept of cloud computing. With all these developments, every byte that travelled across cyberspace had an eternal presence hovering over us. Billions of photos, videos, personal status updates, professional dealings, profits and losses moved from paper to the personal computer. Rooms of physical paper file storage transformed into bits and bytes of cyber storage. Over the last two decades, a lot of data was collected, collated and even discarded at regular intervals.

Beyond the corridors of computer companies, the world took notice of the value of large data for the first time when human genome sequencing was completed in 2003. The realisation of the importance of data with a purpose, combined with higher levels of cyber security and cloud computing, enabled researchers from various disciplines to do something unique. They could strip large amounts of data of personal details and study it to observe patterns.
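
For readers who like to see the idea in code, here is a minimal sketch of what stripping personal details and then looking for patterns can mean. The field names and sample records are made up for illustration, and simply dropping identifying fields is only a first step; real anonymisation needs far more care.

```python
from collections import Counter

# Hypothetical field names, assumed for illustration; real datasets differ.
PERSONAL_FIELDS = {"name", "phone", "address"}


def strip_personal_details(record):
    """Return a copy of the record without the directly identifying fields."""
    return {key: value for key, value in record.items()
            if key not in PERSONAL_FIELDS}


def count_by(records, field):
    """Count how often each value of `field` occurs across the records."""
    return Counter(record[field] for record in records)


# Made-up sample records, not real data.
records = [
    {"name": "Person A", "phone": "000-0001", "city": "Pune", "condition": "asthma"},
    {"name": "Person B", "phone": "000-0002", "city": "Pune", "condition": "asthma"},
    {"name": "Person C", "phone": "000-0003", "city": "Kochi", "condition": "diabetes"},
]

anonymised = [strip_personal_details(record) for record in records]
pattern = count_by(anonymised, "condition")
```

Here `pattern` holds the count of each condition across the stripped records, which is the kind of aggregate a researcher could study without ever seeing a name or phone number.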

Worldwide Big Data and business analytics software revenues are expected to increase from nearly $122 billion in 2015 to over $187 billion in 2019, according to marketing strategy firm Fourquadrant. India has a huge opportunity to leapfrog the internet economy and move into the Big Data economy to claim a leadership role. Talent in India can dream up solutions to solve global problems and provide data analytics services to the world just as it did with software services, while Indian industries could use it to improve their own performance.
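
To put those two revenue figures in perspective, the standard compound annual growth rate formula can be applied to them. The sketch below uses only the numbers quoted above; the function name is my own.

```python
def compound_annual_growth_rate(start_value, end_value, years):
    """Standard CAGR formula: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted above: about $122 billion in 2015, about $187 billion in 2019.
growth = compound_annual_growth_rate(122, 187, 2019 - 2015)
print(f"Implied annual growth: {growth:.1%}")
```

The implied growth works out to a little over 11 per cent a year, compounded over four years.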

India has taken the lead in solving global problems before. By way of an example, consider this story. In 2004, a young engineer who moved from the US to India to be part of Google India's founding team had an idea. At that time, most of the world's map data was approximate at best. Map makers in most countries managed to plot a few cities and towns on the map, but there were no maps for streets within cities, let alone villages. Most of the mapping was done in what was deemed the 'developed world' of the US and parts of Europe, where the map data was gathered by professional surveyors and expensive GPS trucks.

Lalitesh Katragadda of Google spearheaded a programme called Google Mapmaker that took on all the possible challenges to solve the problem. The basic idea was simple. Around the world, over a billion phones were profuse generators of data. Of the thousands of people who lived on any street, at least one would be willing to map the neighbourhood if shown how. Could they be inspired to help create maps for those who had none? In my conversation with Lalitesh, I understood the power of this goal and the complexity of the problem. To map the unmapped world, the project had neither the money nor the time to hire mapping companies and involve huge government agencies. The plan was to enable netizens to become mappers.

It was an audacious effort to create and edit a large map that would be accurate and trusted by people across the countries of the so-called 'emerging' world. Lalitesh said that he wanted to build "a global product out of India that was for the other five billion who perpetually live in information darkness". It was one of the first experiments that brought big data into the public domain. There were four major challenges that had to be dealt with:

1) Collection of data. For most large data gathering exercises (say, a census of individuals), we can collect separate pieces of data and analyse them; relational databases store this information, which can then be sorted the way we want to study it. A map, however, is one large integral thing. One mistake made in mapping one part of the country could ripple across the entire map.

2) Moving from Experts to Everyone. When millions of people upload data, how do you make sure that they are drawing the right thing? None of these contributors were professional GIS specialists, who until then were believed to be the only people qualified, after lengthy training and under supervision, to make maps.

3) Ease of Using the Tool. The interface used by folks from all walks of life had to be simple. It also had to let multiple people edit the map data at the same time.

4) Authentication of Data. How do you ensure that those uploading data are not making up towns and cities? At that time, crowd-sourced data had an accuracy of 40-60 per cent, but the bar set for Mapmaker was 97 per cent. With hundreds of thousands of people at work, accuracy was a challenge.
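
The article does not describe how Mapmaker actually authenticated its data. Purely as an illustration of the fourth challenge, here is a minimal sketch of one common approach to crowd-sourced quality control: accept a value only when enough independent contributors agree on it. The function name, thresholds and sample street names are all assumptions of mine, not details of the real system.

```python
from collections import Counter


def accept_by_consensus(reports, min_reports=3, min_agreement=0.75):
    """Return the most common report if enough contributors agree, else None.

    Hypothetical rule: at least `min_reports` submissions are required, and
    the most common value must account for at least `min_agreement` of them.
    """
    if len(reports) < min_reports:
        return None
    value, count = Counter(reports).most_common(1)[0]
    if count / len(reports) >= min_agreement:
        return value
    return None


# Made-up submissions for the name of a single street.
agreed = accept_by_consensus(["MG Road", "MG Road", "MG Road", "M G Rd"])
disputed = accept_by_consensus(["Lake Road", "Tank Road", "Lake Road"])
too_few = accept_by_consensus(["Temple Street", "Temple Street"])
```

In this sketch `agreed` is accepted because three of four reports match, while `disputed` and `too_few` are held back for further review. Raising the thresholds trades speed of publication for accuracy, which is the tension the 97 per cent bar implies.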

Lalitesh and another engineer, Sanjay Jain, took up the task on an empty floor in Google's Bengaluru office and slowly expanded their team. Ordinary individuals from across South and South East Asia edited the map using a tool powered by NoSQL databases, computational geometry, machine learning and creative UX. Mapmaker was launched in these two regions in 2008, made its way across 150 countries and then finally landed in Silicon Valley in 2012. This was the first time such a technology product was developed in India, launched in the 'emerging' world first and only then taken to the 'developed' world.

The smallest bylanes in the most remote villages found their identity on the map alongside streets of large cities. When major floods hit Pakistan in 2010, maps uploaded by people from remote places were used as a guide for rescue operations. Mapmaker data was uploaded into Google Maps, making it available to a global audience. In the initial stages, it took four to six months for a street uploaded into Mapmaker to be authenticated and put on Google Maps. Soon, this time reduced to nine seconds. Mapmaker has since been shut down and its features moved into Google Maps, which has become ubiquitous. It remains one of the most robust and accurate crowd-sourced large-data projects, allowing us to walk into any village in any country confident of finding our destination. Behind it lie many smaller stories—of a person who drove thousands of kilometres on his motorcycle to map the most remote areas, for example, and of others working against the odds.

The true purpose of any data is its ability to tell a story, make a prediction, peek into a possible future and ultimately empower its users. The consumer products industry has been the master of analysing data collected through focus groups, surveys and savvy minds that understood behavioural psychology. Companies would know our wants and needs even before we ourselves knew them and present their products in such a way that we felt compelled to purchase them. Now, imagine what happens when we combine the analytical capability of such a company with large amounts of reliable data and open up applications that go beyond enticing consumers to buy products. We can make predictions of disease, natural calamities and climate change as well as professional choices. Here are a few Indian examples.

BELONG.CO, FOUNDED by Vijay Sharma, Rishabh Kaul, Sudheendra Chilappagari and Saiteja Veera, is a new kind of recruitment firm. Using predictive analysis and machine learning, their platform is able to assist employers in hiring individuals who are best suited for the jobs at hand. Their algorithm looks for people whose interests and skills match the company's needs by studying their behaviour on the web, and can approach candidates even before they themselves know they are looking for a change.

Sunita Maheshwari of Telerad RX/DX, a healthcare services firm, told me that for years they would routinely delete X-rays from their hard disks to save space. Then they started working with an Artificial Intelligence company that wanted as much data as possible to be able to detect patterns. Pattern detection of this kind allows a radiologist to identify the problem area on a digital image more quickly. It also lets big-city doctors reach out and connect with persons in Tier 2/3 cities, run tests using the internet (with no other special equipment), coach local nurses or compounders on taking care of patients, and then feed the data on patients and their results into the AI system, anonymously, to help it improve its utility.

Applications in the field of education are no less promising. Prasad Ram, aka Pram, applied his years of experience at Google to start a non-profit organisation called Gooru. After years of working with students across ages, he made an interesting observation. The fundamental problem with the education system is that we are asking everyone to reach the same destination without knowing their personal starting point. Different students in the same class might be at different levels of understanding of a subject, and it is difficult to pinpoint which concept isn't understood by whom. Ram's platform has developed tools to enable a student to spot his or her specific area of deficiency and customise the learning to catch up. His next goal is to bring the same clarity to a student in rural India who might not have access to a personal laptop and a learning system.

With all these examples, we can see a trend in the many uses of Big Data. India has a huge role to play as both a creator and a beneficiary of this data economy.

Since copious amounts of data are needed to make any prediction, a lot of the buzz has been about collecting large quantities and finding better ways to store, manage, analyse and make sense of it. It's in my conversations with scientists Uma Ramakrishnan and Shannon Olsson that I truly understood what makes Big Data so meaningful.

The human brain is an enormously complex organ, as we all know, with over 80 billion neurons. Despite all this neuron power, we cannot locate objects as quickly or efficiently as a mosquito can find a human. Yet, the mosquito only has about 100,000 neurons in its brain. Its precision targeting ability must lie in how those 100,000 neurons work together. Likewise, it is not the amount of data, but how storehouses of it interact with one another, that makes the difference. And it is the scientists who process the data that we pass by every day without thinking, be it tiger faeces or insect behaviour, and make sense of it, who point to endless possibilities. There might exist enough computer power to match the collective human brain power, but that's just raw data. The real magic is in the way we all become scientists, studying all that surrounds us and making sense of it in our own way. It's in our ability to become the child who asks 'why' incessantly. And it is by asking 'Why not?' that we stay true to our purpose. In Shannon's words, "In this information age, our greatest challenge is not locating information, it's knowing what to do with it when we have it. And we still can't even compete with a mosquito on that level!"

As though to answer Andy Grove's demand, albeit a couple of decades later, a new book by Rick Smolan, The Good Fight, has arrived and it does have the moving pictures within its covers, thanks to mobile technology and augmented reality. Many technologies that we argued over, discussed, dreamt of and invested in while I was at Intel are a reality today. Some fell by the wayside, while others went beyond our wildest imagination. Delivering durian to a Singaporean home through e-commerce was an innovation replaced by drones delivering medicines to rural areas, which also became old news even before it caught the public eye. It is in those corridors as a young professional that I learnt that the only constant in life is change and that to stay relevant, we have to reinvent our businesses, our homes and ourselves at an exponential pace. While it is thrilling to see some of the futures we had imagined come true, it is still that signature of Andy that leaves me with a lump in my throat, grateful for having had an opportunity to learn from the best. Despite all the Big Data, it's the smallest moments, the simplest acts of kindness, that stay with us forever.