Prof Anindya Ghose (AG) of New York University’s Stern
School of Business, in conversation with Abraham
Kuruvilla (AK) on issues relating to big data and the
Professor’s experience with crowd funding.
AK: Prof Ghose, I hear that Big Data is defined by the 3Vs of Volume, Velocity and Variability. Would you illustrate these parameters for the benefit of both technologists and business users?
AG: Yes, Big Data is all about Volume, Variety, Velocity and its unstructured nature. Some three billion gigabytes of data are being generated every day, roughly the equivalent of the information stored in a thousand million filing cabinets. Every second, more data flows across the Internet than was stored on it 20 years ago. Another striking comparison is that an average person today processes more data in one day than a person in the 1500s did in an entire lifetime. This data is being generated in a variety of ways: from messages, pictures, video images, social media, cellphone signals, turnstiles, online shopping, automated machines and so on. All of it is real-time data, and much of it comes from sources that didn't exist a decade ago. It is essentially dynamic and unstructured data, generated at very high speeds, and such data is typically difficult to organize in conventional databases. By revealing interesting patterns and critical insights into customer behaviour and its underlying causes, analysis of this data could enable companies to gain an edge over competitors.
AK: High volumes of data in structured and unstructured form have been around for a long time in large businesses. So why this recent hype around it? Is it technology driven or market driven? Is there some new technology that has come to the forefront? Or is it being driven by competitive forces pushing more businesses into mining more data?
AG: Yeah, I would say all of the above. I’ll take the relatively narrow perspective of the marketing world. Over the last few decades, what we’ve seen is that decisions have been primarily driven by what we call ‘gut feel’. And that’s changing rapidly. Imagine a boardroom conversation where the Vice President of Sales stands up and says, “Look, you know, I believe we should do this,” because of so-and-so. Traditionally, he or she would not get pushback from subordinates saying, “Let’s not just go with beliefs and guts, but let’s also look at the data.” But the trend now is that decisions are being driven by insight garnered from mining data sets, as opposed to simply relying on instincts. This doesn’t mean that instinct or intuition has no role at all; in fact, they have a very important role. It’s just that the point at which they play that role has changed. Earlier it used to be completely gut-driven, but now that you have data, people expect companies to mine it and look for insights, and once you have interpreted those patterns or findings, that’s where creativity and intuition or gut feeling are seen to play a very important role. This is essentially what is driving businesses to adopt big data methods and analytics and to see if they can improve their bottom line.
AK: So, on one side it is driven by the need to gain a competitive advantage and to be more realistic in your decision making. On the other side, it is these new technologies that are making this possible, remembering that the same problem has been in existence for years!
AG: Actually, what is happening now is that an infrastructure is being created, such as Hadoop, which is enabling companies to mine such massive data sets at a much more rapid pace. In other words, earlier we had this data, but the reaction time of companies would be a month; now it can be a matter of days, or even hours. So the infrastructure underlying the collection of such massive data sets is being put in place by a number of companies, and one such example is Hadoop, which basically involves massively parallel processing across many computers arranged in a grid-like structure.
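As a rough illustration of the parallel processing described above, the following Python sketch counts words by splitting the data into chunks, processing each chunk independently, and then merging the partial results. It is not Hadoop code and uses no Hadoop API: the function names (`map_chunk`, `merge_counts`, `word_count`) and the sample text are hypothetical, and threads on a single machine merely stand in for the separate computers of a cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: count the words in one chunk, independently of all other chunks."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def word_count(lines, workers=4):
    # Deal the input lines out into one chunk per worker (hypothetical split rule).
    chunks = [lines[i::workers] for i in range(workers)]
    # Each worker handles its own chunk; threads stand in for cluster nodes here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # Merge the per-chunk results into a single total.
    return reduce(merge_counts, partials, Counter())


sample = ["big data big insight", "data at scale", "big scale"]
totals = word_count(sample, workers=2)
print(totals.most_common(3))
```

The point of the design is that no chunk needs to know about any other chunk until the final merge, which is what allows the work to be spread over many machines and the reaction time to shrink.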
AK: And in which industry sectors, and in which functions within them, other than marketing, will big data have the most impact?
AG: One example, and this is more US based, is from the public sector: the New York Police Department is piloting a big data scheme to predict crime. Essentially it involves an algorithm that predicts where crime is likely to take place by mining hundreds of thousands of prior incidents by location, time, people involved and so on. When they are able to predict where a crime is likely to take place, they patrol those particular areas more often. In the pilot phase, we’ve seen a 12% decrease in property crimes and a 26% decrease in burglary, so these models are in fact working. Now that the pilot seems very successful, the Los Angeles Police Department is planning to phase in a full-blown project, which would then spread across 150 cities in America. Every notable police department will deploy a similar algorithm. I like this example because, even though it’s not from the private sector, it is really meaningful in terms of societal impact.
AK: As far as big data is concerned, do you see from your research whether it is going to impact consumer finance and retail banking?
AG: We haven’t seen any remarkable examples, but my understanding is that the insurance industry has gained from mining a lot of data to see at what point people are more likely to churn. What they see is that people are most likely to churn at the time the policy has to be renewed. If you can figure that out, then you can design good incentives to prevent people from churning; for example, a week or two before the policy expires, you send a non-trivial discount to incentivise renewal. A similar example that I’m seeing is in the banking space, in preventing credit card churn. With new mobile payment technologies coming in, the credit card industry is facing a non-trivial amount of competition from them, at least in the US.
So models and methods that better predict what is causing people to switch, and at what point, would be immensely useful for banking and financial services.
AK: Other than technology costs, what are some of the constraints to the growth of big data applications? How do you see them being overcome?
AG: The most important constraint, I would say, is human capital. The reason I say so is that, right now, we have more or less arrived at a consensus that a person needs a variety of different skills to work in this area. As designations go, it could be a data scientist or a business analytics specialist; there are a couple of different names being thrown out there! What nobody seems to have figured out is which skill sets are required in a person who will work in this way, and that is partly because the job is not well defined. My understanding comes not only from the fact that I’m a professor at NYU and come across many students in my classroom, but also because I run a centre for Business Analytics at NYU, which makes me interact quite often with practitioners and folks like you from the industry. What I hear is that people are essentially looking for these things:
1. An in-depth understanding of statistics, very broadly defined to include both predictive modelling for data mining and explanatory modelling for econometrics and so on.
2. Some expertise in programming. We are not talking about just Java or C++, but about some of the more recent languages used in data analysis, such as R, Python, Hive and so on.
3. Data visualization. What I mean by that is that even before starting any modelling or analysis of the data, the person should have the skills to simply look at the data set and infer what kind of questions we should be asking. We need to identify the questions first, before we actually dive deep into the data, and so data visualization can be very important.
4. Domain expertise, which is the core skill. If you take one person from banking and one from consumer marketing and expose them to the same patterns, it is very likely that they will come out with different interpretations of those patterns, and that is driven by each person’s specific domain expertise.
So, I think the biggest constraint is in identifying an individual who has all four of these skills; obviously it is an incredibly difficult task. Right now, companies are looking for anybody who has at least two of these four, and our role as academic institutions is, hopefully, to turn out more graduates with all four skills. The good news is that a number of leading business schools are taking the initiative in starting large programmes in business analytics, very specifically targeted at equipping students with these four skills.
And our expectation and hope is that in a year or two from now this demand-supply gap will be reduced, and once this happens a huge constraint will go away.
AK: Besides domain expertise, they would want people with very high IQ to be educated in this business of big data applications, wouldn’t that be so?
AG: To some extent, I think the first three skills are something that we can impart in the classroom. We can equip students with predictive modelling for data mining and explanatory modelling skills, with programming skills in R and Python, and also with data visualization. But domain expertise is one of the hardest skills to impart, because it is something you carry with you, based on where you’re coming from. Domain expertise typically means this: if you have been working in Banking and Financial Services for a number of years, then you have a pretty good sense of the institutional details in that industry, how different processes work, how people interact and the cultural dimension in each organization. So when you see data from that industry, you are likely to have an advantage over somebody who comes from CPG (consumer packaged goods), because they are not going to have the same depth of institutional knowledge that you do. That’s where domain expertise plays a very important role.
AK: So you are saying it would be easier to educate people with good domain expertise in the first three areas than to educate freshers?
AG: Not necessarily easier to educate, but it would be valuable for a person having those three skills to also have relevant domain expertise before he or she decides to go into a certain industry.
If you have been working in Banking and Financial Services, and you come back for, let’s say, a Master’s Programme in Business Analytics, where you are trained in statistics, programming and data visualization, then when you go back to Banking and Financial Services that is a pretty strong edge to have, as opposed to somebody who arrives from a completely different industry. That second person is very likely not to have the same level of domain expertise; he or she may have come from advertising, or from manufacturing.
AK: Lastly, Prof Ghose, this is quite tangential: could you say something, in a few words, about crowd funding, its growth and its impact on government and society?
AG: Sure. I have been working a lot of late in crowd funding with a number of companies over here. The largest of them is called Indiegogo, which is globally the largest crowd funding platform. The basic idea is that, in the traditional world, entrepreneurs would have to go to venture capitalists and angel investors for potential seed funding. We see that angel investors form a very closed old-boys network. It is based entirely on who knows whom, and so many small or relatively unknown entrepreneurs would get shut out of the market because they are not well connected with the network of VCs. Then came the phenomenon of crowd funding, which basically means that you harness the wisdom and the funding of the crowd to fund a certain idea, project or product. The core idea is this: let’s say you have a great idea but you don’t have the resources to go ahead and produce it or launch it. You float the idea on a platform like Indiegogo, in the hope that the average person out there is willing to support you. One person would throw in a few dollars, and this would start a cascade in which other people see a few people throwing in dollars and jump in themselves, and eventually, if the idea has merit, this can generate a non-trivial amount of money. So that’s what crowd funding is: relying on the resources of the crowd, complete strangers, people you don’t even know, to fund your entrepreneurial project or idea.
AK: Simplistically, would you say that Obama’s first election campaign was run on crowd funding?
AG: Yes, to some extent. I mean, that is a kind of crowd funding, but an atypical example. Most of the projects that we have seen involve, let’s say, an idea for producing a movie (a documentary, for instance), an idea for writing a book, or an idea for producing a new kind of watch. One of the most successful crowd funding stories was the Pebble watch. So it can be a product or even a small company.
And one of the interesting things that is going to happen is that the Obama administration has recently allowed equity-based crowd funding in the US, which means that people who contribute to a certain campaign are going to get equity, actual shares in the company they fund. Presently, the incentive for the people who donate money is pure altruism or some nominal reward, but there is no equity involved. The law has now been passed, and so a few months down the line we are going to see the emergence of equity in crowd funding. If I put in some money, I am going to get some shares of that company, almost like a kind of substitute for the stock market. So those are some of the types of crowd funded projects that we could see. Politicians should be able to use it for their own campaigns too, but those cases are atypical.