I wanted to talk about a recent article from Foster Provost and Tom Fawcett, in which they discuss data science and how it is being used to benefit businesses. They argue that data science lacks an academic authority, which explains why there is little understanding of what data science is, and they therefore encourage further discussion “in order for data science to flourish” (2013).
Provost and Fawcett define data science as “a set of fundamental principles that support and guide the principled extraction of information and knowledge from data” (2013). In basic terms, data science tries to identify specific methods of extracting information from data.
The article gives examples of how data can be used to create useful information, particularly for businesses. Firstly, it explains how Walmart used data to identify its customers’ buying trends before a hurricane. The data showed that sales of strawberry Pop-Tarts rose to “seven times their normal sales rate” and that the “pre-hurricane top-selling item was beer”. Walmart could then use this unexpected information to predict customers’ needs and prepare for future hurricanes and other disasters. This application of data science is known as ‘Data-Driven Decision Making’ (DDD), and it shows how studying data can lead to discoveries that help a business make more informed decisions.
The article gives another example of DDD, in which mobile phone companies use customer data to predict whether customers are likely to leave once they near the end of their phone contract, based on usage, service history, how many of their friends have cancelled their contracts, and so on. The phone companies then use this information to work out how loyal each customer is, and therefore how much attention and persuasion will be needed to make sure they don’t switch to another company. This is another example of DDD. However, unlike the first example, which was a one-off discovery, it shows recurring patterns in data being used to help make decisions, especially on a massive scale where small increases in accuracy can make a company a lot more competitive.
Quick Summary:
Walmart: Discovered that people buy more Pop-Tarts before a hurricane.
Phone company: Used recurring customer patterns to make more informed decisions.
Economist Erik Brynjolfsson found “statistically that the more data-driven a firm, the more productive it is”. He developed a DDD scale that measures how strongly a firm uses data to help make decisions, and found that “one standard deviation higher on the DDD scale is associated with a 4-6% increase in productivity”, as well as with a higher return on assets, return on equity, asset utilization and market value. This suggests that using DDD makes a business’s decisions more effective, although the finding is an association rather than proof of cause.
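To make “one standard deviation higher on the DDD scale” a little more concrete, here is a minimal sketch in Python. The firm names and scores are made up for illustration and do not come from the study, and reading the 4-6% figure as scaling linearly with the number of standard deviations is my own simplifying assumption.

```python
from statistics import mean, pstdev

# Hypothetical DDD scores for eight firms (made-up numbers, not from the study).
ddd_scores = {
    "Firm A": 2.0, "Firm B": 4.0, "Firm C": 4.0, "Firm D": 4.0,
    "Firm E": 5.0, "Firm F": 5.0, "Firm G": 7.0, "Firm H": 9.0,
}

average = mean(ddd_scores.values())    # average score across the firms
spread = pstdev(ddd_scores.values())   # population standard deviation


def standard_deviations_above_average(score):
    """How many standard deviations a score sits above (or below) the average."""
    return (score - average) / spread


def associated_productivity_range(z, low=0.04, high=0.06):
    """Assumed linear reading of the quoted 4-6% per standard deviation.

    This describes an association, not a guaranteed gain.
    """
    return (z * low, z * high)


z = standard_deviations_above_average(ddd_scores["Firm G"])
low, high = associated_productivity_range(z)
print(f"Firm G is {z:.1f} standard deviation(s) above average")
print(f"Associated productivity difference: {low:.0%} to {high:.0%}")
```

With these made-up scores, Firm G sits exactly one standard deviation above the average, so it corresponds to the case described in the quote.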
Most business systems are now computerized and collect data, which computers then use to make data-driven decisions automatically, such as buying and selling shares on Wall Street. The share market changes so quickly, and trades happen so fast, that no person could keep up and still make good decisions. Computers are therefore used to make these split-second trading decisions based on estimates from data, and data scientists work constantly to make these systems decide faster and more effectively.
Data is used in so many aspects of business that, as the article explains, it is now essential for anyone involved in business, including managers and even some line workers, to have a basic knowledge of data analysis and data science. They need to understand what the data shows and how it can be used. From deciding whether to buy shares to realizing what customers prefer, data can reveal an endless amount of useful information that stakeholders must be able to use.
The internet has massively increased the amount of useful data, which is growing at breakneck speed as people add content and as new applications and websites collect and mine data. We are now surrounded by technology that collects information about us, and companies want access to it so that they can understand us better, make better business decisions and give us what we want. However, regardless of the amount of data around us and the growing importance of businesses using it, the article explains that there is still a lack of investment in and research into data science, which is why there is so little understanding of it.
“Without academic programs defining the field for us, we need to define the field for ourselves. However each of us sees the field from a different perspective and thereby forms a different conception.” Different industries and academics use data in different ways, so without an authoritative body it is difficult to define the field of data science, and “due to the state of the art in data processing, data scientists tend to spend a majority of their problem-solving time on data preparation and processing.” Much as on Wall Street, data scientists spend so much of their time preparing and processing data that little attention has gone to establishing the field academically.
Nevertheless, the authors recognize that there has been some progress: “For example, in New York City alone, two top universities are creating degree programs in data science”, which should lead to more academic work and research in the field.
Despite the lack of authority and research in data science, data scientists have identified a set of fundamental concepts underlying the principled extraction of knowledge from data:
- Extracting useful knowledge from data to resolve business problems can be treated systematically by following a process with reasonably well-defined stages, e.g. the Cross-Industry Standard Process for Data Mining (CRISP-DM) is one of these processes.
- Evaluating data-science results requires careful consideration of the context in which they will be used: you have to look at the context and the situation as well as the data to make a decision, e.g. if loads of people bought umbrellas on one rainy day in summer, it probably doesn’t mean that it would be a good idea to stock loads of umbrellas for summer.
- The relationship between the business problem and the analytics solution often can be decomposed into tractable subproblems via the framework of analyzing expected value: a large business problem is broken into smaller pieces, such as estimating how likely something is and estimating what it is worth, which are then brought together to make a sensible evaluation.
- Information technology can be used to find items from within a large body of data.
- Entities that are similar with respect to known features or attributes often are similar with respect to unknown features or attributes: computing similarity is one of the main tools of data science, because what is known about similar entities can be used to estimate what is unknown.
- If you look too hard at a set of data, you will find something, but it might not generalize beyond the data you’re observing. This is known as overfitting.
- To draw causal conclusions one must pay very close attention to the presence of confounding factors, possibly unseen ones: you have to incorporate assumptions regarding the presence or absence of confounding factors to be able to extract knowledge from data.
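As a rough illustration of the expected-value principle from the list above, here is a minimal sketch in Python that applies it to the phone company example. Everything in it is an assumption made for illustration: the customer figures, the offer cost, the offer success rate and the formula itself are not taken from the article. The idea is that the big question (who should get a retention offer?) splits into smaller ones: how likely is the customer to leave, how much are they worth, and how much does the offer cost?

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    p_churn: float       # estimated probability of leaving at contract end
    yearly_value: float  # estimated profit if the customer stays another year


# Hypothetical business figures (assumptions for this sketch only).
OFFER_COST = 20.0          # cost of making one retention offer
OFFER_SUCCESS_RATE = 0.5   # chance the offer keeps a customer who would have left


def expected_gain(customer):
    """Expected value of making the offer: probability x value, minus the cost."""
    saved_value = customer.p_churn * OFFER_SUCCESS_RATE * customer.yearly_value
    return saved_value - OFFER_COST


customers = [
    Customer("A", p_churn=0.80, yearly_value=300.0),  # likely to leave, valuable
    Customer("B", p_churn=0.05, yearly_value=300.0),  # valuable but loyal
    Customer("C", p_churn=0.40, yearly_value=60.0),   # may leave, low value
]

# Only target customers where the offer is expected to pay for itself.
targets = [c.name for c in customers if expected_gain(c) > 0]

for c in customers:
    print(f"Customer {c.name}: expected gain {expected_gain(c):.2f}")
print("Make an offer to:", targets)
```

With these made-up numbers only customer A is worth targeting: customer B is unlikely to leave anyway, and customer C is not valuable enough to cover the cost of the offer. Each piece (the churn probability, the customer value) could be estimated separately from data, which is what makes the subproblems tractable.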
Conclusion: We are surrounded by data, particularly since the arrival of the internet and new technology, and businesses must use this data to make themselves more competitive. With this growing amount of data used everywhere, there needs to be further research into data science to help people find the best ways to extract information from the data around us.
AMAZING! 😀