What is the value of data? Who benefits from data? Can you construct a decent predictive model to manage risk if you have lousy data?
Some enterprising souls have tried to sell their data back to the tech giants that collect and use big datasets. There are even small companies, such as CitizenMe and Datum, that pay users for taking a quiz or sharing location data.
“Data are far from becoming a standardized commodity whose value can readily be established through trades,” writes Diane Coyle, professor at the University of Cambridge, in a report released on February 26, 2020, for the Nuffield Foundation. The goal of the report was to produce a “framework identifying key types of data and the appropriate approaches for valuing them.”
Data Infrastructure
Data can be thought of as a form of virtual infrastructure on which we increasingly rely. Designing data infrastructure involves several components:
- data assets, such as databases
- standards and technologies that provide access to the data
- policies and regulations governing the use of data, such as the European Union’s General Data Protection Regulation (GDPR)
- organizations that govern the infrastructure
- communities that are stakeholders, such as a city that both provides and benefits from data
What is Meant by Value?
The authors of the report emphasize that the value of data extends beyond the monetary. “In this report ‘value’ refers to the economic concept of social welfare: the well-being of all society. Value arises from data when businesses create jobs or become more productive; when governments deliver more effective public services; when our environment is clean and diverse; and when people live happier and healthier lives.”
Estimating the Value of Data
Market-based approaches to valuing data, such as stock market valuations and income-based valuations, are well known. But consider also the broad non-market value of data. The authors of the report assess the economic value of free and open datasets such as Landsat satellite data and Transport for London (TfL) data.
Economic Characteristics
Data has some unique economic characteristics, the report reminds us. It is non-rival, meaning that “many people can use the same data at the same time without it being used up,” rather like radio waves.
Data also varies in how excludable it is, that is, how easily others can be prevented from using it. Data that is easy to collect, such as images from websites, is called a “public good.” Other data can only be collected by whoever delivers a service, such as a household’s power usage, and is called a “club good.”
Data also generates externalities. Positive externalities arise when linking one database to another yields information that neither holds on its own. For example, consider the cold cases that were solved when law enforcement agencies ran DNA samples against genetic-testing databases set up for tracing family trees.
As more data is collected, the returns to it may increase or decrease. Data also has a large option value, because we do not yet know how valuable it will be for future decisions. This leads some organizations to “keep data for its potential value rather than its current value.”
Data collection often has a high up-front cost and a low marginal cost. An organization might have to pay a lot for hardware (such as sensors) or for converting analog signals into digital form. Afterward, however, additional data may be cheap to collect, especially if collection is automated.
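To make this cost structure concrete, here is a minimal Python sketch using invented figures (the report gives no numbers here). It assumes a fixed up-front cost F and a constant marginal cost c per record, so the average cost of n records is F/n + c, which falls toward c as n grows. The constants FIXED_COST and MARGINAL_COST and the function average_cost are hypothetical illustrations, not taken from the report.

```python
# Hypothetical figures for illustration only; not taken from the report.
FIXED_COST = 100_000.0  # assumed up-front cost, e.g. sensors and digitization
MARGINAL_COST = 0.01    # assumed (constant) cost of collecting one more record


def average_cost(n_records: int,
                 fixed: float = FIXED_COST,
                 marginal: float = MARGINAL_COST) -> float:
    """Average cost per record: the fixed cost spread over n records plus the per-record cost."""
    if n_records <= 0:
        raise ValueError("n_records must be positive")
    return fixed / n_records + marginal


# The fixed cost dominates for small collections and fades as volume grows.
for n in (1_000, 1_000_000, 1_000_000_000):
    print(f"{n:>13,} records: {average_cost(n):.4f} per record")
```

Under these assumptions, the average cost approaches the marginal cost as volume grows, which is one reason large collectors of data enjoy a cost advantage.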
Data Characteristics
The informational characteristics of data are diverse. The subject matter (geographical location, finance, health, transport) provides the incentive for obtaining the data. Some datasets are very general-purpose; others are highly specific. Temporal coverage can be past, present, or future: historical data may be used to “train” models, forecasts are useful during planning, and real-time data is valuable to those operating retail outlets. The quality of data refers to its completeness, accuracy, and timeliness. Some data is sensitive, such as the locations of endangered species.
Last but not least, to be truly valuable, the data should be interoperable and linkable. “Interoperability affects how easy it is to work with and aggregate datasets; linkability is the ease with which they can be joined up. Both make it easier to get value from data as combining datasets enables new insights,” notes the report.
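As a toy illustration of linkability, the Python sketch below joins two hypothetical datasets that share an identifier. The dataset names, area codes, and values are invented, and the inner join is just one simple way of combining data. It shows that when datasets use a common key, linking them is easy, while records without a match in the other dataset drop out, which is why shared identifiers matter.

```python
# Two hypothetical datasets keyed by the same (invented) area code.
air_quality = {"A1": 42.0, "A2": 17.5, "A3": 30.2}  # e.g. a pollution index per area
clinic_visits = {"A1": 120, "A2": 45, "A4": 80}     # e.g. respiratory clinic visits per area


def link(left: dict, right: dict) -> dict:
    """Inner join: keep only keys present in both datasets, pairing their values."""
    return {key: (left[key], right[key]) for key in left.keys() & right.keys()}


linked = link(air_quality, clinic_visits)

# Areas A3 and A4 appear in only one dataset, so they cannot be linked.
for area in sorted(linked):
    pollution, visits = linked[area]
    print(area, pollution, visits)
```

The combined view pairs pollution with clinic visits for each area, a comparison neither dataset supports alone.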
Key Conceptual Challenges
- How can we, as a society, fund data as a public good, when it may need a large up-front investment?
- How can we incentivize investment in data when the benefits from it will go beyond the investors?
- How can the benefits that arise from using data be fairly distributed?
- How can we compensate those who steward data for the costs and risks they take?
- How can we gain value from aggregated personal data while respecting people’s privacy?
- How can we ensure data will be linked and combined to create positive externalities?
- How can we keep options open for potential future uses of data?
“There are growing concerns about [data’s] potential to be abused, including in the case of facial recognition and other applications that involve a significant erosion of privacy,” Coyle wrote in a commentary on the report.
She urges government and the non-profit sector to join the dialogue about data: “Markets alone will not make the most of this new resource, owing to non-rivalry and various externalities.”
See also the European Commission’s newly proposed data strategy.