Choir room

Data Hoarding as Double Marginalization

Ed Steinmueller, SPRU, University of Sussex
Simone Vannuccini, SPRU, University of Sussex

Extended Abstract

The rise of machine learning artificial intelligence (AI) algorithms, the availability of massive datasets, further advances in computing power and the increasing sophistication of automated production have set the stage for radical change in current sociotechnical systems. Indeed, a rich and expanding array of machine learning techniques is quickly making its way through the decision-making pipeline of companies, and scholars already suggest that these techniques might represent a novel general purpose technology. While the latter claim deserves further scrutiny, the rapid diffusion of prediction algorithms is undeniable in many fields, in particular those in which the drop in the relative cost of prediction can generate increasing benefits and efficiency gains. Despite this, the microeconomic implications of such transformations have not yet received full attention or a satisfying analytical description.
In the domain of business decision-making and industrial organisation, the application of predictive analysis to high-dimensional unstructured datasets has little to do with the technology-angst of machines starting to think, i.e. the emergence of artificial general intelligence. Whether these developments displace labour demand or enhance it (through new skills and jobs) is an important question, but the answer is governed by the way in which machine learning is implemented. A foundational observation is that these techniques produce ‘marketing done better’; the issue is, for example, whether companies employing them are able to achieve first-degree price discrimination through sorting, classification and recommendation systems. Two interesting questions in this context are i) what happens to market organization while the economic system rewires its working logic around the new core input of high-dimensional data and ii) how market actors collect and use this fundamental input in shaping strategic choices.
In this paper, we focus on the industrial organization of a market deeply affected by data elaboration to highlight the trade-offs and competing incentives firms face. A recent modelling exercise (Farboodi et al. 2019) studies how Big Data influences firm dynamics, showing how data generation, a by-product of production and selling activities, creates a ‘data feedback loop’ in which data become a source of success-breeds-success. While other studies have highlighted the emergence (to different degrees) of decreasing returns to data accumulation (Bajari et al. 2018; Azevedo et al. 2018), the hoarding of data has become a relevant choice variable in determining firms’ behavior and a lever to foster competitive advantage. We contribute to this growing research trajectory by including a more realistic view of the market structure that exploits data in seeking price discrimination. More precisely, we highlight how data hoarding as a strategic choice can lead to double marginalization when data elaboration is conducted by specific actors that own machine learning platforms ‘on the Cloud’. Retail companies can use these data analytics services to discover the ‘properties’ of their customers’ valuation functions (the distribution of the ‘true’ value of willingness to pay for a given good or bundle of goods).
The intuition of the model goes as follows: retail firms aim to reach first-degree price discrimination through sorting, classification and recommendation systems. To do so, they need to resort to data analytics companies that have unique capabilities to process the high-dimensional data retail firms collect and to extract value and patterns from them. The information superiority that data analytics actors gain leads to incentives for ‘data hoarding’, which is used to construct specific information monopolies. This, in turn, has the potential to produce double marginalization effects that offset the welfare gains realized through price targeting or discrimination. We combine several modeling strands in a tractable model to explore the viability of data governance schemes, suitable for policy, that limit data hoarding and thereby influence total welfare, firms’ decisions and market structure.
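The double marginalization mechanism invoked above can be illustrated with the standard textbook case of two successive monopolies facing linear demand. The sketch below is not the model of this paper: it assumes simple linear (uniform) pricing, no price discrimination and no data hoarding, and the function names, the demand form P = a - bQ and the parameter values are hypothetical choices made only for illustration. The upstream firm can be read as the data analytics platform and the downstream firm as the retailer.

```python
# Textbook double marginalization with linear inverse demand P = a - b*Q.
# Assumptions (not from the paper): uniform pricing, constant upstream
# marginal cost c, no downstream cost other than the wholesale price w.

def integrated_monopoly(a, b, c):
    """One firm controls both stages and applies a single markup."""
    q = (a - c) / (2 * b)            # from marginal revenue a - 2bQ = c
    p = a - b * q
    profit = (p - c) * q
    cs = (a - p) * q / 2             # consumer surplus triangle
    return {"price": p, "quantity": q, "profit": profit,
            "consumer_surplus": cs, "welfare": profit + cs}

def successive_monopolies(a, b, c):
    """Upstream sets wholesale price w; downstream then sets retail price."""
    # Downstream best response to w is Q(w) = (a - w) / (2b).
    # Upstream maximizes (w - c) * Q(w), which gives w = (a + c) / 2.
    w = (a + c) / 2
    q = (a - w) / (2 * b)
    p = a - b * q
    upstream = (w - c) * q
    downstream = (p - w) * q
    cs = (a - p) * q / 2
    return {"wholesale": w, "price": p, "quantity": q,
            "profit": upstream + downstream,
            "consumer_surplus": cs,
            "welfare": upstream + downstream + cs}

if __name__ == "__main__":
    one = integrated_monopoly(10, 1, 2)
    two = successive_monopolies(10, 1, 2)
    print("integrated:", one)
    print("successive:", two)
```

With a = 10, b = 1 and c = 2, the integrated firm charges 6 and sells 4 units, for total welfare of 24. With two stacked markups the retail price rises to 8, quantity falls to 2, and total welfare falls to 14, with joint profit dropping from 16 to 12. This is the sense in which a second markup can erode gains that would otherwise be realized along the chain.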


Azevedo, E. M., Deng, A., Montiel Olea, J., Rao, J. M., & Weyl, E. G. (2018). A/B testing.
Bajari, P., Chernozhukov, V., Hortaçsu, A., & Suzuki, J. (2018). The impact of big data on firm performance: An empirical investigation (No. w24334). National Bureau of Economic Research.
Farboodi, M., Mihet, R., Philippon, T., & Veldkamp, L. (2019). Big Data and Firm Dynamics (No. w25515). National Bureau of Economic Research.