Data Science Research in Economics
On July 2, 2025, I presented at the Economic Research Seminar at the University of Hohenheim on Data Science Research in Economics.
The talk highlighted how new data sources, larger datasets, and methodological innovation are reshaping economic research - and how generative AI is accelerating this transformation.
Why Data Science Matters
Economic research is increasingly shaped by:
- Access to new and larger data sources,
- Advanced analytics and automation (pipelines, reactive visualization),
- Productivity gains from generative AI (e.g. AI agents preparing IPO documentation within minutes),
- The need for economic theory to guide ethical alignment of AI.
This requires an expanded skillset that integrates economics, statistics, and computing - the three circles of data science.
Research Projects Presented
I illustrated these themes with three ongoing lines of work:
New Data: Index of Prices Searched Online (IPSO)
Joint work with T. Dimpfl. By reconstructing daily Google Trends data at high frequency, I developed a method to create an index of prices searched online. IPSO shows promise for forecasting US inflation and consumption, reducing forecast errors relative to AR benchmarks.
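Google Trends only returns daily values for short request windows and rescales every request to a maximum of 100, so a long daily series has to be stitched together from overlapping windows. Here is a minimal sketch of that chaining step; the mean-matching rescaling rule and function name are illustrative assumptions, not the exact reconstruction used in the paper.

```python
# Sketch: chaining overlapping daily Google Trends windows into one series.
# Each request is normalized to max = 100, so consecutive windows are only
# comparable after rescaling on their overlap. Illustrative, not the
# paper's actual procedure.
import pandas as pd

def chain_windows(windows):
    """Concatenate overlapping daily search-interest windows.

    `windows` is a time-ordered list of daily pd.Series (DatetimeIndex);
    consecutive windows must overlap in time.
    """
    out = windows[0].astype(float)
    for win in windows[1:]:
        overlap = out.index.intersection(win.index)
        # Rescale the new window so that, on the overlap, its mean level
        # matches the already-chained series.
        factor = out.loc[overlap].mean() / win.loc[overlap].mean()
        rescaled = win.astype(float) * factor
        new_part = rescaled.loc[rescaled.index.difference(out.index)]
        out = pd.concat([out, new_part]).sort_index()
    return out
```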
Big Data: Water Levels and Fuel Prices
Joint work with K. Kuck. We investigate whether German fuel prices are affected by extreme water levels on major rivers, combining ~160 GB of data:
- Fuel price data from 17,592 stations (since 2014),
- High-frequency river gauge data (since 2000),
- GIS and border shapefiles.
Rivers carry ~20% of the oil products transported in Germany; preliminary results show that both low- and high-water events can shift local fuel prices.
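A core step in such a project is linking each station to the relevant river gauge. The GeoPandas sketch below assigns every station to its nearest gauge; the file names, column names, and 20 km cutoff are placeholders, not our actual pipeline.

```python
# Sketch: nearest-gauge matching with GeoPandas. EPSG:25832 (UTM zone 32N)
# gives metric distances for Germany; inputs are hypothetical files.
import geopandas as gpd

stations = gpd.read_file("stations.geojson").to_crs(epsg=25832)
gauges = gpd.read_file("gauges.geojson").to_crs(epsg=25832)

# Nearest-neighbour spatial join; distance_col records the station-gauge
# distance in metres, useful for restricting to stations near a river.
matched = gpd.sjoin_nearest(
    stations, gauges, how="left", distance_col="dist_m"
)
near_river = matched[matched["dist_m"] < 20_000]  # e.g. within 20 km
```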
New Methodology: Transfer Entropy and Conditional Density Estimation
Building on my earlier work, I proposed using smoothed quantile regressions to estimate transfer entropy and mutual information without discretization (see the sketch after the list below). Applications include:
- Non-linear causality analysis,
- Conditional density forecasts (e.g. US inflation fan charts),
- Improved inference through GMM-based standard errors.
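The building block is a conditional density estimated from a grid of quantile regressions rather than from histograms. A minimal sketch follows, using statsmodels' standard QuantReg as a stand-in for the smoothed quantile regressions; the quantile grid, rearrangement step, and differencing rule are illustrative choices.

```python
# Sketch: conditional density f(y | X = x0) from a grid of quantile
# regressions, without discretizing the data. QuantReg stands in for the
# smoothed quantile regressions of the paper.
import numpy as np
import statsmodels.api as sm

def conditional_density(y, X, x0, taus=np.linspace(0.05, 0.95, 19)):
    """Estimate f(y | X = x0) on a grid of fitted conditional quantiles."""
    Xc = sm.add_constant(X)
    x0c = np.concatenate(([1.0], np.atleast_1d(x0)))
    # One quantile regression per tau gives Q(tau | x0).
    q = np.array([
        sm.QuantReg(y, Xc).fit(q=tau).predict(x0c.reshape(1, -1))[0]
        for tau in taus
    ])
    q = np.sort(q)  # rearrangement: rules out quantile crossing
    # Density by differencing the inverse CDF: f ≈ d(tau) / d(quantile).
    mid = 0.5 * (q[1:] + q[:-1])
    dens = np.diff(taus) / np.diff(q)
    return mid, dens
```

Transfer entropy then follows by averaging the log ratio of conditional densities estimated with and without the candidate driver in the conditioning set.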
A related project with C. Tarantola explores a Bayesian-frequentist approach to post-double-selection regressions on survey data with multiple imputation. The idea is to guide variable selection by Bayes factors across multiply imputed and bootstrapped datasets.
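To make the selection idea concrete, here is a minimal sketch that screens a candidate control via the standard BIC difference, a large-sample Bayes-factor approximation, across imputed datasets; the majority rule and all names are illustrative assumptions, not the project's actual procedure.

```python
# Sketch: Bayes-factor-guided screening of one candidate variable across
# multiply imputed datasets, using the BIC approximation to the log BF.
import numpy as np
import statsmodels.api as sm

def log_bf_bic(y, X_base, x_cand):
    """Approximate log BF of the model with vs. without the candidate."""
    X0 = sm.add_constant(X_base)
    X1 = np.column_stack([X0, x_cand])
    bic0 = sm.OLS(y, X0).fit().bic
    bic1 = sm.OLS(y, X1).fit().bic
    return 0.5 * (bic0 - bic1)  # > 0 favours including the candidate

def keep_candidate(imputed, base_cols, cand_col, thresh=0.5):
    """Keep the candidate if it wins in more than `thresh` of imputations."""
    wins = [
        log_bf_bic(d["y"], d[base_cols], d[cand_col]) > 0
        for d in imputed  # one DataFrame per imputed dataset
    ]
    return np.mean(wins) > thresh
```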
Broader Lessons
Across these projects, some recurring themes emerge:
- Speed: Data collection and computation accelerate research cycles.
- Scale: Handling big datasets (fuel prices, hydrology, census data) requires efficient algorithms and infrastructure.
- Methods: Robust econometric methods remain essential for uncertainty, causality, and bias - areas where AI tools alone fall short.
- Infrastructure: GPU resources and modern pipelines will become central in applied econometrics.
Conclusion
Economic research is transforming rapidly. Generative AI, new data, and scalable methods promise productivity gains - but also demand stronger mathematical and statistical reasoning to ensure reliability.
The “three circles of data science” - economics, statistics, and computing - are more important than ever for meaningful contributions in our field.