We introduce FRIEDA, a benchmark designed to stress-test open-ended, multi-step cartographic reasoning in large vision-language models (LVLMs).
FRIEDA is built from real map figures collected from documents and technical reports across diverse domains (e.g., geology, urban planning, environmental assessment) and geographic regions. Grounded in GIS theory, FRIEDA spans a broad spectrum of spatial relations: topological (border, equal, intersect, within), metric (distance), and directional (orientation). Questions are deliberately compositional: every example requires multi-hop inference, and many demand cross-map grounding, where evidence must be located and integrated across multiple maps.
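To make the task format concrete, here is a minimal, hypothetical sketch of how a FRIEDA-style example could be represented. The field names and the instantiated question are illustrative assumptions, not the benchmark's released schema.

```python
# Hypothetical sketch of a FRIEDA-style example record.
# Field names and the sample instance are illustrative, not the actual schema.
from dataclasses import dataclass

RELATION_TYPES = {
    "topological": ["border", "equal", "intersect", "within"],
    "metric": ["distance"],
    "directional": ["orientation"],
}

@dataclass
class FriedaExample:
    question: str            # open-ended, multi-step cartographic question
    map_ids: list[str]       # map figure(s) that carry the evidence
    relation_category: str   # "topological", "metric", or "directional"
    relation: str            # e.g. "within", "distance", "orientation"
    answer: str              # free-form reference answer
    cross_map: bool = False  # True when evidence spans multiple maps

# Invented instance, only to illustrate cross-map grounding.
example = FriedaExample(
    question=("Which monitoring station lies within the floodplain on the "
              "hazard map, and how far is it from the city boundary on the "
              "planning map?"),
    map_ids=["hazard_map_03", "planning_map_07"],
    relation_category="metric",
    relation="distance",
    answer="Station B, about 2 km from the boundary.",
    cross_map=True,
)
```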
We evaluate 11 state-of-the-art LVLMs in two regimes: direct, where the relevant map(s) are provided, and contextual, where the model must first discover which map(s) matter before answering. Despite their strong vision and language capabilities, the models perform poorly: Gemini-2.5-Pro and GPT-5-Think reach only 38.20% and 37.20% accuracy, respectively, compared to 84.87% for humans.
These results highlight a persistent gap in spatial intelligence over real-world maps. FRIEDA offers a rigorous, realistic target for measuring progress, and a concrete challenge for building LVLMs that can truly read, ground, and reason with cartographic evidence.
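For concreteness, the sketch below shows one way the two evaluation regimes could be scored. The `model.answer` interface and the string-match grader are assumptions for illustration, not FRIEDA's released evaluation code.

```python
# Minimal sketch of scoring under the direct vs. contextual regimes.
# The model interface and grader are illustrative assumptions.

def judge_correct(prediction: str, reference: str) -> bool:
    # Placeholder grader: a real protocol would use more careful answer
    # matching (e.g., numeric tolerance or an LLM judge).
    return prediction.strip().lower() == reference.strip().lower()

def evaluate(examples, model, all_maps, regime="direct"):
    """Return accuracy under the 'direct' or 'contextual' regime."""
    correct = 0
    for ex in examples:
        if regime == "direct":
            # Direct: only the relevant map figure(s) are provided.
            maps = [all_maps[m] for m in ex.map_ids]
        else:
            # Contextual: the model sees the full map pool and must first
            # discover which map(s) carry the evidence before answering.
            maps = list(all_maps.values())
        prediction = model.answer(ex.question, maps)
        correct += judge_correct(prediction, ex.answer)
    return correct / len(examples)
```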
@misc{friedabenchmark2025,
      title={FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision-Language Models},
      author={Jiyoon Pyo and Yuankun Jiao and Dongwon Jung and Zekun Li and Leeje Jang and Sofia Kirsanova and Jina Kim and Yijun Lin and Qin Liu and Junyi Xie and Hadi Askari and Nan Xu and Muhao Chen and Yao-Yi Chiang},
      year={2025},
      eprint={2512.08016},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.08016},
}