CAKE: Cloud Architecture Knowledge Evaluation of Large Language Models
KDA-AI Workshop, IEEE ICSA 2026 (Accepted), 2026
Abstract
Large language models are increasingly used as software architecture co-pilots, yet no benchmark evaluates their cloud-native architecture knowledge. We present CAKE, a benchmark of 188 expert-validated questions spanning four cognitive levels and five cloud-native topics, in both multiple-choice (MCQ) and free-response formats. We evaluated 22 model configurations from four model families. MCQ accuracy plateaus above 3B parameters, while free-response scores scale steadily with model size, indicating that the two formats capture different facets of knowledge. Reasoning augmentation improves free-response quality, whereas tool augmentation degrades the performance of small models.
