CAKE: Cloud Architecture Knowledge Evaluation of Large Language Models
KDA-AI Workshop, IEEE ICSA 2026 (Accepted), 2026
Abstract
Large language models are increasingly used as software architecture co-pilots, yet no benchmark evaluates their cloud-native architecture knowledge. We present CAKE, a benchmark of 188 expert-validated questions spanning four cognitive levels and five cloud-native topics, in both multiple-choice (MCQ) and free-response formats. We evaluated 22 model configurations from four model families. MCQ accuracy plateaus above 3B parameters, while free-response scores scale steadily with model size, indicating that the two formats capture different facets of knowledge. Reasoning augmentation improves free-response quality, whereas tool augmentation degrades the performance of small models.
