Complexity-Constraint Code Evaluation (C3E): A Benchmark for Time Complexity Compliance in LLM-Generated Code
Li-Guo Chen, Xin Wang, Jue-Yu Chen, Ren-Zhao Liang, Zheng-Ran Zeng, Yang-Ning Li, Ying-Hui Li, Yi-Dong Wang, Yi-Jiang Xu, Qing Gao, Shi-Kun Zhang
Abstract
While Code LLMs excel at generating functionally correct code, existing benchmarks neglect a crucial aspect: adherence to explicit time complexity constraints. We introduce Complexity-Constraint Code Evaluation (C3E), a novel benchmark that evaluates both functional correctness and complexity compliance across feasible and infeasible scenarios. C3E enables precise differentiation between asymptotic complexity classes and tests model robustness against theoretically impossible constraints. Our proposed Complexity Alignment Score (CAS) integrates correctness and complexity adherence into a unified metric, assessed through theoretical analysis rather than costly execution. Experiments reveal a striking gap in state-of-the-art models: GPT-4o achieves 81% correctness but only 22% CAS, demonstrating poor complexity compliance. Notably, most models fail to recognize infeasible constraints; only advanced models such as GPT-4o do so. These findings underscore the necessity of complexity-aware evaluation, positioning C3E as an essential tool for advancing real-world coding reliability in Code LLMs.
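To make the headline gap concrete, the sketch below shows one way a unified correctness-plus-complexity score could be tallied per problem. The scoring rule, field names, and example counts are illustrative assumptions only, not C3E's actual CAS definition, which the paper bases on theoretical complexity analysis rather than this boolean accounting.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class EvalRecord:
    """One model response to a benchmark problem (hypothetical fields)."""
    functionally_correct: bool   # passed the functional test suite
    complexity_compliant: bool   # meets the stated time-complexity constraint
    constraint_feasible: bool    # whether the stated constraint is achievable at all
    flagged_infeasible: bool     # model explicitly reported the constraint as impossible

def unified_alignment_score(records: List[EvalRecord]) -> float:
    """Illustrative unified score: credit a response only when it is both
    functionally correct and complexity-compliant (feasible case), or when it
    correctly flags an impossible constraint (infeasible case)."""
    if not records:
        return 0.0
    credit = 0
    for r in records:
        if r.constraint_feasible:
            credit += int(r.functionally_correct and r.complexity_compliant)
        else:
            credit += int(r.flagged_infeasible)
    return credit / len(records)

# Illustrative distribution over 100 feasible problems: high functional
# correctness can coexist with a much lower unified score.
records = (
    [EvalRecord(True, True, True, False)] * 22    # correct and compliant
    + [EvalRecord(True, False, True, False)] * 59 # correct but too slow
    + [EvalRecord(False, False, True, False)] * 19  # functionally wrong
)
print(f"correctness    = {sum(r.functionally_correct for r in records) / len(records):.2f}")
print(f"unified score  = {unified_alignment_score(records):.2f}")
```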