Li-Guo Chen, Xin Wang, Jue-Yu Chen, Ren-Zhao Liang, Zheng-Ran Zeng, Yang-Ning Li, Ying-Hui Li, Yi-Dong Wang, Yi-Jiang Xu, Qing Gao, Shi-Kun Zhang. Complexity-Constraint Code Evaluation (C3E): A Benchmark for Time Complexity Compliance in LLM-Generated Code[J]. Journal of Computer Science and Technology. DOI: 10.1007/s11390-025-5518-5

Complexity-Constraint Code Evaluation (C3E): A Benchmark for Time Complexity Compliance in LLM-Generated Code

  • While Code LLMs excel at generating functionally correct code, existing benchmarks neglect a crucial aspect: adherence to explicit time complexity constraints. We introduce Complexity-Constraint Code Evaluation (C3E), a novel benchmark evaluating both functional correctness and complexity compliance across feasible and infeasible scenarios. C3E enables precise differentiation between asymptotic complexity classes and tests model robustness against theoretically impossible constraints. Our proposed Complexity Alignment Score (CAS) integrates correctness and complexity adherence into a unified metric, assessed through theoretical analysis rather than costly executions. Experiments reveal a striking gap in state-of-the-art models: GPT-4o achieves 81% correctness but only 22% CAS, demonstrating poor complexity compliance. Notably, most models fail to recognize infeasible constraints, with advanced ones like GPT-4o being the exception. These findings underscore the necessity for complexity-aware evaluation, positioning C3E as an essential tool for advancing real-world coding reliability in Code LLMs.
