Cheng-Feng Dou, Ying Zhang, Zhi Jin, Wen-Pin Jiao, Hai-Yan Zhao, Yong-Qiang Zhao, Zheng-Wei Tao. Exploring LLM-based Data Synthesis Strategies for Medical Consultation Preference Alignment[J]. Journal of Computer Science and Technology. DOI: 10.1007/s11390-025-4929-7

Exploring LLM-based Data Synthesis Strategies for Medical Consultation Preference Alignment

  • This research explores the application of Reinforcement Learning from Artificial Intelligence Feedback (RLAIF) to healthcare consultation models. The aim is to synthesize preference-aligned data while reducing the dependence on medical experts. Specifically, we investigate the use of RLAIF in generating medical dialogues and focus on two primary challenges: accurately reflecting physicians' preferences, and the unreliability of existing automated assessment systems. To address these issues, we propose a two-stage approach for synthesizing preference-aligned datasets. In the first stage, we leverage the dialogue continuation capabilities of a large language model to sample diverse, contextually aligned dialogue branches, using one-shot learning to intervene in the generation. In the second stage, we model doctors' preferences through both outcome feedback and process feedback: a rule-based reward system provides the outcome feedback, whereas a planning-based reward strategy provides the process feedback. To validate our approach, we develop the Chinese Standardized Patient Test (CSPT) dataset, which emphasizes user guidance, instruction following, and synthesis ability, and we construct an objective assessment system based on standardized patient testing. Experimental results show that our data synthesis approach performs well across five datasets, improving diagnostic accuracy by 17.6% with outcome feedback and by 23.3% with process feedback.
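The abstract does not specify the reward rules or the format of the synthesized data. The sketch below shows how outcome feedback from a rule-based reward could turn sampled dialogue branches into preference pairs. It assumes a DPO-style (chosen, rejected) pair format and a hypothetical rule that rewards a correct final diagnosis and penalizes overly long consultations. `Branch`, `rule_based_reward`, `build_preference_pairs`, and the `max_turns` threshold are illustrative names, not the authors' implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Branch:
    """One sampled continuation of a shared dialogue context (hypothetical schema)."""
    context_id: str
    dialogue: list[str]   # turns generated for this branch
    final_diagnosis: str  # diagnosis stated at the end of the branch


def rule_based_reward(branch: Branch, reference: str, max_turns: int = 10) -> float:
    """Hypothetical outcome reward (assumed rules, not from the paper):
    1.0 for a diagnosis matching the reference (case-insensitive),
    minus 0.5 if the consultation runs longer than max_turns."""
    correct = branch.final_diagnosis.strip().lower() == reference.strip().lower()
    reward = 1.0 if correct else 0.0
    if len(branch.dialogue) > max_turns:
        reward -= 0.5
    return reward


def build_preference_pairs(branches: list[Branch], references: dict[str, str]) -> list[dict]:
    """Pair the best- and worst-scoring branch of each context as (chosen, rejected)."""
    by_context: dict[str, list[Branch]] = defaultdict(list)
    for b in branches:
        by_context[b.context_id].append(b)

    pairs = []
    for context_id, group in by_context.items():
        scored = [(rule_based_reward(b, references[context_id]), b) for b in group]
        best = max(scored, key=lambda t: t[0])
        worst = min(scored, key=lambda t: t[0])
        # If every branch gets the same reward there is no preference signal, so skip.
        if best[0] > worst[0]:
            pairs.append({
                "context_id": context_id,
                "chosen": best[1].dialogue,
                "rejected": worst[1].dialogue,
            })
    return pairs


branches = [
    Branch("c1", ["Doctor: How long have you had the cough?", "Patient: Two weeks."], "bronchitis"),
    Branch("c1", ["Doctor: Any fever?"], "influenza"),
    Branch("c2", ["Doctor: Where is the pain?"], "gastritis"),
    Branch("c2", ["Doctor: When did it start?"], "gastritis"),
]
pairs = build_preference_pairs(branches, {"c1": "Bronchitis", "c2": "gastritis"})
print(len(pairs))  # 1: both c2 branches score 1.0, so c2 yields no pair
```

Dropping tied contexts is a design choice. A pair whose two branches received the same reward carries no preference information, and including it would only add noise to preference training.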
