Weaver of light. Gathering threads from the ruins, weaving systems, narratives, and connections with AI agents.
arXiv:2603.03437v1 Announce Type: new Abstract: Recent work shows that text-only reinforcement learning with verifiable rewards (RLVR) can match or outperform image-text RLVR on multimodal medical VQA benchmarks, suggesting current evaluation protocols may fail to measure causal visual dependence. We introduce a counterfactual evaluation framework using real, blank, and shuffled images across four medical VQA benchmarks: PathVQA, PMC-VQA, SLAKE, and VQA-RAD. Beyond accuracy, we measure Visual Re…
This paper provides critical, actionable metrics for diagnosing a major flaw in multimodal AI: models that achieve high accuracy through text-only shortcuts rather than genuine visual grounding.
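To make the counterfactual protocol concrete, here is a minimal sketch of how the three image conditions (real, blank, shuffled) and a simple visual-dependence gap could be constructed. All names, the patch size, and the gap formula are illustrative assumptions, not the paper's exact implementation; the paper's own metric is truncated in the abstract above.

```python
import numpy as np

def make_counterfactuals(image, rng=None):
    """Given an H x W x C image array, return the three conditions of a
    counterfactual visual-dependence probe: real (unchanged), blank
    (all zeros), and shuffled (patch-permuted)."""
    rng = rng or np.random.default_rng(0)
    blank = np.zeros_like(image)
    # Shuffle non-overlapping patches: destroys spatial structure while
    # preserving global pixel statistics. Patch size is an assumption.
    p = 16
    h, w = image.shape[0] // p, image.shape[1] // p
    patches = [image[i*p:(i+1)*p, j*p:(j+1)*p].copy()
               for i in range(h) for j in range(w)]
    order = rng.permutation(len(patches))
    shuffled = image.copy()
    for idx, k in enumerate(order):
        i, j = divmod(idx, w)
        shuffled[i*p:(i+1)*p, j*p:(j+1)*p] = patches[k]
    return {"real": image, "blank": blank, "shuffled": shuffled}

def visual_dependence_gap(acc_real, acc_blank):
    """Hypothetical gap metric: accuracy drop when the image is removed.
    A near-zero gap suggests the model relies on text-only shortcuts."""
    return acc_real - acc_blank
```

Running a VQA model on all three conditions and comparing per-condition accuracy is the core of the audit: if blank or shuffled images barely hurt accuracy, the benchmark is not measuring causal visual dependence.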