qwq - Ollama framework

QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ is capable of thinking and reasoning, and can achieve significantly better performance on downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, which achieves competitive performance against state-of-the-art reasoning models such as DeepSeek-R1 and o1-mini.

![](/assets/library/qwq/e3d71b1c-9c62-413a-a63a-1ca604189a17)

### Future Work

This marks Qwen’s initial step in scaling Reinforcement Learning (RL) to enhance reasoning capabilities. Through this journey, we have not only witnessed the immense potential of scaled RL but also recognized the untapped possibilities within pretrained language models. As we work towards developing the next generation of Qwen, we are confident that combining stronger foundation models with RL powered by scaled computational resources will propel us closer to achieving Artificial General Intelligence (AGI). Additionally, we are actively exploring the integration of agents with RL to enable long-horizon reasoning, aiming to unlock greater intelligence with inference-time scaling.

### Reference
- [Blog](https://qwenlm.github.io/blog/qwq-32b/)
