marco-o1

* **Fine-Tuning with CoT Data:** We develop <ins>Marco-o1-CoT</ins> by performing full-parameter fine-tuning on the base model using open-source CoT datasets combined with our self-developed synthetic data.
* **Solution Space Expansion via MCTS:** We integrate LLMs with MCTS (<ins>Marco-o1-MCTS</ins>), using the model's output confidence to guide the search and expand the solution space.
* **Reasoning Action Strategy:** We implement novel reasoning action strategies and a reflection mechanism (<ins>Marco-o1-MCTS mini-step</ins>). This includes exploring different action granularities within the MCTS framework and prompting the model to self-reflect, which significantly enhances its ability to solve complex problems.
* **Application in Translation Tasks:** We are the first to apply Large Reasoning Models (LRMs) to <ins>Machine Translation tasks</ins>, exploring inference-time scaling laws in the multilingual and translation domain.
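
The MCTS bullet above says the model's output confidence guides the search, but this page does not spell out how that confidence is computed. The sketch below shows one plausible way to turn token log-probabilities into a single score per rollout: each chosen token's probability is normalized over its top candidates, and the results are averaged. The function names `token_confidence` and `rollout_score`, the input layout, and the averaging rule are all assumptions made for illustration, not the project's confirmed implementation.

```python
import math
from typing import List, Sequence, Tuple


def token_confidence(chosen_logprob: float, top_logprobs: Sequence[float]) -> float:
    """Softmax of the chosen token's log-probability over its candidate set.

    Assumption: top_logprobs holds the log-probabilities of the top candidate
    tokens at this step, including the chosen token itself.
    """
    denom = sum(math.exp(lp) for lp in top_logprobs)
    return math.exp(chosen_logprob) / denom


def rollout_score(steps: List[Tuple[float, Sequence[float]]]) -> float:
    """Average per-token confidence over one rollout (hypothetical reward)."""
    if not steps:
        raise ValueError("rollout must contain at least one token")
    confidences = [token_confidence(chosen, tops) for chosen, tops in steps]
    return sum(confidences) / len(confidences)


# Two-token rollout with made-up probabilities, purely for illustration.
steps = [
    (math.log(0.6), [math.log(0.6), math.log(0.3), math.log(0.1)]),
    (math.log(0.5), [math.log(0.5), math.log(0.25), math.log(0.25)]),
]
print(rollout_score(steps))
```

In a search loop, a score like this could serve as the value backed up through the tree, so that branches the model is more certain about are visited more often.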

## Usage

```
ollama run marco-o1 "How many Rs are in strawberry?"
```

The final answer is the text between `<Output>` and `</Output>` in the response. Parse it out of the resulting string:

```
...
<Output>
There are 3 Rs in strawberry.
</Output>
```
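
A minimal sketch of that parsing step in Python, assuming the full response is already available as a string. The helper name `extract_output` is hypothetical, and taking the last `<Output>` pair when several appear is a design choice of this sketch, not documented model behavior.

```python
import re
from typing import Optional

# Non-greedy match between the tags; DOTALL lets "." span line breaks.
OUTPUT_RE = re.compile(r"<Output>(.*?)</Output>", re.DOTALL)


def extract_output(response: str) -> Optional[str]:
    """Return the text inside the last <Output>...</Output> pair, or None."""
    matches = OUTPUT_RE.findall(response)
    if not matches:
        return None
    # Strip the newlines that surround the answer inside the tags.
    return matches[-1].strip()


sample = "...\n<Output>\nThere are 3 Rs in strawberry.\n</Output>\n"
print(extract_output(sample))
```

Returning `None` when no tags are found lets the caller decide whether to fall back to the raw response or treat the reply as malformed.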

## References

[GitHub](https://github.com/AIDC-AI/Marco-o1?tab=readme-ov-file)

[HuggingFace](https://huggingface.co/AIDC-AI/Marco-o1)
