Minions: where local and cloud LLMs meet
February 25, 2025
Avanika Narayan, Dan Biderman, and Sabri Eyuboglu of Christopher Ré's Hazy Research lab at Stanford, together with Avner May, Scott Linderman, and James Zou, have developed a way to shift a substantial portion of LLM workloads onto consumer devices by having a small on-device model (e.g. Llama 3.2 running with Ollama) collaborate with a larger model in the cloud (e.g. GPT-4o).
The new paper and its accompanying open-source code aim to reduce cloud costs, with little to no loss in quality, through two protocol configurations:
- Minion: the cloud model converses freely with a single local model that has access to the data, until the two arrive at a solution (see the sketch after this list)
  - achieves a 30.4x reduction in remote costs while retaining 87% of the cloud model's performance
- MinionS: the cloud model decomposes the task into bite-sized subtasks to be executed over chunks of the context; the small local LLM solves these subtasks in parallel
  - achieves a 5.7x reduction in remote costs while retaining 97.9% of the cloud model's performance
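To make the division of labor concrete, here is a minimal sketch of a Minion-style exchange. It is not the library's actual implementation: ask_cloud and ask_local are hypothetical stand-ins that return canned strings, and the real protocol adds prompting, answer parsing, and termination logic.
def ask_cloud(transcript: list[str]) -> str:
    # Hypothetical stand-in for a call to the cloud model; it sees only the
    # conversation so far, never the raw context.
    return "What are the patient's blood pressure and LDL readings?"

def ask_local(context: str, question: str) -> str:
    # Hypothetical stand-in for a call to the small local model, which is the
    # only model that reads the full context.
    return "BP 160/100 mmHg; LDL 170 mg/dL."

def run_minion(task: str, context: str, max_rounds: int = 2) -> str:
    transcript = [task]
    for _ in range(max_rounds):
        question = ask_cloud(transcript)       # cloud asks a focused question
        answer = ask_local(context, question)  # local answers from the data
        transcript += [question, answer]
    return ask_cloud(transcript + ["Give the final answer."])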
Get started
Clone the repository:
git clone https://github.com/HazyResearch/minions.git
cd minions
(Optional) Create a virtual environment with your favorite package manager (e.g. conda, venv, uv):
python3 -m venv .venv
source .venv/bin/activate
Next, install the Python package and its dependencies:
pip install -e .
If you haven't already, install Ollama, then pull Meta's Llama 3.2 model:
ollama pull llama3.2
Finally, create an OpenAI API key for the cloud model.
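One common way to make the key available to the demo and example code below, assuming the clients read the standard OPENAI_API_KEY environment variable:
export OPENAI_API_KEY=<your-openai-api-key>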
Run the demo app
The included Streamlit app runs an interactive demo of the Minion and MinionS protocols. To launch it, run:
streamlit run app.py
A browser window will open with instructions for entering your OpenAI API key, selecting a local model, and running Minion or MinionS.
Example code
To run Minion or MinionS programmatically from Python, use the minions package.
Minion
First, create a file named example.py with the following contents:
from minions.clients.ollama import OllamaClient
from minions.clients.openai import OpenAIClient
from minions.minion import Minion
# The local model is served by Ollama
local_client = OllamaClient(
    model_name="llama3.2",
)

# The remote model runs in the cloud
remote_client = OpenAIClient(
    model_name="gpt-4o",
)
# Instantiate the Minion object with both clients
minion = Minion(local_client, remote_client)
context = """
Patient John Doe is a 60-year-old male with a history of hypertension. In his latest checkup, his blood pressure was recorded at 160/100 mmHg, and he reported occasional chest discomfort during physical activity.
Recent laboratory results show that his LDL cholesterol level is elevated at 170 mg/dL, while his HDL remains within the normal range at 45 mg/dL. Other metabolic indicators, including fasting glucose and renal function, are unremarkable.
"""
task = "Based on the patient's blood pressure and LDL cholesterol readings in the context, evaluate whether these factors together suggest an increased risk for cardiovascular complications."
# Execute the minion protocol for up to two communication rounds
output = minion(
    task=task,
    context=[context],
    max_rounds=2
)
print(output["final_answer"])
Then run the example:
python example.py
MinionS
With a few small changes, the same code can run the MinionS protocol:
from minions.clients.ollama import OllamaClient
from minions.clients.openai import OpenAIClient
from minions.minions import Minions
from pydantic import BaseModel
# Schema for the structured output each local worker returns
class StructuredLocalOutput(BaseModel):
    explanation: str
    citation: str | None
    answer: str | None
# The local model is served by Ollama and constrained to the schema above
local_client = OllamaClient(
    model_name="llama3.2",
    temperature=0.0,
    structured_output_schema=StructuredLocalOutput
)

# The remote model runs in the cloud
remote_client = OpenAIClient(
    model_name="gpt-4o",
)
# Instantiate the Minions object with both clients
minions = Minions(local_client, remote_client)
context = """
Patient John Doe is a 60-year-old male with a history of hypertension. In his latest checkup, his blood pressure was recorded at 160/100 mmHg, and he reported occasional chest discomfort during physical activity.
Recent laboratory results show that his LDL cholesterol level is elevated at 170 mg/dL, while his HDL remains within the normal range at 45 mg/dL. Other metabolic indicators, including fasting glucose and renal function, are unremarkable.
"""
task = "Based on the patient's blood pressure and LDL cholesterol readings in the context, evaluate whether these factors together suggest an increased risk for cardiovascular complications."
# Execute the MinionS protocol for up to two communication rounds
output = minions(
    task=task,
    doc_metadata="Medical Report",
    context=[context],
    max_rounds=2
)
print(output["final_answer"])
After making these changes, re-run the example:
python example.py