Minions: where local and cloud LLMs meet

February 25, 2025

Minions

Avanika Narayan, Dan Biderman, and Sabri Eyuboglu from Christopher Ré's Hazy Research lab at Stanford, together with Avner May, Scott Linderman, and James Zou, developed a way to shift a substantial portion of LLM workloads onto consumer devices by having a small on-device model (e.g., Llama 3.2 running with Ollama) collaborate with a larger model in the cloud (e.g., GPT-4o).

The new paper and its accompanying open-source code aim to reduce cloud costs, with little to no loss in quality, through two protocol configurations.

  • Minion: the cloud model converses freely with a single local model that has access to the data, until the two arrive at a solution
    • achieves a 30.4x reduction in remote costs while retaining 87% of the cloud model's performance
  • MinionS: the cloud model decomposes the task into bite-sized subtasks to be executed over chunks of the context. The small LLM solves these in parallel
    • achieves a 5.7x reduction in remote costs while retaining 97.9% of the cloud model's performance
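The key idea behind both protocols is that the cloud model never sees the raw context, only the local model's replies. The Minion round-trip can be sketched as a simple loop. This is a toy illustration only: all function names below are invented for this sketch and are not the minions package's API.

```python
# Toy sketch of the Minion protocol's round-trip loop. Names are invented
# for illustration; the real implementation lives in the minions package,
# and the two "models" here are actual LLM clients, not Python functions.

def minion_protocol(local_model, remote_model, task, context, max_rounds=2):
    """Alternate between a context-holding local model and a remote model
    until the remote model signals a final answer."""
    # Only the local model reads the full context; the remote model sees
    # just the task and the local model's replies, which keeps long
    # documents (and their token costs) off the cloud.
    remote_message = task
    for _ in range(max_rounds):
        local_reply = local_model(remote_message, context)
        remote_message = remote_model(task, local_reply)
        if remote_message.startswith("FINAL:"):
            return remote_message[len("FINAL:"):].strip()
    return remote_message  # fall back to the last remote output

# Toy stand-ins for the two models:
def toy_local(message, context):
    # "Reads" the document when asked about blood pressure.
    return context if "blood pressure" in message.lower() else "Which part?"

def toy_remote(task, local_reply):
    # Declares a final answer once the reading appears in the local reply.
    if "160/100" in local_reply:
        return "FINAL: 160/100"
    return "Please quote the blood pressure reading."

answer = minion_protocol(
    toy_local, toy_remote,
    task="Report the patient's blood pressure.",
    context="BP recorded at 160/100 mmHg.",
)
print(answer)  # 160/100
```

MinionS follows the same division of labor, except the remote model first splits the task into subtasks over context chunks, and multiple local calls run in parallel before the remote model aggregates their answers.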

Get started

Clone the repository

git clone https://github.com/HazyResearch/minions.git 
cd minions

(Optional) Create a virtual environment with your favorite package manager (e.g., conda, venv, uv, etc.)

python3 -m venv .venv
source .venv/bin/activate

Next, install the Python package and dependencies

pip install -e .

If you haven't already, install Ollama and pull Meta's Llama 3.2 model

ollama pull llama3.2

Finally, create an OpenAI API key for the cloud model
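A common way to supply the key is via an environment variable. The snippet below assumes the OpenAI client reads the standard `OPENAI_API_KEY` variable; check the repository's README for the exact configuration it expects.

```shell
# Assumes the standard OPENAI_API_KEY environment variable is used;
# replace the placeholder with your actual key.
export OPENAI_API_KEY="your-api-key-here"
```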

Run the demo app

The provided Streamlit app runs an interactive demo of the Minion and MinionS protocols. To launch it, run

streamlit run app.py

A browser window will open with instructions for entering your OpenAI API key, choosing a local model, and running Minion or MinionS.

Minions Screenshot

Example code

To run Minion or MinionS programmatically from Python, use the minions package.

Minion

First, create a file named example.py with the following contents

from minions.clients.ollama import OllamaClient
from minions.clients.openai import OpenAIClient
from minions.minion import Minion

local_client = OllamaClient(
    model_name="llama3.2",
)

remote_client = OpenAIClient(
    model_name="gpt-4o",
)

# Instantiate the Minion object with both clients
minion = Minion(local_client, remote_client)

context = """
Patient John Doe is a 60-year-old male with a history of hypertension. In his latest checkup, his blood pressure was recorded at 160/100 mmHg, and he reported occasional chest discomfort during physical activity.
Recent laboratory results show that his LDL cholesterol level is elevated at 170 mg/dL, while his HDL remains within the normal range at 45 mg/dL. Other metabolic indicators, including fasting glucose and renal function, are unremarkable.
"""

task = "Based on the patient's blood pressure and LDL cholesterol readings in the context, evaluate whether these factors together suggest an increased risk for cardiovascular complications."

# Execute the minion protocol for up to two communication rounds
output = minion(
    task=task,
    context=[context],
    max_rounds=2
)

print(output["final_answer"])

Then run the example

python example.py

MinionS

With a few small changes, the same code can run the MinionS protocol

from minions.clients.ollama import OllamaClient
from minions.clients.openai import OpenAIClient
from minions.minions import Minions
from pydantic import BaseModel

class StructuredLocalOutput(BaseModel):
    explanation: str
    citation: str | None
    answer: str | None

local_client = OllamaClient(
    model_name="llama3.2",
    temperature=0.0,
    structured_output_schema=StructuredLocalOutput
)

remote_client = OpenAIClient(
    model_name="gpt-4o",
)


# Instantiate the Minions object with both clients
minions = Minions(local_client, remote_client)

context = """
Patient John Doe is a 60-year-old male with a history of hypertension. In his latest checkup, his blood pressure was recorded at 160/100 mmHg, and he reported occasional chest discomfort during physical activity.
Recent laboratory results show that his LDL cholesterol level is elevated at 170 mg/dL, while his HDL remains within the normal range at 45 mg/dL. Other metabolic indicators, including fasting glucose and renal function, are unremarkable.
"""

task = "Based on the patient's blood pressure and LDL cholesterol readings in the context, evaluate whether these factors together suggest an increased risk for cardiovascular complications."

# Execute the MinionS protocol for up to two communication rounds
output = minions(
    task=task,
    doc_metadata="Medical Report",
    context=[context],
    max_rounds=2
)

print(output["final_answer"])

After making the changes, rerun the example

python example.py

Read more