Llama 3.2 Vision

Llama 3.2 Vision is now available to run in Ollama, in both 11B and 90B sizes.

Get started

ollama run llama3.2-vision

To run the larger 90B model:

ollama run llama3.2-vision:90b

To add an image to the prompt, drag and drop it into the terminal, or on Linux, add the image's file path to the prompt.

Note: Llama 3.2 Vision 11B requires at least 8 GB of VRAM, and the 90B model requires at least 64 GB of VRAM.

Examples

Handwriting

handwriting example

Optical character recognition (OCR)

OCR example

Charts and tables

charts and tables example

Image Q&A

image Q&A example

Usage

First, pull the model:

ollama pull llama3.2-vision

Python library

To use Llama 3.2 Vision with the Ollama Python library:

import ollama

response = ollama.chat(
    model='llama3.2-vision',
    messages=[{
        'role': 'user',
        'content': 'What is in this image?',
        'images': ['image.jpg']  # path to a local image file; raw bytes are also accepted
    }]
)

print(response)
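print(response) dumps the whole response object; the answer text itself sits under response['message']['content']. If you pass stream=True to ollama.chat, the library instead yields a sequence of partial chunks, each carrying the next slice of the answer under the same keys. The sketch below reassembles such a stream into one string. The helper name join_stream is hypothetical, and it assumes each chunk supports chunk['message']['content'] indexing, as the library's streamed chat responses do.

```python
from typing import Any, Iterable


def join_stream(chunks: Iterable[Any]) -> str:
    """Concatenate the partial assistant messages from a streamed chat reply.

    Hypothetical helper: assumes each chunk can be indexed as
    chunk['message']['content'], like the chunks ollama.chat yields
    when called with stream=True.
    """
    parts = []
    for chunk in chunks:
        # Each streamed chunk holds only the next fragment of the answer.
        parts.append(chunk["message"]["content"])
    return "".join(parts)
```

In use, you would pass it the generator directly, for example join_stream(ollama.chat(model='llama3.2-vision', messages=[...], stream=True)). To show text as it arrives, print each fragment inside the loop instead of collecting it.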

JavaScript library

To use Llama 3.2 Vision with the Ollama JavaScript library:

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'llama3.2-vision',
  messages: [{
    role: 'user',
    content: 'What is in this image?',
    images: ['image.jpg']
  }]
})

console.log(response)

cURL

curl http://127.0.0.1:11434/api/chat -d '{
  "model": "llama3.2-vision",
  "messages": [
    {
      "role": "user",
      "content": "what is in this image?",
      "images": ["<base64-encoded image data>"]
    }
  ]
}'
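Unlike the Python and JavaScript libraries, the REST endpoint takes images as base64 strings rather than file paths. By default /api/chat also streams its answer as a series of JSON objects, and setting "stream": false returns a single object instead. The sketch below uses only the Python standard library to build such a request body from raw image bytes and post it. The function names build_chat_body and send_chat are hypothetical, and the default URL assumes a local Ollama server on its default port 11434.

```python
import base64
import json
import urllib.request


def build_chat_body(image_bytes: bytes, prompt: str,
                    model: str = "llama3.2-vision") -> bytes:
    """Build a JSON body for POST /api/chat carrying one base64-encoded image."""
    body = {
        "model": model,
        # Request one complete JSON object instead of a stream of partial ones.
        "stream": False,
        "messages": [
            {
                "role": "user",
                "content": prompt,
                # The REST API expects base64 image data, not a file path.
                "images": [base64.b64encode(image_bytes).decode("ascii")],
            }
        ],
    }
    return json.dumps(body).encode("utf-8")


def send_chat(body: bytes,
              url: str = "http://localhost:11434/api/chat") -> str:
    """POST the body to an Ollama server and return the reply text.

    Assumes a server is listening at `url`; this is not called by the
    body-building code above.
    """
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        reply = json.loads(response.read())
    # With stream disabled, the answer arrives in a single "message" object.
    return reply["message"]["content"]
```

With a server running, something like send_chat(build_chat_body(open("image.jpg", "rb").read(), "What is in this image?")) would return the model's description as a string. Keeping body construction separate from the network call makes the encoding step easy to check without a running server.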

November 6, 2024