llama3.2-vision:11b-instruct-q8_0

The Llama 3.2-Vision collection of multimodal large language models (LLMs) consists of instruction-tuned image reasoning generative models in 11B and 90B sizes (text + images in / text out). The instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. The models outperform many of the available open source and closed multimodal models on common industry benchmarks.

Supported Languages: For text-only tasks, English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai are officially supported. Llama 3.2 has been trained on a broader collection of languages than these 8 supported languages. Note that for image+text applications, English is the only supported language.

## Usage

First, pull the model:

```bash
ollama pull llama3.2-vision
```

### Python Library

To use Llama 3.2 Vision with the Ollama [Python library](https://github.com/ollama/ollama-python):

```python
import ollama

response = ollama.chat(
    model='llama3.2-vision',
    messages=[{
        'role': 'user',
        'content': 'What is in this image?',
        'images': ['image.jpg']
    }]
)

print(response)
```
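
The `print(response)` call above prints the whole response object, metadata included. A minimal sketch for pulling out only the model's answer follows. It assumes the response supports `response['message']['content']` access, as in the Ollama Python library's own examples; `reply_text` is a hypothetical helper name used only for illustration.

```python
from typing import Any, Mapping


def reply_text(response: Mapping[str, Any]) -> str:
    """Return only the assistant's text from a chat response.

    Assumes the response exposes the reply under
    response['message']['content'] (hypothetical helper, not part of
    the Ollama library).
    """
    return response["message"]["content"]
```

With the `response` from the example above, `print(reply_text(response))` would print just the description of the image.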

### JavaScript Library

To use Llama 3.2 Vision with the Ollama [JavaScript library](https://github.com/ollama/ollama-js):

```javascript
import ollama from 'ollama'

const response = await ollama.chat({
  model: 'llama3.2-vision',
  messages: [{
    role: 'user',
    content: 'What is in this image?',
    images: ['image.jpg']
  }]
})

console.log(response)
```

### cURL

```shell
curl http://127.0.0.1:11434/api/chat -d '{
  "model": "llama3.2-vision",
  "messages": [
    {
      "role": "user",
      "content": "what is in this image?",
      "images": ["<base64-encoded image data>"]
    }
  ]
}'
```
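
The `images` field in the request above expects base64-encoded image data rather than a file path. The sketch below shows one way to produce that payload using only the Python standard library. The helper names `build_chat_payload` and `send_chat` are hypothetical, the URL assumes a local Ollama server on its default port, and `"stream": false` is set so the server returns a single JSON object instead of a stream of chunks.

```python
import base64
import json
import urllib.request

# Assumed default address of a locally running Ollama server (plain HTTP).
OLLAMA_CHAT_URL = "http://127.0.0.1:11434/api/chat"


def build_chat_payload(image_bytes: bytes, prompt: str,
                       model: str = "llama3.2-vision") -> dict:
    """Build the JSON body for /api/chat with one base64-encoded image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "stream": False,  # request one JSON reply rather than streamed chunks
        "messages": [
            {"role": "user", "content": prompt, "images": [encoded]},
        ],
    }


def send_chat(payload: dict, url: str = OLLAMA_CHAT_URL) -> dict:
    """POST the payload to a running Ollama server and parse the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

To try it, read an image with `open('image.jpg', 'rb').read()`, pass the bytes and a prompt to `build_chat_payload`, and hand the result to `send_chat` while the server is running.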

## References

[GitHub](https://github.com/meta-llama/llama-models)

[HuggingFace](https://huggingface.co/collections/meta-llama/llama-32-66f448ffc8c32f949b04c8cf)
