# llama3.2-vision:90b-instruct-fp16

The Llama 3.2-Vision collection of multimodal large language models (LLMs) consists of instruction-tuned, image-reasoning generative models in 11B and 90B sizes (text + images in / text out). The instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. On common industry benchmarks, they outperform many available open-source and closed multimodal models.

Supported languages: For text-only tasks, English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai are officially supported. Llama 3.2 was trained on a broader set of languages than these eight. For image + text applications, however, English is the only supported language.

## Usage

First, pull the model:

```bash
ollama pull llama3.2-vision
```
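If you are scripting the setup, the same step can be done through Ollama's REST API. The sketch below uses only the Python standard library and assumes a server on the default `http://localhost:11434`; it lists local models with `GET /api/tags` and downloads a missing one with `POST /api/pull`. The helper names (`normalize_tag`, `has_model`, `list_local_models`, `pull_model`) are hypothetical, and the pull body uses the `model` field as in the current API docs; older servers may expect `name` instead.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default local Ollama address


def normalize_tag(name: str) -> str:
    """A bare model name refers to its ':latest' tag.

    Simplification: assumes the name has no registry host:port prefix.
    """
    return name if ":" in name else f"{name}:latest"


def has_model(tags_response: dict, name: str) -> bool:
    """Return True if `name` appears in a parsed /api/tags response."""
    wanted = normalize_tag(name)
    return any(normalize_tag(m.get("name", "")) == wanted
               for m in tags_response.get("models", []))


def list_local_models(base_url: str = OLLAMA_URL) -> dict:
    """GET /api/tags and return the decoded JSON."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return json.load(resp)


def pull_model(name: str, base_url: str = OLLAMA_URL) -> None:
    """POST /api/pull without streaming; blocks until the download finishes."""
    body = json.dumps({"model": name, "stream": False}).encode("utf-8")
    req = urllib.request.Request(f"{base_url}/api/pull", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp).get("status"))


if __name__ == "__main__":
    model = "llama3.2-vision"
    if not has_model(list_local_models(), model):
        pull_model(model)
```

To fetch the variant named in this page's title, use `llama3.2-vision:90b-instruct-fp16` as `model`; that tag is far larger than the default one.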

### Python Library

To use Llama 3.2 Vision with the Ollama [Python library](https://github.com/ollama/ollama-python):

```python
import ollama

response = ollama.chat(
    model='llama3.2-vision',
    messages=[{
        'role': 'user',
        'content': 'What is in this image?',
        'images': ['image.jpg']  # path to a local image file
    }]
)

print(response['message']['content'])  # the model's text reply
```
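The Python client also accepts raw image bytes and can stream the answer as it is generated. A minimal sketch, assuming `ollama-python` is installed and an Ollama server is running locally; `build_vision_message` and `ask` are hypothetical helper names. Each streamed chunk carries a fragment of the reply under `message.content`.

```python
from pathlib import Path


def build_vision_message(prompt: str, image_path: str) -> dict:
    """Build one user message with the image attached as raw bytes (hypothetical helper)."""
    return {
        "role": "user",
        "content": prompt,
        "images": [Path(image_path).read_bytes()],
    }


def ask(prompt: str, image_path: str, model: str = "llama3.2-vision") -> str:
    """Stream the model's answer to stdout and return the full text."""
    import ollama  # imported here so the helper above works without the package

    parts = []
    # stream=True makes ollama.chat yield partial responses
    for chunk in ollama.chat(model=model,
                             messages=[build_vision_message(prompt, image_path)],
                             stream=True):
        piece = chunk["message"]["content"]
        print(piece, end="", flush=True)
        parts.append(piece)
    print()
    return "".join(parts)


if __name__ == "__main__":
    ask("What is in this image?", "image.jpg")
```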

### JavaScript Library

To use Llama 3.2 Vision with the Ollama [JavaScript library](https://github.com/ollama/ollama-js):

```javascript
import ollama from 'ollama'

const response = await ollama.chat({
  model: 'llama3.2-vision',
  messages: [{
    role: 'user',
    content: 'What is in this image?',
    images: ['image.jpg'] // path to a local image file
  }]
})

console.log(response.message.content) // the model's text reply
```

### cURL

```shell
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2-vision",
  "messages": [
    {
      "role": "user",
      "content": "What is in this image?",
      "images": ["<base64-encoded image data>"]
    }
  ]
}'
```
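The REST call expects each image as a base64 string in `images`. The following standard-library Python sketch does that encoding and posts to `/api/chat`. It assumes the default local address and sets `"stream": false` so the server returns a single JSON object with the answer under `message.content`; `build_chat_payload` and `chat` are hypothetical names.

```python
import base64
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # assumption: default local server


def build_chat_payload(prompt: str, image_bytes: bytes,
                       model: str = "llama3.2-vision") -> dict:
    """Request body for /api/chat with one base64-encoded image."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": prompt,
            "images": [base64.b64encode(image_bytes).decode("ascii")],
        }],
        "stream": False,  # one JSON object instead of a stream of chunks
    }


def chat(prompt: str, image_path: str, url: str = OLLAMA_CHAT_URL) -> str:
    """Send the image and prompt, then return the model's text reply."""
    with open(image_path, "rb") as f:
        payload = build_chat_payload(prompt, f.read())
    req = urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]


if __name__ == "__main__":
    print(chat("What is in this image?", "image.jpg"))
```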

## References

[GitHub](https://github.com/meta-llama/llama-models)

[HuggingFace](https://huggingface.co/collections/meta-llama/llama-32-66f448ffc8c32f949b04c8cf)
