視覺模型

2024年2月2日

Vision Models

全新 LLaVA 模型

LLaVA (大型語言與視覺助理) 模型集合已更新至 1.6 版本,支援

  • 更高的影像解析度: 支援高達 4 倍以上的像素,讓模型能掌握更多細節。
  • 提升的文字辨識與推理能力: 於額外的文件、圖表和示意圖資料集上訓練。
  • 更寬鬆的授權條款: 透過 Apache 2.0 授權或 LLaMA 2 社群授權發布。

這些模型提供三種參數尺寸。7B、13B 和全新的 34B 模型

  • ollama run llava:7b
  • ollama run llava:13b
  • ollama run llava:34b

使用方式

CLI

Art

若要搭配 ollama run 使用視覺模型,請使用檔案路徑參考 .jpg.png 檔案

% ollama run llava "describe this image: ./art.jpg"
The image shows a colorful poster featuring an illustration of a cartoon character with spiky hair. Above the character's head is a crown, suggesting royalty or high status. On the right side of the poster, there is text in French that reads "Jean-Michel Basquiat,"

Python

import ollama

res = ollama.chat(
	model="llava",
	messages=[
		{
			'role': 'user',
			'content': 'Describe this image:',
			'images': ['./art.jpg']
		}
	]
)

print(res['message']['content'])

JavaScript

import ollama from 'ollama'

const res = await ollama.chat({
	model: 'llava',
	messages: [{
		role: 'user',
		content: 'Describe this image:'
		images: ['./art.jpg']
	}]
})

console.log(res.message.content)

注意:在 Ollama PythonJavaScript 函式庫以及 REST API 中,可以於 images 參數中提供 base64 編碼的檔案。請參閱完整的 API 文件 以取得更多關於向視覺模型提供影像的範例。

範例

物件偵測

Object detection

% ollama run llava "tell me what do you see in this picture? ./pic.jpg"
In the image, a man wearing blue and white is holding video game controllers in his hands while smiling. His arms are raised above his head as he plays with the virtual reality gaming devices. The man appears to be enjoying himself and immersed in his gaming experience.

文字辨識

Word Art

% ollama run llava "what does the text say? ./wordart.png"
This is an image with a capitalized, bolded text of "ollama." The background is black and the letters are yellow, which create a contrasting appearance. Although the words "ollama" make up the majority of the image, it could also be seen as part of a sign or advertisement due to its brightness and prominent font style.

感謝 LLaVA 團隊

LLaVA 1.6 模型家族由一個傑出的團隊發布。關於模型訓練方式的資訊,以及 LLaVA 1.6 與領先的開源和專有模型比較的基準測試結果,請參閱 LLaVA 網站

Until next time