wizardlm:13b-fp16 - Ollama

wizardlm

A general-use model based on Llama 2.

75.8K downloads · Updated 16 months ago

73 tags


95a66502902f · 26GB

Parameters (31B)

{ "stop": [ "USER:", "ASSISTANT:" ] }

Template (45B)

{{ .System }} USER: {{ .Prompt }} ASSISTANT:

System (154B)

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful,
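The template and `stop` parameter above determine two things: the exact text the model sees, and where its reply is cut off. The shell sketch below imitates both under simplifying assumptions:

- `render_prompt` and `apply_stop` are hypothetical helper names.
- Plain `printf` stands in for Ollama's Go template engine.
- The stop handling only mimics the visible effect (the reply ends before the first stop sequence). It does not reproduce how Ollama actually halts generation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: expand "{{ .System }} USER: {{ .Prompt }} ASSISTANT:"
# with plain string formatting instead of Ollama's Go template engine.
render_prompt() {
  local system=$1 prompt=$2
  printf '%s USER: %s ASSISTANT:' "$system" "$prompt"
}

# Hypothetical helper: mimic the "stop" parameter by cutting the text at the
# earliest occurrence of any stop sequence (the sequence itself is dropped).
apply_stop() {
  local text=$1 stop
  for stop in "USER:" "ASSISTANT:"; do
    text=${text%%"$stop"*}   # remove everything from the first match onward
  done
  printf '%s' "$text"
}

render_prompt "You are a helpful assistant." "Why is the sky blue?"
echo
apply_stop "Sunlight is scattered by the air. USER: What about sunsets?"
echo
```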

Readme


WizardLM is a 70B-parameter model based on Llama 2, trained by WizardLM.

## Get started with WizardLM

The examples below use the 70b-parameter WizardLM model, which is a general-use model.

### API

1. Start the Ollama server (run `ollama serve`).
2. Run the model:

```bash
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "wizardlm:70b-llama2-q4_0",
  "prompt": "Why is the sky blue?"
}'
```
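By default, `/api/generate` streams the reply as a series of JSON objects, one per line. Setting `"stream": false` in the request body asks the server for a single JSON object instead. Its `response` field holds the full completion.

The sketch below wraps that request in two hypothetical shell functions, `build_generate_body` and `ollama_generate`. It rests on two assumptions:

- The server is listening on the default `localhost:11434`.
- The prompt contains no double quotes, backslashes, or newlines, because the sketch does no JSON escaping.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the JSON body for POST /api/generate.
# "stream": false requests one JSON object instead of a stream of chunks.
# No JSON escaping is done, so the prompt must not contain ", \ or newlines.
build_generate_body() {
  local model=$1 prompt=$2
  printf '{"model": "%s", "prompt": "%s", "stream": false}' "$model" "$prompt"
}

# Hypothetical helper: send the request to a local Ollama server
# (assumed default address http://localhost:11434) and print the raw JSON reply.
ollama_generate() {
  curl -s http://localhost:11434/api/generate \
    -d "$(build_generate_body "$1" "$2")"
}
```

With the server running, `ollama_generate wizardlm:70b-llama2-q4_0 "Why is the sky blue?"` prints the reply as one JSON object. Piping it through a JSON tool such as `jq -r .response` extracts just the generated text.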

### CLI

1. Install Ollama
2. Open the terminal and run `ollama run wizardlm:70b-llama2-q4_0`

Note: The `ollama run` command performs an `ollama pull` first if the model has not been downloaded yet. To download the model without running it, use `ollama pull wizardlm:70b-llama2-q4_0`.

## Memory requirements

- 70b models generally require at least 64GB of RAM
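A rough way to see where figures like this come from is to multiply the parameter count by the bits stored per weight. The sketch below uses a hypothetical function name, `estimate_weights_gb`, and computes only the size of the weights.

Real model files are somewhat larger, because formats such as q4_0 also store per-block scale factors. Running the model also needs extra memory for the context (KV cache) and for the rest of the system. That is why the recommended RAM sits well above the raw weight size.

```bash
#!/usr/bin/env bash
# Hypothetical helper: approximate size of the weights alone, in GB (10^9 bytes).
# $1 = parameter count in billions, $2 = bits per weight.
# bytes = params * bits / 8, so GB = billions_of_params * bits / 8.
# Ignores quantization block scales, KV cache, and runtime overhead.
estimate_weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_weights_gb 70 4    # 70B model at 4 bits per weight  → 35.0
estimate_weights_gb 13 16   # 13B model at 16 bits (fp16)     → 26.0
```

The 13B fp16 estimate of 26.0 GB lines up with the 26GB listed for the `wizardlm:13b-fp16` tag on this page. The 35.0 GB figure for 70B at 4 bits covers the weights only, before any overhead.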

If you run into issues with higher quantization levels, try using the q4 model or shut down any other programs that are using a lot of memory.

## Model variants

By default, Ollama uses 4-bit quantization. To try other quantization levels, use one of the other tags. The number after the `q` is the number of bits used for quantization (e.g., q4 means 4-bit quantization). The higher the number, the more accurate the model, but the slower it runs and the more memory it requires.
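The tag naming described above can be read mechanically. The sketch below assumes tags of the form seen on this page, ending in `-qN…` (such as `70b-llama2-q4_0`) or `-fpN` (such as `13b-fp16`). The function name `bits_from_tag` is hypothetical, and other tag spellings are not handled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the bits per weight encoded in an Ollama tag.
# Handles "...-qN..." quantized tags and "...-fpN" unquantized tags only;
# returns status 1 for anything else (e.g. "latest").
bits_from_tag() {
  local tag=$1
  local re_q='-q([0-9]+)'      # digits right after "-q", e.g. q4_0 -> 4
  local re_fp='-fp([0-9]+)$'   # digits after "-fp" at the end, e.g. fp16 -> 16
  if [[ $tag =~ $re_q ]]; then
    echo "${BASH_REMATCH[1]}"
  elif [[ $tag =~ $re_fp ]]; then
    echo "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}

bits_from_tag "wizardlm:70b-llama2-q4_0"   # → 4
bits_from_tag "wizardlm:13b-fp16"          # → 16
```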

## Model source

**WizardLM source on Ollama**

70b parameters source: [The Bloke](https://huggingface.co/TheBloke/WizardLM-70B-V1.0-GGML)

70b parameters original source: [WizardLM](https://huggingface.co/WizardLM/WizardLM-70B-V1.0)
