llama2:70b-text-q2_K - Ollama

llama2

Llama 2 is a collection of foundation language models ranging from 7B to 70B parameters.

7b 13b 70b

3M downloads, updated 14 months ago

102 tags

b45d7449cc92 · 29GB

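# Llama 2 Acceptable Use Policy Meta is committed to promoting safe and fair use of its tools and f

4.8kB

License

LLAMA 2 COMMUNITY LICENSE AGREEMENT Llama 2 Version Release Date: July 18, 2023 "Agreement" means

7.0kB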

Readme

Llama 2 is released by Meta Platforms, Inc. The model was trained on 2 trillion tokens and by default supports a context length of 4096 tokens. Llama 2 Chat models are fine-tuned on over 1 million human annotations and are made for chat.

### CLI

Open the terminal and run `ollama run llama2`

### API

Example using curl:

```bash
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt":"Why is the sky blue?"
 }'
```

[API documentation](https://github.com/jmorganca/ollama/blob/main/docs/api.md)
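
By default, `/api/generate` streams its answer as a series of JSON objects. The sketch below builds a request body that sets the API's `stream` option to `false`, so the server replies with a single JSON object instead. The helper name `generate_payload` is hypothetical, and it assumes the model name and prompt contain no double quotes or backslashes, since it does no JSON escaping.

```bash
# Build the JSON body for /api/generate with streaming turned off.
# Assumption: neither argument contains double quotes or backslashes,
# because this helper does not JSON-escape its inputs.
generate_payload() {
  printf '{"model": "%s", "prompt": "%s", "stream": false}' "$1" "$2"
}

# With a local Ollama server running, the body can be sent like this:
# curl -s http://localhost:11434/api/generate \
#   -d "$(generate_payload llama2 'Why is the sky blue?')"
```

For prompts with arbitrary characters, a JSON-aware tool is a safer way to build the body than `printf`.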

## Memory requirements

- 7b models generally require at least 8GB of RAM
- 13b models generally require at least 16GB of RAM
- 70b models generally require at least 64GB of RAM

If you run into issues with higher quantization levels, try using the q4 model or shut down any other programs that are using a lot of memory.
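
As a rough illustration, the guideline above can be written as a small lookup. The function names `min_ram_gb` and `has_enough_ram` are hypothetical, and the numbers are only the minimums listed above, not measured requirements for any particular quantization.

```bash
# Minimum RAM in GB for each model size, taken from the list above.
min_ram_gb() {
  case "$1" in
    7b)  echo 8 ;;
    13b) echo 16 ;;
    70b) echo 64 ;;
    *)   echo "unknown model size: $1" >&2; return 1 ;;
  esac
}

# Succeeds when the given amount of RAM (whole GB) meets the guideline.
# Usage: has_enough_ram <size> <available_gb>
has_enough_ram() {
  required_gb="$(min_ram_gb "$1")" || return 1
  [ "$2" -ge "$required_gb" ]
}
```

For example, `has_enough_ram 13b 16 && ollama run llama2:13b` would start the model only when the machine meets the 16GB guideline.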

## Model variants

**Chat** models are fine-tuned for chat/dialogue use cases. They are the default in Ollama and are tagged with -chat in the tags tab.

*Example: `ollama run llama2`*

**Pre-trained** models have no chat fine-tuning. They are tagged with -text in the tags tab.

*Example: `ollama run llama2:text`*

By default, Ollama uses 4-bit quantization. To try other quantization levels, use the other tags. The number after the q is the number of bits used for quantization (e.g. q4 means 4-bit quantization). The higher the number, the more accurate the model, but the slower it runs and the more memory it requires.
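
To make the tag convention concrete, here is a minimal sketch that reads the bit count out of a tag such as `llama2:70b-text-q2_K`. The function name `quant_bits` is hypothetical, and it assumes the quantization part of the tag starts with `-q` or `:q` followed by digits; tags without such a part make it fail.

```bash
# Print the number that follows "q" in a model tag, e.g. 2 for
# "llama2:70b-text-q2_K". Returns non-zero when the tag has no
# "-q<digits>" or ":q<digits>" part.
quant_bits() {
  bits="$(printf '%s\n' "$1" | sed -n 's/.*[-:]q\([0-9][0-9]*\).*/\1/p')"
  [ -n "$bits" ] && echo "$bits"
}

quant_bits "llama2:70b-text-q2_K"   # prints 2
```

The suffix after the number (such as `_K`) is ignored here; it names a quantization variant, not a bit count.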

## References
[Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288)

[Meta's Hugging Face repo](https://huggingface.co/meta-llama)
