llama2:70b-chat

llama2

Llama 2 is a collection of foundation language models ranging from 7B to 70B parameters.

7b 13b 70b

2.9M Pulls Updated 13 months ago

102 Tags

e7f6c06ffef4 · 39GB

[INST] <<SYS>>{{ .System }}<</SYS>> {{ .Prompt }} [/INST]

59B

params

{ "stop": [ "[INST]", "[/INST]", "<<SYS>>", "<</SYS>>" ] }

91B

license

# Llama 2 Acceptable Use Policy Meta is committed to promoting safe and fair use of its tools and f

4.8kB

license

LLAMA 2 COMMUNITY LICENSE AGREEMENT Llama 2 Version Release Date: July 18, 2023 "Agreement" means

7.0kB

Readme

Llama 2 is released by Meta Platforms, Inc. This model is trained on 2 trillion tokens and by default supports a context length of 4096 tokens. Llama 2 Chat models are fine-tuned on over 1 million human annotations and are made for chat.

### CLI

Open a terminal and run `ollama run llama2`.

### API

Example using curl:

```bash
# The local Ollama server listens on plain HTTP (port 11434), not HTTPS.
curl -X POST http://127.0.0.1:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'
```

[API documentation](https://github.com/jmorganca/ollama/blob/main/docs/api.md)
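
By default the generate endpoint streams its answer as a series of JSON objects. The sketch below shows one way to ask for a single JSON object instead by setting `"stream": false`. It assumes a local server on plain HTTP at `127.0.0.1:11434`; `generate_payload` and `ollama_generate` are hypothetical helper names, not part of Ollama, and the escaping covers only backslashes and double quotes.

```bash
# Build the JSON body for /api/generate (hypothetical helper).
# $1 = model tag, $2 = prompt text.
# Escapes backslashes and double quotes only; prompts containing newlines
# or control characters would need a real JSON encoder.
generate_payload() {
  local model=$1 prompt
  prompt=$(printf '%s' "$2" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"model": "%s", "prompt": "%s", "stream": false}\n' "$model" "$prompt"
}

# Send the request to a local Ollama server (hypothetical helper).
# Assumes the server is running on 127.0.0.1:11434.
ollama_generate() {
  curl -s http://127.0.0.1:11434/api/generate -d "$(generate_payload "$1" "$2")"
}
```

For example, `ollama_generate llama2 'Why is the sky blue?'` should print one JSON object whose `response` field holds the generated text.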

## Memory requirements

- 7b models generally require at least 8GB of RAM
- 13b models generally require at least 16GB of RAM
- 70b models generally require at least 64GB of RAM

If you run into memory issues with higher quantization levels, try the q4 model or close other programs that use a lot of memory.
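
The RAM thresholds listed above can be turned into a quick check before pulling a model. This is a minimal sketch that treats the listed minimums as the only criterion; `largest_llama2_for_ram` is a hypothetical helper, not an Ollama command.

```bash
# Print the largest Llama 2 size whose listed minimum RAM fits (hypothetical helper).
# $1 = available RAM in whole gigabytes.
largest_llama2_for_ram() {
  local ram_gb=$1
  if   [ "$ram_gb" -ge 64 ]; then echo "70b"
  elif [ "$ram_gb" -ge 16 ]; then echo "13b"
  elif [ "$ram_gb" -ge 8 ];  then echo "7b"
  else echo "none"   # below the smallest listed requirement
  fi
}
```

On a machine with 16GB of RAM, `largest_llama2_for_ram 16` prints `13b`, which could then be run as `ollama run llama2:13b`.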

## Model variants

**Chat** models are fine-tuned for chat/dialogue use cases. They are the default in Ollama and are the models tagged with `-chat` in the tags tab.

*Example: `ollama run llama2`*

**Pre-trained** models do not have the chat fine-tuning. They are tagged with `-text` in the tags tab.

*Example: `ollama run llama2:text`*
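
For the chat variant, the template listed on this page wraps the system message and the user prompt in `[INST]` and `<<SYS>>` markers before the text reaches the model. The sketch below imitates that substitution with plain string insertion. `render_chat_prompt` is a hypothetical helper for illustration only; Ollama applies the template itself.

```bash
# Mimic the chat template shown on this page (hypothetical helper):
#   [INST] <<SYS>>{{ .System }}<</SYS>> {{ .Prompt }} [/INST]
# $1 = system message, $2 = user prompt.
render_chat_prompt() {
  local system=$1 prompt=$2
  printf '[INST] <<SYS>>%s<</SYS>> %s [/INST]\n' "$system" "$prompt"
}
```

Calling `render_chat_prompt 'You are concise.' 'Why is the sky blue?'` shows the full text a chat-tuned model would receive under this template.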

By default, Ollama uses 4-bit quantization. To try other quantization levels, use the other tags. The number after the q is the number of bits used for quantization (e.g. q4 means 4-bit quantization). The higher the number, the more accurate the model, but the slower it runs and the more memory it requires.
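
As a rough illustration of why more bits mean more memory, the weights alone occupy about parameters × bits ÷ 8 bytes. The sketch below is a back-of-the-envelope estimate, not how Ollama sizes anything. `estimate_weights_gb` is a hypothetical helper, and it ignores quantization metadata, the context cache, and runtime overhead, so real files and RAM use come out larger.

```bash
# Approximate size of the model weights in GB (hypothetical helper).
# $1 = parameter count in billions, $2 = bits per weight.
# billions of parameters * bits / 8 = billions of bytes, i.e. GB.
estimate_weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}
```

For instance, `estimate_weights_gb 70 4` gives 35.0, a little under the 39GB file listed on this page for the 70b chat model, while `estimate_weights_gb 70 8` gives 70.0.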

## References
[Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288)

[Meta's Hugging Face repo](https://huggingface.co/meta-llama)
