llama2 - Ollama framework

llama2

Llama 2 is a collection of foundation language models ranging from 7B to 70B parameters.

7b 13b 70b

3M downloads · Updated 14 months ago

102 tags

78e26419b446 · 3.8GB

[INST] <<SYS>>{{ .System }}<</SYS>> {{ .Prompt }} [/INST]

59B

Parameters

{ "stop": [ "[INST]", "[/INST]", "<<SYS>>", "<</SYS>>" ] }

91B

License

# Llama 2 Acceptable Use Policy Meta is committed to promoting safe and fair use of its tools and f

4.8kB

License

LLAMA 2 COMMUNITY LICENSE AGREEMENT Llama 2 Version Release Date: July 18, 2023 "Agreement" means

7.0kB

Readme

Llama 2 is released by Meta Platforms, Inc. The model was trained on 2 trillion tokens and supports a context length of 4096 tokens by default. The Llama 2 Chat models are fine-tuned on over 1 million human annotations and are intended for chat.

### CLI

Open a terminal and run `ollama run llama2`.

### API

Example using curl:

```bash
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'
```

[API documentation](https://github.com/jmorganca/ollama/blob/main/docs/api.md)
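
The same request can also be sent from Python. This is a minimal sketch, assuming an Ollama server is listening on the default local address `http://localhost:11434` and that the endpoint streams newline-delimited JSON objects carrying `response` and `done` fields, as described in the API documentation. The helper names `build_request`, `collect_text`, and `generate` are illustrative and not part of Ollama.

```python
import json
import urllib.request

# Assumed default address of a local Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama2", url=OLLAMA_URL):
    """Build the same POST request that the curl example sends."""
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def collect_text(lines):
    """Join the `response` fragments from a stream of JSON lines."""
    parts = []
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final object marks the end of the stream
    return "".join(parts)


def generate(prompt, model="llama2"):
    """Send the prompt and return the full generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return collect_text(resp)
```

With a server running, `generate("Why is the sky blue?")` should return the answer as a single string. Splitting request building from stream parsing keeps both pieces usable without a network connection.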

## Memory requirements

- 7b models generally require at least 8GB of RAM
- 13b models generally require at least 16GB of RAM
- 70b models generally require at least 64GB of RAM

If you run into issues with higher quantization levels, try the q4 model or close other programs that are using a lot of memory.
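
The guideline above can be turned into a quick check. This is a minimal sketch that encodes the three listed minimums and picks the largest size that fits a given amount of RAM. `MIN_RAM_GB` and `largest_model_for` are hypothetical names used only for this illustration, and the thresholds are rules of thumb rather than hard limits.

```python
# Minimum RAM guidance from the list above, in GB.
MIN_RAM_GB = {"7b": 8, "13b": 16, "70b": 64}


def largest_model_for(ram_gb):
    """Return the largest model size whose guideline fits in ram_gb, or None."""
    fitting = [size for size, need in MIN_RAM_GB.items() if need <= ram_gb]
    if not fitting:
        return None
    # Among the sizes that fit, pick the one with the highest requirement.
    return max(fitting, key=MIN_RAM_GB.get)
```

For example, `largest_model_for(32)` returns `"13b"`, and a machine with less than 8GB gets `None`.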

## Model variants

**Chat** models are fine-tuned for chat/dialogue use cases. They are the default in Ollama and also appear under tags marked -chat in the tags tab.

*Example: `ollama run llama2`*

**Pre-trained** models have no chat fine-tuning. They are tagged as -text in the tags tab.

*Example: `ollama run llama2:text`*
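
For the chat variant, the model page lists the prompt template `[INST] <<SYS>>{{ .System }}<</SYS>> {{ .Prompt }} [/INST]` and the stop sequences `[INST]`, `[/INST]`, `<<SYS>>`, and `<</SYS>>`. Ollama normally fills in this template itself, so the sketch below only illustrates what the final prompt text looks like and how a stop sequence would cut off output. `render_chat_prompt` and `truncate_at_stop` are hypothetical helpers, not Ollama functions.

```python
# Template and stop sequences as listed on the model page,
# with the Go-style placeholders rewritten for str.format.
TEMPLATE = "[INST] <<SYS>>{system}<</SYS>> {prompt} [/INST]"
STOP = ["[INST]", "[/INST]", "<<SYS>>", "<</SYS>>"]


def render_chat_prompt(prompt, system=""):
    """Substitute the system and user text into the chat template."""
    return TEMPLATE.format(system=system, prompt=prompt)


def truncate_at_stop(text, stop=STOP):
    """Cut text at the earliest stop sequence, if any appears."""
    cut = len(text)
    for marker in stop:
        index = text.find(marker)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]
```

A pre-trained (`-text`) model has no chat fine-tuning, so wrapping its input this way is not expected to help.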

By default, Ollama uses 4-bit quantization. To try other quantization levels, use the other tags. The number after the q is the number of bits used for quantization (e.g. q4 means 4-bit quantization). The higher the number, the more accurate the model, but the slower it runs and the more memory it requires.
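
As a rough illustration of why more bits need more memory, the weights alone take about parameters × bits ÷ 8 bytes. This is a back-of-the-envelope sketch that assumes exactly that formula and ignores per-block quantization metadata, the context cache, and runtime overhead, so real files and memory use come out somewhat larger. `approx_weight_gb` is a hypothetical helper.

```python
def approx_weight_gb(params_billion, bits):
    """Approximate weight storage in GB (10**9 bytes) for a given bit width."""
    total_bytes = params_billion * 1e9 * bits / 8
    return total_bytes / 1e9


# Compare a 7B model at three bit widths.
for bits in (4, 8, 16):
    print(f"7B at {bits}-bit: ~{approx_weight_gb(7, bits):.1f} GB")
```

By this estimate a 7B model needs roughly 3.5 GB for its weights at 4 bits and 7 GB at 8 bits, which is the same order as the 3.8GB file size shown on the model page.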

## References
[Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288)

[Meta's Hugging Face repo](https://huggingface.co/meta-llama)
