stable-code:code - Ollama framework

Stable Code 3B is a 3-billion-parameter large language model (LLM) that delivers accurate and responsive code completion on par with models such as Code Llama 7B, which are 2.5x larger.

**Key Features**

* **NEW**: instruct model, available via `ollama run stable-code`
* Fill-in-the-Middle (FIM) capability
* Long-context support: trained on sequences of up to 16,384 tokens
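As a rough sketch of what `ollama run` does over HTTP, the snippet below sends one non-streaming request to a local Ollama server through its `/api/generate` endpoint. It assumes the server listens on the default `http://localhost:11434` and uses the `stable-code:code` tag from this page's title; `build_generate_request` and `generate` are hypothetical helper names, not part of Ollama.

```python
import json
import urllib.request

# Assumption: a local Ollama server on its default port; adjust if yours differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "stable-code:code",
                           raw: bool = False) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for a single JSON object instead of a token stream
        "raw": raw,       # True sends the prompt verbatim, skipping the prompt template
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "stable-code:code", raw: bool = False) -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt, model, raw)) as resp:
        return json.loads(resp.read())["response"]
```

With a server running, `generate("def fibonacci(n):\n", raw=True)` should return a completion. Setting `raw=True` tells Ollama not to apply its prompt template, which is one way to pass hand-built prompts through unchanged.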

![spiderchart](https://github.com/jmorganca/ollama/assets/3325447/6c3de7a5-5e10-4884-81fb-3a1b3f566609)

| Model            | Size | Python | C++   | JavaScript | Java  | PHP   | Rust  |
|------------------|------|--------|-------|------------|-------|-------|-------|
| **Stable Code**  | 3B   | 32.4%  | 30.9% | 32.1%      | 32.1% | 24.2% | 23.0% |
| Code Llama       | 7B   | 30.0%  | 28.2% | 32.5%      | 31.1% | 25.7% | 26.3% |
| Deepseek Coder   | 1.3B | 28.6%  | 29.2% | 28.7%      | 29.0% | 23.6% | 18.5% |
| Wizard Coder     | 3B   | 31.6%  | 25.6% | 26.2%      | 25.8% | 25.3% | 20.4% |
| StarCoder        | 3B   | 21.6%  | 19.8% | 21.5%      | 20.5% | 19.0% | 16.9% |
| Replit Code V1.5 | 3B   | 23.0%  | 25.9% | 26.2%      | 23.6% | 23.2% | 21.5% |
| Deci Coder       | 1B   | 19.1%  | 6.8%  | 18.4%      | 16.7% | 2.1%  | 1.7%  |

## Model Details

* **Developed by**: [Stability AI](https://stability.ai/)
* **Model type**: stable-code models are auto-regressive language models based on the transformer decoder architecture.
* **Language(s)**: English, Code
* **Contact**: For questions and comments about the model, please email `lm@stability.ai`

### Model Architecture

The model is a decoder-only transformer similar to the LLaMA ([Touvron et al., 2023](https://arxiv.org/abs/2307.09288)) architecture with the following modifications:

| Parameters    | Hidden Size | Layers | Heads | Sequence Length |
|---------------|-------------|--------|-------|-----------------|
| 2,796,431,360 | 2560        | 32     | 32    | 16384           |

* **Position Embeddings**: Rotary Position Embeddings ([Su et al., 2021](https://arxiv.org/abs/2104.09864)) are applied to the first 25% of each head's embedding dimensions to improve throughput, following [Black et al. (2022)](https://arxiv.org/pdf/2204.06745.pdf).
* **Tokenizer**: We use a modified version of the GPT-NeoX tokenizer ([`NeoX`](https://github.com/EleutherAI/gpt-neox)). We add special tokens such as `<FIM_PREFIX>` and `<FIM_SUFFIX>`, along with other special tokens, to train Fill-in-the-Middle (FIM) capabilities.
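To illustrate how FIM sentinel tokens are typically used, the sketch below builds a prompt in prefix-suffix-middle (PSM) order, a common FIM layout in which the model generates the missing span after the last sentinel. Only `<FIM_PREFIX>` and `<FIM_SUFFIX>` are named in this card; the `<FIM_MIDDLE>` token, the PSM ordering, and the helper names are assumptions, so check the exact spelling and casing against the tokenizer's special tokens before relying on this.

```python
# <FIM_PREFIX> and <FIM_SUFFIX> are named in the model card; <FIM_MIDDLE> is a
# hypothetical spelling. Verify all three against the tokenizer's special tokens.
FIM_PREFIX = "<FIM_PREFIX>"
FIM_SUFFIX = "<FIM_SUFFIX>"
FIM_MIDDLE = "<FIM_MIDDLE>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange the code around a gap in prefix-suffix-middle (PSM) order.

    The model is expected to continue after the final sentinel with the
    text that belongs between `prefix` and `suffix`.
    """
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def splice(prefix: str, middle: str, suffix: str) -> str:
    """Reassemble the completed source once the model has produced the middle."""
    return prefix + middle + suffix
```

For example, `build_fim_prompt("def add(a, b):\n    return ", "\n")` places the cursor position after `return `, and whatever the model generates there can be passed to `splice` as `middle`.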
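The partial rotary scheme can be sketched in plain Python. From the architecture table, each head has 2560 / 32 = 80 dimensions, so rotating the first 25% touches 20 of them and passes the other 60 through unchanged. The pairing convention (GPT-NeoX-style "rotate half") and the frequency base of 10000 are assumptions, not values stated in this card.

```python
import math

HIDDEN_SIZE = 2560
NUM_HEADS = 32
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS   # 80 dimensions per head
ROTARY_DIM = int(HEAD_DIM * 0.25)     # 20 dimensions receive rotary embeddings
BASE = 10000.0                        # assumed frequency base (a common default)


def apply_partial_rotary(x: list[float], position: int,
                         rotary_dim: int = ROTARY_DIM,
                         base: float = BASE) -> list[float]:
    """Rotate the first `rotary_dim` entries of one head vector; copy the rest.

    Assumed "rotate half" pairing: entry i is paired with entry i + rotary_dim // 2,
    and each pair is rotated by angle position * base ** (-2 * i / rotary_dim).
    """
    half = rotary_dim // 2
    out = list(x)  # dimensions beyond rotary_dim stay as they are
    for i in range(half):
        theta = position * base ** (-2 * i / rotary_dim)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = b * c + a * s
    return out
```

Because each pair is rotated by an angle proportional to its position, the dot product of a rotated query and key depends only on their relative offset, and restricting the rotation to 20 of 80 dimensions reduces the per-token work.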

## Training

### Training Dataset

The dataset consists of a filtered mixture of open-source large-scale datasets available on the [Hugging Face Hub](https://huggingface.co/datasets): Falcon RefinedWeb extract ([Penedo et al., 2023](https://huggingface.co/datasets/tiiuae/falcon-refinedweb)), along with [CommitPackFT](https://huggingface.co/datasets/bigcode/commitpackft) and [GitHub Issues](https://huggingface.co/datasets/bigcode/the-stack-github-issues) (BigCode, 2023), and StarCoder ([Li et al., 2023](https://arxiv.org/abs/2305.06161)). We further supplement our training with data from mathematical domains ([Azerbayev, Zhangir, et al., 2023](https://arxiv.org/abs/2310.10631) and [Yu, Longhui, et al., 2023](https://arxiv.org/abs/2309.12284)).

Top 18 programming languages in the training data:
- C
- C++
- Java
- JavaScript
- CSS
- Go
- HTML
- Ruby
- Rust
- Markdown
- Shell
- PHP
- SQL
- R
- TypeScript
- Python
- Jupyter-Clean
- reStructuredText

## Use and Limitations

### Intended Use

The model is intended to be used as a foundation model for application-specific fine-tuning. Developers must evaluate and fine-tune the model to ensure safe performance in downstream applications.

### Limitations and Bias

As a base model, this model may exhibit unreliable, unsafe, or other undesirable behaviors that must be corrected through evaluation and fine-tuning prior to deployment. The pre-training dataset may have contained offensive or inappropriate content, even after applying data cleansing filters, which can be reflected in the model-generated text. We recommend that users exercise caution when using these models in production systems. Do not use the models if they are unsuitable for your application, or for any applications that may cause deliberate or unintentional harm to others.

## References

[Hugging Face](https://huggingface.co/stabilityai/stable-code-3b)
