bespoke-minicheck

This is a grounded factuality checking model developed by [Bespoke Labs](https://bespokelabs.ai).

The model takes as input a document (text) and a sentence and determines whether the sentence is supported by the document. In order to fact-check a multi-sentence claim, the claim should first be broken up into sentences. The document does not need to be chunked unless it exceeds 32K tokens.
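As a rough illustration of the sentence-by-sentence workflow described above, the sketch below splits a multi-sentence claim and checks each sentence separately. The names `split_sentences`, `check_claim`, and the `Checker` callable are hypothetical and not part of any Bespoke Labs or Ollama API. The regex splitter is a deliberately naive assumption; a proper sentence tokenizer would handle abbreviations and similar cases better. Document chunking for inputs over 32K tokens is not covered here.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical type: any function that takes (document, sentence) and
# returns True when the sentence is supported by the document.
Checker = Callable[[str, str], bool]


def split_sentences(claim: str) -> List[str]:
    """Naively split a claim after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", claim.strip())
    return [part for part in parts if part]


def check_claim(
    document: str, claim: str, check_sentence: Checker
) -> List[Tuple[str, bool]]:
    """Check every sentence of a claim against the same document."""
    return [
        (sentence, check_sentence(document, sentence))
        for sentence in split_sentences(claim)
    ]
```

In practice `check_sentence` would wrap a call to the model; in a test it can be replaced by any stub with the same shape.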

![bespoke-minicheck-howitworks.png](https://ollama.dev.org.tw/assets/library/bespoke-minicheck/4a1f8cce-a9b2-41e1-8d0a-cb4f1c6b5793)

Despite its small size, Bespoke-MiniCheck is the state-of-the-art (SOTA) fact-checking model on the LLM-AggreFact benchmark.

## Usage

The prompt template is as follows:

```
Document: {document}
Claim: {claim}
```

The response will be either `Yes` or `No`.
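A minimal sketch of filling in the template and interpreting the answer might look like the following. The helper names `build_prompt` and `parse_response` are hypothetical, and the choice to raise an error on any answer other than `Yes` or `No` is an assumption about how a caller may want to handle unexpected output.

```python
def build_prompt(document: str, claim: str) -> str:
    """Fill the two-line prompt template with a document and a claim."""
    return f"Document: {document}\nClaim: {claim}"


def parse_response(text: str) -> bool:
    """Map the model's answer to a boolean: Yes -> True, No -> False."""
    answer = text.strip().lower()
    if answer == "yes":
        return True
    if answer == "no":
        return False
    # Assumption: anything else is treated as an error rather than guessed at.
    raise ValueError(f"Unexpected response: {text!r}")
```

Stripping whitespace and ignoring case before comparing keeps the parser tolerant of a trailing newline in the model output.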

## Examples

Prompt
```
Document: A group of students gather in the school library to study for their upcoming final exams.
Claim: The students are preparing for an examination.
```

Response
```
Yes
```

Prompt
```
Document: A group of students gather in the school library to study for their upcoming final exams.
Claim: The students are on vacation.
```

Response
```
No
```
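The examples above could be reproduced programmatically along these lines. This sketch assumes the model is served by a local Ollama instance at the default address `http://localhost:11434` under the tag `bespoke-minicheck`, and it uses Ollama's `/api/generate` endpoint with streaming disabled. The function names `build_request` and `is_supported` are hypothetical helpers written for illustration.

```python
import json
import urllib.request

# Assumptions: default local Ollama address and the model tag from this page.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "bespoke-minicheck"


def build_request(document: str, claim: str) -> urllib.request.Request:
    """Build a non-streaming generate request using the prompt template."""
    payload = {
        "model": MODEL,
        "prompt": f"Document: {document}\nClaim: {claim}",
        "stream": False,  # ask for a single JSON object instead of a stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_supported(document: str, claim: str) -> bool:
    """Send the request and return True when the model answers Yes."""
    with urllib.request.urlopen(build_request(document, claim)) as resp:
        body = json.load(resp)
    return body["response"].strip().lower() == "yes"
```

With the server running, `is_supported(document, "The students are on vacation.")` would send the second example prompt and convert the model's `Yes` or `No` answer to a boolean.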

## Model performance

![performance.png](https://ollama.dev.org.tw/assets/jmorgan/bespoke-minicheck/5a757ad2-5eff-4440-a2e7-9efc0bad9703)

The models are evaluated on [LLM-AggreFact](https://huggingface.co/datasets/lytang/LLM-AggreFact), our newly collected benchmark (unseen by our models during training), which is built from 11 recent human-annotated datasets on fact-checking and grounding LLM generations. **Bespoke-MiniCheck-7B is the SOTA fact-checking model despite its small size.**

## References

[Website](https://bespokelabs.ai/bespoke-minicheck)

[Paper](https://arxiv.org/pdf/2404.10774)

[LLM-AggreFact Leaderboard](https://llm-aggrefact.github.io/)
