BEIR - Browse /v2.1.0 at SourceForge.net


Name	Modified	Size	Downloads / Week
README.md	2025-02-25	7.3 kB	0
v2.1.0_ Let_s drink BEIR! Now you can evaluate latest embedding models such as E5, Stella, NV-Embed-v2, LLM2Vec, Tevatron etc. source code.tar.gz	2025-02-25	393.8 kB	0
v2.1.0_ Let_s drink BEIR! Now you can evaluate latest embedding models such as E5, Stella, NV-Embed-v2, LLM2Vec, Tevatron etc. source code.zip	2025-02-25	481.6 kB	0
Totals: 3 Items		882.8 kB	0

After a busy & hectic 2024, I'm back contributing to the BEIR repository! 🎉

I upgraded the outdated repository from Python 3.6 to Python 3.9+. sentence-transformers has also improved and changed considerably since 2023, so BEIR needed an update to support evaluating the latest decoder-based embedding models on the BEIR datasets.

BEIR provides easy-to-use code snippets and examples in the examples/ folder so that you can evaluate retrieval models without any issues. Every retrieval parameter is configurable, which helps improve reproducibility!

Evaluate latest SoTA models such as E5, NV-Embed, ModernBERT etc.

1. Added models.HuggingFace, which can be used to evaluate E5 models, PEFT models fine-tuned with Tevatron (e.g., RepLLAMA), or any custom embedding model available on HuggingFace. It supports three pooling techniques: mean, cls and eos pooling. To evaluate PEFT models, install peft using pip install beir[peft].

:::python
from beir.retrieval import models

# Example for E5-Mistral-7B
# Check prompts: https://github.com/microsoft/unilm/blob/9c0f1ff7ca53431fe47d2637dfe253643d94185b/e5/utils.py
query_prompt = "Given a query on COVID-19, retrieve documents that answer the query"
passage_prompt = ""
dense_model = models.HuggingFace(
    model="intfloat/e5-mistral-7b-instruct",
    max_length=512,
    append_eos_token=True,  # add [EOS] token to the end of the input
    pooling="eos",  # end of sequence pooling
    normalize=True,
    prompts={"query": query_prompt, "passage": passage_prompt},
    attn_implementation="flash_attention_2",
    torch_dtype="bfloat16",
)

# Example with RepLLAMA (PEFT) model
query_prompt = "query: "
passage_prompt = "passage: "
dense_model = models.HuggingFace(
    model="meta-llama/Llama-2-7b-hf",
    peft_model_path="castorini/repllama-v1-7b-lora-passage",
    max_length=512,
    append_eos_token=True,  # add [EOS] token to the end of the input
    pooling="eos",
    normalize=True,
    prompts={"query": query_prompt, "passage": passage_prompt},
    attn_implementation="flash_attention_2",
    torch_dtype="bfloat16",
)

2. Updated models.SentenceBERT to include prompts, prompt_names and other recent features for LLM-based decoder models, e.g., to evaluate Stella, modernBERT-gte-base etc. Bonus: all sentence-transformer models can be evaluated on multiple GPUs. Check out evaluate_sbert_multi_gpu.py.

:::python
from beir.retrieval import models

# Example for Stella 1.5B v5
dense_model = models.SentenceBERT(
    "NovaSearch/stella_en_1.5B_v5",
    max_length=512,
    prompt_names={"query": "s2p_query", "passage": None},
    trust_remote_code=True,
)

# Example for modernBERT GTE base
dense_model = models.SentenceBERT("Alibaba-NLP/gte-modernbert-base")

3. Added models.NVEmbed to evaluate the custom nvidia/NV-Embed-v2 model using BEIR. Note that you currently need to downgrade transformers to version 4.47.1 for the model to work. See discussion here.

:::python
from beir.retrieval import models

# Check out the prompts for the NV-Embed-v2 model inside instructions.json.
# https://huggingface.co/nvidia/NV-Embed-v2/blob/main/instructions.json
trec_covid_prompt = "Given a query on COVID-19, retrieve documents that answer the query"

# Load the Dense Retriever model (NVEmbed)
dense_model = models.NVEmbed(
    "nvidia/NV-Embed-v2",
    max_length=512,
    normalize=True,
    prompts={"query": trec_covid_prompt, "passage": ""},
)

4. Added models.LLM2Vec to evaluate the custom bidirectional-attention embedding models provided in the LLM2Vec repository: https://github.com/McGill-NLP/llm2vec. Make sure you install LLM2Vec separately or using pip install beir[llm2vec].

:::python
from beir.retrieval import models

query_prompt = "Given a web search query, retrieve relevant passages that answer the query:"
dense_model = models.LLM2Vec(
    model_name_or_path="McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp",
    peft_model_path="McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-supervised",
    max_length=512,
    pooling="mean",
    normalize=True,
    prompts={"query": query_prompt, "passage": None},
    attn_implementation="flash_attention_2",
    torch_dtype="bfloat16",
)

5. Removed a few old retrieval model examples such as USE-QA as they are sadly out of favour.
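
The three pooling options named for models.HuggingFace (mean, cls and eos), together with normalize=True, can be illustrated with a small pure-Python sketch. This is not BEIR's implementation: the helper names `pool` and `l2_normalize` are hypothetical, and the sketch assumes that cls means the first unmasked token and eos means the last unmasked token of a single sequence.

```python
import math
from typing import List

Vector = List[float]


def pool(token_embeddings: List[Vector], attention_mask: List[int], pooling: str = "mean") -> Vector:
    """Collapse the per-token vectors of ONE sequence into a single vector.

    Hypothetical helper for illustration only; BEIR's own code may differ.
    """
    # Keep only positions that the attention mask marks as real tokens.
    real = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not real:
        raise ValueError("sequence has no unmasked tokens")
    if pooling == "cls":
        return list(real[0])  # first real token
    if pooling == "eos":
        return list(real[-1])  # last real token
    if pooling == "mean":
        dim = len(real[0])
        return [sum(vec[i] for vec in real) / len(real) for i in range(dim)]
    raise ValueError(f"unknown pooling: {pooling!r}")


def l2_normalize(vector: Vector) -> Vector:
    """Scale a vector to unit length, so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


# Three real tokens followed by one padding position.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [0.0, 0.0]]
mask = [1, 1, 1, 0]

print(pool(tokens, mask, "cls"))   # first real token
print(pool(tokens, mask, "eos"))   # last real token
print(pool(tokens, mask, "mean"))  # average over the three real tokens
print(l2_normalize(pool(tokens, mask, "mean")))
```

Decoder-only models such as the E5-Mistral and RepLLAMA examples above are configured with eos pooling, while the LLM2Vec example uses mean pooling.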

Util functions for easy saving of evaluation metrics and runfiles

You can now use two functions: util.save_runfile(), which saves the results as a TREC runfile (useful for evaluating the top-k retrieved documents for a given query), and util.save_results(), which saves your metrics (ndcg, map, recall and precision; optionally mrr, recall_cap and hole) into a JSON results file.

:::python
import os
import pathlib

from beir import util

dataset = "nfcorpus"
....
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
mrr = retriever.evaluate_custom(qrels, results, retriever.k_values, metric="mrr")

### If you want to save your results and runfile (useful for reranking)
results_dir = os.path.join(pathlib.Path(__file__).parent.absolute(), "results")
os.makedirs(results_dir, exist_ok=True)

#### Save the evaluation runfile & results
util.save_runfile(os.path.join(results_dir, f"{dataset}.run.trec"), results)
util.save_results(os.path.join(results_dir, f"{dataset}.json"), ndcg, _map, recall, precision, mrr)
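
For readers unfamiliar with TREC runfiles, the sketch below shows the standard six-column layout (query id, the literal Q0, document id, rank, score, run tag) built from a results dictionary shaped like `results[query_id][doc_id] = score`. It is an illustration of the file format only, not the implementation of util.save_runfile(); the function name `format_trec_run` and the default run tag `beir` are assumptions made for this example, and the exact columns BEIR writes may differ.

```python
from typing import Dict, List


def format_trec_run(
    results: Dict[str, Dict[str, float]],
    run_name: str = "beir",  # hypothetical run tag, chosen for this sketch
    top_k: int = 1000,
) -> List[str]:
    """Render retrieval results as TREC run lines: qid Q0 docid rank score tag."""
    lines = []
    for query_id, doc_scores in results.items():
        # Rank documents for this query by descending score and keep the top k.
        ranked = sorted(doc_scores.items(), key=lambda item: item[1], reverse=True)[:top_k]
        for rank, (doc_id, score) in enumerate(ranked, start=1):
            lines.append(f"{query_id} Q0 {doc_id} {rank} {score} {run_name}")
    return lines


example_results = {"q1": {"d1": 0.2, "d2": 0.9}}
for line in format_trec_run(example_results):
    print(line)
```

Because each line carries the rank and score, a runfile in this layout can be fed to a reranker or to trec_eval-style tooling without rerunning retrieval.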

Python upgraded to 3.9, Installation, ruff & pyproject.toml

We upgraded Python to 3.9+ and updated the Python formatting across the codebase accordingly. We also added ruff as the Python linter and code formatter to help keep the codebase clean.

Merged old PRs

I changed the beir installation to include only three main dependencies: sentence-transformers, datasets and pytrec-eval-terrier. The last one replaces pytrec_eval, as many complained that pytrec_eval was not actively maintained and Windows users faced issues.
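
A quick way to confirm that the three core dependencies are present in an environment is to query their installed versions. The snippet below is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of BEIR.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Distribution names as listed in the release notes above.
CORE_DEPENDENCIES = ("sentence-transformers", "datasets", "pytrec-eval-terrier")


def installed_versions(packages: Iterable[str] = CORE_DEPENDENCIES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for package, version in installed_versions().items():
    print(f"{package}: {version or 'not installed'}")
```

The same check can be pointed at transformers to verify the pinned version needed for NV-Embed-v2.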

PS: Next, I plan to add ColBERT evaluation, which is now easily supported in PyLate, along with BM25s etc.

What's Changed

Pull latest main branch into development by @thakur-nandan in https://github.com/beir-cellar/beir/pull/153
merge latest main into development by @thakur-nandan in https://github.com/beir-cellar/beir/pull/190
Support Multi-node evaluation by @NouamaneTazi in https://github.com/beir-cellar/beir/pull/155
replacing pytrec_eval with pytrec-eval-terrier by @archersama in https://github.com/beir-cellar/beir/pull/175
merge latest main into development by @thakur-nandan in https://github.com/beir-cellar/beir/pull/191
Merge development into main by @thakur-nandan in https://github.com/beir-cellar/beir/pull/192

New Contributors

@archersama made their first contribution in https://github.com/beir-cellar/beir/pull/175

Full Changelog: https://github.com/beir-cellar/beir/compare/v2.0.0...v2.1.0

Source: README.md, updated 2025-02-25

BEIR Files

A Heterogeneous Benchmark for Information Retrieval
