Since the previous release of BEIR, I have been updating the repository to support evaluation for the latest SoTA embedding models.
1. Relaxed the faiss dependency: it is now optional! Users need to install faiss-cpu manually
A major complaint was that faiss-cpu caused installation errors when users installed beir alongside other packages. To avoid this, we removed the faiss-cpu dependency in the previous version, v2.1.0. However, imports still broke because faiss type annotations remained in BEIR's faiss search modules, which I sadly overlooked. I have now removed those faiss type annotations, so the PyPI installation of BEIR v2.2.0 should be smooth without the faiss-cpu package.
2. Extended models.HuggingFace to support multi-GPU inference! 🎊
Thanks to boilerplate code provided by the E5 team & MTEB, I have updated the HuggingFace code to use DDP, which distributes the data across multiple GPUs for inference. Check out an example here: evaluate_huggingface.py.
It should work out of the box; just select the GPUs with CUDA_VISIBLE_DEVICES:
CUDA_VISIBLE_DEVICES=0,1,2,3 python evaluate_huggingface.py
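Conceptually, data-parallel inference gives each GPU rank an interleaved shard of the corpus, encodes each shard independently, and stitches the embeddings back into the original order. A toy sketch of that sharding logic in plain Python (the real implementation uses torch DDP; the function names here are illustrative, not BEIR's API):

```python
def shard(items, rank, world_size):
    """Interleaved shard for one worker: items[rank], items[rank + world_size], ..."""
    return items[rank::world_size]

def encode_data_parallel(corpus, encode_fn, world_size):
    """Encode each shard independently, then reassemble in the original corpus order."""
    shards = [shard(corpus, rank, world_size) for rank in range(world_size)]
    encoded = [[encode_fn(doc) for doc in s] for s in shards]  # one list per "GPU"
    merged = [None] * len(corpus)
    for rank, enc in enumerate(encoded):
        for i, emb in enumerate(enc):
            merged[rank + i * world_size] = emb  # undo the interleaving
    return merged

corpus = ["doc-%d" % i for i in range(10)]
# str.upper stands in for a real embedding model
embeddings = encode_data_parallel(corpus, encode_fn=str.upper, world_size=4)
```

The interleaved `items[rank::world_size]` slicing is the same striding a DistributedSampler applies, which is why the results must be un-interleaved before use.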
3. Added EvaluateRetrieval.encode_and_retrieve(), which first computes embeddings, saves them as numpy pickles, and loads them back to search with faiss! 🥳
Thanks to the boilerplate code provided by the Tevatron team, I added this crucial feature to dense retrieval search. Earlier, in EvaluateRetrieval.retrieve(), we would encode the corpus in sub-batches (usually 50K documents), compute the top-k similarity scores using PyTorch, and keep them in a results heap.
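The chunked top-k bookkeeping can be illustrated with Python's heapq: keep a min-heap of size k, push each sub-batch's scores, and evict the current minimum whenever a better score arrives (a simplified stand-in for the PyTorch version; scores here are plain floats):

```python
import heapq

def topk_over_chunks(score_chunks, k):
    """score_chunks: iterable of lists of (doc_id, score) pairs, one list per corpus sub-batch.
    Returns the k highest-scoring (score, doc_id) pairs seen across all chunks."""
    heap = []  # min-heap: the smallest of the current top-k sits at heap[0]
    for chunk in score_chunks:
        for doc_id, score in chunk:
            if len(heap) < k:
                heapq.heappush(heap, (score, doc_id))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, doc_id))  # evict the current minimum
    return sorted(heap, reverse=True)

chunks = [[("d1", 0.2), ("d2", 0.9)], [("d3", 0.5), ("d4", 0.7)]]
top = topk_over_chunks(chunks, 2)  # → [(0.9, "d2"), (0.7, "d4")]
```

Because the heap only ever holds k entries, memory stays constant no matter how many corpus sub-batches are streamed through.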
Now, the new EvaluateRetrieval.encode_and_retrieve() function first encodes the queries and the corpus in batches, and saves the embeddings (numpy floats) and text_ids as pickles. This is especially useful with API embedding providers, since we do not want to recompute embeddings, which costs both time and money.
1. Encode the queries and passages and store them as pickles in your local folder. The embeddings are stored in the `encode_output_path` folder, with `queries.pkl` for queries and `corpus.0.pkl`, `corpus.1.pkl`, ... for passages in the corpus, where each pickle contains at most 50K documents. The `overwrite` parameter denotes whether to overwrite any existing embeddings:

   ```python
   self.retriever.encode(
       corpus=corpus,
       queries=queries,
       encode_output_path="./embeddings/",
       overwrite=False,
       query_filename="queries.pkl",
       corpus_filename="corpus..pkl",
       **kwargs,
   )
   ```

2. After encoding, load the pickles back into numpy and use faiss to perform an exact flat search for the most similar documents per query. Make sure you install the faiss-cpu library: `pip install faiss-cpu`. Provide `query_embeddings_file` as a str and the list of `corpus_embeddings_files` as List[str]. The function returns the `results` dictionary, which contains the top-k passages with scores for each query_id:

   ```python
   self.retriever.search_from_files(
       query_embeddings_file=query_embeddings_file,
       corpus_embeddings_files=corpus_embeddings_files,
       top_k=self.top_k,
       **kwargs,
   )
   ```
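For intuition, the round trip can be mimicked without faiss using plain numpy: pickle the embeddings, load the shards back, and run an exact inner-product search (faiss's IndexFlatIP does the same thing, just much faster; the file names and helper names below are illustrative, not BEIR's actual API):

```python
import os, pickle, tempfile
import numpy as np

def save_embeddings(path, ids, embs):
    with open(path, "wb") as f:
        pickle.dump((ids, embs), f)

def flat_search(query_embeddings_file, corpus_embeddings_files, top_k):
    """Exact inner-product search over pickled shards, like a faiss IndexFlatIP."""
    with open(query_embeddings_file, "rb") as f:
        q_ids, q_embs = pickle.load(f)
    all_ids, all_embs = [], []
    for path in corpus_embeddings_files:
        with open(path, "rb") as f:
            ids, embs = pickle.load(f)
        all_ids.extend(ids)
        all_embs.append(embs)
    corpus = np.vstack(all_embs)                                 # merge shards
    scores = q_embs @ corpus.T                                   # (n_queries, n_docs)
    top = np.argsort(-scores, axis=1, kind="stable")[:, :top_k]  # best top_k per query
    return {qid: {all_ids[j]: float(scores[i, j]) for j in top[i]}
            for i, qid in enumerate(q_ids)}

# toy round trip: 3-d one-hot embeddings split across two corpus shards
tmp = tempfile.mkdtemp()
save_embeddings(os.path.join(tmp, "queries.pkl"), ["q1"], np.eye(3)[:1])
save_embeddings(os.path.join(tmp, "corpus.0.pkl"), ["d1", "d2"], np.eye(3)[:2])
save_embeddings(os.path.join(tmp, "corpus.1.pkl"), ["d3"], np.eye(3)[2:])
results = flat_search(
    os.path.join(tmp, "queries.pkl"),
    [os.path.join(tmp, "corpus.0.pkl"), os.path.join(tmp, "corpus.1.pkl")],
    top_k=2,
)
```

Because the embeddings are persisted on disk, the expensive encoding step (or the API calls) runs once, and only the cheap search step is repeated.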
4. Added LoRA evaluation models with vLLM support for much faster encoding and inference than HuggingFace! 🥳
Again, thanks to boilerplate code from the Tevatron team, we now support evaluating LoRA fine-tuned models built on LLMs, such as rlhn/Qwen2.5-7B-rlhn-400K, with the vLLM package. Make sure you install the peft, accelerate, and vllm packages to use this feature.
An example of how to use LoRA evaluation models with vLLM support is shown in evaluate_lora_vllm.py.
NOTE: You can merge the LoRA weights back into the base LLM and use the merged model for even faster inference with vLLM!
5. Added API evaluations such as Cohere, Voyage, etc.! 😎
Many users wish to benchmark their models against API providers such as OpenAI, Cohere, or Voyage, among others. To enable this, we now support evaluation with API-based models. We currently support two vendors, Cohere & Voyage, with more to come soon in the repository.
- An example of how to use the Cohere API embed model: evaluate_cohere.py
- An example of how to use the VoyageAI model: evaluate_voyage.py
6. Small but mighty: Added a util function to load a TREC runfile and compute the evaluation scores with BEIR.
A small but useful utility is to take a TREC runfile and compute nDCG@K or similar metric scores. This is now possible: I have added a util function that loads a TREC runfile into a results dictionary, which can then be evaluated against qrels to quickly obtain the metric scores.
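A TREC runfile is whitespace-separated with one line per (query, document) pair: `query_id Q0 doc_id rank score run_name`. The loader's job is roughly the following (a hypothetical minimal parser for illustration, not the util's actual code):

```python
import os
import tempfile
from collections import defaultdict

def load_trec_runfile(path):
    """Parse a TREC runfile into the {query_id: {doc_id: score}} results dict BEIR uses."""
    results = defaultdict(dict)
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            query_id, _q0, doc_id, _rank, score, _run_name = line.split()
            results[query_id][doc_id] = float(score)
    return dict(results)

# toy runfile with two ranked documents for one query
run_path = os.path.join(tempfile.mkdtemp(), "run.trec")
with open(run_path, "w") as f:
    f.write("q1 Q0 d1 1 2.5 my-run\nq1 Q0 d2 2 1.0 my-run\n")
run_results = load_trec_runfile(run_path)
```

The returned dictionary has the same shape as the output of EvaluateRetrieval.retrieve(), so it can be scored against qrels with the usual BEIR evaluation call.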
I'm happy to take suggestions for improving the repository, e.g., which features users want and how to keep the repository relevant even though it has been 4-5 years since its inception.
What's Changed
- relax faiss type dependency as its optional; breaking when faiss is not installed by @thakur-nandan in https://github.com/beir-cellar/beir/pull/200
- merge latest development into main by @thakur-nandan in https://github.com/beir-cellar/beir/pull/201
Full Changelog: https://github.com/beir-cellar/beir/compare/v2.1.0...v2.2.0