ettin-encoder-17m-pretrain-50m
A retrieval encoder pretrained from jhu-clsp/ettin-encoder-17m on a 50M-pair balanced split of the DenseOn embeddings pre-training dataset.
Training data: this model was contrastively pretrained on capemox/denseon-pretrain-50m-balanced, a 50,000,000-pair sample drawn from the DenseOn corpus lightonai/embeddings-pre-training-curated (665M curated query–document pairs across 34 sources). Pairs are sampled with temperature weighting (T=2) plus iterative equal-redistribution capping, so that no single source dominates the mix.
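The card does not spell out the sampling procedure, so the following is only one plausible reading of "T=2 temperature weighting + iterative equal-redistribution capping", not the authors' script. It assumes each source's share is proportional to its size raised to 1/T, that a source can never contribute more pairs than it holds, and that any overflow is split equally among sources that still have room. The function name `temperature_quotas` and the toy source sizes are hypothetical.

```python
def temperature_quotas(sizes, total, temperature=2.0):
    """Split `total` samples across sources (assumed scheme, see lead-in).

    sizes: dict mapping source name -> number of available pairs.
    """
    # Temperature weighting: p_i proportional to n_i ** (1 / T).
    weights = {name: n ** (1.0 / temperature) for name, n in sizes.items()}
    norm = sum(weights.values())
    quotas = {name: total * w / norm for name, w in weights.items()}

    # Iterative capping: clip sources that exceed their size, then hand the
    # clipped excess out in equal shares to sources that still have room.
    while True:
        excess = 0.0
        for name in quotas:
            if quotas[name] > sizes[name]:
                excess += quotas[name] - sizes[name]
                quotas[name] = sizes[name]
        has_room = [name for name in quotas if quotas[name] < sizes[name]]
        if excess <= 1e-9 or not has_room:
            break
        share = excess / len(has_room)
        for name in has_room:
            quotas[name] += share

    return {name: int(round(q)) for name, q in quotas.items()}


# Toy example with made-up source sizes (not the real DenseOn sources).
toy_sizes = {"large": 1_000_000, "medium": 10_000, "tiny": 100}
toy_quotas = temperature_quotas(toy_sizes, total=20_000)
print(toy_quotas)
```

With T=2 the large source is 100 times bigger than the medium one but receives only about 10 times as many samples, which is the flattening effect the temperature is meant to provide.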
This is a Stage 1 (pretraining-only) checkpoint — it has not been fine-tuned on supervised retrieval data. Use it as a strong starting point for Stage 2 fine-tuning, or as a zero-shot retrieval encoder.
Training recipe
| Setting | Value |
|---|---|
| Base model | jhu-clsp/ettin-encoder-17m (ModernBERT, ~17M params) |
| Training data | capemox/denseon-pretrain-50m-balanced (50M pairs, DenseOn split) |
| Loss | MultipleNegativesRankingLoss (full, in-batch negatives) |
| Batch size | 1024 (1023 in-batch negatives/anchor) |
| Per-source batching | each batch drawn from one source dataset (DenseOn recipe) |
| Learning rate | 3e-5, linear decay, 5% warmup |
| Epochs | 1 (~48,827 steps) |
| Precision | bf16 + tf32, SDPA, torch.compile |
| Hardware | 1× A100-80GB, ~5h 26m |
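To make the loss row concrete, here is a minimal NumPy sketch of an in-batch negatives ranking loss in the spirit of MultipleNegativesRankingLoss: each query's paired document is the positive, and every other document in the batch acts as a negative. It is an illustration, not the training code. The cosine similarity and `scale=20.0` mirror the sentence-transformers defaults; the card does not state which scale was used, so treat that value as an assumption.

```python
import numpy as np


def in_batch_negatives_loss(queries, docs, scale=20.0):
    """Cross-entropy over scaled cosine similarities; row i's positive is docs[i]."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    logits = scale * (q @ d.T)  # shape (B, B); the diagonal holds the positives
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


# Toy batch of 8 pairs: with B pairs, each anchor sees B - 1 in-batch negatives.
rng = np.random.default_rng(0)
docs = rng.normal(size=(8, 16))
aligned_loss = in_batch_negatives_loss(docs + 0.01 * rng.normal(size=docs.shape), docs)
shuffled_loss = in_batch_negatives_loss(docs[::-1].copy(), docs)
print(aligned_loss, shuffled_loss)
```

Queries that sit close to their own document give a small loss, while mismatched pairs give a large one. At the batch size of 1024 listed above, the same computation yields the 1023 in-batch negatives per anchor.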
Evaluation (Stage 1 checkpoint, NDCG@10)
BEIR subset (zero-shot, no fine-tuning):
| Dataset | NDCG@10 |
|---|---|
| ArguAna | 0.4482 |
| FiQA2018 | 0.2548 |
| NFCorpus | 0.2605 |
| SCIDOCS | 0.1629 |
| SciFact | 0.6276 |
| TRECCOVID | 0.5044 |
| Mean | 0.3764 |
NanoBEIR (13-dataset aggregate, end of training): 0.5074
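For readers unfamiliar with the metric, the sketch below shows one common way to compute NDCG@10 for a single query (linear gain, log2 discount) and reproduces the table's mean from the six per-dataset scores. It is a simplified illustration; the card does not say which evaluation harness produced the numbers, and the helper name `ndcg_at_k` is hypothetical.

```python
import math


def ndcg_at_k(ranked_ids, relevance, k=10):
    """NDCG@k for one query.

    ranked_ids: document ids in the order the model ranked them.
    relevance:  dict mapping document id -> graded relevance (0 if absent).
    """
    dcg = sum(
        relevance.get(doc_id, 0) / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
    )
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Per-dataset scores copied from the table above.
beir_scores = {
    "ArguAna": 0.4482,
    "FiQA2018": 0.2548,
    "NFCorpus": 0.2605,
    "SCIDOCS": 0.1629,
    "SciFact": 0.6276,
    "TRECCOVID": 0.5044,
}
beir_mean = sum(beir_scores.values()) / len(beir_scores)
print(f"{beir_mean:.4f}")
```

The reported mean is an unweighted average over datasets, so small collections such as SciFact count as much as large ones.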
Downstream value
When this checkpoint is fine-tuned on MS MARCO hard negatives (tomaarsen/msmarco-Qwen3-Reranker-0.6B), it reaches 0.3061 mean BEIR NDCG@10, versus 0.2264 for the same fine-tune starting from the raw base model. That is a relative gain of about 35% (+0.0797 absolute) attributable to this pretraining step.
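The 35% figure is a relative improvement, not a difference in NDCG points. A quick check using the two scores quoted above:

```python
pretrained_then_finetuned = 0.3061  # mean BEIR NDCG@10, starting from this checkpoint
base_then_finetuned = 0.2264        # same fine-tune, starting from the raw base model

absolute_gain = pretrained_then_finetuned - base_then_finetuned
relative_gain = absolute_gain / base_then_finetuned
print(f"{absolute_gain:.4f} absolute, {relative_gain:.1%} relative")
```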
Usage
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("capemox/ettin-encoder-17m-pretrain-50m")

queries = ["What is the capital of France?"]
docs = ["Paris is the capital and largest city of France."]

# Normalized embeddings make the dot product equal to cosine similarity.
q = model.encode(queries, normalize_embeddings=True)
d = model.encode(docs, normalize_embeddings=True)

# Similarity matrix of shape (num_queries, num_docs).
print(q @ d.T)
```