Method & Workflow · 5 min read

WDF*IDF - how we measure keyword relevance

WDF*IDF is the statistical method we use to work out, from each vertical's data, which words appear in top-performing listings and in what proportion.

by robby

What is WDF*IDF?

WDF = Within-Document Frequency. IDF = Inverse Document Frequency. Together they measure how typical a word is for a cluster of documents.

Meaning: the most frequent word does not win. The winner is the word that appears disproportionately often in top listings yet is rare in the broad pool.
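The idea can be sketched in a few lines of Python. This is a minimal illustration, not our production code: it assumes one common formulation (log-damped WDF with base 2, IDF as log10 of 1 plus the inverse document share), and the function names and the toy pool are hypothetical.

```python
import math

def wdf(term, tokens):
    """Within-document frequency: log-damped count, normalised by document length."""
    if len(tokens) < 2:
        return 0.0  # log2(1) == 0, so avoid dividing by zero
    count = tokens.count(term)
    return math.log2(count + 1) / math.log2(len(tokens))

def idf(term, pool):
    """Inverse document frequency: higher when fewer documents in the pool contain the term."""
    containing = sum(1 for doc in pool if term in doc)
    if containing == 0:
        return 0.0
    return math.log10(1 + len(pool) / containing)

def wdf_idf(term, tokens, pool):
    """Product of both factors for one term in one document."""
    return wdf(term, tokens) * idf(term, pool)

# Hypothetical toy pool: each listing is a list of lowercase tokens.
pool = [
    ["handmade", "silver", "ring"],
    ["gold", "ring", "gift"],
    ["ring", "for", "women"],
    ["vintage", "ring", "box"],
]
listing = pool[0]
scores = {term: wdf_idf(term, listing, pool) for term in listing}
```

In this toy pool "ring" occurs in every listing and "handmade" in only one, so "handmade" ends up with the higher score even though both appear exactly once in the listing.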

Example

In jewelry:

"handmade" → high IDF (rare in the general pool, frequent in top listings) → strong keyword
"ring" → low IDF (in nearly every jewelry listing) → weak keyword
"925 sterling" → medium IDF, high in premium top-10 → strong differentiator keyword

How we use it

Per vertical we build a WDF*IDF vector weekly from the top 10% of listings.
When optimizing we compare your listing vector to the vertical vector.
Cosine similarity gives us branch_title and branch_desc.
If the similarity is below 0.55, we suggest the top 5 missing keywords.
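The comparison step above can be sketched as follows. This is an illustration under assumptions: vectors are stored as plain {term: weight} dicts, missing keywords are ranked by their weight in the vertical vector, and the names suggest_missing, vertical_vec and listing_vec as well as all weights are hypothetical. Only the 0.55 threshold and the limit of five come from the steps above.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as having no similarity
    return dot / (norm_a * norm_b)

def suggest_missing(listing_vec, vertical_vec, threshold=0.55, top_n=5):
    """Return the similarity and, if it is below the threshold, the missing keywords."""
    similarity = cosine_similarity(listing_vec, vertical_vec)
    if similarity >= threshold:
        return similarity, []
    # Assumption: "missing" means present in the vertical vector but absent from the listing.
    missing = [term for term in vertical_vec if term not in listing_vec]
    missing.sort(key=lambda term: vertical_vec[term], reverse=True)
    return similarity, missing[:top_n]

# Hypothetical weights for a jewelry vertical and one listing.
vertical_vec = {"handmade": 0.9, "925": 0.7, "sterling": 0.7, "gift": 0.4, "ring": 0.1}
listing_vec = {"ring": 0.8, "gift": 0.3}

similarity, suggestions = suggest_missing(listing_vec, vertical_vec)
```

Here the listing leans on "ring", which carries little weight in the vertical vector, so the similarity lands well below 0.55 and "handmade" is the first suggestion.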

Limits

WDF*IDF has no semantic understanding. "Ring" and "rings" are treated as two different words, and synonyms aren't resolved. That's why an embedding layer (BERT-based) runs on top, building semantic clusters.
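A tiny sketch of that limit, assuming plain whitespace tokenization (the example title is made up): a purely count-based method sees "ring", "rings" and the synonym "band" as three unrelated terms.

```python
from collections import Counter

# Hypothetical listing title, split on whitespace with no stemming or synonym handling.
tokens = "silver ring and gold rings with a matching band".split()
counts = Counter(tokens)

# Each surface form is counted on its own; nothing links the three terms.
ring_count = counts["ring"]
rings_count = counts["rings"]
band_count = counts["band"]
```

Each of the three terms is counted once and separately, which is the gap the embedding layer is meant to close.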

Tags: score methode wdf-idf keyword statistics
