How Behar scores. Show your work.
Every number on every Behar dashboard has a formula behind it. Nothing is proprietary. Nothing is hidden behind “trust our algorithm.” This is the exact math.
Read it, challenge it, tell us we got it wrong. That's the point.
Maximum signal contributions; scores cap at 100.
The scoring formula
Every LLM response for every prompt gets analyzed by Claude Haiku (structured JSON extraction) to detect: whether your brand was mentioned, in what position in any list, how many times, and with what sentiment. Those four signals produce a score from 0 to 100.
That's the whole formula. We chose simplicity on purpose. Transparent scoring gets challenged, iterated, and improved. Proprietary scoring gets defended. Ours is the former.
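To make the extraction step concrete, here's roughly the shape of what gets pulled out of one prompt × LLM response, sketched in TypeScript. The field names are illustrative assumptions, not the exact schema.

```typescript
// Illustrative only: field names are assumptions, not Behar's exact extraction schema.
const extraction = {
  mentioned: true,       // presence: was the brand in the response at all?
  listPosition: 2,       // 1-based rank in any list, or null if unordered
  mentionCount: 3,       // how many times the brand appears in the response
  sentiment: "positive", // "positive" | "neutral" | "negative"
};
```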
The four signals, explained
Presence. Either the LLM mentioned your brand or it didn't. This is a binary gate: if your brand isn't in the response, nothing else matters and you score 0. This is why presence is the foundation of every other metric.
Position. When LLMs list options, order matters. The first recommendation gets 3-5× more consideration than the third, according to behavioral research we've run on our own test prompts. First position gets +30, second +20, third +15, fourth +10, fifth and beyond +5. No position detected (an unordered mention) gets +0.
Frequency. If your brand is mentioned multiple times in the same response, that's a stronger signal than a single passing mention. Formula: (mention_count − 1) × 4, capped at 20. So 2 mentions = +4, 3 mentions = +8, 6+ mentions = +20. Capped so a single answer spamming your name doesn't inflate scores.
Sentiment. How the AI describes you. Positive (+10), neutral (0), negative (−10). Negation-aware: 'not the best' correctly scores negative. Claude Haiku does the sentiment analysis with explicit prompting against bias.
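Putting the four signals together, a minimal sketch of the per-response score looks like this. The presence base of 40 is an assumption chosen so the signal maxima (40 + 30 + 20 + 10) line up with the 100-point cap; this page doesn't state the actual base value, and the field names are illustrative.

```typescript
type Sentiment = "positive" | "neutral" | "negative";

interface ResponseSignals {
  mentioned: boolean;          // presence: binary gate
  listPosition: number | null; // 1-based rank in any list, null if unordered
  mentionCount: number;        // how many times the brand appears
  sentiment: Sentiment;        // how the response describes the brand
}

const PRESENCE_BASE = 40; // assumed value, not stated in the source

function positionBonus(listPosition: number | null): number {
  if (listPosition === null) return 0; // unordered mention
  if (listPosition === 1) return 30;
  if (listPosition === 2) return 20;
  if (listPosition === 3) return 15;
  if (listPosition === 4) return 10;
  return 5;                            // fifth and beyond
}

function frequencyBonus(mentionCount: number): number {
  return Math.min((mentionCount - 1) * 4, 20); // 2 mentions -> +4, 6+ -> +20
}

function sentimentBonus(sentiment: Sentiment): number {
  return sentiment === "positive" ? 10 : sentiment === "negative" ? -10 : 0;
}

function scoreResponse(signals: ResponseSignals): number {
  if (!signals.mentioned) return 0; // presence gate: no mention, no score
  const raw =
    PRESENCE_BASE +
    positionBonus(signals.listPosition) +
    frequencyBonus(signals.mentionCount) +
    sentimentBonus(signals.sentiment);
  return Math.max(0, Math.min(100, raw)); // scores cap at 100
}
```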
How aggregates work
Each prompt × LLM produces one score. A brand's overall score for a prompt is the average across all supported LLMs. A brand's overall presence is the percentage of prompts where it's mentioned at all (independent of score).
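Here's a sketch of that aggregation, assuming per-response scores are already computed. Names are illustrative, and "mentioned at all" is read here as mentioned by at least one LLM.

```typescript
interface PromptResult {
  promptId: string;
  llm: string;       // e.g. "chatgpt", "gemini"; all LLMs weighted equally
  mentioned: boolean;
  score: number;     // 0-100 for this prompt × LLM
}

// Overall score for one prompt: unweighted average across all supported LLMs.
function promptScore(results: PromptResult[]): number {
  return results.reduce((sum, r) => sum + r.score, 0) / results.length;
}

// Overall presence: share of prompts where the brand shows up at all
// (read here as: mentioned by at least one LLM).
function presenceRate(resultsByPrompt: Map<string, PromptResult[]>): number {
  let mentionedPrompts = 0;
  for (const results of resultsByPrompt.values()) {
    if (results.some((r) => r.mentioned)) mentionedPrompts += 1;
  }
  return (mentionedPrompts / resultsByPrompt.size) * 100;
}
```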
We never weight LLMs differently by default. ChatGPT doesn't count more than Gemini. This is a design choice: your visibility should be measured against the ecosystem as it actually is, not as it might be weighted.
Time-series scores are raw. We don't smooth, normalize, or anti-alias. If your score jumps from 62 to 78 in one day, that's what happened. Volatility is data.
Gap detection
After every run, we classify each prompt into one of four gap states and compute a priority score for it:
Priority score = (competitor_lead) + (20 if absent) + (15 if declining) + (5 if new_opportunity), clamped 0–100. Pure deterministic math. No AI involvement in priority ranking — keeps gap detection fast and zero-cost.
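A sketch of that priority math, treating the formula's terms as flags on each prompt. The field names, and the exact meanings of declining and new_opportunity, are assumptions read off the formula rather than definitions from this page.

```typescript
interface PromptGap {
  competitorLead: number;  // assumed: how far ahead the leading competitor's score is
  absent: boolean;         // brand not mentioned for this prompt
  declining: boolean;      // assumed: score trending down
  newOpportunity: boolean; // assumed: prompt newly added or newly relevant
}

function priorityScore(gap: PromptGap): number {
  const raw =
    gap.competitorLead +
    (gap.absent ? 20 : 0) +
    (gap.declining ? 15 : 0) +
    (gap.newOpportunity ? 5 : 0);
  return Math.max(0, Math.min(100, raw)); // clamped 0-100, no AI in the loop
}
```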