March 2026
Small Language Models
Using small language models effectively requires treating them as constrained probabilistic systems whose behavior is governed by structure, incentives, and limited capacity. They are highly sensitive to how tasks are framed. A core skill is eliciting useful reasoning without exhausting context or attention. Rather than requesting explanations after an answer is produced, prompts should be structured so reasoning precedes commitment. Light, well-placed scaffolding can guide internal deliberation on logic-heavy tasks, while excessive demands for explicit reasoning can dilute signal and reduce accuracy.
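One way to structure a prompt so reasoning precedes commitment is a template that asks for deliberation first and the answer last. A minimal sketch, assuming nothing about any particular model API; the tag names ("reasoning", "answer") are illustrative conventions, not a standard:

```python
# Sketch: a prompt template that forces reasoning before the final answer,
# so the model commits to a conclusion only after deliberating.
# Tag names are illustrative, not a standard.

def build_reason_first_prompt(task: str) -> str:
    return (
        f"Task: {task}\n\n"
        "First, think through the problem step by step inside <reasoning> tags.\n"
        "Only after the reasoning is complete, state the result inside <answer> tags.\n"
        "Keep the reasoning brief: list only the decisive steps."
    )

prompt = build_reason_first_prompt("Is 2027 a leap year?")
```

Keeping the reasoning instruction short is deliberate: it is the light scaffolding the text describes, not a demand for exhaustive explanation.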
Prompt quality is driven by clarity and example density. Instructions alone often leave too much ambiguity; concrete examples define expectations precisely. Few-shot prompting works because it demonstrates acceptable patterns, boundaries, and outputs directly. Training and fine-tuning benefit most from tightly scoped datasets that reflect real usage, with clear decision criteria and minimal noise. Overly verbose prompts are counterproductive, consuming attention and introducing conflicting signals. Precision and relevance carry far more weight than length.
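The few-shot pattern above can be sketched as a small prompt assembler: a one-line instruction followed by concrete input/output pairs that define the expected format directly. The classification task and labels are invented for illustration:

```python
# Sketch: assembling a few-shot prompt from tightly scoped examples.
# Each pair demonstrates the acceptable pattern and output format directly,
# leaving far less ambiguity than an instruction alone.

def build_few_shot_prompt(instruction: str, examples: list[tuple[str, str]],
                          query: str) -> str:
    parts = [instruction, ""]
    for inp, out in examples:
        parts.append(f"Input: {inp}")
        parts.append(f"Output: {out}")
        parts.append("")
    parts.append(f"Input: {query}")
    parts.append("Output:")  # the model completes from here
    return "\n".join(parts)

examples = [
    ("refund not received after 10 days", "billing"),
    ("app crashes on launch", "technical"),
]
prompt = build_few_shot_prompt(
    "Classify each support ticket as 'billing' or 'technical'.",
    examples,
    "charged twice for one order",
)
```

Two or three precise examples like these usually carry more signal than a long paragraph of instructions, in line with the point that precision outweighs length.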
Hallucination detection is a critical operational skill. These models readily generate fluent but unsupported claims, especially when operating near the edge of their domain. Outputs that sound confident while offering unnecessary specificity should be treated with suspicion. The most reliable testing approach is adversarial prompting: asking the model to enumerate uncertainties, assumptions, or potential failure cases. Any concrete factual claim should be cross-checked against external sources. Confidence should be interpreted as a risk indicator rather than reassurance.
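The two checks described above, an adversarial follow-up probe and suspicion of unnecessary specificity, can be sketched as follows. The regex heuristic is a crude, assumed stand-in for a real cross-checking pipeline: it merely flags precise-looking figures that would need external verification:

```python
# Sketch: adversarial probing plus a crude specificity flag.
# Treats unprompted numeric precision (decimals, percentages, years)
# as a hallucination risk signal, not as proof of error.

import re

PROBE = (
    "List every assumption behind your previous answer, "
    "any claims you are uncertain about, and how each could be wrong."
)

def specificity_risk(text: str) -> bool:
    # Flag answers containing precise-looking figures that should be
    # cross-checked against external sources before being trusted.
    return bool(re.search(r"\d+\.\d+%?|\b(19|20)\d{2}\b", text))
```

A flagged answer is not necessarily wrong; the flag marks it for verification, mirroring the point that confidence is a risk indicator rather than reassurance.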
Creativity control depends on managing variance. Sampling parameters directly shape output behavior. Moderate randomness combined with constrained decoding or multi-pass generation allows variation while suppressing low-probability noise. For training and fine-tuning, narrowly defined objectives and explicit reward signals produce more stable behavior than broad stylistic goals.
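The variance-management idea can be sketched as moderate sampling settings combined with multi-pass generation and majority voting (self-consistency). The `generate` function below is a stubbed stand-in for a real model call, and the parameter values are illustrative defaults, not recommendations from the text:

```python
# Sketch: moderate randomness plus multi-pass generation with majority vote.
# `generate` is a stub standing in for any model API call; a real call
# would pass SAMPLING to the model.

import random
from collections import Counter

SAMPLING = {"temperature": 0.7, "top_p": 0.9}  # moderate variance, clipped tail

def generate(prompt: str, rng: random.Random) -> str:
    # Stub: biased toward one answer, with occasional low-probability noise.
    return rng.choice(["42", "42", "41"])

def multi_pass(prompt: str, n: int = 5, seed: int = 0) -> str:
    # Run several sampled passes and keep the most frequent answer,
    # suppressing low-probability outliers while allowing variation.
    rng = random.Random(seed)
    votes = Counter(generate(prompt, rng) for _ in range(n))
    return votes.most_common(1)[0][0]
```

Majority voting is one concrete form of the "multi-pass generation" the text mentions; constrained decoding (restricting outputs to a grammar or label set) is a complementary mechanism.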
Long sessions expose practical context limits. Performance degrades as attention spreads across large inputs. Effective recovery involves summarizing state, restarting with distilled context, and maintaining external notes. Robust SLM workflows are built around these constraints rather than assuming they can be ignored.
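The recovery loop above, external notes plus restart with distilled context, can be sketched as a small state object. The class and its fields are hypothetical scaffolding for illustration, not an API from the text:

```python
# Sketch: distilling session state into a compact restart context.
# Decisions are kept as external notes outside the model's context window,
# and only a recent slice is replayed when a fresh session starts.

class SessionState:
    def __init__(self, goal: str):
        self.goal = goal
        self.decisions: list[str] = []  # external notes, maintained outside the model

    def note(self, decision: str) -> None:
        self.decisions.append(decision)

    def restart_context(self, max_notes: int = 5) -> str:
        # Only the most recent decisions survive the restart, keeping the
        # distilled context small enough not to spread attention thin.
        recent = self.decisions[-max_notes:]
        return "\n".join([f"Goal: {self.goal}", "Key decisions so far:"]
                         + [f"- {d}" for d in recent])
```

Capping the replayed notes is the design choice that matters: the restart context must stay well under the limit that degraded the original session.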