Appendix C — devskills v0.4

Credentialing methodology

This page documents how Arbor combines signal sources to produce developer credentials. The methodology is published here — rather than buried in product code — because credentialing methodology that isn't open is credentialing that can't be trusted.

Signal sources

A credential may draw from up to three independent sources:

Public artifact evidence — GitHub activity (commits, reviews, pull requests). Strongest for Layer 1 (universal craft) and Layer 2 (technology fluency).
Peer attestation — colleagues affirming behavioral observations drawn from the taxonomy's level descriptors. The only available source for Layer 3 (contextual / soft skills) and the only credible way to assess work done in private repositories.
Self-reported context — narrative the developer adds to explain the gap between public artifacts and peer attestation, e.g. private-repository work. Always labeled clearly to readers.

How levels are assigned

A peer attestation flow built on this taxonomy must never ask a rater "what level is this developer" and must never present comparative framing across developers in the same session. Instead, the rater is shown specific behavioral statements drawn from the taxonomy's level descriptors and asked which they have directly observed. "I have not observed this" is a first-class, non-penalized response.

Each behavioral statement is tagged to a specific (skill, level). A level assignment per skill per subject is derived from the set of observed behaviors using a monotonic-implication rule: affirming a behavior at level N implies the rater would have affirmed behaviors at levels below N. This is structural, not statistical — it follows from the Dreyfus progression encoded in the taxonomy.
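The monotonic-implication rule can be sketched as follows. This is a minimal illustration, assuming integer levels starting at 1 and a single rater's affirmed statements; the function names and the way observations from several raters would be combined are hypothetical and not specified by this page.

```python
def implied_statements(skill, level):
    """Affirming `level` implies every (skill, n) tag for n = 1..level.

    Assumption: levels are consecutive integers starting at 1.
    """
    return [(skill, n) for n in range(1, level + 1)]


def derive_levels(affirmed):
    """Derive one level per skill from a rater's affirmed (skill, level) tags.

    Under monotonic implication the highest affirmed level subsumes the
    lower ones, so the derived level is the maximum per skill.
    Skills with no affirmed statement ("I have not observed this")
    are simply absent from the result and carry no penalty.
    """
    levels = {}
    for skill, level in affirmed:
        levels[skill] = max(level, levels.get(skill, 0))
    return levels
```

For example, a rater who affirms level-2 and level-4 statements for one skill yields level 4 for that skill, and a skill the rater never observed does not appear in the output at all.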

Comparison edges across subjects (necessary for level calibration in sparse graphs) are inferred on the back end from raters who attested to multiple subjects on the same skill. Raters never see comparative framing.
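One way back-end edge inference could look is sketched below. The tuple layout, the edge representation, and the choice to skip ties are all assumptions made for illustration; this page does not specify how edges are stored or calibrated.

```python
from collections import defaultdict
from itertools import combinations


def comparison_edges(attestations):
    """Infer comparison edges from raters who attested to several subjects.

    attestations: iterable of (rater, subject, skill, level) tuples
                  (hypothetical layout).
    Returns a list of (skill, higher_subject, lower_subject, rater) edges.
    Ties produce no edge (an assumption of this sketch).
    """
    # Group derived levels by (rater, skill) so only same-rater,
    # same-skill pairs are ever compared.
    by_rater_skill = defaultdict(dict)
    for rater, subject, skill, level in attestations:
        by_rater_skill[(rater, skill)][subject] = level

    edges = []
    for (rater, skill), levels in by_rater_skill.items():
        for a, b in combinations(sorted(levels), 2):
            if levels[a] > levels[b]:
                edges.append((skill, a, b, rater))
            elif levels[b] > levels[a]:
                edges.append((skill, b, a, rater))
    return edges
```

The rater never sees these pairs. They are computed only from attestations the rater gave independently for each subject.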

Asymmetric discrepancy framing

When peer attestation produces a different level than public-artifact evidence for the same skill, the methodology handles the discrepancy asymmetrically:

Peer attestation level > public evidence level — the elevated level is shown on the public profile with an inline delta (e.g. L3 → L5 (peer-validated)). The credential reflects the higher-confidence signal.
Peer attestation level < public evidence level — the public profile shows the public-evidence level only. The discrepancy is shown to the developer in the private growth view as a development opportunity.

This asymmetry is deliberate. Two reasons:

Thin public evidence is the common case for private-repository work. Peer attestation is precisely the mechanism that should make those skills visible.
Incentive alignment. A developer who solicits peer feedback can only improve their public credential, never damage it. Without this property, peer feedback becomes adversarial.
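The asymmetric rule can be sketched as two small functions, one per audience. This is an illustrative sketch only: the function names and the wording of the private note are hypothetical, and it assumes both a public-evidence level and (optionally) a peer-attested level exist for the skill.

```python
def public_display(public_level, peer_level=None):
    """Text shown on the public profile for one skill.

    A higher peer-attested level is shown with an inline delta;
    a lower one never changes the public profile.
    """
    if peer_level is not None and peer_level > public_level:
        return f"L{public_level} → L{peer_level} (peer-validated)"
    return f"L{public_level}"


def private_growth_note(public_level, peer_level=None):
    """Note shown only to the developer in the private growth view.

    Returns None unless peers attested a lower level than public evidence.
    The message wording is a placeholder.
    """
    if peer_level is not None and peer_level < public_level:
        return (f"Development opportunity: peers attested L{peer_level}, "
                f"public evidence shows L{public_level}")
    return None
```

Because `public_display` never returns a level below `public_level`, soliciting attestations cannot lower what readers of the public profile see.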

Credibility weighting

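Not all peer attestations carry equal weight. The weighting algorithm names four inputs: attestation history, consistency with consensus, the rater's own skill level, and platform tenure. The formula below publishes factors for three of them (tenure, attestation history, and consistency); no factor for the rater's own skill level is listed.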

The weight is a multiplicative composition:

weight = clamp(0, 1, BASE × tenure_factor × count_factor × consistency_factor)

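Input	Bucket	Factor
BASE	constant	0.7
Tenure	< 1 day	0.7
Tenure	1–7 days	0.85
Tenure	7–30 days	1.0
Tenure	30–90 days	1.05
Tenure	90+ days	1.1
Prior attestations	0 (first-time rater)	0.5
Prior attestations	1	0.7
Prior attestations	2	0.9
Prior attestations	3–9	1.0
Prior attestations	10+	1.15
Consistency variance	< 0.3 (low)	1.1
Consistency variance	0.3–0.8	1.0
Consistency variance	0.8–1.5	0.85
Consistency variance	> 1.5 (high)	0.6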

A first-time rater on day zero gets 0.7 × 0.7 × 0.5 × 1.0 = 0.245. A long-tenured rater with 10+ attestations and consensus-aligned observations gets 0.7 × 1.1 × 1.15 × 1.1 = 0.974.
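The two worked examples can be reproduced with a short sketch of the formula. Several details here are assumptions rather than part of the published methodology: buckets are treated as half-open intervals (a value on a shared boundary such as 7 days falls into the higher bucket), a rater with no consistency history gets a factor of 1.0 (matching the first worked example), and all function and constant names are hypothetical.

```python
import math

BASE = 0.7

# Each table is a list of (exclusive upper bound, factor) rows.
# Assumption: boundaries are half-open, so 7 days maps to the 7-30 bucket.
TENURE_DAYS = [(1, 0.7), (7, 0.85), (30, 1.0), (90, 1.05), (math.inf, 1.1)]
PRIOR_ATTESTATIONS = [(1, 0.5), (2, 0.7), (3, 0.9), (10, 1.0), (math.inf, 1.15)]
CONSISTENCY_VARIANCE = [(0.3, 1.1), (0.8, 1.0), (1.5, 0.85), (math.inf, 0.6)]


def _factor(value, table):
    """Return the factor of the first row whose upper bound exceeds value."""
    for upper, factor in table:
        if value < upper:
            return factor
    raise ValueError(f"value out of range: {value!r}")


def rater_weight(tenure_days, prior_attestations, variance=None):
    """weight = clamp(0, 1, BASE x tenure x count x consistency).

    variance=None means no consistency history yet; this sketch assumes
    a neutral factor of 1.0 in that case.
    """
    consistency = 1.0 if variance is None else _factor(variance, CONSISTENCY_VARIANCE)
    raw = (BASE
           * _factor(tenure_days, TENURE_DAYS)
           * _factor(prior_attestations, PRIOR_ATTESTATIONS)
           * consistency)
    return min(1.0, max(0.0, raw))


print(round(rater_weight(0, 0), 3))         # first-time rater on day zero
print(round(rater_weight(120, 12, 0.1), 3)) # long-tenured, consistent rater
```

With the published factors the largest achievable product is the second example, so the upper clamp never triggers with the current values; it only matters if the factors change.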

The specific weight values may evolve — the inputs and their direction-of-effect will not. The formula is published here rather than hidden in product code because credentialing methodology has to withstand scrutiny.

Confidence and cross-validation

Behavioral observation produces meaningful levels only when the observation graph has sufficient density. A peer-attested level supported by three observations should look visibly different from one supported by thirty. Confidence is shown alongside every assessed level — it is never hidden from readers.

1–2 attestations — preliminary, confidence ≤ 0.3
3–9 attestations from independent raters — confidence 0.4–0.7
10+ attestations from independent raters — confidence ≥ 0.8
Inconsistent observation patterns (high variance) — confidence capped at 0.5 regardless of count
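The bands above can be expressed as a lookup that returns the allowed confidence range for a level. This is a sketch under stated assumptions: the page gives ranges rather than an exact scoring function, so the code returns bounds only; the implicit floor of 0.0, the ceiling of 1.0, and the boolean `high_variance` flag are assumptions, and the function name is hypothetical.

```python
def confidence_bounds(attestation_count, high_variance=False):
    """Return (low, high) confidence bounds for a peer-attested level.

    attestation_count: attestations for the skill (from independent
                       raters for the 3+ bands).
    high_variance:     True when observation patterns are inconsistent;
                       how variance is thresholded is not specified here.
    """
    if attestation_count < 1:
        raise ValueError("no band is defined for zero attestations")
    if attestation_count <= 2:
        low, high = 0.0, 0.3   # preliminary
    elif attestation_count <= 9:
        low, high = 0.4, 0.7
    else:
        low, high = 0.8, 1.0
    if high_variance:
        # Cap applies regardless of count.
        high = min(high, 0.5)
        low = min(low, high)
    return low, high
```

For instance, thirty inconsistent attestations yield bounds of (0.5, 0.5), which is lower than the range for five consistent ones.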

Dual signal cadence

Real-time engagement events (“Sarah just attested to your system design skills”) and credential changes (the public profile updates) are separated in time. Attestation receipt is shown immediately as an engagement event. Public-profile level changes happen on a weekly recalculation cadence.

This separation protects credential stability — a developer who shares their public profile should not see it shift the next day from a single new attestation — and disrupts the gameability of instant-update mechanics. The private growth view may update more frequently, as it has different trust implications than the public credential.
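The split between immediate engagement events and weekly credential changes can be sketched as a function that maps an attestation timestamp to the recalculation that will first include it. The anchor (Monday 00:00 UTC) is purely an assumption for illustration; this page states only that the cadence is weekly. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def next_recalculation(received_at):
    """First Monday 00:00 UTC strictly after `received_at`.

    received_at must be a timezone-aware datetime. The engagement event
    is shown at `received_at`; the public profile can change no earlier
    than the returned time.
    """
    utc = received_at.astimezone(timezone.utc)
    midnight = utc.replace(hour=0, minute=0, second=0, microsecond=0)
    # Monday is weekday 0, so this is always between 1 and 7 days ahead.
    days_ahead = 7 - midnight.weekday()
    return midnight + timedelta(days=days_ahead)
```

Under this assumed anchor, an attestation received on a Wednesday afternoon appears as an engagement event right away but cannot affect the public level until the following Monday.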
