How Report Card Scores Are Calculated

Overview

Each year, multiple monitoring groups survey the same seagrass transects independently. Because there is no single external ground truth, scores are based on how closely each group agrees with the others. A group that reports values close to the cross-group average earns a high score; one that deviates substantially earns a lower score.

Scores are calculated for three field measurements:

  • Abundance — Braun-Blanquet cover category (0, 0.1, 0.5, 1, 2, 3, 4, 5)
  • Blade Length — average blade length in cm
  • Short Shoot Density — shoots per m²

These three metric scores are averaged into an overall Total score, which is then converted to a letter grade.


Step 1: The Consensus Species List

Not every species recorded across all groups counts toward scoring. A species at a given transect is considered truly present only if at least two distinct groups reported it with non-zero abundance. This filters out likely misidentifications while still being inclusive — a species does not need to be unanimous, just corroborated.

Table 1: Consensus species at each transect in 2025 (species reported as non-zero by ≥2 groups).
Transect | Species count | Species
3 | 2 | Halodule, Thalassia
6 | 2 | DR: Chondria, Thalassia
7 | 5 | DR: Acanthophora, DR: Chondria, Halodule, Syringodium, Thalassia
8 | 3 | DR: Acanthophora, Halodule, Thalassia
9 | 4 | DR: Chondria, DR: Hypnea, Syringodium, Thalassia
10 | 1 | Halodule

The consensus list is computed per transect, so a species may be on the list at one transect but not another.
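
The consensus rule can be sketched in a few lines of Python. This is a minimal illustration, not the report's actual code: the record layout, the function name consensus_species, and the sample observations are all hypothetical.

```python
from collections import defaultdict

def consensus_species(records, min_groups=2):
    """Return {transect: sorted species list} for species reported with
    non-zero abundance by at least `min_groups` distinct groups."""
    reporters = defaultdict(set)  # (transect, species) -> groups that reported it
    for group, transect, species, abundance in records:
        if abundance > 0:
            reporters[(transect, species)].add(group)

    consensus = defaultdict(list)
    for (transect, species), groups in reporters.items():
        if len(groups) >= min_groups:
            consensus[transect].append(species)
    return {t: sorted(s) for t, s in consensus.items()}

# Hypothetical observations: (group, transect, species, BB abundance)
records = [
    ("Group A", 3, "Halodule", 4),
    ("Group B", 3, "Halodule", 3),
    ("Group A", 3, "Thalassia", 0.5),
    ("Group B", 3, "Thalassia", 0.1),
    ("Group C", 3, "Syringodium", 1),  # only one group reported it: excluded
    ("Group A", 10, "Halodule", 2),
    ("Group A", 10, "Halodule", 2),    # same group twice: still one distinct group
    ("Group B", 10, "Thalassia", 0),   # zero abundance: does not count
]
result = consensus_species(records)
```

Because groups are collected in a set, repeated rows from the same group never count twice, which matches the "at least two distinct groups" wording.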


Step 2: True Values

For each consensus species at each transect, true values are calculated as the cross-group average:

  • Abundance: each group's Braun-Blanquet category is converted to its numeric level (1–8, in the order listed above, so BB 0 is level 1 and BB 5 is level 8), the levels are averaged across groups, and the average is converted back to the nearest BB value.
  • Blade Length and Short Shoot Density: simple means across groups.

Only consensus species enter this calculation. This prevents a misidentified species (recorded by only one group) from distorting the true values for everyone else.
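
The two averaging rules above can be sketched as follows. The function names are hypothetical, and two details are assumptions rather than documented behaviour: "nearest BB value" is taken to mean rounding in level space (ties round up), and missing measurements are skipped when averaging.

```python
BB_VALUES = [0, 0.1, 0.5, 1, 2, 3, 4, 5]  # positions correspond to levels 1..8

def bb_to_level(bb):
    """Map a Braun-Blanquet category to its numeric level (1-8)."""
    return BB_VALUES.index(bb) + 1

def level_to_bb(level):
    """Map a numeric level (1-8) back to its Braun-Blanquet category."""
    return BB_VALUES[level - 1]

def true_abundance(group_bb_values):
    """Average the groups' levels and return the nearest BB category."""
    levels = [bb_to_level(v) for v in group_bb_values]
    mean_level = sum(levels) / len(levels)
    nearest_level = int(mean_level + 0.5)  # assumption: ties round up
    return level_to_bb(nearest_level)

def true_mean(values):
    """Simple mean across groups; None (not recorded) is skipped (assumption)."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None

# Hypothetical reports from three groups for one species at one transect
abundance = true_abundance([4, 3, 5])          # levels 7, 6, 8
blade_length = true_mean([23.0, 26.2, None])   # third group did not measure
```

Averaging levels rather than raw BB values keeps the unevenly spaced categories (0.1, 0.5, 1, ...) from skewing the result toward the high-cover end.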

Table 2: Example true values for transect 3 in 2025 (cross-group averages for consensus species).
Transect | Species | Abundance (BB) | Blade Length (cm) | Short Shoot Density (shoots per m²)
3 | Halodule | 4 | 24.6 | 9
3 | Thalassia | 0.5 | 23.0 | 1

A dash (—) means that measurement was not recorded for that species at that transect.


Step 3: Group Deviations and Species ID Penalties

For each group, reported values are compared to the true values across all consensus species and transects. This comparison uses a full join, which naturally reveals two types of species identification errors:

Missed species — a consensus species is present but the group did not record it. The group’s abundance for that species is treated as level 1 (“no coverage”) when computing the deviation, applying a slight penalty.

False positives — the group recorded a species that is not on the consensus list (i.e., no other group confirmed it). The true abundance is treated as level 1, again applying a slight penalty.

These penalties only affect the Abundance score. Blade Length and Short Shoot Density cannot be meaningfully penalised for species that were not found, so those metrics use na.rm = TRUE when averaging.
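
A minimal sketch of the full join and the two penalties, assuming abundance is already expressed as numeric levels and that deviation is reported minus true. The function names and the abundance inputs are hypothetical; the blade-length deviations are taken from Table 3, with a None added to stand in for a species that has no measurement.

```python
NO_COVER_LEVEL = 1  # level used for the missing side of the join

def abundance_deviations(reported, true_levels):
    """Full join of one group's reported levels with the consensus levels
    at one transect; returns {species: reported - true}."""
    deviations = {}
    for species in sorted(set(reported) | set(true_levels)):
        r = reported.get(species, NO_COVER_LEVEL)     # missed species
        t = true_levels.get(species, NO_COVER_LEVEL)  # false positive
        deviations[species] = r - t
    return deviations

def mean_ignoring_missing(values):
    """Python analogue of an average taken with na.rm = TRUE."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None

# Hypothetical levels: the group missed Thalassia and added Syringodium
reported = {"Halodule": 7, "Syringodium": 4}
true_levels = {"Halodule": 7, "Thalassia": 3}
devs = abundance_deviations(reported, true_levels)

# Blade-length deviations from Table 3 (reported minus true), plus a missing one
blade_devs = [23.0 - 24.6, 7.2 - 23.0, None]
blade_mean = mean_ignoring_missing(blade_devs)
```

In this sketch the size of the penalty depends on how far the unmatched value sits from level 1, so a missed sparse species costs less than a missed dense one.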

Table 3: Reported vs. true values for group ‘SWFWMD (T. Harter, C. Anastasiou, W. VanGelder, M. Walton, E. Walters)’ at one transect in 2025. A dash in the Reported column indicates a missed species; a dash in the True column indicates a false positive.
Transect | Species | Abundance reported | Abundance true | Blade Length reported (cm) | Blade Length true (cm)
3 | Halodule | 51-75% | 51-75% | 23.0 | 24.6
3 | Thalassia | solitary | few | 7.2 | 23.0

Step 4: Metric Scores

For each metric, deviations from the true value are summarised per species across transects, then combined into a single number per group per metric. The combination is a weighted sum of absolute mean deviations, where the weight for each species is \(1 / (1 + \sigma_s)\) and \(\sigma_s\) is the standard deviation of the true values across transects:

\[ \text{metric score}_{\text{raw}} = \sum_{\text{species}} \frac{|\bar{d}_s|}{1 + \sigma_s} \]

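where \(\bar{d}_s\) is the mean deviation (reported minus true, as in Table 4) for species \(s\) across transects, taken in absolute value, and \(\sigma_s\) is the standard deviation of the true values for that species across transects. Species where the true value varies a lot across transects (high \(\sigma_s\)) contribute less to the final score, since agreement there is inherently harder to achieve. The 1 in the denominator keeps the weight finite when \(\sigma_s = 0\). A lower raw score indicates closer agreement with the other groups.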
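
The formula can be evaluated directly, as in the sketch below. The function name and input values are hypothetical. The text does not say whether the sample or population standard deviation is used; this sketch assumes the sample form and treats a species seen at a single transect as having zero spread.

```python
from statistics import mean, stdev

def raw_metric_score(deviations, true_values):
    """Sum over species of |mean deviation| / (1 + SD of true values).

    deviations, true_values: {species: [one value per transect]}
    """
    total = 0.0
    for species, devs in deviations.items():
        d_bar = mean(devs)
        values = true_values[species]
        # Assumption: sample SD; a single transect has no spread
        sigma = stdev(values) if len(values) > 1 else 0.0
        total += abs(d_bar) / (1 + sigma)
    return total

# Hypothetical abundance levels at two transects
deviations = {"Halodule": [0, 0], "Thalassia": [-1, -1]}
true_values = {"Halodule": [6, 6], "Thalassia": [4, 6]}
score = raw_metric_score(deviations, true_values)
```

Here Halodule contributes nothing, and the one-level error on Thalassia is scaled down because its true value differs between the two transects.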

Table 4: Per-species deviation summary for group ‘SWFWMD (T. Harter, C. Anastasiou, W. VanGelder, M. Walton, E. Walters)’ in 2025 (Abundance metric).
Species | Reported (avg) | True (avg) | Mean deviation
Halodule | 25-50% | 25-50% | 0
Syringodium | <5% | <5% | 0
Thalassia | <5% | 5-25% | -1

Step 5: Score Calibration

Raw metric scores are converted to a 100-point scale. Without calibration, the best group in any year always maps to ~100 and the worst always maps to ~50, regardless of how closely groups agreed. A year in which every group performed very well would therefore still produce a spread from A to D, which would be unfair.

To address this, the score floor (the minimum possible score) is raised in years when all groups agree closely with each other, and kept at 50 in years when disagreement is typical or high.

How calibration works

For each year and metric, we compute the within-year standard deviation of group deviations — a measure of how spread out the groups were. Across all training years, these yearly spreads are standardised to z-scores. A negative z-score means groups agreed more than usual; a positive z-score means more disagreement than usual.
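
The standardisation step might look like the sketch below for a single metric. The function names and the per-year deviations are hypothetical, and the use of the sample standard deviation at both stages is an assumption.

```python
from statistics import mean, stdev

def within_year_spread(group_deviations):
    """Spread of the groups' deviations in one year (sample SD assumed)."""
    return stdev(group_deviations)

def spread_z_scores(deviations_by_year):
    """Standardise each year's spread against all years supplied."""
    spreads = {year: within_year_spread(d)
               for year, d in deviations_by_year.items()}
    values = list(spreads.values())
    mu, sd = mean(values), stdev(values)
    return {year: (s - mu) / sd for year, s in spreads.items()}

# Hypothetical per-group deviations for one metric in three years
deviations_by_year = {
    2023: [0.0, 1.0, 2.0],  # tight year
    2024: [0.0, 2.0, 4.0],
    2025: [0.0, 3.0, 6.0],  # loose year
}
z = spread_z_scores(deviations_by_year)
```

With these inputs the yearly spreads are 1, 2, and 3, so the tightest year gets a z-score of -1 and the loosest gets +1.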

The score floor for a given year and metric is:

\[ \text{floor} = \max\!\left(50,\ 50 - z \times 15\right) \]

where \(z\) is the z-score of within-year spread for that metric and year, and 15 is a scaling constant (grade-points per standard deviation). A year that is 1 SD tighter than average gets a floor of 65; 2 SDs tighter gets a floor of 80. In loose years (positive z) the floor stays at 50, so there is no extra penalty beyond the standard range.
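
The floor formula translates directly into code. In this sketch the function and parameter names are hypothetical; the constants 50 and 15 come from the formula above.

```python
def score_floor(z, base=50.0, points_per_sd=15.0):
    """Score floor for one year and metric: max(50, 50 - z * 15)."""
    return max(base, base - z * points_per_sd)

# The two cases described in the text, plus a loose year
tight_1sd = score_floor(-1.0)
tight_2sd = score_floor(-2.0)
loose = score_floor(0.67)
```

Applied to the 2025 z-scores listed in Table 9 (-1.60, 0.67, -0.76), the same function gives floors that round to 74, 50, and 61, matching the 2025 row of Table 5.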

Figure 1: Within-year spread of group deviations for each metric and year. Bar colour shows the z-score; the dashed line is the historical mean. Bar labels show the z-score and the resulting score floor. Blue (negative z) years are tighter than average and receive a higher score floor.

Effect on scores: a tight vs. loose year

The table below compares the calibrated score floor for each year and metric, illustrating how the floor shifts in tighter training years.

Table 5: Calibrated score floor by year and metric. Years with consistently tight agreement receive a higher floor, raising the minimum grade for all groups in that year.
Year | Abundance | Blade Length | Short Shoot Density
2020 | 50 | 68 | 50
2021 | 55 | 50 | 55
2022 | 50 | 50 | 50
2023 | 50 | 61 | 55
2024 | 50 | 59 | 58
2025 | 74 | 50 | 61

Step 6: Letter Grades

After calibration, each group’s numeric score for each metric lies between that year’s score floor and 100. These scores are mapped to letter grades using fixed thresholds:

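Table 6: Letter grade thresholds. Bounds are shown as whole numbers; a score receives the listed grade when it is at or above that grade's lower bound and below the lower bound of the next grade up (for example, 94.5 is an A-).
Grade | Score range
A | 95 – 100
A- | 90 – 94
B+ | 85 – 89
B | 80 – 84
B- | 75 – 79
C+ | 70 – 74
C | 65 – 69
C- | 60 – 64
D+ | 55 – 59
D | below 55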

The Total score is the unweighted average of the Abundance, Blade Length, and Short Shoot Density numeric scores; it is converted to a letter grade using the same thresholds.
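
The grade lookup and the Total score can be sketched as below. The function names are hypothetical; the thresholds are the lower bounds from Table 6, and the three inputs are the 2025 scores reported for SWFWMD in Table 8.

```python
# (lower bound, grade) pairs from Table 6, highest first
GRADE_BOUNDS = [
    (95, "A"), (90, "A-"), (85, "B+"), (80, "B"), (75, "B-"),
    (70, "C+"), (65, "C"), (60, "C-"), (55, "D+"),
]

def letter_grade(score):
    """Return the first grade whose lower bound the score meets."""
    for lower, grade in GRADE_BOUNDS:
        if score >= lower:
            return grade
    return "D"  # anything below 55

def total_score(abundance, blade_length, shoot_density):
    """Unweighted average of the three metric scores."""
    return (abundance + blade_length + shoot_density) / 3

total = total_score(91.4, 82.3, 82.5)
grades = [letter_grade(s) for s in (91.4, 82.3, 82.5, total)]
```

Checking only lower bounds, from the top grade down, gives every fractional score exactly one grade.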


Worked Example

The following walks through the full scoring pipeline for SWFWMD (T. Harter, C. Anastasiou, W. VanGelder, M. Walton, E. Walters) in 2025.

Raw deviations

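Table 7: Per-species deviations across all metrics for group ‘SWFWMD (T. Harter, C. Anastasiou, W. VanGelder, M. Walton, E. Walters)’ in 2025. Abundance is shown as Braun-Blanquet numeric levels (1–8), Blade Length in cm, and Short Shoot Density in shoots per m².
Metric | Species | Reported avg | True avg | Mean deviation
Abundance | Halodule | 6.0 | 6.0 | 0.0
Abundance | Syringodium | 4.0 | 4.0 | 0.0
Abundance | Thalassia | 4.0 | 5.0 | -1.0
Blade Length | Halodule | 11.4 | 13.2 | -1.8
Blade Length | Syringodium | 6.4 | 13.9 | -7.5
Blade Length | Thalassia | 11.9 | 18.0 | -6.1
Short Shoot Density | Halodule | 6.3 | 7.2 | -0.9
Short Shoot Density | Syringodium | 1.0 | 0.7 | 0.3
Short Shoot Density | Thalassia | 1.9 | 1.6 | 0.3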

Scores

Table 8: Final scores for group ‘SWFWMD (T. Harter, C. Anastasiou, W. VanGelder, M. Walton, E. Walters)’ in 2025: numeric (calibrated) score and letter grade.
Metric | Numeric score | Letter grade
Abundance | 91.4 | A-
Blade Length | 82.3 | B
Short Shoot Density | 82.5 | B
Total | 85.4 | B+

How the calibration affected this group’s scores

Table 9: Effect of calibration on scores for group ‘SWFWMD (T. Harter, C. Anastasiou, W. VanGelder, M. Walton, E. Walters)’ in 2025. The floor is the lowest possible score any group could receive in this year for each metric.
Metric | Score without calibration | Score with calibration | Z-score (within-year spread) | Score floor
Abundance | 83.3 | 91.4 | -1.60 | 74
Blade Length | 82.3 | 82.3 | 0.67 | 50
Short Shoot Density | 77.4 | 82.5 | -0.76 | 61

A negative Z-score (a tighter-than-average cohort) raises the floor for all groups, including this one. A floor of 50 means no calibration adjustment was applied.
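
The document does not state how an uncalibrated score is moved onto the raised floor. One plausible reading, shown here purely as an assumption, is a linear rescaling of the 50 to 100 range onto the floor to 100 range. The function names are hypothetical. Using the Table 9 inputs, this sketch lands within roughly 0.1 point of the published calibrated scores, which is consistent with rounding of the inputs but does not confirm the method.

```python
def score_floor(z, base=50.0, points_per_sd=15.0):
    """Score floor from the within-year spread z-score."""
    return max(base, base - z * points_per_sd)

def recalibrate(uncalibrated, floor, base_floor=50.0, top=100.0):
    """ASSUMED mapping: stretch [base_floor, top] linearly onto [floor, top]."""
    fraction = (uncalibrated - base_floor) / (top - base_floor)
    return floor + fraction * (top - floor)

# (uncalibrated score, z-score) pairs from Table 9
table_9 = {
    "Abundance": (83.3, -1.60),
    "Blade Length": (82.3, 0.67),
    "Short Shoot Density": (77.4, -0.76),
}
calibrated = {
    metric: recalibrate(score, score_floor(z))
    for metric, (score, z) in table_9.items()
}
```

Under this mapping a score of 100 stays at 100 and a score of 50 moves up to the floor, so the adjustment is largest for the weakest groups in a tight year.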