Mythos Mozilla Vulnerabilities: Counting engineer-days — a model for CVE remediation cost

Mozilla says Firefox 150 included fixes for 271 vulnerabilities identified during an initial Claude Mythos Preview evaluation. Mozilla's 271 figure includes rollup CVEs, Extended Support Release (ESR) branch and submodule fixes, and vulnerabilities not separately credited. The 131 directly described CVEs are the public surface; the remainder represents real remediation work the advisories do not individually attribute. So we did something different: we graded all 131 directly-described CVEs across the three most recent Firefox releases to build a defensible model of what remediation actually costs. Eight weeks of Firefox CVEs, measured.

AppSecAI — May 2026

Background

On April 21, 2026, Mozilla shipped Firefox 150 and credited Anthropic's red team, using Claude Mythos Preview, with surfacing 271 vulnerabilities. That number got a lot of attention. Most of the coverage focused on the discovery side: an AI program found a lot of bugs.

Mozilla's May 2026 retrospective subsequently disclosed that more than one hundred contributors were involved in landing these fixes across Firefox 148, 149, and 150. The find side scaled with one model; the fix side scaled with more than one hundred people.

So we wanted to look at the fix side in more detail. Not "how many bugs did AI find?" but "what does it cost to fix them all?" Finding bugs is getting cheaper. Fixing them is not. If the find rate keeps climbing and the fix rate stays flat, the security pipeline backs up.

To build a dataset large enough to model remediation costs, we analyzed the public CVEs across Firefox 148, 149, and 150, covering an eight-week window of releases. For each CVE, we pulled the mainline fix from mozilla-central, scored it on a three-axis difficulty rubric, and translated the combined score into engineer-day estimates.

The analysis uses four exploitability tiers to separate the bugs that matter most from the ones that don't:

T1 (Primary driver): Remote code execution, memory corruption, sandbox escape. Exploitable on its own.
T2 (Chain enabler): Same-origin policy bypass, CSP bypass, lifetime leaks. Not exploitable alone, but a building block.
T3 (Reachable but limited): Spoofing, permission logic, denial of service. An attacker can reach it, but the blast radius is small.
T4 (Hardening): Defense-in-depth, fuzzer crash cleanup, assertion improvements. Not a realistic attack vector.

TL;DR

We graded all 131 directly-described CVEs in MFSA 2026-13, 2026-20, and 2026-30 (the security advisories for Firefox 148, 149, and 150) on a three-axis difficulty rubric covering patch size, reasoning complexity, and validation difficulty. We then translated the combined scores into engineer-day estimates.

For the 111 CVEs in tiers T1, T2, and T3 (primary drivers, chain enablers, and reachable-but-limited bugs; T4 fuzzer-only hardening is excluded):

Conservative — no AI: ~656 engineer-days over the 8-week release window. At a fully-loaded cost of $375K per engineer-year (including salary, benefits, management allocation, CI/CD compute, AI tooling, G&A, and recruiting — see appendix) and 200 working days per year (~$1,875 per working day), that is ~$1.23M for fixes alone in the 8-week window.
Central — 25% AI-assisted: ~492 engineer-days, ~$922K.
Optimistic — 50% AI-assisted: ~328 engineer-days, ~$615K.
FTE equivalent (8-week window / annualized): Conservative: 16.4 / 14.2 FTE (~$1.23M / ~$5.33M). Central: 12.3 / 10.7 FTE (~$922K / ~$4.00M). Optimistic: 8.2 / 7.1 FTE (~$615K / ~$2.66M). The difference is the engineering capacity that AI assistance returns to a security organization at this rate of CVE production.
Per-CVE cost (n=111 T1-T3 CVEs): ~$11,070 Conservative, ~$8,300 Central, ~$5,540 Optimistic.

The scenario totals apply each percentage uniformly across the difficulty curve, but we expect real compression to be largest in the middle of the curve. The hardest fixes (JIT correctness, sandbox semantics) likely compress less because committee review and fuzzing validation take calendar time regardless of who drafted the patch.

We are not claiming that 25% or 50% is the right number. The find rate is increasing faster than the fix rate, and Mythos-class programs are about to make the fix backlog the binding constraint. A controlled experiment is the next step. We are publishing these estimates so the conversation can start with numbers.

Why measure this

Mozilla's blog post about Firefox 150 cited "271 vulnerabilities" surfaced through a partnership with Anthropic. That number does two jobs at once: it tells a vulnerability-discovery story (Mythos found a lot of bugs) and it implies an operations story (Mozilla's security team had to triage, classify, and fix all of them).

The discovery side has gotten most of the attention. The operations side — what does it actually cost to absorb a flood of high-quality bug reports — has been mostly ignored. That gap matters. If AI-driven discovery scales the way Mythos suggests, the bottleneck in the security pipeline shifts from finding to fixing.

What "fixing" actually means

It is worth being specific about what happens between receiving a bug report and shipping the fix.

Fixing a security vulnerability in a browser is not just writing new code. It starts with reproducing the bug and understanding the root cause, which for memory-safety issues in a codebase the size of Firefox can take days on its own. Then comes designing the right fix for the codebase: not just a patch that makes the test case pass, but a solution that handles related edge cases, doesn't break other subsystems, and is maintainable by whoever touches that code next. After that comes testing, both confirming the fix works and confirming it doesn't regress anything else, often across multiple platforms and configurations. Finally, there's code review, landing, and the downstream work of backporting to Extended Support Release (ESR) branches.

A team that doubles its find rate without expanding its fix capacity does not become twice as secure. It becomes twice as backlogged.

We wrote this analysis to put a defensible number on the fix side of that equation, for a codebase where the data is public.

A note on what this is not: we do not have access to Mozilla's internal time-tracking. We did not interview Mozilla engineers. We did not run a controlled experiment. The numbers below are what an outside engineer with reasonable familiarity with browser security work would estimate from the public artifacts: the MFSA advisory pages, the mozilla-central commit log, the Bugzilla entries that are not under core-security lock. The model is an initial estimate, calibrated to industry experience and existing AI-productivity literature. It is not a measurement.

The dataset

Three Firefox stable releases over an ~8-week window:

Release	MFSA	Released	Direct CVEs	Rollup CVEs
Firefox 148	2026-13	24 Feb 2026	48	3
Firefox 149	2026-20	24 Mar 2026	43	3
Firefox 150	2026-30	21 Apr 2026	40	3

We use 131 direct CVEs (48 + 43 + 40). The 9 rollup CVEs ("Memory safety bugs fixed in Firefox N," the catch-alls that bundle dozens of fuzzer crashes under one CVE ID) are excluded from the per-CVE effort distribution because they would distort the shape of the curve. We return to the rollups below; they likely add another ~50 engineer-days that our totals do not capture.

CVE counts were verified against the live MFSA pages on 2026-05-13. MFSA 2026-30 had been amended since our initial pull to add two CVEs (CVE-2026-7321 and CVE-2026-8091) with non-sequential IDs, both reported by Mozilla's internal fuzzing team and folded into the FF150 advisory post-publication.

For each MFSA bug ID, we ran git log _RELEASE.._RELEASE --grep="Bug NNNNNNN" against a local mozilla-central clone. Matching commits give us files-changed and lines-added+deleted (LoC) via git log --numstat. When a bug ID maps to multiple commits, files and LoC are summed.
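A minimal Python sketch of that mapping step, assuming a local clone path and a revision range you supply in the same form as above (the range is left as a parameter, not filled in). The names FixSize, parse_numstat, and fix_size_for_bug are hypothetical. Unlike the literal grep described above, the sketch adds a trailing non-digit guard so that "Bug 198853" cannot also match "Bug 1988534"; that guard is our addition.

```python
import subprocess
from typing import NamedTuple


class FixSize(NamedTuple):
    files: int  # files changed, summed across all matching commits
    loc: int    # lines added + deleted, summed across all matching commits


def parse_numstat(output: str) -> FixSize:
    """Sum `git log --numstat --format=` output into files-changed and LoC.

    Each numstat line is "<added>\t<deleted>\t<path>". Binary files report
    "-" for both counts and contribute a file but no lines. A file touched
    by two commits is counted twice, matching the "summed" rule.
    """
    files = 0
    loc = 0
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # blank separator lines between commits
        added, deleted, _path = parts
        files += 1
        loc += sum(int(n) for n in (added, deleted) if n != "-")
    return FixSize(files, loc)


def fix_size_for_bug(repo: str, rev_range: str, bug_id: int) -> FixSize:
    """Hypothetical helper: run git log for one bug ID over a release range."""
    # -E enables extended regex; the trailing group stops prefix matches on longer IDs.
    pattern = f"Bug {bug_id}([^0-9]|$)"
    result = subprocess.run(
        ["git", "-C", repo, "log", "-E", f"--grep={pattern}",
         "--numstat", "--format=", rev_range],
        capture_output=True, text=True, check=True,
    )
    return parse_numstat(result.stdout)
```

The --format= option suppresses the commit headers, so the output contains only numstat lines and blank separators, which keeps the parser trivial.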

Of the 131 direct CVEs, 18 do not map to any commit in the mainline release window. We classified each:

5 are Bugzilla core-security restricted (Access Denied, visible only to Mozilla's security group)
6 have old bug IDs (below 2,000,000 in Bugzilla, indicating they were filed well before this release window) whose mainline commit wasn't picked up by our literal Bug NNNNNNN grep. We checked each against the public Bugzilla record. At least one (CVE-2025-59375, libexpat in the XML component) did land on mainline 149 and Firefox ESR 140.9, but under a different bug number (the fix shipped under bug 1988534, the libexpat 2.7.3 vendor bump). Others appear to be backports to the long-term-support branches: Firefox ESR 115 (still alive for legacy Windows 7/8/8.1 through August 2026) and Firefox ESR 140 (the current ESR line, which superseded ESR 128 in mid-2025). A couple remain security-restricted on Bugzilla, so we can't fully verify them from public data.
3 are committed but tagged only against the next release (cherry-pick onto a branch our clone does not carry)
2 are in the NSS (Network Security Services) submodule (separate repo)
1 was a partner-repo / NSS-adjacent miscellaneous case
1 was a post-publication addition to MFSA 2026-30 (CVE-2026-8091) whose mainline commit is not present in our clone, likely a FF150 dot-release backport that the advisory was later updated to reference

These 18 carry the reasoning and validation scores we introduce in the next section, but no patch-size score. They are excluded from the engineer-day totals, which biases our totals low, since core-security-restricted bugs may include some of the highest-severity issues. We did not invent patch-size data we do not have.

The three-axis rubric

Lines of code is the wrong unit for measuring how hard a bug is to fix. A 5-LoC fix can take three weeks if it is a JIT mis-compilation. A 200-LoC fix can take three hours if it is mechanical error-handling. We split fix difficulty into three orthogonal axes that scale with different things:

D1 — Implementation effort. How big is the patch and how spread out is it? Scales with edit volume.
D2 — Reasoning difficulty. How hard is it to find the root cause and decide on the right fix? Scales with subsystem expertise.
D3 — Test/validation difficulty. How hard is it to convince yourself and a reviewer that the fix is right and does not regress anything? Scales with test infrastructure and the bug class's amenability to deterministic regression coverage.

Each axis is graded 1–5. The combined score (D1 + D2 + D3) ranges from 3 to 15.

D1 — Implementation effort

Mechanical mapping from git numstat:

Score	Anchor
1	≤10 LoC, 1 file
2	≤50 LoC, ≤2 files
3	≤200 LoC, ≤5 files
4	≤500 LoC, ≤10 files
5	>500 LoC or >10 files
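The anchors can be read as paired limits that must both hold: a patch moves up a score if it exceeds either the LoC limit or the file limit. That reading of how the two columns interact is our assumption. A minimal Python sketch (d1_score is a hypothetical name):

```python
# (max LoC, max files) for D1 scores 1 through 4; anything larger scores 5.
D1_ANCHORS = [(10, 1), (50, 2), (200, 5), (500, 10)]


def d1_score(loc: int, files: int) -> int:
    """Map a patch's summed LoC and file count to the D1 implementation-effort score."""
    for score, (max_loc, max_files) in enumerate(D1_ANCHORS, start=1):
        # Assumption: both limits must hold; exceeding either pushes the patch up a tier.
        if loc <= max_loc and files <= max_files:
            return score
    return 5


print(d1_score(213, 5))  # the CVE-2026-2764 fix from the worked example → 4
```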

This is the most defensible axis because it comes directly from observable patches.

D2 — Reasoning difficulty

Driven by vulnerability class with a small bump for components where reasoning is intrinsically harder:

Score	Tier	Vulnerability classes
1	Routine	DoS, spoofing
2	Standard	boundary, integer overflow, uninitialized memory, info disclosure
3	Skilled	undefined behavior, mitigation bypass, privilege escalation, manual lifetime errors
4	Expert	UAF, sandbox escape, race UAF
5	Specialist	JIT mis-compilation, JS Engine UAF, JS GC, WASM lifetime

Components in JavaScript Engine, JIT, GC, WebAssembly, IPC, and XPCOM get a +1 bump on D2 and D3. Narrow OS-surface components (Widget: Cocoa) and simple parsers (XML) get a −1.
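A sketch of D2 as a class lookup plus the component bump, clamped to the 1–5 scale. The cap of 5 comes from the worked example below; the floor of 1 is our assumption, as is matching components on the parts of a "Product: Component" style label. The same bump and clamp would apply to D3, given a class-to-D3 table, which the rubric describes by tier rather than listing class by class.

```python
# Baseline D2 by vulnerability class, transcribed from the table above.
D2_BY_CLASS = {
    "DoS": 1, "spoofing": 1,
    "boundary": 2, "integer overflow": 2, "uninitialized memory": 2, "info disclosure": 2,
    "undefined behavior": 3, "mitigation bypass": 3, "privilege escalation": 3,
    "manual lifetime error": 3,
    "UAF": 4, "sandbox escape": 4, "race UAF": 4,
    "JIT mis-compilation": 5, "JS Engine UAF": 5, "JS GC": 5, "WASM lifetime": 5,
}

# Component adjustments; matching on label parts is a hypothetical convention.
BUMP_UP = {"JavaScript Engine", "JIT", "GC", "WebAssembly", "IPC", "XPCOM"}
BUMP_DOWN = {"Cocoa", "XML"}


def component_bump(component: str) -> int:
    """+1 for intrinsically harder components, -1 for narrow ones, else 0."""
    parts = {p.strip() for p in component.split(":")}
    if parts & BUMP_UP:
        return 1
    if parts & BUMP_DOWN:
        return -1
    return 0


def clamp(score: int) -> int:
    """Keep a score on the 1-5 scale (the floor of 1 is an assumption)."""
    return max(1, min(5, score))


def d2_score(vuln_class: str, component: str) -> int:
    return clamp(D2_BY_CLASS[vuln_class] + component_bump(component))
```

For example, d2_score("UAF", "Core: IPC") returns 5, while the same class in a DOM component stays at 4.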

D3 — Test/validation difficulty

Score	Tier	Why testing is hard
1	Easy	Deterministic input → crash. One regression seed.
2	Standard	Subsystem setup needed. Codec input, graphics context, network state.
3	Hard	Negative + positive coverage required. Mitigation-bypass tests need both "blocked" and "still allowed" cases.
4	Very Hard	Flaky or non-deterministic. Race conditions, sandbox escape, lifetime issues.
5	Specialist	Invariant-based, not example-based. JIT correctness across optimization tiers; GC marking; WASM type system.

D3 is where the rubric leans hardest on judgment, and it is also where AI-assisted regression-test generation has the most plausible leverage.

Worked example: CVE-2026-2764

The largest Anthropic-credited fix in Firefox 148 illustrates how the rubric behaves at the top end. The MFSA describes it as a "JIT mis-compilation leading to use-after-free in the JavaScript engine." The mainline fix touches 5 files and changes 213 lines.

Walking through the rubric:

D1 = 4. At 213 LoC the patch exceeds the 200-LoC limit for a 3, so with 5 files it lands in the "≤500 LoC, ≤10 files" anchor.
D2 = 5. JIT mis-compilation sits at the specialist tier baseline. The component (JavaScript Engine: JIT) would normally add +1, but D2 is already at the cap of 5.
D3 = 5. JIT mis-compilation starts at 5 for validation difficulty. This is the case where validation is invariant-based: the fix must hold across baseline, Ion, and Warp tiers under various OSR conditions. A test that exercises only one tier does not prove the fix.

Combined: 14. Under our baseline curve, that maps to about 30 working days with normal reviewer churn. Under a 50% AI-assistance scenario, about 15 days.

The rubric scoring for this CVE is reproducible from cve_difficulty_grades.csv. A reviewer who lived the bug might argue for a 15 (keeping D1 at 4 but letting the JIT component bump carry D2 or D3 past the cap of 5); doing so would add roughly 15 days (45 versus 30 on the baseline curve) to the T1 (primary driver) subtotal alone. We chose 14 because we apply the cap consistently across all CVEs.
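The arithmetic behind the 14, the 30 and 15 days, and the reviewer's alternative 15 fits in a few lines of Python. BASELINE_DAYS copies only the two points of the baseline curve needed here; the variable names are ours, not taken from cve_difficulty_grades.csv.

```python
# Two points from the baseline curve in "Three productivity scenarios".
BASELINE_DAYS = {14: 30.0, 15: 45.0}
CAP = 5

d1 = 4          # 213 LoC across 5 files
d2_raw = 5 + 1  # specialist baseline plus the JavaScript Engine: JIT bump
d3_raw = 5 + 1  # invariant-based validation baseline plus the same bump

capped = d1 + min(CAP, d2_raw) + min(CAP, d3_raw)
# The reviewer's alternative: let the bump carry one axis past the cap.
uncapped_one_axis = d1 + d2_raw + min(CAP, d3_raw)

print(capped, BASELINE_DAYS[capped], BASELINE_DAYS[capped] * 0.5)
print(uncapped_one_axis, BASELINE_DAYS[uncapped_one_axis] - BASELINE_DAYS[capped])
```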

What the distribution looks like

Across the 113 mapped CVEs (111 in T1–T3 plus 2 in T4), combined-score distribution:

Score range	Bucket	Count	%
3–5	Trivial (≤1 day)	14	12%
6–7	Small (1–2 days)	32	28%
8–9	Skilled (3–5 days)	29	26%
10–11	Hard (1–2 weeks)	26	23%
12–13	Specialist (2–4 weeks)	9	8%
14–15	Top tier (1–2 months)	3	3%

The shape matters more than any individual score. About 41% of mapped fixes (combined ≤7) are short enough that an experienced engineer can land them in a day or two. The remaining 59% are where the budget piles up. The 38 fixes scoring 10+ (about a third of the mapped count) account for roughly 60% of total estimated effort. The 12 fixes scoring 12+ account for about 30% of the budget on roughly 10% of the count.
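The counts and percentages quoted here follow directly from the distribution table. A short Python check, which also locates the median mapped fix (the 57th of 113 when sorted by combined score) in the Skilled 8–9 bucket, about 3–5 baseline days:

```python
# (score range, bucket, count) from the distribution table above.
BUCKETS = [
    ((3, 5), "Trivial", 14),
    ((6, 7), "Small", 32),
    ((8, 9), "Skilled", 29),
    ((10, 11), "Hard", 26),
    ((12, 13), "Specialist", 9),
    ((14, 15), "Top tier", 3),
]

total = sum(count for _, _, count in BUCKETS)
at_most_7 = sum(count for (_, hi), _, count in BUCKETS if hi <= 7)
ten_plus = sum(count for (lo, _), _, count in BUCKETS if lo >= 10)
twelve_plus = sum(count for (lo, _), _, count in BUCKETS if lo >= 12)


def bucket_of_rank(rank: int) -> str:
    """Return the bucket holding the rank-th fix when sorted by combined score."""
    seen = 0
    for _, name, count in BUCKETS:
        seen += count
        if rank <= seen:
            return name
    raise ValueError("rank exceeds number of mapped CVEs")


median_bucket = bucket_of_rank((total + 1) // 2)  # 57th of 113

print(total, round(100 * at_most_7 / total), ten_plus, twelve_plus, median_bucket)
```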

The long-tail rule

This is the security-remediation equivalent of a Pareto distribution: about a third of CVEs eats roughly 60% of the engineering budget. It is the single most useful planning artifact in this analysis, and it has three concrete implications for a security organization:

The typical CVE is a misleading planning unit. Capacity planning around the typical CVE understates true workload by roughly 2×, because most fixes are small while the expensive long tail drives the total.
Staff for the tail, not the average. Two senior specialists who can ship JIT correctness, sandbox-semantics, or GC-marking fixes account for more risk-relevant output than five generalists handling trivial bugs. "We need N more security engineers" undersells the staffing problem — you need specific engineers, and the market is thinnest at the tail.
AI compression is asymmetric. AI coding assistance shortens the middle of the difficulty curve most, the long tail least. Doubling AI investment does not double remediation throughput on the bugs that actually consume your budget.

Anyone who has run a security fix queue will recognize this pattern: the headline rate (fixes per week) hides the fact that one or two specialist bugs eat a disproportionate share of capacity.

Cross-tabbed against exploitability tier, we get the heat map:

Tier	Low (≤2d)	Medium (3–5d)	High (1–2w)	Extra High (2+w)	CVEs
T1 Primary driver	20	13	18	10	61
T2 Chain enabler	12	12	7	2	33
T3 Reachable but limited	12	4	1	0	17
T4 Hardening	2	0	0	0	2

The dominant cells are T1 × High (18 CVEs) and T1 × Extra High (10 CVEs). This is where most of the engineering budget lives: RCE-class memory corruption, use-after-free in JavaScript Engine and DOM, sandbox escapes, JIT mis-compiles. T3 and T4 are small both in count and in per-CVE cost.
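The heat map should reconcile with both earlier tables: its rows with the per-tier CVE counts, and its columns with the distribution buckets (Low = Trivial + Small, Medium = Skilled, High = Hard, Extra High = Specialist + Top tier). A quick Python check:

```python
# Per tier: counts in the Low (<=2d), Medium (3-5d), High (1-2w), Extra High (2+w) columns.
HEAT_MAP = {
    "T1": [20, 13, 18, 10],
    "T2": [12, 12, 7, 2],
    "T3": [12, 4, 1, 0],
    "T4": [2, 0, 0, 0],
}

row_totals = {tier: sum(cells) for tier, cells in HEAT_MAP.items()}
column_totals = [sum(col) for col in zip(*HEAT_MAP.values())]

print(row_totals)     # should match the CVEs column
print(column_totals)  # should match 14 + 32, 29, 26, 9 + 3 from the distribution
```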

If you are wondering why T4 is only 2 CVEs given that fuzzing finds many low-impact crashes: most of those bugs land in the rollup CVEs (CVE-2026-2792/2793 in FF148, CVE-2026-6784/6785/6786 in FF150), which we excluded.

Three productivity scenarios

Combined scores get translated to engineer-days through a baseline curve, then we apply three levels of AI-assisted compression to show how the numbers move.

The baseline curve

Combined score	Baseline days
3	0.5 d
4	0.7 d
5	1.0 d
6	1.5 d
7	2.0 d
8	3.0 d
9	5.0 d
10	7.0 d
11	10.0 d
12	15.0 d
13	20.0 d
14	30.0 d
15	45.0 d

The curve is anchored at the endpoints. Score 3 (a one-line fix to a deterministic DoS in a low-complexity component) lands at half a day, the irreducible floor for the type-the-patch, rebase, push, and land cycle in a large project. Score 15 lands at 45 days, anchored to JIT-correctness fixes such as CVE-2026-2764 that demand cross-tier validation (CVE-2026-2764 itself scores 14, at 30 days, under our capped rubric). In the upper half of the curve, each score point multiplies effort by roughly 1.3–1.7×, consistent with standard effort estimation models (COCOMO-family, Putnam, NCB).
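A minimal Python sketch of the curve as a lookup, with the AI scenarios applied as a flat multiplier. The flat multiplier matches how the aggregate tier table behaves (each scenario column is the baseline times 0.75 or 0.5); estimate_days is a hypothetical name. The sketch also computes the step-to-step growth factor between adjacent scores.

```python
# Baseline engineer-days per combined score, transcribed from the table above.
BASELINE_DAYS = {
    3: 0.5, 4: 0.7, 5: 1.0, 6: 1.5, 7: 2.0, 8: 3.0, 9: 5.0,
    10: 7.0, 11: 10.0, 12: 15.0, 13: 20.0, 14: 30.0, 15: 45.0,
}


def estimate_days(d1: int, d2: int, d3: int, ai_reduction: float = 0.0) -> float:
    """Engineer-days for one CVE: look up D1 + D2 + D3, then apply a flat AI reduction."""
    for axis in (d1, d2, d3):
        if not 1 <= axis <= 5:
            raise ValueError("each axis is graded 1-5")
    return BASELINE_DAYS[d1 + d2 + d3] * (1.0 - ai_reduction)


# Growth factor from score s-1 to score s.
growth = {s: BASELINE_DAYS[s] / BASELINE_DAYS[s - 1] for s in range(4, 16)}
print(min(growth.values()), max(growth.values()))
print(estimate_days(4, 5, 5), estimate_days(4, 5, 5, ai_reduction=0.5))
```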

Sensitivity analysis: 0%, 25%, 50% AI-assisted compression

Rather than pick a single AI-productivity multiplier, we model three scenarios:

0% (Conservative, no AI assistance): The baseline. An experienced browser security engineer working without AI tooling.
25% reduction (Central): A moderate estimate, roughly in line with the lower end of published AI-coding-assistant studies. AI helps with test scaffolding and narrowing root cause, but the engineer still does most of the reasoning and all of the review.
50% reduction (Optimistic): An optimistic estimate, closer to the upper end of published studies for well-defined tasks. AI handles significant portions of diagnosis, patch drafting, and regression-test generation, but committee review and fuzzing-bake time still take calendar time.

Published productivity studies on AI coding assistants (METR, DORA 2024, GitHub Copilot field studies) report a wide range of results depending on task type and measurement methodology, from measured slowdowns (METR 2025) to self-reported gains of 25% to 50% on bounded development tasks (Copilot field studies). We bracket 0%, 25%, and 50% improvement scenarios to span that range rather than claim a consensus or a precision we do not have.

Is 50% realistic? It reads better as an upper-bound scenario than as the central estimate. Weighted against where the budget actually concentrates in this dataset, particularly the ~60% of effort sitting in score-10-plus remediation work, the long tail compresses less aggressively than optimistic projections imply.

A defensible portfolio-average central case is closer to 25%.

The aggregate result

Per tier, over the 8-week window:

Tier	Mapped CVEs	Conservative Baseline days	Central — 25% reduction	Optimistic — 50% reduction
T1 Primary driver	61	~437	~328	~218
T2 Chain enabler	33	~179	~134	~90
T3 Reachable but limited	17	~40	~30	~20
T1 + T2 + T3 subtotal	111	~656	~492	~328
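Each scenario column in the table is the baseline column scaled by a flat factor (0.75 or 0.5), so the per-tier figures can be regenerated from the three baseline totals. A Python sketch; the small gaps (218.5 versus ~218, 89.5 versus ~90) are rounding in the published table.

```python
# Baseline engineer-days per tier over the 8-week window, from the table above.
TIER_BASELINE_DAYS = {"T1": 437, "T2": 179, "T3": 40}
# Fractional reduction in effort per fix for each scenario.
SCENARIOS = {"Conservative": 0.0, "Central": 0.25, "Optimistic": 0.5}

for tier, days in TIER_BASELINE_DAYS.items():
    row = {name: days * (1 - cut) for name, cut in SCENARIOS.items()}
    print(tier, row)

subtotal = sum(TIER_BASELINE_DAYS.values())
print("T1-T3", {name: subtotal * (1 - cut) for name, cut in SCENARIOS.items()})
```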

In dollars

We use a fully-loaded cost of $375K per engineer-year and 200 working days per year (~$1,875 per working day). The $375K figure is the all-in cost per FTE: salary, benefits, management allocation, CI/CD compute, AI tooling and inference, G&A, and recruiting amortization. The breakdown is in the appendix.

	8-week window	Annualized	8-wk FTE	Annual FTE	Per CVE (n=111)
Conservative — no AI	~$1.23M	~$5.33M	16.4	14.2	~$11,070
Central — 25% AI	~$922K	~$4.00M	12.3	10.7	~$8,300
Optimistic — 50% AI	~$615K	~$2.66M	8.2	7.1	~$5,540

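The "annualized" column assumes this release-window pattern persists for the full year. Mozilla cuts ~13 stable releases per year on a 4-week cadence, so we scale the 3-release sample by 13 ÷ 3 (about 4.33×). Because that scaling is per release rather than per week (52 ÷ 8 = 6.5×), the annualized FTE figure (14.2) comes out lower than the 8-week figure (16.4), which spreads the same work over the 40 working days between the first and last release dates. The "per CVE" column divides the 8-week total by 111 mapped T1-T3 CVEs.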
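The dollar, FTE, and per-CVE columns can be regenerated from the engineer-day totals with a few constants. A Python sketch; cost_summary and the constant names are hypothetical, and the 40-day window assumes 8 weeks of 5 working days.

```python
COST_PER_ENGINEER_YEAR = 375_000
WORKING_DAYS_PER_YEAR = 200
DAY_RATE = COST_PER_ENGINEER_YEAR / WORKING_DAYS_PER_YEAR  # $1,875 per working day

WINDOW_WORKING_DAYS = 8 * 5  # assumption: 8 weeks x 5 working days
RELEASES_IN_SAMPLE = 3
RELEASES_PER_YEAR = 13       # 4-week cadence
MAPPED_T1_T3_CVES = 111


def cost_summary(engineer_days: float) -> dict:
    """Dollar, FTE, and per-CVE figures for one scenario's 8-week engineer-day total."""
    annual_days = engineer_days * RELEASES_PER_YEAR / RELEASES_IN_SAMPLE
    window_cost = engineer_days * DAY_RATE
    return {
        "window_cost": window_cost,
        "annual_cost": annual_days * DAY_RATE,
        "window_fte": engineer_days / WINDOW_WORKING_DAYS,
        "annual_fte": annual_days / WORKING_DAYS_PER_YEAR,
        "per_cve": window_cost / MAPPED_T1_T3_CVES,
    }


conservative = cost_summary(656)
central = cost_summary(492)
optimistic = cost_summary(328)
savings = conservative["window_cost"] - central["window_cost"]
print(round(conservative["annual_cost"]), round(savings))
```

The per-CVE results (about $11,081, $8,311, and $5,541) land within roughly $11 of the published figures, presumably because the ~656 engineer-day total is itself rounded.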

The $375K assumption is a reasonable midpoint for a senior browser security engineer in a US-based organization with standard overhead. Adjust upward for higher-cost geographies (senior Bay Area JIT specialists, for example) or more senior staff; adjust downward for non-US or contract engineers. The cost of running AI tooling at scale on production code is included in the $375K under "AI tooling allocation"; teams running heavy inference workloads should consider whether that line item is sufficient.

Key takeaways

Before diving into caveats, here is what the data says:

Remediation costs are concentrated in T1 bugs. The 61 T1 primary-driver CVEs account for ~67% of total baseline effort (~437 of 656 engineer-days). These are the bugs that matter most for risk, and they are also the most expensive to fix.
The long-tail rule applies. About a third of mapped CVEs (those scoring 10+) consume roughly 60% of total engineer-days. Capacity planning around the typical CVE will underestimate the real workload by roughly 2×, and staffing decisions should be tail-weighted: you need specialists who can ship the hardest fixes, not headcount that can cover the average.
AI assistance helps most in the middle of the curve. Trivial fixes (score 3–5) are already fast and don't leave much room to compress. The hardest fixes (score 14–15) have irreducible review and validation time. The 25–50% compression has the most room to work in the score 6–12 range.
At baseline, this 8-week window represents ~$1.23M in fully-loaded labor for 111 T1-T3 fixes. Annualized, that's roughly $5.33M in engineering capacity devoted to security remediation alone.
Even a modest 25% AI-assisted compression saves ~$307K per 8-week window (~$1.33M annualized) and frees up ~4 FTEs of engineering capacity over the window (~3.5 FTEs annualized).

What this doesn't tell you

A short list of things we are deliberately not claiming:

This is not Mozilla's actual cost. We do not have time-tracking data. The numbers are what an outside engineer with subsystem familiarity would estimate, applied uniformly. Mozilla's actual costs are presumably lower on the upper end (their JIT engineers are among the best in the world) and possibly higher in the middle (real reviewer churn is worse than the model assumes).
This is not an AI productivity measurement. The AI-assisted scenarios are modeled from published productivity literature, not measured against actual LLM-augmented Mozilla engineers. Converting this into a measurement requires a controlled study.
The exclusion of 18 no-mainline CVEs biases the totals low. Five of those 18 are core-security restricted, which may include some of the highest-severity issues in the set. Including them with conservative D1 estimates would push the total up.
The exclusion of rollup CVEs biases the totals low for fuzzer-class bugs. Rollup CVEs each hide between 5 and 316 fuzzer-found bugs. A complete "engineer-days to fix all of Firefox 150's security bugs" number would have to add the rollup work, typically a fraction of a day per fuzzer crash but multiplied across hundreds of bugs.
D2 and D3 are by class, not by bug. Two UAFs in different DOM components score the same. A reviewer who lived a particular bug would know whether it was actually a 1-day or a 5-day fix. The rubric is consistent, not personalized.
Anthropic-credited bugs are a non-random sample. Mozilla credits Anthropic specifically for the optimizer-class and lifetime-class finds, the bugs an LLM-driven discovery program is best at surfacing. In our data, Anthropic-credited CVEs account for 43% of total engineer-days while making up 24% of the CVE count; that share is a property of what got submitted under that program, not a generalizable claim about AI bug-finding overall.
The component-bump table is small. It covers JIT/GC/JS Engine/WASM/IPC/XPCOM/Cocoa/XML/NSS/IP Protection. Other complexity-heavy components (Networking: HTTP/3, Layout: Flexbox, Graphics: WebRender shader compilation) do not get a bump even though they probably should. The default likely understates effort for 10–20 CVEs.

We know where this analysis is most exposed. If something looks wrong, start with the seven items above.

The shape of the problem to come

Mozilla's 271-vulnerability number is the headline of one quarter, in one browser, with one AI red-teaming program running in preview. Its successor models will not find fewer bugs. Other vendors are running their own programs with similar tooling. The find rate is a step function, and we just stepped.

The fix rate is not a step function. It scales with engineering capacity, which scales with hiring, which scales with budget and time. Browser vendors and other large security-critical projects have a finite number of senior engineers who can ship a JIT correctness fix in under three weeks. That ceiling does not move when discovery programs add a zero to their output.

This analysis applies to existing, large codebases with accumulated technical debt, which describes most of the software that matters for security. Greenfield codebases written with AI assistance from the start may have different vulnerability profiles, though current evidence suggests LLM-generated code introduces its own categories of security bugs.

What the data suggests, with all its uncertainty, is that AI assistance on the fix side of the pipeline (diagnosis, patch drafting, regression-test generation) is the most plausible relief valve. At the optimistic end, a 50% reduction in effort per fix doubles a fixed team's throughput, which is enough to keep pace with a doubling of the find rate; the 25% central case raises throughput by only about a third. Neither is enough to keep pace with a 5× or 10× increase. Those numbers will need real measurement, not modeling, before anyone can plan around them.
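The relationship between effort reduction and throughput is not linear: cutting effort per fix by a fraction c multiplies a fixed team's throughput by 1 / (1 − c). A short Python sketch, assuming team size and bug mix stay constant (both function names are hypothetical):

```python
def throughput_multiplier(effort_reduction: float) -> float:
    """Fix throughput of a fixed-size team when each fix takes (1 - reduction) of baseline effort."""
    if not 0.0 <= effort_reduction < 1.0:
        raise ValueError("effort_reduction must be in [0, 1)")
    return 1.0 / (1.0 - effort_reduction)


def keeps_pace(effort_reduction: float, find_rate_multiplier: float) -> bool:
    """True if fix throughput grows at least as fast as the find rate."""
    return throughput_multiplier(effort_reduction) >= find_rate_multiplier


for cut in (0.25, 0.5):
    print(cut, throughput_multiplier(cut), [keeps_pace(cut, f) for f in (2, 5, 10)])
```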

We wrote this so the conversation can start with a quantified estimate instead of intuition. The model above is the estimate. A controlled study, with a real codebase, real engineers, with and without AI assistance, scored against a held-out fix set, is the experiment. We are interested in running that experiment with anyone whose codebase is willing.

Postscript: what we do at AppSecAI

The Mozilla numbers above are about the find-fix gap. AppSecAI is built for the fix side of that gap.

We turn scanner findings into pull requests. You connect your existing scanner (Fortify, Snyk, Checkmarx, SonarQube) and our engine does what a senior security engineer would do with each finding: confirm it's real, locate the right place to fix it in your code, write the patch, generate the regression test, and open a PR your team can review and merge. Two products: Expert Triage Automation (ETA) for the confirm-and-prioritize step, and Expert Fix Automation (EFA) for the write-the-patch step. Pricing is pay-per-fix, so you pay for outcomes, not seats.

What we've measured on our public benchmarks:

97% triage accuracy on the OWASP Benchmark
93% fix accuracy on an open-sourced evaluation
10–100× faster than the manual workflow
8.2 minutes of expert validation time per fix (with AI generation running in roughly 42 seconds)

This is what we mean when we say Mythos found the bugs, and the find-fix gap is where the next round of security risk lives. AI red-team programs can surface hundreds of vulnerabilities in weeks. The remediation queue is what determines whether any of that actually reduces risk in production. AppSecAI is the remediation half of that pair.

If you have a scanner backlog you'd like to put numbers around, we run a 30-minute call where we show you what a representative fix from your codebase looks like end-to-end. appsecai.io.

Mozilla advisories:

Anthropic's red-team preview:

Mythos Preview

Related:

Time to Fix Vulnerabilities

Appendix — Fully-loaded cost per FTE ($375K)

~$250K base salary + benefits + 401(k) match
~$40K management allocation (assuming one manager per eight ICs at roughly $350K loaded)
~$15K–$25K cloud and CI/CD costs
~$5K–$15K AI tooling allocation
~$40K–$50K G&A allocation
~$10K–$15K recruiting amortization

Range: $360K to $395K per FTE. We use $375,000 in the calculations above, a clean 1.5× of the $250K salary-plus-benefits floor.
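The low and high ends of the range follow from summing the line items. In Python, with single-point items entered as identical low and high values:

```python
# Appendix line items in $K as (low, high).
LINE_ITEMS_K = {
    "salary + benefits + 401(k)": (250, 250),
    "management allocation": (40, 40),
    "cloud and CI/CD": (15, 25),
    "AI tooling": (5, 15),
    "G&A": (40, 50),
    "recruiting amortization": (10, 15),
}

low = sum(lo for lo, _ in LINE_ITEMS_K.values())
high = sum(hi for _, hi in LINE_ITEMS_K.values())
print(low, high, 1.5 * 250)
```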


May 22, 2026 11:10:51 AM | AppSec Mythos Mozilla Vulnerabilities: Counting engineer-days — a model for CVE remediation cost

Written By: Bruce Fram