Units of Work in AI and Crypto: From Bitcoin PoW to Salesforce AWU

Five ways industries measure work — Bitcoin Proof of Work, Ethereum gas, training FLOPs, LLM tokens, and Salesforce AWU. Where each metric breaks, how they line up on a dollar scale, and why the intersection of AI and crypto is the most interesting place in the spectrum.

19 May 2026 15 min read

On February 25, 2026, on its Q4 FY26 earnings call, Salesforce introduced a new metric — the Agentic Work Unit (AWU). Salesforce defines an AWU as “one discrete task accomplished by an AI agent” — decisions made, records updated, workflows triggered. Practically, at the platform level, that’s a prompt processed, a reasoning chain completed, or, most importantly, a tool invoked. On the same call, Marc Benioff said something more useful than the metric itself: “A token on its own doesn’t know your customers, your pipeline, your org chart, but Salesforce does.” Then: “The value isn’t in the token. The value is in what our platform does with it, the work.” Patrick Stokes added: “you can ask it a question and it can write you a poem, but that’s not really all that valuable in the enterprise world.” Over the quarter, Salesforce produced 771 million AWUs; 2.4 billion in total since launch.

The trend is clear: some count how much the model talks, others try to count how much work it actually did. This is not a new conversation. Each digital industry has picked its own unit of work: Bitcoin — hashes per second, since 2009; Ethereum — gas; ML infrastructure — FLOPs; LLM providers — tokens; Salesforce — AWUs. Each is solving (and failing) the same problem: how to formalize the thing that actually matters to the economy — a useful outcome.

This article is a comparative survey of five such metrics on a single coordinate system. For a deep dive on AI agent tokenomics and the related infrastructure layer, see AI Agent Tokenomics: From Memecoins to Revenue Share; this piece is about the units of measurement themselves.

Three categories of “work”

Sort the five metrics by what they physically measure and three layers emerge, each further from hardware, closer to value, and more ambiguous.

Physical work. Proof through burned energy. A Bitcoin hash is a concrete computation whose cost is paid in consumed electricity. There is no reverse path: a hash, once computed, is energy irrevocably spent.
Computational work. Proof through counted operations: Ethereum gas, training FLOP, LLM inference token. Each maps to a defined amount of processor work, but the direct link to energy is blurred, since the same operation can be executed more or less efficiently.
Task-level work. Proof through outcome. AWU, TPS, “closed ticket”, “updated CRM record”. Here the link to processor instructions is almost invisible — a task can take one message or forty steps, and the unit counts the closure.

The key point

The higher the layer, the closer the metric is to economic value for the buyer, and the higher the risk that it turns into a vanity metric optimized for its own sake. Bitcoin cannot produce “an extra hash for hash’s sake” for free (every extra hash costs real energy); Salesforce can produce “an extra AWU for AWU’s sake”, which is exactly the main objection raised against AWU by CIO.com.

The math of five metrics

The minimum amount of math needed to work with the numbers.

Bitcoin PoW. Network mining boils down to brute-forcing SHA-256 nonces until the output falls below a target value. Expected number of hashes per block:

H_block ≈ D · 2³²

D — current network difficulty (dimensionless)
2³² ≈ 4.295 · 10⁹ — a consequence of how the Bitcoin target is encoded relative to difficulty = 1
H_block — expected hashes until a valid block is found
May 2026: D ≈ 136.61 · 10¹², H_block ≈ 5.87 · 10²³ hashes
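A toy sketch of the brute-force loop described above, using Python’s standard hashlib. The header bytes and the 16-bit toy target are hypothetical; real Bitcoin hashes an 80-byte header against a compact-encoded target, which this sketch skips. The logic is the same as D · 2³², just at a scale a laptop can finish.

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    """Bitcoin hashes block headers with SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(header: bytes, target: int, max_nonce: int = 2**32) -> tuple[int, int]:
    """Brute-force a 4-byte nonce until the hash, read as an integer, is below target.

    Returns (nonce, hash_as_int). Raises if the nonce space is exhausted.
    """
    for nonce in range(max_nonce):
        digest = double_sha256(header + nonce.to_bytes(4, "little"))
        # Bitcoin compares the hash interpreted as a little-endian 256-bit number.
        value = int.from_bytes(digest, "little")
        if value < target:
            return nonce, value
    raise RuntimeError("nonce space exhausted; a real miner would change the header")

# Toy target (hypothetical): each hash succeeds with probability 2**-16,
# so about 65,536 attempts are expected on average.
TOY_TARGET = 2 ** (256 - 16)
nonce, value = mine(b"toy-header", TOY_TARGET)
print(f"nonce={nonce}, attempts={nonce + 1}, expected about {2**16}")
```

The actual attempt count varies run to run with the header; only its average follows the 1/probability rule.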

Dividing by the ~600-second target block time gives the network hashrate: H_block / 600 s ≈ 9.8 · 10²⁰ hashes per second, in line with the ~970 EH/s network. A miner gets a fixed block reward (after the fourth halving in April 2024, the reward is 3.125 BTC per block until the next halving around 2028) plus fees; the decay schedule is analyzed in detail in Reward emission models.
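The same arithmetic as a short sketch: expected hashes per block, the hashrate implied by a 600-second block time, and the block subsidy spread over those hashes. The $100,000 BTC price is the article’s calculator default, and fees are ignored, so the dollar figure is a subsidy-only estimate.

```python
BLOCK_TIME_S = 600   # Bitcoin's target block interval, seconds
TWO_POW_32 = 2 ** 32

def expected_hashes_per_block(difficulty: float) -> float:
    """H_block ≈ D · 2³²."""
    return difficulty * TWO_POW_32

def implied_hashrate(difficulty: float, block_time_s: float = BLOCK_TIME_S) -> float:
    """Hashes per second needed to find one block per block_time_s on average."""
    return expected_hashes_per_block(difficulty) / block_time_s

def subsidy_cost_per_hash(difficulty: float, block_reward_btc: float, btc_price_usd: float) -> float:
    """Block subsidy in USD divided by expected hashes per block (fees ignored)."""
    return block_reward_btc * btc_price_usd / expected_hashes_per_block(difficulty)

D = 136.61e12
print(f"H_block ≈ {expected_hashes_per_block(D):.3g}")                 # 5.87e+23
print(f"hashrate ≈ {implied_hashrate(D) / 1e18:.0f} EH/s")             # 978 EH/s
print(f"$/hash ≈ {subsidy_cost_per_hash(D, 3.125, 100_000):.2g}")      # 5.3e-19
```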

Ethereum gas. Each EVM operation has a fixed cost in gas units. A simple ETH transfer — 21,000 gas. ADD — 3 gas. SLOAD on the first access to a slot (cold) — 2,100 gas; subsequent (warm) — 100 gas. SSTORE from zero to non-zero — 20,000 gas on warm access; in the typical “cold slot” case the COLD_SLOAD_COST of 2,100 is added, for 22,100 total (EIP-2929/2200). Transaction cost:

Fee_tx = gas_used · gas_price · 10⁻⁹ · P_ETH

gas_used — total gas across all opcodes in the transaction
gas_price — gas price in Gwei (1 Gwei = 10⁻⁹ ETH)
P_ETH — ETH price in USD
Fee_tx — fee in USD (computed)
May 2026: gas_price ≈ 0.1–0.7 Gwei (the norm after Dencun and blob transactions)
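A minimal sketch of Fee_tx using the gas costs quoted above. The $3,500 ETH price is the article’s calculator default; the loop simply sweeps the quoted May 2026 range plus a 50 Gwei congestion peak.

```python
# Gas costs quoted in the text (EIP-2929 / EIP-2200 schedule).
GAS = {
    "transfer": 21_000,
    "ADD": 3,
    "SLOAD_cold": 2_100,
    "SLOAD_warm": 100,
    "SSTORE_zero_to_nonzero_cold": 22_100,
}

def fee_usd(gas_used: int, gas_price_gwei: float, eth_price_usd: float) -> float:
    """Fee_tx = gas_used · gas_price · 10⁻⁹ · P_ETH."""
    return gas_used * gas_price_gwei * 1e-9 * eth_price_usd

for gwei in (0.1, 0.3, 0.7, 50):
    print(f"{gwei:>5} Gwei: simple transfer ≈ ${fee_usd(GAS['transfer'], gwei, 3_500):.4f}")
```

Because the fee is linear in every factor, doubling the gas price or the ETH price doubles the dollar cost of the same opcodes.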

Training FLOP. Model training cost is measured in floating-point operations. The empirical rule for transformers:

C_train ≈ 6 · N · D

N — number of model parameters (for MoE — active parameters per token, not total)
D — number of tokens in the training corpus (not to be confused with the Bitcoin difficulty D above)
C_train — total FLOP budget
Examples: GPT-4 ≈ 2 · 10²⁵ FLOP (Epoch AI), GPT-4o ≈ 3.8 · 10²⁵ FLOP (SemiAnalysis estimate), Gemini Ultra (2023) ≈ 5 · 10²⁵ FLOP; per Epoch AI, about 30 publicly known models had crossed 10²⁵ FLOP by 2025
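The 6 · N · D rule as a two-line helper, plus its inversion (how many tokens a given FLOP budget buys). The 70B-parameter, 15T-token model is hypothetical and is not one of the models listed above.

```python
def train_flops(n_params: float, n_tokens: float) -> float:
    """C_train ≈ 6 · N · D. For MoE models, pass active parameters per token."""
    return 6 * n_params * n_tokens

def tokens_for_budget(flop_budget: float, n_params: float) -> float:
    """Invert the rule: training tokens affordable for a given FLOP budget."""
    return flop_budget / (6 * n_params)

# Hypothetical dense model: 70B parameters trained on 15T tokens.
print(f"{train_flops(70e9, 15e12):.2e} FLOP")               # 6.30e+24
# Tokens that the same model would need to cross 10²⁵ FLOP.
print(f"{tokens_for_budget(1e25, 70e9):.3g} tokens")         # 2.38e+13
```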

LLM token. The formula is trivial:

Cost_inference = (N_in · P_in + N_out · P_out) / 10⁶

N_in, N_out — input/output token counts
P_in, P_out — price per 1 million input/output tokens, USD
Cost_inference — cost of one query, USD
May 2026 output prices per 1M: Claude Opus 4.7 — $25; Claude Sonnet 4.6 — $15; GPT-5.5 — $30; GPT-5.4 — $15
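A sketch of Cost_inference for a single query. The input prices ($5 per 1M) come from the comparison table below; the 3,000-input / 800-output token query is a hypothetical example.

```python
# (input, output) prices in USD per 1M tokens, as quoted in this article.
PRICES = {
    "Claude Opus 4.7": (5.0, 25.0),
    "GPT-5.5": (5.0, 30.0),
}

def inference_cost(n_in: int, n_out: int, p_in: float, p_out: float) -> float:
    """Cost_inference = (N_in · P_in + N_out · P_out) / 10⁶."""
    return (n_in * p_in + n_out * p_out) / 1e6

# Hypothetical query: 3,000 input tokens, 800 output tokens.
for model, (p_in, p_out) in PRICES.items():
    print(f"{model}: ${inference_cost(3_000, 800, p_in, p_out):.3f}")
```

At these prices the query costs a few cents, and the formula has no term for whether the answer was useful.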

AWU. Salesforce does not publish a formula, but from the announcements it’s clear: 1 AWU = 1 discrete agent action — a prompt processed, a reasoning chain completed, or a tool invoked. On the Flex Credits pricing schedule it’s $0.10 per action. At the platform level:

AWU_period = Σᵢ Tasks_i

Tasks_i — number of discrete agent actions of type i, such as “update a record”, “close a ticket”, “invoke a tool”
AWU_period — total over the period
Q4 FY2026: 771 · 10⁶ AWUs in a single quarter; cumulative — 2.4 · 10⁹
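Since Salesforce does not publish its counting rules, the sum can only be sketched under an assumption: every logged agent action counts as one AWU and is billed at the $0.10 Flex Credits rate. The action log and type names below are hypothetical.

```python
from collections import Counter

FLEX_PRICE_PER_ACTION = 0.10  # USD per action, Flex Credits rate quoted above

# Hypothetical action log for one period.
actions = ["prompt", "reasoning_chain", "tool_call", "tool_call", "prompt", "tool_call"]

def awu_period(log: list[str]) -> tuple[int, Counter]:
    """AWU_period = Σᵢ Tasks_i, where Tasks_i is the count of actions of type i."""
    by_type = Counter(log)
    return sum(by_type.values()), by_type

total, by_type = awu_period(actions)
print(total, dict(by_type), f"${total * FLEX_PRICE_PER_ACTION:.2f}")
```

Note that nothing in the sum records whether any of these actions produced an outcome.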

Comparative table: five systems of measuring work

Side by side, the landscape becomes obvious.

System	What it physically measures	Unit	Who pays	Pricing mechanism	Main limitation	Reference point (May 2026)
Bitcoin PoW	Direct SHA-256 brute force	1 hash	miner (electricity and ASIC depreciation)	difficulty adjusts to a target block time (~10 min)	Cost is unrelated to the usefulness of transactions	Network: ~9.7 · 10²⁰ h/s; per block: ~5.87 · 10²³ hashes
Ethereum gas	EVM operations (opcodes)	1 gas unit	the sender of the transaction	EIP-1559 base fee + tip; varies with block congestion	Price is set by competition, not usefulness (MEV, gas wars)	ADD = 3; SLOAD cold = 2,100; SSTORE = 20,000; transfer = 21,000
Training FLOP	Floating-point multiply/add	1 FLOP	model owner (GPU cluster)	capex + energy; no market price for a single FLOP	Measured in lab conditions; price per FLOP is derived, not primary	GPT-4o ≈ 3.8 · 10²⁵ FLOP; ~30 models > 10²⁵
LLM token	BPE-subword in/out	1 million tokens	API user	provider’s price list (fixed; margin by model)	1M “filler” and 1M “answer” cost the same	Claude Opus 4.7: $5/$25 per 1M (in/out); GPT-5.5: $5/$30
Salesforce AWU	Agent action (prompt, chain, tool call)	1 AWU	customer (via Flex Credits or per-seat)	$0.10 per action / $2 per conversation / $125 per user per month	1 AWU doesn’t distinguish a result from an attempt	Q4 FY2026: 771M AWUs

The pattern that falls out of this table: the higher the relative cost per unit, the further the metric is from physics. One Bitcoin hash costs about $5.3 · 10⁻¹⁹ (network reward divided by expected hashes per block). One Ethereum gas at 0.3 Gwei and ETH ≈ $3,500 — about $10⁻⁶. One Claude Opus output token — $2.5 · 10⁻⁵. One AWU on Flex Credits — $0.10 (or 20 Flex Credits with a token cap of ~10,000 tokens per action). Roughly seventeen orders of magnitude between the extremes. At that scale it’s clear that we’re comparing different kinds of things: some metrics measure a stream of operations, others a stream of value.

Calculator: what does one unit of work cost

Plug in your own prices — see how the dollar scale shifts across five systems of measuring work. The six sliders are the six variables that drive cost-per-unit:

BTC price ($) — market price of bitcoin. The higher BTC goes, the more one hash is worth (numerator in block_reward · P_BTC / (D · 2³²)). Default $100k matches May 2026; the $20k–$250k range covers bear and bull scenarios.
ETH price ($) — market price of ether. Linearly drives the dollar cost of one gas: gas_price · 10⁻⁹ · P_ETH.
Gas price (Gwei) — fee per gas unit, in Gwei (1 Gwei = 10⁻⁹ ETH). Default 0.30 is the median in May 2026 after Dencun (EIP-4844). The top of the range (50) corresponds to peaks like NFT mints and cascading liquidations.
LLM output price ($/1M tokens) — provider’s price for a million output tokens. Default $25 = Claude Opus 4.7. The low end ($0.5) covers Haiku 4.5 / Gemini Flash; the high end ($60) — a hypothetical enterprise tier.
1 AWU price ($) — cost of one agentic task in Salesforce. $0.10 is the Flex Credits price from May 2025; $2 is the older Conversations tier (also one task, but more expensive).
Bitcoin difficulty (T) — Bitcoin network difficulty in trillions. Sits in the denominator of the hash formula: the higher D, the cheaper a single hash (but the more hashes per block). Default 136.61T matches May 2026.

The table below the sliders recomputes live: cost per unit in dollars and the ratio to 1 AWU.

Cost-per-unit-of-work calculator. Default inputs: BTC price $100,000; ETH price $3,500; gas price 0.30 Gwei; LLM output price $25.0 per 1M tokens; 1 AWU price $0.10; Bitcoin difficulty 136.61T. Output columns: System, Unit of work, Cost ($), Ratio to AWU.

At the defaults, the gap between “1 hash” and “1 AWU” is about 2 · 10¹⁷ (a hash costs roughly 5.3 · 10⁻¹⁸ of an AWU). Between “1 LLM token” and “1 AWU” it is only 4 · 10³ (a token is 4 thousand times cheaper). A simple ETH transfer at the current gas price comes out to about $0.02, comparable to 1 AWU (the transfer is ~4–5x cheaper). At gas-price peaks (50 Gwei and above) the relationship inverts and a transfer becomes tens of times more expensive than an AWU. The point is not that AWU is “expensive” and a hash is “cheap.” The point is that different metrics pack different amounts of useful work into one unit, and comparing them head-to-head by cost-per-unit only makes sense when you understand what is actually inside that unit.
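A static sketch of the calculator, assuming the formulas given in the slider descriptions above. The `Inputs` dataclass and the function names are hypothetical, the 3.125 BTC subsidy comes from the Bitcoin section with fees ignored, and training FLOP is left out because none of the six sliders prices it.

```python
from dataclasses import dataclass

@dataclass
class Inputs:
    btc_price: float = 100_000
    eth_price: float = 3_500
    gas_price_gwei: float = 0.30
    llm_out_per_1m: float = 25.0
    awu_price: float = 0.10
    difficulty_t: float = 136.61
    block_reward_btc: float = 3.125  # post-April-2024 subsidy; fees ignored

def unit_costs(x: Inputs) -> dict[str, float]:
    """Dollar cost of one unit of work in each system."""
    gas_usd = x.gas_price_gwei * 1e-9 * x.eth_price
    return {
        "1 hash": x.block_reward_btc * x.btc_price / (x.difficulty_t * 1e12 * 2**32),
        "1 gas": gas_usd,
        "1 ETH transfer (21,000 gas)": 21_000 * gas_usd,
        "1 LLM output token": x.llm_out_per_1m / 1e6,
        "1 AWU": x.awu_price,
    }

def print_table(x: Inputs) -> None:
    for name, cost in unit_costs(x).items():
        print(f"{name:<28} ${cost:<12.3g} ratio to AWU: {cost / x.awu_price:.3g}")

print_table(Inputs())                     # the defaults
print_table(Inputs(gas_price_gwei=50))    # a gas-price peak
```

Rerunning with a different `Inputs(...)` plays the role of moving a slider.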

Where the metric breaks the economy

Each of the five systems has a structural flaw, and it’s not accidental — it’s baked into the metric itself.

Bitcoin: energy grows independently of value. A 970 EH/s network draws tens of gigawatts; that figure is set only by the BTC price and difficulty, not by how many transactions are actually useful. As the block subsidy shrinks and fees fail to make up the difference, the “work” metric will still count the same hashes, but there will be no one left to pay for them. This is the well-known problem of decaying emission, and DePIN projects that copy Bitcoin’s halving schedule inherit it: halving is elegant, but it assumes someone pays for the work, not just for the issuance.
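The “tens of gigawatts” figure follows from hashrate times energy per hash. The efficiency values below are assumptions for illustration (real fleets mix several ASIC generations); under them, 970 EH/s lands roughly in the 15–30 GW range.

```python
def network_power_gw(hashrate_eh_s: float, efficiency_j_per_th: float) -> float:
    """Power = hashrate (TH/s) × energy per TH (J), converted from W to GW."""
    th_per_s = hashrate_eh_s * 1e6  # 1 EH = 10^6 TH
    return th_per_s * efficiency_j_per_th / 1e9

# Assumed fleet-average efficiencies in joules per terahash (hypothetical).
for j_per_th in (15, 20, 30):
    print(f"{j_per_th} J/TH -> {network_power_gw(970, j_per_th):.1f} GW")
```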

Ethereum gas: the price of work is set by something other than work. The same SSTORE always costs 20,000 gas, but the gas price itself is set by the competition for the block. In peak periods (NFT mints, liquidations) gas reached hundreds of Gwei — not because the ops got harder, but because block space became scarce. That dynamic produces MEV (Maximal Extractable Value), gas wars and front-running — situations where the price of “work” on Ethereum is determined by competition with other participants, not by how useful that transaction is to the user.

LLM tokens: 1M of “filler” = 1M of “the answer”. Anyone who has paid OpenAI or Anthropic prices knows the structure: a token is a unit that has no idea about the value of the result. “hi how are you” and “do due diligence on this contract” are priced on the same scale. Of the five metrics, tokens are the most direct imitation of work — and exactly what Salesforce is trying to escape with AWU.

AWU: the critique Salesforce admits itself. Two days after the announcement, CIO.com published a piece titled “AWU by Salesforce: A shiny new metric that tells CIOs little of value.” The main argument: AWU counts activity, not outcomes. 1 AWU = 1 reasoning chain, regardless of whether that chain closed a deal or turned the request into a dead end. Stokes himself acknowledged on the call: “you can ask it a question and it can write you a poem, but that’s not really all that valuable in the enterprise world” — meaning the metric does not distinguish a valuable query from a non-valuable one. Salesforce is betting that AWU will correlate with value by the nature of its platform — its agents only operate in enterprise context. But that’s an assumption, not a property of the metric.

Goodhart's law in a new form

“When a measure becomes a target, it ceases to be a good measure” — Goodhart’s law. Bitcoin is protected by the fact that a hash cannot be forged without cost; LLM tokens are not; AWU is even less so. If billing is tied to the number of reasoning chains, the vendor has a structural incentive to produce more chains for the same task. The history of ChatGPT turning every question into a verbose answer is an illustration of this risk well before the metric was even monetized.
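A toy model of that incentive, assuming each reasoning chain is billed as one AWU at the $0.10 rate from the text. The per-outcome price of $0.50 is invented purely for comparison; no vendor is known to use it.

```python
PRICE_PER_AWU = 0.10      # per-action billing, Flex Credits rate from the text
PRICE_PER_OUTCOME = 0.50  # hypothetical per-resolved-task price

def bill_per_action(chains: int) -> float:
    """Vendor revenue when every reasoning chain is a billable AWU."""
    return chains * PRICE_PER_AWU

def bill_per_outcome(resolved: bool) -> float:
    """Vendor revenue when only a resolved task is billable."""
    return PRICE_PER_OUTCOME if resolved else 0.0

# The same task, handled with more and more chains, resolved or not.
scenarios = [(1, True), (3, True), (10, True), (10, False)]
for chains, resolved in scenarios:
    print(f"chains={chains:>2} resolved={resolved!s:<5} "
          f"per-action=${bill_per_action(chains):.2f} "
          f"per-outcome=${bill_per_outcome(resolved):.2f}")
```

Under per-action billing, revenue grows with the number of chains even when the task fails; under per-outcome billing it does not move.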

Reducing to one scale: energy, money, value

If all five systems measure “work,” can we reduce them to a single scale? There are three candidates.

Energy scale (J). Works for Bitcoin and training FLOP: these are physical processes with a direct cost in kilowatt-hours. Doesn’t work for AWU: an “update a CRM record” task has no meaningful energy cost, since it can be done in one API call or five, and the physical equivalent will differ by orders of magnitude. For Ethereum gas the scale is intermediate: gas is an abstraction over computation and storage, and its concrete mapping to energy depends on what the validator runs on.

Dollar scale ($). Universal, because everything is ultimately paid in dollars. But it hides structure: $0.10 per AWU is Salesforce’s price, which includes its margin on top of its own spend on LLM tokens and infrastructure; $5 · 10⁻¹⁹ per hash reflects the physical cost of electricity plus ASIC capex, with no intermediary. Comparing dollar costs, we’re comparing not work but its price, and price includes positioning, margin and bargaining power.

Value scale ($value created). Theoretically the most correct. In practice, unmeasurable for most systems. No one can say that one reasoning chain brought a company $X: value is created by the product, not by an individual agent step. Salesforce gets around this by rolling out AWU as a proxy for value, but that’s exactly the objection — a proxy easily diverges from the goal.

Where the scales converge: AI crypto. The most interesting region of the metric spectrum is the intersection where task-level work meets physical work. Bittensor with its Yuma consensus measures “work” through the quality of an ML model’s output, judged by validators. Strictly speaking, Yuma is not PoW in the Nakamoto sense — it’s a stake-weighted weight consensus: rewards are distributed by validator consensus, not by nonce discovery. “Useful proof-of-work” here is a metaphor for the fact that validators are paying for inference quality, not for hash work. Gonka with its Transformer-based Proof-of-Work is more literal — miners brute-force nonces and feed them through a transformer with randomly initialized layers; valid results count as work. That is PoW in the classical sense, where the “hash function” is a transformer forward pass. In both cases the unit of work is no longer a hash or a FLOP — it’s a successfully completed AI task.

What the industry is moving toward is compressing the three scales into one point: a metric that’s simultaneously verifiable like a hash, measurable like a FLOP, and meaningful like an AWU. No one has built it yet — but the experiments at the AI / crypto intersection are closer to that point than any of the pure metrics on their own.

Summary

Every industry constructs its own “unit of work,” and every metric is a compromise. Bitcoin measures precisely but with no regard to value; AWU measures value but with no verifiability.
The five metrics form a spectrum from physics to task. Hash → gas → FLOP → token → AWU. The higher the layer, the higher the vanity-metric risk and the more room for margin.
AWU is not a new idea — it’s a new translation of an old one. Salesforce is trying to dodge the token trap, where the company pays for “talk.” It hasn’t closed that trap, just moved it: now you can pay for “activity without outcome.”
The AI / crypto intersection is the most interesting place. Bittensor’s Yuma consensus (formally not PoW, but the same idea — “pay for useful ML work”) and Gonka’s transformer-PoW are attempts to build proof-of-useful-work — a unit of work that is useful by its very nature. That’s the prototype of the metric that none of the five existing systems has.
The key question for a tokenomics project is not “which metric is best” but “which metric does your system bill on.” If it’s the same one the system itself optimizes on, Goodhart’s law will eventually kick in.

Let's design the metrics and tokenomics for your project

We'll help pick a unit of work that won't degenerate into a vanity metric, and build a durable economy around it. Domains — crypto infrastructure, AI projects, DePIN, enterprise platforms.


#metrics #awu #proof of work #tokenomics #ai agents #gas #llm