P2SH32: a long-term solution for 80-bit P2SH collision attacks

Yup, we can leave this debt repayment to some future transaction format upgrade, which P2SH32 using the familiar template will not preclude.

Beyond the cryptographic security hardening (20-byte to 32-byte hashes, bringing all our cryptographic primitives to the same level of security), we have identified three buckets of technical debt that we have a chance of repaying:

  1. Unnecessary overheads in P2SH transaction encoding;
  2. Entangling a “hypervisor” (consensus) instruction with Script VM bytecode (the stack item size limit);
  3. The OP_IF and OP_NOTIF malleability vector. This one is not really inherited from the P2SH feature but from the transaction format, i.e. the absence of a consensus version field, which CHIP 2021-01 Restrict Transaction Version aims to solve.

We can easily repay 2. and partially repay 3. Note that we can repay 3. as a side-effect of the hard-forking P2SH32 upgrade: because it will be a pattern distinct from the legacy 20-byte one, we can use it to modulate opcode behavior for the upgraded feature without breaking the old one. This is similar to how BIP-0068 modulated the interpretation of relative locktime based on the transaction version.

The CHIP should also be informative, in the hope that non-node software developers will learn of the possibility of upgrading the format later, and won’t assume a particular wrapping of the constraint hash.

UPDATE

Prompted by this discussion, I have started writing an “everything P2SH” document, rather broad in scope. It’s still a WIP and the draft can be found here; it’s open to feedback.
Having jumped down the hash-function rabbit hole, there will be a part on future quantum resistance (I just realized I think I made a wrongful claim about H160 quantum security, will fix soon), and on the SHA-256 vs. HASH-256 question. (edit: done)

Then, there should be a small CHIP which proposes only the “hardened” P2SH pattern upgrade, without the extras:

  • OP_HASH256 OP_DATA_32 redeem_script_32 OP_EQUAL
    raw: AA20 1122334455667788990011223344556677889900112233445566778899001122 87.
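To make the template concrete, here is a minimal Python sketch of how such a locking script would be assembled, assuming HASH256 means the usual double SHA-256 (the helper names are mine, not from the CHIP):

```python
import hashlib

def hash256(data: bytes) -> bytes:
    """Bitcoin's HASH256: double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def p2sh32_locking_script(redeem_script: bytes) -> bytes:
    """Build the proposed P2SH32 pattern:
    OP_HASH256 (0xAA) OP_DATA_32 (0x20) <32-byte hash> OP_EQUAL (0x87)."""
    return b"\xaa\x20" + hash256(redeem_script) + b"\x87"

# Example with a trivial redeem script, OP_1 (0x51):
script = p2sh32_locking_script(b"\x51")
assert len(script) == 35                          # 2 + 32 + 1 bytes
assert script[:2] == b"\xaa\x20" and script[-1:] == b"\x87"
```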

Calin is already working on a proof-of-concept: Draft: p2sh_32 proof of concept implementation (!1556) · Merge requests · Bitcoin Cash Node / Bitcoin Cash Node · GitLab

At first I was convinced by the argument for OP_SHA256, but having done some research, the possibility of length-extension still spooks me.

This is true for addresses that have already revealed their contract, but it still applies to confidential contracts that could be updating themselves with a new secret on each spend. A length-extension attack allows someone to copy (NOT change the existing UTXO) any contract and append arbitrary data to the end of it. I don’t know of a way to exploit this, if one even exists. The vulnerability is NOT currently present in P2SH-20, so why should we break what’s not already broken, even if we think it’s not exploitable?

The SHA-256 function is completely vulnerable to length extension because the final hash is the function’s complete internal state after processing the message, so anyone can just pretend there’s more of the message to hash and continue hashing to generate the hash of a longer message.
The composite we use for P2PKH and P2SH is NOT vulnerable: while both RIPEMD-160 and SHA-256 are vulnerable on their own, the composite hides the internal state of the inner SHA-256, so the message can’t be extended. The same holds for double SHA-256.
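The state-hiding property falls directly out of how the composite is built; a small Python sketch (the message here is an arbitrary example):

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def hash256(data: bytes) -> bytes:
    # Double SHA-256: the outer call hashes the inner digest, so the
    # inner function's internal state (== its digest) is never exposed.
    return sha256(sha256(data))

msg = b"some redeem script bytes"
single = sha256(msg)   # this digest IS the full chaining state: an attacker
                       # can resume compression from it and compute the hash
                       # of msg || padding || extension without knowing msg.
double = hash256(msg)  # this digest is a hash OF that state; resuming the
                       # inner computation is no longer possible.
assert double == sha256(single)
assert double != single
```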

Core went with SHA-256 for their P2WSH. However, their contract language and possible structures will never be as versatile as ours. Using the double hash brings the function closer to a “random oracle”, and some future cryptographic proofs may have to rely on it being one.
P2SH32 will be the foundation of all the cool future stuff, and we use double SHA-256 everywhere else, so let’s not accidentally introduce a weakest link again when the whole point of the upgrade is to strengthen the weakest link.

I scanned the blockchain for both aa20 value_32 87 and a820 value_32 87 output templates, and found that only these UTXOs currently exist:
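The scan matches locking scripts of exactly that shape; a hypothetical matcher (the function name is mine, byte values are from the raw templates above):

```python
def is_hash_lock_template(script: bytes) -> bool:
    """Match `aa20 <32 bytes> 87` (OP_HASH256) or
    `a820 <32 bytes> 87` (OP_SHA256) locking scripts."""
    return (
        len(script) == 35
        and script[0] in (0xAA, 0xA8)   # OP_HASH256 / OP_SHA256
        and script[1] == 0x20           # push of exactly 32 bytes
        and script[34] == 0x87          # OP_EQUAL
    )

# One of the UTXOs found by the scan:
assert is_hash_lock_template(bytes.fromhex(
    "aa20000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f87"))
# A P2PKH script does not match:
assert not is_hash_lock_template(b"\x76\xa9\x14" + b"\x00" * 20 + b"\x88\xac")
```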

OP_SHA256

Block Height | TXID | Output Index | Satoshi Amount | Locking Script
293549 | c4b46c5d88327d7af6254820562327c5f11b6ee5449da04b7cfd3710b48b6f55 | 0 | 20000 | a8205efe500c58a4847dab87162f88a79f08249b988265d5061696b5d0c94fd8080d87
293783 | 702c36851ed202495c2bec1dd0cefb448b50fafd3a5cdd5058c18ca53fc2c3d1 | 0 | 20000 | a8203f6d4081222a35483cdf4cefd128167f133c33e1e0f0b1d638be131a14dc2c5e87
293923 | fb01987b540ec286973aac248fab643de82813af452d958056fee8de9f4535ab | 0 | 20000 | a8206380315536fa75ccf0d8180755c9f8106466ee3561405081cab736f49e25baab87

OP_HASH256

Block Height | TXID | Output Index | Satoshi Amount | Locking Script
211914 | af32bb06f12f2ae5fdb7face7cd272be67c923e86b7a66a76ded02d954c2f94d | 0 | 100000000 | aa20000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f87
291634 | faf8989ed87c5a667a1ead813aea718727e01767c124193297eaf409ff4645e5 | 1 | 500000 | aa203036419579d0abe63b46836a74380f1e2fd4a1e89f3aad616b171ddcdcbede4387
527397 | ab7d514960db3d919478c6e3c58f3cbca22c0e257f4838be5e0ba75afa34dab7 | 0 | 58049 | aa20826058c4582956f5a2d2bcb56e2f97cfdfb21a191c52dc83192089bc8e3aa36187
529578 | 7a27e164da5e2e961029fcc0d7566badd4adcb8b4d4b184a4d893899284378fb | 0 | 181977 | aa20099c96aead898703ec8685e22ae58c3d56f2571db08b57d88fe34f43f7f7775e87

Note: the 1 BCH (and 1 BTC) output is unspendable because the hash is that of the genesis block, but in the wrong byte order.
An output with the correct byte order was created, and spent using the genesis block header as the unlocking data, shortly after the creation of the unspendable one.
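The byte-order point can be verified directly: HASH256 of the raw 80-byte genesis header yields the hash in internal (little-endian) byte order, the reverse of the familiar display order hardcoded in the unspendable output above. A Python sketch:

```python
import hashlib

def hash256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

# Raw 80-byte genesis block header: version, prev-block hash, merkle root,
# time, bits, nonce -- all serialized little-endian, as on the wire.
genesis_header = bytes.fromhex(
    "01000000"                                                            # version
    + "00" * 32                                                           # prev hash
    + "3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a"  # merkle root
    + "29ab5f49"                                                          # time
    + "ffff001d"                                                          # bits
    + "1dac2b7c"                                                          # nonce
)
assert len(genesis_header) == 80

digest = hash256(genesis_header)
# Internal byte order -- what OP_HASH256 actually puts on the stack:
assert digest.hex() == (
    "6fe28c0ab6f1b372c1a6a246ae63f74f931e8365e15a089c68d6190000000000")
# Display order (reversed) -- the value locked in the unspendable output:
assert digest[::-1].hex() == (
    "000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f")
```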

OP_HASH160

Also, there are some pre-BIP16 UTXOs that have likely become unspendable due to BIP16:

Block Height | TXID | Output Index | Satoshi Amount | Locking Script | Status
170052 | 9C08A4D78931342B37FD5F72900FB9983087E6F46C4A097D8A1F52C74E28EAF6 | 1 | 400000 | a91419a7d869032368fd1f1e26e5e73a4ad0e474960e87 | Spent
170054 | B0539A45DE13B3E0403909B8BD1A555B8CBE45FD4E3F3FDA76F3A5F52835C29D | 1 | 400000 | a914e8c300c87986efa84c37c0519929019ef86eb5b487 | Unspent
170434 | D0636198EA55FADEE5B4CCC07C85012DB7D07C2D25790B3AEC770C86617801C0 | 1 | 1000000 | a91484b8ee2ee2970e4a5c3a18e73a9e251ad5c1df1c87 | Unspent
170442 | 9AB59E2D4BE16C470160EB9B9A9D9799EAF29AF0461AEA131E748659D806FA1A | 0 | 1000000 | a91484b8ee2ee2970e4a5c3a18e73a9e251ad5c1df1c87 | Unspent
170442 | 658FC92061F1C4125D5CD1034EB8A1F09BFEBD32A988D855EB7EE63689759A21 | 0 | 1000000 | a91484b8ee2ee2970e4a5c3a18e73a9e251ad5c1df1c87 | Unspent
170556 | 510B5A44935109E249A704C2900AA9D8303166062E81D2AC852C965B6266DCEE | 0 | 1000000 | a91484b8ee2ee2970e4a5c3a18e73a9e251ad5c1df1c87 | Unspent

I’m done reviewing that one, and would just like to carry one aspect of my review over to this thread.

Concerning the discussion of hash functions: the current proposal limits itself to single/double SHA-256.

BLAKE2s would be one candidate that offers protection against length-extension attacks while giving the same bitwise security as e.g. HASH256.

I think it just deserves a mention in passing that, in theory, there are additional hash functions like that one which could meet our needs. In practice, I think the costs outweigh the benefits, and I’m also not inclined to believe we’d be going wrong with HASH256 for our 32-byte hash.

Please don’t use pure HASH256. Currently, I can see three cases where it is used, and if we add it to P2SH, then that will be the fourth case and could easily be mistaken for the other ones. If we have a script
“OP_HASH256 someHash OP_EQUAL”, then we could pass here:

  1. some 64-byte merkle branch
  2. some 80-byte block header
  3. some N-byte transaction
  4. you propose to add P2SH to this list

If we want to use SHA-256, then fine, let’s use that. But please at least add some 512-bit prefix to change the IV, to avoid hashing the same data in a completely different context. In general, messing up the context with double SHA-256 is unlikely to be exploited, because it requires mining a lot of hardcoded bits, but technically it could become possible in the future; that’s why I think pure SHA-256 with the same initialization vector is a bad idea.

We’re faced with a choice between SHA-256 and HASH-256; other functions, or the same functions with different IVs, would carry a big cost in updating all the other software that has to parse addresses and pay into outputs. If we were prepared to pay that cost, then we might as well upgrade to a new function.

You can’t just pass anything as the redeem script: it MUST be valid Script VM bytecode in order to unlock. The hashlock output pattern allows any preimage, though, and you can pass a raw block header into a hashlock contract even now, so what’s the problem with that? See this transaction posting the genesis block header as the input’s unlocking script: Transaction 09f691b2263260e71f363d1db51ff3100d285956a40cc0e4f8c8c2c4a80559b1

The security of P2SH relies on collision resistance, not on the hash being a random oracle (real functions become less like one with each public query). What good is a database of known preimages when they aren’t valid redeem scripts?

same functions with different IVs would have a big cost in updating all other software that has to parse addresses and pay into outputs

Everything will be new, so it is possible to leave the format of the previous three cases unchanged while choosing the format of the new address so that it doesn’t interfere with other things. So, instead of doing SHA-256(redeemScript), it is just a matter of doing SHA-256(redeemPrefix || redeemScript). The size of that redeemPrefix should be 512 bits; that way it can be easily implemented as SHA-256 executed with some prefixed data.
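A minimal sketch of that idea in Python. The prefix value here is an arbitrary placeholder, not a proposed constant; the point is that a 64-byte (512-bit) prefix fills exactly one SHA-256 input block, so the compression state after absorbing it acts as a domain-specific IV:

```python
import hashlib

# Hypothetical 64-byte domain-separation prefix; any agreed-upon constant
# would do -- this one is just an illustration, padded to one full block.
REDEEM_PREFIX = b"p2sh32".ljust(64, b"\x00")
assert len(REDEEM_PREFIX) == 64  # exactly one SHA-256 block

def prefixed_sha256(redeem_script: bytes) -> bytes:
    """SHA-256(redeemPrefix || redeemScript): after the first 64-byte
    block is absorbed, the internal state is a fixed, context-specific
    IV, so commitments can't collide with plain SHA-256 of other data."""
    h = hashlib.sha256()
    h.update(REDEEM_PREFIX)   # the state after this update is the "new IV"
    h.update(redeem_script)
    return h.digest()

# The same bytes hash differently in this context vs. plain SHA-256, so a
# block header or transaction can't double as a redeem-script commitment:
data = b"\x51"  # arbitrary script bytes
assert prefixed_sha256(data) != hashlib.sha256(data).digest()
assert prefixed_sha256(data) == hashlib.sha256(REDEEM_PREFIX + data).digest()
```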

You can’t just pass anything as redeem script it MUST be a valid Script VM bytecode to unlock.

I can imagine a valid redeem script that would also be a valid transaction at the same time: just one big PUSH, then some output script, then some OP_PUSH and OP_DROP in the timelock.

The hashlock output pattern allows any preimage though, and you can pass a raw block into a hashlock contract even now, what’s the problem with that?

Well, I used this hashlock script just as an example. But even in the context of this hashlock, it would be useful to be able to require some specific type of hashed data. For now, checking it with OP_SIZE is the only option. Maybe some OP_CAT or OP_SPLIT could also do the trick, but that makes the whole script longer (and so more expensive).

That would fix the possibility of overlap, sure. You have said what could happen and how to prevent it from happening, but you haven’t demonstrated why it happening would be a problem, so why would we need a fix? If I put a raw TX or a block header in an input’s unlocking script (a TX like that already exists) to unlock a hashlock contract, or somehow had it first be authenticated using the hashlock and then successfully executed as a valid redeem script, why would that be a problem? You can put any existing TX’s or block’s hash into an output, so what?

but you haven’t demonstrated why it happening would be a problem, so why would we need a fix?

That’s a good question. Maybe I am too obsessed with security and with “making sure that some hash is only valid in a given context”. I have to think more about real attacks that could be possible.
