P2SH32: a long-term solution for 80-bit P2SH collision attacks

Hehe, let’s make sure transactions stay cheap :wink:

It is also worth noting that in most use cases of smart contracts, especially multi-party ones, the transaction that locks up the funds is made by those same people, so it’s fine that they pay any fees.

Relaxing Output Standardness

I’d love to get to a place where outputs can have non-standard contracts!

That would be particularly useful for covenant applications – right now covenants have to waste 20 bytes (or for P2SH32, 32 bytes) and some contract bytes to place their next iteration in a P2SH output. And from the chain’s perspective, this sometimes wastes an extra copy of each covenant contract (the P2SH hash preimage has to be constructed in the first transaction for validation and also pushed again in the second transaction – though OP_ACTIVEBYTECODE usually eliminates this waste). It would be more efficient if covenants could simply validate the next output itself rather than doing the P2SH dance.

In practice though, there are some issues we need to solve first:

P2SH is currently protective in that it limits abuse of the contract system: with P2SH, transactions that are expensive to validate must include their own contract code within the spending transaction, so the expensive-to-validate transaction is much larger than it would be if that contract code was already present in the UTXO set.

This behavior is currently protecting miners from unwittingly creating expensive-to-validate blocks that include some malicious (non-standard) transactions and thus risking their blocks becoming stale during propagation (and/or the set of miners possibly diverging due to such blocks). E.g. this worst-case contract could be stuffed into non-P2SH transactions with far fewer bytes per input. (Of course, part of this protection is that miners validate transactions themselves first before mining them, so honest miners are still reasonably protected if they only include transactions that have been broadcast over the public network.)

In the same way, the quadratic sighash issue with OP_CODESEPARATOR is currently only avoided by the P2SH + isStandard strategy.

Both of those issues are handled for P2SH contracts by a hashing limit as proposed above (a simplification of the VM Limits CHIP), but to get rid of isStandard, do we need to instead define a limit in relation to spending transaction size (i.e. a per-byte hashing budget)? I think that requires some review.

Beyond those validation cost questions, there’s also the question of data storage – right now OP_RETURN outputs are generally limited to storing ~220 bytes of arbitrary data, but that limit is only meaningfully enforced by the concept of standardness. If output standardness is relaxed to allow custom contracts, there’s no reasonable way to prevent arbitrary data up to the same length as such contracts. So if, e.g., standardness was relaxed to allow output contracts up to 1650 bytes, OP_RETURN outputs of at least this size should also be allowed (and probably larger, since it’s in the network’s interest for data-carrier users to commit data in provably unspendable outputs, allowing us to prune that output from the UTXO set). I’ve written about how I think the current OP_RETURN limit is basically theater; I think in the long term we’ll probably want to simplify standardness to treat output OP_RETURN data and contracts similarly (e.g. the same per-output byte limit), but we probably need a really rigorous review of the topic to get widespread consensus.

One final development direction this brings up: increased occurrences of larger outputs (containing, e.g. 1650 byte contracts) would mean that many node implementations may want to revisit how they represent the UTXO set. Right now most implementations keep the full contents of each UTXO accessible to fast lookups, but it would be possible to instead only keep the hash of UTXOs – a UTXO Hash Set (UHS). This is part of the architecture explored by OpenCBDC. If a significant number of outputs are eventually larger than 32 bytes – for raw output covenants and/or something like CashTokens – some P2P protocol extension to enable pruned node implementations to use a UHS could be valuable.

So: I think relaxing standardness is a promising development direction. There’s a lot of work to be done; I’m not sure we can get all the way there before upgrade 2023, but I support the effort!

Nice! Do you have any links you can share related to your template work? I’ve also been working on a template concept in Libauth. (Here’s a test of multi-party contract creation and transaction signing.) It supports both P2SH and non-P2SH usage, and I’ve always hoped that BCH mainnet would eventually support non-P2SH usage.

If we relax output standardness, do we still need P2SH32?

Even if we were able to relax output standardness by 2023, I still think it would be important to deploy some sort of P2SH32. Relaxing standardness would solve this 80-bit collision issue for “public” types of contracts (e.g. covenants), but it’s not a complete alternative to P2SH32; without P2SH32, other use cases would be forced to accept reduced privacy/security (as others have mentioned).

On privacy/security, I’d just add that in my view, a critical security feature of P2SH wallets is that unspent funds are hard for an attacker to analyze. This contributes little to long-term privacy (when they’re spent, they can be analyzed – you still want to use CashFusion regularly), but in practice, I think it offers meaningful operational privacy and therefore security (from meatspace attackers). An organization having funds in a set of P2SH-based multisig wallets (held by different operational teams) has quite a different privacy/security posture than the same set of wallets using raw outputs. UTXOs of well-designed P2SH wallets are not trivial to cluster, but for the same raw-outputs wallet, an attacker could determine with greater certainty how much is currently available to steal and which teams they need to kidnap/blackmail.

From a high level, a rough mapping of use cases for which I think each option is superior:

  • P2SH20 – Non-public, multi-party contracts with sufficiently interactive setups (can implement a pre-commitment scheme + HD derivation), saves 12 bytes vs. P2SH32, offers better privacy/security than raw outputs, and practically equivalent to Taproot for contracts with one/few spending paths. (Exception: highly-interactive use cases save even more with Taproot by looking like a single-signature spend.)
  • P2SH32 – Same as P2SH20, but better for use cases where interactive setup is more costly (maybe for some particular use case, 12 bytes per output is a reasonable price for avoiding pre-commitment schemes), or the wallet adds new addresses over time (not a covenant, but participants create new addresses via some highly-asynchronous coordination method).
  • Raw outputs – superior for covenants – the contract validates the spending transaction (publicly, on-chain), so the most efficient place to do this is raw outputs. (Hypothetically, partially-public covenants could be designed for Taproot-like outputs so that only one covenant path is revealed during spend – requires both Taproot and new opcodes though.)
  • Taproot (see @markblundeberg’s BCH Taproot discussion) – most designs could be superior to P2SH20/32 for many use cases (especially with sufficiently-interactive parties) by saving at least the cost of the hash in collaborative spends (that look like single-signature spends).

Aside: Taproot

I think some Taproot-like construction would be a viable alternative to P2SH32 for many use cases (in the same way it would be a superior alternative to the existing P2SH20), but given that the network already supports P2SH20 – and we aren’t going to “deprecate” P2SH20 – it’s reasonable that the network should support a “hardened” version of the P2SH primitive, too. The cost of adding a 32-byte variant in terms of protocol complexity/technical debt is trivial (P2SH20 already exists), but 1) the existence of a hardened P2SH32 option offers a much simpler upgrade path for vulnerable use cases, and 2) the hardened P2SH32 option offers some user-actionable resilience if a particular use case is discovered to be vulnerable to a new attack (when using the Taproot-like alternative).

And of course, as with OP_EVAL-like alternatives, I don’t expect we’ll have sufficient information to settle on a specific Taproot design before 2023. BTC has already deployed multiple versions of its virtual machine, so the relative increase in protocol complexity from its recent Taproot deployment is not as significant as it would be on BCH (considering BTC’s support for e.g. SegWit, the legacy sighash algorithm, etc.). BCH’s existing features and ecosystem also make deploying Taproot less valuable (BCH has more advanced contracts, covenants, and low fees + CashFusion), so we should take our time selecting a particular Taproot design, if any.

TL;DR

I think we should work on both: relaxing standardness would be great for covenants, and P2SH32 is important for privacy/security of non-covenant use cases.

Some future Taproot design could also replace P2SH (and P2PKH) for most use cases, but that doesn’t mean we should leave P2SH “partially broken” – we’re not going to deprecate P2SH20, so we should also support P2SH32 for completeness/out of an abundance of caution.

BTW, isn’t “Taproot” just a fancy name for “Threshold signature”? https://eprint.iacr.org/2020/1390.pdf

A threshold signature scheme (TSS) enables a group of parties to collectively compute a signature without learning information about the private key. In a (t, n)-threshold signature scheme, n parties hold distinct key shares and any subset of t + 1 ≤ n distinct parties can issue a valid signature, whereas any subset of t or fewer parties can’t. TSS’ setup phase relies on distributed key generation (DKG) protocol, whereby the parties generate shares without exposing the key. In practice, TSS is often augmented with a reshare protocol (a.k.a. share rotation), to periodically update the shares without changing the corresponding key.

Some quick responses. As usual, bitjson’s posts are long and loaded with gems, so I’ll try to come back for other points that need more thinking.

This kind of issue has been the main reason for the sun-setting of sigops. We now use SigCheck which I think catches the issue you are talking about. It runs at validation time and thus it becomes agnostic to where the complexity comes from. It will catch it regardless of being part of an input or an output.
This then protects the network from blocks and/or transactions that are overly heavy on the validation phase.

I agree. From my point of view this is an economic matter that devs should not be in charge of. I don’t think it makes sense to define limits at any standardness or even consensus level to govern which data should be allowed to be mined. There are strong economic incentives available; I feel that the best way to solve this allocation of production space is by allowing miners to set fees and priorities on them.
This is indeed a discussion that would be good to have. My current feeling is that we can redefine transaction priority (which transactions combine into what size block) and move away from today’s solution, where it is just about fees, to a score based on transaction-local properties, like whether it spends more UTXOs than it creates. And, in the context of this point, the amount of block space it uses that is not for economic activity.
Most of this is still quite irrelevant with current block sizes, which is likely why it’s not been discussed much :slight_smile:

This is a very good point. The original UTXO did not actually copy the output script into the UTXO, that was added by Core much later. Anyone good at databases would loathe to see people dumping bytearrays in the same row as your primary-key. :roll_eyes:
What happened is that in order to make pruning work the output scripts had to go somewhere, since the original data would be deleted. Someone figured that the UTXO was to be that place.

I would expect an effort for the reference client that moves the output scripts out of the UTXO to not be a huge amount of work. Not simple, but not overly complex either. The goal is simply copying the output scripts somewhere safe and occasionally pruning those separately, so they don’t have any negative impact on the UTXO database.

Actually, each full node has a very different way of doing this today. Flowee the Hub has its own UTXO database (a raw C++ codebase), Bitcoin Verde uses a SQL one. BCHD is something different again (I don’t know really). BCHN still uses the way you are talking about.
But it should be pointed out that while this becomes more relevant the longer the data stored in there becomes, the tests on huge blocks show that this is not a bottleneck any time soon. So those improvements are rather academic in nature.

P2SH is enforced by the Script VM’s hypervisor, that is, the native consensus code; it “hacks” the Script VM state from outside the sandbox.
The paradigm (send to address, spender reveals the contract later) has stood the test of time well, but in hindsight it is obvious that the implementation could have been better.
It is what it is for historical reasons: upgrades were deployed as soft forks, and, as you clearly demonstrated, there’s still baggage from those upgrades to sort out.

We can stop pretending it belongs inside script - at all, and extract the contract hash and hashed contract code into their own respective transaction fields using the non-breaking PreFiX method.
Something like the below…

Output Format

  • transaction outputs
    • output0
      • satoshi amount, 8-byte uint
      • locking script length, compact variable length integer
      • locking script
        • PFX_LOCK_HASH, 1-byte constant 0xED
          • lock hash, 20 or 32 raw bytes
    • outputN

Consensus would never hand this to the Script VM. From the point of view of the Script VM it would be a NULL locking script.

From the point of view of old software, this will still look like a script, one starting with a disabled opcode, one of the 2 possible “scripts”:

  • 0xED1122334455667788990011223344556677889900
  • 0xED1122334455667788990011223344556611223344556677889900112233445566

Consensus code would fail any TX where the script starts with 0xED and has (length != 20 && length != 32)
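
As a rough illustration of that check, here is a minimal sketch. It assumes “length” means the length of the hash that follows the 0xED prefix (the post doesn’t say whether the prefix byte is counted), and the function name and return labels are made up for this example.

```python
PFX_LOCK_HASH = 0xED  # prefix byte proposed in the post


def classify_locking_script(script: bytes) -> str:
    """Classify a locking script under the proposed 0xED rule (illustrative only)."""
    if not script or script[0] != PFX_LOCK_HASH:
        return "script"  # ordinary locking script, handed to the Script VM
    lock_hash = script[1:]
    if len(lock_hash) in (20, 32):
        return "lock_hash"  # consensus keeps the hash; the VM sees a NULL script
    return "invalid"  # any other length fails the whole transaction


# The two example "scripts" from the post
short = bytes.fromhex("ED1122334455667788990011223344556677889900")
long_ = bytes.fromhex(
    "ED1122334455667788990011223344556611223344556677889900112233445566"
)
print(classify_locking_script(short), classify_locking_script(long_))
```

Old software would still see the first byte as a disabled opcode, so nothing here depends on the VM understanding the prefix.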

Input Format

  • transaction inputs
    • input 0
      • previous output transaction hash, 32 raw bytes
      • previous output index, 4-byte uint
      • unlocking script length, compact variable length integer
      • unlocking script
        • PFX_LOCK, 1-byte constant 0xED
        • real unlocking script, variable number of raw bytes
      • sequence number, 4-byte uint
    • input N

Introspection

  • New one, OP_OUTPUTLOCKHASH = 0xED - pops an index, returns the hash or empty stack item if feature is not used on the output. This completes definition of 0xED across all 3 contexts (input, VM, output).
  • OP_UTXOBYTECODE would return an empty stack item.
  • OP_ACTIVEBYTECODE would work the same.
  • OP_INPUTBYTECODE would work the same from Script VM PoV, it would return the whole thing i.e. concatenation of real unlocking script and redeem script

Speaking in relational database terms, we index UTXOs by their TXID/index and accept a level of denormalization with the locking script.

You’d want something like this instead:

(txid, index, satoshi_amount, locking script hash) M → 1 (locking script hash, locking script)

right?

When broadcast, transactions could even omit the actual redeem script from the input if it matches one seen before, because nodes will already have it and could retrieve it unless pruned.

Fundamentally it doesn’t even matter how nodes learn of some script, whether it’s been broadcast as a locking script (relaxed standardness) or as a redeem script (P2SH unlocking script); nodes will only need it at time of execution, when unlocking the input and updating the UTXO state. A “bare” output gives it before it will be needed, while P2SH gives the script’s primary key first, and the data will come later.

I think we need to recognize that P2SH and OP_EVAL are fundamentally different, even if they seemingly do the same thing, because they run in different execution contexts. The P2SH context is the hypervisor (consensus) layer, while OP_EVAL is called within a particular VM, where the VM gets to control its execution state: it knows that it’s about to run some module of its own code authenticated with the OP_EVAL hash, so the module must be valid bytecode for that VM. If unknown hashes were let in, the contract could even pre-authenticate some known template at runtime, letting potential spenders keep their variable data private if the execution path is not used.

My point is - it’s not one or the other - we’d want both P2SH32 (hopefully 2023) and OP_EVAL (later) :slight_smile:

Some more theoretical ramblings…
P2SH is fundamentally VM agnostic, but its true nature may not be obvious because it’s been rolled out as if it were part of a VM. In theory it could support multiple VMs and languages while also hiding which VM it’ll use until time of execution comes, hiding everything about how some UTXO can be spent. @tom’s points about database structure made me realize that. Every UTXO is associated with some spending constraint. It’s a relationship (UTXOs) M → 1 (Constraints). If we use the constraint hash as the Constraints primary key, it becomes easier to reason about: you just need to ignore those few bytes of P2SH wrapping that make it look like a Script when it’s really not.

Then, thinking in a blockchain-as-database mental model, we can better observe the difference between “bare” and P2SH, and how it relates to database operations:

  • “bare” executes 2 inserts: one in Constraints table (hash as key, code as value), other in UTXO table (outpoint ref. as key, hash as foreign key, sat amount as data that is related 1-to-1)
  • p2sh seemingly does the same, where the Constraints table would have a NULL in place of the Script (hash as key, NULL as value), but it does not ACTUALLY do this: the hash is already available in the UTXO map, so we save the insert into Constraints here

when execution time comes, what happens:

  • “bare”: 1 delete from UTXO, (optional delete from Constraints)
  • p2sh: 1 insert into Constraints, 1 delete from UTXO, (optional delete from Constraints)
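
To make the table model above a bit more concrete, here is a small sketch using an in-memory SQLite database. The table layout, column names and helper functions are all hypothetical (no node implementation is claimed to work this way), double SHA-256 is just a stand-in for the constraint hash, and script execution itself is not modeled.

```python
import hashlib
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE constraints (hash BLOB PRIMARY KEY, code BLOB);
CREATE TABLE utxo (
    txid BLOB, vout INTEGER, satoshis INTEGER, constraint_hash BLOB,
    PRIMARY KEY (txid, vout)
);
""")


def constraint_hash(code: bytes) -> bytes:
    # Stand-in hash for the Constraints primary key (assumption: double SHA-256)
    return hashlib.sha256(hashlib.sha256(code).digest()).digest()


def create_bare(txid: bytes, vout: int, sats: int, code: bytes) -> None:
    # "bare": two inserts, the code is known before it is needed
    h = constraint_hash(code)
    db.execute("INSERT OR IGNORE INTO constraints VALUES (?, ?)", (h, code))
    db.execute("INSERT INTO utxo VALUES (?, ?, ?, ?)", (txid, vout, sats, h))


def create_p2sh(txid: bytes, vout: int, sats: int, h: bytes) -> None:
    # p2sh: one insert, only the key (hash) is known at creation time
    db.execute("INSERT INTO utxo VALUES (?, ?, ?, ?)", (txid, vout, sats, h))


def spend(txid: bytes, vout: int, revealed_code: bytes = None) -> bytes:
    row = db.execute(
        "SELECT constraint_hash FROM utxo WHERE txid = ? AND vout = ?", (txid, vout)
    ).fetchone()
    if row is None:
        raise LookupError("no such UTXO")
    h = row[0]
    if revealed_code is not None:
        if constraint_hash(revealed_code) != h:
            raise ValueError("revealed code does not match the constraint hash")
        # p2sh: the insert into Constraints happens at spend time
        db.execute("INSERT OR IGNORE INTO constraints VALUES (?, ?)", (h, revealed_code))
    code_row = db.execute("SELECT code FROM constraints WHERE hash = ?", (h,)).fetchone()
    if code_row is None:
        raise LookupError("constraint code was never revealed")
    db.execute("DELETE FROM utxo WHERE txid = ? AND vout = ?", (txid, vout))
    return code_row[0]  # this is what would be executed against the unlocking data


create_bare(b"\x01" * 32, 0, 1000, b"\x51")
create_p2sh(b"\x02" * 32, 0, 2000, constraint_hash(b"\x52"))
spend(b"\x01" * 32, 0)
spend(b"\x02" * 32, 0, revealed_code=b"\x52")
```

The optional delete from Constraints is left out on purpose; whether to keep revealed code around is exactly the caching question this model raises.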

Deletes are authenticated by the spender, who provides the data that unlocks the Constraint. To provide that data, the spender has to know the Constraint before sending the TX, so we relieve nodes of keeping it in the blockchain state: they don’t need it until they get the unlocking values from the spender. And because Constraints always contained some unique data, we didn’t bother keeping them around; we skipped managing the Constraints table entirely and saved an insert and the later housekeeping deletes.

That held well till we got Introspection. It changes the game, because it makes it possible to code contracts whose data is detached into another output, so we can have contract templates with a fixed hash, and if some contract template is used often, it would make sense for nodes to cache its code. Even so, relaxing the network standardness rules would matter only for the first reveal of the contract (and for saving some input bytes when forward-validating the next contract step). The alternative would be for contract authors to just do a single spend and reveal the contract to everyone, and nodes could update the Contracts table just the same.

Next spend would only require the Contract key (hash) to be broadcasted and network messages could be optimized for this, so transactions could omit the Script code entirely if it matches a previously seen one.

Anyway, lots of interesting stuff for research, it’d be easy to get lost in it :sweat_smile: However, clock is ticking, if we want P2SH32 then let’s go for the least-friction upgrade for 2023: just extend the legacy wrapper format
0xA914111122223333444455556666777788889999000087 with 0xA820111122223333444455556666777788889999000011112222333344445555666687.

It would be real nice to snip off the 87, and also save a byte or two on the input’s push opcode with scripts longer than 75 bytes.
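
For reference, a quick sketch of where that “byte or two” comes from: the standard Script push encodings need more prefix bytes once the pushed data exceeds 75 bytes. The helper name is made up for this example.

```python
def push_prefix_len(data_len: int) -> int:
    """Number of prefix bytes needed to push data_len bytes in Script."""
    if data_len <= 75:
        return 1  # single direct-push opcode (0x01..0x4b)
    if data_len <= 0xFF:
        return 2  # OP_PUSHDATA1 + 1-byte length
    if data_len <= 0xFFFF:
        return 3  # OP_PUSHDATA2 + 2-byte length
    return 5  # OP_PUSHDATA4 + 4-byte length


for size in (75, 76, 255, 256, 520):
    print(size, push_prefix_len(size))
```

So a redeem script of 76 to 255 bytes pays one extra byte for its push, and a longer one pays two, compared with a format where the redeem script is not wrapped in a push at all.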

However, clock is ticking, if we want P2SH32 then let’s go for the least-friction upgrade for 2023: just extend the legacy wrapper format

When given the choice between doing the “easy” ( all things considered ) thing and the elegant thing, always pick the easy path. :slight_smile:

Yup, we can leave this debt repayment to some future transaction format upgrade, which P2SH32 using the familiar template will not preclude.

Other than cryptographic security hardening (20 to 32, bringing all our cryptographic primitives to the same level of security), we have here identified 3 buckets of technical debt that we have a chance of repaying:

  1. Unnecessary overheads in P2SH transaction encoding;
  2. Tangling a “hypervisor” (consensus) instruction with Script VM bytecode. (stack item size limit)
  3. The OP_IF and OP_NOTIF malleability vector. This one is not really inherited from P2SH feature but from transaction format, i.e. absence of a consensus version field, which CHIP 2021-01 Restrict Transaction Version aims to solve.

We can easily repay 2. and partially 3., and note that we can repay 3. as a side-effect of the hard-forking P2SH32 upgrade because it will be a pattern distinct from the legacy 20-byte one so we can use it to modulate opcode behavior for the upgraded feature without breaking the old feature. Kind of similar to how we can modulate how locktime is interpreted (BIP-0068).

The CHIP should also be informative, in the hope that non-node software developers will learn of the possibility of upgrading the format later, and not assume a particular wrapping of the constraint hash.

UPDATE

Prompted by this discussion, I have started writing an “everything P2SH” document, rather broad in scope. It’s still a WIP; the draft can be found here and is open to feedback.
Having jumped into the hash-function rabbit hole, there will be a part on future quantum resistance (just realized, I think I made a wrongful claim about H160 quantum security, will fix soon), and on the SHA-256 vs HASH-256 question. (edit: done).

Then, there should be a small CHIP which proposes only the “hardened” P2SH pattern upgrade, without the extras:

  • OP_HASH256 OP_DATA_32 redeem_script_32 OP_EQUAL
    raw: AA20 1122334455667788990011223344556677889900112233445566778899001122 87.
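
A minimal sketch of building and recognizing that pattern, assuming HASH256 means SHA-256 applied twice (as with OP_HASH256). The function names and the placeholder redeem script are only for illustration.

```python
import hashlib

OP_HASH256 = 0xAA
OP_EQUAL = 0x87
PUSH_32 = 0x20  # direct push of 32 bytes


def hash256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def p2sh32_locking_bytecode(redeem_script: bytes) -> bytes:
    # OP_HASH256 <32-byte hash of the redeem script> OP_EQUAL
    return bytes([OP_HASH256, PUSH_32]) + hash256(redeem_script) + bytes([OP_EQUAL])


def is_p2sh32(locking: bytes) -> bool:
    return (
        len(locking) == 35
        and locking[0] == OP_HASH256
        and locking[1] == PUSH_32
        and locking[34] == OP_EQUAL
    )


redeem_script = bytes([0x51])  # OP_1, a placeholder redeem script
locking = p2sh32_locking_bytecode(redeem_script)
print(locking.hex())
print(len(locking) - 23)  # extra bytes compared with the 23-byte P2SH20 template
```

The length difference is the 12 bytes per output mentioned earlier in the thread.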

Calin is already working on a proof-of-concept: Draft: p2sh_32 proof of concept implementation (!1556) · Merge requests · Bitcoin Cash Node / Bitcoin Cash Node · GitLab

At first I was convinced by the argument for OP_SHA256, but having done some research, the possibility of length-extension still spooks me.

This is true for addresses which have already revealed their contract, but it still applies to confidential contracts that could be updating themselves with a new secret on each spend. A length extension attack allows someone to copy (NOT change the existing UTXO) any contract and append arbitrary data to the end of it. I don’t know of a way to exploit this, if there even exists a way. The vulnerability is NOT currently present in P2SH-20, so why should we break what’s not already broken, even if we think it’s not exploitable?

The SHA-256 function is completely vulnerable to length extension because the final hash is the function’s complete internal state after processing the message, so anyone can just pretend there’s more of the message to hash and continue hashing to generate the hash of a longer message.
The composite we use for P2PKH and P2SH is NOT, because while both RIPEMD-160 and SHA-256 are individually vulnerable, the composite hides the internal state of the inner SHA-256, so the message can’t be extended. The same holds for double SHA-256.
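
For anyone who wants to see this in action, here is a self-contained sketch of length extension against single SHA-256. It reimplements only the SHA-256 compression function (round constants are derived from cube roots of the first 64 primes rather than typed in) and checks the forged hash against hashlib. The message, suffix and helper names are arbitrary examples.

```python
import hashlib
import struct


def icbrt(n: int) -> int:
    """Integer cube root (floor) by binary search."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def first_primes(count: int) -> list:
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes


# Round constants: first 32 fractional bits of the cube roots of the first 64 primes
K = [icbrt(p << 96) & 0xFFFFFFFF for p in first_primes(64)]


def rotr(x: int, n: int) -> int:
    return ((x >> n) | (x << (32 - n))) & 0xFFFFFFFF


def compress(state: list, block: bytes) -> list:
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & 0xFFFFFFFF)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[i] + w[i]) & 0xFFFFFFFF
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & 0xFFFFFFFF
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & 0xFFFFFFFF, c, b, a, (t1 + t2) & 0xFFFFFFFF
    return [(x + y) & 0xFFFFFFFF for x, y in zip(state, [a, b, c, d, e, f, g, h])]


def md_padding(msg_len: int) -> bytes:
    """Padding SHA-256 appends to a message of msg_len bytes."""
    return b"\x80" + b"\x00" * ((55 - msg_len) % 64) + struct.pack(">Q", msg_len * 8)


def extend(digest: bytes, original_len: int, suffix: bytes):
    """Forge sha256(original + glue + suffix) knowing only the digest and the length."""
    state = list(struct.unpack(">8I", digest))  # the digest IS the internal state
    glue = md_padding(original_len)
    tail = suffix + md_padding(original_len + len(glue) + len(suffix))
    for i in range(0, len(tail), 64):
        state = compress(state, tail[i:i + 64])
    return glue, struct.pack(">8I", *state)


secret = b"contract bytes the attacker never sees"
glue, forged = extend(hashlib.sha256(secret).digest(), len(secret), b"appended data")
print(forged == hashlib.sha256(secret + glue + b"appended data").digest())
```

With HASH256 the published value is the SHA-256 of the inner digest, so the attacker never learns the inner state; extending the outer hash only yields the hash of "inner digest plus extra bytes", which is not the HASH256 of any extended contract.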

Core went with SHA256 for their P2WSH. However, their contract language and possible structures will never be as versatile as ours. Using the double makes the hash function close to a “random oracle”, and some future cryptographic proofs may have to rely on it being one.
The P2SH32 will be the foundation of all the cool future stuff, and we use double SHA-256 everywhere, so let’s not accidentally introduce a weakest link again when the whole point of the upgrade is to strengthen the weakest link.

I scanned the blockchain for both aa20 value_32 87 and a820 value_32 87 output templates, and found only these UTXOs currently exist:

OP_SHA256

Block Height | TXID | Output Index | Satoshi Amount | Locking Script
293549 | c4b46c5d88327d7af6254820562327c5f11b6ee5449da04b7cfd3710b48b6f55 | 0 | 20000 | a8205efe500c58a4847dab87162f88a79f08249b988265d5061696b5d0c94fd8080d87
293783 | 702c36851ed202495c2bec1dd0cefb448b50fafd3a5cdd5058c18ca53fc2c3d1 | 0 | 20000 | a8203f6d4081222a35483cdf4cefd128167f133c33e1e0f0b1d638be131a14dc2c5e87
293923 | fb01987b540ec286973aac248fab643de82813af452d958056fee8de9f4535ab | 0 | 20000 | a8206380315536fa75ccf0d8180755c9f8106466ee3561405081cab736f49e25baab87

OP_HASH256

Block Height | TXID | Output Index | Satoshi Amount | Locking Script
211914 | af32bb06f12f2ae5fdb7face7cd272be67c923e86b7a66a76ded02d954c2f94d | 0 | 100000000 | aa20000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f87
291634 | faf8989ed87c5a667a1ead813aea718727e01767c124193297eaf409ff4645e5 | 1 | 500000 | aa203036419579d0abe63b46836a74380f1e2fd4a1e89f3aad616b171ddcdcbede4387
527397 | ab7d514960db3d919478c6e3c58f3cbca22c0e257f4838be5e0ba75afa34dab7 | 0 | 58049 | aa20826058c4582956f5a2d2bcb56e2f97cfdfb21a191c52dc83192089bc8e3aa36187
529578 | 7a27e164da5e2e961029fcc0d7566badd4adcb8b4d4b184a4d893899284378fb | 0 | 181977 | aa20099c96aead898703ec8685e22ae58c3d56f2571db08b57d88fe34f43f7f7775e87

Note: The 1 BCH (and 1 BTC) output is unspendable because the hash is that of the genesis block, but in the wrong byte order.
An output with the correct byte order was created and spent using the genesis block header as the signature, shortly after creation of the unspendable one.

OP_HASH160

Also, there are some pre-BIP16 UTXOs that have likely become unspendable due to BIP16

Block Height | TXID | Output Index | Satoshi Amount | Locking Script | Status
170052 | 9C08A4D78931342B37FD5F72900FB9983087E6F46C4A097D8A1F52C74E28EAF6 | 1 | 400000 | a91419a7d869032368fd1f1e26e5e73a4ad0e474960e87 | Spent
170054 | B0539A45DE13B3E0403909B8BD1A555B8CBE45FD4E3F3FDA76F3A5F52835C29D | 1 | 400000 | a914e8c300c87986efa84c37c0519929019ef86eb5b487 | Unspent
170434 | D0636198EA55FADEE5B4CCC07C85012DB7D07C2D25790B3AEC770C86617801C0 | 1 | 1000000 | a91484b8ee2ee2970e4a5c3a18e73a9e251ad5c1df1c87 | Unspent
170442 | 9AB59E2D4BE16C470160EB9B9A9D9799EAF29AF0461AEA131E748659D806FA1A | 0 | 1000000 | a91484b8ee2ee2970e4a5c3a18e73a9e251ad5c1df1c87 | Unspent
170442 | 658FC92061F1C4125D5CD1034EB8A1F09BFEBD32A988D855EB7EE63689759A21 | 0 | 1000000 | a91484b8ee2ee2970e4a5c3a18e73a9e251ad5c1df1c87 | Unspent
170556 | 510B5A44935109E249A704C2900AA9D8303166062E81D2AC852C965B6266DCEE | 0 | 1000000 | a91484b8ee2ee2970e4a5c3a18e73a9e251ad5c1df1c87 | Unspent

I’m done reviewing that one, and would just like to carry one aspect of my review to this thread here.

Concerning the discussion of hash functions, the current proposal limits itself to single/double SHA256.

BLAKE2s would be one candidate that offers protection against length extension attacks while giving the same bitwise security as e.g. HASH256.

I think it just deserves a mention in passing that in theory, there are additional hash functions like that one which could meet our needs. In practice, I think the costs outweigh the benefits, and I’m also not inclined to believe we’d be going wrong with HASH256 for our 32-byte hash.

Please don’t use pure HASH256. Currently, I can see three cases, where this is used, and if we add that to P2SH, then it will be the fourth case, and could be easily mistaken with other cases. If we have a script:
“OP_HASH256 someHash OP_EQUAL”, then we could pass here:

  1. some 64-byte merkle branch
  2. some 80-byte block header
  3. some N-byte transaction
  4. you propose to add P2SH to this list

If we want to use SHA-256, then fine, let’s use that. But please at least add some 512-bit prefix to change IV, to avoid hashing the same data in a completely different context. In general, messing up the context with double SHA-256 is unlikely to be exploited, because it requires mining a lot of hardcoded bits, but technically it could be possible in the future, so that’s why I think pure SHA-256 with the same initialization vector is a bad idea.

We’re faced with a choice between SHA-256 and HASH-256. Other functions or same functions with different IVs would have a big cost in updating all other software that has to parse addresses and pay into outputs. If we were prepared to pay that cost, then we might as well upgrade to a new function.

You can’t just pass anything as redeem script it MUST be a valid Script VM bytecode to unlock. The hashlock output pattern allows any preimage though, and you can pass a raw block into a hashlock contract even now, what’s the problem with that? See this transaction posting the genesis block header as input unlocking script: Transaction 09f691b2263260e71f363d1db51ff3100d285956a40cc0e4f8c8c2c4a80559b1

Security of P2SH relies on collision resistance, not on it being a random oracle (where real functions become less so with each public query). What good is a database of known preimages when they aren’t valid redeem scripts?

same functions with different IVs would have a big cost in updating all other software that has to parse addresses and pay into outputs

Everything will be new, so it is possible to not change the format of the previous three cases, but choose the format of the new address to not interfere with other things. So, instead of doing SHA-256(redeemScript), it is just a matter of doing SHA-256(redeemPrefix||redeemScript). And the size of that redeemPrefix should be 512 bits, in this way it can be easily implemented as a SHA-256, executed with some prefixed data.
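
A small sketch of that idea, in case it helps: because the prefix is exactly one 64-byte SHA-256 block, an implementation can absorb it once and reuse the resulting state for every script. The prefix value below is a made-up placeholder (nobody has proposed a specific one), and the function name is hypothetical.

```python
import hashlib

# Hypothetical 512-bit (64-byte) domain-separation prefix, for illustration only
REDEEM_PREFIX = hashlib.sha256(b"hypothetical P2SH32 domain tag").digest() * 2

# Absorb the prefix once; this is equivalent to running SHA-256 with a different IV
_midstate = hashlib.sha256(REDEEM_PREFIX)


def prefixed_sha256(redeem_script: bytes) -> bytes:
    """SHA-256(redeemPrefix || redeemScript), reusing the precomputed prefix state."""
    h = _midstate.copy()
    h.update(redeem_script)
    return h.digest()


script = bytes([0x51])  # OP_1, a placeholder redeem script
print(prefixed_sha256(script).hex())
```

The same bytes hashed as a merkle branch, block header or transaction would use the plain IV, so the resulting hashes live in separate contexts.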

You can’t just pass anything as redeem script it MUST be a valid Script VM bytecode to unlock.

I can imagine a valid redeem script, that would be also a valid transaction at the same time. Just one big PUSH, then some output script, then some OP_PUSH and OP_DROP in the timelock.

The hashlock output pattern allows any preimage though, and you can pass a raw block into a hashlock contract even now, what’s the problem with that?

Well, I used this hashlock script just as an example. But even in the context of this hashlock: it would be useful to require some specific type of hashed data. For now, checking it by OP_SIZE is the only option. Maybe some OP_CAT or OP_SPLIT could also do the trick, but it makes the whole script longer (so also more expensive).

That would fix the possibility of overlap, sure. You have said what could happen, said how to prevent it from happening, but you haven’t demonstrated why it happening would be a problem, so why would we need a fix? If I put a raw TX or a block header in an input’s signature (a TX like that already exists) to unlock a hashlock contract or somehow first be authenticated using the hashlock and then successfully executed as a valid redeem script, why would that be a problem? You can put any existing TX’s or block’s hash into an output, so what?

but you haven’t demonstrated why it happening would be a problem, so why would we need a fix?

That’s a good question. Maybe I am too obsessed with security and “making sure that some hash is only valid in a given context”. I have to think more about real attacks that could be possible.

With the CashTokens upgrade this might become a lot less common, because state can be offloaded to NFTs and it is not required to construct a different locking bytecode when the state changes. Still, the initial parameters of the contract can be vulnerable to the described collision attack :+1:

Posted a follow-up to my initial interest in MINIMALIF:

And for future reference, this topic was solved by the recently locked-in P2SH32 CHIP:
