CHIP 2023-01 Backup: wallet offline backup formats

After we got a fair reception on the initial idea, here is an actual first CHIP on the idea of standardizing wallet backups.

The full text is at;

4 Likes

I’ve been working on Flowee Pay and realized that it would be useful to have a standard data format for basic wallet backups. When I talk about backups, I’m mostly just referring to the seed phrase and similar, not details which are available on chain.

In the case of HD wallets, you need 3 pieces of information to do a proper restore.

  • the actual seed phrase
  • the derivation path
  • the block height of the first transaction depositing funds in your wallet

The last one is probably not relevant for those that use an address indexer, as that always searches the entire history. But for those using a bloom-filter-based P2P restore, this info is pretty useful to limit the amount of time it takes to restore a wallet.

Either way, as far as I know there is no standard format we can put in a QR code or similar in order to make backup fast and painless. Some wallets show the actual seed phrase as plain text in a QR code, which is limiting (read: lock-in) as that assumes a derivation path (see earlier topic).

When we move on to backup techniques that are less in use, for instance an NFC tag (example image), it would be much more powerful if different wallets could retrieve this info in a standardized way.

Again, this is separate from other info: things like an address book, comments for transactions, or even the transactions themselves. Those are probably useful to back up too, but the biggest power I think lies in a standard that allows people to share the basics of a wallet in order to restore the funds themselves.

I wonder if there is an interest in this format, and most importantly if people like @cculianu are willing to implement that in wallets.

Naturally, if I missed an existing format that already does this, please do let me know. I love it when others do the work for me :wink:

5 Likes

Great idea, Tom.

Because AFAIK right now there is no such standard on any popular coin that is widely used, there is a huge chance that whatever mechanism you think up will end up becoming the de-facto standard.

Just try to make it as easy to implement and as straightforward as you can while remaining logical, and you will be golden.

I think this sounds like a pretty good idea. It might be nice to allow multiple derivation paths under the one seed too.

{
  "12 seed words 1": [
    {
      "path": "derivation_path_1",
      "blockheight": xxxx 
    },
    {
      "path": "derivation_path_2",
      "blockheight": xxxx
    }
  ],
  "12 seed words 2": {
    // ...
  }
}
3 Likes

Thanks for kickstarting that Jim.

Not sure what the use case is for multiple derivation paths. But it's good to know that it's pretty trivial to add in the future, should the need arise.

It does make one wonder about size.
The 12-word (or 24-word) phrases are essentially just a human-readable version of a 128-bit (or, for 24 words, 256-bit) number. If you assume the appropriate word list on the other end, you can just ship a hexadecimal string and let the wallet convert it to a sentence again. The logic goes that 00000000000000000000000000000000 is much shorter than abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon abandon about (see test-vectors).

So, a simple example could be;

{
  "00000000000000000000000000000000": {
    "path": "m/44'/0'/0'",
    "startheight": 700000
  }
}

which (removing whitespace) is 81 bytes. The “ultralight” type of NFC tag has only 64 bytes of space, so even this compressed version would not fit. Using 1-char variable names gets us down to 67 chars.
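For anyone who wants to check these sizes, here is a quick Python sketch that serializes the example as whitespace-free JSON and counts the bytes. It is only a measurement aid, not part of the proposal, and the one-character key names `p` and `s` are placeholders I picked for illustration.

```python
import json

# The example backup from above: 128-bit all-zero secret as hex, one path.
backup = {"0" * 32: {"path": "m/44'/0'/0'", "startheight": 700000}}

# separators=(",", ":") drops all optional whitespace.
compact = json.dumps(backup, separators=(",", ":"))

# Same data with hypothetical one-character keys.
short = {"0" * 32: {"p": "m/44'/0'/0'", "s": 700000}}
short_compact = json.dumps(short, separators=(",", ":"))

print(len(compact.encode("utf-8")), len(short_compact.encode("utf-8")))
```

Counted this way the compact form is 80 bytes (a trailing newline would make it 81) and the one-character-key form is 67 bytes. Either way it is more than a 64-byte tag can hold.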

My conclusion is that while JSON is well supported and readable, a serialized form which is much more space efficient is needed for things like QR codes and NFC tags.

2 Likes

Umm, a 1024-bit number would be 256 0s when encoded as a hexadecimal string. You could just write raw bytes to the NFC in which case you’d be writing an array of 128 bytes.
Got confused there, our secret is 128 bits, not bytes, so we need only 16 bytes, which can fit.

We could come up with a simple compact format, consistent with our TX format:

  • version - 1 byte
  • secret - 16 bytes
  • note_length - 1 byte
  • note_string - variable
  • paths_count - 1-9 bytes compact size
    • path0
      • path_length - 1-9 bytes compact size
      • path_string - variable
      • path_start_height - 1-9 bytes compact size
    • path1
    • pathN

So, for your example above it would be: 1 + 16 + 1 + 0 + 1 + 1 + 10 + 5 = 35 bytes for a wallet backup with 1 path stored.
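A rough Python sketch of the layout above, in case it helps to see it byte by byte. Everything here is an assumption on top of the field list: the function names are made up, the note and path are taken to be UTF-8/ASCII strings, and "compact size" is taken to mean the same variable-length integer used in transactions (a single byte below 0xFD, otherwise a 0xFD/0xFE/0xFF marker followed by 2/4/8 little-endian bytes).

```python
import struct

def compact_size(n: int) -> bytes:
    """Variable-length integer as used in the transaction format."""
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)

def serialize_backup(secret: bytes, paths, note: str = "", version: int = 1) -> bytes:
    """Hypothetical serializer: version, secret, note, then each (path, height)."""
    if len(secret) != 16:
        raise ValueError("this sketch only handles a 16-byte (12-word) secret")
    note_bytes = note.encode("utf-8")
    if len(note_bytes) > 255:
        raise ValueError("note_length is a single byte")
    out = bytes([version]) + secret + bytes([len(note_bytes)]) + note_bytes
    out += compact_size(len(paths))
    for path, start_height in paths:
        path_bytes = path.encode("ascii")
        out += compact_size(len(path_bytes)) + path_bytes
        out += compact_size(start_height)
    return out

blob = serialize_backup(bytes(16), [("m/44'/0'/0'", 700000)])
print(len(blob), blob.hex())
```

With the path written out as `m/44'/0'/0'` (11 characters) this comes to 36 bytes, one more than the estimate above, which counts 10 bytes for the path. The start height 700000 does take 5 bytes, since it no longer fits the 3-byte form.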

The 35 byte blob could then be encoded using CashAddress specification, but with some dedicated prefix so it’s clear it’s not an address but a wallet backup blob.

The note could be anything: the purpose of the wallet, or a password hint to remind the user which password they set if BIP39 encryption was used on top of the mnemonic.

2 Likes

Instead of using “block height of first received tx” I suggest the block height at which the wallet was created so it’s possible to back up the wallet right after creating it.

3 Likes

Fixed that, it's naturally a 128-bit number for a 12-word seed. Thanks for catching that!

1 Like

Yeah, I caught your error and then confused myself haha - the secret is 128 bits, not bytes, so you need only 16 raw bytes, or 32 hex chars to encode the seed (edited my above post).

Also, what derivation are we talking about? BIP39 is not reversible. Maybe you mean the checksum word which uses dictionary indexes.

Electrum-style can be thought of as 12 base-2048 numbers, and every such number needs 11 bits, so we get (2^11)^12 = 2^132 entropy.

Electrum 2.0 derives keys and addresses from a hash of the UTF8 normalized seed phrase with no dependency on a fixed wordlist. This means that the wordlist can differ between wallets while the seed remains portable, and that future wallet implementations will not need today’s wordlists in order to be able to decode the seeds created today. This reduces the cost of forward compatibility.

With the Electrum specification, you could replace words with their index in the wordlist and be able to generate the master key using a normalized dictionary.

Hmm, but this is still irreversible - you can’t go back from master key to the mnemonic.

Source: Electrum Seed Version System — Electrum 3.3 documentation

Right, but if the standard BIP39 mnemonic generation technique is used, then you could store the dictionary indexes as our “secret”, instead of the actual words.

I like your thinking about the binary form following the same formatting of transactions. No need to invent anything new there. That sounds very appealing to me.

My thought with regards to the seed phrase being compressible is indeed that each word is a dictionary word in a 2048-item list. So instead of having the full words written out, wasting space, you just record the dictionary used (English, Japanese, etc.) and store the 17-byte secret instead:
12 words worth 11 bits each (slightly longer than the above-mentioned 128 bits due to the checksum).
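To make the 17 bytes concrete, here is a small Python sketch following the standard BIP39 generation steps: append the first bits of SHA-256 of the entropy as a checksum, cut the result into 11-bit word indexes, and pack those indexes back into bytes. The function names and the choice to zero-pad the last byte are mine, not part of any spec, and the wordlist lookup itself is left out.

```python
import hashlib

def entropy_to_indexes(entropy: bytes) -> list:
    """BIP39: entropy plus checksum bits, split into 11-bit word indexes."""
    cs_bits = len(entropy) * 8 // 32            # 4 bits for a 128-bit secret
    checksum = hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)
    value = (int.from_bytes(entropy, "big") << cs_bits) | checksum
    n_words = (len(entropy) * 8 + cs_bits) // 11
    return [(value >> (11 * (n_words - 1 - i))) & 0x7FF for i in range(n_words)]

def pack_indexes(indexes) -> bytes:
    """Pack 11-bit indexes into bytes, zero-padding the final byte."""
    value = 0
    for index in indexes:
        value = (value << 11) | index
    bits = 11 * len(indexes)                     # 132 bits for 12 words
    n_bytes = (bits + 7) // 8                    # rounds up to 17 bytes
    return (value << (n_bytes * 8 - bits)).to_bytes(n_bytes, "big")

indexes = entropy_to_indexes(bytes(16))          # the all-zero test vector
packed = pack_indexes(indexes)
print(indexes, len(packed))
```

For the all-zero test vector this gives eleven times index 0 and then index 3, which matches "abandon" x 11 followed by "about" in the English list. Note that the first 16 of the 17 packed bytes are just the original entropy, so storing the indexes costs one byte more than storing the raw 128 bits and recomputing the checksum.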

Most wallets will not store the master key anyway; they store the mnemonic and derive the master key when needed, which is then used to derive private keys from. So it's a flow that is consistent.

Great for human readable stuff indeed.

1 Like

This popped today on Bitcoin SE: multi signature - I have lost 1 of my 2 of 3 multisig seeds. How do I restore to make a transaction? - Bitcoin Stack Exchange

Multisig wallets should back up not just the seed but all other participants’ xpubs.

1 Like

Let me contribute to this discussion, as this is a topic about which I have a couple of ideas and perhaps some of the following points can be taken to achieve the development of a standard that implies a real improvement in the user experience and the quality and resilience of backups.

Currently, at the user level, there are several things that are interpreted as worthy of backup. Primarily, access to UTXOs owned by the user, but also transaction history and even other ancillary items (which may depend on the wallet support), such as contact list, transaction descriptions, historical price denominated in some fiat currency, initial BIP47 exchange, etc.

The current form, which was described above, is based on the use of recovery phrases: it focuses on deriving a series of addresses from the phrase and then finding the funds on them, using technologies such as SPV or querying nodes that index that information.

However, if (when) network usage grows a lot, it is possible that the existence of index nodes may be compromised or other backup may be necessary, as perhaps scanning the blockchain (which may be pruned by nodes to save storage space) may become a costly process as more users and transactions exist on the network.

The minimum viable product of a wallet is simply a collection of the UTXOs available to spend, updated each time one is spent, plus some way to communicate with the outside world when a payment comes in. With this in mind, it occurs to me that it would be nice to add to the proposal an encrypted file standard that, using the same recovery phrase, can be stored by the user or the wallet provider (similar to the solutions currently provided by wallets like Zapit or Bitcoin.com), and contains all the information pertinent to that wallet.

This keeps privacy the same but facilitates recovery and avoids resynchronizing the wallet from scratch. Alternatively, this file could be hosted on services such as IPFS, so that various providers could offer to maintain the backup files for a fee or even offer some sort of free service (although explaining how this business model would work would be outside the scope of the discussion at hand).

My idea is to ensure that any changes made in this regard are durable over time and can be leveraged to the fullest, bearing in mind that many of the ways in which we are used to working, having a network with low economic activity, may not be feasible tomorrow when Bitcoin Cash is a massively used currency.

This is a very useful feature, but I believe it belongs in a separate CHIP.

IMO, if I am correct, what @tom intends to create here is a simple standardization of a basic backup format that can be immediately used by all wallets without much added work and without many bells and whistles, to create order out of chaos.

1 Like