Plato Data Intelligence.
Vertical Search & Ai.

How Ethereum can scale with SNARKs 101

Date:


The beginner guide I would have liked a few months ago before digging into this stuff

Marco De Rossi

What you’ll need:

  • a computer science background
  • the basics of Ethereum
  • the basics of calculus (constraints optimisation)

What you’ll get:

  • the basics of zero-knowledge SNARKs
  • the basics of Merkle trees
  • how Ethereum could scale to thousands of transactions per second thanks to SNARKs

SNARKs allow a Prover to prove to a Verifier that she/he has a solution W to the problem F with shared/known inputs X, without revealing W.

Finding the solution to the problem could require a huge amount of computational power and memory.

So the Verifier can basically be 100% sure that the Prover has worked properly (and found a solution), with neither re-doing the job by herself/himself to check the solution nor knowing the solution at all. It’s magic!

The process has 3 steps:

  • SETUP — The problem F (that need to be expressed as quadratic arithmetic program, see below) is prepared for SNARKs. This process is very high memory and computing intensive depending on the complexity of the problem (→ The number of inputs and constraints → The dimension of the matrix of the constraints satisfaction problem). The player who does the Setup (could be the Verifier itself) must be trusted by all parties, since the output of the Setup is used in the next phases. The setup is usually done using , a C++ library which is the most popular implementation for zkSNARKs.
  • PROVING — The Prover, who has a solution W for the problem F with shared inputs X (maybe she/he spent huge amounts of CPU and memory to find it!), uses libsnark and the output of the Setup phase to create a proof 𝚷. This process is definitely high memory and computing intensive (depending on the complexity of the problem, as above). The size of the output (i.e. proof 𝚷) is instead succinct and constant independently from the complexity of the problem. The Prover needs to trust who has done the Setup phase, since she/he uses its output…
  • VERIFYING — A Verifier — giving as input the output of the Setup phase, shared inputs X and proof 𝚷 – checks the proof. If the verification is successful, the Prover managed to prove to a Verifier that she/he has found the solution W to the problem F… without revealing W! The nice part is that not only the Proof is succinct and has always the same length.., the verification process is fast and not memory/computing intensive at all. Unlike the two previous phases… the verification can be easily done with a smartphone in milliseconds!

A good recap ():

How can this happen? Well, it’s Merlin magic! If you want to get the maths behind this, .

How can I transform my software to a quadratic arithmetic program?

As mentioned above, Setup phase’s problem F needs to be a quadratic arithmetic program. Rules of the game are tough:

  • Your software’s inputs should be numbers. Convert your stuff (arrays, strings, etc) to numbers. That’s trivial.
  • A “quadratically constrained system of equations” means:

where x is the n-dimensional vector of your inputs, m is the number of constraints (i.e. the number of equations of your system), C is the n-by-n coefficients Matrix and q is a n-dimensional coefficients vector. If you don’t like matrix and vectors, here is the n = 3 and m = 2 case (3 inputs, 2 constraints):

  • The implementation is an arithmetic circuit, which means that the outcome is Problem solved (the system is solved, i.e. all polynomials are equal to 0) or Problem not solved (all the other cases). In other words: “these inputs are/aren’t one of the solutions to this Problem”.
  • C₁, C₂, …, C𝚖, q₁, q₂, …, q𝚖 coefficients are the constraints of the system. This is basically what defines your software. Change them… and you’ll get another software! Getting back to how SNARKs work: C₁, C₂, …, C𝚖, q₁, q₂, …, q𝚖 are the input of the Setup phase. The output of the Setup phase (that you need for Proving and Verifying) is therefore strictly related to those C₁, C₂, …, C𝚖, q₁, q₂, …, q𝚖 and works only for that Problem. If you change them you’re defining another software/problem and you need to re-run the Setup phase! x₁, x₂, …, x𝗇 are the variables (i.e. what you have to guess to get a system’s solution). So when we say “Dear Prover, could you please find a secret solution W for problem F with shared/public inputs X” we mean for example “Dear Prover, can you find the x₁, x₂, …, x𝗇 values which solve the system with, for example, x₇ = 2393, x₅₂₆ = 5647?” You can do what you want with all x𝗇, except for x₇ and x₅₂₆, which are constrained to the shared/public inputs.

It’s a tough life but you can survive… If you need loops you can unfold them repeating the same operation many times. Or if you need for example x₁⁴ x₂⁵, you define a new input x₃ = x₁⁴ x₂⁵ and use x₃ in your constraints. It’s all about adding variables and constraints… Even for pretty simple softwares it’s easy to reach hundreds of millions or billions of inputs and constraints!

Want to know more? Read . And also check out this basic from ethereum/research.

What’s a Merkle tree?

An hash function is a rule that maps an input of arbitrary size to an output of fixed size. We could invent a pretty useless hash function “Concatenate the first two with the last two letters” which transforms “Woody Allen” to “Woen” and “Paul McCartney” to “Paey”.

A Merkle tree is a data structure where every parent is the hash of its two sons. At the top you find the Root, which is the hash of the two sons of level 1. At the bottom, every leaf is the hash of an external input.

Using our fictionary “Woody Allen”→”Woen” hash function:

When a leaf changes, the modification is propagated up to the Root. If ANTHONY changes, also ANNY (leaf), CENY and CECO (Root) change. Whichever leaf changes, the Root changes too.

You don’t need the entire tree to recalculate the Root. In our example, if ANTHONY changes and you know both JACO and CECILY, you can easily recalculate the Root even if you completely ignore JAMES, MARCO, JAES and MACO. For huge trees this saves a lot of time!

So what?

Merkle trees are great for data integrity checks. Usually: you know which is the valid Root, and you check that the received data matches that Root. For example: a trusted party who can’t give you the entire data set of the first names of people on Earth (no time, no bandwidth or maybe she/he hasn’t the data at all) gives you only the Root (e.g. “CECO”). Afterwords: you receive millions of first names, with reference to the leaf number, by thousands of untrusted parties. Well, since you have the correct Root you can check who you can rely on, who is giving you fake data…

Merkle trees are part of your life too! When you’re downloading a 3GB Torrent file, your file is divided in millions of little chunks. The hash of every chunk is stored in a leaf. Since you know which is the valid Root of the tree, every time you receive a chunk of the file by somebody, you can check if it’s correct. If it’s not, you can ask the same chunk to somebody else.

You can do that even if you haven’t download yet the entire tree/all the leaves: if you know that the Root is CECO and you trust JACO… when you receive the chunk ANTHONY you can verify it even if you haven’t downloaded yet the chunks MARCO and JAMES.

Why Merkle trees are useful in distributed ledger technology is straightforward: you use consensus protocols (slow, expensive) only for reaching consensus on the Root. Then the untrusted nodes of the network can efficiently and directly share data… and can sleep safe and sound thanks to integrity checks with the Root.

When God asked Ethereum to choose 2 superpowers among Security, Scalability and Decentralization… Ethereum sacrificed Scalability. Actually there is no strong cap on “transactions per second”: the cap concerns the amount of gas of each block — which is, simplifying, the amount of operations that I can do in each block. This limit is 8 million gas. That could mean many “tiny” transactions (no data attached to the transactions, no operations to be executed on that data) or few large transactions. It’s up to Ethereum’s nodes, which submit transactions, and to Ethereum’s miners, who include in the block the transactions which pay more.

A block is mined ~15 seconds. That means ~32 million gas per minute, which is definitely not enough if we want Ethereum’s dapps to go mainstream.

By the way: stop with those tedious comparisons between Ethereum and Visa. A centralized system will always be faster than Ethereum… by design! They do different stuff and you need them in different situations. If you don’t need decentralization and a trust-less environment… of course you should choose Visa. In short: the fact that your blender spins faster than your washing machine doesn’t mean you’ll clean your trousers in a blender!

Let’s put the puzzle together! Imagine you could “compress” many tiny transactions in one large transaction thanks to SNARKs. If the gas spent by this large transaction is less than the sum of the gas spent by the tiny transactions, that means you’re saving gas.

And saving gas means:

  • Users spending less for transactions overall → This would be a push for the entire ecosystem
  • Being able to put more stuff in a block → Ethereum spinning faster than your blender!

How does it work?

There are users, a relayer (or more relayers) who aggregates transactions and a smart contract.

  1. Users willing to play this game send their Ether (or tokens) to a publicly audited smart contract. For every new player a new leaf in a Merkle tree is created. The leaf includes information about the Ether’s owner (her/his address, which is also the public key), amount of Ether and nonce (the transactions’ counter of that account, which is 0 when the leaf is added)
  2. When A wants to send Ether to B (they both need to have a leaf/account in the smart contract), A packs a transaction, which includes the address of the fromaccount, the to account, the nonce of the from account, the amount of Ether to be transferred and the signature of the transaction (signed with the private key of the “from” account, obviously). She/he then sends the packed transaction to the relayer.
  3. The relayer aggregates all the transactions received in a given time window (e.g. one hour), updates the Merkle tree with the new balances’ amounts and creates a SNARK proof which proves that all signatures and the new Merkle tree’s root are valid. The relayer finally sends the new state and the proof to the smart contract.
  4. The smart contract validates the Proof on-chain. If it’s valid it saves the Merkle tree root of the New state in the internal memory of the contract.

Basically, the Merkle tree root depicts the entire state of all the accounts. And you can’t change it (= steal money) unless you can prove the validity of the signatures whose transactions lead to the New state summarized by the new root you’re submitting.

In a nutshell: users have super fast and almost free transactions, like on Coinbase, without needing to trust the relayer, who can’t do anything, unlike on Coinbase, without your signature.

It’s a non custodial side chain whose state is summarized by a Merkle tree root.

Let’s connect what we learnt above about SNARKs with what we just discussed about scaling. There are different ways to do that. I’ll compare 2 recipes: Vitalik’s and barryWhiteHat’s .

The SETUP is done by…

The guy who starts the project, who also creates the smart contract. The more auditable it is, the better.You should trust her/him… it’s a trusted setup!

The smart contract saves…

2 Merkle roots (bytes32 values): the first tree contains accounts’ addresses (public signatures), the second accounts’ balances and nonces

PROVING is done by…

The relayer

The relayer sends to the smart contract…

  • the 2 Merkle roots of the New state (addresses tree and balances+nonces tree)
  • the list of transactions, without signatures. “Each transaction costs 68 gas per byte. Hence, for a regular transfer, we can expect the marginal cost to be 68 * 3 (from) + 68 * 3 (to) + 68 * 1 (fee) + 68 * 4 + 4 * 2 (amount) + 68 * 2 (nonce), or 892 gas”

PROVING process’s known inputs are…

  • the 2 Old state Merkle roots
  • the 2 New state Merkle roots
  • transactions list

PROVING process proves that…

Given

  • the 2 Old state Merkle roots (already stored in the contract)
  • the 2 New state Merkle roots (sent in the aggr. transaction)
  • the transactions list (sent in the aggr. transaction)

… the relayer has valid signatures to move from state with 2 Old roots to state with 2 New roots with those transactions.

VERIFYING is done by…

The smart contract (coded in solidity, vyper, as you like!)

VERIFYING process’s known inputs are…

The same PROVING’s process known inputs, clearly…!

Limits to scalability

Every aggregated transaction uses 650k gas for SNARK verification (fixed cost) plus ~900 gas of marginal cost per transaction (It costs to send data!). So using the entire block the relayer can aggregate at most:

which means ~544 tx per second

barryWhiteHat’s

The SETUP is done by…

The guy who starts the project.

The smart contract saves…

1 Merkle root with the current State. Every leaf is the hashed state of an account.

Want to an account?

state = AccountState(pubkey, balance, nonce)
state.index = self._tree.append(state.hash())

PROVING is done by…

The relayer

The relayer sends to the smart contract…

  • proof 𝚷
  • the New state Merkle root
  • proof 𝚷

PROVING process’s known inputs are…

  • the Old state Merkle root
  • the New state Merkle root

PROVING process proves that…

Given

  • the Old Merkle root (already stored in the contract)
  • the New Merkle root (senti in the aggr. transaction)

… the relayer has a list of transactions with valid signatures to move from state with Old root to state with New root

VERIFYING is done by…

The smart contract (coded in solidity, vyper, as you like!)

VERIFYING process’s known inputs are…

The same PROVING’s process known inputs, clearly…!

Limits to scalability

The relayer is not sending transactions’ data to the smart contract (which is costly), so the limit is actually the amount of gas to verify the SNARK proof.

Mentioning Howard Wu’s about running SNARK’s PROVING phase on distributed systems, barryWhiteHat optimistically states that is possible to confirm 16666 transactions in a huge SNARK (1 billion constraints!).

barryWhiteHat also it’s possible to verify proof 𝚷 on-chain with 500k gas, which means that you can put 16 SNARKs (8 million/500k) per block, which is ~1.07 SNARKs per seconds… which means ~17,832 tx per second (16,666 * 1.07).

To infinity and beyond

  • All that glitters is not gold / 1. The amount of computing power and memory that you need in the Proving phase can be literally shocking. Especially in barryWhiteHat’s version, where part of the complexity is moved off-chain. barry writes “On a laptop with 7 GB of ram and 20 GB of swap space it struggles to aggregate 20 transactions per second”. Well, if the goal is 17,832 tx per second… LOL. This introduces non trivial parallel computation challenges. But if the average $ cost per transaction is far cheaper than the ordinary no-SNARKs option… the game is worth the candle.
  • All that glitters is not gold / 2. There is a relevant data availability issue! Since only the tree’s root is saved in the contract, you must be sure that an entire version of the tree (or, it’s the same, the entire transactions history) is always available. If data is not available the relayer, even with valid signed transactions, can’t do anything because she/he can’t prove Old State → Transactions → New State.
  • In order the relayer to be trustless and Ethers in the contract to have the same value of Ethers outside (liquidity problem)… users should be able to withdraw money from the smart contract when they want, without relying on a (specific) relayer. How? This is not in the scope of this 101 post, but you can read about this in the enclosed links.
  • Want to understand more about how the current State (addresses, balances and nonces) can be handled with a Merkle tree? Adding a leaf, updating a leaf, etc? Check out (test file ) which uses this underlying . Thanks HarryR!
  • Want to setup your personal Ethereum-SNARKs environment? Let’s start off-chain with C++ (Setup, Proving, Verifying) . Then you can move to Ethereum (don’t forget, only the Verification is done on-chain!) with Zokrates (, the ).
  • How about using RSA accumulators instead of Merkle trees? Google “rsa accumulators ethereum” to start…

Enjoy!

Twitter

Source: https://tokeneconomy.co/how-ethereum-can-scale-with-snarks-101-5b06ff048bb7?source=rss—-fbbd350c08fc—4

spot_img

Latest Intelligence

spot_img

Chat with us

Hi there! How can I help you?