Securing Smart Contract with Runtime Validation
Weihong Wang

Link: Securing Smart Contract with Runtime Validation, Li et al. 2020

It has been a popular trend regarding solving security problems in smart contracts. A smart contract is a program that encodes a set of transaction rules. Once it is deployed to a blockchain, its encoding rules are enforced by all blockchain network participants.

However, due to 1) the impossibility of changing it after deployment, 2)sometimes storing critical and valuable information, and 3) falsely treating errors as intended behavior of the smart contract, these shortcomings could lead to striking financial losses!

Like other programming languages, the applications will usually undergo multiple rounds of code testing to ensure their security before the final deployment. Many software defects that cause memory and threading errors can be detected dynamically and statically. Following are some ways of program verification.

  • Static analysis
  • Formally verification
  • Dynamic analysis (Runtime validation)

Among them, dynamic analysis faces excessive performance overhead, which is the reason why it is often too expensive for other domains. However, running a smart contract is costly too. It is questionable whether a smart contract with runtime verification still performs well.

Why is runtime validation in the blockchain practical?

In this paper, the authors clarify that they are using proof-of-work (PoW) as the consensus model of Ethereum. As known to all, Ethereum, with a PoW consensus model, has a very low TPS at around 30 in ERC20 implementation. The performance gets better, reaching around 3,000 after only running the transaction execution engine without consensus (tested by Parity).

Through this observation based on the PoW model, the authors conclude that the overhead of runtime verification will be negligible because the transaction execution engine is not the primary bottleneck.

Need to note: Ethereum’s transition to a proof-of-stake (PoS) consensus model finished on 15 September 2022. With a PoS consensus, Ethereum 2.0 enables up to 100,000 TPS compared to the 30 TPS experienced with Ethereum 1.0. But let us still have a look at how this idea goes.

What’s more, Ethereum clients store the blockchain state as a Merkle Patricia Tree on the disk. The modules for the storage layer are the most time-consuming. Therefore, a runtime validation tool should minimize the interaction with blockchain states accessing.

A motivation example

Let’s start with an example with some errors. Below is a simplified source code from the draft of ERC1202, a standard draft for a voting system.

image-20221111003828194

Figure 1. Simplified source code from a voting contract.

Potential errors

From the clip of the code, there are two problems in vote():

  1. There is no restriction on how many times the voter can use its weight. (line 12 fixes it)

    For example, issue A has three options [1, 2, 3]. The voter Bob has a vote with weight 10. In the original implementation, Bob can vote for 1 and then call the vote() function again to vote for 2. Therefore, Bob’s weight is double used.

  2. The type of total weighted count is uint256, as defined in line 5. There is no check of the integer overflow. (line 14 fixes it)

    For example, how about the weights owned by some people are very big? Cindy has a vote for A with weight $2^{256}$ -1. If option 1 in A already has some votes, whenever Cindy votes for 1, the value of weightedVoteCounts[A][1] will overflow.

Apply SOLYTHESIS

Then we will apply SOLYTHESIS to the contract and describe how it instruments the contract to nullify these errors. The main idea is to maintain the intermediate value and trigger updates and checks only when necessary.

The authors proceed in the following manner:

1) Specify Invariant

Find the invariants you want every intermediate step of the computation process in the smart contract to follow. In our example in Figure 1, it is to make sure that:

Equation 1: The total weighted count of an option on an issue == the sum of all weights of voters who voted for the option

Firstly, line 1 in Figure 2 defines an intermediate value s that corresponds to the sum of all weights of the voters that voted for issue and option pairs (right side of Equation 1 above). The free variable a iterates over issues; b iterates over voters; c iterates over options of an issue. The intermediate map s is referred to some state variables (e.g., weights, ballots) as well.

image-20221112165047046

Figure 2. Invariant for ERC1202.

Then, line 2 in Figure 2 uses a ForAllquantifier to check whether every changed total weighted count in our intermediate map s equals the state value weightedVoteCounts. It is used as a group of constraints on state variables and defined intermediate values in the following Step 4 Delta Check.

2) Instrument Delta Updates

s is changing during the process because the changes of the state variables sometimes will change the value of the map s. (For example, if voter Bob changes his choice from option 1 to 2 over issue A, we need to update the corresponding value of s).

  1. Store every intermediate value as a state variable.
  2. SOLYTHESIS performs static analysis to determine whether modifying the original value may cause the intermediate value to change.
  3. If so, update the intermediate value.

3) Compute Free Variable Bindings

In our invariants rules in Figure 2, we have several free variants like a, b, c and s. However, the original codes have only issues, voters, options, and weightedVoteCounts. We need to bind the free variants with the meaning in the implementation.

4) Instrument Delta Check

Maintain the changes of the free variables like issues and options in additional arrays x_arr and y_arr (array declarations are lines 3-4 in Figure 3), which may lead to constraint violations.

Use the ForAll quantifier to check the constraints defined in line 2 of Figure 2 at the end of the function. We will traverse the free variables arrays that indicate the changed issue and option pairs and compare the corresponding values in the intermediate map s and the original state variable weightedVoteCounts.

For example, there is an issue A with three options 1, 2, and 3. Bob voted for option 1 with weight 10 on issue A. And he changed his idea to vote for 2. Under this situation, two total weighted counts are changed.

  • Total weights of option 1 decrease (line 17 in Figure 3)
  • Total weights of option 2 increase (line 21 in Figure 3)
  • Total weights of option 3 remain unchanged.

SOLYTHESIS records the modified issue No. A and option No. 1 and 2 in x_arr and y_arr (lines 10-11, lines 14-15, lines 19-20 in Figure 3). These indexes will be used in the delta check. And it also updates the corresponding values in intermediate map s (line 17 and line 21 in Figure 3). Then, iteratively check (lines 25-26 in Figure 3) the constraints defined by line 2 in Figure 2 at the end.

Let’s put our example into the instrumented contract code generated by SOLYTHESIS.

image-20221111174751200

Figure 3. Instrumented voting contract

This is the whole idea of SOLYTHESIS!

Some optimizations in the implementation

The authors consider the efficient problems of the implementation caused by multiple function calls and expensive blockchain state loading costs. They combine SOLYTHESIS with the following features.

First, one transaction may include multiple function calls. Therefore, if we insert the invariant checks at the end of every function, we will trigger multiple times during a transaction with the possibility of false positives.

SOLYTHESIS builds the call graph of the smart contract. If the entrance status of a function never changes, SOLYTHESIS will prune away the instrumented check inside it. SOLYTHESIS only executes the instrumented constraint checks when the function is the execution entrance of a transaction.

Secondly, SOLYTHESIS uses global memory arrays to store free variable bindings instead of using intra-procedure volatile memory or the blockchain state to reduce state load times. It uses inline EVM assembly to initialize or load a global array. For example, in Figure 3, x_arr and y_arr should be declared global memory arrays.

Besides, for those state variable values accessed multiple times in one transaction and not related to the values of other variables, since blockchain state load/store operations are costly, SOLYTHESIS will cache these values by temporary variables. If multiple functions access the state variable value, SOLYTHESIS will create a temporary variable in the global memory to cache it; otherwise, it will be in the stack. At the end of the transaction, SOLYTHESIS will store the updated temporary values back in the blockchain state.

Here is our example of caching state variable values.

image-20221112182732470

Figure 4. Vote function with state variable caches

SOLYTHESIS under evaluation

In the evaluation section, the authors collect the 23 most widely used smart contracts in ERC20, ERC721, and ERC1202 standards. The specific invariants for ERC20 and ERC721 are shown in Figure 5 and Figure 6, respectively.

image-20221113163228182

Figure 5. Invariant for ERC20

image-20221113163212457

Figure 6. Invariant for ERC721

Taking the current capacity of CPU and disk storage devices and the cost of solving PoW problems into account, the overhead introduced by SOLYTHESIS is negligible. CPU usage of Parity (an Ethereum client) is lower than 4% for the SOLYTHESIS instrumented benchmarks. Besides, with the help of delta update and delta check, SOLYTHESIS can maintain a stable overhead despite a growing contract state. Therefore, adding runtime validation will improve the security significantly without downgrading the performance of Ethereum.

But in the cases where they remove the consensus layer, the average slowdown of TPS caused by the SOLYTHESIS instrumentation is 24%.

Conclusion, limitations, and future work

What does SOLYTHESIS give us?

It’s such an exciting idea if it is adopted in the blockchain system in the future. Generally speaking, given a smart contract and a set of user-defined invariants, SOLYTHESIS can produce a new enriched contract that will reject all transactions violating the invariants with these features.

  • Allow users to specify critical safety invariants, including quantifiers, which are especially useful under ledger scenarios because Ethereum maps the address to balance-like critical values.
  • Runtime validation at the Solidity level. By turning the offending transactions into no-ops, SOLYTHESIS can mitigate the detected errors.
  • Use delta update to determine whether the modification of the original variable causes the intermediate value to change.
  • Use delta check and quantifiers at the end of a transaction. By maintaining a set of changed state values and the corresponding intermediate values, utilize these indexes in the quantifiers to check potentially violated constraints at the end of a transaction.
  • Through call graph, global memory, and caching state values, SOLYTHESIS further minimizes the number of blockchain state accesses to improve performance.

Why is it not practical right now?

As the authors mention in their evaluation, SOLYTHESIS instrumentation will cause a 24% downgrading of the performance when the consensus layer is no longer the bottleneck. Besides, gas fee caused by additional instrumented code is not negligible. The tradeoff between security and performance/money cost needs further consideration.

Any possible follow-up improvement?

For their original implementation:

  1. Add array deduplication in delta check. SOLYTHESIS maintains multiple arrays of changed state values. Sometimes, the indexes in these arrays are paired to delegate a value in a multidimensional array. For example, array a is combined with issueId, and array b is combined with option. If a = [A, A, A], b = [1, 2, 1], the corresponding pairs are [A][1], [A][2], [A][1]. We can remove the duplicate elements to reduce computation.

New direction:

  1. A new language or modified Solidity. The instrumented contracts by SOLYTHESIS are unreadable… It may not be a good idea to insert too many lines of meaningless naming parameters into the code. Can we design a new language with embedded runtime validation, like require (specific invariants with intermediate values) controlled by (state variables)?
  2. Besides, according to the designed language above, it will be possible to introduce a new gas mechanism in EVM by distinguishing codes for runtime validation from others.
  3. Proposing some new ways of transferring resources with high security will be promising. For example, Move, a new programming language, is popular because of its novel resource-oriented programming design.
  4. Runtime validation support from other layers. Gather some events together and check them by the VM?