Patching a World
9 months ago By Vera

Why We Entered Maintenance Mode

Late Friday night, during the first week of playtests, we began to get reports that Sky Strife matches could not be joined, started, or finished. We quickly jumped online and confirmed for ourselves that this was the case. There was no obvious reason why this was happening, and it was the middle of the night on a weekend, so we put up a maintenance mode banner while we diagnosed the issue.

Note: the contracts were live the entire time; we simply cut off access through our client. Yay, onchain games!

Our first hint towards a solution was that we could complete all of these actions using in-browser burner wallets, but not with MetaMask. After some debugging we realized that 1) these transactions were consuming up to 45 million gas and 2) MetaMask has a built-in 30 million gas limit on any transaction. The game was technically working, but something was seriously wrong: previously, these same transactions had cost around 1 million gas each. We decided we would need to patch the affected systems, ideally without wiping any existing World state.

A Note on Gas Use

In general, our attitude towards gas optimization on Sky Strife has been lax. We much prefer speed of development and easy-to-understand code over esoteric Solidity that is technically more performant. We write our systems the quickest way possible so we can iterate on gameplay and general UX, then loop back to especially egregious gas offenders once things are solid. All this is to say: if these gas numbers seem insane, they are, though less so given our style of development.

Finding the Problem

Once the game was in maintenance mode, we could take a breather and audit the affected systems. One thing we quickly noticed was that all of them used the KeysInTable module for onchain querying. If you haven't used it, KeysInTable is a module that stores the set of keys contained in a given table. It is used to answer questions like “does key X exist in table Y?” We used it liberally across our systems because having these queries available makes development incredibly fast.
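To make the mechanism concrete, here is a simplified, hypothetical model of a keys-in-table index in plain Solidity. It is not MUD's KeysInTable code; the contract, the mappings, and the onSet/onDelete hook names are invented for illustration. What matters is the shape: a membership lookup that stays cheap, backed by an array of keys that grows with every new key written to the table.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity >=0.8.0;

/// Simplified, hypothetical model of a "keys in table" index.
/// NOT MUD's KeysInTable implementation; all names are illustrative only.
contract KeyIndexSketch {
  // tableId => every key currently present in that table
  mapping(bytes32 => bytes32[]) private keysInTable;
  // tableId => key => (position in keysInTable + 1); 0 means "absent"
  mapping(bytes32 => mapping(bytes32 => uint256)) private positionPlusOne;

  /// What a storage hook would call on every write to `tableId`.
  function onSet(bytes32 tableId, bytes32 key) external {
    if (positionPlusOne[tableId][key] != 0) return; // already indexed
    keysInTable[tableId].push(key);
    positionPlusOne[tableId][key] = keysInTable[tableId].length;
  }

  /// What a storage hook would call when a record is deleted (swap-and-pop).
  function onDelete(bytes32 tableId, bytes32 key) external {
    uint256 pos = positionPlusOne[tableId][key];
    if (pos == 0) return; // not indexed
    bytes32[] storage keys = keysInTable[tableId];
    bytes32 last = keys[keys.length - 1];
    keys[pos - 1] = last; // move the last key into the freed slot
    positionPlusOne[tableId][last] = pos;
    keys.pop();
    delete positionPlusOne[tableId][key];
  }

  /// Answers "does key X exist in table Y?" in constant time.
  function has(bytes32 tableId, bytes32 key) external view returns (bool) {
    return positionPlusOne[tableId][key] != 0;
  }

  /// Returns every key in the table; cost grows with the number of keys.
  function getKeys(bytes32 tableId) external view returns (bytes32[] memory) {
    return keysInTable[tableId];
  }
}
```

Membership checks stay constant-time, but the array behind getKeys keeps growing for as long as new keys arrive in the table.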

Upon noticing this pattern, something dawned on us: at the time, Sky Strife was using an outdated version of MUD, one that did not include a key performance improvement related to writing to arrays. In our version, a table update emitted only one event, StoreSetRecord, which included all of the data for that key, even if that data was a large array. Adding a single key to a large KeysInTable index therefore meant emitting its entire array again.

One example of a table that we used KeysInTable on was the Player table. Players are created in Sky Strife each time a wallet registers for a match. When someone tries to register, we need to know whether a Player for them already exists in that match; if it does, they are not allowed to join. Needless to say, we had many Player records, which created massive arrays inside the KeysInTable module.

The latest version of MUD fixes this with two new events for exactly these situations, StoreSpliceStaticData and StoreSpliceDynamicData, which emit only the portion of a record that changed. Unfortunately for us, upgrading MUD in the middle of the playtest was not going to be possible without a full redeploy of the World.
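A rough way to see why whole-record events hurt is to compare emitting the full array on each append with emitting only the appended element. The sketch below is a hypothetical model: FullRecordSet and ArraySpliced are stand-in events named for illustration and do not reproduce the real StoreSetRecord or splice event signatures.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity >=0.8.0;

/// Hypothetical model contrasting two ways of announcing an array append.
/// Event names and shapes are illustrative, NOT MUD's real event signatures.
contract EventCostSketch {
  // Old style: re-emit the whole record on every change
  event FullRecordSet(bytes32 indexed tableId, bytes32[] record);
  // Splice style: emit only the slice that changed
  event ArraySpliced(bytes32 indexed tableId, uint256 start, bytes32[] data);

  mapping(bytes32 => bytes32[]) private arrays;

  function pushWithFullRecord(bytes32 tableId, bytes32 value) external {
    arrays[tableId].push(value);
    // Copy and log the whole array: cost grows with its length
    bytes32[] memory record = arrays[tableId];
    emit FullRecordSet(tableId, record);
  }

  function pushWithSplice(bytes32 tableId, bytes32 value) external {
    bytes32[] storage arr = arrays[tableId];
    arr.push(value);
    // Log a one-element slice: constant size regardless of array length
    bytes32[] memory data = new bytes32[](1);
    data[0] = value;
    emit ArraySpliced(tableId, arr.length - 1, data);
  }

  function lengthOf(bytes32 tableId) external view returns (uint256) {
    return arrays[tableId].length;
  }
}
```

In the full-record version, both the log data and the storage reads needed to build it grow with the array's length, so each append costs more than the last; the splice-style version emits a fixed-size payload no matter how long the array gets.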

We quickly tested our hypothesis by removing the uses of KeysInTable and sending transactions on a local dev node. We created dozens of matches without any increase in gas use. We had found our culprit.

Emergency Smart Contract Dev

Simply removing the uses of KeysInTable was not a solution. They were there for a reason, after all: storing information about matches for use in downstream systems. The simplest fix we could think of was to replace the generic KeysInTable indices with our own custom indices that we would keep up to date ourselves.

For example, instead of relying on an onchain query to get the players currently in a match, we created a MatchPlayers: bytes32[] table that referenced all of the players in each match, and appended to that array every time a player joined. Downstream systems could then read that information without an expensive query. This sounds obvious now, but it is much simpler not to worry about keeping these tables updated manually and to rely on table storage hooks like KeysInTable instead.
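A minimal sketch of the manual-index idea follows, assuming plain Solidity mappings stand in for MUD tables. The names (MatchPlayersSketch, joinMatch, playerOf) and the way player IDs are derived are hypothetical, not Sky Strife's actual code; only the per-match player array mirrors the MatchPlayers table described above.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity >=0.8.0;

/// Hypothetical sketch of a hand-maintained "manual index" using plain
/// mappings instead of MUD tables. All names here are illustrative.
contract MatchPlayersSketch {
  // matchId => player IDs currently in the match (the MatchPlayers index)
  mapping(bytes32 => bytes32[]) private matchPlayers;
  // matchId => wallet => player ID (zero if the wallet has not joined)
  mapping(bytes32 => mapping(address => bytes32)) private playerOf;

  function joinMatch(bytes32 matchId) external returns (bytes32 playerId) {
    // Constant-time "does this wallet already have a Player here?" check
    require(playerOf[matchId][msg.sender] == bytes32(0), "already joined");
    // Hypothetical ID scheme: deterministic per (match, wallet)
    playerId = keccak256(abi.encode(matchId, msg.sender));
    playerOf[matchId][msg.sender] = playerId;
    // Keep the index up to date ourselves, in the same call as the write
    matchPlayers[matchId].push(playerId);
  }

  function getMatchPlayers(bytes32 matchId) external view returns (bytes32[] memory) {
    return matchPlayers[matchId];
  }
}
```

The index is updated in the same call as the write it tracks, and each array is scoped to one match, so its size is bounded by that match's player count rather than by every Player record ever created.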

We made a number of these “manual index” tables to support all of the systems for joining, starting, and finishing matches. We ran our test suite and everything passed, and we were able to run local matches without problems. Now we needed to get it in front of players.

Patching a World

The next obstacle was that, at the time, no official MUD2 dev tools existed for hot-swapping existing systems. Working with alvarius, we created some impromptu scripts to accomplish this. Here is an example:

// SPDX-License-Identifier: MIT
pragma solidity >=0.8.0;

import "forge-std/Script.sol";

import { IWorld } from "../src/codegen/world/IWorld.sol";
import { Systems } from 
  "@latticexyz/world/src/modules/core/tables/Systems.sol";
import { FunctionSelectors } from 
  "@latticexyz/world/src/modules/core/tables/FunctionSelectors.sol";

import { MatchSystem } from "../src/systems/MatchSystem.sol";

import "forge-std/console.sol";

contract PatchMatchSystem is Script {
  function run() external {
    uint256 deployerPrivateKey = vm.envUint("PRIVATE_KEY");

    vm.startBroadcast(deployerPrivateKey);
    // Sky Strife playtest World address
    IWorld world = 
      IWorld(
        address(0xB41e747bC9d07c85F020618A3A07d50F96703A78)
      );

    // First 16 bytes are the namespace, 
    // second 16 are the system name
    // Root namespace is blank
    // Fetch the existing system for a sanity check
    bytes32 matchSystemId = bytes32(abi.encodePacked(
      bytes16(""), bytes16("MatchSystem")));
    (address addr, bool publicAccess) = 
      Systems.get(world, matchSystemId);
    console.log("MatchSystem address: %s, public access: %s", 
      addr, publicAccess);

    // Deploy the new system and hot-swap it in the World
    world.registerSystem(matchSystemId, 
      new MatchSystem(), publicAccess);
    (addr, publicAccess) = Systems.get(world, matchSystemId);
    console.log(
      "new MatchSystem address: %s, public access: %s", 
      addr, publicAccess);

    // In this case we added a new function to the system
    // We need to manually register the function selector as
    // this is normally done during the default deploy script
    // adminDestroyMatch was needed to end the bugged 
    // matches that people attempted to join before, 
    // and return the spent 🔮 to the match creators
    FunctionSelectors.set(
      world,
      MatchSystem.adminDestroyMatch.selector,
      matchSystemId,
      MatchSystem.adminDestroyMatch.selector
    );

    vm.stopBroadcast();
  }
}

With this script we were able to 1) redeploy MatchSystem and 2) register the function selector for a new function we added as part of the changes. Soon enough this will become a CLI tool built into MUD that can generate and run scripts like this.
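One detail in the script worth unpacking is the system ID: a 16-byte namespace (all zero bytes for the root namespace) followed by a 16-byte system name, packed into a single bytes32. The hypothetical helper library below, which is not part of MUD's API, shows the same layout using shifts and how to recover each half.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity >=0.8.0;

/// Hypothetical helper mirroring the packing used in the patch script:
/// first 16 bytes = namespace, last 16 bytes = name. Not a MUD API.
library SystemIdSketch {
  function encode(bytes16 namespace, bytes16 name) internal pure returns (bytes32) {
    // Same layout as bytes32(abi.encodePacked(namespace, name))
    return bytes32(namespace) | (bytes32(name) >> 128);
  }

  function namespaceOf(bytes32 id) internal pure returns (bytes16) {
    return bytes16(id); // keeps the leftmost (high-order) 16 bytes
  }

  function nameOf(bytes32 id) internal pure returns (bytes16) {
    return bytes16(id << 128); // shift the low 16 bytes into the high half
  }
}
```

Because the root namespace is all zero bytes, a root system's ID is just its name shifted into the low half of the word.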

Conclusion

In the end, we had a little over 2 days of downtime during the playtest because of these gas issues. This hurt the community's momentum, and match creation took a while to return to pre-downtime levels. As we approach mainnet launch, we know this amount of downtime would be absolutely unacceptable for a live game. Now that gameplay won't be changing significantly before launch, we're also spending a tremendous amount of time optimizing gas use across all of our game systems. Luckily, we hit these issues now, before the stakes are much higher!