The Time-Traveling Blockchain Developer
The blockchain marches relentlessly forward, adding new transactions and appending new blocks to the decentralized ledger. The ledger is immutable: it forms a permanent, fixed history of every transaction.
The cryptographic state-machine infrastructure that powers this decentralized ledger has driven a remarkable wave of innovation. It has also pushed conventional software development practices, and debugging tools and techniques in particular, to their limits.
Bugs and Debugging on a Chain
State machines can make perfectly accurate mistakes: the code is written by human beings, after all, and bugs are a fact of human endeavor. On a blockchain, however, bugs have enormous repercussions because the underlying ledger is immutable, as anyone familiar with The DAO knows all too well.
As an industry, it behooves us to address this head-on. We need to ensure that developers have the best possible tools, and that our software development practices have the professional maturity our customers and users demand of us.
When we step back and analyze software bugs on a chain, we quickly realize that there are really two related questions: first, what the bug is, and second, how exactly the state machine got into that state.
Logs have traditionally been effective at identifying the actual problem, and as an industry we have become good at capturing exceptions and errors in them. However, logs address at best half of the problem. They act as bread crumbs left for the developer, in the optimistic hope that they will be enough to help the developer reconstruct the scenarios in which the code actually executed on Mainnet.
Armed with the identity of the problem, the developer then enters a tight inner loop of setting breakpoints, running to the cursor, and stepping in and out of methods. Hours, and often days, are spent in this cycle of software archeology, yet as soon as a new, relatively complex problem occurs on Testnet or Mainnet, we are back to square one.
To make matters worse, there are usually a few insidious bugs whose observable effects are separated in time and/or space from the bug itself. At this point, traditional approaches largely leave developers giving up with the classic response: ‘but it works on my machine …’.
A New Approach – Time-Travel Debugging
What if we could accurately and repeatably record events on the chain and then replay them, examining step by step how the chain's state changes, and when and why particular actions are executed?
What if reproducing those obscure bugs we have all encountered late at night, and fervently wished we could reproduce precisely, were simply a matter of playing the tape backward and forward?
Time-Travel Debugging greatly simplifies debugging by letting developers step forward and backward through code execution. Rather than trying (often unsuccessfully) to reproduce a bug, the developer can simply “rewind” the debugging session. Further, by going back in time with a better understanding of the context, the developer can “replay” the scenario as many times as needed, examining every step that leads up to the bug and working out how best to fix the code.
Time-Travel Debugging accelerates the developer’s edit, compile, build and debug loop by letting the developer record an execution of code running on the blockchain and then replay it, moving freely to any earlier or later point in that execution. It keeps the overhead of recording as low as possible by capturing code execution in trace files.
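To make the record-and-replay idea concrete, here is a minimal Python sketch. It is not Neo's implementation: the `Trace`, `record` and `Replayer` names and the toy `mint`/`transfer` methods are hypothetical, and storing a full copy of state after every step is the simplest possible scheme, far heavier than a real trace file. What it does show is the key property: once an execution has been recorded, stepping backward is just reading an earlier entry, with no need to reproduce the bug.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical contract storage: key -> integer value.
Storage = Dict[str, int]
# A program step: a label plus a function that mutates storage in place.
Step = Tuple[str, Callable[[Storage], None]]


@dataclass
class Trace:
    """State captured after every executed step (index 0 = initial state)."""
    labels: List[str] = field(default_factory=list)
    snapshots: List[Storage] = field(default_factory=list)


def record(program: List[Step], initial: Storage) -> Trace:
    """Execute the program once, copying storage after each step.
    (A real recorder would write compact deltas to a trace file instead.)"""
    storage = dict(initial)
    trace = Trace(labels=["<start>"], snapshots=[dict(storage)])
    for label, op in program:
        op(storage)
        trace.labels.append(label)
        trace.snapshots.append(dict(storage))
    return trace


class Replayer:
    """Moves a cursor over a recorded trace in either direction.
    Nothing is re-executed; every state is read from the trace."""

    def __init__(self, trace: Trace) -> None:
        self.trace = trace
        self.position = 0

    def state(self) -> Storage:
        # Return a copy so callers cannot alter the recording.
        return dict(self.trace.snapshots[self.position])

    def step_forward(self) -> Storage:
        if self.position < len(self.trace.snapshots) - 1:
            self.position += 1
        return self.state()

    def step_back(self) -> Storage:
        if self.position > 0:
            self.position -= 1
        return self.state()


# Two hypothetical contract methods used only for this illustration.
def mint(s: Storage) -> None:
    s["alice"] = s.get("alice", 0) + 100


def transfer(s: Storage) -> None:
    s["alice"] -= 30
    s["bob"] = s.get("bob", 0) + 30


trace = record([("mint", mint), ("transfer", transfer)], {})
replayer = Replayer(trace)
replayer.step_forward()  # state after "mint"
replayer.step_forward()  # state after "transfer"
replayer.step_back()     # rewind to after "mint" without re-running anything
print(trace.labels[replayer.position], replayer.state())  # prints: mint {'alice': 100}
```

The replayer never calls `mint` or `transfer` again; that separation between recording once and navigating many times is what makes rewinding cheap and repeatable.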
What makes Time-Travel Debugging unique
Historically, telemetry and logs have provided a lightweight, scenario-driven way to identify problems in code. However, ‘unknown unknowns’ are the Achilles’ heel of this approach: issues arise in unexpected code paths that have no telemetry.
Dumps are another low-tech approach to debugging: they require no upfront coding, and successive dumps or snapshots create the illusion of a simple time-series view.
Live debugging is the traditional practice of forward-only debugging: breakpoints, watchpoints and other general runtime-debugger features serve as markers to help the developer find bugs. The developer runs the code until it hits a breakpoint and analyzes any issues that appear in that particular section of code. In some cases this works perfectly well; usually, however, the developer does not know which section of the code caused the error, which is a major problem when the code is running on a blockchain.
Traditional debuggers let the developer step through code line by line, on the lookout for a bug. This approach is much harder to apply to hard-to-reproduce bugs, because the developer has little or no information about the circumstances that led to the bug.
Compounding this are the devious bugs mentioned earlier, whose observable effects are separated in time and/or space from the bug itself. A classic example is a state-read method, which may itself be bug-free, returning corrupted state that was written earlier by one or more buggy functions.
Time-Travel Debuggers are especially helpful for these failures, because the programmer can walk through the program’s execution backward as well as forward to home in on a point of interest, approaching the root cause from both ends of the program instead of one.
Time-Travel Debugging enables a powerful set of techniques and tools that excel at helping developers fix complex bugs. Further, when a developer cannot work out why a bug is happening, they can share the recording with a co-worker, who can examine with full fidelity exactly what the original developer was looking at. This makes collaboration and team development significantly easier.
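The following Python sketch models this situation. It is purely illustrative and not based on Neo's tooling: the storage keys and the `deposit`, `charge_fee`, `bump_counter`, `get_balance`, `record` and `last_writer` functions are all hypothetical. A buggy `charge_fee` step corrupts a balance, an unrelated step runs afterward, and a bug-free `get_balance` read reports the wrong value. Because every step was recorded, `last_writer` can walk backward from the point where the bad value was observed to the step that actually wrote it.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical contract storage: key -> integer value.
Storage = Dict[str, int]
Step = Tuple[str, Callable[[Storage], None]]
# Each trace entry: (label of the step just executed, copy of storage after it).
Trace = List[Tuple[str, Storage]]


def deposit(s: Storage) -> None:
    s["balance"] = s.get("balance", 0) + 50


def charge_fee(s: Storage) -> None:
    # BUG (deliberate, for illustration): the intended fee is 5, not 500.
    s["balance"] = s.get("balance", 0) - 500


def bump_counter(s: Storage) -> None:
    # Unrelated step that never touches the balance.
    s["calls"] = s.get("calls", 0) + 1


def get_balance(s: Storage) -> int:
    # Bug-free read method: it faithfully reports whatever was stored.
    return s.get("balance", 0)


def record(program: List[Step], initial: Storage) -> Trace:
    """Run the program once, keeping a copy of storage after each step."""
    storage = dict(initial)
    trace: Trace = [("<start>", dict(storage))]
    for label, op in program:
        op(storage)
        trace.append((label, dict(storage)))
    return trace


def last_writer(trace: Trace, key: str, upto: int) -> Optional[str]:
    """Walk backward from trace[upto] and return the label of the most
    recent step that changed `key`, or None if no step changed it."""
    for i in range(upto, 0, -1):
        if trace[i - 1][1].get(key) != trace[i][1].get(key):
            return trace[i][0]
    return None


program: List[Step] = [
    ("deposit", deposit),
    ("charge_fee", charge_fee),
    ("bump_counter", bump_counter),
    ("bump_counter", bump_counter),
]
trace = record(program, {})
print(get_balance(trace[-1][1]))                      # prints: -450
print(last_writer(trace, "balance", len(trace) - 1))  # prints: charge_fee
```

Forward-only debugging would start at the read, where nothing is wrong; searching the recording backward points straight at the step that introduced the bad value.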
Looking ahead
Welcome to the world of the time-traveling developer.
Our goal is to improve the lives of developers, and once again Neo is raising the bar for the developer experience.
We are excited to deliver Time-Travel Debugging to our developers and to our communities worldwide. For the first time, blockchain developers can go forward and backward in time to understand, analyze and fix complex bugs, all while working at the source code level. We are pushing the envelope as we continue to deliver on our promise of the most developer-friendly blockchain platform.
The Neo team continues its mission to bring millions of developers to Neo and the blockchain world, and to realize the vision of the Smart Economy.