Comment by SebastianFish
5 years ago
I think the problem this product is trying to solve, making it easier to implement rigorous testing, is an important one. When I talk to junior developers I compare enterprise software development to painting. When you paint a room, most of your time is actually spent taping out all of the edges and boundaries. The verb painting is a misnomer. Similarly, it isn't a fast or easy for devs to see all of the boundary conditions in inherited code bases and write meaningful tests for them.
You mention in the description about testing code for all possible inputs. Are you all actually doing this behind the scenes with an SMT solver? Are you using some other proof assistant (COQ, HOL, etc) to be able to use proof-by-induction techniques and thus manage the path explosion problem?
I tried out an example comparing two multiplication algorithms (see below) and the tool said it couldn't find any issues in about a minute. Curious whether it will scale to larger problem sizes.
// implementation 1 int mult(int x, int y) { return x*y; } // implementation 2 int slow_mult(int x, int y) { int result=0; if(x<0) { for(int i=x; i<0; i++) { result-=y; } } else { for(int i=0; i<x; i++) { result+=y; } } return result; }
Yeah I think that's a great analogy, we noticed the same thing when we used to work together at our last jobs which is what motivated to start this venture.
On the implementation side, yes we're using Z3, you can have a poke around at the core symbolic executor at https://github.com/SymbolicaDev/Symbolica
In terms of managing the path explosion problem, we have a few techniques that we've prototyped locally. For instance we believe we can do quite a bit by canonicalising and caching past results. We've also made our executor amenable to parallelisation from the start. We hope to start using these techniques in our hosted version soon.
With your example, maybe I'm missing something here, but they look like they are functionally equivalent to me. When I just tried running some values by hand they even appear to overflow in the same way. What was the assertion that you were making and what was the result you were expecting?
I think the surprise comes from how long it takes to get the answer, not that it's incorrect.
By comparison, gcc is able to reduce those two implementations to the same thing and prove them equal in a fraction of the time (https://godbolt.org/z/6oMEd8ebY). If it takes Symbolica a minute for the same tiny example, how does it work on real codebases?
Ah right, makes sense, thanks. Most of the time on the playground is spent booting a build server (we're using GitHub actions behind the scenes just for this part of the playground) and then building the code. I just ran this example myself and the build part took around 50 seconds before it actually sent the bitcode off to be analysed.
We need to add some more detailed diagnostic outputs to the actual execution part so that we can see the timings more clearly, but as a rough approximation from looking at the logs the execution API took about 10 seconds to run and some of that time would be in waiting for the message to be dequeued and for the function to wake up and process the job. So it's taking maybe around 5 seconds right now to analyse this one.
I know that's still quite slow, but we've not yet spent any real time on optimisations to the execution part so we're quite confident we can make this bit faster. As for the constant build overhead, this is something that's only really applicable to our playground, as users would likely build their code locally or have already built it as part of a CI run. Given all the feedback on this slowness though we are planning to make some improvements to this build part of the playground too to help reduce this constant overhead, or at least make it more obvious that this is what is causing the initial part of the waiting time.
This is interesting, it seems like it may in fact be testing by looping, given it took a minute, and didn't conclude. For hardware design there are logic equivalence checking tools that work very differently - they essentially work by reducing a function (piece of hardware) to a minimized boolean equation. The method of reduction ensures the same logic, no matter how coded, results in the same boolean equation, and then comparing the equations is direct and does not require testing all input values. Hardware verification also has a higher bar than software, where code coverage analysis can require 100% on line, branch, value, condition, expression, functional, etc. Its a lot of work, but does give very high confidence there are zero bugs.
Hey, the reason for the long times on the playground is actually that we’re having to wait for build servers to boot up and then build the code before we can finally ship it off to our backend for analysis. We’ve had quite a bit of feedback on this slowness and we hope to eliminate some of this constant factor boot up and compile time soon, and for what remains make it clear that this is what’s happening in the status box.
So although it might seem like it, we’re not actually doing any looping, we’re doing constraint solving as you’ve mentioned.