Comment by drob518
5 months ago
SAT, SMT, and constraint solvers are criminally underutilized in the software industry. We need more education about what they are, how they work, and what sorts of problems they can solve.
5 months ago
SAT, SMT, and constraint solvers are criminally underutilized in the software industry. We need more education about what they are, how they work, and what sorts of problems they can solve.
At least personally, I've been very underwhelmed by their performance when I've tried using them. Usually past a few dozen variables or so is when I start hitting unacceptable exponential runtimes, especially for problem instances that are unsatisfiable or barely-satisfiable. Maybe their optimizations are well-suited for knapsack problems and other classic OR stuff, but if your problem doesn't fit the mold, then it's very hit-or-miss.
I'm surprised to hear this. Modern SAT solvers can easily handle many problems with hundreds of thousands of variables and clauses. Of course, there are adversarial problems where CDCL solvers fail, but I would be fascinated if you can find industrial (e.g. human written for a specific purpose) formulas with "dozens of variables" that a solver can't solve fairly quickly.
One thing that I spent a particularly long time trying to get working was learning near-minimum-size exact languages from positive and negative samples. DFAMiner [0] has a relatively concise formulation for this in terms of variables and clauses, though I have no way to know if some other reformulation would be better suited for SAT solvers (it uses CaDiCaL by default).
It usually starts taking a few seconds around the ~150-variable mark, and hits the absolute limit of practicality by 350–800 variables; the number of clauses is only an order of magnitude higher. Perhaps something about the many dependencies in a DFA graph puts this problem near the worst case.
The annoying thing is, there do seem to be heuristics people have written for this stuff (e.g., in FlexFringe [1]), but they're all geared toward probabilistic automata for anomaly detection and similar fuzzy ML stuff, and I could never figure out how to get them to work for ordinary automata.
In any case, I eventually figured out that I could get a rough lower bound on the minimum solution size, by constructing a graph of indistinguishable strings, generating a bunch of random maximal independent sets, and taking the best of those. That gave me an easy way to filter out the totally hopeless instances, which turned out to be most of them.
[0] https://github.com/liyong31/DFAMiner
[1] https://github.com/tudelft-cda-lab/FlexFringe
I've worked on a model with thousands of variables and hundreds of thousands of parameters with a hundred constraints. There are pitfalls you need to avoid, like reification, but it's definitely doable.
Of course, NP hard problems become complex at an exponential rate but that doesn't change if you use another exact solving technique.
Using local-search are very useful for scaling but at the cost of proven optimality
I think this hits the nail on the head: performance is the obstacle, and you can't get good performance without some modeling expertise, which most people don't have.
Hence my call for more education.
I wish I knew better how to use them for these coding problems, because I agree with GP they're underutilized.
But I think if you have constraint problem, that has an efficient algorithm, but chokes a general constraint solver, that should be treated as a bug in the solver. It means that the solver uses bad heuristics, somewhere.
I'm pretty sure that due to Rice's theorem, etc., any finite set of heuristics will always miss some constraint problems that have an efficient solution. There's very rarely a silver bullet when it comes to generic algorithms.
5 replies →
Well, they aren’t magic. You have to use them correctly and apply them to problems that match how they work. Proving something is unsat is worst case NP. These solvers don’t change that.
Of course they aren't magic, but people keep talking about them as if they're perfectly robust and ready-to-use for any problem within their domain. In reality, unless you have lots of experience in how to "use them correctly" (which is not something I think can be taught by rote), you'd be better off restricting their use to precisely the OR/verification problems they're already popular for.
1 reply →
In what way? They're useful for toy problems like this but they're very slow on larger problems.
SAT solvers are used daily to generate solutions for problems that have literally millions of variables. So, what you said is just wrong on the face. Yes, some talented people can write custom code that solves specific problems faster than a general purpose solver, particularly for easy special cases of the general problem, but most of the time that results in the programmer recreating the guts of a solver customized to a specific problem. There’s sort of a corollary of Greenspun’s Tenth Rule that every sufficiently complicated program also contains an ad hoc, informally-specified, bug-ridden, slow-implementation of half of a SAT or SMT solver.
I mean right tool for the right job. Plenty of formulations and problems (our job has plenty of arbitrarily hard graph algorithms) that have 90% of the problem just being a very clever reduction with nice structure.
Then the final 10% is either NP hard, or we want to add some DSL flexibility which introduces halting problem issues. Once you lower it enough, then comes the SMT solvers.
Define large. We've written model which solves real business issues in 8K lines of MiniZinc and it wasn't slow.
The conventional wisdom is the larger you make an NP hard problem, the slower is going to get. Irregardless of algorithm.
What are some good books to get started on the subject?