
Comment by mdaniel

3 years ago

> Hasn't Cyc impressively demonstrated just how incredibly difficult and costly it is to formalize even the most basic matters of daily life?

I would offer that the "cost/benefit" analysis for such a formalism exists on at least two axes: the concept domain one is attempting to formalize, and the benefit to (and/or number of) consumers of any such working system.

I can wholly understand that trying to translate the entirety of English into a formal logic system sounds overwhelming. But to side with a sibling commenter, why not at least start with the tax code, which is a personal pain point, has (presumably) a correct outcome for some cases, and is mostly algorithms-in-English?

And then, for the consumer side: ok, if I snapped my fingers and Cyc existed and worked, I struggle to think how exactly my life would change. If a formally specified tax code existed and worked, I wouldn't have to rage-upvote almost every comment on the annual tax hatred thread.

I would even offer that an incomplete version could still be useful if one left "fuzzy" variables in the corpus and said "welp, we can't define what a $Person is because of the hundreds of years of precedent, so you'll need an actual Judge for that". I don't mean to say that 50% of the corpus can be undefined variables, that's just silly, but I'd hope the tax code isn't built upon 50% undefined behavior, even if accountants want you to think it is.
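A minimal sketch of the "fuzzy variable" idea, assuming a three-valued evaluation where any predicate may decline to answer. Everything here is hypothetical and for illustration only: the names (`is_person`, `income_over_threshold`, `must_file`), the 10,000 threshold, and the rule itself are made up and not taken from any real tax code or existing system.

```python
from enum import Enum
from typing import Callable, Optional, Sequence


class Verdict(Enum):
    YES = "yes"
    NO = "no"
    NEEDS_JUDGE = "needs_judge"  # an undefined ("fuzzy") variable blocked the answer


# A predicate returns True/False when it can decide, or None when it cannot.
Predicate = Callable[[dict], Optional[bool]]


def is_person(facts: dict) -> Optional[bool]:
    # Hypothetical fuzzy variable: deliberately left undefined in the corpus.
    # It only has a value if a human (a judge) has already supplied one.
    return facts.get("is_person")


def income_over_threshold(facts: dict) -> Optional[bool]:
    # Hypothetical crisp rule with a made-up threshold.
    return facts["income"] > 10_000


def all_of(facts: dict, predicates: Sequence[Predicate]) -> Verdict:
    """Three-valued AND: a definite False wins, otherwise any unknown defers."""
    results = [p(facts) for p in predicates]
    if any(r is False for r in results):
        return Verdict.NO
    if any(r is None for r in results):
        return Verdict.NEEDS_JUDGE
    return Verdict.YES


def must_file(facts: dict) -> Verdict:
    # Hypothetical rule: a person with income over the threshold must file.
    return all_of(facts, [is_person, income_over_threshold])


if __name__ == "__main__":
    print(must_file({"income": 5_000}))
    print(must_file({"income": 50_000}))
    print(must_file({"income": 50_000, "is_person": True}))
```

The point of the design is that an undefined variable only forces a human decision when it actually matters: in the first call the crisp income rule already settles the answer, so the question of what a $Person is never has to be asked.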

> why not at least start with the tax code

There have been many such attempts (e.g. NKRL by Zarri et al., also funded by the EU). There are even societies that have been dealing with such issues for many decades (e.g. http://www.iaail.org). The formalization of law and language is only one of the issues. Like many previous attempts, this one suffers from the fuzziness of human language (even in the case of the tax code). Fuzziness is not a drawback; it is what makes it possible to communicate efficiently in the first place. To communicate effectively, we rely on an enormous amount of tacit knowledge about our environment that our culture and life experience bring. If one tries to formalize the language, as in the present approach, one must also take this knowledge into account, down to the last detail (an "upper ontology" is by far not sufficient for this, and Cyc is still not finished after decades). Both the tacit knowledge and its moral valuation also change over time. And there are things like https://en.wikipedia.org/wiki/Sorites_paradox which stand in the way of a complete formalization. Lenat's 1990 book addressed many of these issues, and his more recent talks are also very informative: in them he demonstrates how they had to extend the Cyc representation language to cope with the problem, and why e.g. RDF triples are not enough.