Comment by llm_trw
15 days ago
You can generate the data synthetically.
We never had the budget to do it, but I do have some notes somewhere on a 2D context-free grammar that generates arbitrarily nested rows and columns, plus CSS styling that was applied to the grammar's XHTML output. It could dynamically generate as much high-quality synthetic data as you wanted. In practice, though, the IBM and similar datasets were plenty big enough for what we could do, even with specialist models.
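For anyone curious what that looks like, here's a rough sketch of the idea. It is not the grammar from my notes. It's a simplified, hypothetical version, assuming the rules Table -> Row+, Row -> Cell+, and Cell -> Text | Table, with a fixed column count per table (no rowspan/colspan). All the names (`gen_table`, `gen_cell`, `gen_css`, `gen_document`, `WORDS`) are made up for illustration. Each call returns the XHTML and a matching ground-truth structure, which is the whole point of synthetic data: the labels come for free.

```python
import random
from xml.sax.saxutils import escape

# Hypothetical vocabulary for leaf cells; any text source would do.
WORDS = ["alpha", "beta", "gamma", "delta", "42", "3.14", "total", "n/a"]

def gen_table(rng, depth, max_depth):
    """Expand Table -> Row+ (each Row -> Cell+); return (xhtml, structure)."""
    n_rows = rng.randint(1, 4)
    n_cols = rng.randint(1, 4)  # same column count for every row keeps the grid rectangular
    rows_xhtml, rows_struct = [], []
    for _ in range(n_rows):
        cells_xhtml, cells_struct = [], []
        for _ in range(n_cols):
            x, s = gen_cell(rng, depth, max_depth)
            cells_xhtml.append(x)
            cells_struct.append(s)
        rows_xhtml.append("<tr>" + "".join(cells_xhtml) + "</tr>")
        rows_struct.append(cells_struct)
    return "<table>" + "".join(rows_xhtml) + "</table>", rows_struct

def gen_cell(rng, depth, max_depth):
    """Expand Cell -> Text | Table, nesting only while depth < max_depth."""
    if depth < max_depth and rng.random() < 0.2:
        inner_x, inner_s = gen_table(rng, depth + 1, max_depth)
        return "<td>" + inner_x + "</td>", {"table": inner_s}
    text = " ".join(rng.choice(WORDS) for _ in range(rng.randint(1, 3)))
    return "<td>" + escape(text) + "</td>", text

def gen_css(rng):
    """Pick a random style so the same structure renders differently."""
    border = rng.choice(["1px solid black", "none", "2px dotted gray"])
    font = rng.choice(["serif", "sans-serif", "monospace"])
    pad = rng.randint(1, 8)
    return (f"table {{ border-collapse: collapse; font-family: {font}; }} "
            f"td {{ border: {border}; padding: {pad}px; }}")

def gen_document(seed, max_depth=2):
    """Build one styled XHTML page plus its ground-truth table structure."""
    rng = random.Random(seed)
    table_x, structure = gen_table(rng, 0, max_depth)
    xhtml = ('<html xmlns="http://www.w3.org/1999/xhtml"><head>'
             f"<style>{gen_css(rng)}</style></head>"
             f"<body>{table_x}</body></html>")
    return xhtml, structure
```

You'd then render each page (e.g. with a headless browser) to get image/label pairs. That step isn't shown, and spanning cells would need extra rules in the grammar.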
It depends on what you're doing, really. I thought we'd done pretty well; then someone on HN reached out with a table that spanned 50 pages, and I just gave up.
Feel free to drop me an email if you'd like a quick chat. I find the state of table models particularly abysmal given how important they are.