Comment by patcon

6 months ago

Neat! Might this even be useful for imputing missing data in a sparse network of votes, for a system like pol.is, whose goal is to do dimensionality reduction and visualise the opinion space of divisive social topics? See https://gwern.net/doc/sociology/2021-small.pdf

200 voters on 50 statements would fall within the 10,000 sample threshold. This is well within the bounds of some existing conversations with open data, so it could be tested. Potential values on each statement are agree/disagree/pass (+1/-1/0).

https://github.com/compdemocracy/openData/blob/master/brexit...

noahho 6 months ago

Looks like a great use case! We have a method specifically for imputation in the tabpfn-extensions package (https://github.com/PriorLabs/tabpfn-extensions/blob/dbc3f5da...). It needs some cleaning up before I want to highlight it in the notebooks and docs.
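
For readers who want a feel for what imputing a vote matrix involves, here is a minimal sketch. It is not the tabpfn-extensions method, whose API is not shown in this thread. It swaps in a plain nearest-neighbour stand-in, and the names `similarity` and `impute` are hypothetical. Missing votes are assumed to be stored as `None`, separate from an explicit pass (0).

```python
def similarity(a, b):
    """Mean agreement between two voters over statements both voted on."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 0.0
    return sum(x * y for x, y in shared) / len(shared)


def impute(votes, k=3):
    """Fill each None with the sign of the similarity-weighted votes
    of the k most similar voters who did vote on that statement."""
    filled = [row[:] for row in votes]  # leave the input untouched
    for i, row in enumerate(votes):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidates: other voters who voted on statement j.
            neighbours = sorted(
                ((similarity(row, other), other[j])
                 for m, other in enumerate(votes)
                 if m != i and other[j] is not None),
                key=lambda pair: pair[0],
                reverse=True,
            )[:k]
            score = sum(sim * vote for sim, vote in neighbours)
            filled[i][j] = (score > 0) - (score < 0)  # sign: +1, -1 or 0
    return filled


# Toy data: 4 voters x 4 statements, +1 agree / -1 disagree / None missing.
votes = [
    [1, 1, -1, None],
    [1, 1, -1, -1],
    [-1, -1, 1, 1],
    [1, None, -1, -1],
]
filled = impute(votes, k=2)
print(filled[0][3], filled[3][1])  # -1 1
```

A learned model would replace the neighbour vote with a per-statement prediction from the voter's other answers, but the input and output shapes stay the same.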

burglarhobbit 6 months ago

> 200 voters on 50 statements would fall within the 10,000 sample threshold.

I think you misinterpreted the threshold. One voter on 50 statements with (+1/-1/0) would be 1 data point with 50 features. 200 voters would be 200 rows with 50 features, so you would not need to be concerned about the 10,000 sample threshold. Hope that helps your study.
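
To make the row-versus-cell distinction concrete, here is a small sketch with randomly generated placeholder votes (not real pol.is data). The variable names are illustrative only, and `None` is assumed to mark a statement the voter never saw.

```python
import random

N_VOTERS, N_STATEMENTS = 200, 50
random.seed(0)

# One row per voter, one column per statement.
# Values: +1 agree, -1 disagree, 0 pass, None = not voted on.
votes = [
    [random.choice([1, -1, 0, None]) for _ in range(N_STATEMENTS)]
    for _ in range(N_VOTERS)
]

n_samples = len(votes)       # rows = voters
n_features = len(votes[0])   # columns = statements
n_cells = n_samples * n_features

print(n_samples, n_features, n_cells)  # 200 50 10000
```

The 10,000 figure is the number of cells in the matrix. The sample count that the threshold refers to is the number of rows, which here is 200.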