Comment by jacob019

6 days ago

Found the web interface: https://ux.priorlabs.ai/ Really cool!

Just playing around with regression mode...

    A very simple dataset, powers of two: 
    1:2, 2:4, 3:8, 5:32, 6:64, 7:128 (missing the #4 value)
    Predictions (1-10): 
    1.582 5.236 13.150 22.943 37.584 67.475 109.945 155.322 218.001 300.425
    Error (1-10): 
    -26.4% 23.6% 39.2% 30.3% 14.9% 5.2% -16.4% -64.8% -134.9% -240.9% 

... well, it has a positive slope
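
For anyone checking the error row: the percentages appear to be (prediction - actual) / prediction. That formula is inferred from the posted numbers, not stated anywhere, so treat this as a sketch under that assumption. The name `relative_error` is made up for illustration.

```python
# Assumption: error = (prediction - actual) / prediction, inferred from the
# posted numbers rather than documented anywhere.
def relative_error(prediction: float, actual: float) -> float:
    """Signed error as a percentage of the prediction."""
    return (prediction - actual) / prediction * 100.0

# First-run predictions for x = 1..10, copied from the comment above.
# The tenth value is taken as 300.425, which is what the posted
# -240.9% error implies for an actual value of 1024.
predictions = [1.582, 5.236, 13.150, 22.943, 37.584,
               67.475, 109.945, 155.322, 218.001, 300.425]
actuals = [2 ** x for x in range(1, 11)]  # true powers of two: 2, 4, ..., 1024

errors = [round(relative_error(p, a), 1) for p, a in zip(predictions, actuals)]
print(errors)
```

Under that assumption the printed values line up with the posted error row, including the sign flip from over-prediction (positive) to under-prediction (negative) past x = 6.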

Let's see what happens if we first repeat the exact same values 10 times in the dataset.

    Predictions (1-10): 
    1.993 3.967 7.986 18.138 31.965 64.140 128.125 126.607 130.667 161.756 
    Error (1-10): 
    -0.3% -0.8% -0.2% 11.8% -0.1% 0.2% 0.1% -102.2% -291.8% -533.1%

Interesting, repeated values give the model a lot more confidence in the known values. The interpolated #4 value is still off by about 12%. It does not extrapolate well at all.
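
The duplication step can be sketched in a few lines of Python. How the rows were actually entered in the web interface isn't stated, so the one-feature `X` / `y` layout and the variable names here are assumptions.

```python
# The six known points from the comment (x = 4 is deliberately left out).
known = {1: 2, 2: 4, 3: 8, 5: 32, 6: 64, 7: 128}

# Single-copy training set: one feature column (x) and one target (y).
X = [[x] for x in known]
y = [known[x] for x in known]

# Repeat every row 10 times: no new information, only more copies.
COPIES = 10
X_repeated = X * COPIES
y_repeated = y * COPIES

print(len(X), len(X_repeated))  # 6 60
```

The repeated set contains exactly the same six distinct points, so any change in the predictions comes from the copy count alone.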

Looking forward to trying it on real world data with more features.

Yes! This makes sense from a learning perspective: more samples add evidence that the data point really is what you observed. With only one sample, the model stays closer to a mean regression (which in classification would translate to more balanced class probabilities). Transformers have trouble counting repeated entries (a famous ChatGPT failure case was asking it to count the number of 1s and 0s in a string), but this model has some tricks to solve this.
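
The "more copies, more evidence" point can be illustrated with a toy Bayesian analogy. This is not how the model works internally. It is a textbook conjugate-normal update, and the function name, prior, and variances are all hypothetical values picked for illustration.

```python
# Toy analogy only: normal prior on a mean, normal noise with known variance.
# None of these names or numbers come from the model discussed above.
def posterior_mean(prior_mean, prior_var, noise_var, observed, n_copies):
    """Posterior mean after seeing the same `observed` value n_copies times."""
    precision = 1.0 / prior_var + n_copies / noise_var
    return (prior_mean / prior_var + n_copies * observed / noise_var) / precision

# Hypothetical setup: the prior is centered on the average of the known
# targets, and the observation is the target 128 at x = 7.
prior_mean = (2 + 4 + 8 + 32 + 64 + 128) / 6
for n in (1, 10):
    estimate = posterior_mean(prior_mean, 400.0, 400.0, 128.0, n)
    print(n, round(estimate, 2))
```

With one copy the estimate sits roughly halfway between the prior mean and 128; with ten copies it moves most of the way to 128, which is the same qualitative pattern as the two runs above.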