Comment by fogleman

2 years ago

I think this is flawed. You quickly end up on a color that's clearly not "blue" or "green" and you're unlikely to keep hitting "this is green" several times in a row, conceding that ok, fine, maybe this is blue, whatever. You're basically measuring how many times people are willing to click the same button in a row.

Edit: Possible improvements: changing the wording to "this is MORE green" and "this is MORE blue" and randomizing the order in which they are shown, somehow. I realize you're just doing some kind of binary search, narrowing the color range.

This is not to mention color calibration of your monitor, or your eyes adjusting / fatiguing to the bold color over time...

The order is randomized. Hit reset and you'll get a different sequence. The sequence is also adaptive (not a binary search; it targets specific points on the tail of a sigmoid in a logistic regression that it builds as you go along). Try it a few times and you'll see how reproducible it is for you.

It of course depends on the calibration of your monitor. One of the reasons I did this project is I wanted to see if there were systematic differences in color names and balance in the wild, for example, by device type (desktop vs. Android vs. iPhone), time of day (night mode), country (Sapir-Whorf), etc.
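For readers wondering what "adaptive, but not a binary search" could look like, here is a minimal sketch. It is not the site's actual code: the 0-to-1 hue scale (0 = clearly green, 1 = clearly blue), the 0.25/0.75 target probabilities, and the names `fit_logistic`, `next_stimulus`, and `run_adaptive` are all assumptions made for illustration. After each answer it refits a logistic curve and places the next color where that curve currently predicts a 25% or 75% chance of "blue".

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(hues, answers, steps=500, lr=0.5, l2=0.01):
    """Fit P(blue | hue) = sigmoid(w * (hue - 0.5) + c) by gradient ascent.

    The small L2 penalty on w keeps the fit finite when the answers are
    perfectly separable. Returns (w, c).
    """
    w, c = 1.0, 0.0
    n = len(hues)
    for _ in range(steps):
        grad_w = grad_c = 0.0
        for hue, y in zip(hues, answers):
            x = hue - 0.5
            err = y - sigmoid(w * x + c)
            grad_w += err * x
            grad_c += err
        w += lr * (grad_w / n - l2 * w)
        c += lr * (grad_c / n)
    return w, c


def next_stimulus(w, c, target_p):
    """Hue where the fitted curve predicts P(blue) = target_p, clamped to [0, 1]."""
    w = max(w, 1e-6)  # guard against a flat or inverted fit
    logit = math.log(target_p / (1.0 - target_p))
    hue = 0.5 + (logit - c) / w
    return min(1.0, max(0.0, hue))


def run_adaptive(ask, n_trials=12, seed=None):
    """Run one session; ask(hue) returns 1 for "blue" and 0 for "green"."""
    rng = random.Random(seed)
    hues = [0.0, 1.0]  # anchor trials at the two extremes (an assumption)
    answers = [ask(0.0), ask(1.0)]
    while len(hues) < n_trials:
        w, c = fit_logistic(hues, answers)
        # Randomly aim at the lower or upper tail of the current sigmoid.
        hue = next_stimulus(w, c, rng.choice([0.25, 0.75]))
        hues.append(hue)
        answers.append(ask(hue))
    return hues, answers
```

With a fitted `(w, c)`, the estimated blue/green boundary is the hue where the curve crosses 50%, i.e. `next_stimulus(w, c, 0.5)`.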

  • The sequence itself should be converging however, right? I feel that there should be some random jumps outside of the current confidence interval so that contextual aspects can be filtered out or at least recognized.

    • Yes, exactly this. Because it seems to be converging right now, I quickly get the feeling that there's no meaningful choice, after the first three prompts you end up with something that's neither green nor blue. Re-taking the test gave me a very different score.

      It might work better for me to do some contrastive questioning: show a definite green followed by an intermediary color, then a definite blue followed by an intermediate color.

I'd prefer blue/green/neither.

With the third colour, I just thought "no, that's teal", and my decision was (as you suggested) semi-arbitrary.

  • It is common practice in psychometrics to use two levels in a forced choice and model responses as a logistic regression, which is what's done here. Adding an N/A option turns the thing into an ordered logistic regression with unknown levels, which is tricky to fit, but it's possible. Having done a lot of psychophysics, I can say that having more options generally doesn't make the task easier.

    • Sounds like psychometrics is unsuitable for modeling this problem, according to what you're saying. When you have a hammer everything looks like a nail.

    • The way that XKCD did it is the best, you ask people to give a name to each color then the responses are entirely natural and unprompted.

      I don’t think that forced choice can give accurate results if a substantial number of people perceive green and blue as being non-adjacent - i.e. there exists a color between green and blue (turquoise/cyan/teal).

      Otherwise it’s like asking people whether a color is red or yellow, when it’s clearly a shade of orange.

    • That’s why I took the test 5 times, and my scores varied between 63% and 69% “green” so I took the average at 66.4

    • Are you sure that it is common practice for a problem that has three valid answers A, B and C, to only allow people to answer A or C?

      Your website is not talking about "levels" of colour.

      It's asking "is this blue or green", not "is this closer to blue or closer to green".

      The question (1) "is this blue or green" has three valid answers: blue, green or neither.

      The question (2) "is this closer to blue or green" only has two valid answers.

      I would assume that with these types of surveys, the first thing to do is to qualify the proper categorization of the question.

      Sorry to say, but to me it seems that almost all of the confusion in the discussion here is because you're asking question (1) (which has three valid answers) but expecting an answer from (2) (which indeed has two valid answers).

  • Then everything past the first blue would be neither and you wouldn’t have anything interesting. All those other colors are different shades of teal.

  • But this choice has very limited impact, as you are already in a very narrow window of color.

I definitely have the bias you mention. In my case I don't think it's mainly due to not wanting to push the same button many times in a row, but because I compare with the previous color, so if previously I was already somewhat unsure but I chose green and now it became slightly bluer, it "must" be blue, right?

I think I can get over it, but it requires conscious effort and even then, who knows. Bias is often unconscious.

Another possible improvement would be to alternate the binary search colors with some randomly-generated hues. Even if those answers are outright ignored, and the process becomes longer, I think they would help to alleviate that bias. At least you wouldn't be directly comparing to the previous color.
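A rough sketch of that interleaving idea, taking the binary-search framing at face value (the author says above that the real sequence comes from a logistic regression, so this is illustrative only). The 0-to-1 hue scale and the names `bisect_next` and `run_session` are made up for the example. Every scored trial is preceded by a random filler hue whose answer is thrown away, so no scored color is shown directly after the previous scored one.

```python
import random


def bisect_next(scored):
    """Midpoint of the hue interval still consistent with the scored answers."""
    lo, hi = 0.0, 1.0
    for hue, is_blue in scored:
        if is_blue:
            hi = min(hi, hue)
        else:
            lo = max(lo, hue)
    return (lo + hi) / 2.0


def run_session(ask, n_scored=10, seed=None):
    """Alternate ignored filler trials with scored bisection trials.

    ask(hue) returns True for "blue" and False for "green".
    Returns (scored, shown): the scored (hue, answer) pairs, and the full
    list of (hue, kind) pairs in the order they were displayed.
    """
    rng = random.Random(seed)
    scored, shown = [], []
    for _ in range(n_scored):
        filler = rng.random()
        ask(filler)  # answer is deliberately discarded
        shown.append((filler, "filler"))

        hue = bisect_next(scored)
        scored.append((hue, ask(hue)))
        shown.append((hue, "scored"))
    return scored, shown
```

The cost is exactly what the comment predicts: the session is twice as long for the same number of scored answers.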

VFX engineer here. Yes, we used to calibrate monitors and work in the dark.

However, one of the key people who built our colour pipeline was also colour blind, so it's not actually a requirement, so long as you use the right tools.

Most people aren't that sensitive to colour, especially if it's out of context. A minority of people aren't that good at relative chromaticity either (as in, is this colour bluer/greener/redder than that one), but a lot of people are.

Language affects how you perceive colour as well.

But to say the experiment is flawed, I think, misses the nuance, which is capturing how people see colour _in the real world_. Sure, some people will have truetone on, or some other daily colour balance fiddling. But that's still how people see the world as it is, rather than in isolation.

  • I once worked for a company that had a designer who was color blind. He would always show up wearing the exact same outfit every day: turns out that he was REALLY color blind, and so he just gave up and bought 7 long sleeved shirts and 7 pants, all black. Didn't work out so well for him in the designs... most companies don't want monochrome websites.

One issue with it: I did it 3 times and got 3 very different results.

  • Likewise. I think for me there's quite a wide band of colours in the middle that I consider to be "neither/either", so I'm basically just picking a random answer for those.

    A modified version of the test that finds two boundaries (green/neither/blue) could be interesting.

    Or maybe it just needs to take more samples, in a more random order.

  • Same. Some of them are neither obviously blue nor obviously green, so what the test was measuring for me was what I was thinking about at the time, the decision I'd previously made, whether my mouse was currently hovering over "blue" or "green", etc.
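For the green/neither/blue variant suggested in these replies (and the ordered logistic regression mentioned further up), here is a small sketch of what a two-boundary response model might look like. It only evaluates the model, it does not fit it, and the parameter names `green_edge`, `blue_edge`, and `slope`, as well as the 0-to-1 hue scale, are assumptions for illustration.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def three_way_probs(hue, green_edge, blue_edge, slope):
    """Ordered-logit probabilities of answering green / neither / blue.

    green_edge: hue where "green" and "not green" are equally likely
    blue_edge:  hue where "blue" and "not blue" are equally likely
    slope:      how sharply the answers switch around each edge
    """
    if not green_edge < blue_edge:
        raise ValueError("green_edge must be below blue_edge")
    p_past_green = sigmoid(slope * (hue - green_edge))
    p_blue = sigmoid(slope * (hue - blue_edge))
    return {
        "green": 1.0 - p_past_green,
        "neither": p_past_green - p_blue,
        "blue": p_blue,
    }
```

The width `blue_edge - green_edge` would then be a direct measure of how wide someone's "neither/either" band is, which the two-button version cannot report.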

>I think this is flawed. You quickly end up on a color that's clearly not "blue" or "green" and you're unlikely to keep hitting "this is green" several times in a row, conceding that ok, fine, maybe this is blue, whatever.

I agree with you, the whole thing is flawed when it could be better. When you ask the question "is my blue your blue?", you are evoking the old philosophical question, and it's a question about color perception, not words. This test did not test color perception, it tested "what word do you use?"

I think of blue as a pure color, and green as a wide range of colors all the way to yellow, to me another pure color. So if there's any green at all in it, I'm going to call it green. (Maybe it's left over from kindergarten blending "primary colors". Also, while I like green grass, I don't like green as a color, so any green I see is likely to make me think, ew, green.) But in terms of what I see, I can only assume I'm seeing the same thing as everybody else, because the test is not testing it. Just because I call something green doesn't mean I don't see all the blue in it.

>Edit: Possible improvements: changing the wording to "this is MORE green" and "this is MORE blue" and randomizing the order in which they are shown, somehow. I realize you're just doing some kind of binary search, narrowing the color range.

Yes, the test should show you pure blue, then a turquoise mix, then pure green, and a ... etc. It should also retest you on things you already answered to measure where you are consistent.
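One simple way the retest idea could be scored, as a sketch: group the trials by hue and report the share of repeated hues that got the same answer every time. The function name `retest_consistency` and the (hue, answer) trial format are assumptions for the example, not anything the site exposes.

```python
from collections import defaultdict


def retest_consistency(trials):
    """Fraction of repeated hues that received the same answer every time.

    trials is a list of (hue, answer) pairs; hues are matched by exact
    equality, so repeats must reuse the identical hue value.
    Returns None when no hue was shown more than once.
    """
    answers_by_hue = defaultdict(list)
    for hue, answer in trials:
        answers_by_hue[hue].append(answer)

    repeated = [a for a in answers_by_hue.values() if len(a) > 1]
    if not repeated:
        return None
    consistent = sum(1 for a in repeated if len(set(a)) == 1)
    return consistent / len(repeated)
```

For example, if one of two repeated hues is answered "green" once and "blue" once, the function returns 0.5.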

  • I do think that the philosophical question could potentially be approachable in a modern context;

    Show people a colour and map their brain activity - the level of similarity between two people's colour perceptions should be reflected by similarities in the activity.

    • Why do you think that would be the case?

      One person's ‘blue’ activity could be different from another’s while still being the same wavelength of light and general perception.

Agreed. It would be more accurate to show the final gradient (without the curve) and let people choose where the boundary is. It wasn't even clear what the actual task was.

Yeah, it felt like a trick question to me.

Because the second color I saw was somewhat like turquoise and the site is called 'Is My Blue Your Blue,' I decided that every color I could say "yes, this is blue" to would be blue and everything else would be green. I never saw a green until the result was displayed :D