Comment by pvaldes

7 hours ago

More points for thought

1) The model uses incomplete data. The training data is based on 7800 living species. According to Wikipedia, Gastropoda has more than 75000 living species, plus 15000 known fossil species. (We can safely assume that this is a snail, but remember that some cephalopods also have coiled shells.)

2) The model uses spurious data. All clams and tusk shells must be removed (because we want to classify a snail). This means that the number of snails available to train the model is much lower than 7800. Including non-snails just gives us false confidence in the strength of our model.
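
As a rough sketch of point 2, assuming the training table carries a taxonomic class column, dropping the non-snails before training could look like this. The field names and the three sample rows are made up for illustration; they are not taken from the actual dataset.

```python
# Hypothetical records: field names and values are illustrative only,
# not rows from the real training data.
records = [
    {"species": "snail A", "class": "Gastropoda", "height": 20.0, "width": 18.0},
    {"species": "clam B", "class": "Bivalvia", "height": 30.0, "width": 45.0},
    {"species": "tusk shell C", "class": "Scaphopoda", "height": 50.0, "width": 5.0},
]


def keep_gastropods(rows):
    """Return only the rows whose taxonomic class is Gastropoda."""
    return [row for row in rows if row["class"] == "Gastropoda"]


snails = keep_gastropods(records)
print(len(records), "records before filtering,", len(snails), "after")
```

Whatever the real count turns out to be after this kind of filter, that is the sample size the model's confidence should be judged against, not 7800.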

3) The model covers only a couple of traits of this species, but these particular traits can vary among members of the same species. Taxonomy uses thousands of traits to classify a mollusc, and some are particularly fastidious. There are dozens of items just to describe the shell. Often the soft parts are needed (Is the penis shaped like a club? This genus. Shaped like a whip? This other one. The penis in your sample is contracted because you didn't put the animal to sleep first with mint crystals? Tough luck, we'll never know).

4) The model is based on extant species, but we want to identify a fossil. Living species have undistorted shells. Fossils often lose their shape under the weight and compression of sediments. Only the thickest shells keep their real height/width proportions.

5) The model ignores important details. The species found in the desert has a very evident shell groove at the top of the spire that the targeted species does not have. This alone tells a novice taxonomist that the result is wrong.

And to end this, 6) the model ignores all knowledge about the species and its habitat.

Sphincterochila candidissima is a western Mediterranean species. It lives from Spain to Libya. The fossil is from Saudi Arabia.
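
A minimal sketch of the kind of sanity check point 6 asks for, assuming we approximate "Spain to Libya" with a crude latitude/longitude box. The box limits and both test points are rough guesses for illustration only, not a published range map or the real find site.

```python
# Crude bounding box for "Spain to Libya"; the numbers are rough assumptions.
WEST_MED_BOX = {"lat_min": 27.0, "lat_max": 44.0, "lon_min": -10.0, "lon_max": 25.0}


def in_known_range(lat, lon, box=WEST_MED_BOX):
    """Return True if the point falls inside the assumed range box."""
    return (box["lat_min"] <= lat <= box["lat_max"]
            and box["lon_min"] <= lon <= box["lon_max"])


# Hypothetical points: one in southern Spain, one in central Saudi Arabia.
spain_point = (37.0, -3.0)
arabia_point = (24.0, 45.0)

print("Spain point in range:", in_known_range(*spain_point))
print("Arabia point in range:", in_known_range(*arabia_point))
```

A candidate identification that fails a check like this is not automatically wrong, but it should be treated with suspicion before anyone trusts the classifier.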