When we see a moving scene, we perceive it as segmented into regions. But this perceived segmentation is neither an image segmentation based on image intensity alone nor a motion segmentation based on motion alone. Somehow, our brain is capable of combining all the available information to give a better segmentation than either the intensity-based or the motion-based segmentation. How does the brain combine the two different segmentation cues into a single segmentation?

