According to Professor James Elder, co-author of a York University study, deep convolutional neural networks (DCNNs) do not see objects the way humans do (through configural shape perception), a shortcoming that could be dangerous in real-world AI applications.
The collaborative study by Elder, who holds the York Research Chair in Human and Computer Vision and co-directs York’s Centre for AI & Society, and Nicholas Baker, an assistant professor of psychology at Loyola College in Chicago and a former VISTA postdoctoral fellow at York, was published in the Cell Press journal iScience.
The study used unique visual stimuli known as “Frankensteins” to investigate how the human brain and DCNNs comprehend holistic, configural object features.
“Frankensteins are simply objects that have been taken apart and put back together the wrong way around. As a result, they have all the right local features, but in the wrong places.”
James Elder, Study Co-Author and Professor, York University
The researchers found that while Frankensteins confuse the human visual system, DCNNs are not confused by them, revealing the networks’ insensitivity to configural object properties.
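To make that comparison concrete, the minimal sketch below (not the study’s actual code) shows how one might probe an off-the-shelf ImageNet classifier with an intact object image and a part-scrambled, Frankenstein-style version of the same object, then check whether its top prediction changes. The torchvision ResNet-50 model and the image file names are illustrative assumptions, not materials from the paper.

```python
# Illustrative sketch only (not the study's code): compare a pretrained CNN's
# top prediction for an intact object image and for a part-scrambled,
# "Frankenstein-style" version of the same object. File names are hypothetical.
import torch
from PIL import Image
from torchvision import models, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.eval()

def top_class(path: str) -> int:
    """Return the index of the model's most probable ImageNet class for one image."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        logits = model(img)
    return int(logits.argmax(dim=1))

intact = top_class("bear_intact.png")           # hypothetical intact silhouette
scrambled = top_class("bear_frankenstein.png")  # same parts, rearranged

# A network that relies mainly on local features will often give the same label
# to both versions, even though humans see the scrambled one as a different shape.
print("intact:", intact, "scrambled:", scrambled, "same label:", intact == scrambled)
```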
Elder points out, “Our results explain why deep AI models fail under certain conditions and point to the need to consider tasks beyond object recognition in order to understand visual processing in the brain.”
“These deep models tend to take ‘shortcuts’ when solving complex recognition tasks. While these shortcuts may work in many cases, they can be dangerous in some of the real-world AI applications we are currently working on with our industry and government partners.”
James Elder, Study Co-Author and Professor, York University
A traffic video safety system is one such application. “The objects in a busy traffic scene—the vehicles, bicycles, and pedestrians—obstruct each other and arrive at the eye of a driver as a jumble of disconnected fragments,” details Elder.
“The brain needs to correctly group those fragments to identify the correct categories and locations of the objects. An AI system for traffic safety monitoring that is only able to perceive the fragments individually will fail at this task, potentially misunderstanding risks to vulnerable road users.”
James Elder, Study Co-Author and Professor, York University
According to the investigators, changes to training and architecture that were intended to make networks more brain-like did not result in configural processing, and none of the networks could accurately predict trial-by-trial human object judgments.
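As a rough illustration of what predicting trial-by-trial human judgments involves, the sketch below (a hedged example, not the paper’s analysis) compares a model’s per-trial choices with human choices on the same stimuli and reports simple agreement and a phi correlation; all values are hypothetical.

```python
# Hedged sketch (not the paper's analysis): measure how well a model predicts
# human judgments trial by trial. All values below are hypothetical placeholders.
import numpy as np

# 1 = stimulus judged as the named animal, 0 = judged as something else,
# one entry per trial for the same set of Frankenstein stimuli.
human_choices = np.array([0, 0, 1, 0, 1, 0, 0, 1])  # humans mostly reject scrambled shapes
model_choices = np.array([1, 1, 1, 1, 0, 1, 1, 1])  # a local-feature model mostly accepts them

agreement = (human_choices == model_choices).mean()
# Phi coefficient: Pearson correlation between two binary variables,
# a simple chance-corrected measure of trial-by-trial correspondence.
phi = np.corrcoef(human_choices, model_choices)[0, 1]

print(f"trial-by-trial agreement: {agreement:.2f}, phi correlation: {phi:.2f}")
```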
“We speculate that to match human configural sensitivity, networks must be trained to solve a broader range of object tasks beyond category recognition,” notes Elder.
Journal reference:
Baker, N., & Elder, J. (2022). Deep learning models fail to capture the configural nature of human shape perception. iScience. https://doi.org/10.1016/j.isci.2022.104913