Protecting computer vision from adversarial attacks
Advances in computer vision and machine learning have enabled a wide range of technologies to perform sophisticated tasks with little or no human supervision. From autonomous drones and self-driving cars to medical imaging and product manufacturing, many computer applications and robots use visual information to make critical decisions. Cities increasingly rely on these automated technologies for public safety and infrastructure maintenance.
However, compared to humans, computers see with a kind of tunnel vision that makes them vulnerable to attacks with potentially catastrophic results. For example, a human driver, seeing graffiti covering a stop sign, will recognize it anyway and stop the car at an intersection. In contrast, the graffiti could cause a self-driving car to miss the stop sign and drive through the intersection. And, while the human mind can filter out all sorts of unusual or extraneous visual information when making decisions, computers get stuck on tiny deviations from expected data.
Indeed, the brain is enormously complex and can simultaneously process a multitude of data and past experiences to arrive at nearly instantaneous decisions adapted to the situation. Computers rely on mathematical algorithms trained on sets of data, and their creativity and cognition are constrained by the limits of technology, mathematics, and human foresight.
Malicious actors can exploit this vulnerability by changing the way a computer sees an object, either by modifying the object itself or an aspect of the software involved in the vision technology. Other attacks can manipulate the decisions the computer makes about what it sees. Either approach could spell disaster for individuals, cities or businesses.
A team of researchers from the Bourns College of Engineering at UC Riverside is working on ways to thwart attacks on computer vision systems. To do this, Salman Asif, Srikanth Krishnamurthy, Amit Roy-Chowdhury, and Chengyu Song first determine which attacks work.
“People would want to do these attacks because there are a lot of places where machines are interpreting data to make decisions,” said Roy-Chowdhury, the principal investigator of a recently concluded DARPA AI Explorations program called Techniques for Machine Vision Disruption. “It could be in an adversary’s interest to manipulate the data on which the machine makes a decision. How does an adversary attack a data stream so that the decisions it produces are wrong?”
An adversary might inject malware into the software of an autonomous vehicle, for example, so that when data arrives from the camera it is slightly perturbed. As a result, the models installed to recognize pedestrians fail, and the system either hallucinates an object or fails to see one that exists. Understanding how to generate effective attacks helps researchers design better defense mechanisms.
“We’re looking to perturb an image so that if it’s analyzed by a machine learning system, it’s miscategorized,” Roy-Chowdhury said. “There are two main ways to do this: deepfakes, where someone’s face or facial expressions in a video have been altered in a way that deceives a human, and adversarial attacks, where an attacker manipulates how the machine makes a decision but a human usually is not deceived. The idea is that you make a very small change in an image that a human cannot perceive, but an automated system will, and it makes an error.”
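To make that idea concrete, here is a minimal sketch of the kind of imperceptible perturbation Roy-Chowdhury describes, using the classic one-step fast gradient sign method (FGSM) against a pretrained image classifier. This is a generic illustration of adversarial attacks, not the UC Riverside team's method; the ResNet-18 model, the epsilon value, and the assumption of an input image scaled to [0, 1] are choices made only for the example.

```python
# Generic illustration of an adversarial perturbation (FGSM), not the UCR method.
# Assumptions: a pretrained torchvision ResNet-18 as the classifier and an input
# tensor `image` of shape (3, H, W) with values in [0, 1]; normalization is
# omitted for brevity.
import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.resnet18(weights="IMAGENET1K_V1").eval()

def fgsm_perturb(image, true_label, epsilon=0.01):
    """Return a copy of `image` with a tiny perturbation that pushes the model
    away from the correct label; epsilon controls how visible the change is."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image.unsqueeze(0)), torch.tensor([true_label]))
    loss.backward()
    # Step in the direction that increases the loss, then clamp to a valid image.
    return (image + epsilon * image.grad.sign()).clamp(0, 1).detach()
```

With an epsilon this small the change is essentially invisible to a person, yet it is often enough to flip the classifier's prediction.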
Roy-Chowdhury, his collaborators, and their students found that the majority of existing attack mechanisms aim to misclassify specific objects and activities. However, most scenes contain multiple objects, and there is usually a relationship among the objects in a scene, meaning certain objects tend to occur together more often than others.
People who study computer vision call this co-occurrence “context.” Group members demonstrated how to design contextual attacks that change the relationships between objects in the scene.
“For example, a table and a chair are often seen together. But a tiger and a chair are rarely seen together. We want to handle all of this together,” Roy-Chowdhury said. “You could change the stop sign to a speed limit sign and remove the crosswalk. If you changed the stop sign to a speed limit sign but left the crosswalk, a self-driving car’s computer might still recognize that as a situation where it needs to stop.”
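A rough sketch of the co-occurrence idea follows; it assumes only that each training scene is available as a set of object labels (for example, from a dataset's annotations), and the function names are hypothetical.

```python
# Sketch of "context" as co-occurrence statistics: count how often object
# categories appear together, then score how plausible a modified scene is.
# The data format and function names are illustrative.
from collections import defaultdict
from itertools import combinations

def build_cooccurrence(scenes):
    """scenes: list of label sets, e.g. [{"stop sign", "crosswalk", "car"}, ...]."""
    counts = defaultdict(int)
    for labels in scenes:
        for pair in combinations(sorted(labels), 2):
            counts[pair] += 1
    return counts

def context_score(labels, counts):
    """Higher score = the labels are more consistent with past co-occurrences."""
    pairs = list(combinations(sorted(labels), 2))
    return sum(counts.get(p, 0) for p in pairs) / max(len(pairs), 1)

# A context-aware attacker prefers changes that keep this score high: replacing
# "stop sign" with "speed limit sign" AND removing "crosswalk" looks consistent,
# while swapping the sign alone leaves a scene that a contextual defense can flag.
```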
Earlier this year, at the conference of the Association for the Advancement of Artificial Intelligence, researchers showed that for a machine to make a bad decision, it is not enough to manipulate a single object. The group has developed a strategy for designing adversarial attacks that modify multiple objects simultaneously in a consistent way.
“Our main idea was that successful transfer attacks require holistic manipulation of the scene. We learn a context graph to guide our algorithm on which objects should be targeted to fool the victim model, while maintaining the overall context of the scene,” said Salman Asif.
In a paper presented this week at the Conference on Computer Vision and Pattern Recognition, the researchers, together with collaborators from PARC, a research division of Xerox, take this concept further and propose a method in which the attacker needs no access to the victim’s computer system. This is important because each probe risks alerting the victim, who can then detect the intrusion and defend against the attack. The most successful attacks are therefore likely to be those that do not probe the victim’s system at all, and it is crucial to anticipate and design defenses against such “zero-query” attacks.
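The zero-query setting can be sketched as a transfer attack: the perturbation is crafted entirely on a local surrogate model that the attacker controls, and only then applied to the victim system without ever probing it. The surrogate and victim models below (ResNet-18 and ResNet-50) and the hyperparameters are placeholders chosen for illustration, not the models used in the paper.

```python
# Sketch of a "zero-query" transfer attack: craft the perturbation on a local
# surrogate model only, then apply it to the victim without probing it first.
# The surrogate/victim pairing and hyperparameters are assumptions.
import torch
import torch.nn.functional as F
import torchvision.models as models

surrogate = models.resnet18(weights="IMAGENET1K_V1").eval()
victim = models.resnet50(weights="IMAGENET1K_V1").eval()  # never queried while crafting

def craft_on_surrogate(image, true_label, epsilon=0.03, steps=10):
    """Iteratively perturb `image` (values in [0, 1]) using only surrogate gradients."""
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(surrogate(adv.unsqueeze(0)), torch.tensor([true_label]))
        grad, = torch.autograd.grad(loss, adv)
        adv = (adv + (epsilon / steps) * grad.sign()).clamp(0, 1).detach()
    return adv

# Only after crafting is the victim touched, once, to see whether the attack transfers:
# victim(craft_on_surrogate(image, label).unsqueeze(0)).argmax()
```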
Last year, the same group of researchers exploited contextual relationships over time to design attacks against video footage. They used geometric transformations to create highly effective attacks on video classification systems, achieving successful disruptions in surprisingly few attempts. For example, adversarial examples generated with this technique achieve better attack success rates with 73% fewer attempts compared with leading methods for video adversarial attacks, allowing faster attacks with far fewer probes into the victim system. This paper was presented at the premier machine learning conference, Neural Information Processing Systems 2021.
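One way to picture why geometric transformations cut down the number of probes: instead of searching over every pixel of every frame, the attacker searches over a handful of warp parameters applied to one shared noise pattern. The sketch below shows that parameterization with small per-frame affine warps; the clip size, pattern, and parameter values are illustrative and not taken from the paper.

```python
# Sketch of parameterizing a video perturbation with geometric (affine) warps:
# a single noise pattern is warped slightly differently for each frame, so the
# attack searches a few warp parameters instead of millions of pixel values.
# Clip dimensions and magnitudes are illustrative.
import torch
import torch.nn.functional as F

def warped_perturbation(base_pattern, thetas, frame_shape):
    """base_pattern: (1, C, H, W) shared noise; thetas: (T, 2, 3) per-frame affine params."""
    T = thetas.shape[0]
    grid = F.affine_grid(thetas, size=(T, *frame_shape), align_corners=False)
    return F.grid_sample(base_pattern.expand(T, -1, -1, -1), grid, align_corners=False)

# Example: a 16-frame clip of 3x224x224 images perturbed by 16 small warps of one pattern.
video = torch.rand(16, 3, 224, 224)
base = 0.03 * torch.randn(1, 3, 224, 224)
thetas = torch.eye(2, 3).repeat(16, 1, 1) + 0.01 * torch.randn(16, 2, 3)
adversarial_clip = (video + warped_perturbation(base, thetas, (3, 224, 224))).clamp(0, 1)
```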
The fact that context-aware adversarial attacks are much more powerful on natural images with multiple objects than existing attacks, which focus primarily on images with a single dominant object, opens the way for more effective defenses. These defenses can take into account the contextual relationships between objects in an image, or even between objects in a scene captured by multiple cameras. This offers the potential to develop much more secure systems in the future.
Source: UC Riverside