Geometries of data, from molecules to threat detection
by Ashley Yeager
We’re used to objects having a certain length, width and depth. They usually exist in time as well, giving them four dimensions.
But to Mauro Maggioni, a professor of mathematics, computer science and electrical and computer engineering (ECE) at Duke, four dimensions aren’t enough to understand how enzymes and proteins work, or to distinguish tanks from trees in military images.
Maggioni works with the geometry of high-dimensional data—millions of points graphed in hundreds or thousands of dimensions. For example, to understand the motion of biomolecules, researchers simulate countless possible molecule configurations. Even on powerful computers, these simulations take a very long time to run.
Collaborating with chemists, Maggioni analyzes the coordinates of all the atoms in a molecule and identifies geometric and dynamical properties that let him build low-dimensional representations of the molecule’s dynamics. The simplified representations can then speed up simulations and help researchers understand and map out the complex landscapes the molecule explores, he says.
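To give a rough sense of what a low-dimensional representation of atomic coordinates can look like, here is a minimal sketch in Python. It uses ordinary principal component analysis as a stand-in; the article does not say which techniques Maggioni uses, and his methods are likely more sophisticated. The function name `reduce_trajectory`, the toy data and every parameter are assumptions made up for illustration.

```python
import numpy as np

def reduce_trajectory(frames, n_components=2):
    """Project simulation frames onto their top principal components.

    frames: array of shape (n_frames, n_atoms, 3) with atom coordinates.
    Returns (projection, explained): projection has shape
    (n_frames, n_components); explained is the fraction of variance kept.
    """
    n_frames = frames.shape[0]
    flat = frames.reshape(n_frames, -1)      # one long coordinate vector per frame
    centered = flat - flat.mean(axis=0)      # remove the average configuration
    # Rows of vt are the principal directions, ordered by singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    projection = centered @ vt[:n_components].T
    variance = s ** 2
    explained = variance[:n_components].sum() / variance.sum()
    return projection, explained

# Hypothetical toy "trajectory": 20 atoms whose 60 coordinates are secretly
# driven by only 2 hidden variables, plus a little noise.
rng = np.random.default_rng(0)
n_frames, n_atoms = 500, 20
latent = rng.normal(size=(n_frames, 2))
mixing = rng.normal(size=(2, n_atoms * 3))
noise = 0.01 * rng.normal(size=(n_frames, n_atoms * 3))
frames = (latent @ mixing + noise).reshape(n_frames, n_atoms, 3)

projection, explained = reduce_trajectory(frames)
print(projection.shape, explained)
```

In this toy case two numbers per frame capture nearly all of the variation in 60 coordinates, because the data were built that way. Real molecular motion is generally nonlinear, so a linear projection like this one is only a first approximation.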
Another of Maggioni’s “big data” projects involves the hundreds of colors captured in what is known as hyperspectral imaging. Images from smartphones and hand-held cameras record just three basic colors (red, green and blue), all in visible light. Hyperspectral images, on the other hand, combine colors from across the electromagnetic spectrum, everything from radio to ultraviolet radiation, and provide clues to the chemical properties of photographed materials.
“With such multi-dimensional data coordinates, you can tell the perfect time to harvest a crop, see if a gas leak is really chemical warfare, or monitor concentration of key chemicals in blood,” Maggioni said. He is also working with ECE professors David Brady and Larry Carin to design faster and more accurate X-ray scanners, which use new algorithms and imaging designs to screen airport baggage for weapons and explosives.
One feature of many of these data sets is that they change with time and show different levels of granularity, or amounts of detail, at various stages. Maggioni’s algorithms allow him to detect and measure these changes, whether in molecular configuration space, hyperspectral movies, or networks and their traffic.
“The algorithms often enjoy a sort of universality dear to mathematicians, in that they are easily adapted to different types of data, as they attempt to capture crucial properties of data by learning them from the data itself,” Maggioni said.