Searching for Genes, Photo Files, and Landmines — In Haystacks

December 10, 2008

Lawrence Carin's graduate training was in electromagnetics and wave analysis, a fitting choice for someone who remembers trying to assemble and plug in electronic contraptions as a five-year-old and who grew up tinkering with radios and motors. Overall, his life was fairly standard preparation for the electronic and computer engineering field, but where his work has led is anything but typical. Carin's graduate work was focused on traditional physics in the context of improving landmine detection techniques. But the longer he worked in the field, the more he came to appreciate that such topics cannot be considered in isolation.

"I increasingly found that if I wanted to do what I thought was interesting and had the most impact, I needed to understand the statistics of the problem." So, over time, he retrained himself as an applied statistician. "I'm very much out on a limb now," he says with a laugh. "I'm very far from where I started."

Today, though still heavily involved in work on the detection of landmines and other explosives, Carin focuses his research largely on the statistical side, specifically on developing innovative techniques to comb large datasets for useful information. From research on explosives detection he has branched into a range of projects including identifying genes responsible for certain illnesses, improving methods for shrinking military and medical image files, and streamlining medical diagnoses. The theme that ties these seemingly disparate projects together is a statistical concept known as sparseness. The term refers to minuscule numbers of data points of interest in large datasets.

"It's really a needle in the haystack kind of thing," says Carin.

All of his work now focuses on finding those useful needles and putting them to use, not only in academic research, but also through a private company called the Signal Innovations Group that he helped found. If it's not clear how techniques for hunting landmines could help make better photo files or track down genes responsible for the common cold, that's to be expected. If the connections were obvious, it would have all been done before. At this point in his career, with a team of competent students and a company bringing the techniques that are ready for prime time to market, Carin says his main role is generating novel ideas about which haystack needles to go after and how to find them. Read on to learn where his data processing ideas have led, and why.

Statistical Gene Analyses: An Uncommon Look at the Common Cold
One of the newest areas where Carin is developing methods and algorithms for exploiting sparse data is in processing genetic information. The project is supported by the Defense Advanced Research Projects Agency (DARPA) with a goal of identifying genes that, when active, or turned on, indicate someone is about to get sick. Unlike so much genetic research focusing on life-threatening conditions such as cancer, the target for this work is common viruses such as colds. The initial motivation for the project is that when the military sends out teams on missions, it can at times cause great problems if even one person becomes sick before the mission is complete. The trouble increases, of course, if even more are about to succumb to some virus.

"Even if it's not life threatening," says Carin, "this can significantly undermine a mission."

Spotting such problems in time to properly deal with them is difficult, because typically when a person begins to show clear signs of a virus, the condition is already well along and not much can be done. Knowing earlier might open better treatment options, or at the very least, give leaders the option of removing a person from a mission.

For this project, the needle and haystack challenge is that there are tens of thousands of functioning genes in humans, and the team needs to find the one or two that are good indicators of virus infection. To do that, Carin teamed with a group of doctors at the Duke Medical School to examine a group of 22 patients in what is known as a challenge study. The research involved actually inoculating the volunteers with a virus, which is the challenge, in this case, a form of the common cold called the rhinovirus. During the study all the patients stayed in a hotel and the team took blood samples at regular intervals for analyses of specific genes. The team then ran algorithms designed to identify genes whose behavior changes over time after inoculation with the virus. For example, a number of genes responsible for inflammatory responses turn on well before a person gets the stuffy nose or other symptoms of inflammation.

"We're just looking for glinting needles in this mess," says Carin.

Though they are still analyzing data, the researchers have already identified genes tied to the onset of virus symptoms. Based on identification of these genes during the study, they were able to successfully predict who would and who would not get sick (not perfectly, says Carin, but better than anticipated). They found that about half of the participants had some genetic resistance to the virus, evidenced by fewer genes being turned on in response to the inoculation, and later by an absence of sickness. For those who did eventually get sick, certain genes turning on did turn out to be a good indicator of what was to come.

Ultimately, Carin hopes to identify genes that are indicators for a wide range of viruses, so that more universal tests can be established. Beyond basic viruses, Carin says the gene sifting work might lead to ways to identify people who have been exposed to biological weapons, prior to the onset of visible symptoms. In such cases, the early indication could save a life rather than just a mission. The work could also offer clues that other researchers can use to develop new treatments for viruses. Oddly enough, the gene research helped to improve techniques for compressing photographic images.

Compressive Sensing: A Picture's Worth … a Lot More With a Smaller File Size
Another recently sprouted branch of the Carin research program focuses on the increasingly hot field of compressive sensing. This is a complex art of creating computer image files that are initially stored in a compressed, or minimized, state, instead of compressing larger files after the fact, as occurs with JPEGs and other popular file formats. Though on the surface this work may seem decidedly unlike a hunt for genes, from a statistical perspective the challenges involved are quite similar.

Most people are familiar at least with the concept of compressing larger photo files such as TIFFs (Tagged Image File Format) into the much smaller and more manageable JPEG (Joint Photographic Experts Group) format. Such compression involves various mathematical transformations of the data in the initial file, and removing the repetitive or unnecessary information, which can be as much as 80 percent of the original volume. While such compression is essentially a matter of convenience for most photographers, in other applications, reliable photo compression is all but a necessity. One key example is the military, which has to compress a nearly endless stream of surveillance photos effectively in order to make the information in them accessible.

"The Department of Defense has a tremendous amount of data," says Carin. "If they don't compress it they just can't operate." Medical facilities face similar problems because they have to be able to shrink massive files for MRI images in order to make them manageable. These and other fields would welcome any improvement to compression technology, because it could reduce long-term storage needs. But such advancement would also reduce computing power requirements and initial storage needs, all of which would lead to simpler, cheaper image collection.

So, a few years ago, people began to wonder if the compression process could be skipped, with cameras and other devices simply recording the critical data in compressed form from the outset, without having to perform compression conversions. From such thoughts the field of compressive sensing has emerged. With compressive sensing, a camera records data akin to, but distinct from, the information found in a standard JPEG compressed file.

The key challenge is then taking the limited data available in the compressively sensed file and turning it into an image that matches the original subject.

"It turns out that once again exploiting this idea of sparseness is how you do it," says Carin, "so it's precisely the same technology as with our gene expression work."

With any given compressively sensed file, there is a nearly endless array of images the compressed file could represent, and taken collectively, these possible images are the proverbial haystack in this research. The needle is the one image among these myriad possibilities that is the original scene. It can be identified because, among all the candidates consistent with the recorded data, it is the most compressible, that is, the sparsest, so it is this singular solution that Carin's algorithms are designed to search out. Once this problem is solved, the original image can then be assembled and displayed. Carin's group has developed one of the most widely used algorithms for compressive sensing, and improvements are still coming.
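The search for the sparsest candidate consistent with a small set of measurements can be illustrated with a short sketch. This is not the algorithm Carin's group developed; it substitutes orthogonal matching pursuit, a standard greedy sparse-recovery method, as a stand-in. The random measurement matrix, the signal sizes, and the function name `omp` are all assumptions chosen for illustration.

```python
import numpy as np


def omp(A, y, k):
    """Greedy sparse recovery: find x with at most k nonzeros such that A @ x ~ y."""
    n = A.shape[1]
    support = []                 # indices of the columns chosen so far
    coef = np.zeros(0)
    residual = y.copy()
    for _ in range(k):
        # Choose the column most correlated with what is still unexplained.
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # Refit all chosen columns jointly by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(n)
    x[support] = coef
    return x


rng = np.random.default_rng(0)
n, m, k = 200, 120, 3            # hypothetical sizes: 200 unknowns, 120 measurements

# A sparse "scene": only k of the n entries are nonzero (the needles).
x_true = np.zeros(n)
nonzero = rng.choice(n, size=k, replace=False)
x_true[nonzero] = rng.choice([-1.0, 1.0], size=k) * (1.0 + rng.random(k))

# Compressive measurements: fewer numbers than unknowns.
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x_true

x_hat = omp(A, y, k)
print("largest recovery error:", np.max(np.abs(x_hat - x_true)))
```

Many different signals would reproduce the same 120 measurements; the sketch singles out one of them by insisting that only a few entries be nonzero, which is the role sparseness plays in the description above.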

Though the work has been funded largely through military grants, medical uses such as improving Magnetic Resonance Images (MRIs) are likely to be a major commercial application. Besides simply making it easier to work with MRI images because file sizes would be smaller, compressive sensing would also dramatically reduce the time needed to create an image, because less information would have to be gathered and stored.

That could mean less time inside an MRI machine and less time holding your breath, which would be an extremely welcome advance, as anybody who has fought against claustrophobia and fear of the unsettling sounds involved during an MRI will attest.

Though the computational technology is advanced enough for commercial application, military, medical and other applications are still some time off because devices that exploit the current algorithms haven't yet been developed.

"That's at its absolute infancy," says Carin. "A lot of work still needs to be done."

Saving Lives with Statistics: Detecting Explosives
Landmines are a global scourge that kill and maim thousands of people each year, but the technologies for identifying and disabling the millions and millions of active mines found in dozens of countries remain sadly inadequate.

Addressing this problem through the development and improvement of methods for detecting landmines is how Carin began his career. Such work still garners much of his laboratory group's attention and now also includes efforts aimed at the related problems of detecting unexploded ordnance such as bombs, as well as unconventional munitions such as car bombs.

As with his other research, the algorithms Carin's group develops to detect landmines and other weapons are focused on exploiting sparseness. In this case, of course, the mine or unexploded bomb is the sought-after needle in a variety of haystacks ranging from ocean sediments to desert sands. With the detection of life-threatening devices, obviously the stakes are higher than in some of Carin's other pursuits. However, there are similarities.

With compressive sensing, the challenge is that all the data about an original image are not available, so mathematical inferences must be made based on the data that were collected. Likewise, with weapons detection, algorithms have to account for inevitably incomplete information about a given site. Because you can't sense everything about everything, one challenge is determining which manageable set of parameters can be measured.

Answering that question varies according to each situation. Plastic mines require different techniques and algorithms than metal ones. Underwater mine detection using acoustic signals is vastly different from detection on land using radar. Carin and his team have developed algorithms for numerous different scenarios, and those charged with searching an area for dangerous devices can choose according to the types of devices likely present and the environment where they are working. But choosing the right methods is not always straightforward.

So, besides creating the algorithms for detection work, Carin also does research in the separate but related area of sensor management. This involves developing programs that analyze situations and equipment available and then advise field users on which types of detectors to use, and in what sequence, to accomplish the most effective and efficient sweep of an area.

"For the landmine problem and the IED threat, no single sensor is going to do the job," says Carin. "That's a guarantee."

Carin's work, coupled with that of numerous other research groups around the world, has enabled major advances in the technologies needed to reduce the impacts of mines and other explosives, saving countless lives in the process. He even helped found a company called the Signal Innovations Group to commercialize his group's programs. But, there's still much to be done.

"It's like cancer research," says Carin. "It will never be solved, but you make progress."

On the Similarities Between Enemy Planes and The Beatles
Besides work to analyze data in search of landmines and other explosives, Carin also develops improved methods for identifying targets of interest from sonar and radar data by analyzing past datasets to spot patterns common to such targets. To illustrate the process of finding those patterns to military personnel and others, Carin decided to use an entirely different type of data, namely music.

The idea was that the waveforms found in a particular type of music could be analyzed to find other similar types of music in a way analogous to finding targets of interest. The analysis would have nothing to do with catalog information about the music, but would focus on patterns in the music itself. For instance, if some of The Beatles' jazzier tunes were used as a starting point, the analysis would choose other jazz music as being similar, not other rock songs.
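A toy version of matching songs by their waveforms alone, with no catalog information, might look like the sketch below. It is not Carin's technique, whose details the article does not give; it simply compares normalized frequency spectra with cosine similarity. The synthetic "tunes", the sample rate, and every function name are hypothetical.

```python
import numpy as np

SAMPLE_RATE = 8000  # samples per second (hypothetical)


def tone_mix(freqs, amps, seconds=1.0):
    """Synthesize a stand-in 'song' as a sum of sine tones."""
    t = np.arange(int(SAMPLE_RATE * seconds)) / SAMPLE_RATE
    return sum(a * np.sin(2 * np.pi * f * t) for f, a in zip(freqs, amps))


def spectral_profile(waveform):
    """Unit-length magnitude spectrum: a crude fingerprint of the sound itself."""
    spectrum = np.abs(np.fft.rfft(waveform))
    return spectrum / np.linalg.norm(spectrum)


def similarity(wave_a, wave_b):
    """Cosine similarity of two spectral profiles (values near 1.0 mean a similar shape)."""
    return float(spectral_profile(wave_a) @ spectral_profile(wave_b))


query = tone_mix([440, 660], [1.0, 1.0])
library = {
    "similar tune": tone_mix([440, 660], [0.8, 0.5]),
    "different tune": tone_mix([1000, 1500], [1.0, 1.0]),
}

# Rank the library by how closely each waveform resembles the query.
ranked = sorted(library, key=lambda name: similarity(query, library[name]), reverse=True)
print(ranked)
```

Only the audio samples enter the comparison, which is why an approach of this general kind could in principle be applied to recordings that have no catalog entry at all.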

The thinking was that military representatives could appreciate the technique more directly by seeing this application to music and hearing for themselves how similar the songs resulting from a search sounded to the original song or songs analyzed. But from that initial demonstration work, the method took on a life of its own.

"Venture capitalists started asking questions about how good the method really did compared to what is out there," says Carin, "And it turns out it's probably better than or equal to any other technique."

The result is that Carin is now in the process of setting up a private company that will make the music search technique available for use on Amazon, iTunes, or other music services to help customers find music they may like based on songs they've already identified as their favorites.

Carin's technique offers the potential for more accurate predictions, and also means that even independent music not commercially available can be analyzed to aid in the search, because no other information besides the song itself is needed for an analysis. Carin has even found that the analyses work on speech, meaning that the technique can even be applied to recordings of comedians.

"It wasn't our intention to get into this when we set out," says Carin, "but in any case we're now setting up a venture to try to go into the music space in a serious way."

Improved Healthcare Through Statistics
Carin's team has been able to apply its sensor management work, originally developed for landmine detection, to healthcare. Just as using every possible detector isn't possible at a given site, it's not feasible for doctors to use every sensor at their disposal on every patient. Body scans can answer a huge range of medical questions, but they are expensive and require a great deal of time and work, so not everyone who enters the door of a hospital can simply be sent in for a scan. Instead, doctors need to first take simpler measurements such as blood pressure, pulse, temperature, and blood tests to answer as many questions as possible, only resorting to more extensive testing for the most difficult diagnoses. Carin has developed techniques that guide healthcare providers in choosing the best sequence of tests to follow.
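One simple way to think about sequencing tests by cost is to pick, at each step, the test expected to reduce diagnostic uncertainty the most per unit of cost. The sketch below illustrates that greedy heuristic for a single yes/no diagnosis; it is an assumption made for illustration, not the method Carin's team developed, and the test names, costs, sensitivities, and specificities are invented.

```python
import math


def entropy(p):
    """Uncertainty, in bits, about a yes/no diagnosis with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def posterior(p, sens, spec, positive):
    """Bayes update of the disease probability after one test result."""
    if positive:
        return sens * p / (sens * p + (1 - spec) * (1 - p))
    return (1 - sens) * p / ((1 - sens) * p + spec * (1 - p))


def expected_gain(p, sens, spec):
    """Expected drop in uncertainty (bits) from running the test once."""
    p_pos = sens * p + (1 - spec) * (1 - p)
    after = (p_pos * entropy(posterior(p, sens, spec, True))
             + (1 - p_pos) * entropy(posterior(p, sens, spec, False)))
    return entropy(p) - after


def next_test(p, tests):
    """Greedy choice: the most information per unit cost."""
    return max(tests, key=lambda t: expected_gain(p, t["sens"], t["spec"]) / t["cost"])


# Hypothetical suite of tools; every number here is made up.
TESTS = [
    {"name": "blood pressure", "cost": 1, "sens": 0.70, "spec": 0.70},
    {"name": "blood test", "cost": 10, "sens": 0.85, "spec": 0.85},
    {"name": "body scan", "cost": 500, "sens": 0.98, "spec": 0.98},
]

print(next_test(0.3, TESTS)["name"])
```

The scan is the most informative test in absolute terms, but its cost means the cheap measurement wins the first slot under this rule, which mirrors the practice of starting with simple measurements and reserving scans for the hardest cases.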

"Tell me the suite of tools a medical center has, and the cost to deploy each," he says, "and I can tell them how to optimally sense to give the best diagnosis at the lowest cost."

The techniques have already been used successfully, for instance to improve diabetes screening, and further applications of the research are likely in coming years.