A new computational model can draw upon normally incompatible data sets, such as satellite imagery and social media posts, to answer questions about what is happening in targeted locations.
This new model can serve as a tool for identifying violations of nuclear nonproliferation agreements, researchers said.
“Our goal was to develop a working framework that uses information from a variety of sensors and data sources to identify these potential violations of nuclear nonproliferation,” said Hamid Krim, co-author of a paper on the work, professor of electrical and computer engineering at North Carolina State University and director of the VISSTA Laboratory. “Some of these data may be conventional, such as Geiger counter readings or multispectral data from satellite imagery. But many of these data sources may be nontraditional, such as social media posts. And these sources provide a wide variety of data that are not normally compatible, such as the text included on Twitter posts and the images posted on Flickr.
“By making these different inputs compatible with each other, we are able to accept a broader range of data inputs and use that data in a meaningful way that, ultimately, can help authorities reach more reliable conclusions,” Krim said.
The model can work with any data that can be identified as coming from the targeted area, researchers said. For example, satellite images are clearly tied to a location, but the model may also draw on social media posts actively or passively tagged as coming from the relevant area.
The question then becomes: How do you work with incompatible data? The researchers used flood detection as a test case. They chose flooding because data on floods are not classified, whereas data regarding nuclear activity are.
The first step in the process is to use mathematical equations to translate each type of data into a useful format. For example, images may be run through models to determine whether they are images of flooding, whereas text posts may be run through models to determine whether they include references to flooding. Once those data streams are translated into a neutral format – meaning they indicate flooding or no flooding – they can be compared to each other to answer basic questions such as: Do the data support each other?
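To make the idea of a "neutral format" concrete, here is a minimal sketch in Python. It is not the researchers' model: the keyword matcher stands in for a text model, the image score is assumed to come from some hypothetical image classifier, and the names Indicator, text_indicator, image_indicator and agreement are invented for illustration. The point is only that once every input is reduced to the same flood or no-flood indicator, inputs from different sources can be compared directly.

```python
from dataclasses import dataclass

# Hypothetical keyword list; a real system would use a trained text model.
FLOOD_TERMS = {"flood", "flooding", "flooded"}


@dataclass
class Indicator:
    """Common format shared by every data source: flood or no flood."""
    source: str
    flood: bool


def text_indicator(source: str, post: str) -> Indicator:
    """Reduce a text post to a flood / no-flood indicator by keyword match."""
    words = {w.strip(".,!?#").lower() for w in post.split()}
    return Indicator(source, bool(words & FLOOD_TERMS))


def image_indicator(source: str, flood_score: float, threshold: float = 0.5) -> Indicator:
    """Reduce an image to the same indicator.

    flood_score is assumed to be the output of some image classifier,
    between 0 and 1; the 0.5 threshold is an arbitrary choice.
    """
    return Indicator(source, flood_score >= threshold)


def agreement(indicators: list[Indicator]) -> float:
    """Fraction of inputs that indicate flooding, regardless of source type."""
    votes = [i.flood for i in indicators]
    return sum(votes) / len(votes)


indicators = [
    text_indicator("tweet", "Main Street is flooded, stay home!"),
    text_indicator("tweet", "Great coffee this morning"),
    image_indicator("satellite", 0.9),
    image_indicator("photo", 0.7),
]
print(agreement(indicators))  # 0.75: three of the four inputs indicate flooding
```

A simple vote like this treats every input as equally reliable, which is an assumption made here for brevity.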
But it’s not quite that simple. For example, people may be tweeting about a flood that is taking place hundreds of miles away, which could skew any calculation by the overarching model. To address this, the researchers incorporated mathematical elements that account for the complexity of the data they are drawing on.
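The article does not describe the mathematical elements the researchers used. As a much simpler illustration of the problem, the Python sketch below discards geotagged reports that fall outside a chosen radius of the targeted location. The haversine formula is a standard way to compute distance on a sphere; the function names, the 50-kilometer radius and the coordinates are assumptions made for the example.

```python
import math


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def local_reports(reports: list[dict], target: tuple[float, float], radius_km: float = 50.0) -> list[dict]:
    """Keep only reports geotagged within radius_km of the target (hypothetical helper)."""
    lat0, lon0 = target
    return [
        r for r in reports
        if haversine_km(lat0, lon0, r["lat"], r["lon"]) <= radius_km
    ]


# Illustrative coordinates only: a target in Colorado, one nearby report,
# and one report hundreds of miles to the east.
target = (40.015, -105.27)
reports = [
    {"id": "near", "lat": 40.02, "lon": -105.25},
    {"id": "far", "lat": 39.10, "lon": -94.58},
]
local = local_reports(reports, target)
print([r["id"] for r in local])  # ['near']
```

A hard cutoff only helps when a post carries a location tag at all, and it says nothing about someone nearby writing about a distant flood, which is part of why a fuller model is needed.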
“Addressing complexity is particularly important in the context of nonproliferation enforcement,” Krim said. “Relevant data inputs may include photos of particular types of technology, references made in conversations caught on audio, and so on. A model like the one we developed needs to be flexible enough to account for the variability and complexity of both varied types of data and the varied clues we are looking for.”
The researchers tested their model using data from a 2013 flood in Colorado and were able to resolve the incompatibility of the multimodal data to accurately estimate the location of the flooding.
Next steps for the project include evaluating nuclear facilities in the West to identify common characteristics that may also be applicable to facilities in more isolated societies, such as North Korea.
“We want to find ways of transferring information from a known environment to a hidden one,” Krim said. “How can we determine what information and which models are transferable from one place to another, given incompatible or inconsistent data? What’s normal, and what’s not? It’s not an easy problem.”