Here’s how intelligence agencies can search for foreign documents without learning the language
WASHINGTON — The intelligence community now has a tool that allows English-speaking users to search for information in foreign-language text and speech.
The new tool was developed by Raytheon BBN Technologies in partnership with the Intelligence Advanced Research Projects Activity – an organization within the Office of the Director of National Intelligence that develops technologies to solve some of the intelligence community’s toughest problems.
Essentially, once English-speaking users enter an English search query, the program sifts through foreign-language documents and records to find relevant results, translating those phrases into English before presenting the results to the user. It’s an “English-in, English-out” tool, and the company says its system allows operators to search for foreign documents, find results, and understand their context and meaning without having to speak the language, according to a Jan. 31 announcement.
Raytheon said it used Kazakh, Pashto, Somali, Swahili and Tagalog as low-resource foreign languages for its machine learning algorithm, which was also tested against Farsi, Bulgarian, Lithuanian and Georgian.
“The system is designed to be applied to any foreign language,” John Makhoul, Raytheon BBN program manager, said in a statement. “Low-resource languages present a particular challenge for research and translation technologies due to a lack of data for training systems. Raytheon BBN met this challenge by developing techniques to overcome the lack of data and applied them to an end-to-end system that exceeded program goals.”
The solution is part of IARPA’s Machine Translation for Retrieving English Information in Any Language, or MATERIAL, program, launched in 2017. Raytheon is one of four prime contractors developing solutions, alongside Johns Hopkins University, Columbia University, and the University of Southern California Information Sciences Institute. Each contractor received a training data set to develop machine learning solutions. MIT Lincoln Laboratory, the University of Maryland Center for Advanced Study of Language, the National Institute of Standards and Technology, and Tarragon Consulting made up the test and evaluation team that assessed performance.
“The tools and techniques developed through the program will enhance our ability to find, review and analyze foreign language content without the need to learn the language,” Carl Rubino, IARPA MATERIAL program manager, said in a statement. “For low-resource languages where expertise is minimal, these new features provide a significant advantage.”
Raytheon Technologies did not disclose the value of its contract with IARPA, although it noted that it had been working with the organization on the solution for four years. In a statement at the time of the initial awards, Columbia University said its grant was $14 million.
Nathan Strout is the editor of C4ISRNET, where he covers the intelligence community.