Title: Content Extraction Analytics
Description: This 21 May 2009 presentation from the NSA’s Center for Content Extraction includes a slide that shows that the communications of 122 heads of government were stored in the agency’s central “Target Knowledge Base”, 11 of whom are named. The collection for most of these targets was automated: see the Der Spiegel article ‘A’ for […]
Document: Human Language
technology
Center for Content Extraction
Content Extraction Analytics
SIGDEV End-to-End Demo
21 May 2009
Derived From: NSA/CSSM 1-52
Dated: 20070108
Declassily On: 20330108
TOP SECRET//COMINT//REL TO USA, AUS, CAN, GBR, NZL//20320L08
Introduction to Content Extraction
■ New technologies can find Essential
Elements of Information in documents
■ The Center for Content Extraction provides
"one stop shopping" for these technologies
at NSA
TOP SECRET//COMINT//REL TO USA, AUS, CAN, GBR, NZL//20320108
TOP SECRET//COMINT//REL TO USA, A US, CAN, GBR, NZL//20320L08
Extraction can benefit SIGDEV from end to end
■ Selection
■ Translation & Transliteration
■ Analysis
■ Interpretation/Enrichment
■ Retrieval
■ Storage & Distribution
TOP SECRET//COMINT//REL TO USA, AUS, CAN, GBR, NZL//20320L08
TOP SECRET//COMINT//REL TO USA, AUS, CAN, GBR, NZL//20320L08
STAIRS Partners
S (Marina, CEA)
T (Cybertrans)
A (SNA/Paintlball, Synapse)
I (Nymrod/Thundercloud)
R (Journeyman/CPE)
S (GoldenRetriever, Sociopath)
TOP SECRET//COMINT//REL TO USA, AUS, CAN, GBR, NZL//20320L08
TOP SECRET//COM1NTWREL TO USA, AUS, CAN, CBR, NZL//20320I08
Implementation: CCE Extraction Architecture (LexHound)
Subscription Based
Customers - extracted
report/transcript content
Marina (com m s tracki ng)
Synapse/EKS (link analysis)
Nymrod (Mame Matching)
Web Service
On
Demand
Customers
Web Services
LexHound Web Demo
CAMT (translation)
TKB (target knowledge base)
SNA (social network analysis)
GIS (geo mapping)
NTÛC (terror cell tracking)
Heresyitch(UCcollateral)
Go Id en Retr i e ve r ( record building)
Reports
Ingester
1 r *
Dispatcher
'
Task Manager
"FT—r
—
y 1
—
Segmenter
' I \ \
\ Bitractor(s)
1 *
Transformer
Rfenderer
Sender
Output
TOP SECRET//COMINT//REL TO ESA, AES, CAN, CBR, NZL//20320L08
TOP SECRET//COMINT//REL TO USA, A US, CAN, GBR, NZL//20320L08
Elaboration: The Central Importance of Storage
□ Each of the STAIRS Steps exploits
stored information
■ Selection Dictionaries ("get it")
■ Linguistic Glossaries for Translation
■ Wikis etc for enrichment ("know it")
□ Manual record-formation is slow, prone
to omissions and inconsistencies
■ <200K Person Targets in TKB
■ Growth ~ = 20K/year
□ Automatic extraction accelerates storage
■ >3000K Citation Records in Nymrod Entity DB
■ Growth ~= 1000K/year
TOP SECRET//COMINT//REL TO USA, AUS, CAN, GBR, NZL//20320L08
TOP SECRET//COMINT//REL TO USA, AUS, CAN, GBR, NZL//20320L08
Machine vs. Manual Chief-of-State Citations
Nymrod (machine-extracted) Citations Last TKB
Name Role Cod Cites Manual Update
1 Abdullah Badawi Malaysian Prime Minister cos > 100 10/15/200 7
2 Abdullahi Yusuf Somali President cos > 300 N/A
3 Abu Mazin (Mahmud Abbas) PA President cos >200 5/20/2009
4 Alan Garcia Peruvian President cos > 100 N/A
5 Aleksandr Lukashenko Belarusian President cos > 50 N/A
6 Alvaro Colom Guatemalan President cos >200 N/A
7 Alvaro Uribe Colombian President cos > TOO N/A
8 Amadou Toumani Toure Malian President cos > 50 N/A
9 Angela Merkel German Chancellor cos > 500 N/A
10 Bashar al-Asad Syrian President cos > &00 N/A
122 Yuliya Tymoshenko Ukrainian Prime cos >200 N/A
TOP SECRET//COMINT//REL TO USA, AUS, CAN, GBR, NZL//20320L08
I {uman Language
Technology