Title: Content Extraction Analytics

Description: This 21 May 2009 presentation from the NSA’s Center for Content Extraction includes a slide that shows that the communications of 122 heads of government were stored in the agency’s central “Target Knowledge Base”, 11 of whom are named. The collection for most of these targets was automated: see the Der Spiegel article ‘A’ for […]

Document: Human Language
technology

Center for Content Extraction

Content Extraction Analytics

SIGDEV End-to-End Demo

21 May 2009

Derived From: NSA/CSSM 1-52
Dated: 20070108
Declassily On: 20330108

TOP SECRET//COMINT//REL TO USA, AUS, CAN, GBR, NZL//20320L08

Introduction to Content Extraction

■ New technologies can find Essential
Elements of Information in documents

■ The Center for Content Extraction provides
"one stop shopping" for these technologies
at NSA

TOP SECRET//COMINT//REL TO USA, AUS, CAN, GBR, NZL//20320108

TOP SECRET//COMINT//REL TO USA, A US, CAN, GBR, NZL//20320L08

Extraction can benefit SIGDEV from end to end

■ Selection

■ Translation & Transliteration

■ Analysis

■ Interpretation/Enrichment

■ Retrieval

■ Storage & Distribution

TOP SECRET//COMINT//REL TO USA, AUS, CAN, GBR, NZL//20320L08

TOP SECRET//COMINT//REL TO USA, AUS, CAN, GBR, NZL//20320L08

STAIRS Partners

S (Marina, CEA)

T (Cybertrans)

A (SNA/Paintlball, Synapse)

I (Nymrod/Thundercloud)

R (Journeyman/CPE)

S (GoldenRetriever, Sociopath)

TOP SECRET//COMINT//REL TO USA, AUS, CAN, GBR, NZL//20320L08

TOP SECRET//COM1NTWREL TO USA, AUS, CAN, CBR, NZL//20320I08

Implementation: CCE Extraction Architecture (LexHound)

Subscription Based
Customers - extracted
report/transcript content

Marina (com m s tracki ng)
Synapse/EKS (link analysis)
Nymrod (Mame Matching)

Web Service
On

Demand

Customers

Web Services

LexHound Web Demo
CAMT (translation)

TKB (target knowledge base)

SNA (social network analysis)

GIS (geo mapping)

NTÛC (terror cell tracking)
Heresyitch(UCcollateral)

Go Id en Retr i e ve r ( record building)

Reports


Ingester
1 r *
Dispatcher
'

Task Manager

"FT—r


y 1


Segmenter

' I \ \

\ Bitractor(s)

1 *

Transformer

Rfenderer

Sender

Output

TOP SECRET//COMINT//REL TO ESA, AES, CAN, CBR, NZL//20320L08

TOP SECRET//COMINT//REL TO USA, A US, CAN, GBR, NZL//20320L08

Elaboration: The Central Importance of Storage

□ Each of the STAIRS Steps exploits
stored information

■ Selection Dictionaries ("get it")

■ Linguistic Glossaries for Translation

■ Wikis etc for enrichment ("know it")

□ Manual record-formation is slow, prone
to omissions and inconsistencies

■ <200K Person Targets in TKB

■ Growth ~ = 20K/year

□ Automatic extraction accelerates storage

■ >3000K Citation Records in Nymrod Entity DB

■ Growth ~= 1000K/year

TOP SECRET//COMINT//REL TO USA, AUS, CAN, GBR, NZL//20320L08

TOP SECRET//COMINT//REL TO USA, AUS, CAN, GBR, NZL//20320L08

Machine vs. Manual Chief-of-State Citations

Nymrod (machine-extracted) Citations Last TKB
Name Role Cod Cites Manual Update
1 Abdullah Badawi Malaysian Prime Minister cos > 100 10/15/200 7
2 Abdullahi Yusuf Somali President cos > 300 N/A
3 Abu Mazin (Mahmud Abbas) PA President cos >200 5/20/2009
4 Alan Garcia Peruvian President cos > 100 N/A
5 Aleksandr Lukashenko Belarusian President cos > 50 N/A
6 Alvaro Colom Guatemalan President cos >200 N/A
7 Alvaro Uribe Colombian President cos > TOO N/A
8 Amadou Toumani Toure Malian President cos > 50 N/A
9 Angela Merkel German Chancellor cos > 500 N/A
10 Bashar al-Asad Syrian President cos > &00 N/A
122 Yuliya Tymoshenko Ukrainian Prime cos >200 N/A

TOP SECRET//COMINT//REL TO USA, AUS, CAN, GBR, NZL//20320L08

I {uman Language
Technology

e-Highlighter

Click to send permalink to address bar, or right-click to copy permalink.

Un-highlight all Un-highlight selectionu Highlight selectionh