Title: Tech Strings in Documents (aka Tech Extractor) 2009

Release Date: 2015-07-01

Document Date: 2009-12-01

Description: This December 2009 NSA presentation explains how keyword selection works within XKeyScore: see the Intercept article XKEYSCORE: NSA’s Google for the World’s Private Communications, 1 July 2015.

Document: xks-tech-extractor-2009-p1-normal.gif:
TOP SECRET//COMINT//REL TO USA, AUS, CAN, GBR, NZL//20291123

Tech Strings in
Documents

(aka Tech Extractor)

December 2009

xks-tech-extractor-2009-p2-normal.gif:
What is the Tech Extractor?

The "Tech Extractor" is a way of finding
valuable intelligence based on keywords in
the content of DNI sessions but it is a
departure from traditional "soft selection"
which tends to bring back a lot of junk.

xks-tech-extractor-2009-p3-normal.gif:
What is soft selection?

Soft selection, aka content based selection,
is an approach at targeting traffic by looking
for keywords or phrases rather than specific
E-mail accounts

Content based selection has suffered
because of the poor design of content
based selection engines

xks-tech-extractor-2009-p4-normal.gif:
Soft Selection vs Surgical Selection

Existing selection techniques are blunt instruments
XKEYSCORE contextual dictionaries provide an
extremely sharp knife to make accurate selection
decisions

‘That’s not a knife.... THAT’S a knife!’

xks-tech-extractor-2009-p5-normal.gif:
Communication vs DNI Content

Selection engines in use today were based on
designs built to handle TELEX traffic
TELEX is a highly formatted content rich type of
traffic that does not resemble raw DNI seen with
Internet traffic

Raw Internet traffic contains HTML, web-pages,
raw base-64 encoded documents etc.

When analysts think of DNI “content” they are
more referring to “communication content” than
raw DNI content.

xks-tech-extractor-2009-p6-normal.gif:
Communication vs DNI Content

If an analyst tasks a Boolean equation “bomb”
and “chemical” they likely want to see all
communication that mentions ‘bomb’ and
‘chemical’ and not all web pages, news stories,
blog posts etc. where those two words appear
What we need is a context-aware scanning
engine that knows where it is inside of the raw
DNI in order to properly apply analyst tasking

xks-tech-extractor-2009-p7-normal.gif:
What is the Tech Extractor

The Tech Extractor was X-KEYSCORE’s
first stab at context-aware scanning and it
only focuses on three contexts:

■ E-mail Bodies

■ Chat Bodies

■ Document Bodies:

Microsoft Word, Excel, PowerPoint, Project, Visio
Adobe PDF, Postscript
Rich Text Format (RTF)

xks-tech-extractor-2009-p8-normal.gif:
How does the Tech Extractor work?

The Tech Extractor works by scanning a list
of keywords against those three contexts
and then tagging the results.
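
The scan-and-tag step described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the session layout, the context names, the sample dictionary, and the `tag_session` helper are all hypothetical, and the slides do not describe how the real system is implemented. The point it shows is that keywords are matched only inside selected contexts and that the result is a set of tags, not a filtered or forwarded copy of the data.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical "tech dictionary": category name -> keywords (illustrative only).
TECH_DICTIONARY: Dict[str, List[str]] = {
    "wireless": ["wimax"],
    "satellite": ["dvb"],
}

# Only these contexts are scanned; anything else in the session is ignored.
SCANNED_CONTEXTS = ("email_body", "chat_body", "document_body")


def tag_session(session: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (context, tech_name, tech_value) tags for keyword hits."""
    tags = []
    for context in SCANNED_CONTEXTS:
        text = session.get(context, "")
        for tech_name, keywords in TECH_DICTIONARY.items():
            for keyword in keywords:
                # Whole-word, case-insensitive match inside this context only.
                pattern = r"\b" + re.escape(keyword) + r"\b"
                if re.search(pattern, text, flags=re.IGNORECASE):
                    tags.append((context, tech_name, keyword.upper()))
    return tags


session = {
    # Not one of the scanned contexts, so these words produce no tags.
    "html_page": "News story about WiMAX and DVB rollouts",
    "email_body": "The WIMAX link is up; DVB receiver arrives next week.",
}
tags = tag_session(session)
print(tags)
```

In this sketch the same two words appear in both the web page and the e-mail body, but only the e-mail body produces tags, which is the distinction the slides draw between raw content and communication content.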

It’s important to note that this is not “filtering
and selection” and we’re not forwarding any
data home

XKS is simply tagging sessions with
meta-data, much like we do with
appids+fingerprints

xks-tech-extractor-2009-p9-normal.gif:
How does the Tech Extractor work?

After the meta-data tag is applied, analysts
can then use that meta-data tag as part of a
USSID-18 compliant query for traffic
It’s important to note, just like
AppIDs+Fingerprints, Tech Extractor tags
aren’t necessarily USSID-18 compliant by
themselves. You may need to add a valid
foreign IP address, MAC address or country
code before you query!

xks-tech-extractor-2009-p10-normal.gif:
Where does XKS get its list of terms?

Analysts provide the XKS team with lists of
terms, called “Tech Dictionaries” which can
contain multiple category names (aka “Tech
Names”).

Only after the XKS team is supplied with
those terms can the system begin scanning
and tagging.

xks-tech-extractor-2009-p11-normal.gif:
DATETIME             DATETIME END         TECH NAME  TECH VALUE
2008-01-01 04:55:00  2008-01-01 04:55:01  wireless   WIMAX
2008-01-01 04:55:00  2008-01-01 04:55:01  satellite  DVB
2008-01-01 04:55:00  2008-01-01 04:55:01  mac        00200E00953C

[Screenshot residue: further result columns include ID, TECH and FILENAME; the legible filename value is KHI.doc.]

This document would have been nearly
impossible to find without the context aware
tasking. The terms ‘wimax’ and ‘dvb’ are too
generic for CADENCE style tasking and the
MAC address hit on an anchorless regular
expression, impossible with current
corporate scanning engines
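
To make the "anchorless regular expression" remark concrete, here is a small Python sketch that finds a MAC address anywhere inside a block of text, with no fixed prefix, label, or position to anchor on. The pattern and the `find_macs` helper are assumptions made for illustration; the slides do not give the expression that was actually used. The sample value is the MAC shown in the results table.

```python
import re
from typing import List

# Unanchored pattern: 12 hex digits, optionally separated by ':' or '-',
# and not embedded in a longer run of hex digits. Illustrative only.
MAC_PATTERN = re.compile(
    r"(?<![0-9A-Fa-f])"
    r"[0-9A-Fa-f]{2}(?:[:-]?[0-9A-Fa-f]{2}){5}"
    r"(?![0-9A-Fa-f])"
)


def find_macs(text: str) -> List[str]:
    """Return MAC-like strings found anywhere in text, normalized to bare hex."""
    return [
        re.sub(r"[:-]", "", match.group(0)).upper()
        for match in MAC_PATTERN.finditer(text)
    ]


sample = "Modem serial 4411, MAC 00200E00953C, firmware 2.1"
macs = find_macs(sample)
print(macs)
```

Because the pattern carries no anchor, it has to be tried at every position in the text, which is one plausible reason such expressions are costly for general-purpose scanning engines.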

xks-tech-extractor-2009-p12-normal.gif:
Context-Aware Tagging

Subject: NFF-66024-GCC-KHI
From:
To:
Cc:
Date: Tue Dec 30 10:57:48 GMT 20C6

Event T
emailj

Fm City
KLOSTf

HTML Plain Text

Model: 6300
WON: 66024

ASC:GCC-KHI

Symptom: 4100

Comments: no fault found phone is working properly kindly confirm the fault in detail when and in which condition it
creates problem related to mention symptom

xks-tech-extractor-2009-p13-normal.gif:
Full Foreign Language Support

Supports full foreign language tagging
and querying

Ex: look for common Arabic expressions
in E-mails coming from the Pakistan
tribal regions:

UIS Webmail Display: Windows Live Mail

Medium risk: You may not know this sender. Mark as safe | Mark as unsafe

Sent: Thu 1/01/09 12:07 PM



xks-tech-extractor-2009-p14-normal.gif:
Live Demo
