Title: Content acquisition optimisation

Release Date: 2013-10-14

Description: This Special Source Operations presentation describes the NSA’s problems with overcollection of data from email address books and buddy lists: see the Washington Post article NSA collects millions of e-mail address books globally, 14 October 2013.

Document: TOP SECRET//SI//NOFORN

Content Acquisition Optimization

TOP SECRET//SI//NOFORN

TOP SECRET//SI//NOFORN

Yahoo Webmessenger

• Update data sent to individuals logged into Yahoo's Instant
Messenger service online

- Online contact status, unread emails in Yahoo inbox

- Usually small sessions (2-4kB)

• Sporadic collection (30,000 - 60,000 sessions per day)

• Intermittent bursts of collection against contacts of targets

- Large numbers of sessions (20,000+) against a single targeted selector

- Not collected against the target (online presence/unread email from target)

- No owner attribution (metadata value limited to fact-of comms for emails,
online presence events for buddies)

• Over a dozen selectors detasked in two weeks

- Because a target's contact was using/idling on Yahoo Webmessenger

- Several very timely selectors (Libyan transition, Greek financial related)

TOP SECRET//SI//NOFORN

TOP SECRET//SI//NOFORN

Address Books

• Email address books for most major webmail are collected as
stand-alone sessions (no content present*)

• Address books are repetitive, large, and metadata-rich

• Data is stored multiple times (marina/mainway, pinwale, clouds)

• Fewer and fewer address books attributable to users, targets

• Address books account for ~ 22% of SSO's major accesses (up
from ~ 12% in August)

Access(10 Jan 12) Total Sessions Address Books " Provider Collected A;ributed A;ributed%
US-3171 1488453 237067 (16% of traffic) Yahoo 444743 11009 2.48%
DS-200B 938378 311113 (33% of traffic) Hotmail 105068 1115 1.06%
US-3261 94132 2477 (3% of traffic) Gmail 33697 2350 6.97%
US-3145 177663 29336 (16% of traffic) Facebook 82857 79437 95.87%
US-3180 269794 40409 (15% of traffic) Other 22881 1175 5.14%
US-3180 (16 Dec 11) TOTAL 289318 3257738 91964 (32% of traffic) 712366 (22% of traffic) TOTAL 689246 95086 13.80%

TOP SECRET//SI//NOFORN

TOP SECRET//SI//NOFORN

Buddy Lists,Inboxes

• Unlike address books, frequently contain content data

- Offline messages, buddy icon updates, other data included

- Webmail inboxes increasingly include email content

- Most collection is due to the presence of a target on a buddy list where the
communication is not to, from, or about that target

• NSA collects, on a representative day, ~ 500,000 buddylists and
inboxes

- More than 90% collected because tasked selectors identified only as
contacts (not communicant, content, or owner)

• Identifying buddylists and inboxes without content (or without
useful content) an ongoing challenge

TOP SECRET//SI//NOFORN

TOP SECRET//SI//NOFORN

Scenario:

@yahoo

* I Sep 2011^^^^^^^^|@yahoo.com (tasked S2E, asw
Iran Quds Force) has his/her Yahoo account hacked by an
unknown actor, sends out spam email to his/her contact list:

TOP SECRET//SI//NOFORN

Scenario:

@yahoo

@yahoo.com has a number of Yahoo groups in his/her
contact list, some with many hundreds or thousands of
members



At DS-200B in particular, collection spiked as:

- The initial spam messages were sent (and collected)

- Inboxes of email recipients were viewed by contact list

- Messages were sometimes viewed, but more often sent as precached
views on Google and Yahoo (along with inboxes)

- Inboxes where the recipient did not delete the spam message continued to
be collected every time they were viewed

- Some recipients added

@yahoo.com to their address books

(possibly as a spam defeat?) - address books were collected every time

TOP SECRET//SI//NOFORN

TOP SECRET//SI//NOFORN

Scenario:

@yahoo

DS-200B Collection By Day -11 Sep - 24 Sep (in MB)

120000 i
100000
80000
60000
40000
20000
0 n

Ov Ov Ov Ov Ov Ov Ov Ov Ov \\ Ov Ov A Ov

n^N

# ¿P J" J* J* # ^ ¿P #
Cp Cp ojv Cp Cp Cp Cp Cp Cp cp cp cp cp cp

DS-200B Collection By Hour - 18 Sep - 23 Sep (in MB)

40000 i
30000 J
20000
10000

0 n

9/18/2011 9/18/2011 9/19/2011 9/19/2011 9/20/2011 9/20/2011 9/21/2011 9/21/2011 9/22/2011 9/22/2011 9/23/2011 9/23/2011
0:00 12:00 8:00 20:00 8:00 20:00 8:00 20:00 8:00 20:00 8:00 20:00

TOP SECRET//SI//NOFORN

TOP SECRET//SI//NOFORN

Scenario:

@yahoo.com emergency detasked from DS-200B and
US-3171 at 13:04Z on 20 Oct

• Numerous first-order address books and inboxes collected

meant tasked selectors on address books or buddy lists of
contacts of | @yahoo.com also affected:

- @yahoo.com and ^^^^^^|@gmail.com emergency
detasked off US-3171 at 13:10Z on 20 Sep

• Memorializing to PINWALE only address books and inboxes
owned by target selectors would have reduced PINWALE
volumes 90%+

- Site XKEYSCOREs would buffer data for SIGDEV purposes

- Metadata from known owner address books and inboxes stored regardless

TOP SECRET//SI//NOFORN

TOP SECRET//SI//NOFORN

Mobile IMAP

• IMAP protocol used by email clients
to fetch mail from server(s)

• Not designed for devices with
intermittent connections (i.e. mobile
phones)

• Android implementation in
particular uses a lot of bandwidth

AO CAPABILITY_______________________

A1 LOGIN
A2 CAPABILITY
A3 EXAMINE INBOX
A4 LIST "" INBOX
A5 LIST "" "INBOX.%"

A6 SEARCH SINCE 15-Aug-2011 UNDELETED ALL
A7 FETCH 17 (ENVELOPE INTERNALDATE RFCS22.SIZE
AS FETCH 17 (BODY.PEEK[HEADER])

A9 CLOSE
A10 LOGOUT

TOP SECRET//SI//NOFORN


e-Highlighter

Click to send permalink to address bar, or right-click to copy permalink.

Un-highlight all Un-highlight selectionu Highlight selectionh