DARPA Memex: How It Works and What It’s Up To — Really

Written by Jim Kelly

DARPA Memex is a super tech tool for deeply searching the Darknet. Jim Kelly does a deep dive on Memex, its projects and the tech at hand. [analysis]

aNewDomain — In the famous 19th-century novel 20,000 Leagues Under the Sea, Captain Nemo plunged into the dark depths on a daring excursion to reach the bottom of the sea.

The Pentagon, similarly, has grandiose and profound plans for the development of the most advanced domain-specific Internet search engine and tools in history. That’s DARPA Memex. It will, in short, be capable of tracking and tracing everything that everyone is doing. It’ll do that, DARPA says, via the Dark Internet, or Darknet.

Hacks, Cyberwarfare and Memex

To truly understand DARPA Memex and the Pentagon’s goals for it, you first have to talk cyberwarfare. The two are of course related, and Memex’s greatest asset will be its ability to scour the deepest depths of the Darknet, a topic I covered a while back.

It’s been said, in intelligence communities, that data breaches “are like waves in the ocean. They constantly batter at our shores until the water level slowly overtakes the country.” Following recent breaches of the Pentagon, the Office of Personnel Management (OPM) and major governmental and private sector infrastructures, it’s clear that cyberwarfare and the security needed to defend against it are paramount.

Cyber defense that can combat nation-state sponsored hackers, terror groups, organized crime and hacktivists seeking to disrupt the balance of global cyber security through targeted attacks must be developed, and Memex is the highest caliber tool to date.

Really, we’ve only seen the tip of the iceberg of cyberwarfare. There is a buildup of “Cold War” proportions in this field, and the US is drawing battle lines against critical threats to itself and other G20 nations. The recent hacks for troves of information are just the beginning.

Forget Ashley Madison. Data breaches endanger US security in more ways than you think

Most recently, the Ashley Madison hack made headlines around the world. It follows on the heels of private-sector breaches at Sony, JP Morgan, Target, Home Depot and major air carriers. In these cases the adversary uses prolonged internal reconnaissance to identify valuable data, exfiltrates that data over an extended period, and then moves laterally to another system to repeat the process until all valuable information is harvested or the breach is detected.

The combined attacks against all of these corporate entities will not have as significant an impact on national security as the game-changing OPM breach, revealed this summer.

In June 2015, OPM disclosed an October 2014 breach of systems maintained at a Department of the Interior shared-services data center, which led to the exposure of an estimated 4.2 million personnel records. OPM later disclosed a second breach, this one of its background-investigation records. Applicants for clearances complete a 127-page Standard Form 86 (SF-86), which contains all of their personal information, work history, family, associates, deviances and proclivities. Consequently, an unknown adversary now possesses the granular personal information belonging to the 19.7 million U.S. citizens who have requested or possessed a security clearance since the year 2000.

Even worse, the amount and detail of information about each victim increases in proportion to the individual’s level of security clearance. This means that individuals with the highest clearance levels are at the greatest risk of exploitation.

Numerous officials, including U.S. Director of National Intelligence James Clapper, have attributed the attack to China. FireEye, iSight Partners and other firms attribute the attack to a Chinese state-sponsored APT group referred to as “Deep Panda.” Deep Panda steals PII from U.S. commercial and government networks for Chinese intelligence and counter-intelligence purposes.

A cyber arms race buildup?

Most recently, the Institute for Critical Infrastructure Technology (ICIT), a non-profit, non-partisan think tank, issued a Summer 2015 series of white papers scrutinizing major cyber invasions against the nation’s infrastructure. The papers have been shared with the U.S. Senate, the U.S. House of Representatives, federal intelligence agencies and critical infrastructure leaders.

ICIT is a tactical, bipartisan forum of public/private thought leaders, composed of federal agency executives, legislative community members, university faculty and industry leaders, who share insights on various cybersecurity topics.

On August 6, 2015, ICIT issued a chilling white paper, entitled, “How to Use Encryption and Privacy Tools to Evade Corporate Espionage,” describing leading world economic and political powers as carrying out a cyber arms race and buildup of Cold War proportions:

“The threat is much greater than you can imagine. We have passed the escalation phase and have engaged directly into full confrontation in the cyberwar. State-sponsored hacking groups are regularly committing targeted and complex attacks against governments, businesses, and individuals.”

The brief was directed at high-risk targets, including law firms, pharmaceutical researchers and investment banking firms, among others. It advises them to minimize their attack surface while anonymizing themselves from the hacktivists and nation states wishing to cause them harm.

The paper continues:

“The first possibility is that you and your business are already breached in some way and have been for some time now. Somewhere in your system (at home, the office, your cellphone/tablet, or even your smartwatch) state-sponsored hackers from China, the Eastern Bloc, North Korea, or even Iran have placed software that allows them to quietly watch your every online move and record it all; thereby, stealing away information that provides them with a decided advantage in business negotiations or outright stealing intellectual property to copy it with impunity. Hacker groups like Anonymous, The Syrian Electronic Army, The Chaos Computer Club (Europe), and Tarh Andishan (Iran) may be siphoning off your organization’s most treasured secrets for no other reason than to expose them to the world and embarrass those you protect.”

On July 15, 2015, at a House Oversight hearing on cybersecurity at the Department of the Interior, Representative Will Hurd (R-TX) remarked:

“It is no secret that Federal agencies have a long way to go to improve their cybersecurity posture. We have years and years of reports highlighting the actions and vulnerabilities of Federal agencies. We also have years and years of recommendations from IGs, GAO, and experts in and out of the Government on how to address these vulnerabilities. Simply put, we know what needs to be done; we just need to do it.”

Enter: DARPA Memex …

To regain control of the battlefield, DARPA claims it has developed the world’s most sophisticated search engine to combat human trafficking occurring on the Darknet, which comprises Tor, Freenet and I2P. It’s dubbed Memex. Christopher White, DARPA program manager, wrote:

“Memex seeks to develop software that advances online search capabilities far beyond the current state of the art … The goal is to invent better methods for interacting with and sharing information, so users can quickly and thoroughly organize and search subsets of information relevant to their individual interests.”

Initially, this new super tech will be used for governmental surveillance and intelligence gathering to identify and bust human trafficking rings and to battle the Islamic State of Iraq and the Levant (ISIL). DARPA Director Dr. Arati Prabhakar commented earlier this year at the First Annual Future of War Conference:

“My first tour at DARPA was in the Cold War, and at that time [we] worked against the one monolithic existential threat and everything else was just sort of backseat … We don’t really have the luxury of dealing with (just) one kind of national security threat today.”

The name “Memex” combines “memory” and “index.” It draws on Vannevar Bush’s 1945 Atlantic Monthly article, “As We May Think,” which spoke of the development of a memory machine during World War II, and on Douglas Engelbart’s Mother of All Demos from 1968, which aimed to improve the democratization of an index and better methods for discovering data and sharing information.

Memex aims to overcome the shortcomings of the current state-of-the-art, “one-size-fits-all” approach to indexing and web search. Current web-scale commercial providers, such as Google, rely on linear processes and advertising for these functions.

According to DoD white papers, Memex’s development is aimed at technological superiority. In short, it’s meant to revolutionize science, devices and systems. In response to privacy advocates’ concerns about Fourth and Fifth Amendment interests, the DoD has specifically indicated that the agency is not interested in de-anonymizing or attributing identity to servers or IP addresses that are not publicly available.

According to DARPA, the benefits it envisions from Memex as the technology evolves include:

  1. Creating next-generation search technologies to exponentially advance the richness of discovery, organization and presentation of domain-specific content.
  2. Developing a new domain-specific search platform to discover relevant and useful content.
  3. Expanding current search capabilities to the deep web and Darknet.
  4. Improving interaction between military, government and commercial enterprises to organize data discoverable on the Internet.

Memex Combats Sex Trafficking

Memex’s primary objective right now is fighting human trafficking. That work includes identifying the forums, chats, advertisements, hidden services and job postings that deal in such matters, DARPA says. The commercial sex trade has a significant business and web presence relevant to military, law enforcement and intelligence services, and it is a clear moral issue on which to found a new technology.

According to the Pentagon, defeating human trafficking with Memex involves developing:

  1. Domain-specific indexing: a scalable web-crawling infrastructure for link discovery and information extraction that can overcome counter-crawling measures, including bans on robot behavior, paywalls, human detection, member-only forums and non-HTML content.
  2. Domain-specific searching, including designing a query language for crawling and information extraction algorithms.
  3. Domain-specific applications, starting with counter human trafficking. Over the life of the program, these may expand to indexing for found data, missing persons, counterfeit goods or the next Silk Road.
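
To make the idea of domain-specific indexing concrete, here is a minimal sketch of a focused crawler: one that follows the links that look most relevant to a topic first, rather than crawling everything in the order found. This is an illustration only, not Memex code. The tiny in-memory PAGES "web," the DOMAIN_TERMS keyword list and the function names are all assumptions made for the example, and a real system would fetch pages over the network and score relevance with far more than keyword overlap.

```python
import heapq
from html.parser import HTMLParser

# Hypothetical in-memory "web" (URL -> HTML) standing in for real page fetches.
PAGES = {
    "http://seed.example/": '<a href="http://a.example/">classified ads listing</a>'
                            '<a href="http://b.example/">weather report</a>',
    "http://a.example/": '<a href="http://c.example/">more ads with contact details</a>',
    "http://b.example/": '<a href="http://d.example/">sports scores</a>',
    "http://c.example/": "<p>ads with contact details</p>",
    "http://d.example/": "<p>scores</p>",
}

# Assumed keyword list that defines the "domain" for this sketch.
DOMAIN_TERMS = {"ads", "listing", "contact"}


class LinkExtractor(HTMLParser):
    """Collect (href, anchor text) pairs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, " ".join(self._text)))
            self._href = None


def relevance(text):
    """Score anchor text by how many domain terms it contains."""
    return len(set(text.lower().split()) & DOMAIN_TERMS)


def focused_crawl(seed, max_pages=3):
    """Visit up to max_pages pages, most relevant-looking links first."""
    frontier = [(0, seed)]  # min-heap keyed on negated relevance score
    seen = {seed}
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        html = PAGES.get(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href, text in parser.links:
            if href not in seen:
                seen.add(href)
                heapq.heappush(frontier, (-relevance(text), href))
    return visited


if __name__ == "__main__":
    print(focused_crawl("http://seed.example/"))
```

With a budget of three pages, the crawler in this sketch spends its visits on the seed and the two pages reached through ad-related links, and never gets to the weather and sports pages. That prioritization is the point of a domain-specific crawl.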

These domains refer to previously unexplored targets deep in the Darknet, where criminal activity has abounded in recent years because underground trafficking networks are nearly undetectable. Memex will provide tracing through artificial intelligence, which, in part, automatically maps the whispers, trends and patterns in the Darknet galaxy.

The program has no limits in its scope and depth. It plans to reach “the bottom of the sea” in a once nearly indecipherable and anonymized realm of data, according to former DoD electrical engineer John Galinski, founder and CEO of Global Data Sentinel (GDS), a global cybersecurity firm. Notably, GDS recently hired Valerie Plame, a former career covert CIA operative, to its advisory board. Ms. Plame authored the NYT bestselling memoir “Fair Game: My Life as a Spy, My Betrayal By the White House.”

Seventeen private companies have bid to get involved in the Memex project …

Solicitations of bids for involvement in the Memex project commenced back in February 2014, and they have since attracted around 17 companies, we’ve learned. Each is developing highly refined versions of Memex for its use and application in defense, education, law enforcement and academia in a massive plan to commercialize Memex in various industry applications.

Federally funded research and development centers and government entities were encouraged to apply, subject to certain criteria, including nondisclosure agreements, security regulations and other applicable governing statutes. Some of the organizations are universities, including Carnegie Mellon University, MIT Research Laboratory, New York University and Stanford University. Other participants include NASA and research-focused firms.

According to the Upstart Business Journal, some of the standout applications include:

  • MIT Lincoln Laboratory built the Text.jl natural language processing tool. “Text.jl provided numerous tools for text processing optimized for the Julia language. Functionality supported include algorithms for feature extraction, text classification, and language identification.”
  • Stanford University built the DeepDive trained system, “which means that it uses machine-learning techniques to incorporate domain-specific knowledge and user feedback that can deal with noisy and imprecise data by producing calibrated probabilities” in its searches.
  • SRI International built two infrastructure tools with the Tor Project, the US Navy and others. “HSProbe tests whether specified Tor hidden services (.onion addresses) are listening on one of a range of pre-specified ports, and optionally, whether they are speaking over other specified protocols.”
  • Diffeo, based in Cambridge, Mass., is creating a framework of library components to build “search applications that learn what users want by capturing their actions,” according to the site.
  • Hyperion Gray, based in Arlington, Va., is developing several projects around data collection, including SourcePin, to assist users in discovering websites with relevant information to a topic or domain. SourcePin will allow users to leverage an advanced web crawling system to produce more results than the conventional processes in less time.
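
The HSProbe description above boils down to a simple question: is anything listening on a given port? A minimal sketch of that kind of check follows. It is not HSProbe’s code. The function name probe_ports is an assumption for illustration, and the sketch makes plain TCP connections, so it can only reach ordinary hosts. Probing an actual .onion address would also require routing the connection through a Tor SOCKS proxy, which is left out here.

```python
import socket


def probe_ports(host, ports, timeout=1.0):
    """Return {port: bool}: True if a TCP connection to host:port succeeds.

    Hypothetical helper for illustration; uses plain TCP, not Tor.
    """
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal or timeout.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results


if __name__ == "__main__":
    # Demo against a throwaway listener on this machine only.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    open_port = server.getsockname()[1]
    print(probe_ports("127.0.0.1", [open_port]))
    server.close()
```

Only probe systems you own or are authorized to test. The demo sticks to a listener it opens itself on the local machine.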

However, commercial public access to this super technology may be three years away, according to sources with the Tor Project.

In my view, Memex will reveal the once invisible and chilly underside of the iceberg that is Darknet, exposing its frozen mass and allowing it to finally see the light of day. This can’t be a bad thing. Can it?

For aNewDomain, I’m Jim Kelly.

Images in order: Darknet by Daniel Rehn via Flickr; 20,000 Leagues Under the Sea via Wikimedia Commons; P1100368 by Women’s eNews via Flickr; Memex via Wikimedia Commons