Extracting Personal Information from Large Language Models Like GPT-2

Researchers have been able to find all sorts of personal information within GPT-2. This information was part of the training data, and can be extracted with the right sorts of queries.

Paper: “Extracting Training Data from Large Language Models.”

Abstract: It has become common to publish large (billion parameter) language models that have been trained on private datasets. This paper demonstrates that in such settings, an adversary can perform a training data extraction attack to recover individual training examples by querying the language model.

We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model’s training data. These extracted examples include (public) personally identifiable information (names, phone numbers, and email addresses), IRC conversations, code, and 128-bit UUIDs. Our attack is possible even though each of the above sequences are included in just one document in the training data.

We comprehensively evaluate our extraction attack to understand the factors that contribute to its success. For example, we find that larger models are more vulnerable than smaller models. We conclude by drawing lessons and discussing possible safeguards for training large language models.

From a blog post:

We generated a total of 600,000 samples by querying GPT-2 with three different sampling strategies. Each sample contains 256 tokens, or roughly 200 words on average. Among these samples, we selected 1,800 samples with abnormally high likelihood for manual inspection. Out of the 1,800 samples, we found 604 that contain text which is reproduced verbatim from the training set.

The rest of the blog post discusses the types of data they found.
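
The selection step described above (generate many samples, then keep the ones the model finds abnormally likely) can be sketched in a few lines. This is only an illustration, not the researchers' pipeline. It assumes per-token log-probabilities have already been obtained from the model. The `Sample` class, the `suspicion_score` heuristic, and the toy data are all hypothetical. The heuristic compares model likelihood against zlib-compressed size, so that merely repetitive text is not mistaken for memorized text.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Sample:
    text: str
    token_logprobs: list  # natural-log probability of each token; assumed to come from the model


def mean_nll(sample):
    """Average negative log-likelihood per token (lower means the model finds the text more likely)."""
    return -sum(sample.token_logprobs) / len(sample.token_logprobs)


def zlib_bits(text):
    """Size in bits of the zlib-compressed text: a crude, model-free measure of repetitiveness."""
    return 8 * len(zlib.compress(text.encode("utf-8")))


def suspicion_score(sample):
    """Hypothetical heuristic: high when the model is confident about text that zlib cannot compress well."""
    return zlib_bits(sample.text) / max(mean_nll(sample), 1e-9)


def select_for_inspection(samples, k):
    """Return the k samples with the highest suspicion score, for manual review."""
    return sorted(samples, key=suspicion_score, reverse=True)[:k]


# Toy data with made-up log-probabilities; the identifier and phone number are placeholders.
samples = [
    Sample("ha " * 50, [-0.1] * 50),  # likely, but only because it is repetitive
    Sample("contact: 1e4bd2a8-e8c8-4a62-adcd-40a936480059 call 555-0142", [-0.1] * 20),
    Sample("The weather today is mild and the market was quiet.", [-3.0] * 12),
]
top = select_for_inspection(samples, 1)
```

With this toy data, the sample carrying the identifier-like string ranks first: the model is confident about it even though it is not compressible, which is the pattern worth inspecting by hand.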

More on the SolarWinds Breach

The New York Times has more details.

About 18,000 private and government users downloaded a Russian tainted software update – a Trojan horse of sorts – that gave its hackers a foothold into victims’ systems, according to SolarWinds, the company whose software was compromised.

Among those who use SolarWinds software are the Centers for Disease Control and Prevention, the State Department, the Justice Department, parts of the Pentagon and a number of utility companies. While the presence of the software is not by itself evidence that each network was compromised and information was stolen, investigators spent Monday trying to understand the extent of the damage in what could be a significant loss of American data to a foreign attacker.

It’s unlikely that the SVR (a successor to the KGB) penetrated all of those networks. But it is likely that they penetrated many of the important ones. And that they have buried themselves into those networks, giving them persistent access even if this vulnerability is patched. This is a massive intelligence coup for the Russians and failure for the Americans, even if no classified networks were touched.

Meanwhile, CISA has directed everyone to remove SolarWinds from their networks. This is (1) too late to matter, and (2) likely to take many months to complete. Probably the right answer, though.

This is almost too stupid to believe:

In one previously unreported issue, multiple criminals have offered to sell access to SolarWinds’ computers through underground forums, according to two researchers who separately had access to those forums.

One of those offering claimed access over the Exploit forum in 2017 was known as “fxmsp” and is wanted by the FBI “for involvement in several high-profile incidents,” said Mark Arena, chief executive of cybercrime intelligence firm Intel471. Arena informed his company’s clients, which include U.S. law enforcement agencies.

Security researcher Vinoth Kumar told Reuters that, last year, he alerted the company that anyone could access SolarWinds’ update server by using the password “solarwinds123.”

“This could have been done by any attacker, easily,” Kumar said.

Neither the password nor the stolen access is considered the most likely source of the current intrusion, researchers said.

That last sentence is important, yes. But the sloppy security practice is likely not an isolated incident, and speaks to the overall lack of security culture at the company.

And I noticed that SolarWinds has removed its customer page, presumably as part of its damage control efforts. I quoted from it. Did anyone save a copy?

EDITED TO ADD: Both the Wayback Machine and Brian Krebs have saved the SolarWinds customer page.

Another Massive Russian Hack of US Government Networks

The press is reporting a massive hack of US government networks by sophisticated Russian hackers.

Officials said a hunt was on to determine if other parts of the government had been affected by what looked to be one of the most sophisticated, and perhaps among the largest, attacks on federal systems in the past five years. Several said national security-related agencies were also targeted, though it was not clear whether the systems contained highly classified material.

[…]

The motive for the attack on the agency and the Treasury Department remains elusive, two people familiar with the matter said. One government official said it was too soon to tell how damaging the attacks were and how much material was lost, but according to several corporate officials, the attacks had been underway as early as this spring, meaning they continued undetected through months of the pandemic and the election season.

The attack vector seems to be a malicious update in SolarWinds’ “Orion” IT monitoring platform, which is widely used in the US government (and elsewhere).

SolarWinds’ comprehensive products and services are used by more than 300,000 customers worldwide, including military, Fortune 500 companies, government agencies, and education institutions. Our customer list includes:

  • More than 425 of the US Fortune 500
  • All ten of the top ten US telecommunications companies
  • All five branches of the US Military
  • The US Pentagon, State Department, NASA, NSA, Postal Service, NOAA, Department of Justice, and the Office of the President of the United States
  • All five of the top five US accounting firms
  • Hundreds of universities and colleges worldwide

I’m sure more details will become public over the next several weeks.

Sidebar photo of Bruce Schneier by Joe MacInnis.

A Cybersecurity Policy Agenda

The Aspen Institute’s Aspen Cybersecurity Group — I’m a member — has released its cybersecurity policy agenda for the next four years.

The next administration and Congress cannot simultaneously address the wide array of cybersecurity risks confronting modern society. Policymakers in the White House, federal agencies, and Congress should zero in on the most important and solvable problems. To that end, this report covers five priority areas where we believe cybersecurity policymakers should focus their attention and resources as they contend with a presidential transition, a new Congress, and massive staff turnover across our nation’s capital.

  • Education and Workforce Development
  • Public Core Resilience
  • Supply Chain Security
  • Measuring Cybersecurity
  • Promoting Operational Collaboration

Lots of detail in the 70-page report.

Finnish Data Theft and Extortion

The Finnish psychotherapy clinic Vastaamo was the victim of a data breach and theft. The criminals tried extorting money from the clinic. When that failed, they started extorting money from the patients:

Neither the company nor Finnish investigators have released many details about the nature of the breach, but reports say the attackers initially sought a payment of about 450,000 euros to protect about 40,000 patient records. The company reportedly did not pay up. Given the scale of the attack and the sensitive nature of the stolen data, the case has become a national story in Finland. Globally, attacks on health care organizations have escalated as cybercriminals look for higher-value targets.

[…]

Vastaamo said customers and employees had “personally been victims of extortion” in the case. Reports say that on Oct. 21 and Oct. 22, the cybercriminals began posting batches of about 100 patient records on the dark web and allowing people to pay about 500 euros to have their information taken down.

Open Source Does Not Equal Secure

Way back in 1999, I wrote about open-source software:

First, simply publishing the code does not automatically mean that people will examine it for security flaws. Security researchers are fickle and busy people. They do not have the time to examine every piece of source code that is published. So while opening up source code is a good thing, it is not a guarantee of security. I could name a dozen open source security libraries that no one has ever heard of, and no one has ever evaluated. On the other hand, the security code in Linux has been looked at by a lot of very good security engineers.

We have some new research from GitHub that bears this out. On average, vulnerabilities in their libraries go four years before being detected. From a ZDNet article:

GitHub launched a deep-dive into the state of open source security, comparing information gathered from the organization’s dependency security features and the six package ecosystems supported on the platform across October 1, 2019, to September 30, 2020, and October 1, 2018, to September 30, 2019.

Only active repositories have been included, not including forks or ‘spam’ projects. The package ecosystems analyzed are Composer, Maven, npm, NuGet, PyPi, and RubyGems.

In comparison to 2019, GitHub found that 94% of projects now rely on open source components, with close to 700 dependencies on average. Most frequently, open source dependencies are found in JavaScript — 94% — as well as Ruby and .NET, at 90%, respectively.

On average, vulnerabilities can go undetected for over four years in open source projects before disclosure. A fix is then usually available in just over a month, which GitHub says “indicates clear opportunities to improve vulnerability detection.”

Open source means that the code is available for security evaluation, not that it necessarily has been evaluated by anyone. This is an important distinction.

More on the Security of the 2020 US Election

Last week I signed on to two joint letters about the security of the 2020 election. The first was as one of 59 election security experts, basically saying that while the election seems to have been both secure and accurate (voter suppression notwithstanding), we still need to work to secure our election systems:

We are aware of alarming assertions being made that the 2020 election was “rigged” by exploiting technical vulnerabilities. However, in every case of which we are aware, these claims either have been unsubstantiated or are technically incoherent. To our collective knowledge, no credible evidence has been put forth that supports a conclusion that the 2020 election outcome in any state has been altered through technical compromise.

That said, it is imperative that the US continue working to bolster the security of elections against sophisticated adversaries. At a minimum, all states should employ election security practices and mechanisms recommended by experts to increase assurance in election outcomes, such as post-election risk-limiting audits.

The New York Times wrote about the letter.

The second was a more general call for election security measures in the US:

Obviously elections themselves are partisan. But the machinery of them should not be. And the transparent assessment of potential problems or the assessment of allegations of security failure — even when they could affect the outcome of an election — must be free of partisan pressures. Bottom line: election security officials and computer security experts must be able to do their jobs without fear of retribution for finding and publicly stating the truth about the security and integrity of the election.

These pile on to the November 12 statement from the Cybersecurity and Infrastructure Security Agency (CISA) and the other agencies of the Election Infrastructure Government Coordinating Council (GCC) Executive Committee. While I’m not sure how they have enough comparative data to claim that “the November 3rd election was the most secure in American history,” they are certainly credible in saying that “there is no evidence that any voting system deleted or lost votes, changed votes, or was in any way compromised.”

We have a long way to go to secure our election systems from hacking. Details of what to do are known. Getting rid of touch-screen voting machines is important, but baseless claims of fraud don’t help.
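
The first letter recommends post-election risk-limiting audits. As a rough illustration of how such an audit reasons, here is a minimal sketch of a BRAVO-style ballot-polling audit for a two-candidate contest. The letter does not specify any particular method, so treat this as one assumed example. The function name, the ballot labels `"winner"` and `"loser"`, and the 5% risk limit are all illustrative choices.

```python
def bravo_audit(sampled_ballots, reported_winner_share, risk_limit=0.05):
    """Sequentially test the hypothesis that the reported winner did NOT actually win.

    sampled_ballots: iterable of "winner", "loser", or anything else (ignored),
        assumed to be drawn uniformly at random from the paper ballots.
    reported_winner_share: the winner's reported share of the winner+loser votes.
    Returns (confirmed, ballots_examined).
    """
    if not 0.5 < reported_winner_share < 1.0:
        raise ValueError("reported winner share must be between 0.5 and 1")

    threshold = 1.0 / risk_limit  # stop once the evidence ratio reaches this value
    ratio = 1.0
    examined = 0
    for ballot in sampled_ballots:
        examined += 1
        if ballot == "winner":
            ratio *= reported_winner_share / 0.5
        elif ballot == "loser":
            ratio *= (1.0 - reported_winner_share) / 0.5
        # Ballots for neither candidate leave the ratio unchanged.
        if ratio >= threshold:
            return True, examined  # outcome confirmed at the chosen risk limit
    return False, examined  # not confirmed; a real audit would escalate, e.g. to a full hand count


# Idealized example: every sampled ballot shows the reported winner.
confirmed, examined = bravo_audit(["winner"] * 100, reported_winner_share=0.6)
```

The point of the design is that the audit looks at paper ballots rather than trusting the tabulation software, and that the sample it needs grows as the reported margin shrinks.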

On Blockchain Voting

Blockchain voting is a spectacularly dumb idea for a whole bunch of reasons. I have generally quoted Matt Blaze:

Why is blockchain voting a dumb idea? Glad you asked.

For starters:

  • It doesn’t solve any problems civil elections actually have.
  • It’s basically incompatible with “software independence”, considered an essential property.
  • It can make ballot secrecy difficult or impossible.

I’ve also quoted this XKCD cartoon.

But now I have this excellent paper from MIT researchers:

“Going from Bad to Worse: From Internet Voting to Blockchain Voting”
Sunoo Park, Michael Specter, Neha Narula, and Ronald L. Rivest

Abstract: Voters are understandably concerned about election security. News reports of possible election interference by foreign powers, of unauthorized voting, of voter disenfranchisement, and of technological failures call into question the integrity of elections worldwide.

This article examines the suggestions that “voting over the Internet” or “voting on the blockchain” would increase election security, and finds such claims to be wanting and misleading. While current election systems are far from perfect, Internet- and blockchain-based voting would greatly increase the risk of undetectable, nation-scale election failures.

Online voting may seem appealing: voting from a computer or smart phone may seem convenient and accessible. However, studies have been inconclusive, showing that online voting may have little to no effect on turnout in practice, and it may even increase disenfranchisement. More importantly: given the current state of computer security, any turnout increase derived from with Internet- or blockchain-based voting would come at the cost of losing meaningful assurance that votes have been counted as they were cast, and not undetectably altered or discarded. This state of affairs will continue as long as standard tactics such as malware, zero days, and denial-of-service attacks continue to be effective.

This article analyzes and systematizes prior research on the security risks of online and electronic voting, and show that not only do these risks persist in blockchain-based voting systems, but blockchains may introduce additional problems for voting systems. Finally, we suggest questions for critically assessing security risks of new voting system proposals.

You may have heard of Voatz, which uses blockchain for voting. It’s an insecure mess. And this is my general essay on blockchain. Short summary: it’s completely useless.

2020 Was a Secure Election

Over at Lawfare: “2020 Is An Election Security Success Story (So Far).”

What’s more, the voting itself was remarkably smooth. It was only a few months ago that professionals and analysts who monitor election administration were alarmed at how badly unprepared the country was for voting during a pandemic. Some of the primaries were disasters. There were not clear rules in many states for voting by mail or sufficient opportunities for voting early. There was an acute shortage of poll workers. Yet the United States saw unprecedented turnout over the last few weeks. Many states handled voting by mail and early voting impressively and huge numbers of volunteers turned up to work the polls. Large amounts of litigation before the election clarified the rules in every state. And for all the president’s griping about the counting of votes, it has been orderly and apparently without significant incident. The result was that, in the midst of a pandemic that has killed 230,000 Americans, record numbers of Americans voted – and voted by mail – and those votes are almost all counted at this stage.

On the cybersecurity front, there is even more good news. Most significantly, there was no serious effort to target voting infrastructure. After voting concluded, the director of the Cybersecurity and Infrastructure Security Agency (CISA), Chris Krebs, released a statement, saying that “after millions of Americans voted, we have no evidence any foreign adversary was capable of preventing Americans from voting or changing vote tallies.” Krebs pledged to “remain vigilant for any attempts by foreign actors to target or disrupt the ongoing vote counting and final certification of results,” and no reports have emerged of threats to tabulation and certification processes.

A good summary.

Cybersecurity Visuals

The Hewlett Foundation just announced its top five ideas in its Cybersecurity Visuals Challenge. The problem Hewlett is trying to solve is the dearth of good visuals for cybersecurity. A Google Images Search demonstrates the problem: locks, fingerprints, hands on laptops, scary looking hackers in black hoodies. Hewlett wanted to go beyond those tropes.

I really liked the idea, but find the results underwhelming. It’s a hard problem.

Hewlett press release.

Swiss-Swedish Diplomatic Row Over Crypto AG

Previously I have written about the Swedish-owned Swiss-based cryptographic hardware company: Crypto AG. It was a CIA-owned Cold War operation for decades. Today it is called Crypto International, still based in Switzerland but owned by a Swedish company.

It’s back in the news:

Late last week, Swedish Foreign Minister Ann Linde said she had canceled a meeting with her Swiss counterpart Ignazio Cassis slated for this month after Switzerland placed an export ban on Crypto International, a Swiss-based and Swedish-owned cybersecurity company.

The ban was imposed while Swiss authorities examine long-running and explosive claims that a previous incarnation of Crypto International, Crypto AG, was little more than a front for U.S. intelligence-gathering during the Cold War.

Linde said the Swiss ban was stopping “goods” — which experts suggest could include cybersecurity upgrades or other IT support needed by Swedish state agencies — from reaching Sweden.

She told public broadcaster SVT that the meeting with Cassis was “not appropriate right now until we have fully understood the Swiss actions.”

CEO of NS8 Charged with Securities Fraud

The founder and CEO of the Internet security company NS8 has been arrested and “charged in a Complaint in Manhattan federal court with securities fraud, fraud in the offer and sale of securities, and wire fraud.”

I admit that I’ve never even heard of the company before.

US Space Cybersecurity Directive

The Trump Administration just published “Space Policy Directive – 5”: “Cybersecurity Principles for Space Systems.” It’s pretty general:

Principles. (a) Space systems and their supporting infrastructure, including software, should be developed and operated using risk-based, cybersecurity-informed engineering. Space systems should be developed to continuously monitor, anticipate, and adapt to mitigate evolving malicious cyber activities that could manipulate, deny, degrade, disrupt, destroy, surveil, or eavesdrop on space system operations. Space system configurations should be resourced and actively managed to achieve and maintain an effective and resilient cyber survivability posture throughout the space system lifecycle.

(b) Space system owners and operators should develop and implement cybersecurity plans for their space systems that incorporate capabilities to ensure operators or automated control center systems can retain or recover positive control of space vehicles. These plans should also ensure the ability to verify the integrity, confidentiality, and availability of critical functions and the missions, services, and data they enable and provide.

These unclassified directives are typically so general that it’s hard to tell whether they actually matter.

News article.

North Korea ATM Hack

The US Cybersecurity and Infrastructure Security Agency (CISA) published a long and technical alert describing a North Korea hacking scheme against ATMs in a bunch of countries worldwide:

This joint advisory is the result of analytic efforts among the Cybersecurity and Infrastructure Security Agency (CISA), the Department of the Treasury (Treasury), the Federal Bureau of Investigation (FBI) and U.S. Cyber Command (USCYBERCOM). Working with U.S. government partners, CISA, Treasury, FBI, and USCYBERCOM identified malware and indicators of compromise (IOCs) used by the North Korean government in an automated teller machine (ATM) cash-out scheme – referred to by the U.S. Government as “FASTCash 2.0: North Korea’s BeagleBoyz Robbing Banks.”

The level of detail is impressive, as seems to be common in CISA’s alerts and analysis reports.

Review: Cyber Smart

Do you believe you’re not interesting or important enough to be targeted by a cybercriminal? Do you think your personal data doesn’t hold any value? Bart R. McDonough proves why those beliefs are wrong in his book Cyber Smart: Five Habits to Protect Your Family, Money, and Identity from Cyber Criminals.

McDonough, CEO and Founder of Agio, is a cybersecurity expert, speaker and author with more than 20 years of experience in the field, and this is his debut book.

He starts by debunking the most common cybersecurity myths, like the one mentioned above. Whether you like it or not, you are important, and your data is important. Also, everything has a price.

McDonough explains the risks and threats you could encounter in a connected world: who the bad actors are, what their goals are and, most importantly, how they attack.

The author presents five golden rules – or, as he calls them, “Brilliance in the Basics” habits – that you should follow to maintain good cybersecurity hygiene: update your devices, enable two-factor authentication, use a password manager, install and update antivirus software, and back up your data.

The second half of the book gives you detailed and specific recommendations on how to protect your:

  • Identity
  • Children
  • Money
  • Email
  • Files
  • Social media
  • Website access and passwords
  • Computer
  • Mobile devices
  • Home Wi-Fi
  • IoT devices
  • Information when traveling

McDonough doesn’t use scare tactics that could make you want to forgo all technology and go live in the woods. On the contrary, he wants you to embrace it and understand that even though the online world poses many threats, there’s a lot you can do to protect yourself.

Who is this book for?

You don’t need to be a cybersecurity professional to understand this book. Its language is simple, and it offers many easy-to-follow everyday examples and detailed tips. It’s a book worth keeping in your home library for future reference.

The author has a very clear message: don’t just sit back and hope bad actors will pass you over. Be proactive and take all the possible and necessary steps to secure your data and your devices.
