When the news media reports on data breaches and other forms of cybercrime, the center of the story is usually a major software company, financial institution, or retailer. But in reality, these types of attacks are merely part of the damage that global hackers cause on a daily basis.
Town and city governments are becoming a more common target for online criminals. For example, Riviera Beach, a small city in Florida, had its office computers hacked and ended up paying roughly $600,000 to try to reverse the damage. Hackers see payouts like this as proof of a successful model and are now probing other public institutions that could be vulnerable.
Why are cities and towns so susceptible to hacking, how are these attacks carried out, and what steps should administrators take to protect citizen data?
How Hackers Choose Targets
While some cybercriminals seek out exploits for the sole purpose of causing destruction or frustration, the majority of hackers are looking to make money. Their aim is to locate organizations with poor security practices so that they can infiltrate their networks and online systems. Sometimes hackers will hide inside a local network or database for an extended period of time without the organization realizing it.
Hackers usually cash in through one of two ways. The first way is to try to steal data, like email addresses, passwords, and credit card numbers, from an internal system and then sell that information on the dark web. The alternative is a ransomware attack, in which the hacker holds computer systems hostage and unusable until the organization pays for them to be released.
City and town governments are becoming a common target for hackers because they often rely on outdated legacy software or else have built tools internally that may not be fully secure. These organizations rarely have a dedicated cybersecurity team or extensive testing procedures.
The Basics of Ransomware
Ransomware attacks, like the one that struck the city government of Riviera Beach, can begin with one simple click of a dangerous link. Hackers will often launch targeted phishing scams at an organization’s members via emails that are designed to look legitimate.
When a link within one of these emails is clicked, the hacker will attempt to hijack the user’s local system. If successful, their next move will be to seek out other nodes on the network. Then they will deploy a piece of malware that locks all internal users out of the systems.
At this point, the town or city employees will usually see a message posted on their screen demanding a ransom payment. Some forms of ransomware will actually encrypt all individual files on an operating system so that the users have no way of opening or copying them.
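The mechanics of that last step can be sketched in a few lines. The following is a toy illustration only, using a throwaway XOR keystream rather than any real ransomware family's cipher: once the attacker holds the only copy of the key, the encrypted files are effectively unrecoverable.

```python
import hashlib
import secrets

def keystream(key: bytes, length: int) -> bytes:
    # Derive a pseudorandom byte stream from the key by hashing a counter
    # (a toy counter-mode construction, not a production cipher).
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_crypt(data: bytes, key: bytes) -> bytes:
    # XOR with the keystream; the same call both encrypts and decrypts.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

document = b"Quarterly water utility billing records"  # invented example content
key = secrets.token_bytes(32)          # the attacker keeps this; the victim never sees it
ciphertext = xor_crypt(document, key)  # what the victim is left with on disk

assert ciphertext != document                  # unreadable without the key
assert xor_crypt(ciphertext, key) == document  # trivially recovered with it
```

This is why a ransom demand has leverage at all: the data is still on the victim's disks, but without the key it is indistinguishable from random bytes.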
Ways to Defend Yourself
Cybersecurity threats should be taken seriously by all members of an organization. The first step to stopping hackers is promoting awareness of potential attacks. This can be done through regular training sessions. Additionally, an organization’s IT department should evaluate the following areas immediately.
- Security Tools: City governments should have a well-reviewed, full-featured, and updated virus scanning tool installed on the network to flag potential threats. At an organization level, firewall policies should be put in place to filter incoming traffic and only allow connections from reputable sources.
- Web Hosting: With the eternal pressure to stick to a budget, cities often choose a web host based on the lowest price, which can lead to a disaster that far exceeds any cost savings. In a recent comparison of low-cost web hosts, community-supported research group Hosting Canada tracked providers using Pingdom and found that the ostensibly “free” and discount hosts had an average uptime of only 96.54%. For reference, 99.9% is considered by the industry to be the bare minimum. Excessive downtime often correlates with older hardware and outdated software that is more easily compromised.
- Virtual Private Network (VPN): This one should be mandatory for any employee who works remotely or needs to connect to public wi-fi networks. A VPN encodes all data in a secure tunnel as it leaves your device and heads to the open internet. This means that if a hacker tries to intercept your web traffic, they will be unable to view the raw content. However, a VPN is not enough to stop ransomware attacks or other forms of malware. It simply provides you with an anonymous IP address to use for exchanging data.
Local governments need to maintain a robust risk management approach while preparing for potential attacks from hackers. Most security experts agree that the Riviera Beach government actually did the wrong thing by paying the ransom, because there is no guarantee that payment will result in the unlocking of all systems and data.
During a ransomware attack, an organization needs to act swiftly. When the first piece of malware is detected, the infected hardware should be immediately shut down and disconnected from the local network to limit the spread of the virus. Any affected machine should then have its hard drive wiped and restored to a previous backup from before the attack began.
Preparing for different forms of cyberattack is a critical activity within a disaster recovery plan. Every organization should have its plan defined, with team members assigned to roles and responsibilities. Cities and towns should also consider investing in penetration testing from outside groups and explore the increasingly popular zero-trust security strategy as a way to harden the network. During a penetration test, experts probe for gaps in your security approach and report the issues to you directly, allowing you to fix problems before hackers exploit them.
With ransomware attacks, a hacker looks to infiltrate an organization’s network and hold their hardware and data files hostage until they receive a large payment. City and town government offices are becoming a common target for these instances of cybercrime due to their immature security systems and reliance on legacy software.
The only way to stop the trend of ransomware is for municipal organizations to build a reputation of having strong security defenses. This starts at the employee level, with people being trained to look for danger online and learning how to keep their own hardware and software safe.
About the author: A former defense contractor for the US Navy, Sam Bocetta turned to freelance journalism in retirement, focusing his writing on US diplomacy and national security, as well as technology trends in cyberwarfare, cyberdefense, and cryptography.
Copyright 2010 Respective Author at Infosec Island
Splash one for GDPR
Ireland’s Data Protection Commission (DPC) has ordered the government to delete 3.2 million people’s personal data after ruling that the state’s national ID card scheme was “unlawful from a data-processing point of view”.
Speaking to the Irish Times, data protection commissioner Helen Dixon described the scheme as “unlawful” and has ordered Ireland’s Department of Social Protection to stop collecting and processing people’s personal data for the project.
Laws underpinning the ID card, Dixon said, had been misinterpreted by the Irish state to give it total freedom to do as it pleased with the data it hoovered up when that was not the case. In a statement about the Public Services Card, the DPC said: “In practical terms, a person’s capacity to access public services both offline and online is now contingent, in an ever-increasing range of contexts, on obtaining and producing a PSC [Public Services Card].”
The Republic of Ireland’s total population is around 4.8 million, meaning around three-quarters of the Emerald Isle’s inhabitants had signed up to the scheme. It was used for everything from “the issuing of driver’s licences or passports, to decisions to grant or suspend payments or benefits under the social protection code, to the filing of appeals against decisions about the provision of school transport.”
But the DPC found that the state was effectively acting as if data protection laws didn’t apply to it at all.
The Department’s blanket and indefinite retention of underlying documents and information provided by persons applying for a PSC contravenes Section 2(1)(c)(iv) of the [Irish] Data Protection Acts, 1988 and 2003 because such data is being retained for periods longer than is necessary for the purposes for which it was collected.
However, in the detail the DPC did admit that data collected by the Department of Social Protection could be used for its intended purpose – just not for other government departments.
“Ultimately, we were struck by the extent to which the scheme, as implemented in practice, is far-removed from its original concept,” thundered the DPC. “Instead, the card has been reduced to a limited form of photo-ID, for which alternative uses have then had to be found.”
The scrapping of the scheme has close parallels with UK attempts at a national ID card, an idea enthusiastically promoted by Tony Blair’s New Labour government of the 2000s which was instantly scrapped by the Conservative-Lib Dem coalition that came to power in 2010.
Concerningly, Conservative-leaning think tanks have forgotten the £300m wasted on UK ID cards and have begun reheating calls to bring them back as some kind of technological wand that will magically solve all government administration woes. ®
Greetings from Portsmouth, New Hampshire!
Hard to believe we’re in mid-August already, but here we are, dare I say it, nearing the end of summer. There’s nothing quite like August in the Seacoast. There’s a sort of bittersweet beauty to it: As the daylight wanes, temperatures cool down, harvested fruits and vegetables taste delicious, while the crickets lend a soothing symphony of sound at night. But I sense this is just the calm before what will likely be a stormy fall in the privacy world, particularly here in the U.S.
To wit: The California Legislature reconvened Monday ahead of what will surely be a busy month debating the merits of several amendments to the California Consumer Privacy Act. Of course, time is of the essence as the legislative body has one month to finalize and pass any changes. And there’s quite a bit up in the air right now.
Fortunately, CCPA co-architect Mary Stone Ross shared her thoughts with us on the top issues she’ll be watching in the coming weeks. How will “personal information” and “deidentified” be defined? Will the powerful chairwoman of the Judiciary Committee, Hannah-Beth Jackson, pull back any bills if industry successfully pushes for changes to them? Will retailers be barred from selling personal information collected via loyalty card programs? What about the attorney general’s preliminary rules? Will the governor sign any amendments that come out of this session prior to the Oct. 13 deadline? And will there be other “Sacramento shenanigans”?
The legislative timeline is tight. Aug. 30 will be the last day for fiscal committees to meet and report bills, and, according to Ross, all committees must meet prior to Sept. 3, and any amendments on the floor must come before Sept. 6. The last day of the session is Sept. 13, so it will be a notable day for those tracking the CCPA. Gov. Gavin Newsom then has one month to sign any amendments into law.
Also CCPA-related, we featured an interesting angle this week from Annie Bai and Peter McLaughlin. “There is a simple and innocuous-sounding CCPA requirement stating that requests for access and deletion must be ‘verified,’” they write. “However, the law does not clarify what qualifies as verified.”
Bai and McLaughlin argue that this provision will create significant business risk, citing a recent study on using data subject access requests under the EU General Data Protection Regulation to fraudulently access other users’ personal information. Surely, many companies have not yet implemented robust DSAR-verification programs, opening them and their customers up to identity theft and unauthorized access. “In the name of empowering consumers,” Bai and McLaughlin point out, “the [CCPA] is actually introducing threat vectors that can be manipulated by fraudsters. This presents a considerable risk to organizations by enabling a data breach while ostensibly trying to comply with the law and support a customer’s data access request.” This means businesses “need to be vigilant as they set up their consumer-response processes,” Bai and McLaughlin suggest.
The IAPP-EY Annual Privacy Governance Report 2017 found that DSARs were among the top-three most difficult GDPR obligations for those surveyed. Verification adds one more layer of complexity. The IAPP has published several articles on operationalizing DSAR responses, like this one for the GDPR and this one from the IAPP’s Rita Heimes. Privacy tech vendors are also developing various DSAR response technology, some of which can be found in the new Privacy Tech Vendor Report 2019. All in all, there’s lots to consider.
As we near the unofficial end to summer, hopefully you can enjoy some downtime, maybe even head to the lake or the beach, because we’re in for a busy fall!
Reuters reports the Irish Data Protection Commission has almost concluded its first investigation into a multinational organization under the EU General Data Protection Regulation. Irish Data Protection Commissioner Helen Dixon said in an interview with The Irish Independent the office’s first major GDPR ruling will likely involve its probe into WhatsApp; however, a formal decision is still a ways off. “I’d like to say that we could do it in 48 hours, but it has to be in the order of months, to be done in the way that it has to be done,” Dixon said. “I will have to allow them a period of time to respond. I would have to consider their responses.”
We know, there’s lots of privacy news, guidance and documentation to keep up with every day. And we also know, you’re busy doing all the things required of the modern privacy professional. Sure, we distill the latest news and relevant content down in the Daily Dashboard and our weekly regional digests, but sometimes that’s even too much. To help, we offer our top-five, most-read stories of the week.
- Privacy Perspectives: “What one CCPA co-architect will watch closely with Sacramento back in session,” by former Californians for Consumer Privacy President Mary Stone Ross
- Privacy Perspectives: “Why the CCPA’s ‘verified consumer request’ is a business risk,” by Annie Bai, CIMP, CIPP/US, and Peter McLaughlin, CIPP/US
- Daily Dashboard: “Study: Majority of cookie notices are not GDPR compliant”
- Daily Dashboard: “Human transcription of audio chats draws scrutiny from regulators, lawmakers”
- Daily Dashboard: “Organization’s interference with DPO could lead to GDPR fine”
Security experts as rock stars
You could be forgiven for expecting a rock band to take the stage.
The arena filled with people. Laser lights danced across the assembled throng. A bass back-beat thumped somewhere mysterious. A mighty roar from the speakers while this reporter fumbled for earplugs, a moment too late. A man took the stage, armed only with a head mic and a clicker.
Not a rock star. A security expert. He spoke of secure software development and deployment best practices. He spoke of automation. Of changing security culture.
‘Deeply concerned’ UK privacy watchdog thrusts probe into King’s Cross face-recognizing snoop cam brouhaha
ICO wants to know if AI surveillance systems in central London are legal
The UK’s privacy watchdog last night launched a probe into the use of facial-recognition technology in the busy King’s Cross corner of central London.
It emerged earlier this week that hundreds of thousands of Britons passing through the 67-acre area were being secretly spied on by face-recognizing systems. King’s Cross includes Google’s UK HQ, Central Saint Martins college, shops and schools, as well as the bustling eponymous railway station.
“I remain deeply concerned about the growing use of facial recognition technology in public spaces, not only by law enforcement agencies but also increasingly by the private sector,” said Information Commissioner Elizabeth Denham in a statement on Thursday.
“We have launched an investigation following concerns reported in the media regarding the use of live facial recognition in the King’s Cross area of central London, which thousands of people pass through every day.”
The commissioner added her watchdog will look into whether the AI systems in use at King’s Cross are on the right side of Blighty’s data protection rules, and whether the law as a whole has kept up with the pace of change in surveillance technology. She highlighted that “scanning” people’s faces as they go about their daily business is a “potential threat to privacy that should concern us all. That is especially the case if it is done without people’s knowledge or understanding.”
Earlier this week, technology lawyer Neil Brown of decoded.legal told us that businesses must have a legal basis under GDPR to deploy the cameras as it involves the processing of personal data. And given the nature of encoding biometric data – someone’s face – a business must also have satisfied additional conditions for processing special category data.
“Put simply, any organisations wanting to use facial recognition technology must comply with the law – and they must do so in a fair, transparent and accountable way,” Denham continued.
“They must have documented how and why they believe their use of the technology is legal, proportionate and justified. We support keeping people safe but new technologies and new uses of sensitive personal data must always be balanced against people’s legal rights.”
This comes after London Mayor Sadiq Khan demanded more information on the use of the camera systems, and rights warriors at Liberty branded the deployment “a disturbing expansion of mass surveillance.”
Argent, the developer that installed the CCTV cameras, admitted it uses the tech, and insisted it is there to “ensure public safety.” It is not exactly clear how or why the consortium is using facial recognition, though.
A Parliamentary body, the Science and Technology Select Committee, called in mid-July for a “moratorium on the current use of facial recognition” tech, and “no further trials” until there is a legal framework in place. And privacy campaign groups have tried to disrupt police trials. The effectiveness of these tests has also proved dubious.
Early last month, researchers from the Human Rights, Big Data & Technology Project at the University of Essex Human Rights Centre found that use of the creepy cams is likely illegal, and that their success rates are highly dubious.
Use of the technology in the US has also been contentious, and on Wednesday this week, the American Civil Liberties Union said tests showed Amazon’s Rekognition system incorrectly matched one in five California politicians with mugshots held in a database of 25,000 arrest photos. ®
Greetings from Brussels!
There have been multiple examples of “credential stuffing” referenced in the media lately. For those unfamiliar with the term, credential stuffing is the automated injection of breached or dumped username/password pairings in order to fraudulently gain access to user accounts across different web services.
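That definition is easy to show concretely. Below is a toy simulation in Python; the email addresses, passwords, and account data are all invented, and a real attack would replay pairs against a live login endpoint at scale rather than a dictionary lookup.

```python
# A breached username/password dump from some unrelated, compromised site.
breached_dump = [
    ("alice@example.com", "hunter2"),
    ("bob@example.com", "passw0rd"),
]

# Accounts at the target service. Alice reused her password; Bob did not.
service_accounts = {
    "alice@example.com": "hunter2",
    "bob@example.com": "correct-horse",
}

def stuff(dump, accounts):
    # Replay every breached pair against the target's login check and
    # collect the accounts where the stolen credentials still work.
    return [email for email, pw in dump if accounts.get(email) == pw]

compromised = stuff(breached_dump, service_accounts)
assert compromised == ["alice@example.com"]  # only the reused credential succeeds
```

The attack needs no vulnerability in the target service at all; password reuse alone does the work.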
Transport for London recently experienced such an attack that resulted in its online Oyster smartcard system being pulled offline for two days; this meant that customers were unable to access their accounts either to check their balance or credit their travel cards. TfL initially cited “performance-based issues” before clarifying that its system had been compromised.
According to TfL, a small number of accounts were accessed “after their login credentials were compromised when using non-TfL websites.” In other words, the customer intrusions hit individuals who had reused, for their Oyster accounts, email address and password combinations that had been exposed in breaches of other websites. It emerged later that only 1,200 customers were affected by the breach. TfL has stated that no customer payment details were accessed and that ultimately its network was not compromised. However, 6 million online users were also affected by the disruption to the online service. Not a small issue: customer care must have been inundated with commuter complaints. TfL said it was in touch with the National Cyber Security Centre, as well as the ICO.
The Daily Dashboard recently reported on a separate credential stuffing incident at U.S.-based State Farm Insurance. This is perhaps more alarming as it involves an insurance company. Here, too, no actual fraudulent activity was detected in those accounts that were affected.
This hacking technique is on the rise, and the extent of “credential extraction” that has happened over recent years through any number of breaches means that digital services are increasingly at risk from the credential stuffing phenomenon: arguably a relatively simplistic way to compromise websites and user accounts. For such a simple technique, credential stuffing is frustratingly difficult to combat. Moreover, the key culprit, or willing accessory if you like, in this stuffing debacle is really ourselves; it is simply human weakness to reuse logins and passwords across multiple accounts. The best way to protect against credential stuffing attacks is to use a unique password for each of your digital accounts. Often, this is more practical with a password manager, plus two-factor authentication when available. I also refer to my notes back in June, when I discussed the newly expected privacy features of “Sign in with Apple” under iOS 13, which will include randomly generated email address IDs for the purpose of logins.
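The “one unique password per service” advice is, at bottom, what password managers automate. A minimal sketch using Python's standard `secrets` module; the service names are placeholders, and real password managers also handle storage and encryption, which this does not.

```python
import secrets
import string

# A reasonable character set for generated passwords (illustrative choice).
ALPHABET = string.ascii_letters + string.digits + "!@#$%^&*-_"

def generate_password(length: int = 20) -> str:
    # secrets.choice uses a cryptographically secure random source,
    # unlike the random module, which is predictable.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

# One distinct password per service means a breach at one site
# cannot be replayed ("stuffed") against the others.
passwords = {site: generate_password() for site in ("oyster", "email", "bank")}

assert len(set(passwords.values())) == len(passwords)  # all distinct
assert all(len(p) == 20 for p in passwords.values())
```

With credentials like these, a dump stolen from one service is worthless against the other two.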
The dangers of credential dumping are not restricted to obtaining access to your online accounts along with personal and/or confidential financial data. The end result can also be unbridled access to your devices, and possibly shared networks, for other nefarious purposes and transactions with far-reaching consequences. And while companies are constantly improving their detection and blocking of credential stuffing attempts, there is no foolproof way of defending against such attacks. The onus is largely on the user to think smart and own their privacy controls.
The U.K. Information Commissioner’s Office released resources to help town and parish councils comply with the EU General Data Protection Regulation. ICO Senior Policy Officer Stacey Egerton wrote in a blog post the agency has worked with 50 local councils to help better understand the challenges they face as they work to ensure GDPR compliance. In response to their collaborations, the ICO released a fact sheet for local councils on the use of personal devices, a resource pack on data audits and six steps for local councils to shore up their data-sharing practices.
The Irish Data Protection Commission has asked Facebook about how it had paid contractors to transcribe users’ audio, Politico reports. The tech company said it was allowed to perform the activity after consumers opted in to the practice. “We are now seeking detailed information from Facebook on the processing in question and how Facebook believes that such processing of data is compliant with their GDPR obligations,” the agency said in a statement. In response to the Facebook report, U.S. lawmakers have renewed their calls for new privacy legislation. Meanwhile, Microsoft updated its privacy notice to state that staff members and contractors may listen to recordings gathered by Cortana devices and Skype Translator products.
Microsoft has identified and patched several vulnerabilities in the Windows Remote Desktop Services (RDS) component — formerly known as Terminal Services — which is widely used in corporate environments to remotely manage Windows machines. Some of the vulnerabilities can be exploited without authentication to achieve remote code execution and full system compromise, making them highly dangerous for enterprise networks if left unfixed.
All the flaws have been discovered internally by Microsoft during hardening of the RDS component, so no public exploits are available at this time. However, Microsoft researcher Justin Campbell said on Twitter that his team “successfully built a full exploit chain using some of these, so it’s likely someone else will as well.”
The Irish Data Protection Commission released guidance to help data controllers better understand the data breach notification requirements found within the EU General Data Protection Regulation. The guide covers notifications controllers need to send to both the data subjects affected by a breach and to the agency itself. The agency offers a definition of what constitutes a breach and when a controller has to notify the agency. The document also contains a link to the DPC’s breach notification page.
The Berlin Commissioner for Data Protection and Freedom of Information plans to issue a large fine to a company for violations of the EU General Data Protection Regulation, t-online.de reports. The agency’s spokeswoman, Dalia Kues, said the organization could be fined tens of millions of euros for the infraction; however, the agency cannot release the name of the company for legal reasons. Kues also confirmed Berlin Data Protection Officer Maja Smoltczyk issued two GDPR fines totaling 200,000 euros against another unnamed organization. The second company has the opportunity to appeal the penalties. (Original article is in German.)
Sometimes it seems like all authenticators are compromised. Passwords, identity documents and even knowledge-based authentication: a plethora of these and other authenticators are readily available on the web or the dark web.
The terrible beauty of the California Consumer Privacy Act is that innumerable companies will soon be required to undertake totally novel consumer-facing responsibilities. In the name of empowering consumers, the law is actually introducing threat vectors that can be manipulated by fraudsters. This presents a considerable risk to organizations by enabling a data breach while ostensibly trying to comply with the law and support a consumer’s data access request.
There is a simple and innocuous-sounding CCPA requirement stating that requests for access and deletion must be “verified.” However, the law does not clarify what qualifies as verified.
So, similar to what we’ve seen for the EU General Data Protection Regulation, companies will be taking on a range of low-tech solutions to satisfy the verification requirement. Current concerns center on responding to requestors invoking their privacy rights, without serious contemplation of what it means to respond to an access or deletion request made by an imposter. Those who have been in this space for a while will recall the 2004 ChoicePoint breach, in which the data aggregation company so inadequately screened its customers that identity thieves were able to set up fake businesses as a way to buy personal information.
The GDPR contains a similar requirement and so presents a learning opportunity for those seeking compliance with the CCPA. Researchers James Pavur and Casey Knerr recently found just how easy it is to take advantage of the European right to a data subject access request. They conducted an exploratory social engineering experiment (“GDPArrrrr: Using Privacy Laws to Steal Identities”), finding that some companies acted upon receipt of the most basic, easily forged or obtainable documents, even a postmarked envelope or a heavily redacted photocopy of a bank statement.
The current compliance responses are manual, naive and easily manipulated. Pavur and Knerr’s findings underscore that online identity verification is fraught with fraud.
As privacy pros, we know that mounds of personal information, including document images themselves, have been compromised by hackers and are available in dark web marketplaces. This is exactly why regulated industries, such as financial services, spend billions on robust identity verification tools and services. Where identity theft has traditionally been monetized into fraudulent accounts, target companies have heightened awareness of how identity data can be manipulated. They have invested in anti-fraud, know-your-customer and anti-money-laundering compliance programs, so why would an e-commerce or retail outfit naively think that they can sufficiently identify CCPA requestors? Pavur and Knerr saw this in a sectoral pattern of responses to their experiment, with larger and/or regulated entities being firmer with their identity verification requirements.
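One way to give “verified” teeth is to require multiple independent pieces of evidence and to give trivially forgeable ones no weight. The sketch below is entirely hypothetical: the evidence types, weights, and threshold are invented for illustration and are not drawn from the CCPA, the GDPR, or any regulator's guidance.

```python
# Hypothetical evidence weights; a forged postmarked envelope (which the
# Pavur/Knerr study found some companies accepted) is worth nothing here.
EVIDENCE_WEIGHTS = {
    "account_login": 3,        # requester authenticated to the existing account
    "email_confirmation": 2,   # clicked a link sent to the address on file
    "government_id_match": 2,  # document fields checked against data on file
    "postal_envelope": 0,      # trivially forged; accepted in the study
}
APPROVAL_THRESHOLD = 4         # illustrative cutoff

def verification_score(evidence):
    # Sum the weights of every piece of evidence the requester supplied.
    return sum(EVIDENCE_WEIGHTS.get(item, 0) for item in evidence)

def is_verified(evidence):
    return verification_score(evidence) >= APPROVAL_THRESHOLD

# A Pavur/Knerr-style forgery fails; independent strong factors pass.
assert not is_verified({"postal_envelope"})
assert is_verified({"account_login", "email_confirmation"})
```

The design point is the zero weight: any factor an imposter can cheaply fabricate must contribute nothing toward approval, no matter how official it looks.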
Now we hear echoes of ChoicePoint, because fulfilling a fraudulent request for access equates to giving a third party unauthorized access to personal information. A consumer file that is released to the wrong party can be misused for tax, insurance and other financial frauds, as well as spear phishing, stalking and other crimes. It could even become the basis for a CCPA private right of action.
Fraudulent requests for deletion can affect the value of a company’s data holdings and its analytics operations. As presciently noted by Pavur and Knerr, “We would expect future work which replicates a more sophisticated attacker, either via technical means (such as email account hijacking or spoofing) or via physical means (such as passport forgery), would likely have substantially higher success rates than this baseline threat model.”
The mandated right to access can be a boon to consumers, but manipulated on a broad scale, it can also be a boon to malfeasance.
Imagine getting consumer files across every retail and services sector out there by flooding companies with massive automated requests for access to personal information. Just like spam, there will be companies that respond if the requests have the veneer of legitimacy. It is a new door for improper data access (not a back door, but an actual, legit front door) for fraudsters to obtain all manner of valuable personal information.
Companies need to be vigilant as they set up their consumer response processes. This “verified consumer” part is no small thing. It requires a robust commitment to accurately sourcing your verification data, skill in identifying dubious requests, and a healthy dose of skepticism. The emphasis now is to bend over backward to help consumers invoke their new rights, but if this is not done well, consumers will ultimately be hurt by fraudsters tampering with their data using the consumer request mechanism.
It’s ironic that this next-gen data breach could arise out of well-meaning efforts to comply with a new privacy law. But that’s the kind of big data world we live in. Fraudsters will find a way to take advantage of a gap in expertise of this breadth. With awareness and commitment, organizations will be able to dedicate resources to address such requests properly. Concurrently, perhaps this will be a topic of guidance from the California attorney general’s office.
Until then, as a sage friend put it, “Why do more work (to acquire PII) when you can just ask?”
Fortune reports British Airways had a previously undiscovered vulnerability in its data security that potentially exposed customers’ private information. Cybersecurity firm Wandera found that the airline’s check-in links, which include passenger details, such as last names and confirmation numbers, in the URLs, are unencrypted. “We take the security of our customers’ data very seriously. Like other airlines, we are aware of this potential issue and are taking action to ensure our customers remain securely protected,” a British Airways representative said. Also in the U.K., facial-recognition images, fingerprint scans and other personal information belonging to 1 million people were found on an unsecured public database used by police and banks. Meanwhile in the U.S., fast-food chain Sonic got final approval for a $4.3 million settlement related to a 2017 payment card breach.
You’re all set for your long summer vacation. Suddenly a text arrives. It’s the CEO. ‘Data strategy by Friday plz’
Fret not. Here’s a gentle guide to drawing up a plan to take the pain away from your info management
Go back 15 years and big data as a concept was only just beginning.
We were still shivering in an AI winter because people hadn’t yet figured out how to funnel large piles of data through fast graphics processors to train machine-learning software. Applications that gulped down data by the terabyte were thin on the ground. Today, these gargantuan projects are driving the reuse of data within organizations, enabling staff to access a central pool of information from multiple systems. The aim of the game is to give your organization a competitive advantage over its rivals.
Capturing, managing, and sharing data brings challenges involving everything from privacy and security to quality control, ownership, and storage. To surmount those challenges, you need what some folks call a “data strategy.” And you’re probably wondering, rightly, what on Earth does that actually mean?
First, why do you need one?
A strategy at the top coordinates things at the bottom. Without a top-level data strategy, you may end up with multiple projects overlapping each other, creating data sets that are too similar or otherwise in conflict with each other. A data strategy should define a single source of truth in your organization, thus avoiding any inconsistencies and duplication. This approach also reduces your storage needs by making people draw from the same data pool and reuse the same information.
It also helps with compliance. Take, for example, GDPR and similar privacy regulations that give people the right to request and delete data you hold on them. How will your organization painlessly and reliably find these requested records if your piles of data are not properly classified and managed centrally somehow? Your data strategy should set out how you classify and organize your internal information.
A data strategy can also lower the response times for data requests. If you know what kinds of data you have, where it is all stored and how it is all organized, and that it is clean and easily accessible, as demanded by your strategy, you can answer queries fast. This is something McKinsey reckons can help reduce your IT costs and investments by 30 to 40 per cent. Finally, a data strategy should define how you determine and maintain the provenance of your enterprise data so that you always know exactly where it came from.
Inside a data strategy
Back to the crucial question at hand: what exactly is a data strategy? According to SAS [PDF], it should cover five components: identification, storage, governance, provisioning, and process. We’ll summarize them here:
The strategy should insist data is consistently classified and labeled, and define how to do exactly that, so that smoothly and easily sharing information across systems and teams is possible. This must include a standard definition for metadata, such as the type of each information record or object, its sensitivity, and who owns it.
This is something you should implement as soon as possible: for example, slipshod front-office data entry today will wreck the quality of your back-end databases tomorrow. Applications should be chosen and configured so that users enter information and metadata consistently, clearly, and accurately, and only enter relevant information.
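As a concrete illustration, a shared metadata schema that every application validates against could look like the following sketch. The field names and `Sensitivity` levels here are invented for illustration, not taken from any particular standard:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Sensitivity(Enum):
    # Illustrative tiers; your strategy would define its own.
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"


@dataclass
class RecordMetadata:
    """Standard metadata every application attaches to a record."""
    record_type: str          # e.g. "customer-profile", "invoice"
    sensitivity: Sensitivity  # drives access control and GDPR handling
    owner: str                # the accountable domain team
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

    def validate(self):
        # Reject records that would degrade downstream data quality.
        if not self.record_type or not self.owner:
            raise ValueError("record_type and owner are mandatory")


meta = RecordMetadata("customer-profile", Sensitivity.CONFIDENTIAL, "crm-team")
meta.validate()  # raises ValueError if mandatory fields are missing
```

Enforcing a check like `validate()` at the point of entry is what keeps slipshod front-office input from polluting the back-end databases.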
This data needs to live somewhere, and your strategy should document where information is held and how it is backed up. Whether it is structured or unstructured, the data should at least be stored in its original raw format to avoid any loss in quality. You could even consider placing it all in what is now called a data lake. To give you an idea of what we mean by this, Goldman Sachs, in its attempt to become the Google of Wall Street, shunted more than 13PB of material into a data lake built on top of Hadoop. This deep pool holds all sorts of material, from research notes to emails and phone call recordings, and serves a variety of folks including analysts, salespeople, equities traders, and sales bots. The investment bank also uses these extensive archives for machine learning.
Data lakes are often built separately to core IT systems, and each one maintains a centralized index for the data it holds. They can coexist with enterprise data warehouses (EDWs), acting as test-and-learn environments for large cross-enterprise projects while EDWs handle intensive and important transactional tasks.
Physically storing and backing up this data in a reliable and resilient manner is a challenge. As the amount of information that can be hoarded increases, factors like scale-out, distributed storage, and the provisioning of sufficient compute capacity to process it all must feature in your data strategy. You must also specify the usual table stakes of redundancy, fail overs, and so on, to keep systems operational in the event of disaster.
For on-premises storage, you can use distributed nodes in the form of just-a-bunch-of-disks (JBOD) boxes sitting close to your compute nodes. This is something Hadoop uses a lot, and it’s good for massively parallel data processing. The downside is that JBOD gear typically needs third-party management software to be of any real use. An alternative approach is to use scale-out network-attached storage (NAS) boxes that include their own management functionality and provide things like tiered storage. On that subject, consider all-flash storage or even in-memory setups involving SAP HANA for high-performance data-crunching.
That’s fine, you say, but what about servers with direct-attached storage containing legacy application data? You don’t necessarily need to ditch all that older kit and bytes right away. With data repositories now so large, it’s difficult to store them in one place, and with data living in so many areas of an organization, you may find that data virtualization is an option. Data virtualization can be used to create a software layer in front of these legacy data sources, allowing new systems to interface with the old.
You don’t need to do it all on-premises, of course. Cloud-based data lakes are a thing, though you may need to compose them from various services. For example, Amazon uses CloudFormation templates to bolt together separate services such as Lambda, S3, DynamoDB, and ElasticSearch to create a cloud-based data lake. Goldman Sachs uses a combination of AWS and Google Cloud Platform as part of its data strategy.
Finally, you need to include forward plans for your organization: you may not need, say, real-time analytics right now, though you may need to add this capability and others like it in future. Give yourself room in your data strategy to expand and adapt, using gap analysis to guide your decisions.
With all sorts of information flowing in, you don’t want to end up with a cesspool of unmanaged and unclean records, owned and accounted for by no one. Experts call this a data swamp, and it has little value. Avoid this with the next part of your data strategy: governance.
Governance is a huge topic and with so many elements to it, the Data Management Body of Knowledge [PDF], aka the DAMA DMBOK, is a good place to start when drafting your strategy. McKinsey, meanwhile, advises making people accountable for data by grouping information into domains – such as demographics, regulatory data, and so on – and put someone in charge of each area. Then, have them all sit under a central governance unit that dictates things such as policies, tools, processes, security, and compliance.
Your data strategy should not focus on just importing, organizing, storing, and retrieving information. You must document how you will process your pool of raw data into something useful for customers and staff: how will your precious information be transformed, presented, combined and assembled, or whatever else is needed to turn it into a product. The aim here is to plan an approach that does not involve any overlapping efforts or duplicated code, applications, or processes. Just as you strive to reuse data across your organization without duplication, you should develop a strategy that ensures the same applies to your processes: well-defined pipelines or assembly lines that turn raw data into polished output.
Finally, you need to think about getting the data where it is needed. This involves not just defining sets of policies and processes on how data will be used, but also – potentially – changes to your IT infrastructure to accommodate them. Goldman Sachs, for example, published a set of APIs that allowed customers and partners to see and access data in addition to internal users.
Writing it all up
It’s one thing to have aspirations about strategy. Now you have to write it down and stick to it. Don’t be afraid to keep it simple. Be realistic. Make it crystal clear so that staff reading it know exactly what needs to be done. Grandiose and poorly defined initiatives are costly and difficult to implement. With vague goals and big price tags, you can quickly run into trouble with budgets and deadlines.
Break your data strategy and accompanying infrastructure changes into discrete goals, such as providing new reporting services, reaching a particular level of data quality, and setting up pilot projects. Identify the domains of data you intend to create, and develop a roll-out plan for each domain. Dedicate individual teams to each domain: they own it, they clean and maintain it, they govern it, and they provide it.
Your data lake doesn’t have to start off as some vast ocean. It can grow in size and functionality over time, starting off as a resource of raw data that data scientists can experiment with. Later, as it matures, you can integrate it with EDWs and perhaps even use it to replace some operational data stores. There’s nothing wrong with a medium-sized data pond to start with. Goldman Sachs’ data lake contained just 5PB back in 2013.
An effective data strategy will mean tweaking your organization, your governance process, and your data gathering and management processes. More than that, though, it will mean taking a long, hard look at your infrastructure. It isn’t feasible for most companies to wipe their entire IT portfolio clean and start again, though you can modernize parts of it as your strategy evolves. ®
If a software vulnerability can be detected and remedied, then a
potential intrusion is prevented. While not all software
vulnerabilities are known, 86
percent of vulnerabilities leading to a data breach were
patchable, though there is some
risk of inadvertent damage when applying software patches. When new
vulnerabilities are identified they are published in the Common
Vulnerabilities and Exposures (CVE) dictionary by vulnerability
databases, such as the National Vulnerability Database (NVD).
The Common Vulnerabilities Scoring System (CVSS) provides a metric
for prioritization that is meant to capture the potential severity of
a vulnerability. However, it has been criticized
for a lack of timeliness, vulnerable population representation,
normalization, rescoring, and broader expert consensus, shortcomings
that can lead to mis-prioritization. For example, some of the
worst exploits have been assigned low CVSS scores. Additionally,
CVSS does not measure the vulnerable population size, which many
practitioners have stated they expect it to score. The design of the
current CVSS system also rates too many vulnerabilities as severe,
which causes user fatigue.
To provide a more timely and broad approach, we use machine learning
to analyze users’ opinions about the severity of vulnerabilities by
examining relevant tweets. The model predicts whether users believe a
vulnerability is likely to affect a large number of people, or if the
vulnerability is less dangerous and unlikely to be exploited. The
predictions from our model are then used to score vulnerabilities
faster than traditional approaches, like CVSS, while providing a
different method for measuring severity, one that better reflects
how severe practitioners perceive a vulnerability to be.
Our work uses nowcasting to address this important gap:
prioritizing early-stage CVEs to determine whether or not they are urgent.
Nowcasting is the economic discipline of determining a trend or a
trend reversal objectively in real time. In this case, we are
recognizing the value of linking social media responses to the release
of a CVE after it is released, but before it is scored by CVSS. Scores
of CVEs should ideally be available as soon as possible after the CVE
is released, while the current process often hampers prioritization of
triage events and ultimately slows response to severe vulnerabilities.
This crowdsourced approach reflects numerous practitioner observations
about the size and widespread nature of the vulnerable population, as
shown in Figure 1. For example, in the Mirai
botnet incident in 2016, a massive number of vulnerable IoT
devices were compromised, leading to the largest Denial of Service
(DoS) attack on the internet at the time.
Figure 1: Tweet showing social commentary
on a vulnerability that reflects severity
Figure 2 illustrates the overall process that starts with analyzing
the content of a tweet and concludes with two forecasting evaluations.
First, we run Named Entity Recognition (NER) on tweet contents to
extract named entities. Second, we use two classifiers to test the
relevancy and severity towards the pre-identified entities. Finally,
we match the relevant and severe tweets to the corresponding CVE.
Figure 2: Process overview of the steps
in our CVE score forecasting
Each tweet is associated with CVEs by inspecting URLs or the contents
hosted at a URL. Specifically, we link a CVE to a tweet if it contains
a CVE number in the message body, or if the URL content contains a
CVE. Each tweet must be associated with a single CVE and must be
classified as relevant to security-related topics to be scored. The
first forecasting task considers how well our model can predict the
CVSS rankings ahead of time. The second task is predicting future
exploitation of the vulnerability for a CVE based on Symantec
Antivirus Signatures and Exploit DB. The rationale is that eventual
presence in these lists indicates not just that exploits can exist or
that they do exist, but that they also are publicly available.
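The tweet-to-CVE linking rule described above is simple to sketch. Assuming CVE identifiers follow the standard `CVE-YYYY-NNNN…` format, a minimal version might look like:

```python
import re
from typing import Optional

# CVE identifiers look like CVE-2018-11776: a 4-digit year and a
# sequence number of four or more digits.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


def link_tweet_to_cve(body: str, url_content: str = "") -> Optional[str]:
    """Return the single CVE a tweet maps to, or None.

    Mirrors the rule above: look in the message body and in the
    content fetched from any URL; tweets that mention zero or more
    than one distinct CVE are discarded, so each scoreable tweet
    maps to exactly one CVE.
    """
    matches = {m.upper() for m in CVE_PATTERN.findall(body + " " + url_content)}
    if len(matches) == 1:
        return matches.pop()
    return None  # zero or multiple CVEs: not scoreable


print(link_tweet_to_cve("Patch now! cve-2018-11776 is being exploited"))
# prints "CVE-2018-11776"
```

Tweets mentioning several CVEs are dropped rather than split, which keeps the later severity score unambiguous.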
Predicting the CVSS scores and exploitability from Twitter data
involves multiple steps. First, we need to find appropriate
representations (or features) for our natural language to be processed
by machine learning models. In this work, we use two natural language
processing methods for extracting
features from text: (1) n-gram features, and (2) word embeddings.
Second, we use these features to predict if the tweet is relevant to
the cyber security field using a classification model. Third, we use
these features to predict if the relevant tweets are making strong
statements indicative of severity. Finally, we match the severe and
relevant tweets up to the corresponding CVE.
N-grams are word sequences, such as word pairs for 2-grams or word
triples for 3-grams. In other words, they are contiguous sequences of n
words from a text. After we extract these n-grams, we can represent the
original text as a bag-of-ngrams. Consider the sentence:
A critical vulnerability was found in Linux.
If we consider all 2-gram features, then the bag-of-ngrams
representation contains “A critical”, “critical vulnerability”, etc.
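A minimal sketch of this bag-of-ngrams extraction in plain Python. The tokenization here is a simple whitespace split; the real pipeline applies Twitter-aware normalization first:

```python
from collections import Counter


def bag_of_ngrams(text, n=2):
    """Count the contiguous n-word sequences in a sentence."""
    tokens = text.lower().split()
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


bag = bag_of_ngrams("A critical vulnerability was found in Linux")
# The 2-gram bag includes "a critical" and "critical vulnerability".
```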
Word embeddings are a way to learn the meaning of a word from how it
was used in previous contexts, and then represent that meaning in a
vector space. Word embeddings learn the meaning of a word by the
company it keeps, more formally known as the distributional
hypothesis. Word embedding representations are machine friendly, and
similar words are often assigned similar representations. Word
embeddings are also domain specific. In our work, we additionally
train on terminology specific to cyber security topics, so that, for
example, the words most related to threats are defenses,
cyberrisk, cybersecurity, threat, and iot-based. The
embedding would allow a classifier to implicitly combine the knowledge
of similar words and the meaning of how concepts differ. Conceptually,
word embeddings may help a classifier use these embeddings to
implicitly associate relationships such as:
device + infected = zombie
where an entity called device has a mechanism applied called
infected (malicious software infecting it) then it becomes a zombie.
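The analogy can be illustrated with toy vectors. The 3-dimensional embeddings below are invented purely for illustration; real embeddings are learned from large corpora and have hundreds of dimensions:

```python
import math

# Invented toy embeddings: each word is a 3-d vector.
emb = {
    "device":   (1.0, 0.0, 0.0),
    "infected": (0.0, 1.0, 0.0),
    "zombie":   (1.0, 1.0, 0.0),
    "patched":  (0.0, 0.0, 1.0),
}


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# "device + infected" lands nearest to "zombie" in this toy space.
query = add(emb["device"], emb["infected"])
nearest = max(emb, key=lambda w: cosine(query, emb[w]))
```

A classifier operating on such vectors can exploit these geometric relationships implicitly, without ever being told that an infected device is a zombie.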
To address issues where social media tweets differ linguistically
from natural language, we leverage previous research and
software from the Natural Language Processing (NLP) community.
This addresses specific nuances like less consistent capitalization,
and applies stemming to account for a variety of special characters
like ‘@’.
Figure 3: Tweet demonstrating value of
identifying named entities in tweets in order to gauge severity
Named Entity Recognition (NER) identifies the words that construct
nouns based on their context within a sentence, and benefits from our
embeddings incorporating cyber security words. Correctly identifying
the nouns using NER is important to how we parse a sentence. In Figure
3, for instance, NER allows Windows 10 to be understood as
an entity, while October 2018 is treated as elements of a date.
Without this ability, the text in Figure 3 may be confused with the
physical notion of windows in a building.
Once NER tokens are identified, they are used to test if a
vulnerability affects them. In the Windows 10 example,
Windows 10 is the entity and the classifier will predict
whether the user believes there is a serious vulnerability affecting
Windows 10. One prediction is made per entity, even if a
tweet contains multiple entities. Filtering tweets that do not contain
named entities reduces tweets to only those relevant to expressing
observations on a software vulnerability.
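A toy version of this filtering step, with a small hand-written gazetteer standing in for the learned NER model (the entity list is invented for illustration; the real system learns entity boundaries from context):

```python
# Stand-in for a trained NER model: a fixed list of known entities.
KNOWN_ENTITIES = {"windows 10", "linux", "openssl", "struts"}


def extract_entities(tweet):
    """Return the known entities mentioned in a tweet."""
    text = tweet.lower()
    return [e for e in KNOWN_ENTITIES if e in text]


def scoreable(tweet):
    """Tweets with no recognized entity are filtered out entirely."""
    return bool(extract_entities(tweet))


tweets = [
    "Serious RCE hits Windows 10 this October",
    "I love the view from my windows",   # no software entity: dropped
]
kept = [t for t in tweets if scoreable(t)]
```

In the real pipeline one severity prediction is then made per extracted entity, so a tweet naming two products yields two predictions.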
From these normalized tweets, we can gain insight into how strongly
users are emphasizing the importance of the vulnerability by observing
their choice of words. The choice of adjective is instrumental in the
classifier capturing the strong opinions. Twitter users often use
strong adjectives and superlatives to convey magnitude in a tweet or
when stressing the importance of something related to a vulnerability
like in Figure 4. This magnitude often indicates to the model when a
vulnerability’s exploitation is widespread. Table 1 shows our analysis
of important adjectives that tend to indicate a more severe vulnerability.
Figure 4: Tweet showing strong adjective use
Table 1: Log-odds ratios for words
correlated with highly-severe CVEs
Finally, the processed features are evaluated with two different
classifiers to output scores to predict relevancy and severity. When a
named entity is identified all words comprising it are replaced with a
single token to prevent the model from biasing toward that entity. The
first model uses an n-gram approach where sequences of two, three, and
four tokens are input into a logistic regression model. The second
approach uses a one-dimensional Convolutional Neural Network (CNN),
composed of an embedding layer, a dropout layer, and a fully
connected layer, to extract features from the tweets.
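A stripped-down sketch of the first model, bag-of-n-grams features fed to logistic regression, using only 2-grams, a handful of invented training tweets, and a plain gradient-descent fit in place of a real training pipeline:

```python
import math
from collections import Counter


def ngrams(text, n=2):
    toks = text.lower().split()
    return Counter(" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1))


# Invented toy training tweets: 1 = severe, 0 = not severe.
train = [
    ("massive critical bug hits Linux", 1),
    ("critical vulnerability actively exploited", 1),
    ("minor issue fixed in update", 0),
    ("small bug patched quietly", 0),
]

vocab = sorted({g for text, _ in train for g in ngrams(text)})
idx = {g: i for i, g in enumerate(vocab)}


def featurize(text):
    vec = [0.0] * len(vocab)
    for g, c in ngrams(text).items():
        if g in idx:
            vec[idx[g]] = float(c)
    return vec


w = [0.0] * len(vocab)
b = 0.0
lr = 0.5
for _ in range(200):                      # plain per-example gradient descent
    for text, y in train:
        x = featurize(text)
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        p = 1.0 / (1.0 + math.exp(-z))
        g = p - y                         # gradient of the log-loss
        w = [wi - lr * g * xi for wi, xi in zip(w, x)]
        b -= lr * g


def predict(text):
    """Probability that a tweet expresses a severe vulnerability."""
    x = featurize(text)
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))
```

A test tweet reusing a "severe" bigram such as "critical vulnerability" scores high, while one built from "minor issue" bigrams scores low; the real model adds 3- and 4-grams and trains on thousands of annotated tweets.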
To evaluate the performance of our approach, we curated a dataset of
6,000 tweets containing the keywords vulnerability or
ddos from Dec 2017 to July 2018. Workers on Amazon’s Mechanical
Turk platform were asked to judge whether a user believed a
vulnerability they were discussing was severe. For all labeling,
multiple users must independently agree on a label, and multiple
statistical and expert-oriented techniques are used to eliminate
spurious annotations. Five annotators were used for the labels in the
relevancy classifier and ten annotators were used for the severity
annotation task. Heuristics were used to remove unserious respondents;
for example, when users did not agree with other annotators for a
majority of the tweets. A subset of tweets were expert-annotated and
used to measure the quality of the remaining annotations.
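The majority-vote labeling and the disagreement heuristic can be sketched as follows (the annotator names and labels are invented):

```python
from collections import Counter

# Invented toy annotations: annotator -> {tweet_id: label}.
annotations = {
    "a1": {1: "severe", 2: "severe", 3: "not"},
    "a2": {1: "severe", 2: "severe", 3: "not"},
    "a3": {1: "not",    2: "not",    3: "severe"},  # disagrees everywhere
}


def majority_labels(annotations):
    """Majority vote per tweet across all annotators."""
    by_tweet = {}
    for labels in annotations.values():
        for tweet, label in labels.items():
            by_tweet.setdefault(tweet, []).append(label)
    return {t: Counter(ls).most_common(1)[0][0] for t, ls in by_tweet.items()}


def unserious(annotator, annotations, threshold=0.5):
    """Flag annotators who disagree with the majority on most tweets."""
    majority = majority_labels(annotations)
    labels = annotations[annotator]
    disagreements = sum(labels[t] != majority[t] for t in labels)
    return disagreements / len(labels) > threshold


flagged = [a for a in annotations if unserious(a, annotations)]
```

Here `a3` disagrees with the majority on every tweet and is flagged for removal, which is the heuristic the paragraph above describes.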
Using the features extracted from tweet contents, including word
embeddings and n-grams, we built a model using the annotated data from
Amazon Mechanical Turk as labels. First, our model learns if tweets
are relevant to a security threat using the annotated data as ground
truth. This would remove a statement like “here is how you can
#exploit tax loopholes” from being confused with a cyber
security-related discussion about a user exploiting a software
vulnerability as a malicious tool. Second, a forecasting model scores
the vulnerability based on whether annotators perceived the threat to
be severe.
CVSS Forecasting Results
Both the relevancy classifier and the severity classifier were
applied to various datasets. Data was collected from December 2017 to
July 2018. Most notably 1,000 tweets were held-out from the original
6,000 to be used for the relevancy classifier and 466 tweets were
held-out for the severity classifier. To measure the performance, we
use the Area
Under the precision-recall Curve (AUC), which is a correctness
score that summarizes the tradeoffs of minimizing the two types of
errors (false positive vs false negative), with scores near 1
indicating better performance.
- The relevancy classifier
- The severity classifier using the CNN scored
- The severity classifier using a Logistic Regression model, without embeddings, scored 0.54
Next, we evaluate how well this approach can be used to forecast
CVSS ratings. In this evaluation, all tweets must occur a minimum of
five days ahead of CVSS scores. The severity forecast score for a CVE
is defined as the maximum severity score among the tweets which are
relevant and associated with the CVE. Table 2 shows the results of
three models: randomly guessing the severity, modeling based on the
volume of tweets covering a CVE, and the ML-based approach described
earlier in the post. The scoring metric in Table 2 is precision at top
K using our logistic regression model. For example, where K=100, this
is a way for us to identify what percent of the 100 most severe
vulnerabilities were correctly predicted. The random model would
have predicted 59, while our model predicted 78 of the top 100 and all
ten of the most severe vulnerabilities.
Table 2: Comparison of random simulated
predictions, a model based just on quantitative features like
“likes”, and the results of our model
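Precision at top K itself is straightforward to compute; here is a sketch with an invented miniature ranking:

```python
def precision_at_k(predicted_ranking, truly_severe, k):
    """Fraction of the top-k predicted CVEs that are truly severe."""
    top_k = predicted_ranking[:k]
    return sum(cve in truly_severe for cve in top_k) / k


# Invented miniature example: 4 of our top-5 predictions are truly severe.
ranking = ["CVE-A", "CVE-B", "CVE-C", "CVE-D", "CVE-E"]
severe = {"CVE-A", "CVE-B", "CVE-D", "CVE-E", "CVE-F"}
print(precision_at_k(ranking, severe, 5))  # prints 0.8
```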
Exploit Forecasting Results
We also measured the practical ability of our model to identify the
exploitability of a CVE in the wild, since this is one of the
motivating factors for tracking. To do this, we collected severe
vulnerabilities that have known exploits by their presence in the
following data sources:
- Symantec Antivirus
- Symantec Intrusion Prevention System
- ExploitDB catalog
The dataset for exploit forecasting comprised 377,468 tweets
gathered from January 2016 to November 2017. Of the 1,409 CVEs used in
our forecasting evaluation, 134 publicly weaponized vulnerabilities
were found across all three data sources.
Using CVEs from the aforementioned sources as ground truth, we find
our CVE classification model is more predictive of detecting
operationalized exploits from the vulnerabilities than CVSS.
Table 3 shows precision scores illustrating seven of the top ten most
severe CVEs and 21 of the top 100 vulnerabilities were found to have
been exploited in the wild. Compare that to one of the top ten and 16
of the top 100 from using the CVSS score itself. The recall scores
show the percentage of our 134 weaponized vulnerabilities found in our
K examples. In our top ten vulnerabilities, seven were found to be in
the 134 (5.2%), while the CVSS scoring’s top ten included only one
(0.7%) CVE being exploited.
Table 3: Precision and recall scores for
the top 10, 50 and 100 vulnerabilities when comparing CVSS scoring,
our simplistic volume model and our NLP model
Preventing vulnerabilities is critical to an organization’s
information security posture, as it effectively mitigates some cyber
security breaches. In our work, we found that social media content
that pre-dates CVE scoring releases can be effectively used by machine
learning models to forecast vulnerability scores and prioritize
vulnerabilities days before they are made available. Our approach
incorporates a novel social sentiment component, which CVE scores do
not, and it allows scores to better predict real-world exploitation of
vulnerabilities. Finally, our approach allows for a more practical
prioritization of software vulnerabilities effectively indicating the
few that are likely to be weaponized by attackers. NIST has acknowledged
that the current CVSS methodology is insufficient. The current process
of scoring CVSS is expected to be replaced by ML-based solutions by
October 2019, with limited human involvement. However, there is no
indication of utilizing a social component in the scoring effort.
This work was led by researchers at Ohio State under the IARPA
CAUSE program, with support from Leidos and FireEye. This work was
originally presented at NAACL in
June 2019; our
paper describes this work in more detail.
The Centre for Information Policy Leadership submitted a white paper to the European Commission as the branch continues its work to update standard contractual clauses for international transfers, according to a post from Hunton Andrews Kurth’s Privacy & Information Security Law Blog. The white paper focuses on the challenges organizations face when they use existing SCCs and how these issues can be overcome by updating them to align with the EU General Data Protection Regulation. The CIPL’s recommendations for updating SCCs include ensuring they are adapted to fit multiparty and multiprocessing situations and that the broad territorial scope of the GDPR should be considered when the SCCs are overhauled.
TechCrunch reports Preclusio has developed a “machine learning-fueled solution” to help companies adhere to EU General Data Protection Regulation and California Consumer Privacy Act regulations. Businesses can use the platform on premises or in the cloud depending on their preference. The solution uses read-only access data to “identify sensitive data in an automated fashion using machine learning.” The privacy team reviews the findings and makes adjustments to ensure the company is compliant with the regulations.
A researcher abused the GDPR to get information on his fiancee:
It is one of the first tests of its kind to exploit the EU’s General Data Protection Regulation (GDPR), which came into force in May 2018. The law shortened the time organisations had to respond to data requests, added new types of information they have to provide, and increased the potential penalty for non-compliance.
“Generally if it was an extremely large company — especially tech ones — they tended to do really well,” he told the BBC.
“Small companies tended to ignore me.
“But the kind of mid-sized businesses that knew about GDPR, but maybe didn’t have much of a specialised process [to handle requests], failed.”
He declined to identify the organisations that had mishandled the requests, but said they had included:
- a UK hotel chain that shared a complete record of his partner’s overnight stays
- two UK rail companies that provided records of all the journeys she had taken with them over several years
- a US-based educational company that handed over her high school grades, mother’s maiden name and the results of a criminal background check survey.
Politicians appeal to hackers to take up the fight
DEF CON Despite some progress, the US is still massively underprepared for a serious cyber attack and the current administration isn’t helping matters, according to politicians visiting the DEF CON hacking conference.
In an opening keynote, representatives Ted Lieu (D-CA) and James Langevin (D-IL) were joined by hackers Cris Thomas, aka Space Rogue, and Jen Ellis (Infosecjen) to discuss the current state of play in government preparedness.
“No, we are not prepared,” said Lieu, one of only four trained computer scientists in Congress. “When a crisis hits, it’s too late for Congress to act. We are very weak on a federal level, nearly 20 years after Space Rogue warned us we’re still there.”
Thomas testified before Congress 20 years ago about the dangers that the internet could pose if proper steps weren’t taken. At today’s conference he said there was much still to be done but that he was cautiously optimistic for the future, as long as hackers put aside their issues with legislators and worked with them.
“As hackers we want things done now,” he said. “But Congress doesn’t work that way; it doesn’t work at the ‘speed of hack’. If you’re going to engage with it, you need to recognise this is an incremental journey and try not to be so absolutist.”
Three no Trump
He pointed out that the current administration was actually moving backwards, having placed less of a priority on IT security than past administrations. The session’s moderator, former Representative for California Jane Harman, was more blunt, saying that US president Donald Trump had fired his homeland security advisor, Tom Bossert, one of the most respected men in cybersecurity (Bossert actually resigned), and abolished his position.
Representative Langevin noted that the situation was improving. The US had been totally unprepared for Russian interference in 2016, he said, but the situation had improved by the 2018 elections and the intelligence agencies were ready for the 2020 election cycle.
“[Former US president Barack] Obama laid out a framework for a national incident response team,” he said. “That policy is in place, but as to whether it can be executed then we have to hope for the best, but we need to practice it, that’s the key thing.”
Langevin, a repeat visitor to DEF CON, appealed to the assembled security workers to get involved in helping to educate politicians and make them understand technical issues. It is a problem also close to Ellis’s heart.
You can easily secure America’s e-voting systems tomorrow. Use paper – Bruce Schneier
Ellis, a Brit by birth, came to the US, identified the committees dealing with cybersecurity and started offering advisory services. She found that politicians were willing to listen.
“When I did this, people asked you in to talk,” she said. “They were crying out for people who could talk about cybersecurity. There is interest. It’s hard… but do your research.”
It’s not enough to sit on the sideline and moan, she told the crowd. Instead it’s time for the community to get out there and make a difference.
Lieu also said he was hopeful that hackers would take up the torch and warned attendees not to give up, because change could come in surprising ways.
“In politics everything seems impossible until it happens,” he joked. “10 years ago if you’d told me people in some states would be smoking legal weed I’d never thought it would happen. And yet here we are.” ®