Big data abounds. No precise definition of “big data” exists, but a good rule of thumb is data sets too large to fit in main memory on a single machine. While the buzzword may be overused, the trend is real. Cheap memory, fast Internet connections, and obsessively used, sensor-laden smartphones have combined to generate massive data sets as well as the means to transmit and store them.
While companies could amass datasets about anything measurable, generally the most coveted datasets in Silicon Valley contain individuals’ personal information. The economic significance of such data is obvious. If a company can predict a user’s purchasing decisions, it can advertise optimally. Google and Facebook rely upon well-chosen ads to monetize their otherwise free web services.
The potential value of mining human-generated data goes beyond advertising. The collective health data generated by a large population may contain insights that could bring about better health outcomes for everyone. Medical institutions are eager to mine patient records for longitudinal observations in the hope of generating the knowledge necessary for personalized medicine.
What About Privacy?
While the potential benefits of large-scale data mining are obvious, so too are the pitfalls. As user information is collected, mined, and sometimes published for profit, concerns about privacy have grown. Recently, the European Union issued a “Right to be Forgotten” ruling, reflecting the desires of many individuals to restrict the use of their data. Other recent well-publicized cases contested the unauthorized inclusion of user data in advertisements. Several well-documented privacy mishaps, including the re-identification of users from the Netflix challenge dataset, have attracted national attention. In another famous case, the private medical records of Governor William Weld of Massachusetts were identified in supposedly anonymous records released by the Group Insurance Commission.
Differential privacy offers one way forward, allowing data scientists to extract insights from a database while guaranteeing that no individual can be identified. The definition is actually more general. It guarantees that the answer one gets from any query on a database is not perceptibly different if any one individual is excluded from the database. Concretely, one could guarantee an individual that no additional harm would come to them should they choose to participate in a study. Of course, even if they don’t participate in the study, they may feel that some harm has come to them. For example, a study may reveal some unfavorable health characteristic of a specific subpopulation. Differential privacy ensures the privacy of people but not of populations.
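The phrase “not perceptibly different” can be made precise. A common formal statement is sketched below; the symbols (a randomized mechanism M, databases D and D′ that differ in one individual’s record, a set S of possible answers, and a privacy parameter ε) are notation introduced here for illustration and do not appear in the text above.

```latex
% epsilon-differential privacy (standard formulation; notation is assumed here)
% M  : randomized mechanism answering a query
% D, D' : databases differing in one individual's record
% S  : any set of possible outputs
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon} \cdot \Pr[\,M(D') \in S\,]
\quad \text{for all such } D, D' \text{ and all } S.
```

The smaller ε is, the closer the two probabilities must be, and so the less any one person’s presence or absence can change what an observer sees.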
Differentially private mechanisms typically accomplish this guarantee by adding noise to any answer returned by the database. This may seem counterintuitive. We typically mine data in the hope of seeing through noise. Why would we deliberately add it? The hope with differential privacy is that the amount of noise added should be large enough to conceal the effects of individuals, but small enough that it does not seriously impact the usefulness of the answer.
The magnitude of noise added depends upon the maximum possible effect that one individual’s inclusion might have on the true answer. Intuitively, population-wide statistical estimators for common traits, e.g. the average height in a population, would typically require small amounts of noise, while the values of very rare features might need to be distorted more substantially to obscure the contribution of an individual.
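As a rough illustration of how noise can be scaled to one individual’s maximum effect, here is a minimal sketch of the Laplace mechanism applied to an average. The article does not name a specific mechanism, so the choice of Laplace noise, the function names, the clamping bounds, and the privacy parameter `epsilon` are all assumptions made for this example. It also assumes the number of records is public and that two databases are “neighbours” when one record is replaced.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution.

    The difference of two independent exponential draws, each with
    mean `scale`, follows a Laplace(0, scale) distribution.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def mean_sensitivity(lower, upper, n):
    """Largest change in the mean when one of n clamped records is replaced."""
    return (upper - lower) / n


def private_mean(values, lower, upper, epsilon, rng=None):
    """Return a noisy mean; `lower`, `upper`, `epsilon` are illustrative parameters."""
    if rng is None:
        rng = random.Random()
    # Clamp each value so that one person's influence is bounded.
    clamped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clamped) / len(clamped)
    # Noise scale grows with sensitivity and shrinks as epsilon grows.
    scale = mean_sensitivity(lower, upper, len(clamped)) / epsilon
    return true_mean + laplace_noise(scale, rng)


# Hypothetical usage: heights in centimetres, bounded to [0, 250].
heights = [150.0 + (i % 50) for i in range(1000)]
noisy_average = private_mean(heights, 0.0, 250.0, epsilon=0.5)
```

With 1,000 people the sensitivity of the average is only 0.25 cm, so little noise is needed. With 10 people it is 25 cm, which matches the intuition that statistics about rare groups need heavier distortion.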
Limits of Differential Privacy
Differential privacy addresses a very specific notion of privacy. It is suited to the situation in which one is deciding whether to allow their data to be included in a database, say for a research study. If all mechanisms that access the data are proven to be differentially private, then no individual’s data perceptibly affects the study’s results, so the guarantee is a compelling argument for participation.
However, this is not the only notion of privacy. An individual might consider any piece of information that is not obviously observable to be private. For example, one could prefer to keep their sexual orientation private. As another example, one could reasonably want to keep secret any family predisposition for certain medical conditions. In a world in which these things are unknowable, that privacy is attainable.
But as machine learning advances, and tools for prediction become increasingly powerful, algorithms could conceivably make high confidence predictions, even without spying directly on your private information. Imagine a system which could infer your sexual orientation from subtle clues in your speaking voice and video of you walking. If such a system reached 99.9% accuracy, no individual could any longer claim this information as private.
The interplay between knowledge discovery and privacy is complicated. Differential privacy offers a powerful theoretical framework for considering concrete ways in which the two may be reconcilable. But many open questions remain regarding the future of privacy in a world awash with large-scale data mining, even if all data access were differentially private.
The Concept of Privacy in the Digital Age
The people who drew the intricate and exquisite images of animals in caves such as those at Lascaux in France did so in deep and dark surroundings. Their art was meant for a select few, and they often “signed” their paintings by blowing pigment over their hand to leave their mark, a kind of early biometric. Human beings have always understood the concept of privacy and used it to reveal, or not, various aspects of themselves. Privacy is about choice: the choice to reveal, or not to reveal, details about yourself and your life. These early people chose to reveal a part of who they were, and this ethos of what privacy is remains with us today.
We each need to feel as if we have control over what information we show to the outside world. Every day, we find ourselves in situations where we disclose to individuals and organizations various pieces of information about who we are, what we do, and how we do it.
When we enter our place of work, we may “clock in”, showing what time we arrived. When we hand over our bank card at the coffee shop to buy our lunch, the bank now knows that we were at that shop and spent $10 at 12 noon on Tuesday. On our way home we call our partner to let them know our arrival time; the phone company then knows our location and which number was called.
Privacy as a concept has not changed in the 200,000 years since Homo sapiens came into existence, but privacy as an action has changed, and the digital age has introduced layers of complexity to the underlying concept of privacy that we need, as digital citizens, to unravel and understand.
As we head into an era where we are intrinsically connected to our devices and to each other through the Internet of Things, privacy will become an even more disparate and complex landscape. We need to go forward into this new era with a deeper understanding of how to make sure privacy rights are maintained and respected.
How the Internet Changed Privacy Forever
The Internet changed how privacy is handled because of mass exposure. The Internet opened up communication channels that we had never used before and transferred information at speed across a multitude of outlets. There was no layer built into the Internet for security or personal identity. Without this layer, making a choice about when, to whom, and why to reveal certain data was never going to be straightforward.
But the Internet was compelling, so these seemingly small issues were ignored and we all, as individuals and businesses, embraced the power that the Internet gave us. Soon almost everyone had a website, and commercial organizations digitized their processes and brought them online for all to access. It was this digitization of processes and tasks, which required us to reveal personal information, that brought privacy into the spotlight.
Let’s have a look at a typical process as it was carried out before and after the Internet.
Opening a Bank Account Before the Internet
Before the Internet, if you shared some personal details, such as your home address, it was generally in paper format, such as a form. This would then be handed in or posted to the bank, where it would be seen by the one or two people who used it to set up your bank account. You’d probably have to prove who you were using a utility bill or driver’s license, which you’d show to the bank teller directly or post in copies to the bank. All paper details would then be filed away in some dusty cabinet until they were shredded some years later.
The number of people interacting with your details was minimal. Privacy breaches did happen, but they were on a much smaller scale because of the small numbers involved in the process.
Opening a Bank Account After the Internet
Once the Internet took hold and became commercialized, sharing data like your home address changed forever. If you now open a bank account using an online service, your personal data follows a sharing cycle that involves several parties and systems.
In the online bank account setup, your data is now stored digitally and accessible by multiple persons. It can potentially be accessed in the following scenarios if security isn’t well implemented:
During storage by administrators of the database (at the bank and the credit file agency).
By cybercriminals who can hack into the database, perform attacks during transfer, or phish login credentials from either the individual or the system administrator.
The second scenario is the most concerning and we will go into more detail on that later.
The Internet has changed privacy forever and the genie is out of the bottle. Unless you have had no online interactions in the last 20 years, no matter where you live, your personal data will be held in a multitude of databases, in many data centers across the world.
Concepts of Privacy
In a digital age, the concept of privacy itself hasn’t changed. We, as individuals, still want to retain control over who has access to our personal information. In fact, as our online presence has become ubiquitous, and we’ve all settled into our digital lives, this need to retain privacy and ownership of our data has increased.
A Pew Research Center report into North American attitudes towards online privacy found that 91% of adults felt they had lost control over their personal data, with 86% of them attempting to mask their online transactions.
A further report by Pew Research into privacy of data vs. perceived value of releasing the data shows a very mature and informed view of online privacy is developing. For example, 52% of participants were happy for their health data to be uploaded to their doctor’s website for management purposes; whereas only 27% were comfortable sharing data output from a smart meter in their homes.
Well, it seems that privacy has a price and if you get something back for sharing your data you don’t mind sharing it as much.
The maturing of consumer expectations is the driving force behind how privacy is handled online. Companies who transact with customers online, and need to create user accounts, as well as handle user data, have turned to a tool set of privacy-based methods to handle these data. To achieve privacy enhancement and respect for privacy of an individual’s data, a number of well-debated techniques have entered the online space.
The Opt-in / Opt-out Debate
A debate has raged in the industry for many years about the use of “opt-in” or “opt-out” options during a sign up process in a web form. At the heart of this debate is user consent, i.e. user choice to reveal information. These options allow the company to get user agreement to use the person’s information, such as their name, email address and so on, to then contact them, usually for marketing purposes. It may seem like a subtle difference between the two, but the act of actively choosing to “opt-in” is important as consent is an intrinsic part of privacy.
Opt-in: Privacy advocates prefer the opt-in choice, because this requires a user to actively check a box to state that they want a specific action to happen, for example, sharing their details for marketing purposes. They see this choice as being “active consent”. Although the debate still rages, this option is, in general, the preferred one for privacy enhancement.
Opt-out: This option is seen as more “passive consent”. The user would need to remember to uncheck a pre-checked box. It is seen by privacy advocates as assumptive, and as putting too much onus on the user to understand the implications of not unchecking the box.
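The difference between the two options comes down to the default state of the checkbox. The following minimal sketch models a sign-up form under each policy; the class and function names are hypothetical and not taken from any real framework.

```python
from dataclasses import dataclass


@dataclass
class SignupForm:
    """A submitted sign-up form (illustrative model only)."""
    email: str
    marketing_box_checked: bool


def opt_in_form(email, user_ticked_box=False):
    """Opt-in: the box starts unchecked, so consent requires a user action."""
    return SignupForm(email, marketing_box_checked=user_ticked_box)


def opt_out_form(email, user_unticked_box=False):
    """Opt-out: the box starts pre-checked, so consent is assumed unless removed."""
    return SignupForm(email, marketing_box_checked=not user_unticked_box)


def may_send_marketing(form):
    """Marketing is allowed only when the box was checked at submission."""
    return form.marketing_box_checked


# A user who submits the form without touching the checkbox:
passive_opt_in = opt_in_form("user@example.com")
passive_opt_out = opt_out_form("user@example.com")
```

For a user who never touches the checkbox, `may_send_marketing` is false under opt-in and true under opt-out, which is why advocates describe the latter as passive consent.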
Various countries across the world have dealt with this debate by bringing in legislation. For example, in Europe, the EU brought in the Privacy and Electronic Communications Directive, which explicitly sets out that an organization has to have “prior consent” when collecting personal data for email marketing purposes. This usually means that the user has to actively opt in to give that consent. However, organizations interpret this in many ways and may use an opt-out box as long as it has a positive statement associated with it. As long as the statement next to the opt-out box is prominent, positive and cannot be overlooked, using opt-out continues to be acceptable.
The USA has also addressed the opt-in/opt-out debate through legislation across a number of states. The Consumer Online Privacy and Disclosure Act requires website owners to give full disclosure of their intent to collect personal information. Again, this can be interpreted as offering an opt-out option, as long as it is backed up with a strong statement of intent.
Irrespective of the legislation and laws around consent and privacy, more companies are showing respect for the individual by using a softer approach to opt-in and opt-out and choosing to use opt-in options as best practice. This is a more pragmatic approach to customer relationship management as users become more Internet and privacy savvy, and understand the implications of targeted marketing and data sharing.
Cookies and Tracking
Cookies were invented to make web surfing easier, by keeping certain pieces of information, like preferences, in your local browser. When you then go back to that same website, your information is re-used to make the use of the website quicker and simpler.
However, there is a variant of a cookie known as a “tracking cookie”, which can affect your privacy. These cookies can be configured to collect personal information and send it back to the host for analysis and use. Tracking cookies are used to profile users’ online behavior and use that profile to market ads back to them. The use of this type of user tracking for online ad marketing is now infamous. You’ll likely have noticed it if you visit a particular website, for example eBay, and then go to another site, perhaps a news site, where an ad for eBay pops up trying to entice you back. This is done using Google tracking cookies. Google is able to cross-match your online browsing between sites and use this to target ads back to you.
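To make the cross-site matching concrete, here is a simplified sketch of how a single identifier, seen on several sites, lets an ad network assemble a browsing profile. It is not a description of any real network’s implementation; the `Tracker` class, the identifier, and the example domains are all hypothetical.

```python
from collections import defaultdict


class Tracker:
    """Toy model of an ad network that sees the same cookie ID on many sites."""

    def __init__(self):
        # Maps a tracking-cookie ID to the list of (site, page) visits seen.
        self.profiles = defaultdict(list)

    def record_visit(self, tracking_id, site, page):
        """Called whenever a page embedding the network's content is loaded."""
        self.profiles[tracking_id].append((site, page))

    def sites_seen(self, tracking_id):
        """All distinct sites on which this browser has been observed."""
        return sorted({site for site, _ in self.profiles[tracking_id]})


tracker = Tracker()
# The browser presents the same ID on an auction site and on a news site.
tracker.record_visit("id-123", "auctions.example", "/item/vintage-camera")
tracker.record_visit("id-123", "news.example", "/world")
```

Because both visits carry the same identifier, the network can link them and show an ad for the camera on the news site.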
Targeted ads are annoying, but tracking cookies have a more sinister side if the host collects personal data using them. Cookies can also collect data entered into online forms, and this is where they become a greater privacy concern. Many privacy advocacy groups are now pushing for an “opt-out” cookie. This is a cookie that is set up on first accessing a website and then prevents other cookies from being stored. The downside is that these cookies are domain specific, so you have to do this for each and every site you visit.
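A minimal sketch of the opt-out cookie idea, using Python’s standard `http.cookies` module, might look like the following. The cookie name `tracking_opt_out` and both helper functions are assumptions made for illustration; real sites choose their own names and enforcement logic.

```python
from http.cookies import SimpleCookie

OPT_OUT_NAME = "tracking_opt_out"  # hypothetical cookie name


def build_opt_out_header(domain):
    """Build the Set-Cookie value a site could send to record an opt-out."""
    cookie = SimpleCookie()
    cookie[OPT_OUT_NAME] = "1"
    cookie[OPT_OUT_NAME]["domain"] = domain  # the cookie is scoped to one domain
    cookie[OPT_OUT_NAME]["path"] = "/"
    cookie[OPT_OUT_NAME]["max-age"] = 365 * 24 * 60 * 60  # keep for one year
    return cookie.output(header="").strip()


def tracking_allowed(cookie_header):
    """Return False if the browser's Cookie header carries the opt-out flag."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(OPT_OUT_NAME)
    return not (morsel is not None and morsel.value == "1")


header_for_shop = build_opt_out_header("shop.example")
```

Because the cookie is scoped to `shop.example`, a browser will not send it to any other site, which illustrates why the opt-out has to be repeated for every domain.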
The Right to Be Forgotten
One of the most worrying things about the Internet is its longevity and reach. If you put a picture of yourself on Facebook after a night out, looking a bit ‘worse for wear’ in 2010, chances are in 2020 it’ll still be there for all the world to see.
That scenario is bad enough. But what if it impacted your business? This was the case for Mario Costeja Gonzalez, who in 1998 had a tax debt that, even though paid off, continued to be found in search engines 15 years later. Mr Gonzalez ended up suing Google for his right to be forgotten using EU Directive 95/46/EC. The case was complex, and in the end the court ruled that the right to be removed from search engines stood, but that it needed to be balanced against the right to free expression. The court case resulted in a blitz of 180,000 similar requests. This right is now being applied to search engines, but does not apply to news sites and similar publishers.
Although this started in Europe, the principle is now being used worldwide to varying degrees. In Japan, a recent case allowed a man convicted of child pornography offences to have his arrest records removed from Google search. In the USA, meanwhile, the right to be forgotten is still hotly debated.
Social Platforms and Privacy
Social media has been one of the most successful applications of digital technology. Human beings love to gossip and platforms like Facebook have taken this human instinct and created a successful business model around it.
The model they have used is based on free access. But this comes at a cost to our privacy. Social platforms hold enormous amounts of our personal data. We sign up for an account, entering various details, such as name, address, date of birth and so on. Once we start using the platform, we enter data about our daily lives, our preferences and likes and whom we know – we even let the platform know our private relationships to others.
All of this forms the social graph, which can be used by developers to extract and utilize various social data.
Privacy Issues and Social Media
As social platforms have matured, they have been subject to much privacy speculation and debate. Facebook and Google in particular have come under scrutiny over their privacy approach, especially around their personal privacy settings.
One of the criticisms aimed at Facebook is that it has complicated privacy settings, and default settings that cannot be opted out of. The privacy settings of other platforms like Google and LinkedIn are similarly complex. Many of the settings are nested. This means that if you have restricted sharing to just friends, but then a friend opens a photo (for example) on the Facebook app, it may well then be shared with third parties, all without your knowledge or consent. In other words, your privacy settings are not inherited.
This has created a privacy web that is very difficult for the average user to navigate and predict. The goal being set for social media platforms is to have good default settings, and to a degree the platforms have complied with this request; Facebook, for example, has stronger privacy settings for users under 18. Our social media accounts and posts are a rich seam of personal information, and services such as those offered by Social Intelligence are set up specifically to use information that users have added to their social posts, or which “Like” buttons they’ve used, to investigate insurance claims for fraud.
Another criticism is that users are unable to fully delete an account, with platforms offering “deactivation” of accounts instead.
The “Like” button has also come under the scrutiny of privacy lobbyists. The “Like” button can be used as a type of tracking mechanism. A German court recently ruled that it breaks EU privacy laws if a user clicks a Facebook “Like” button on a retailer’s website without having given explicit consent to the data that is then transferred on their behalf.
Targeted marketing must be one of the most irritating privacy intrusions of modern times. Social media sites, like Facebook, LinkedIn and Google, use various methods to push ads out to our social media timeline that they believe we will be interested in.
This has caused a storm in terms of the intrusive nature of the marketing. The social platforms use various techniques, including cookies, to watch your online behavior, such as browsing history, and even mining keywords and phrases from your social messaging. They then use this information as intelligence to push ads back out to you that they believe will be of interest.
This type of marketing has become known as “creepy tech” as it feels as if you are being followed online. Research by marketing analyst group, MENG, has shown that 73% of consumers do not like being tracked or target marketed. “Consented target marketing” however, may be the answer to the privacy vs. marketing conundrum. This is where the consumer tells the retailer what they are interested in and ads are pushed out based on that consent.
Real Names and Social Media
A row broke out a few years ago when Google Plus announced that registration for an account would require a person to use their real name and not a pseudonym. The outcry involved some of the most vocal and respected privacy advocates including Danah Boyd.
The concern was that there should be user choice in how a person presents themselves online, and that real name use should not be forced, as name choice is often predicated on personal circumstances. A woman hiding from an abusive partner, for example, may want to use social media but keep herself hidden from that partner. Facebook retaliated, saying that they required real names to ensure online safety. However, Boyd countered with the statement:
“And you don’t guarantee safety by stopping people from using pseudonyms, but you do undermine people’s safety by doing so”.
Unfortunately, this topic is still rumbling away and Facebook have now brought in a policy of real name use. In a recent change to their registration policy, they now require that your name is verified using identity documents.
Mobile Apps and Privacy
Mobile apps are a phenomenon of the 21st century. If a job’s worth doing, it’s worth doing using an app. The mobile app marketplace is expected to be worth $101 billion by 2020, according to App Annie. That’s a lot of apps collecting a lot of information. And it seems that these apps are not respecting our privacy.
Mobile app privacy is something users are becoming increasingly aware of. According to a Pew Research report into attitudes of consumers towards mobile app privacy, 60% of users decided not to install an app when they found out how much personal data it required for use.
Personal Data Sharing Across the Internet
All of our personal data, be it contact details, name, address, IP address, geo-location, or even our online browsing habits, makes up our identity attributes. These data are also known as Personally Identifying Information or PII. The Internet is awash with our PII, and the transmission and storage of it, using emails, APIs, mobile apps, social media sites, text messages, databases and so on, has opened up a “privacy can of worms” that we now need to deal with.
Sharing data across Internet connections has been the driving force behind the upsurge in cybercrime. Personal data is valuable and the price for stolen PII swamps the price for stolen financial details such as credit card numbers. A report by the Ponemon Institute in 2015 showed that the average cost paid per stolen personal data record was $154. If the record contained health data, the price increased to $363. This value is partly due to the data being useful for secondary cyber attacks, because personal data opens doors to other criminal opportunities. For example, the 80 million health care records stolen in the Anthem cyber-attack, and sold on the dark web, were subsequently used to make fraudulent IRS tax claims. While our personal information is held across multiple disparate applications, apps and data centers, it will be at risk of privacy violations.
If the Internet is a minefield for privacy, then the Internet of Things is a warzone. The Internet of Things (IoT), a massively connected and disparate group of Internet-enabled devices, is the next major technology uplift to hit our world. The Internet of Things is exploding, and according to Cisco there will be 50 billion of these connected devices by 2020. The devices can be either consumer focused or business process focused. Each of the consumer devices will hold a multitude of personal information about us. For example, in the health area, wearables such as the ‘Fitbit’ have personal contact information, name, date of birth, etc., as well as daily activity information, all of it shared and held in a Cloud repository. Smart fridges have account information and connect with your email account. Even smart beds have your personal details and sleeping habits, shared and stored across Internet connections.
All of these data are being continually transferred between devices, gathered together, held in Cloud storage and analyzed. This leaves our data open to interception and breach. The IoT has been openly criticized for not being designed with a privacy and security layer. This omission means that ensuring the privacy of data generated and shared through the IoT will be a challenge.
The Law and Online Privacy
As we have entered the age of the Internet, we have needed to update our old pre-digital laws around security and privacy. There have been a number of countries that have created legislation or laws around privacy. The list below is not comprehensive, but gives you an idea of the types of work being done in countries which are actively working in this area.
In 1995 the European Union adopted the Data Protection Directive 95/46/EC. This was a framework that worked to balance the protection of individual personal data against the free transfer of these data. It was brought in to cover computer-based data and sets out the rights of the individual who owns that data to control its use. It is EU-wide legislation.
In April 2016 new and updated legislation was adopted to enhance the powers and extend the reach of the original directive. The new legislation, the General Data Protection Regulation (GDPR) 2016/679, now sets out the legal expectations of data sharing and free movement of data throughout Europe.
Countries such as Norway, although not members of the EU, are now implementing the same EU rules to address privacy violations in their own jurisdictions.
Europe – USA Data Transfers: Safe Harbor
Transfer of data between jurisdictions is always difficult to control. With the European Union’s stringent privacy and security regulations, transfer of data between the U.S. and European countries has always been an area of concern. To accommodate the EU privacy initiatives, a framework known as the Safe Harbor scheme was introduced in 2000. However, because of issues that came out during the Edward Snowden and NSA privacy affair, this scheme became invalid.
In October 2015 the Court of Justice of the European Union overturned the Safe Harbor agreement after the successful outcome of the Schrems vs. Facebook case. Since then a new framework, Safe Harbor 2.0 or the EU-U.S. Privacy Shield, has been agreed. This new agreement places stronger controls and guarantees over how U.S. companies handle data generated by EU citizens.
The USA has a mosaic of privacy and security legislation on a state, industry sector and federal basis. However, the U.S. does not have anything comparable to the EU privacy directive to cover individual data privacy. Some examples of this mosaic approach to privacy are:
The Federal Trade Commission (FTC) covers a wide scope of privacy and security of digital data and has powers to act upon violations. They set out specific requirements for consumer privacy and notification breach rules.
Children’s Online Privacy Protection Act (COPPA) is a framework for the protection of data of children less than 13 years old that is collected online – this is particularly pertinent to the use of mobile apps as those age groups are heavy users. The FTC regulates and enforces COPPA.
The Health Insurance Portability and Accountability Act (HIPAA) covers data privacy within the health care industry. It sets out the criteria needed to protect ‘Protected Health Information’ or PHI.
The Gramm-Leach-Bliley Act, which covers the financial sector to ensure secure storage of financial data.
Australia’s Privacy Act of 1988 was amended by the Privacy Amendment (Enhancing Privacy Protection) Act of 2012, which came into force in 2014. This amendment was brought in to specifically deal with online data handling and has 13 Australian Privacy Principles or APPs.
The Rest of the World
There are varying degrees of privacy protection legislation throughout the world. The countries mentioned above have the most stringent approaches, but others are following close behind. A comprehensive world map of data protection laws shows how they vary across jurisdictions.
Examples of Online Privacy Violations
Privacy violations cut across all sectors of society and technology platforms. The following are some of the more recent examples, some of which, such as the Schrems case, have been referred to earlier in the text.
Europe vs. Facebook
Max Schrems, a law student at the time, filed 22 complaints against Facebook’s privacy practices around moving data outside of Europe. One of the issues Schrems had with Facebook’s privacy was that individuals’ data may have been given to the National Security Agency, after allegations by Edward Snowden to that effect. The original request was refused by the court because of the existence of the Safe Harbor agreement.
However, in another case where Schrems took Safe Harbor itself to the courts for examination, the EU-U.S. agreement was shown to be invalid. This then opened the door for the original Facebook case to be reviewed. In October 2015 Schrems won his case against Safe Harbor, which prompted the Irish data protection commissioner to investigate Facebook.
Uber’s “God View”
Uber have been criticized for privacy violations from the outset. However, their “God View” was a step too far. Uber created a mechanism, based on a person’s geo-location as supplied through their mobile device, to track Uber customers at all times. This was even used as “entertainment” at executive parties, with a giant dashboard showing customer journeys in real time. The company was not only tracking rides, but associating personal data with each ride. Uber ended up paying a $20,000 fine to the New York Attorney General: small fry for Uber, but at least a symbolic win for privacy.
Gmail Email Scanning, Targeted Marketing and Street View Privacy Violations
Google have had a number of privacy violations taken to court. These are some recent examples:
In 2013 Google were fined $7 million for collecting personal information via Wi-Fi connections when performing their Street View Project. The cars used in the project were able to collect information such as URLs that a user had requested and even partial email communications.
Gmail admitted to scanning all sent and received emails from a user’s Google Mail. Google then mix this information with other data, such as geographic location, search results, map requests, even YouTube views, to then target ads at users. This resulted in a class action lawsuit against Google, the main thrust of which was that because they were also scanning non-Gmail emails, this violated the privacy of users outside of the Google Mail system. As of March 2016, this class action was dismissed by the court, but further smaller actions are likely.
Google is still under threat for email scanning of student email accounts. Google posted a blog announcing that they had stopped email scanning for targeted ad placement for students using Google Docs associated with educational emails. This post turned out to be untrue, and now a class action is being filed against Google for these privacy violations.
Verizon and Their “Super Cookies”
Verizon were fined $1.35 million by the FCC for violating customers’ privacy. Verizon were using a type of cookie that allowed them to do cross-domain tracking and build up a picture of a user’s web habits and browsing. The data was then analyzed and used to target ads at customers. The FCC has also insisted that Verizon use an opt-in consent option for customers.
Privacy and Security: Two Sides of the Same Coin
Because of the digitization of personal data and PII, security and privacy have become intrinsically linked. They can be thought of as two sides of the same coin, and it is very difficult to get good privacy without good security. Privacy covers a whole gamut of violations, from state-sponsored spying on citizens, through corporate metadata collation for advertising, to cybercriminal theft of PII and PHI. The abuse of state powers to spy on citizens is an issue cloaked in political agendas. However, the latter two areas are both security issues. Corporate abuse of our privacy for targeted marketing is becoming frowned upon, and class actions, like the Verizon super cookie violation, are being handled through the courts.
The disclosure of our personal data, however, is down to security. Privacy and data protection laws go some way towards ensuring that organizations take our privacy seriously by using appropriate security measures. Having data breach notification laws also helps by naming and shaming companies who have had a data breach. Organizations such as the Breach Level Index (BLI) take this information and put it into the public domain for the world to see. The examples below are from the BLI, showing identity theft in the first few months of 2016, for the USA and Russia:
Because security and privacy are so closely associated, you can improve privacy by ensuring that robust security measures, especially web security measures, are in place. In terms of web security vulnerabilities, the Open Web Application Security Project (OWASP) actively follow web security risks, and have a ‘Top Ten Project’ of web vulnerabilities, to which the development community contributes mitigation actions and advice.
Being security aware means that you are also privacy aware. Security and privacy are not exactly the same thing, but security has many elements that impact privacy. Ensuring that you follow good security practice puts you on the road to good privacy practice.
Designing a Website to Enhance Privacy
Being privacy aware means that we can apply good privacy practice in our website and web application design. We can use all of the output of privacy debates across the privacy community to enhance our website – good privacy makes good customer relationships. Being a privacy-enhanced site means that you show respect for your customers and take the protection of their personal data seriously. This will also result in better outcomes for your business.
Privacy violations and data theft not only cost businesses financially, but they also damage reputation. Even mammoths like Google and Facebook have seen their reputations take a battering because of their attitude to privacy. In this next section we will look at some of the fundamental areas to consider when designing your site with privacy in mind.
Privacy by Design
Privacy by Design (PbD) is a concept first proposed by Ontario’s Privacy Commissioner Ann Cavoukian in the late 1990s. PbD is all about thinking about, and adding in, the elements of privacy at the beginning of the design stage. The argument is that if you don’t consider privacy at the design stage, then bolting it on as an afterthought will result in a less than optimal approach to privacy. Privacy by Design is now recognized as an international standard for privacy.
Following these principles as a framework for web design and development will allow you to create the best possible privacy experience for your users and protect your own interests at the same time. The principles are extensible to any digital design project and can be used in the design of mobile apps and IoT devices too.
Securing Personal Data
Security is one of the fundamental principles of PbD. It is also a fundamental principle of web design and development. As well as Privacy by Design, Security by Design should be incorporated at the outset of a project. If your site or application in any way requires user data to be input, tracks user behavior, or collects other metadata such as the geo-location of users, then you should consider how to do this in a security-enhanced manner.
Security encompasses the entire system and the architecture, from the ground up. Your architecture framework needs to have built-in security parameters. Deciding which areas of the system require security is a matter of research and understanding the security landscape, as well as applying best-practice security measures. One of the most difficult things to balance in creating secure online systems is usability.
The solution to this is not linear; this is a multi-faceted problem. Each part of the system needs consideration and may overlap in terms of impact on other areas. Security and usability are especially dependent on each other and there seems to be a direct correlation between increasing security and decreasing usability. However, with careful design, you can have good security whilst maintaining usability. Some areas that you need to apply good security measures to, include:
Secure Communications (HTTPS)
HTTPS is the secure version of HTTP. That is, it allows for secure (encrypted) exchange of data across the Internet, between the web server and the browser. It is based on the open standard protocol Secure Sockets Layer (SSL) and its more recent successor, Transport Layer Security (TLS). Both protocols are based on digital certificates.
To create a secure site based on HTTPS you need to have a digital certificate in place. Digital certificates are purchased either as part of a hosted web package, or independently from a digital certificate vendor. When you purchase a certificate, your company (or you as an individual) is verified. This forms part of the security and gravitas of the certificate.
Any site that requires any data transfer to be made, for example in creating a user account, or inputting any user data, such as name, email address and so on, needs to have HTTPS configured.
Setting up HTTPS robustly involves many factors. Many web developers forget to ensure all parts of the site have SSL/TLS applied. For example, sometimes the login page itself is not protected by SSL and so is vulnerable to interception by cybercriminals. Getting SSL right is a fundamental first security step in creating a privacy-enhanced site or web application. Having an HTTPS-configured site also gives your customers a visual indicator that you are taking their data protection seriously – this is in the form of a padlock in the URL bar, which can be expanded to show the details of the certificate.
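One way to make sure no page, including the login page, is served over plain HTTP is to enforce HTTPS in one place for the whole site. The following is a minimal sketch in Python, assuming a WSGI-based site; the names force_https, demo_app and call, and the one-year HSTS policy, are illustrative assumptions rather than anything prescribed above. It redirects plain-HTTP requests to HTTPS and adds a Strict-Transport-Security header to secure responses.

```python
from urllib.parse import quote

# Assumed policy: ask browsers to use HTTPS only for one year.
HSTS_VALUE = "max-age=31536000; includeSubDomains"


def force_https(app):
    """Wrap a WSGI app so every plain-HTTP request is redirected to HTTPS."""
    def wrapper(environ, start_response):
        if environ.get("wsgi.url_scheme") != "https":
            host = environ.get("HTTP_HOST", "localhost")
            path = quote(environ.get("PATH_INFO", "/"))
            query = environ.get("QUERY_STRING", "")
            location = f"https://{host}{path}" + (f"?{query}" if query else "")
            start_response("301 Moved Permanently", [("Location", location)])
            return [b""]

        def secure_start_response(status, headers, exc_info=None):
            # Add the HSTS header to every response served over HTTPS.
            headers = list(headers) + [("Strict-Transport-Security", HSTS_VALUE)]
            return start_response(status, headers, exc_info)

        return app(environ, secure_start_response)
    return wrapper


def demo_app(environ, start_response):
    """Hypothetical stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


def call(app, scheme, path="/login"):
    """Invoke a WSGI app with a minimal fake request (for illustration only)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    environ = {
        "wsgi.url_scheme": scheme,
        "HTTP_HOST": "example.com",
        "PATH_INFO": path,
        "QUERY_STRING": "",
    }
    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

If the site sits behind a load balancer or reverse proxy that terminates TLS, the scheme seen by the application may always be "http", so the check would need to be adapted to however that proxy reports the original scheme.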
Authentication (Login Credentials)
If you create user accounts on your site or web application, you should ensure you have the right level of authentication set up. Authentication (user login) is a prime target for cybercriminals. Phishing for login credentials is a very popular method amongst cybercriminals, and all websites and applications need to take precautions against it. In 2015 there was a 74% increase in phishing attempts, according to an Infoblox report.
Login credentials are most often a username and password. This is because they are easy for web developers to implement, and easy for a user to remember. However, phishing makes this type of credential vulnerable to attack. Even the strongest password is useless against a phishing email, simply because the user enters the username and password into a spoof site and, as soon as they do so, the credentials are stolen. The usual way to harden the use of username and password is to add an extra credential that is used after the username and password are entered. This is known as a second factor credential, and the approach as two-factor authentication (2FA). Second factors are typically SMS text messages, or mobile / hardware based codes, generated at the time of logging into a site or application.
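To illustrate how a mobile or hardware based code can be generated and checked, here is a minimal sketch of the time-based one-time password (TOTP) scheme from RFC 6238, built on HOTP from RFC 4226. The text above does not name a specific algorithm, so the choice of TOTP is an assumption, as are the function names and the one-step tolerance for clock drift. A production system would normally use a maintained library and also rate-limit attempts.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) for a given counter value."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, now=None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over the current time step."""
    if now is None:
        now = time.time()
    return hotp(secret, int(now) // step, digits)


def verify_totp(secret: bytes, submitted: str, now=None,
                step: int = 30, digits: int = 6, window: int = 1) -> bool:
    """Accept the code for the current step or `window` steps either side."""
    if now is None:
        now = time.time()
    counter = int(now) // step
    for drift in range(-window, window + 1):
        if counter + drift < 0:
            continue
        expected = hotp(secret, counter + drift, digits)
        # Constant-time comparison to avoid leaking how many digits matched.
        if hmac.compare_digest(expected.encode(), submitted.encode()):
            return True
    return False
```

The shared secret is generated once per user at enrolment and stored both on the server and in the user's authenticator app or hardware token; the code itself is only requested after the username and password have been accepted.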
You can improve usability when adding in this extra layer of security by using some other tricks such as device authorization. Device authorization means that when a user first logs into a site they set a policy so that whenever they log in from that same device they don’t get asked for a second factor – they only enter their username and password. The policy can be set to expire after a certain amount of time and is a good balance between security and usability.
There are a multitude of options you can use for authentication, and the right choice really does depend on who is using your website / application. Which options to offer can come down to whether your user base is made up of consumers or employees and, if consumers, what their demographic is, and so on.
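Device authorization with an expiring policy can be sketched as a signed token stored on the trusted device. This is only one possible design, under stated assumptions: SERVER_KEY, the function names and the 30-day default are hypothetical, the user and device identifiers are assumed not to contain the "|" character, and the token would in practice be kept in a secure, HTTPS-only cookie.

```python
import hashlib
import hmac
import time

# Hypothetical server-side secret; in practice load it from secure configuration.
SERVER_KEY = b"replace-with-a-long-random-secret"


def _sign(payload: str) -> str:
    return hmac.new(SERVER_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_device_token(user_id: str, device_id: str,
                       ttl_seconds: int = 30 * 24 * 3600, now=None) -> str:
    """Create a token that marks this device as trusted until it expires."""
    now = time.time() if now is None else now
    payload = f"{user_id}|{device_id}|{int(now) + ttl_seconds}"
    return f"{payload}|{_sign(payload)}"


def device_is_trusted(token: str, user_id: str, device_id: str, now=None) -> bool:
    """Return True only for an untampered, unexpired token for this user and device."""
    now = time.time() if now is None else now
    payload, _, signature = token.rpartition("|")
    if not hmac.compare_digest(_sign(payload).encode(), signature.encode()):
        return False  # token was forged or modified
    try:
        token_user, token_device, expires = payload.split("|")
        expires_at = int(expires)
    except ValueError:
        return False
    return token_user == user_id and token_device == device_id and now < expires_at
```

When device_is_trusted returns False, the login flow simply falls back to asking for the second factor, so an expired or missing token costs the user one extra step rather than locking them out.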
Ultimately you should carry out usability testing to determine which is the most usable, yet secure, method to apply.
Extending Privacy to Mobile Apps
Apps should not be forgotten when it comes to ensuring they are built with the ethos of Privacy by Design. In a Pew Research study into how teens view privacy in terms of mobile app use, 51% of them took the decision not to use an app that they considered had privacy issues. A further 46% switched off location tracking because of privacy concerns. Creating privacy enhanced mobile apps will gain customer confidence and improve app loyalty.
The GSMA, which is a global body representing mobile operators across the world, has a really useful guide on building mobile apps with privacy in mind. The guide reiterates the need for interacting with the end user in terms of consent to use data. It also covers more mobile app specific issues such as the collection of location data, use of silent updates and suppression of repeated prompting.
Overall, the design of any application whether for mobile use, or online use, needs to be done with the principles of privacy as part of the overriding design goal.
Before you start to develop your secure website or web app, you should follow the ethos of secure coding. Many of the vulnerabilities that are exploited by malware that is used to steal user data start off with a source code flaw. Practicing secure coding reduces this risk. OWASP offer a secure coding practices reference guide.
Before you start to code, set out your security requirements in a document that can be used to design the architecture of the system, and which can guide the developers in their tasks. Following the guidelines set out by OWASP in their Top Ten Project is the best place to start in setting out the security requirements for your site or web application. There are certain web security issues that are favorites of cybercriminals and that need to be addressed from the outset. These include Cross Site Scripting (XSS), SQL Injection and Cross Site Request Forgery (CSRF). All of these techniques used by hackers can be mitigated using known techniques, which can be researched online or dealt with by security consultants.
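Two of the known mitigation techniques mentioned above can be sketched with the Python standard library alone: escaping user-supplied text before placing it in HTML (against XSS) and checking a per-session random token on state-changing requests (against CSRF). The function names here are hypothetical, and most web frameworks provide equivalent, better-tested features (auto-escaping templates and CSRF middleware) that should be preferred where available.

```python
import hmac
import html
import secrets


def render_comment(comment: str) -> str:
    """Escape user input so it is displayed as text, not run as markup or script."""
    return f"<p>{html.escape(comment)}</p>"


def new_csrf_token() -> str:
    """Generate an unguessable token to store in the user's session."""
    return secrets.token_urlsafe(32)


def csrf_token_is_valid(session_token: str, submitted_token: str) -> bool:
    """Compare the token from the submitted form with the one in the session."""
    if not session_token or not submitted_token:
        return False
    # Constant-time comparison avoids leaking how much of the token matched.
    return hmac.compare_digest(session_token.encode(), submitted_token.encode())
```

The token from new_csrf_token would be embedded as a hidden field in each form and rejected on the server whenever csrf_token_is_valid returns False.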
Database security is paramount in ensuring your data is secured and privacy retained. One of the main ways of getting at user data is through the database. Good database architecture and maintenance is the starting point for robust security. However, OWASP have identified a number of techniques that can be used to directly compromise a database. These include SQL and NoSQL Injection attacks, which OWASP have set as the number one web attack method.
Keep watch on any issues and have a plan in place to respond quickly to any potential threat. The National Institute of Standards and Technology (NIST) have created a guide on how to protect PII, which gives a lot of advice on protecting and securing personal data.
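The standard defence against SQL injection is to pass user input to the database as a bound parameter rather than building the query string by hand. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the users table, the sample rows and the function names are made up for illustration, and the unsafe variant is included only to show what goes wrong.

```python
import sqlite3


def find_user_unsafe(conn, email):
    """DO NOT USE: user input is pasted straight into the SQL text."""
    query = f"SELECT id, email FROM users WHERE email = '{email}'"
    return conn.execute(query).fetchall()


def find_user(conn, email):
    """Safe: the driver binds the value, so it can never change the query."""
    query = "SELECT id, email FROM users WHERE email = ?"
    return conn.execute(query, (email,)).fetchall()


# Hypothetical sample data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",)],
)

attack = "' OR '1'='1"
leaked = find_user_unsafe(conn, attack)  # matches every row in the table
blocked = find_user(conn, attack)        # matches nothing
```

With the injected string, the unsafe query returns every user record, while the parameterized version treats the same string as an ordinary (non-matching) email address.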
User Expectations Around Privacy on Websites
Once you have your security in order, you then need to look at the design of the user experience (UX) in terms of privacy. We mentioned earlier that consent is a big part of UX around privacy. Consent is also a great way to build interaction with your users, and in doing so start to build up a trusted relationship with them. You show that you respect their decisions and privacy; they in turn respect you for doing that. Privacy is a two-way street. If you show respect for users’ privacy, they will feel trust in your brand. Having good security and privacy practices protects you as much as your customer.
Write your privacy policy in plain language, using short sentences and bullet points
Make sure it closely aligns with your actual privacy practices
Explain what you collect, how it’s collected, and who you share it with
Don’t use a boilerplate – tailor your policy to your business – if you end up in court, this will serve you well in defending yourself
Take your own country or state laws into account when creating the policy
Make it prominent and accessible on your website / application
What is a cookie
How you can disable cookies
As mentioned earlier, Pew Research found that 86% of users attempted to mask their online behavior to retain their privacy. They did this in a myriad of ways, from encryption, to cookie deletion, and even resorting to virtual private networks (VPNs). We can deduce from this that people do not like to be watched. When designing a website, we should take on board this consumer acknowledgement that privacy matters, and work with our customers to improve their overall privacy UX.
Simple things like the following can enhance the privacy UX and make for better customer relations:
Don’t ask for any information you don’t need. People are very savvy to this now and quickly tire of sites that require what seem like extraneous personal information. Do you really need to know someone’s gender, or their geographic location? If you do absolutely need it, give them the reason why. If you are using web templates to design your site, make sure they don’t collect information you don’t need.
Use the principles of data minimization – related to the point above, only collect what you need and only disclose what is absolutely necessary.
Let your users know what will happen to the requested data. For example, you may need to pass the data on to a third party credit file agency to check the status of that person. Let the user know who will be carrying out the checks, and why, AND let them know whether or not that data will be stored. When requesting information, ask for the user’s consent to disclose it to third parties or to use it for marketing purposes.
Use opt-in / opt-out appropriately. In any given situation, simple A/B testing of this can show which option is preferred.
Set default privacy settings as “on”. That is, make sure privacy is maximized, and allow your user to manage their privacy settings as they wish.
Use anti-phishing techniques in emails you send out to your customers – the Anti-Phishing Working Group (APWG), a not-for-profit group of industry members, has good advice on how to mitigate phishing attempts.
Tell your users, and engage with them, if you make any changes that might affect their privacy.
Allow users to delete accounts and when they do, remove all of their information from your databases.
Get user consent. If you are using targeted marketing, tell them why you will be using it.
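Two of the points above, data minimization and privacy-maximizing defaults, translate fairly directly into code. The following is a small sketch under assumed names: REQUIRED_FIELDS and the individual settings are hypothetical and would be chosen to fit what your own site genuinely needs.

```python
from dataclasses import asdict, dataclass

# Hypothetical list of the only fields this site actually needs to store.
REQUIRED_FIELDS = {"email", "display_name"}


def minimize(form_data: dict) -> dict:
    """Drop every submitted field that is not explicitly required."""
    return {key: value for key, value in form_data.items() if key in REQUIRED_FIELDS}


@dataclass
class PrivacySettings:
    """Every sharing or tracking option starts switched off (privacy 'on')."""
    marketing_emails: bool = False
    share_with_third_parties: bool = False
    location_tracking: bool = False
    profile_public: bool = False


submitted = {
    "email": "a@example.com",
    "display_name": "A",
    "gender": "f",          # not needed, so never stored
    "location": "Toronto",  # not needed, so never stored
}
stored = minimize(submitted)
defaults = asdict(PrivacySettings())
```

Filtering on an explicit allow-list means that a template or form which starts collecting an extra field does not silently end up storing it; the user can later switch individual settings on if they choose to.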