What is the difference between anonymisation and pseudonymisation?

I’ve had the opportunity to discuss this question numerous times in recent years. Nowadays, though, I start with a counter-question: “Why is that relevant?” In almost every instance, the person posing the question is looking for confirmation that anonymised data, unlike pseudonymised data, isn’t covered by privacy legislation.

Strictly speaking, this could be seen as correct, but unfortunately only if you apply the definitions of anonymisation and pseudonymisation as they are stated in the law. These deviate from what we actually see in daily practice. I’ve put this to the test a few times by asking a number of family members and friends to anonymise some data for me. They invariably limited themselves to removing directly identifying data, such as a name or citizen service number, or to pasting a black censor bar over a photo.

Personally, I think legislators have made a mistake by creating a differing definition of ‘anonymous’ and not defining it explicitly. It would have been better not to use the word at all.

Anonymisation and pseudonymisation according to the GDPR

How, then, are the words ‘pseudonym’ and ‘anonymous’ defined in the GDPR? Neither word is itself defined in the legislative text. That being said, the word ‘anonymous’ does appear at a crucial point. Recital 26 states that:

“The principles of data protection should therefore not apply to anonymous information…”

In other words, if data are anonymous, the law does not apply. What is actually stated here, and what would in all likelihood have been clearer, is: “…to data that are not (or are no longer) personal data.” After all, the law applies to “the processing of personal data” (Article 2(1)), and for personal data there is indeed a definition, found in Article 4(1), the core of which is “…any information relating to an identified or identifiable natural person…”

The GDPR does, however, contain a definition of pseudonymisation. This is set out in Article 4(5):

“…the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.”

Here, too, the definition is broader than what is seen in practice: it covers organisational measures as well as technical ones. I have written about this in more detail elsewhere.
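To make the Article 4(5) definition a little more concrete, here is a minimal Python sketch of one possible technical measure: replacing a name with a keyed pseudonym. The key plays the role of the “additional information” that has to be kept separately. The names `SECRET_KEY`, `pseudonymise` and `belongs_to`, as well as the sample records, are my own illustrative assumptions and not something prescribed by the GDPR.

```python
import hashlib
import hmac
import secrets

# Assumption: this key is the "additional information". In a real setup it
# would be stored separately from the data, under its own access controls.
SECRET_KEY = secrets.token_bytes(32)


def pseudonymise(identifier: str, key: bytes) -> str:
    """Return a keyed pseudonym (HMAC-SHA256) for a direct identifier."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical source data containing a direct identifier.
records = [
    {"name": "Alice Jansen", "purchase": "bicycle"},
    {"name": "Bob de Vries", "purchase": "laptop"},
]

# The set handed to analysts: the name is replaced by a pseudonym.
pseudonymised = [
    {"pseudonym": pseudonymise(r["name"], SECRET_KEY), "purchase": r["purchase"]}
    for r in records
]


def belongs_to(record: dict, name: str, key: bytes) -> bool:
    """Whoever holds the key can attribute a record to a person again."""
    return hmac.compare_digest(record["pseudonym"], pseudonymise(name, key))


print(belongs_to(pseudonymised[0], "Alice Jansen", SECRET_KEY))
```

The point of the sketch is that the record can still be attributed to a person by anyone who has the key. That is why the definition insists on keeping the key separate and protected, and why the result remains personal data.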

The document on anonymisation published in 2014 by the joint supervisory authorities (the Article 29 Working Party), ‘WP 216’, is crucial when it comes to clarifying the difference between these two definitions. The definitions contained in this document did not change significantly in the GDPR and are therefore still relevant: the essential difference between anonymisation and pseudonymisation is that anonymisation is an irreversible process and pseudonymisation is a reversible one.
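Whether a process is really irreversible is not always obvious. As a hedged illustration in Python: a plain, unkeyed hash looks like a one-way operation, but when the set of possible inputs is small, anyone can simply try them all. The 4-digit identifier below is a toy assumption to keep the example fast, and `unkeyed_hash` and `recover` are hypothetical helper names.

```python
import hashlib
from typing import Optional


def unkeyed_hash(identifier: str) -> str:
    """Hash an identifier with SHA-256, without any key or salt."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


# Toy assumption: identifiers are 4-digit numbers, so only 10,000 candidates exist.
published = unkeyed_hash("4821")


def recover(hashed: str) -> Optional[str]:
    """Reverse the hash by trying every possible identifier."""
    for n in range(10_000):
        candidate = f"{n:04d}"
        if unkeyed_hash(candidate) == hashed:
            return candidate
    return None


print(recover(published))
```

Real identifiers have a larger input space than this toy example, but the principle is the same: if the candidates can be enumerated, the hashing step can be undone in practice and the result behaves like a pseudonym.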

Indirect identifiability

Reversible processes lead to data which remain personal data. In irreversible processes, the question is whether the remaining data are still personal. This is a second important point from which incorrect conclusions are often drawn. The question which must be answered in that instance is whether these remaining data are identifiable. The litmus test for this can also be found in Recital 26:

“To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly.”

The phrase “or by another person” is often disregarded here. Whether or not you pass the data along to others is not relevant to the test. The test forces you to assume a scenario in which the data are public, which would be permitted if they were no longer covered by the law. You have to ask yourself whether there is anyone on earth who could possibly trace (parts of) the data back to individuals. The answer to this question is nearly always “yes”, which means the data remain personal data.
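A simple way to get a feel for “singling out” is to count how many records share the same combination of indirect identifiers. The following Python sketch does this for a small, made-up dataset from which the names have already been removed. The field names, the choice of `QUASI_IDENTIFIERS` and the function `singled_out` are assumptions for illustration only; passing this check does not by itself show that data are anonymous.

```python
from collections import Counter

# Hypothetical dataset: direct identifiers (names) have already been removed.
records = [
    {"postcode": "1011", "birth_year": 1980, "gender": "F", "diagnosis": "asthma"},
    {"postcode": "1011", "birth_year": 1980, "gender": "F", "diagnosis": "diabetes"},
    {"postcode": "1012", "birth_year": 1975, "gender": "M", "diagnosis": "migraine"},
    {"postcode": "1013", "birth_year": 1990, "gender": "F", "diagnosis": "asthma"},
]

# Assumption: these fields could be known to "another person" from other sources.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "gender")


def singled_out(rows, keys):
    """Return the rows whose combination of quasi-identifiers is unique."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return [row for row in rows if counts[tuple(row[k] for k in keys)] == 1]


unique_rows = singled_out(records, QUASI_IDENTIFIERS)
print(len(unique_rows), "of", len(records), "records can be singled out")
```

Anyone who knows the postcode, birth year and gender of a person in one of the unique rows learns that person’s diagnosis, even though no name appears anywhere in the set.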

Traceability is almost unavoidable

The WP 216 opinion demonstrates the complexity and near-impossibility of processing detailed personal data in such a way that they are no longer personal. In fact, this is only possible in specific and very well-designed applications. The supervisors themselves wrote:

“Thus, it is critical to understand that when a data controller does not delete the original (identifiable) data at event-level, and the data controller hands over part of this dataset (for example after removal or masking of identifiable data), the resulting dataset is still personal data. Only if the data controller would aggregate the data to a level where the individual events are no longer identifiable, the resulting dataset can be qualified as anonymous. For example: if an organisation collects data on individual travel movements, the individual travel patterns at event level would still qualify as personal data for any party, as long as the data controller (or any other party) still has access to the original raw data, even if direct identifiers have been removed from the set provided to third parties.”

In other words, you can mask, hash, blank out, pseudonymise and ‘anonymise’ as much as you like, but as long as you do not aggregate the data (merge them into groups) and the original data continue to exist, each processed set will remain a set of personal data.
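To illustrate what aggregating to group level can look like, here is a minimal Python sketch based on the travel example from the opinion. It counts distinct travellers per route and drops groups that are too small. The sample trips, the threshold `MIN_GROUP_SIZE` and the function `aggregate` are my own assumptions; the opinion does not prescribe a specific threshold, and a sketch like this is no guarantee that the result is anonymous.

```python
from collections import defaultdict

# Hypothetical event-level travel data (still personal data).
trips = [
    {"traveller": "p1", "origin": "Utrecht", "destination": "Amsterdam"},
    {"traveller": "p2", "origin": "Utrecht", "destination": "Amsterdam"},
    {"traveller": "p3", "origin": "Utrecht", "destination": "Amsterdam"},
    {"traveller": "p1", "origin": "Amsterdam", "destination": "Utrecht"},
    {"traveller": "p4", "origin": "Zwolle", "destination": "Assen"},
]

# Assumed threshold: routes with fewer distinct travellers are suppressed.
MIN_GROUP_SIZE = 3


def aggregate(events, min_group_size):
    """Count distinct travellers per route and drop small groups."""
    travellers = defaultdict(set)
    for event in events:
        travellers[(event["origin"], event["destination"])].add(event["traveller"])
    return {
        route: len(people)
        for route, people in travellers.items()
        if len(people) >= min_group_size
    }


summary = aggregate(trips, MIN_GROUP_SIZE)
print(summary)
```

The summary no longer contains individual trips or traveller references, and routes used by only one or two people are left out entirely, because a group of one is still an individual event.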

Conclusion

First and foremost, do not be misled by statements that data have been anonymised and/or made anonymous and therefore no longer fall under the GDPR. They may well have been anonymised in the everyday sense of the word, but they are most likely still personal data. If you take this as your starting point, you can focus on processing them as carefully as possible.

Now back to the initial question: What is the difference between anonymisation and pseudonymisation? The conclusion is that, in essence, there is only one difference: whether the process is reversible or not. Reversible techniques are forms of pseudonymisation and non-reversible techniques are forms of anonymisation. Both are examples of techniques which fall under the heading of Privacy Enhancing Technologies. They are essential measures for the protection of personal data. You have to determine which technique is appropriate for each situation.

To show just how much ‘static on the line’ there is around all these terms, in a subsequent article I will discuss pseudonymisation services that are demonstrably irreversible and should therefore be seen as anonymisation under the GDPR.

Edwin Kusters


In his role as a data and privacy consultant, Edwin has mainly been involved in large BI projects for the past eighteen years. He increasingly noticed that clients had customer analysis requirements that conflicted with the Dutch Data Protection Act (a forerunner to the GDPR). As a result, he went in search of a solution that would still make such analyses possible. The full scope of what the law allows was the starting point, but he also had to take into account customer priorities such as time-to-market, quality of service and compliance costs. Setting up a specific, separate company, today known as Viacryp, proved to be the most effective way of helping clients navigate this complex world. Edwin regularly speaks on privacy matters at seminars and congresses, and is a member of the NEN working group for the development of a pseudonymisation standard.

10 July 2018
