This is how pseudonymisation works

An author who doesn’t want the public to know he has written a particular book will use a pseudonym. Applying this concept to data analysis allows data from various sources to be compared without compromising the privacy of those involved. But how does pseudonymisation actually work? Edwin Kusters, co-founder and director of Viacryp, is happy to explain.

(This article was published on

“Pseudonymisation is a process through which identifiable data is replaced by a code. In this way, you can still see that someone has bought a book and that they have bought another book three weeks later, but you can’t see who it is. After all, knowing the individual behind the data isn’t relevant for many purposes and analyses. If you want to distribute the flow of passengers efficiently among trains, you don’t need to know who’s on the platform, but only whether a seat’s available and in which trainset.”

Technical and organisational

“As an independent third party, we at Viacryp have developed a service to safeguard people’s privacy when an organisation conducts research using personal data. It’s important that our service consists of both a technical and an organisational element. The GDPR repeatedly states that companies must take both technical and organisational measures to safeguard the privacy of users. After all, you can implement a beautiful technical system, but if users are not aware of its purpose or of the risks of a data breach, you’re still not doing a good job.”

“A breach of privacy can have incredibly far-reaching consequences. Money can be lost and earned again, but if someone’s sensitive data is inadvertently made public, that can hardly ever be undone. Look no further than the massive Sony data leak, which exposed the home addresses, telephone numbers, social security numbers and salary information of both regular employees and A-list celebrities. The fact that nearly everyone immediately remembers this case, even though it happened some time ago, says it all. It shows how crucial privacy is.”

Hash and encrypt

“Our raison d’être is to prevent data from being easily linked outside the agreed frameworks under which organisations obtained it. We do this by ‘hashing’ the datasets of the various parties that want to link their data. This means that an algorithm assigns a code, the hash, to each piece of identifiable data. The same data always gets the same hash wherever it occurs in a dataset. This way, you can compare data without it being traceable to a person. So this is about the ‘who’.

In addition, you have the behaviour of, or information about, the ‘who’. Protecting that is a separate part of the process, and that’s where our encryption services come in. This kind of encryption always involves two keys: one public and one private. The public key is used to encrypt the data and can be shared openly, but only the holder of the private key can decrypt the data again.”
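The hashing step described above, the part that covers the ‘who’, can be sketched in a few lines of Python. This is a minimal illustration and not Viacryp’s actual implementation: the article does not name an algorithm, so the use of a keyed hash (HMAC-SHA256), the `pseudonymise` function, the `SECRET_KEY` value and the two example datasets are all assumptions made for the example. A keyed hash is chosen here because a plain hash of a small set of possible identifiers could be reversed by simply hashing every candidate value.

```python
import hashlib
import hmac

# Hypothetical secret, held only by the party performing the pseudonymisation.
SECRET_KEY = b"example-secret-held-by-the-trusted-third-party"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym for an identifier using HMAC-SHA256."""
    # Normalise first, so the same person always maps to the same pseudonym.
    normalised = identifier.strip().lower()
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


# Two made-up datasets from different parties, keyed on an identifier.
bookshop = {
    "alice@example.com": "bought a novel",
    "bob@example.com": "bought a cookbook",
}
library = {
    "Alice@example.com": "borrowed a travel guide",
    "carol@example.com": "borrowed an atlas",
}

# Each dataset is pseudonymised before it is handed over for analysis.
bookshop_pseudo = {pseudonymise(who): what for who, what in bookshop.items()}
library_pseudo = {pseudonymise(who): what for who, what in library.items()}

# Records are linked on the pseudonym; the identifier itself never appears.
linked = {
    pseudonym: (bookshop_pseudo[pseudonym], library_pseudo[pseudonym])
    for pseudonym in bookshop_pseudo.keys() & library_pseudo.keys()
}
```

In this sketch `linked` holds one combined record, for the person who appears in both datasets, keyed on a pseudonym rather than on an e-mail address.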

Filtering information

“A good illustration of this would be a police investigation into individuals who’ve committed a criminal offence. During an ongoing investigation, it helps if the police can use license plate information to see who’s driving in and out of a city. But one might wonder whether the police then also collect the data of individuals who are not suspected of anything.

By using our filtering service, the police can compare a set of hashed license plates used by the suspects against, for example, files from environmental cameras that register which cars are passing by. The license plates in those files have also been hashed, and law enforcement sees only the records for which there’s a match. This ensures the privacy of drivers who haven’t done anything wrong.”
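The filtering idea can be sketched as a comparison between two sets of hashed plates. Again, this is an illustration under stated assumptions rather than the actual service: the `hash_plate` and `filter_sightings` functions, the `SHARED_KEY` value, the record layout and the plate numbers are all made up, and it is assumed that both sides hash their plates with the same keyed function so that equal plates produce equal hashes.

```python
import hashlib
import hmac

# Hypothetical key; both datasets must be hashed with it for matches to work.
SHARED_KEY = b"example-key-for-this-investigation"


def hash_plate(plate: str, key: bytes = SHARED_KEY) -> str:
    """Hash a license plate after removing spacing, dashes and case differences."""
    normalised = plate.replace("-", "").replace(" ", "").upper()
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def filter_sightings(sightings, suspect_hashes):
    """Keep only the sightings whose hashed plate is in the suspect set."""
    return [s for s in sightings if s["plate_hash"] in suspect_hashes]


# The investigators supply only hashed plates of suspects.
suspect_hashes = {hash_plate("AB-123-C")}

# The camera file also contains only hashed plates, never the plates themselves.
camera_sightings = [
    {"plate_hash": hash_plate("ab 123 c"), "camera": "north gate", "time": "08:15"},
    {"plate_hash": hash_plate("XY-987-Z"), "camera": "north gate", "time": "08:16"},
    {"plate_hash": hash_plate("AB123C"), "camera": "south gate", "time": "17:40"},
]

# Only matching records are released; all other drivers stay out of view.
matches = filter_sightings(camera_sightings, suspect_hashes)
```

Here `matches` contains the two sightings of the suspect’s plate, while the record for the unrelated car is never returned.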