Interview: How does the pseudonymisation and depseudonymisation of personal data help the Municipality of Zaanstad with data analysis?
More and more organisations are facing the challenge of performing data analyses on personal data within the framework of the General Data Protection Regulation (GDPR). And more and more organisations are opting to pseudonymise personal data. We asked Tom Pots, programme manager for data-driven working at the Municipality of Zaanstad, about his challenges in this area. What has he experienced thus far when it comes to the pseudonymisation of data, and what has it yielded in concrete terms?
What challenge (business issue) was Zaanstad facing?
‘The Municipality of Zaanstad wants to work in a data-driven manner, based on the idea that social issues can be tackled more effectively on the basis of data. We succeeded early on in making our most important data accessible in a variety of ways. The big challenge in this is safeguarding privacy: how do you do justice to both the legal obligation to protect data and the performance of a legal task, such as tackling crime that undermines society or taking an integrated approach in the social domain? Many government organisations struggle with this complex issue. That is why we chose to build our own ‘data warehouse’. We retrieved data from various sources, including basic land registry records, real estate valuations and the Personal Records Database. We pseudonymised these datasets and then stored them as securely as possible. The weakest link in this process was us, because we did everything ourselves. As a result, a small group of data engineers was still working with a great deal of personal data.’
How did you solve this?
‘We created an environment without weak links by using Viacryp’s pseudonymisation software. In the new data warehouse, data is pseudonymised at the source, which means that no personal data is used in the data retrieval process, except for the pseudonym, of course.
In the new architecture, we distinguish between an Analysis Data Street (ADS) and an Operational Data Street (ODS). In the ADS, the traceable characteristics are aggregated, minimised or pseudonymised, which makes the data in the ADS suitable for analysis purposes. The ODS, on the other hand, contains almost all characteristics. The vast majority of questions can be answered with the ADS; personal data is rarely needed for analysis purposes. Populating the ADS is an automated process of retrieving the data, pseudonymising it and, finally, storing it in a silo in the data warehouse. Each department manager is responsible for their own process, data and application. As the source owner of the data, they are responsible for both the ADS and the ODS within their silo.
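By way of illustration, a minimal sketch of what pseudonymising at the source and storing only generalised characteristics in the ADS might look like. The field names, the value band and the keyed-hash construction are assumptions made for this example; they do not describe Viacryp's actual implementation or Zaanstad's real data model.

```python
import hashlib
import hmac

# Illustrative secret key; in practice the key material would be managed by the
# pseudonymisation service, not stored in analysis code.
SECRET_KEY = b"example-key-managed-elsewhere"

def pseudonymise(identifier: str) -> str:
    """Derive a stable pseudonym from an identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# A source record as it might look in the ODS (operational, nearly all characteristics).
ods_record = {
    "citizen_id": "123456789",
    "name": "J. Jansen",
    "address": "Dorpsstraat 1",
    "neighbourhood": "Zaandam-Oost",
    "house_value_eur": 315_000,
}

# The corresponding ADS record: the identifier is replaced by a pseudonym, directly
# traceable characteristics are dropped, and the house value is generalised to a band.
ads_record = {
    "pseudonym": pseudonymise(ods_record["citizen_id"]),
    "neighbourhood": ods_record["neighbourhood"],
    "house_value_band": "300k-350k",
}
```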
We also immediately started building a data catalogue with all the definitions, so that we can see what we have, what has been requested and what has been delivered. Everything is logged. That allows us to be very transparent.’
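A catalogue entry and its delivery log could be as simple as the structures sketched below; the fields are illustrative assumptions based on the description above (definitions, what has been requested, what has been delivered), not the municipality's actual catalogue format.

```python
# Illustrative catalogue entry: what a dataset contains, who owns it, how it is defined.
catalogue_entry = {
    "dataset": "real_estate_valuations_ads",
    "source_owner": "department_housing",
    "definition": "Property valuations per premises, generalised to value bands",
    "characteristics": ["pseudonym", "neighbourhood", "house_value_band"],
}

# Every request and delivery is logged, so the use of the data remains explainable.
delivery_log = [
    {
        "requested_by": "data_lab",
        "purpose": "quality-of-life analysis per neighbourhood",
        "characteristics_delivered": ["neighbourhood", "house_value_band"],
        "delivered": True,
    },
]
```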
Why did you choose Viacryp as a partner for this improvement project?
‘Before we started this process, we carried out a one-off thematic analysis in our data lab with Viacryp and another party. The criteria that were important for us were collaboration, speed of response, partnership, focus on content and expertise. The collaboration with Viacryp came out best by far, which is why we asked Viacryp to also become a partner in the structural process of data retrieval. During this process, it soon became clear that what was promised on paper was actually delivered. It was also good to see that Viacryp’s technical people were good sparring partners for our data architects. They spoke the same language, and this really created flow.’
What approach did you choose to achieve the desired solution?
‘The automatic set-up of the pseudonymisation street for the data warehouse was a technical feat, and we worked well together on it. A client tool was installed to perform the first pseudonymisation step, and Viacryp configured the rest of the pseudonymisation street. An important aspect is that Viacryp’s pseudonymisation service has become part of our data disclosure process. The service fits with our approach of guaranteeing privacy and not using personal data in data analyses. The key ingredients are the DPIA and the privacy statement, with a legal basis and purpose limitation, so that you can always explain why you have used which data. This is followed by the pseudonymisation, aggregation and minimisation of the data, so that no personal data is used, except for the pseudonym. Viacryp’s starting point is: no readable personal data within the Viacryp domain and, of course, a robust overall solution. The pseudonymisation process is easy to explain.’
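The interview does not describe the cryptography involved, but a common pattern for this kind of set-up is a two-step keyed transformation: a first step in a client tool inside the municipality and a second step at the external service, so that neither party alone can link the final pseudonyms back to the original identifiers. The sketch below assumes HMAC-SHA256 for both steps; the keys, function names and mechanics are purely illustrative, not a description of the actual client tool or service.

```python
import hashlib
import hmac

# Illustrative keys: the client key would never leave the municipality,
# the service key would never leave the pseudonymisation service.
CLIENT_KEY = b"municipality-only-key"
SERVICE_KEY = b"service-only-key"

def client_first_step(identifier: str) -> str:
    """Client tool: keyed hash applied before any data leaves the municipality."""
    return hmac.new(CLIENT_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

def service_second_step(intermediate: str) -> str:
    """External service: second keyed hash, producing the final pseudonym."""
    return hmac.new(SERVICE_KEY, intermediate.encode("utf-8"), hashlib.sha256).hexdigest()

final_pseudonym = service_second_step(client_first_step("123456789"))
```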
What results have been achieved?
‘Privacy does not have to be restrictive, but the person requesting data does have to be able to explain very carefully what data is really necessary and why. Source owners often guard their data like hawks; if you want to use data, you have to be able to demonstrate that privacy is properly regulated. With the support of Viacryp’s services, we have developed a standard working method for dealing with data. The design of the ADS plays a vital role in this. We examined each dataset characteristic by characteristic and made conscious choices about the level at which the data had to be stored. We analysed the traceability of each characteristic: for example, all houses above a certain value in a certain neighbourhood. Because you do this together with the source owner, they are also more willing to make the dataset available for data analyses. People knew that the dataset in the ADS was well thought out; it was pseudonymised, aggregated and minimised, so there was little or no traceability in the data.
By agreeing on a common working method, the use of data is made possible in a responsible and accountable manner. In 99% of cases, you do not need personal data for data analysis. You usually don’t need to know the ‘who’, only the ‘what’. Why ask for everything if you don’t need it? Our approach to safeguarding privacy means that a source owner knows how the ADS is structured and how they can make their data available.’
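Analysing traceability characteristic by characteristic can be thought of as a group-size check: a combination of characteristics that applies to only one premises, such as the single house above a certain value in a neighbourhood, is still traceable and needs further aggregation or minimisation. The check below is a simplified sketch with made-up records and an arbitrary threshold.

```python
from collections import Counter

# Made-up ADS-style records (already pseudonymised and generalised).
ads_records = [
    {"neighbourhood": "Zaandam-Oost", "house_value_band": "300k-350k"},
    {"neighbourhood": "Zaandam-Oost", "house_value_band": "300k-350k"},
    {"neighbourhood": "Zaandam-Oost", "house_value_band": ">1M"},
]

# Count how many premises share each combination of characteristics.
group_sizes = Counter(
    (record["neighbourhood"], record["house_value_band"]) for record in ads_records
)

# A combination that occurs only once is still traceable; the threshold is illustrative.
for combination, size in group_sizes.items():
    if size < 2:
        print("Potentially traceable combination:", combination)
```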
You also use the option of depseudonymisation. Can you tell us more about this?
‘In 1% of cases, you still want to depseudonymise pseudonymised data in order to gain access to the original personal data. For example: in a neighbourhood in Zaanstad, a number of reports were coming in about poor quality of life, a lack of safety and undermining of the economy. When that happens, we have a legal obligation to act. In addition to the signals from the neighbourhood, the decision was made to conduct a data analysis to gain a better picture of the problems in the area. The privacy statement is always the starting point, so that it can be determined in advance whether the data analysis may be carried out. Three indicators were examined on the basis of pseudonymised data:
- Moving house more than four times in a year
- People living in a space of less than ten square metres
- More than four adults in one house
The analysis produced eighty premises that scored on the indicators in the area under investigation. At that point, you still don’t know which buildings they are, because each building is represented by a pseudonym. An advisory board then meets, consisting of a lawyer, a privacy officer, a spokesperson, the source owner and the programme manager, and it weighs up the two legal tasks: tackling housing fraud and safeguarding privacy. This is compliance in the real world. The board argues the necessity, weighs up the interests and draws up a joint recommendation on whether or not to depseudonymise, with the important criteria being effectiveness, legitimacy and (administrative) risks. The decision is added to the privacy statement and submitted to the portfolio holder (in this case, the mayor). If Nieuwsuur or De Telegraaf appears on our doorstep, we can always explain why we did what we did. The depseudonymisation of data is an official decision that must be supported administratively.
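The indicator analysis itself can run entirely on pseudonymised records: each premises is scored against the three indicators, and the result is a list of pseudonyms that only a depseudonymisation decision can turn back into addresses. The record structure, field names and the way the indicators are combined in the sketch below are assumptions for illustration; the interview does not specify them.

```python
# Made-up pseudonymised records per premises; the field names are assumptions.
premises = [
    {"pseudonym": "p-001", "moves_last_year": 5, "m2_per_person": 8.0, "adult_residents": 6},
    {"pseudonym": "p-002", "moves_last_year": 1, "m2_per_person": 35.0, "adult_residents": 2},
]

def indicator_hits(record: dict) -> int:
    """Count how many of the three indicators a pseudonymised record meets."""
    return sum([
        record["moves_last_year"] > 4,   # moved house more than four times in a year
        record["m2_per_person"] < 10,    # living space of less than ten square metres
        record["adult_residents"] > 4,   # more than four adults in one house
    ])

# Premises that score on at least one indicator (the actual combination rule is not
# described in the interview); only their pseudonyms are passed on for review.
flagged_pseudonyms = [p["pseudonym"] for p in premises if indicator_hits(p) >= 1]
```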
The process is transparent, explainable and well documented. This provides you with a clear justification if questions such as ‘How did you use data analysis in carrying out your legal task of tackling housing fraud?’ are asked afterwards. An important principle in using data analyses is that people’s actual actions are always what count. A suspicious profile does not make someone a criminal, and a suspicious building does not mean that criminal activities are actually taking place. Data analyses only have a signalling and advisory function. An analysis is always followed by human judgment (such as an in-depth investigation) before a decision is made. In this specific example, twenty of the eighty buildings were selected by the team after in-depth investigation; the data relating to the other buildings was destroyed immediately. The municipality inspected those twenty premises, and some form of housing fraud was found at nineteen of them.’
What has Viacryp’s service made possible that could not have been done otherwise?
‘Twenty years ago, data was not as important as it is today, nor was privacy as high on the agenda. As the importance of data has grown, privacy has become a weighty component, and rightly so. A great deal is still possible, as long as you remove the ‘who’ from the data. Moreover, in 99% of cases you do not need those personal details at all, and you can carry out data analyses with pseudonymised data, that is, without using traceable personal details. These analyses would not be allowed if we were to use identifiable personal data. Viacryp’s pseudonymisation service is therefore the way to leverage the potential of data analytics while protecting personal data.’