In this episode I'm joined by Hannes Kuhl, a Solution Architect and Data Protection Officer at Trakken who has spent a lot of time thinking about the intricacies of personal data, the GDPR and its effect on our (shared) field of digital analytics.
It was a great talk where we start out quite abstract and slowly move towards more technical topics. Hopefully, there's something in there for you to enjoy!
As of this episode, we will be releasing again on a weekly schedule. On top of that, all new episodes will also be released with a YouTube version included! Head over to our YouTube channel if you prefer video over audio.
Some of the resources mentioned in this podcast:
Make sure you follow the show:
- Follow LifeAfterGDPR on Twitter and LinkedIn & YouTube
- Follow the host Rick Dronkers on Twitter & LinkedIn.
- Subscribe to the show on Apple Podcast or Spotify, or wherever you listen to podcasts by searching for "Life after GDPR"
- If you'd rather get notified by email, subscribe for updates via the lifeaftergdpr.eu website
If you want to help us out, please share the link to this episode page with anyone you think might be interested in learning about Digital Marketing in a Post-GDPR world.
Talk to you next week!
PLEASE NOTE LEGAL CONDITIONS:
Data to Value B.V. owns the copyright in and to all content in and transcripts of the Life after GDPR Podcast, with all rights reserved, as well as the right of publicity.
WHAT YOU’RE WELCOME TO DO:
You are welcome to share the below transcript (up to 500 words but not more) in media articles, on your personal website, in a non-commercial article or blog post (e.g., Medium), and/or on a personal social media account for non-commercial purposes, provided that you include attribution to “Life After GDPR” and link back to the https://lifeaftergdpr.eu URL. For the sake of clarity, media outlets with advertising models are permitted to use excerpts from the transcript per the above.
WHAT IS NOT ALLOWED:
No one is authorized to copy any portion of the podcast content or use the Life after GDPR Podcast name, image or likeness for any commercial purpose or use, including without limitation inclusion in any books, e-books, book summaries or synopses, or on a commercial website or social media site (e.g., Facebook, Twitter, Instagram, etc.) that offers or promotes your or another’s products or services without written explicit consent to do so.
Transcripts are based on our best efforts but will likely contain typos and errors. Enjoy.
[MUSIC SOUND EFFECT]
[00:00:00] Rick Dronkers: Hey everybody. Thank you for tuning into the “Life After GDPR” podcast, where we discuss digital marketing in a post GDPR world. I'm your host, Rick Dronkers. And in today's episode, I get to interview Hannes Kuhl, the Solution Architect and Data Protection Officer at the digital analytics agency, Trakken.
Although we're both technology enthusiasts, in this episode we discovered a more philosophical side of data privacy and dive into the question of what personal data really is and why it matters. Hannes has clearly spent a lot of time thinking through these complex and abstract concepts and shared some insights with me that really improved my understanding of the topic.
We talk about what it is about data privacy that makes us want to protect it, how to think about personal data in terms of data intimacy, consent as a legal basis and its challenges, and much, much more. I really hope you enjoy this episode.
Also, for all the audio-only listeners: as of this episode we're also releasing the podcast in full on YouTube. So if you prefer that, have a look at the “Life After GDPR” YouTube channel and subscribe. Now let's dive in.
[00:01:10] Rick Dronkers: Welcome to the podcast. How would you introduce yourself? What is your role?
[00:01:15] Hannes Kuhl: Tough question. I work for a Google Marketing Platform reseller, and there I solve problems that are at the intersection of privacy and web analytics. So we have many people who deal with web analytics. We have many... no, actually not, we just have me who deals with privacy on a legal basis. And I am one of the people who solve both problems.
So if clients come around asking: how do we make Google Analytics GDPR compliant? What is it about floodlight tags? When can I fire them? They come to me and I try to talk to them and solve their problems.
[00:01:57] Rick Dronkers: So you're a busy man nowadays.
[00:01:59] Hannes Kuhl: I have gotten very busy over the last year. Let's say over the last 12 months.
[00:02:03] Rick Dronkers: Cuz you're based outta Germany,
[00:02:05] Hannes Kuhl: Mm-hmm.
[00:02:05] Rick Dronkers: I think your agency works well for German clients, but also I think a couple of more international clients.
[00:02:14] Hannes Kuhl: Most of our clients are in German speaking countries. So Austria, Switzerland, and Germany. We also have a branch in Scandinavia in Stockholm, which is growing. We also have quite a big office in Barcelona and we have some clients in Spain, but most of our businesses are in German speaking countries.
[00:02:36] Rick Dronkers: Yeah. And if not German, then at least in GDPR country. Yeah.
[00:02:41] Hannes Kuhl: Definitely. Yeah. I don't think we have clients outside of the GDPR area. If we consider Switzerland a GDPR focused country.
[00:02:49] Rick Dronkers: Yeah. Maybe we can do a separate podcast on that one.
[00:02:53] Hannes Kuhl: Data privacy laws.
[00:02:54] Rick Dronkers: Exactly. You're also very active on Measure Slack. And we were discussing ideas there, and you came up with a cool idea of actually diving into the simple topic of personal data, which is a very expansive topic.
[00:03:10] Hannes Kuhl: Yeah. I saw more and more people asking the same questions. Is this personal data? Is this PII? And while I feel for them, I was there as well. But the answer is always: it depends. And therefore I thought, if we can dive into this and give a little more insight on when something is personal data and why it's actually worth protecting, it would be a goal worth pursuing.
[00:03:41] Rick Dronkers: Yeah, definitely. And I think the main question when it comes to anything GDPR related is: is it personal data, right? That's like the first question you ask when you come across a piece of data, if you're trying to decide what actions you have to take. So if you take us through the origin of personal data, what are we talking about?
[00:04:02] Hannes Kuhl: It's really difficult. And each country in the world has probably taken a similar, yet slightly different, approach. At our company, I also give legal data privacy onboardings and tell our colleagues what they have to keep in mind. And I always lead with the example that there's data privacy regulation everywhere. Like, the US has data privacy regulation, Brazil, Japan, the EU of course. China also has data privacy regulation, which is a country you would not really think of.
And if you consider that every country in the world has some form of protection for data being personal, whatever personal is, and we can explore that a little later, it indicates that there must be something universal to it that all humans just feel. And I read this book, “Human Universals” by Donald Brown, in which he explores things that every human does, regardless of the culture they grew up in. Be it collectivist or individualist, progressive or very conservative, or whether they are in a technologically developed country or a rural developing one.
They all share some universals, like dancing, listening to music, friendship, and families. And one of them is actually privacy. So every human somehow has the need for some degree of privacy. And that makes you think: what is it about privacy that people care about? What is it about personal data that really tickles everybody's nerves? Every human is trying to be mindful of their privacy. And I recently listened to another podcast, called Masters of Privacy, in which one interview guest described privacy as the control over your future choices.
And this is where I think all of the magic lies. We as humans are inherently bad at dealing with uncertainty. It makes us feel really uncomfortable if we don't know how a specific strategy or project plays out. We always try to get more security, and privacy, in a way, is how we behave when it comes to uncertainty.
So we don't feel comfortable sharing intimate details about ourselves with everybody, because we don't know how someone who receives the data is gonna use it. Are they gonna use it to their advantage, to our advantage, to our mutual advantage? We just don't know. And that's why we as humans try to be mindful of giving up pieces of information that can influence how we will be able to behave in the future.
[00:07:10] Rick Dronkers: The feeling that it is about future choices, or future options, let's call it that. I think a lot of people feel that, but not a lot of people realize it. People who are actively working within privacy probably figure this out after a while, but until you said it in this way, I hadn't really conceptualized it like this: that the real value of privacy is maintaining the optionality in the future to be able to make certain choices.
There's a door that might be locked or opened based upon the information you give up right now. I didn't have this concept in mind very clearly. Like, I knew, okay, there's value to privacy. But putting it in this context, it's basically opening and locking doors for your future self.
[00:08:14] Hannes Kuhl: And you don't know which doors they're gonna lock, and most of the time you don't know why they locked or why they opened.
[00:08:21] Rick Dronkers: So a lot of people's counterargument when it comes to privacy is: I have nothing to hide. Right. That's the famous thing. Like, I don't care, I have nothing to hide. And in a lot of cases that is probably true, right? Like my browsing behavior on my supermarket's website, the bananas I buy or something. Sure, it's probably gonna be fine. Right. That's usually the case. But you cannot predict the future, so you don't know the unknown, like you just specified. So that's a valuable point. And I think a nice way to describe it, how they did in that podcast that you just mentioned. Yeah.
[00:09:02] Hannes Kuhl: All of what you said is true. And this idea of, I have nothing to hide, typically comes from the assumption that the data companies collect is gonna be used for benign purposes, like showing you a specific ad or making their product better, which are all valid reasons and valid processing.
And therefore good reasons. But then there are also cases where it's not just an ad. If you remember back to Brexit, there was this one town in England where immigration was the lowest in all of England. And yet if you went to the streets and asked people why they wanted Brexit, they said: yeah, because the EU makes immigrants come here. I don't want them.
And they already were the town with the lowest immigration in the entire country. And this was because political parties ran ads there and showed people dangerous scenarios of what is gonna happen with immigrants. So especially when it goes into political areas, showing us ads is, let's say, loaded. It's controversial, it's difficult.
Especially when it then gets personal and you can individually target people with political ads. This can have tremendous consequences for history, basically. Like Brexit is one of a kind until now. If you think about this, then there should be some degree of protection for personal data, and some protection for people from being a danger to themselves.
Like it's impossible for us, as you said, to estimate what someone is gonna do with a piece of data. Just to be careful about the future, some protection for data, at least personal data, should be granted.
[00:10:59] Rick Dronkers: Another great example that Simon told me at Measure Camp. We recorded an episode with him as well, and by the time this airs that episode is probably out. But he mentioned apps collecting data on women and their cycles, and now in the U.S., of course, you have the whole abortion situation going on, and that app data is being sold for certain targeting. That could be used for whatever kind of ad targeting, but it could now also be used for: hey, you are a woman and you were pregnant, and now you are not pregnant anymore. Where did you do your abortion? Because it's illegal in this state. Those kinds of situations. So yeah.
[00:11:46] Hannes Kuhl: Those are the future choices we're not even aware that we want to make. They just become very important. And all of this classifies as "nothing to hide" in air quotes, but it's still something the processors of the data should have thought about beforehand, and whether this is the right thing to do.
[00:12:05] Rick Dronkers: So I think you had a classification for the data shared, right? So there are different types of intimacy of the data that you share.
[00:12:16] Hannes Kuhl: The more intimate a specific piece of data is... the GDPR term is probably "special", if it's a special piece of data or a special category of data. But the human word for it is essentially intimacy. So how secret should it be? The scale ranges from things like your nationality: everybody can know my nationality. If they couldn't tell by now, I can also tell them: I'm German, and I'm fine with everybody knowing that.
And then, on the other end of the scale, there are pieces of data that are very intimate, like your bank account information or your credit card details. And the more intimate specific things are, the more harmful they can be in the future, and in general, the more powerful they are to someone else. Like your credit card information can essentially be used to ruin your life. Someone could charge you random amounts of money, and then the credit card company says: yep, give us the money that someone charged on your credit card. And if you can't pay this back, it's gonna close so many doors in the future.
The scale of intimacy is very, very long. And the more intimate a specific piece of data is, the more data subjects, those who share the data, and with that also the processors of the data, should really think about: do I need this? And a relatable example that sits somewhere in the middle between nationality and credit card details is probably sharing your phone number with a stranger in a bar. Imagine yourself entering a bar, and you get into a conversation with someone you didn't know before, and after an hour of talking or so they ask: hey, can I have your phone number?
And in the offline world, you'd be able to tell if this person is worthy of getting your phone number, because from their body language and the contents of the conversation, you could tell if they have good intentions and are going to use it for good purposes, and whether the relationship this phone number could open is worth pursuing.
But in the online world, this is much harder, because you have no idea who's getting what data when. And even if you had a relationship with a specific data processor, with a specific app, then how do you know what they're gonna do with that telephone number?
In the offline world, they're gonna use it to send you a text message or give you a phone call. But online, the phone number is also a really good universal identifier. It's a super cookie, essentially: everywhere you go, every website, you will have the same phone number. So while this telephone number in a bar example doesn't fully translate to the online context, it still highlights some of the difficulties that we run into as web analysts when we try to collect email addresses or phone numbers or social security numbers of our website users.
[00:15:24] Rick Dronkers: I like the analogy. But of course, you could also be scammed in real life. So what is it about the internet? On one end, it's the anonymity of the counterpart that you're dealing with, or the possible anonymity, right? You never know who you're really talking to necessarily. But I think also, because the internet allows for one-to-many communication, it is just the ideal place for scammers as well, yeah. [Laughs]
It allows scammers to try and throw out a wide net, you know. A scammer could never walk into a million bars and try to scam a million people in the same timeframe, but they can do that online. So I think that also plays into it.
[00:16:16] Hannes Kuhl: The enormous scale at which you can do things on the web, on websites or in apps, it doesn't matter, just opens so many new problem vectors. In the offline world, you are constrained by the time you have, while in the online world, you're just constrained by the credit card that you put behind your Google Cloud or Amazon or Azure setup, and computing resources aren't exactly scarce.
[00:16:47] Rick Dronkers: No, not anymore. [Laughs] Maybe with CO2 credits. Okay. Yeah, so that part makes sense, right? So privacy, basically: why is it important? Because it is the control over your future choices, right? That's the essence of it. And then we can classify it. It ranges from pieces of data where you're comfortable with everybody knowing, let's call it low-risk data.
Me being from the Netherlands, you being from Germany, right. We're okay with that. We can hide amongst the masses. I can hide amongst a little bit fewer people than you, but still, whatever, it's 17 million Dutch people. So it's enough. [Laughs]
[00:17:30] Hannes Kuhl: Still plenty.
[00:17:31] Rick Dronkers: But then, like you said, you know, your credit card account, that is unique to you and also poses a high risk to you. Is it a one-to-one ratio? Do you think, if it is unique to you, is it always high risk?
[00:17:45] Hannes Kuhl: I tend to say no, but I've honestly never thought about it exactly. So let's take an example, like your email address. I would say yes, because you can use it to identify me, you can use it to contact me, you can use it to link my purchasing behavior between different shops where I've left the same email address.
And if you take, on the other hand, my name: Hannes Kuhl isn't that common, but okay, let's assume there's only one Hannes Kuhl from Lower Saxony, which is where I grew up, in the north-western part of Germany. With this information, you couldn't really do anything, because unless you have more meta information, more context to it, you cannot use it to contact me.
So it's still personal, but the risk is not very high. If I told your 60 million listeners my name is Hannes Kuhl, they have very limited ways of contacting me. They don't know my email address yet. They don't know my phone number or my address. So it's not necessarily high risk. But it can be, because as soon as one of them talks to the Mayor of Munich, and he goes into his database and looks up Hannes Kuhl, and he finds: oh, there's just this one.
He lives here and here. All of a sudden, people can find me, and they know everything I've said on this episode, and they can hold it against me, and they can stand outside my house and say: you said this and that, and I don't agree with it. So it's not necessarily high risk just because it's unique to me.
[00:19:27] Rick Dronkers: Let's go with: if it is unique, then it is higher, but not necessarily the highest risk, right?
[00:19:34] Hannes Kuhl: So if it is unique, it can definitely not be ruled out that it's gonna be used against you in the future. So many negations. But it definitely has the potential to limit future choices, to impact you in the future.
[00:19:50] Rick Dronkers: Yeah. Agreed.
[00:19:52] Hannes Kuhl: On the other side, if it's not unique to you, then not so much. Like if I just told you my name is Hannes, there are probably at least a thousand Hanneses in Germany, probably a couple hundred in Munich alone. And with that, you cannot do much.
[00:20:11] Rick Dronkers: Let's move over to the scope. What is personal data? The million dollar question, the million Euro question in this case. [Laughs] What is personal data?
[00:20:22] Hannes Kuhl: It's something a lot of the people on Measure Slack are probably currently wondering. And when someone asks, can you give me a list of parameters in Google Analytics that are personal data, that's because they have no idea what personal data is. And I took a course at Maastricht University, like a year ago, as I was preparing for my role as Data Protection Officer at Trakken.
And in that course, one of the professors gave me this rule of thumb that I still remember, and I think it makes a lot of things easier to understand: personal data is most likely everything a user had before you met them and will have after you met them. So email addresses, telephone numbers, credit card information, and names. All of these things, data subjects have before they visit your website or open your app, and they will have them afterwards.
And the cookie somewhat falls into this basket as well, because you give them a cookie so that when they come back to your website, they show you the number that's in their cookie again, and you can identify them. So while there's a "most likely" in that rule of thumb, it still is a good indication.
Like, everything that a user didn't have before and will not have afterwards is most likely not personal data. So that's the rule of thumb I try to live by. Do you have any good rule of thumb? How do you approach this?
[00:21:58] Rick Dronkers: My approach used to be heavily skewed towards how Google Analytics talks about personally identifiable information. As we have learned already, we cannot confuse personally identifiable information and personal data. They're not the same thing. [Laughs]
But yeah, I used to think about it that way. And I used to think about it in a way of: okay, can I link this directly to an actual person? Basically, can I use this data point, and with the data that I have, can I figure out what this person's name is and where they live?
And that's where I started out. However, the issue with that is that it is not only about my data set. So let's assume in this case I'm the client, right? I'm the e-commerce retailer or whatever. Even if I can't figure out who that unique visitor is with my own data set, the problem is I'm sharing it with Facebook and Google and whatever, and I can't be sure that they are not able to figure that out.
So my data sub-processors, basically. That makes it a complex equation, where I've recently switched to: okay, you should basically assume that if you use something like Google Analytics or Google Ads or the Facebook pixel, and you're gonna send any identifiers to them, you should assume that they are able to figure out who this person is, because of the vastness of their data set. And then, obviously, the scope widens quite a bit.
[00:23:37] Hannes Kuhl: It does. Yeah. And the fact that you just spent, what was it two minutes on trying to explain what personal data is also shows how many ifs and whens there are in the concept of personal data. And that's where I think most of the confusion in the community is coming from. And not only the Measure Slack community, but anybody who is somewhat impacted by GDPR in the EU.
And that's, I would say, 95% of Europeans; the other 5% just don't measure and probably don't care. And we at Trakken are also trying to put this on a quantifiable basis: how do you find something that's personal data? There are the obvious examples of personally identifiable information, like emails, credit card information, telephone numbers, where it's unique to a person, plus you can use it for some extra functionality, like sending emails, making phone calls, charging money. And it's very likely that this data also exists somewhere else. But essentially, everything that makes you unique or distinguishable from others can be personal data.
So let's assume a web shop specifically targets Dutch people. Let's call it cheese.nl, and it only targets Dutch people. And there's this one weird guy who emigrated from the Netherlands to southern Argentina, and they open up this website every day. There are 100 users of the website coming from the Netherlands, and then there's this one user coming from southern Argentina. At that point, the information that they are from southern Argentina is probably also personal data, because they're the only one who is there. And at that point, it doesn't matter if you have a client ID, a cookie ID, a telephone number, or an email address connected to the data you collect about them.
But the mere fact that they're the only representative from a country... I have no idea if they're actually as interested in cheese as Dutch people, I'm just making it up as I go. But this will also be personal data for this one person. It's not the dimension country per se that is personal data, but it can be, and you should also try to find those cases.
I think in Google Analytics this is framed as cardinality: how many different observations do you have per dimension value? And this is essentially where the definition of personal data expands, to say it can be everything. If you just have one piece of data about your website's users, and you know there's just one user who ever visited your website, then everything this one piece of data contains is personal.
So that's where it gets so complicated. But once you try to think a little more about it: are there dimension values that only represent very few users? Are there dimension values that only represent one person, like a city or a town? And in Germany there are a lot of rural areas where just five people in the village actually have access to the internet.
And everybody knows who the weird guy is who always goes to cheese.nl, at least in that town. And that's where all the confusion, I think, is coming from. It's so hard. There's no black and white. There's no list of dimensions that can be personal data, because everything can be personal data.
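The cardinality check Hannes describes here can be sketched in a few lines. This is a minimal illustration, not any analytics tool's actual API; the field names, the sample data, and the threshold of 5 users are made-up assumptions for the cheese.nl example.

```python
def risky_values(rows, dimension, min_users=5):
    """Return dimension values that represent fewer than min_users
    distinct users -- values that could single somebody out."""
    users_per_value = {}
    for row in rows:
        users_per_value.setdefault(row[dimension], set()).add(row["user_id"])
    return {value: len(users)
            for value, users in users_per_value.items()
            if len(users) < min_users}

# Hypothetical hit-level data: 100 Dutch visitors plus the one
# visitor from Argentina.
hits = [{"user_id": f"nl-{i}", "country": "Netherlands"} for i in range(100)]
hits.append({"user_id": "ar-1", "country": "Argentina"})

print(risky_values(hits, "country"))  # {'Argentina': 1}
```

Any value this flags, a lone country, an odd browser version, a tiny village, is a candidate for being personal data and worth reviewing before the data set is shared.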
[00:27:25] Rick Dronkers: So, in that case, you're basically taking the same approach as, for instance, what Google does when it comes to audience marketing lists, where it limits them. I think you have to have 500 or a thousand people in that list. I think that's also for the sake of the efficiency of the remarketing campaign itself, but it's also so that you cannot single out a single user.
So if you would reverse that way of thinking and apply it to your data set: basically, you want a data set not to contain any parameters that allow you to drill down to a single user. Cuz then that basically means that it's personal data.
[00:28:07] Hannes Kuhl: You want to avoid outliers, essentially, in your data set, because the outliers are where you can identify specific people. And if you can identify specific people, you quickly slide into the definition of personal data, which will require you to think about Article 6 of the GDPR, the legal basis of your processing.
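Reversing Google's audience-size logic, as Rick suggests, amounts to a simple k-anonymity-style suppression step: roll every group smaller than some minimum into an "(other)" bucket before the data is reported or exported. A sketch under stated assumptions; the threshold of 500 only mirrors the audience-list minimum mentioned above, GDPR itself sets no such number, and real anonymization needs more than one suppression pass.

```python
def suppress_small_groups(counts, k=500):
    """Merge every group with fewer than k users into one "(other)"
    bucket, so no reported row describes just a handful of people."""
    kept = {group: n for group, n in counts.items() if n >= k}
    suppressed = sum(n for n in counts.values() if n < k)
    if suppressed:
        kept["(other)"] = kept.get("(other)", 0) + suppressed
    return kept

# Made-up per-country user counts, including one outlier:
audience = {"Netherlands": 17_000, "Germany": 83_000, "Argentina": 1}
print(suppress_small_groups(audience))
# {'Netherlands': 17000, 'Germany': 83000, '(other)': 1}
```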
[00:28:32] Rick Dronkers: Yeah. So before we go down that route: you now have the DPO title, but you are also a Measure Slack technology guy. So if you think about the practicality of this, it is really hard. Like you said, you wanna eliminate outliers, but these outliers will show up in your data set. Like, you are implementing something, but you don't have full control over what comes in all the time. You could even argue that a person could perform a set of interactions in a certain unique way, which would make it personal data, if you go down that rabbit hole. Right.
So that basically boils down to this question. Do you think all data that we collect online, so let's call it all website or app data, is it always by definition personal data, and do we need a certain legal basis for collecting and using it?
[00:29:35] Hannes Kuhl: No. Although, okay, if we forget the concept of GDPR for a second: yes, it's always personal data. If you attach a specific ID to something and connect each interaction event of a user to that specific ID, and every time the user comes back they will have the same ID, at some point the combination of their events will be unique enough for you to find out who's who. At least if this unique combination of, for example, purchases also shows up in some other system where you have personally identifiable information.
So where you have an email address in your CRM, for example. At some point, if I have made 10 transactions with revenue between 100 and 200 euros, those 10 transactions will be unique to me, and no other user in the world will have those 10 transactions. And this is also where GDPR makes things a little bit more complicated.
So while, yes, I would say that at some point, if you have enough people coming to your website, everything you collect online will be personal, in GDPR you also have the concept of pseudonymization. So it's not that just because it's personal, you're not allowed to collect it and use it and process it. If you make sure it's not possible for yourself to re-identify a specific user, a specific person, in the data that you collected, then it's also not as strictly bound to the rules of GDPR.
So thinking about the example earlier, where you have tons of events collected about a user: if you put the dimensions of those events in big enough buckets, they will be so general that it's really hard for a computer system to find a link between database A, where you have website interaction events, and database B, where you store your transactions. For example, because you generalize revenue to be just a random number close to the truth, and you also randomize the timestamps of the events, and you don't send the city but just the country of a specific user. At that point, the data in your event storage, database A, would be so different from database B, where the true personally identifiable data lies, that it's not possible for a machine to find the connection between the two.
And then you can also use it outside of the scope of GDPR, because it's essentially anonymous. While I do realize that I'm leaning very far out of the window here, and I will stop by saying: the more dimensions you have and the fewer users you have, the harder it will be to make the pseudonymization fence high enough for nobody to jump over. So in theory this works very well; in practice it's gonna require a lot of work.
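A generalization step like the one Hannes outlines (bucketed revenue, jittered timestamps, country instead of city, no transaction ID) could look roughly like this. The bucket sizes and field names are illustrative assumptions, and whether the result counts as pseudonymized or anonymous is ultimately a legal question, not just a technical one.

```python
import random

def generalize_event(event, revenue_bucket=50, jitter_s=3600):
    """Coarsen a hit-level analytics event so it no longer lines up
    row-for-row with a CRM export:
    - revenue rounded down to a bucket instead of the exact amount
    - timestamp shifted randomly by up to an hour
    - city and transaction ID dropped, only the country kept
    """
    return {
        "revenue_bucket": (event["revenue"] // revenue_bucket) * revenue_bucket,
        "timestamp": event["timestamp"] + random.randint(-jitter_s, jitter_s),
        "country": event["country"],
    }

raw = {"revenue": 137, "timestamp": 1_660_000_000,
       "city": "Munich", "country": "DE", "transaction_id": "T-42"}
print(generalize_event(raw))
```

A join on exact revenue and timestamp between this output and a transaction database would now fail, which is exactly the "fence" being described.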
[00:33:00] Rick Dronkers: In my previous podcast with Steen Rasmussen, he also mentioned that at his agency they've built an engine like that. He didn't describe the details, but basically it anonymizes everything that goes into Google Analytics before it actually hits the Google Analytics server. So in the server-side tag manager it scrambles it, let's say. And that's basically what you're also describing, right?
If you make sure, let's take Google Analytics as an example, but it applies to whatever kind of tool, if you make sure that you scramble everything that could be tied to the user: event timestamps, so you cannot link them to someone taking a certain action at a certain millisecond, right? So you scramble that. Of course, transaction IDs, which you could obviously link to your CRM system whenever they place an order. Everything you can think of, all the metadata about their device and all these things, you probably wanna strip altogether. That could possibly allow your data set to be anonymized enough, or pseudonymized under the GDPR, so that it would not require explicit consent for tracking.
[00:34:19] Hannes Kuhl: Yeah. And I think there's a lot of potential in the future. Like this scrambling, I think the technical term for it is synthesization: making fake real data, basically. You have the true value of the event that someone created, but you change it just ever so slightly that it doesn't show up somewhere else, while it still provides value to you. So you still see the lifetime revenue of a specific user, you still see a number of conversions per day, but you don't know how to connect the Google Analytics data, let's name the elephant in the room, with your CRM data, at least not on revenue. And you also of course have to hash the transaction ID. And that way you can achieve anonymous data. While it's still pseudonymized data, which can theoretically be de-pseudonymized, and then it's personal data, the fence can be made very high.
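Hashing the transaction ID, as mentioned, could look like the sketch below. It uses a keyed hash (HMAC) rather than a plain hash, because low-entropy IDs like order numbers can otherwise be reversed by simply hashing every known ID; the secret key shown is a made-up placeholder, not anything from the episode:

```python
import hashlib
import hmac

# Assumption: this key lives only on your own server, never in the analytics tool
SECRET_KEY = b"replace-with-a-real-secret"

def pseudonymize_transaction_id(transaction_id: str) -> str:
    """One-way keyed hash: without the key, the analytics copy of the
    ID cannot be re-linked to the CRM's plain transaction IDs."""
    return hmac.new(SECRET_KEY, transaction_id.encode(), hashlib.sha256).hexdigest()

print(pseudonymize_transaction_id("ORDER-2022-0001"))
```

The same input always yields the same token, so deduplication inside the analytics tool still works, but anyone holding only the analytics data cannot walk back to the CRM record.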
And the other approach is differential privacy, where you just add noise to specific things. Like, you have the true observations, say you have 10 transactions by this user, and with a likelihood of 5% you add an 11th transaction to this user. Although this 11th transaction never actually happened, it makes it harder to link it to specific databases. Both of which are very technology-heavy.
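The noise mechanism Hannes sketches here, adding a phantom transaction with some probability, can be written as a toy function. This is a simplified illustration of the idea, not a full differential privacy mechanism; the 5% probability comes from his example:

```python
import random

def noisy_transaction_count(true_count: int, p: float = 0.05) -> int:
    """With probability p, report one extra transaction that never
    happened. Any single user's reported count is now deniable,
    while aggregates stay approximately correct."""
    return true_count + (1 if random.random() < p else 0)

# Aggregates remain close to the truth: the expected inflation is p per user
counts = [noisy_transaction_count(10) for _ in range(10_000)]
print(sum(counts) / len(counts))  # roughly 10.05
```

The point is exactly the trade-off in the conversation: any individual row might be fake, so joining on it is unreliable, but totals and averages stay useful for analysis.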
And if we want true anonymous data collection to happen for the web, it cannot really be that technical, because most companies don't have limitless budgets, and I only have limited time and you only have limited time. So these concepts have to be made very accessible and built into the tool. Like, I still have high hopes that GA4 is actually gonna address those questions properly, because that would be a major step. If a company like Google implements the things that we are just discussing, it's gonna have amazing effects for the entire world.
[00:36:32] Rick Dronkers: That would be a chess move for them, to come from this position and make a leap forward. I have some clients who would like me to add some extra transactions to obfuscate their data. They would like some extra transactions in there.
[00:36:48] Hannes Kuhl: Yeah. Of course you have to be careful with how you obfuscate them, and with how many bonuses are tied to transactions in Google Analytics. But adding transactions can also mean removing transactions.
[00:37:03] Rick Dronkers: Yeah. We don't tell them that we only add them.
[00:37:05] Hannes Kuhl: All right, okay. Then I can see how enthusiastic they would be.
[00:37:09] Rick Dronkers: I think this is a really interesting topic. What I currently feel is that, from, let's call it the privacy side of things, this is often less discussed. I think on the privacy side of things you have two streams of people. One is focused on the law and on the rules, and the other is focused on the technology side, and I haven't talked to many people on the technology side.
So I think this topic really fits on that side, because I think a lot of people right now are focused on: no, you cannot do that, you cannot use Google Analytics, not good. Right? They just wanna tell you what you can't use.
And although that's necessary, because it needs to start somewhere, we need to wake up, and legislation and rulings are probably the way you get the ball rolling, I don't think it's the solution. The solution will be on the technical spectrum.
And like what you just described: for a lot of our analytics questions, I really don't care if the event timestamp in milliseconds matches up correctly, cause we don't use that for what we do.
[00:38:19] Hannes Kuhl: Not in that granularity.
[00:38:21] Rick Dronkers: And also, of course, the transaction ID: we've used that to debug, to see when transactions were missing and which ones were missing, to figure out why. But we can figure out other ways to do that.
So if we can scramble that, and we can then use analytics tracking without consent, then we would definitely do that, right? And there are a lot of those reports about the device that I'm using. Of course it's interesting for debugging to see that, hey, people with Firefox have no conversion rate, because maybe the shopping cart is broken in Firefox. That's super interesting information.
However, if that is what it takes for us to preserve privacy and keep analytics, then it's definitely what we're gonna do. But I feel like those things are not talked about enough right now, so I think it's interesting you bring it up.
What are your feelings when it comes to these techniques? They will need rulings, right? Like, somewhere a case will have to be made against a company who uses these new techniques, and then a court will have to rule: yeah, this is good, yes or no, somewhere in Europe.
[00:39:30] Hannes Kuhl: True. We will need rulings, and specific ones. Like, have you read the Austrian "ruling," in quotes again, from December? I think the wording was: the Google user ID in combination with the cookie ID in combination with 50 other dimensions can be personal data. So also very mushy again.
And if this company, if NetDoktor had, for example, not used email addresses for tracking, if they had asked for consent, or had offered some transparency on what they were doing, and had anonymized and synthesized their data collection as far as we just discussed, then maybe history, with a small h, would have become a different one. And we would have had a discussion about how anonymous data collection can properly look on the internet.
And this is the direction I personally would like us to go. Cause if you ask me what the top five worst inventions of all time are, consent on the web is probably one of them, because I have no idea what I accept when I click "I accept," and I do this for a living. And I know consent should be informed, freely given, specific, unambiguous, those things. And how can you make consent informed? I don't think it's possible.
Like, you can of course provide the user with a wall of text and present them with: we use these 500 tools and send them your data. And this is basically the data controller handing over the responsibility for how they should do their things to the data subject, because it's easier, at least in the short term. But the data subject has no way of verifying if they are okay with all of this, and so they're just accepting, like, yeah, whatever.
So if we could go in the direction where we go away from collecting data that could be personal, and make the pseudonymization walls and fences high enough that the data cannot be joined onto other data sources, then this would be better for everyone.
Like, I try to Google a specific topic and then open a new website with each Google result. And then on each website, the first thing: you try to read a couple sentences, then the consent banner shows up, then you click accept, then it scrolls up to the top again, and then you find out this is not the website you wanted. So this is terrible. Like, we've identified a problem that's really important. Before, it was also ruined when third-party cookies were everywhere. And I think now it's a little less ruined, although people might not perceive it that way.
But maybe 10 years from now, if you allow us to dream, we go away from consent and asking for it. And maybe, like the Topics API, the stuff in Chrome's Privacy Sandbox, I think there's a lot of potential in there: putting users in buckets and not collecting data that's unique to one person but is still valuable for us. So that's where I'd like the industry to be.
[00:42:56] Rick Dronkers: No, I couldn't agree more. Like, I had a discussion a couple days ago where I had a brainstorm idea: okay, I want to ask consent at the moment the user does something. So let's take the eCommerce example, right? My perfect situation, as the analytics guy for this company: I would want them to be able to measure analytics right away without consent, but also not be able to identify the user. Right?
So in what we just described, we scramble all the data. It's usable for analytics purposes, but it's not usable for identifying a specific user. And then whenever these users start to really heavily interact with the website and show a lot of intent in buying. And the customer I have in mind, they have a relatively long customer journey, so I know that it will require several interactions before the product or service gets purchased.
So I know for them, there's a lot of value in being able to retarget users: being able to spend their display budget on users who they know have clicked add to cart, or have visited multiple product detail pages.
So what I would like to be able to do is, after a certain threshold is met, so let's say the user has visited their fifth product detail page, or clicked add to cart perhaps, at that moment I would like to ask for consent, and maybe give something in return. So, figure it out somehow, but basically mention like: hey, do you wanna store your preferences for this kind of product? So we basically offer them a feature. And with that we say: hey, we also would like to place a cookie, and then mention that we would like to use that for advertising. Because then there is this clear moment where you can ask it. The user is also familiar with your brand already, they're already clicking around.
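The threshold idea Rick describes could be as simple as a counter in the tag logic. A minimal sketch, where the threshold of five product detail pages and the add-to-cart trigger come from the example above, and the session field names are invented for illustration:

```python
def should_ask_consent(session):
    """Only show the consent prompt once the visitor has signaled
    real intent (illustrative thresholds, not a product rule)."""
    return (
        not session.get("consent_asked", False)
        and (
            session.get("product_detail_views", 0) >= 5
            or session.get("added_to_cart", False)
        )
    )

print(should_ask_consent({"product_detail_views": 2}))                     # False
print(should_ask_consent({"product_detail_views": 5}))                     # True
print(should_ask_consent({"added_to_cart": True}))                         # True
print(should_ask_consent({"added_to_cart": True, "consent_asked": True}))  # False
```

In practice this check would run client-side or in the tag manager, gating when the banner (and the feature offered in return) appears.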
I think that's a way more logical place to ask for something like that, instead of them landing on the website for the first time and you're like: hey, give me consent for all these things, while you're not even aware of what you're actually giving consent for.
[00:45:14] Hannes Kuhl: I like this idea as well, because it explores the idea of building a relationship of trust first before you give someone your data. Like, when I go to random-ecommerce-shop.com, I have no idea if I'm fine with the way they collect my data. But after a while, after a couple pages, I might have some idea about how they run their business. Of course, I only have a partial view, but it's at least a more informed view than it was five pages ago.
I like the idea of enabling the user to give informed consent, because informed also means knowing who you give your data to. Both from a legal perspective, like which companies, but also: what are the values of that recipient? And this is already better than the way we do it now.
When it comes to tying your consent to other things, like AUP or other benefits, other features, GDPR has its own opinion on that, and it's not very legal. There was already the Fashion ID, no, the Planet49 ruling, I think it was, where they tied consent to participation in a lottery. Consent has to be unambiguous, and in this case it's not unambiguous, because are they giving consent to the lottery or to analytics data collection?
So there's certainly an amount of discussion to be had with your client's legal team, but it's possible. Like, you don't have to tie your consent to something else. You could still time it at a better moment than the landing page, where the user hasn't even made up their mind yet whether they want to return or are just gonna bounce. Storing those hits somewhere before sharing them with the third party, anonymizing until you have consent for it: there's a lot of potential there as well, I think.
[00:47:13] Rick Dronkers: In this case, the client is in the US, so we can do a lot of things.
[00:47:16] Hannes Kuhl: The land of the free.
[00:47:18] Rick Dronkers: It's more of a generic approach: asking for consent at a moment that it makes sense. I'm not sure if it was Steen or somebody else, oh no, it was Till Büttner from DHL. And he mentioned: consent is also a really nice bot filter. From that perspective it doesn't really make a lot of sense to ask right away; the valuable people are gonna click around on your website and have some interaction before.
So you can wait a while, give something, before you ask for that consent. I think that's a way more organic approach. But of course, we know the reality of how a lot of companies work right now. Like, getting a consent banner up and making sure that it works is already a tough job for a lot of companies. [Laughs] So of course we have to think about these things and then figure out how to realize them. But I think that's definitely a future that I could see, where we stop blasting every new visitor with this consent right away. [Laughs]
[00:48:22] Hannes Kuhl: Maybe to finish up this consent consideration: I always remind my colleagues, when they ask, yeah, I should collect this data, but the client doesn't have it in their consent manager, that consent is not the only legal basis you can base your data processing on according to GDPR. There are others, like contract or legitimate interest. Legitimate interest is better in that you don't have to interact with the users to get it, but you have to interact internally and think about: what am I processing? Why is this in my interest? Why is this not conflicting with the interests of the user?
And in the short term, consent seems like less work and more solid, because consent is: you buy a tool, you type in "I want to ask consent for Google Analytics," and this tool, this consent management platform, is gonna ask consent for you to collect data with Google Analytics.
But if you want to base that data collection with Google Analytics on legitimate interest, you have to think about: what data am I collecting with Google Analytics, and why? And if you are pseudonymizing your data enough, scrambling it enough, synthesizing it enough, I don't think you restrict the interests or the freedom of the data subject much, because the impact it can have on the data subject's future choices, decisions, potentials is very low if pseudonymization is done well, and the data cannot be used outside of this one context anyway.
Danger happens when the data is used in other contexts: when Google can use it for advertising targeting, Facebook can do the same, or credit card companies use it to give you a credit rating or something.
[00:50:15] Rick Dronkers: We have the pseudonymization of data, let's call it, by scrambling. What other options do we have?
[00:50:22] Hannes Kuhl: We've mentioned before that personal data is data that someone had before you met them and will have after you met them. And if you could, for example, guarantee you only collect data that the user, the person, the actual human, will not present again at any time in the future, this would also be a great way to anonymize the data, because that way you cannot use it to go back to the user, you cannot use it to impact them in the future.
You can share it with Google, but Google cannot do anything with it, or Facebook, or credit card companies, because the data is basically about a person that doesn't exist anymore. And in GDPR, there's also this recital that explicitly talks about dead people. And I'd like to ask the community if my take is right, but GDPR doesn't apply to people who are dead.
And my example is basically collecting data about dead people, or waiting long enough to store the data until someone is dead. And the web analytics equivalent of a person dying is their cookie being deleted. Assuming, of course, there are more dimensions that could be personal than the cookie ID, but if we assume the cookie ID is the only one, then we could collect data somewhere neutral on the user's device. That requires ePrivacy consent, different topic. But if we assume we have all that, we can store the data on their device, and then, when the user leaves, we unload all this data and at the same time delete their cookie ID.
And then we have a user in Google Analytics, or another analytics tool, with an ID that is guaranteed not to show up anymore. And then the client ID cannot be used to identify someone, and it cannot be used to join Google Analytics data to other databases, because it's got an ID that's not gonna show up again. And there also lies a lot of potential to make sure data cannot be misused in the future. An offline analogy of what I just said is a key that you have to a house that burned down. The key once was very valuable, but only as long as the house exists, because then you can go into the house and live there. But if the house doesn't exist anymore, you maybe have a nice key left, and it has the material value of whatever metal is worth these days, but it's not as valuable as it used to be. And similar to this cookie ID, which is deleted when the user leaves: the ID still has some value, but it cannot be connected to a person anymore.
[00:53:22] Rick Dronkers: Just thinking out loud here. When we set the cookie on a user's device, we also have the ability to define how long the cookie lives, right? So when setting the cookie, we can basically set its expiration date.
So let's say, hypothetically, a user lands on our website. We set the cookie, let's say for 24 hours, just to keep the example easy. We store their interaction data, with the identifier that we used in that one cookie tied to that data, somewhere first party. So let's say our self-built on-premise server in the EU. No Google, no AWS, right? And then we hold it for 24 hours, until we are sure that cookie is deleted off the user's device, and then we send it to Google Analytics. That would mean Google is not able to tie it back together, right?
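The 24-hour buffer Rick describes could be modeled as a simple first-party queue: hold every hit until the cookie that produced it has expired, then release it for forwarding. A minimal sketch under those assumptions; the class and field names are invented, and the storage and forwarding targets are placeholders:

```python
COOKIE_LIFETIME = 24 * 60 * 60  # cookie set to expire after 24 hours

class DeferredHitBuffer:
    """Hold hits in first-party storage until the originating cookie
    has expired, so the ID can no longer identify a live visitor."""

    def __init__(self):
        self._pending = []  # list of (cookie_set_at, hit) tuples

    def record(self, cookie_set_at: float, hit: dict) -> None:
        self._pending.append((cookie_set_at, hit))

    def flush_expired(self, now: float) -> list:
        """Return hits whose cookie has expired; keep the rest buffered."""
        ready = [h for t, h in self._pending if now - t >= COOKIE_LIFETIME]
        self._pending = [(t, h) for t, h in self._pending if now - t < COOKIE_LIFETIME]
        return ready  # these would now be forwarded to the analytics tool

buffer = DeferredHitBuffer()
buffer.record(cookie_set_at=0, hit={"cid": "abc", "event": "page_view"})
print(buffer.flush_expired(now=3600))            # [] -- cookie still live
print(buffer.flush_expired(now=COOKIE_LIFETIME)) # the buffered hit is released
```

Anything forwarded by `flush_expired` carries an ID that, by construction, no longer exists on any device, which is the "dead cookie" property discussed above.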
[00:54:24] Hannes Kuhl: Yeah. Google could not use it in the future for marketing purposes, for profiling, unless Google is doing some fingerprinting, which I don't believe. And if we also apply these scrambling methods that we discussed before, then fingerprinting will be very hard, at least fingerprinting well will be.
And yeah, then essentially there's no value in it for a third-party company to extract from that data. Cause the value third-party companies get from a specific piece of data is when they can reuse it in the future. And unless they're just analyzing it for, like, app improvement purposes, and not user-specific purposes like marketing, then yeah, you could go around that.
[00:55:13] Rick Dronkers: So then there's still the point of you being the data in this case, the data controller and processor rights, cuz you're doing it all on your own on premise. But could you design it in a way that you can't access it? If you are unable to access it yourself?
[00:55:32] Hannes Kuhl: To be honest, I'm not 100% sure. I would assume it's possible that you could set up a server which writes to an encrypted database that very few people have access to. I guess it's a little shortsighted if no one has access to it. But you could write it in a way that at least it's not clear text on the server, so that even if, I don't know, the Telekom cloud or OVH or the cloud providers of Europe were seized and someone had the hard drives.
They could not do anything with it. So that's what these additional safeguards for data transfers to the US are about: can someone who has the hard drive access it, for example. And that can definitely be done. I am not a cryptographer, but I would assume it's possible.
[00:56:23] Rick Dronkers: I think that's an interesting thought experiment, and probably somebody smarter than us can maybe dive into that. Let's say I, as a company, want to take the most privacy-preserving approach, and I only care about the anonymized analytics data for me to optimize the website experience. I don't care about the user data at all, I don't even want to store it. So I would create a setup that has this threshold: after 24 hours, when it knows that the cookies have been deleted, it will send the data out scrambled, and then the original will be deleted, and nobody can access that original database.
And I can prove that, because it has been set up once and it works, but nobody can access it. Then I would really love to see lawyers and judges try to figure this kind of stuff out, because this is eventually what we wanna end up with, right? These kinds of maybe hypothetical discussions, because that would show us what we can and cannot do.
[00:57:24] Hannes Kuhl: So my still very inexperienced take on this would be: you don't need to ask consent for this, but you would still need to have a look at GDPR, in that you at least need to document why this is not personal data, why it is not possible for anybody to do any shenanigans with it.
And that way you can handle the GDPR question and be safe for transfers to third countries. The other aspect, of course, being ePrivacy, which is already implemented in some European countries. Germany has had it since December 1st, where storing or writing any data on the user's device requires consent unless it's absolutely necessary.
So there's also this 24-hour cookie. Here too, you would just need to go into the discussion: is this absolutely necessary? Some might say yes, some might say no. Data protection authorities probably tend to define necessary very narrowly: nothing but what the user explicitly asks for in this very moment is necessary.
But with the setup you just described, I think, at least from a human perspective, you would build the best analytics setup, because you cannot do shenanigans with it. You have good analytics data, you still have the concept of sessions, you can use it to improve your app, your store, your website. From a data privacy perspective it still requires discussion.
[00:59:07] Rick Dronkers: It's an interesting field. I'm really interested, in like five or 10 years [Laughs], when we can look back on this and see: okay, we went in that direction or that direction. Personally, I truly think that, let's call it privacy-by-design tech, so the technology side, is where we are gonna have to figure this all out, because technology trumps laws. You can't skip the technology.
Like, you can decide not to adhere to a law, and then sure, you can get a fine, but then the evil has already happened. With technology, we can prevent that. So I think it's always gonna be the better implementation, but it does take a while to figure it all out.
[00:59:51] Hannes Kuhl: It does take a while, yeah. The motivation the professors at my DPO course always tried to instill in us was: think privacy and analytics, think privacy and marketing. And this "and" thinking, like two-way thinking, is where I think we should try to take our motivation from in the future as well.
Of course it's nice to have third-party cookies everywhere, to have supercookies, fingerprinting, and Google Analytics without consent. But if you ask yourself, you as the data subject: is it really that nice? Every data controller is also a data subject at the same time. Unless you live in a cave, don't have any internet, and don't interact with anybody, everybody should also ask: am I myself fine with things being like this?
And if the answer to this is no, then you should think about the way you collect your data. And I think the discussions we have now, also what you're doing by opening this platform, this podcast, and your interactions in the Measure Slack community, they are helping tremendously. And the more people talk about a specific problem, the closer we get to the solution.
[01:01:14] Rick Dronkers: Yeah, trying to figure it all out together. And not just every DPO is also a data subject, every marketer too, right? And I think the funny thing, it's a little bit strange, but if you look at which type of users have the most ad blockers installed, it's always us, right? It's always our industry. We're savvy enough, we know how to install those ad blockers. So really, you already know the answer to the question, right?
Everybody values their privacy in some way, or at least they don't want the nuisance of ads, but they also value their privacy. So the amount of ad blockers installed by people in our industry already says enough about where we should head.
[01:02:02] Hannes Kuhl: That's so true. I also have an ad blocker installed network-wide, and I have exceptions for Google Tag Manager and Google Analytics.
[01:02:11] Rick Dronkers: Yeah. Otherwise you can't do your work, huh? [Laughs]
[01:02:13] Hannes Kuhl: Also that, of course, but it's also, I don't think you can do too many shady things with just Google Analytics. The shady things happen when you connect Google Analytics to other databases, where I cannot really say what's gonna happen. But I can say really well what someone can do with my Google Analytics data.
[01:02:34] Rick Dronkers: Hannes, this was a great talk. I really liked exploring this topic with you. If people wanna learn more about you or follow you online, where can they best contact you?
[01:02:44] Hannes Kuhl: I guess Measure Slack. Like, I think I'm the only Hannes on there?
[01:02:48] Rick Dronkers: We can check.
[01:02:49] Hannes Kuhl: At least one in ten, one in ten. If I'm not the only Hannes, users can try to de-pseudonymize the information "Hannes on Measure Slack." Like, I have a Twitter profile, I have a LinkedIn profile, but I don't produce content there. So Measure Slack, if you want to follow me online.
[01:03:04] Rick Dronkers: There are actually, at the moment of writing, four Hannes on Measure Slack, but you're the only one with this profile picture, so people can find you. [Laughs]
[01:03:14] Hannes Kuhl: Yeah. See? See?
[01:03:16] Rick Dronkers: Thanks a lot for sharing, and hopefully I'll have you on in the future as well, diving into more of these kinds of topics.
[01:03:23] Hannes Kuhl: I’d love to. Yeah. Thanks for having me.
[MUSIC SOUND EFFECT]