
RampUp 2020: RampUp for Developers Recap – How LiveRamp Addresses GDPR, CCPA, and Future Data Regulations


Speakers:

  • Tara Aida – Product Manager for Privacy
  • Quinn Stearns – Software Engineer

As is evident from the title, our fourth RampUp for Developers session focused on privacy, specifically GDPR and CCPA. Since innovation typically outpaces legislation, an ethical approach to data requires safeguards, policies, and engineering that go beyond what the law requires to protect consumer privacy. GDPR and CCPA are recent examples of the data ethics challenges businesses face. In this session, attendees learned how LiveRamp addresses recent legislation and the ethical treatment of data from a development and engineering perspective.

You can watch the video below, and can read the transcript below that. You can also download the slides from the presentation (which also appear throughout the video) here.


https://player.vimeo.com/video/397294040


Quinn Stearns: Hey everyone. My name is Quinn. I am the engineering manager on our privacy engineering team at LiveRamp. I’m not super-used to public speaking, so there’s a very solid chance I’m going to get very sweaty during this talk. It’s better if everybody just ignores it. So just… Perfect.
Tara Aida: And hi, my name’s Tara. I’m product manager for privacy and data stewardship at LiveRamp. So I work very closely with Quinn.
Quinn Stearns: Perfect. Tara said that I can’t do too many jokes during this talk so I’m going to try and get as much of it out of my system as quickly as possible. I’m sorry Tara. So before we dig in, who here is an engineer or a developer? So we’ve got a few developers in the crowd and how many people here work in ad tech in some fashion? Quite a few. How many people were impacted by CCPA in their work? Okay, cool. Perfect.
Quinn Stearns: So I had a feeling it would be a lot of people. This is actually the perfect talk. Oh no, Tara, did they get the title of the slides wrong?
Tara Aida: I don’t know, Quinn. Did they?
Quinn Stearns: I’m sorry, Tara. This was supposed to be coping with CCPA-related anxiety. I don’t know what happened here. Tara had some yoga exercises that she was actually going to lead the room in.
Tara Aida: Nope. Nope.
Quinn Stearns: Okay. Sorry.
Tara Aida: Not today.
Quinn Stearns: Okay. No, no. Let’s actually talk about CCPA. So this talk really centers around our CCPA journey and a lot of this is, how did we handle, as a company, CCPA? But I think there are a lot of broad-reaching lessons about, what does CCPA mean for ad tech systems, for software systems in general, and how are we going to proceed as an industry and as developers?
Quinn Stearns: So we’re going to get into a few of these questions and walk through it. How do we get to that January 1st time of when CCPA comes into effect? So I think that the place we want to start is, what was the state of LiveRamp before all the CCPA stuff started happening?
Tara Aida: Yeah. So I’m going to talk a little bit about that. Privacy, for a long time, has been a really core principle at LiveRamp, even before we had very prescriptive regulations like GDPR and CCPA and, broadly speaking, the way we thought about it was centered around this idea of privacy by design, and there are different principles and fundamental pieces to privacy by design. But the way that I think about it is making sure that you’re thinking about privacy at every stage of the process.
Tara Aida: So not just as an afterthought at the end, but when you’re actually designing, writing your PRDs, thinking about, how can I make this, design this, in such a way that protects consumers’ data? And that’s how we’ve thought about it historically at LiveRamp.
Tara Aida: And how has privacy by design actually turned into concrete ideas at LiveRamp? You can see this in a couple of different ways. Historically, we’ve always had a distinction between what we call PII or personally identifiable information. So these are things like raw emails and phone numbers, names and postals, versus what we’ve historically considered anonymous information, which are things like cookie IDs or RampIDs.
Tara Aida: We always had this distinction between these two types of data. We kept them separated and that was a big thing that came out of our work with privacy by design. A couple of other things that came out of that was the idea of supporting opt-outs. That’s something we’ve done for a very long time, allowing consumers to make a decision about whether they want to be involved in our workflows or not.
Tara Aida: And then, the other thing that we’ve done is more around process, is we’ve made a lot of efforts to make sure that our governance teams, our legal, data ethics, security teams are involved in every single product that we build and design.
Quinn Stearns: Cool. Cool. So in order to contextualize some of those principles that we had incorporated into our company and the way we did things, we wanted to talk a little bit about, what did our process and our engineering systems look like with those principles in place before CCPA? And so, we’ll start by walking through what our main client interfaces look like, and the ensuing engineering processes, when we’re handling personal information in the pre-CCPA world.
Quinn Stearns: We’re going to start with handling one client file. And so, for those of you – I don’t know general level of familiarity with the specifics of LiveRamp’s business in the room – but what we’re often going to do is process files that contain some useful piece of segment information about people.
Quinn Stearns: So we can imagine some client coming to LiveRamp and handing us a file of all of the people who like dogs. We have Jane in this file and Jane is a dog lover. And so, they’re going to send us this piece of information and we’re going to categorize it or we’re going to pass this through our system. So the first step of that process is our ingestion pipeline. This is going to look pretty familiar to a lot of folks. It’s just a standard ETL pipeline. There’s some LiveRamp specifics in here, but yeah, what this looks like is we hand this file containing PII, personally identifiable information, to our ingestion team.
Quinn Stearns: And that’s where we turn it into LiveRamp’s concept of a person, the RampID idea. And in the old world, this is a really important step of our pipeline because this is an anonymization step. So as we walk forward through the process, we are going to see that all of the data has been anonymized and that makes a lot of guarantees about how our engineering team can handle privacy.
Quinn Stearns: So we take the data, we ingest it, and then we pass it to our data management team. And our data management team is going to handle segments of data. So they’re going to hold some collection of anonymous people and treat them alongside their segment information. So they’re going to hold data about all the cat lovers or, excuse me, dog lovers. Cool. Very different groups of people. Cool.
Quinn Stearns: So after we pass this to our data management team, we’re going to take this activation step, and historically this is going to be like a partner’s cookie space where we’re going to translate this to a targetable cookie, and now what we’re moving towards is a new world where we’re turning it into more future-looking identifiers, but really what we’re trying to do is turn this input information into this useful information for targeting people online.
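The three stages Quinn walks through (ingestion, data management, activation) can be sketched roughly as follows. This is a minimal illustration only: the function names, the salted hash standing in for a RampID, and the partner lookup table are all assumptions made for the example, not LiveRamp's actual implementation.

```python
import hashlib

# Hypothetical salt, for illustration only.
SALT = b"example-salt"


def to_pseudonymous_id(email: str) -> str:
    """Ingestion helper: map raw PII to an opaque ID (a stand-in for a RampID)."""
    normalized = email.strip().lower().encode("utf-8")
    return hashlib.sha256(SALT + normalized).hexdigest()[:16]


def ingest(rows: list[dict]) -> list[dict]:
    """Ingestion: drop the PII column, keep only the opaque ID and the segment."""
    return [
        {"person_id": to_pseudonymous_id(r["email"]), "segment": r["segment"]}
        for r in rows
    ]


def build_segments(records: list[dict]) -> dict[str, set[str]]:
    """Data management: group opaque IDs by segment name."""
    segments: dict[str, set[str]] = {}
    for rec in records:
        segments.setdefault(rec["segment"], set()).add(rec["person_id"])
    return segments


def activate(segments: dict[str, set[str]], partner_map: dict[str, str]) -> dict[str, set[str]]:
    """Activation: translate opaque IDs into a partner's identifier space."""
    return {
        seg: {partner_map[pid] for pid in ids if pid in partner_map}
        for seg, ids in segments.items()
    }
```

Note that after `ingest`, nothing downstream sees the raw email. That is the property the pre-CCPA design relied on.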
Quinn Stearns: So we’re going to take that target information, we’re going to pass it downstream and I think we can look at this diagram holistically and see what are the general parameters of the privacy problem LiveRamp is solving. And I think the big thing you see is that once the person acquires that lock in the ingestion stage, privacy and security start to look like very similar problems.
Quinn Stearns: What we really have to be is cognizant about how we’re handling that information, ensuring that we’re treating it safely. Once CCPA goes into effect, that problem is going to look very different because anonymizing data is not enough. We have to provide some new things to consumers, and so let’s talk a little bit about those.
Tara Aida: Yeah, so as Quinn mentioned, that was the old world before CCPA. CCPA introduces a whole new set of language and requirements around consumers that will impact every step in that workflow that you just saw. The way that I think about CCPA is around, what are the core rights that are given to consumers? And there are three of those.
Tara Aida: Consumers have the right to opt out, which means that they can go to a company and say, “I don’t want you to continue distributing or selling my data.” They can also go to a company and say, “I want you to delete any data you’ve collected on me up until this point.” And then, the last thing that consumers can do is they can actually go to a company and say, “I want to see what have you collected about me. I want to understand what you’re doing with that data.”
Tara Aida: This is the strongest piece of privacy legislation that’s been passed in the US. There are a lot of similarities to GDPR and, as we all know, it went into effect on January 1st.
Quinn Stearns: Cool. So yeah, we have all of these new concepts that exist in the world and we are required by law to abide by them. And so, what does that change? How are things different now? I think the big thing here to start with is that all of a sudden our engineering teams are all exposed to a direct interface with consumers.
Quinn Stearns: And I think this is a similar thing. Before, where anonymization meant that all of a sudden teams didn’t have to be as directly aware of what personal information they might be processing, that’s just not a guarantee we can make any more. So engineering teams are going to have to be able to service these requests from consumers and effect them directly in their systems.
Quinn Stearns: So alongside that, it’s this idea of the shifting data model. So before, we had our own language for defining, what does personal information mean? And PII and anonymous identifiers meant that we did very different things with data. Once data was anonymized, then we essentially treat that as a much more private class of information that didn’t need to be directly addressed on behalf of consumers.
Quinn Stearns: In a CCPA world, that’s going to change pretty dramatically where people have a right to interact directly with that data and that really changes what the systems end up looking like. Yeah. So the next piece is an obligation to support our clients. We not only have an interface with consumers now, but now an obligation to help people who were previously working with us pass on their CCPA requests and help them effect those requests. And then, finally we have an obligation for this new transparency with the consumers.
Tara Aida: So we’re just going to walk through a couple of those impacts more concretely. So this was the workflow that we looked at before. Now a consumer can come directly to LiveRamp and say, “I want you to stop using my data.” And once she makes that decision, once Jane makes that decision, we then have implications in every team in the workflow, from data management who’s actually storing her data, to activations, who’s helping us convert her RampID into the downstream partner IDs, to our own identity graph where we hold information that allows us to recognize Jane.
Tara Aida: Each of these teams needs to be able to respect that consent decision that Jane has made for every single workflow going forward. And that’s, let’s see, yup. That’s something different that we never really had to grapple with before in the same sort of way. And one thing that comes out of that is that what you see from that workflow is that we need to coordinate a set of complex requirements across multiple engineering teams.

Tara Aida: And very early on in preparing for CCPA, this is something that we recognized. And so, one of the first steps that we took was to form a dedicated engineering team, the data stewardship team, to help be the experts on CCPA and to help coordinate and work with every single engineering team that we knew was going to be impacted.
Tara Aida: It also allowed us to bring legal and data ethics closer into the process. So the data stewardship team works very closely with our legal and data ethics teams. We seek to be a group that understands both the technical systems as well as the regulations and bridge that gap between our legal teams and engineering, where you actually have to build the features we need for compliance.
Quinn Stearns: I think that particular piece is really worth calling out as one of the really key levers that made us more effective in handling this regulation. Having those brokers who are experts in both the regulatory side of things as well as the technical side of things can really make the conversation between those different stakeholders in your CCPA compliance strategy much clearer. So I think we were able to be much more effective once we had that framework in place.
Quinn Stearns: So we talked a little bit about how there’s this organizational challenge that stems from this opt-out concept. It’s impacting all of these teams very directly. I think to highlight some of the technical challenges, it uniquely makes sense to talk about deletion requests, which are a little different than opt-out in their nature, where opt-out or a do not sell request requires you to not pass on this information. A deletion request actually requires you to remove the data from where it rests.
Quinn Stearns: So we can talk a little bit about… Is the slide not advancing?
Tara Aida: Yeah.
Quinn Stearns: Oh no. Oh, it’s… Oh, there we go. Hold on. We’re going to get this slide figured out in a second here.
Tara Aida: Is it advancing?
Quinn Stearns: I’m trying to think about a joke, but I really don’t have much.
Tara Aida: It’s not moving.
Quinn Stearns: Tara.
Tara Aida: It’s not moving.
Quinn Stearns: Okay, so we can, while we figured this out, talk a little bit about the nature of this challenge. So this idea of a deletion request, it shifts the way we have to handle this data because historically we’re processing all of this data in aggregate, and the parameters of these systems are these huge batch requests.
Quinn Stearns: And so, what is new with a deletion request is there’s an individual consumer asking basically for a random write to a massive dataset. And that’s not something we historically really supported. And so, this is going to involve a dramatic shift in lots of our engineering systems.
Quinn Stearns: And to highlight a little bit about, and I would love to see the numbers right now so I don’t misquote anything, but so some of these problems are going to just be scale problems where we have this massive indexed data set that we need to completely rewrite as soon as we remove a member of it. That’s going to be a cost problem and it’s going to be a technical scale challenge.
Quinn Stearns: One of those big areas that, and you see this… Oh, here we go. We made it along. So yeah. So one of these places that we see these costs occurring is partly in this online identity realm, where we’re holding all of this targetable information about consumers. And you can see, so two petabytes is roughly the scale of the data stores we’re talking about that we have to delete from. That’s a massive cost to scan and find the records in question. And it’s also a massive cost to produce the new datasets.
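One common way to soften the cost of individual deletions against a batch-oriented store is to accumulate requests and rewrite only the partitions that actually contain an affected record. The sketch below illustrates that idea under stated assumptions: the partition count, the CRC-based partitioning, and the function names are hypothetical, and the talk does not say this is the approach LiveRamp took.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical; a real store would have far more


def partition_of(record_id: str) -> int:
    """Deterministically assign a record ID to a partition."""
    return zlib.crc32(record_id.encode("utf-8")) % NUM_PARTITIONS


def apply_deletions(partitions: dict[int, list[dict]], pending: set[str]) -> list[int]:
    """Remove all pending IDs, rewriting only partitions that hold one of them.

    Returns the sorted list of partition numbers that were rewritten.
    """
    ids_by_partition: dict[int, set[str]] = defaultdict(set)
    for record_id in pending:
        ids_by_partition[partition_of(record_id)].add(record_id)

    rewritten = []
    for part, ids in ids_by_partition.items():
        old = partitions.get(part, [])
        new = [rec for rec in old if rec["id"] not in ids]
        if len(new) != len(old):  # skip the rewrite if nothing matched
            partitions[part] = new
            rewritten.append(part)
    return sorted(rewritten)
```

Batching many requests into one `pending` set means each affected partition is scanned and rewritten once per batch rather than once per consumer request.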
Quinn Stearns: So there’s also the nature of this challenge in distributed systems where historically the way we would effect an opt-out is by tombstoning a particular record so multiple writers don’t conflict on an update and we are able to effectively suppress a particular identifier. That flow works fine. But this flow for deleting using tombstones is quite a bit more complicated for us. It requires us to wait for an additional step.
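The difference between suppressing with a tombstone and actually deleting can be shown with a small in-memory sketch. The class and method names are hypothetical, and modeling the "additional step" as a compaction pass is an assumption for illustration; the talk does not specify what that step is.

```python
class TombstoneStore:
    """Toy key-value store where opt-outs are tombstones and deletes need compaction."""

    def __init__(self) -> None:
        self._records: dict[str, dict] = {}
        self._tombstones: set[str] = set()

    def put(self, record_id: str, payload: dict) -> None:
        self._records[record_id] = payload

    def tombstone(self, record_id: str) -> None:
        """Mark an ID as suppressed. Adding to a set is idempotent, so
        several writers can issue the same tombstone without conflicting."""
        self._tombstones.add(record_id)

    def get(self, record_id: str):
        """Reads treat tombstoned records as absent (enough for an opt-out)."""
        if record_id in self._tombstones:
            return None
        return self._records.get(record_id)

    def physically_present(self, record_id: str) -> bool:
        """True while the underlying data still rests in storage."""
        return record_id in self._records

    def compact(self) -> int:
        """Extra step needed for deletion: drop tombstoned data from storage.
        Tombstones are kept so late-arriving writes stay suppressed."""
        removed = 0
        for record_id in self._tombstones:
            if self._records.pop(record_id, None) is not None:
                removed += 1
        return removed
```

After `tombstone`, reads already hide the record, which satisfies an opt-out. A deletion request is only satisfied once `compact` has run and `physically_present` returns `False`.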
Quinn Stearns: And so, there’s really, really complex technical work that we both have to prioritize from a functionality standpoint and also pay for. So I think that this marketing data is useful en masse, but now all of a sudden we have to address these records individually. And I think that changes quite substantially the way teams have to work, particularly at LiveRamp, but I think in general as well.
Tara Aida: Okay. And then, the last thing that we just wanted to talk about is there was a lot of coordination that we needed to do with partners and clients. So not only did we need to coordinate with everybody who’s sending data into LiveRamp so that they knew how to send us requests, whether they’re opt-outs, or deletion requests, or access requests. But also there’s this notion under CCPA now of not just applying an opt-out within your own company system, but also passing that downstream and making sure that the opt-out follows the flow of data.
Tara Aida: And so, a consumer really does stop having their data leveraged. So that was another thing that we did a lot of coordination on, working with our downstream platforms to figure out, how do we send a notice of consent? I don’t know.
Quinn Stearns: Is it going?
Tara Aida: Yeah.
Quinn Stearns: It’s kind of going. Cool. So we’re going to just take one last pass at this. Now that we’ve put together some of the parameters, this coordination challenge and these new technical concepts, and we want to walk through how do these requests pass through our system and maybe highlight some of the challenges where they actually appear for us.
Quinn Stearns: So the first stage in an access request in particular is what we’re going to talk about. The first stage of this is just collecting information from consumers and validating that information belongs to the identity that it claims to.
Quinn Stearns: And so yeah, there’s this form there, probably everybody has one, where you can submit your personal information. And this is the stage where we’ll first take the piece of information that we need to effect an opt-out with.
Quinn Stearns: Well it’s going to get a little tricky after this and the reason for that is essentially that LiveRamp has a concept of identity that doesn’t necessarily come directly from the consumer. So the first challenge of CCPA for us, and I think marketing companies in general, is that we have to reconcile, what information does a consumer pass us that identifies them versus what information actually belongs to them in our systems?
Quinn Stearns: So we’re going to make an attempt to convert this consumer data to RampIDs and that’s our internal anonymous representation. And there’s just a lot of questions that come up here about, what constitutes ownership over a particular RampID, and that kind of thing? That ends up being really tricky.
Quinn Stearns: And so, we do have to figure out lots of ways to authenticate and prove ownership over the personal information that triggers this request, which I think is pretty generically tricky.
Tara Aida: Yeah. And one thing to note is that in our historical worldview, we only ever had offline data coming in and anonymous IDs going out. When you think about returning and generating access requests, we’re actually returning very sensitive, identifiable information back to the consumer. So the level of authentication and accuracy needs to be high, very high.
Quinn Stearns: Okay. So once we’ve received the IDLs that represent these people in our systems, we need to go convert these people into concepts that other teams can understand. So lots of teams store state at LiveRamp. Some of them store them in terms of the RampIDs themselves but other teams might store state on the basis of, say, a partner’s online identifier, all this targetable information, or maybe just a different… We have lots of different encodings of RampIDs.
Quinn Stearns: So we need to translate this input information into information that’s useful to communicate with other teams. This is a big data inventory conversation that we have to have with all these teams. So maybe with the data access team that holds the segment information, we maybe are going to be converting this to online identifiers, and some particular partner’s cookies and those are the forms of data that they hold state on. This is going to allow them to execute their access request.
Quinn Stearns: So this is the next point where we pass this data on to other engineering teams. In here there’s several examples of who we might be passing it onto, but data science. We have data science teams that aggregate all this information, in terms of identity and links and online identifiers. Pixel serving, which is collecting the information in the first place and LiveRamp TV, which is using it for a specific business purpose.
Quinn Stearns: These all might be different forms of identifiers that we’re passing on. And I think yeah, again, this is something you’re going to see in a lot of places where a lot of teams have technical reasons for storing information on the basis of different identifiers. So coming up with that shared language of, how do we talk about what represents a person in terms of lots of different identifiers is a pretty critical part of this process.
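The translation step described above, where one person's RampIDs are converted into whatever identifier each team keys its state on, might look something like this. The team names, lookup tables, and translator functions are all hypothetical placeholders chosen for the example.

```python
from typing import Callable

# Hypothetical lookup tables from an internal ID to other identifier spaces.
RAMP_TO_COOKIES = {"ramp-1": ["cookie-a", "cookie-b"]}
RAMP_TO_DEVICES = {"ramp-1": ["device-x"]}


def as_ramp_ids(ramp_ids: set[str]) -> list[str]:
    """Some teams key their state directly on the internal ID."""
    return sorted(ramp_ids)


def as_cookies(ramp_ids: set[str]) -> list[str]:
    """Other teams key their state on a partner's cookie IDs."""
    return sorted(c for r in ramp_ids for c in RAMP_TO_COOKIES.get(r, []))


def as_devices(ramp_ids: set[str]) -> list[str]:
    """And others on device identifiers."""
    return sorted(d for r in ramp_ids for d in RAMP_TO_DEVICES.get(r, []))


# The "data inventory": which identifier form each team understands.
TEAM_TRANSLATORS: dict[str, Callable[[set[str]], list[str]]] = {
    "identity_graph": as_ramp_ids,
    "data_management": as_cookies,
    "tv": as_devices,
}


def fan_out(ramp_ids: set[str]) -> dict[str, list[str]]:
    """Build one request per team, expressed in that team's identifier space."""
    return {team: translate(ramp_ids) for team, translate in TEAM_TRANSLATORS.items()}
```

Keeping the mapping in one table makes the shared language explicit: adding a team means registering one translator rather than teaching every team about every identifier.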
Quinn Stearns: So once we’ve received information back from these teams that we’ve passed these identifiers to, we need to consolidate it and send it back to consumer. And this is yet another point of great technical challenge because, again, we’re converting data in the language of scale to data in the language of humans.
Quinn Stearns: So a lot of this information is not really going to make sense to a person at first blush. If you give somebody a dump of all the cookies or whatever that they might own, that’s not really particularly meaningful. So contextualizing this information ends up becoming a big challenge, and what we seek to do is talk about at a high level, what do each of these data elements really represent and how can you understand them and make a consent decision on the basis of them?
Tara Aida: Right. And so, I think there are two pieces to this. One is getting the data back from each engineering team and packaging it and sending it to the consumer. Another really big part of this was working closely with both data ethics and engineering to come up with explanatory docs that basically try to explain all of the concepts, our products, our use cases, at a high level. And this is somewhere where we’re going beyond what’s the minimum requirements for the law, but something that we think is in the spirit of the law.
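As a rough illustration of turning per-team responses into something a consumer can read, the sketch below attaches a plain-language description to each returned data element. The descriptions, field names, and report format are invented for the example and are not taken from LiveRamp's actual access reports.

```python
# Hypothetical plain-language descriptions, keyed by data element type.
DESCRIPTIONS = {
    "cookie": "A browser identifier a partner can use to show you ads.",
    "segment": "An interest group a client associated with you.",
}


def build_report(team_responses: dict[str, list[dict]]) -> str:
    """Merge responses from every team into one annotated, line-per-item report."""
    lines = []
    for team in sorted(team_responses):
        for item in team_responses[team]:
            description = DESCRIPTIONS.get(item["type"], "No description available.")
            lines.append(
                f"{item['value']} ({item['type']}, held by {team}): {description}"
            )
    return "\n".join(lines)
```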
Tara Aida: Cool. And then, we just wanted to end with a looking forward slide thinking about what’s next. In many ways, January 1 was this big deadline for our team that we were working towards over the past couple of years, but in a lot of ways it’s only just started now.
Tara Aida: So a couple of the things that we’re looking forward to: One, we’re looking into the IAB consent string proposal that’s been put out. I think that one of the things that we’ve struggled with and worked on is, how do we accept consent decisions, opt-outs and deletions from our partners and how do we pass that down?
Tara Aida: Right now it’s a really complicated process because everyone’s pushing and pulling requests in their own way and it would be great if we could have some standardization here.
Tara Aida: Second, we’re looking at the finalized regulations from the AG (attorney general), which still haven’t been pushed out. We’ve seen the proposal. There were actually a lot of material changes in that. So that’s something that we’re working towards as well.
Tara Aida: And then, the last thing is just keeping an eye on all of the different legislation that’s being written right now and proposed, both at a state level and a federal level. There’s a CCPA 2.0 ballot initiative that’s out right now. So that might be passed. And there’s talks of federal legislation as well.
Quinn Stearns: Yeah, I think that there’s also this more generic piece of this idea that it changes the way our engineering teams have to do work pretty fundamentally. We thought that we were going to have all these great stable workflows and, of course, they require active maintenance and the way we design things is always going to have to account for this regulation.
Quinn Stearns: So where we could leave out pieces of the problem, how do we individually address records for consumers? That’s just not true anymore. And so, a big part of what we’re trying to do is feel our way through that problem and understand, how do we help our engineering teams come to a decision and write effective systems that account for this new regulation and many of the regulations that have yet to come. So this is another big thing that we’re going to be working for next. That’s the rest of the CCPA journey.
Tara Aida: Cool. And I think we have about five minutes left. If there’s any Q&A, we’re happy to talk through it.
Audience Question: So I’m asking you guys to have a whole bunch of identifiers for data, and then individual. What do you do when they conflict on the information about opt-out and opt-in, and does that happen all the time? Is that simple to resolve?
Tara Aida: Yeah. So the way that we handle that, we don’t typically have a lot of opt-in information. It’s more from the opt-out side, and the way we handle it there is when a consumer comes in and makes a request, we’ll basically see, what is the cloud of IDs that we could potentially connect to that person if they were to come through one of our standard marketing pipelines? And we take all of those IDs and we put them on a black list. And that’s what functions as the opt-out today.
Quinn Stearns: By and large, also you could say we just err on the side of over-suppression because of the sparsity of opt-outs in our overall data, it’s not actually all that common that we’re going to run into a conflict like this. But when we do, it’s just opted out and that tends to be a good enough solution for us.
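The "cloud of IDs" approach can be sketched as a graph traversal: starting from the identifier the consumer supplied, collect every ID reachable through known links and suppress all of them. The graph representation and function names below are assumptions for illustration, not a description of LiveRamp's identity graph.

```python
from collections import deque


def id_cloud(graph: dict[str, set[str]], start: str) -> set[str]:
    """Breadth-first search for every identifier linked to the starting one."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def opt_out(graph: dict[str, set[str]], blocklist: set[str], start: str) -> None:
    """Err on the side of over-suppression: block the whole cloud."""
    blocklist |= id_cloud(graph, start)


def is_suppressed(blocklist: set[str], identifier: str) -> bool:
    return identifier in blocklist
```

Because any ID in the cloud is suppressed regardless of other signals, a conflict between sources resolves to "opted out", matching the behavior described above.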
Audience Question: So I was talking about opt-outs where someone’s coming to your website or the NAI and actively opting out. What happens when you see a TCF string that says that this user’s opted out? Do you go back and propagate that to their person-based identifier?
Quinn Stearns: So in particular, with respect to propagating, we only propagate requests that we receive directly through our CCPA interfaces at the moment. And so, IAB consent string, we don’t yet have the capabilities in place to fully propagate an IAB consent string-style consent. We’re really acting on the level of, has this person executed a do not sell request or not? And that’s something we want to move into, how do we propagate a more complete idea of consent?
Audience Question: How do you deal with a consumer opting out from one of your customers but not from others?
Tara Aida: Yeah, so that was one of the big changes that we had to make for CCPA because, as I mentioned, we’ve been supporting opt-outs for a long time, but that has been just a LiveRamp opt-out. Whereas in this world we can have individual clients forwarding us opt-outs. So the way that we handled that is we now support basically client-specific opt-out lists.
Tara Aida: So if Macy’s sends us an opt-out that’s going to be referenced and applied to every Macy’s delivery, but it won’t be applied to, say, a Nordstrom delivery because that consumer has opted out with Macy’s and not Nordstrom. So that’s how we handle it.
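Client-specific opt-out lists alongside a company-wide list could be modeled as below. The class name, the reserved key for the global list, and the `may_deliver` check are hypothetical; the sketch only shows the scoping rule Tara describes.

```python
from typing import Optional

GLOBAL = "*"  # hypothetical reserved key for the company-wide opt-out list


class OptOutRegistry:
    """Tracks opt-outs that apply everywhere and opt-outs scoped to one client."""

    def __init__(self) -> None:
        self._lists: dict[str, set[str]] = {}

    def add(self, person_id: str, client: Optional[str] = None) -> None:
        """Record an opt-out. With no client given, it applies to every delivery."""
        key = GLOBAL if client is None else client
        self._lists.setdefault(key, set()).add(person_id)

    def may_deliver(self, person_id: str, client: str) -> bool:
        """A delivery is allowed only if neither the global list nor the
        client's own list contains the person."""
        if person_id in self._lists.get(GLOBAL, set()):
            return False
        return person_id not in self._lists.get(client, set())
```

With this shape, an opt-out forwarded by one client blocks only that client's deliveries, while an opt-out made directly with the company blocks deliveries for all clients.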
Audience Question: How are you thinking about managing consent between, there’s device level, there’s person level, there’s household level. What approach are you taking?
Tara Aida: Yeah, basically all of our opt-outs are on a person level basis. So our conservative interpretation of CCPA was if we have a device that’s opted out and we have it connected to other devices, we’re not really respecting the opt-out unless we opt out those connected devices as well.
Tara Aida: And so, that’s why we’ve taken a people-based approach to all our opt-outs, and we do, though, continue to offer two different types of opt-outs. So our recommendation is that a consumer does a person-based one, but if somebody doesn’t want to provide personal information like their name or postal, we do still allow them to do, say, a device-based opt-out. We just explain that there’s a difference in the depth of the opt-out depending on what they choose to do. Yeah.
Audience Question: And you’re not doing the households opt out?
Tara Aida: Yeah. So we looked into household requests a decent amount before CCPA went into effect, and what we saw is that there was a lot of guidance and complexity around how household requests should only be applied to truly aggregate information that’s only associated to the household, not individuals in the household. Because the problem that you start to face is if somebody makes a household request, then they can get back information on, say, their roommates or ex-family members when they’re not really supposed to have access to that.
Tara Aida: So the way that we’ve handled that is if a consumer represents a household and wants to get information on the household, what they do is they make individual requests for every person in the household, and then we verify each person individually. And that’s how you get household information from LiveRamp because it always connects back to an IDL in our systems. Yeah.
Audience Question: Just one thing I can’t remember. So you accept direct opt-outs, but you guys don’t work with NAI or DAA?
Tara Aida: We do. We do. Yeah. So I don’t think we’re in the NAI. I think we’re in the ANA and the DAA AppChoices. Yeah.
Quinn Stearns: It’s specifically requests. We’re propagating requests at as granular a level as the TCF string allows. Not all of the partners that we work with support the TCF string. So it just ends up being a massive problem. How do we migrate this whole thing? So right now the granularity that we can express is, this person has opted out. It’s not broader than that.
Randall Grilli: Cool. Thanks guys. We’re going to cut. If you guys want to talk to them, they’ll be round here for the next few minutes. And then, there is, again, happy hour from 4:30 to 6:30 where you guys will be, and you can ask them more questions there. We’re going to break for nine minutes and we’ll be back at 11:40.

 

 

Interested in more content from RampUp?

Clicking on the links below (to be posted and updated on an ongoing basis) will take you to the individual posts for each of the sessions where you can watch videos, read the full transcript of each session, as well as download the slides presented.

RampUp for Developers’ inaugural run was a great success and was well attended by a variety of attendees. The conference tracks spurred many interactions and open discussions, and we are looking forward to making a greater impact with engineers and developers at future events, including during our RampUp on the Road series (which takes place throughout the year virtually and at a variety of locations), as well as during next year’s RampUp 2021 in San Francisco. If you are interested in more information or would like to get involved as a sponsor or speaker at a future event, please reach out to Randall Grilli, Tech Evangelist at LiveRamp, by email: [email protected].