Highlights
- Google Cloud’s newly launched tool lets users set the frequency and conditions for data profiling.
- The system automatically scans each column for personally identifiable information (PII) to ensure it is not unintentionally exposed.
Large enterprises gather all sorts of sensitive and confidential data. Often, it is personally identifiable information (PII) about their clients and workforce, or other data that only a few users should access. But as the volume of data these organizations collect grows, manual data discovery and classification can no longer scale. With Automatic Data Loss Prevention (DLP), Google has released a tool that helps its BigQuery users discover and classify sensitive data in their data warehouse and set access policies based on those discoveries. Automatic DLP was previously in public preview and is now generally available.
“One of the challenges that we see a lot of our customers facing is around understanding their data to protect it better, preserve the privacy of PII for their customers, meet compliance or just better govern their data. We really feel that one of the challenges they face is just that initial awareness or visibility into their data,” Scott Ellis, Google Cloud’s product manager for this service, said in an interview.
Ellis observed that the manual workflows many enterprises rely on cannot cope with the scale of data now coming in. Hence, an automated system digs in and evaluates every column for PII, for instance, to make sure this sensitive data is not unintentionally exposed.
Many companies also gather large amounts of unstructured data, which poses a challenge of its own. “One of the biggest challenges we’ve heard from customers is around: when they have a column of email addresses, it’s good to know. Once you know it, you can treat it like that. But when you have unstructured data, it’s a little bit of a different challenge. You might have a note field. It’s super valuable. But occasionally, somebody puts something sensitive in there. Treating those as a little bit different. Sometimes, the remediation is different for those,” Ellis explained.
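To make that distinction concrete, here is a minimal Python sketch of column-level scanning. It does not show how Automatic DLP works internally. The regular expression, the `scan_columns` function, and the sample table are all hypothetical, chosen only to illustrate why a dedicated email column and a free-text note field call for different treatment.

```python
import re

# Hypothetical, deliberately simple email pattern; real detectors are far more thorough.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scan_columns(table):
    """Return, per column, the fraction of values containing an email-like string.

    `table` maps column names to lists of string values (an assumed toy layout).
    """
    report = {}
    for column, values in table.items():
        hits = sum(1 for value in values if EMAIL_RE.search(value))
        report[column] = hits / len(values) if values else 0.0
    return report


# Made-up sample data: a structured email column and a free-text note column.
sample_table = {
    "email": ["ana@example.com", "bo@example.org", "cy@example.net", "di@example.com"],
    "note": ["called twice", "prefers mail", "reach me at cy@example.net", "no answer"],
}

report = scan_columns(sample_table)
print(report)
```

In this toy data every value in the email column matches, while only one note in four does. That is the kind of occasional leak Ellis describes, and it may warrant a different remediation than a column that is sensitive throughout.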
To make it easier to get started with Automatic DLP, the team developed a new dashboard template for Google’s Data Studio that gives users a detailed summary and a better graphical investigation tool. Users can also dig into their data in the Google Cloud Console, though that is not a particularly user-friendly experience. They can investigate the data in explorer or other BI tools as well, but the team wanted to give users a seamless access point for working with their data that did not require a lot of self-learning.
With this launch, Google also gives users new tools to set the frequency and conditions for data profiling. When the service first launched, Google’s team set the defaults, but conversations with customers quickly made clear that there were many use cases where the profiler needed to run at different intervals. If someone edits a table’s schema, for instance, one enterprise may want the table profiled immediately, while another may want to wait a few days for it to populate with new data.
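The trade-off between those two enterprises can be sketched as a small scheduling rule. The `ReprofilePolicy` class, its field, and `next_profile_time` are hypothetical names used for illustration only and do not reflect the actual Automatic DLP configuration. The timestamp is arbitrary.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ReprofilePolicy:
    # Hypothetical setting: how long to wait after a schema change before profiling again.
    delay_after_schema_change: timedelta


def next_profile_time(policy, schema_changed_at):
    """Return the earliest time the table should be profiled again."""
    return schema_changed_at + policy.delay_after_schema_change


# Arbitrary example timestamp for a schema edit.
changed = datetime(2024, 1, 1, 9, 0)

# One enterprise profiles right away; another waits for new data to arrive.
immediate = ReprofilePolicy(delay_after_schema_change=timedelta(0))
patient = ReprofilePolicy(delay_after_schema_change=timedelta(days=3))

print(next_profile_time(immediate, changed))  # same moment as the schema change
print(next_profile_time(patient, changed))    # three days after the schema change
```

Expressing the condition as a per-policy delay keeps the two behaviors in one code path: a zero delay means immediate profiling, and any longer delay gives the table time to fill.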
Another new feature is an integration with Chronicle, Google Cloud’s security analytics service. It automatically syncs risk scores for each table with Chronicle, and the team promises to develop additional integrations over time.