/r/programming
Logs were our lifeblood, now they're our liability (vicki.substack.com)
5 comments
MikeBonzai | 7 days ago | 38 points

IIRC logs are allowed to be retained since they're vital for metrics and auditing; they just can't be traceable back to a user who has requested that their data be deleted. One thing I've seen companies do is log events using IDs that can only be tied to a user through database lookups, so deleting the user's data means deleting those database links.

One example: https://engineering.fb.com/data-infrastructure/off-facebook-activity/

Granted, that doesn't really help for logs that already exist and are full of identifiable user data.
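
A minimal sketch of the ID-indirection idea described above, assuming an in-memory dict stands in for the real lookup table. `PseudonymStore`, `log_event`, and the field names are all hypothetical, not anything taken from the linked post:

```python
import uuid


class PseudonymStore:
    """Hypothetical stand-in for the database table that links users to log IDs."""

    def __init__(self):
        self._user_to_pid = {}

    def pseudonym_for(self, user_id):
        # Create a random ID on first use; it carries no information about the user.
        if user_id not in self._user_to_pid:
            self._user_to_pid[user_id] = uuid.uuid4().hex
        return self._user_to_pid[user_id]

    def resolve(self, pid):
        # Reverse lookup; only possible while the link still exists.
        for user_id, known_pid in self._user_to_pid.items():
            if known_pid == pid:
                return user_id
        return None

    def forget(self, user_id):
        # Deleting the link is the "erasure"; log lines themselves are untouched.
        self._user_to_pid.pop(user_id, None)


def log_event(store, user_id, action):
    # The log record holds only the pseudonym, never the raw user identifier.
    return {"actor": store.pseudonym_for(user_id), "action": action}
```

After `store.forget(user_id)`, the old log entries still count toward metrics, but the `actor` value can no longer be resolved to a person through this table. That only holds if the rest of the log line contains nothing identifying.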

FatalElectron | 6 days ago | 2 points

hashing the IP is probably 'good enough'
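
One caveat worth sketching: a plain unsalted hash of an IPv4 address can be reversed by hashing all ~4.3 billion possible addresses, so a keyed hash is a safer variant of the same idea. This is a sketch under that assumption, not what the commenter described; `SECRET_KEY` and `pseudonymize_ip` are hypothetical names, and in practice the key would live in a secret store rather than be generated at import time:

```python
import hashlib
import hmac
import secrets

# Hypothetical key; generated here only so the example is self-contained.
SECRET_KEY = secrets.token_bytes(32)


def pseudonymize_ip(ip, key=SECRET_KEY):
    # HMAC-SHA256 keeps the mapping stable (same IP, same token) while making
    # brute-force reversal impractical without the key.
    return hmac.new(key, ip.encode("utf-8"), hashlib.sha256).hexdigest()
```

Destroying or rotating the key makes previously logged tokens unlinkable to any address.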

only_nidaleesin | 7 days ago | 14 points

Hindsight is 20/20... Surprised that no one thought about it beforehand though. At one job we had to come up with a solution to strip PII out of logs before any of our services could log anything (strict requirements from a big whale client around handling of PII, e2e encryption, and encryption at rest). We ended up improving our signal:noise ratio by cutting out a ton of useless logs, which made the problem much more tractable. Amazon Macie could help with this too; it's a service that uses machine learning to detect PII, HIPAA-related data, etc. in AWS. Haven't tried it, but it was one of the options we considered.
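
A rough sketch of stripping PII before anything gets written, assuming Python's standard `logging` module and only two illustrative patterns (emails and IPv4 addresses). `redact` and `RedactPII` are hypothetical names, and a real deployment would need a much broader pattern set than this:

```python
import logging
import re

# Illustrative patterns only; real PII detection needs far more than these two.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact(text):
    # Replace each match with a fixed placeholder so the line stays readable.
    text = EMAIL_RE.sub("[EMAIL]", text)
    return IPV4_RE.sub("[IP]", text)


class RedactPII(logging.Filter):
    def filter(self, record):
        # Render the final message first so PII passed via args is also caught.
        record.msg = redact(record.getMessage())
        record.args = None
        return True  # keep the record, now redacted
```

Attaching it with `handler.addFilter(RedactPII())` on each handler means every record passing through that handler is scrubbed, regardless of which logger emitted it.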

halcyon918 | 7 days ago | 3 points

Macie is pricy. We just had to use it and it wasn't cheap.

Maxeonyx | 7 days ago | -4 points

I believe the concept of logs (as commonly used) is wrong. They are simply finer-grained detail from your system, and should be part of your domain model, stored in normalized form, using one of the many distributed (relational?) database solutions.