Robert's blog
Robert Važan

Spam at the core of our security problem

Is it possible to stop automated attacks without data inspection? How can the machine help you with your data without seeing the data?

Spam is not commonly considered when dealing with security, which is generally reduced to encryption, digital signatures, and uptime in the face of DDoS. Yet spam is key here. Email and chat servers function as electronic secretaries. They weed out spam so that we don't have to. End-to-end encryption blinds these electronic secretaries, which results in encrypted systems being overrun by spam, which causes people to abandon them and revert to unencrypted alternatives.

The NSA or another adversary hell-bent on gaining access to private communication wouldn't hesitate to flood any emerging encrypted communication platform with spam in order to discourage people from using it. If they cannot control it, they will destroy it. Spam is no longer an annoyance. It's a deliberate attack on the network.

Of course, the NSA and the like can just as well shut down service operators or disable the service at the network level, but that attracts too much attention and identifies the source of the attack, which might just spur more interest in encrypted communication. They need something more subtle that can be blamed on the operator of the network. Spam and other harassment do the job without revealing the attacker's identity or intentions.

If I am to recommend end-to-end encrypted communication software to people around me, I have to make sure these systems won't bother users with tons of spam. But how do these systems implement filtering without seeing what's being sent? Is it even possible?

Encrypted communication systems generally do see at least metadata, i.e. who is talking to whom and when. While P2P networking makes part of this communication invisible, it cannot hide the metadata from the low-level network and anyone who has access to it. Peers might just as well report the metadata voluntarily to a central authority in order to get spam filtering benefits without sacrificing much of their privacy.

Assuming that the service sees the metadata at least temporarily and clients voluntarily report user-marked spam, the service can then employ a reputation system to turn these pairwise reputation markers into global reputation markers. There are known attacks on reputation systems, but countermeasures are available in every case. The general idea of a reputation system is sound.
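As a rough illustration of how pairwise spam markers could be folded into one global score, here is a minimal sketch. It assumes the service sees sender and recipient metadata for each message and receives user-marked spam reports. The class name ReputationTracker, the smoothing prior, and the rule that each reporter counts once per sender are assumptions made for the example, not a description of any deployed system.

```python
from collections import defaultdict


class ReputationTracker:
    """Toy global reputation built from pairwise spam reports (hypothetical)."""

    def __init__(self, prior_good=1.0, prior_bad=1.0):
        # Smoothing prior so that a sender with no history scores 0.5.
        self.prior_good = prior_good
        self.prior_bad = prior_bad
        self.recipients = defaultdict(set)  # sender -> recipients seen in metadata
        self.reporters = defaultdict(set)   # sender -> recipients who marked spam

    def record_message(self, sender, recipient):
        self.recipients[sender].add(recipient)

    def report_spam(self, sender, reporter):
        # Accept reports only from users the sender actually contacted.
        # Using a set means each reporter counts once per sender.
        if reporter in self.recipients[sender]:
            self.reporters[sender].add(reporter)

    def score(self, sender):
        """Return a score in (0, 1); higher means better reputation."""
        bad = len(self.reporters[sender])
        good = len(self.recipients[sender]) - bad
        total = good + bad + self.prior_good + self.prior_bad
        return (good + self.prior_good) / total
```

Ignoring reports from users who never received anything from the sender is one simple countermeasure against fabricated reports. A real system would also need to weigh reporters by their own reputation.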

The only additional condition to make it work is that user identity is stable. No throwaway accounts. Some recent chat systems enforce stability of user identity by identifying users with their phone number. Phone numbers are relatively scarce and hard to change, which leaves spammers stuck with a few numbers that quickly accumulate a bad reputation.

Email, however, has no such scarce identity source. Reputation systems can nevertheless track the reputation of multiple identities at the same time, possibly arranged in a hierarchy, e.g. domain owner, domain name, sender IP, sender account. The IP address itself can be split into a hierarchy by considering all the subnets that the address belongs to.
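The identity hierarchy can be sketched with Python's standard ipaddress module. The function names, the key format, and the chosen prefix lengths are assumptions for illustration. The domain owner level is left out because looking it up needs registrant data that this sketch does not model.

```python
import ipaddress


def enclosing_subnets(ip, prefix_lengths=None):
    """Return the networks containing `ip`, from broadest to narrowest."""
    addr = ipaddress.ip_address(ip)
    if prefix_lengths is None:
        # Assumed granularity; a real system could track every prefix length.
        if addr.version == 4:
            prefix_lengths = (8, 16, 24, 32)
        else:
            prefix_lengths = (32, 48, 64, 128)
    return [
        # strict=False masks the host bits instead of raising an error.
        str(ipaddress.ip_network(f"{addr}/{length}", strict=False))
        for length in prefix_lengths
    ]


def sender_identities(domain, account, ip):
    """List reputation keys for one email sender, broadest first."""
    keys = [f"domain:{domain}"]
    keys += [f"net:{net}" for net in enclosing_subnets(ip)]
    keys.append(f"account:{account}@{domain}")
    return keys
```

For example, sender_identities("example.com", "bob", "192.0.2.77") yields keys for the domain, for the networks 192.0.0.0/8, 192.0.0.0/16, 192.0.2.0/24 and 192.0.2.77/32, and for the account. Each key can then carry its own reputation.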

This permits assignment of a semi-stable set of identities to email senders that is good enough to be used in a reputation system. The reputation system nevertheless has to be adapted for such use, including tentative ratings for new identities that permit rate-limiting of traffic until the sender obtains a better reputation.
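One way to picture tentative ratings is a small hourly allowance for unknown identities that grows with reputation. The sketch below assumes a reputation score between 0 and 1. All thresholds, limits, and names (hourly_allowance, RateLimiter) are made-up illustration values.

```python
from collections import defaultdict, deque


def hourly_allowance(score, known):
    """Map a reputation score in [0, 1] to messages allowed per hour."""
    if not known:
        return 5      # tentative rating for a brand-new identity
    if score < 0.3:
        return 0      # bad reputation: block outright
    if score < 0.7:
        return 20     # mixed reputation: keep traffic low
    return 1000       # good reputation: effectively unrestricted


class RateLimiter:
    """Sliding one-hour window per sender identity."""

    WINDOW_SECONDS = 3600

    def __init__(self):
        # identity -> timestamps of accepted messages, oldest first
        self.sent = defaultdict(deque)

    def allow(self, identity, limit, now):
        """Accept the message and return True if `identity` is under `limit`."""
        window = self.sent[identity]
        # Drop timestamps that have fallen out of the window.
        while window and now - window[0] >= self.WINDOW_SECONDS:
            window.popleft()
        if len(window) >= limit:
            return False
        window.append(now)
        return True
```

The caller passes the current time as `now` (for example from time.time()), which keeps the limiter easy to test. With a hierarchy of identities, a message would be accepted only if every identity in the hierarchy is within its allowance.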

One tricky thing about reputation systems is that they require a global authority or a small set of such authorities to work effectively. This is the normal way of doing things in modern communication apps that have a single vendor. Email and some open chat protocols are federated though, requiring some kind of more-or-less centralized information sharing among all the servers.

There has been a proposal to charge people for sending email, which might discourage spammers. This is IMO unlikely to work given the amount of spam SMS people receive these days. And NSA-style adversaries won't mind paying the bill, especially while the network is still small.

Simple techniques for detection of spam based on behavioral or technological differences between spammers and legitimate users won't work, because both commercial spammers and attackers can imitate user behavior and implement communication protocols correctly.

Manually maintained blacklists and whitelists are inferior to automated reputation systems. There are, however, honeypot techniques that can populate blacklists efficiently. These are useful as a supplemental anti-spam technique.
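The honeypot idea can be sketched in a few lines. It assumes a set of trap addresses that no legitimate user ever writes to, so any sender that reaches one is treated as a spammer. The class name and the example addresses are hypothetical.

```python
class HoneypotBlacklist:
    """Blacklist populated automatically by deliveries to trap addresses."""

    def __init__(self, trap_addresses):
        self.traps = set(trap_addresses)
        self.blacklist = set()

    def observe(self, sender, recipient):
        """Record one delivery attempt; blacklist senders that hit a trap."""
        if recipient in self.traps:
            self.blacklist.add(sender)

    def is_blacklisted(self, sender):
        return sender in self.blacklist


pot = HoneypotBlacklist({"trap1@example.org", "trap2@example.org"})
pot.observe("bulk@spam.example", "trap1@example.org")
pot.observe("alice@example.com", "bob@example.com")
```

After these two observations, only bulk@spam.example is blacklisted. Note that this works from metadata alone, so it fits systems that cannot inspect message content.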

CAPTCHA can limit the inflow of fake accounts, but a cash-rich adversary won't mind paying humans to solve CAPTCHAs at $0.01 apiece. Not to mention that massive attacks might justify investment in an effective automated CAPTCHA solver. Nevertheless, many services use the technique, including free email providers, as a basic countermeasure against spam.

Personally, I would avoid email since there are too many uncooperative mail server operators out there. That makes it extremely difficult to roll out anti-spam measures globally, and it is impossible to react to emergencies in a timely manner. Chat services are the way to go. Finding a perfect solution turns out to be difficult, but there is a future to be explored.