BREACH (not really SSL related)

In short:

Recognize sensitive data by injecting a guessed string into the data flow and watching the size of the compressed response. If the guess matches a string already present in the response, the compressor replaces the repeat with a short back-reference, so the transmitted data grows only slightly.

If the guessed string has not been seen before, it has to be encoded in full, so the data flow is a little bigger than it would be with a back-reference.

Use e.g. nikto to detect this vulnerability.

Thanks to the InfoSec Institute, from which the lines below were taken:

Introduction

Back in 2012, when Juliano Rizzo and Thai Duong announced the CRIME attack, a TLS / SSL compression attack against HTTPS, the ability to recover selected parts of the traffic through side-channel attacks was proven. This attack was mitigated by disabling TLS / SSL level compression in most browsers. At Black Hat 2013, a new attack called BREACH (Browser Reconnaissance and Exfiltration via Adaptive Compression of Hypertext) was announced, and it commanded the attention of the entire industry. The presentation, titled “SSL, Gone in 30 Seconds,” is not widely understood, and hence there seems to be some confusion about how to mitigate the problem. So I felt that this article should give some detailed insight into how serious the attack is, how it works, how practical it is, and what needs to be done to mitigate it. So let’s have a look.

BREACH Attack

Unlike previously known attacks such as BEAST and Lucky 13, BREACH is not an attack against TLS; it is basically an attack against HTTP. If you are familiar with the famous padding oracle attack, BREACH is somewhat easy to understand. A BREACH attack can extract login tokens, email addresses, and other sensitive information from TLS-encrypted web traffic in as little as 30 seconds (depending on the number of bytes to be extracted). The attacker just needs to trick the victim into visiting a malicious link to execute the attack.

Before going into the details, let me explain a little bit more about the basic things you need to know. Web pages are generally compressed before the responses are sent out. This is called HTTP compression, and it is used primarily to make better use of available bandwidth and to provide greater transmission speeds. The browser usually tells the server (through the “Accept-Encoding” header) which compression methods it supports, and the server compresses the content accordingly and sends it across. If the browser does not support any compression, the response is not compressed. The most commonly used compression algorithms are gzip and deflate.

Accept-Encoding: gzip, deflate

When the content arrives, it is decompressed by the browser and processed. So with SSL-enabled web sites, the content is first compressed, then encrypted, and then sent. But an observer can still determine the length of this compressed content even when it’s wrapped by SSL, because encryption does not hide the size of the data.
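The compress-then-encrypt order can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_stream_encrypt` is a made-up, insecure stand-in for the TLS record layer (real TLS adds a small per-record overhead and, with block ciphers, some padding), and `zlib.compress` stands in for the server's HTTP compression. The point is that the ciphertext length follows the compressed length.

```python
import hashlib
import zlib


def toy_stream_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy stream cipher (NOT secure): XOR with a SHA-256 counter keystream."""
    keystream = b""
    counter = 0
    while len(keystream) < len(plaintext):
        keystream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    # zip() stops at the end of the plaintext, so the output has the same length.
    return bytes(p ^ k for p, k in zip(plaintext, keystream))


# Hypothetical response body with a lot of repetition.
body = b"<html>" + b"hello world " * 50 + b"</html>"
compressed = zlib.compress(body)
ciphertext = toy_stream_encrypt(b"hypothetical-key", compressed)

# An eavesdropper cannot read the ciphertext but can count its bytes.
print(len(body), len(compressed), len(ciphertext))
```

The contents are hidden, but anyone who can see the encrypted traffic still learns how well the response compressed.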

How Does It Work?

The attack primarily works by taking advantage of the compressed size of the text when there are repetitive terms. Here is a small example that explains how deflate takes advantage of repetitive terms to reduce the compressed size of the response.

Consider the search page below, which is present after logging into this site:
http://www.ghadiwala.com/catalogsearch/result/?q=
Observe that the text highlighted in the red box is the username. Now enter any text (say “random”) and click “Search.”
URL: http://www.ghadiwala.com/catalogsearch/result/?q=random
So you can control the response through the input parameter in the URL. Now imagine that the search term is “Pentesting” (which is the username in this case).
URL: http://www.ghadiwala.com/catalogsearch/result/?q=Pentesting

Now, when the deflate algorithm compresses the above response, it finds that the term “Pentesting” occurs more than once. So, instead of emitting it a second time, the compressor effectively says “this text was found 101 characters ago.” This reduces the size of the compressed output. In other words, by controlling the input search parameter, you can guess the username. How? The compressed size is smallest when the search parameter matches the username. This concept is the basis of the BREACH attack.
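The effect is easy to reproduce locally. The sketch below assumes a made-up page template (the real search page will look different) and uses Python's `zlib`, which implements deflate; `compressed_len` is a hypothetical helper, not part of any tool.

```python
import zlib


def compressed_len(search_term: str, username: str = "Pentesting") -> int:
    """Compressed size of a hypothetical page that shows the username
    and reflects the search term."""
    page = (
        f"<html><body><p>Welcome, {username}</p>"
        f"<p>Search results for '{search_term}'</p></body></html>"
    )
    return len(zlib.compress(page.encode()))


# The matching term is encoded as a back-reference to the username,
# the non-matching term has to be encoded character by character.
print("match:   ", compressed_len("Pentesting"))
print("no match:", compressed_len("Xq7Lm2Zw9K"))
```

Both search terms have the same length, so the uncompressed pages are equally long; only the compressed sizes differ.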

Practical Attack

Now let us see how an attacker would practically exploit this issue and steal any sensitive information. Consider the site below and assume a legitimate user has just signed in.

[Before signing in to the application]

[Search page, which is accessible after logging in]

As shown in the above figure, also assume that there is some sensitive data in the Search page, for example, a card number. When the user searches for something (say “test”) the following message is displayed.

Now an attacker, using social engineering techniques, could lure this currently signed-in user into clicking on a link. The link would lead to a simple HTML page containing JavaScript that continuously requests searches for the search terms “100-10000.” For example, the JavaScript would request the URLs shown below:

http://localhost/demo/Search?p=100

http://localhost/demo/Search?p=101

………

http://localhost/demo/Search?p=10000

The attacker can also obtain the compressed sizes of the responses to each of these requests. Can you guess why the compressed sizes of these responses would differ, and which requests would have the smallest compressed size? Below are the requests with the smallest compressed sizes:

http://localhost/demo/Search?p=4545

http://localhost/demo/Search?p=5454

http://localhost/demo/Search?p=4543

http://localhost/demo/Search?p=5433

Below is the explanation of why the above requests have the smallest compressed sizes. Take the first request. Here is the response from the server:

URL: http://localhost/demo/Search?p=4545

As shown above, when the deflate algorithm encounters this response, it encodes the repetition compactly, which results in the smallest compressed size. So by analyzing the compressed size for each of the requests from 100-10000, an attacker can deduce what the card number is in this case. The beauty of this attack lies in the fact that we did not decrypt any traffic; just by analyzing the sizes of the responses, we were able to predict the text.
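The scan over 100-10000 can be simulated offline. Everything in this sketch is an assumption made for illustration: the secret value `4545`, the page template in `search_page`, and the idea that the attacker sees exactly the `zlib` output length. A real attack measures encrypted response sizes on the wire and has to cope with noise.

```python
import zlib

# Hypothetical secret embedded in every response for the signed-in user.
SECRET = "4545"


def search_page(p: str) -> bytes:
    """Stand-in for the demo Search page: shows the secret and reflects p."""
    html = (
        "<html><body>"
        f"<p>Card on file ends in: <b>{SECRET}</b></p>"
        f"<p>No results found for: <b>{p}</b></p>"
        "</body></html>"
    )
    return html.encode()


def observed_size(p: str) -> int:
    """What the attacker measures: the compressed response length."""
    return len(zlib.compress(search_page(p)))


def smallest_guesses(start: int = 100, stop: int = 10000) -> list:
    """Try every number in the range and keep the guesses whose
    responses compressed to the smallest size."""
    sizes = {str(n): observed_size(str(n)) for n in range(start, stop + 1)}
    best = min(sizes.values())
    return [guess for guess, size in sizes.items() if size == best]


candidates = smallest_guesses()
print(candidates)
```

The function returns every guess that ties for the smallest size rather than a single winner, because byte-level rounding in the compressed output can make near misses look as good as the exact match.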

To summarize in simple steps, here are the conditions an application must fulfill to be vulnerable to the BREACH attack:

The server should be using HTTP level compression.
There must be a parameter that reflects the input text. (This will be controlled by the attacker).
The page should contain some sensitive text that would be of interest to the attacker.
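The three conditions above can be turned into a rough checklist for a single captured response. This is a heuristic sketch, not a scanner: `breach_preconditions`, the probe string, and the list of sensitive markers are all hypothetical, and a positive result only means the page deserves a closer look.

```python
def breach_preconditions(headers, body, probe, sensitive_markers):
    """Return which of the three conditions one response appears to meet."""
    encoding = ""
    for name, value in headers.items():
        if name.lower() == "content-encoding":
            encoding = value.lower()
    tokens = [t.strip() for t in encoding.split(",")]
    return {
        # 1. The server uses HTTP-level compression.
        "http_compression": any(t in ("gzip", "deflate") for t in tokens),
        # 2. A parameter controlled by the attacker is reflected in the page.
        "input_reflected": probe in body,
        # 3. The page contains text an attacker would want.
        "sensitive_text_present": any(m in body for m in sensitive_markers),
    }


# Hypothetical response captured after requesting /demo/Search?p=zq9probe
report = breach_preconditions(
    headers={"Content-Encoding": "gzip", "Content-Type": "text/html"},
    body="<p>Card: 4545</p><p>No results for zq9probe</p>",
    probe="zq9probe",
    sensitive_markers=["Card:"],
)
print(report)
print("worth a closer look:", all(report.values()))
```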

Remediation

Turning off HTTP compression would save the day, but that is rarely a practical solution, since most servers rely on it to manage bandwidth effectively. Here are some other solutions that can be tried:

Protecting the vulnerable pages with a CSRF token.
Adding random bytes to the response to hide the actual compressed length.
Separating the sensitive data from the pages where input text is displayed.
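The second option, adding random bytes, might look like the sketch below. The helper `pad_response`, the HTML-comment wrapper, and the 32-byte limit are assumptions made for illustration. Note that random padding only adds noise: an attacker who can repeat each request many times can average the noise out, so this slows the attack down rather than stopping it.

```python
import secrets
import zlib


def pad_response(html: bytes, max_pad_bytes: int = 32) -> bytes:
    """Append an HTML comment holding a random amount of random hex text."""
    n = 1 + secrets.randbelow(max_pad_bytes)  # 1..max_pad_bytes
    filler = secrets.token_hex(n)             # 2*n random hex characters
    return html + b"<!-- " + filler.encode("ascii") + b" -->"


# Hypothetical page; the same content now yields varying compressed sizes.
page = b"<html><body>No results found</body></html>"
lengths = {len(zlib.compress(pad_response(page))) for _ in range(200)}
print(sorted(lengths))
```

Random hex text compresses poorly, so the padding survives compression and the observed length changes from one response to the next.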