Here are the highlights of our research on do-it-yourself kits for phishing attacks, which allow attackers to quickly and elegantly mount a phishing campaign. These slides present examples of phishing kits, review their main capabilities, and show a statistical and clustering analysis of our collection of phishing kits. The main goal of our research is to shed light on the dynamics of phishing and the distribution of phishing kits in the underground community.
Slide 1
Hello, my name is Luda Lazar. I’ve been working at Imperva for 5.5 years. I’m a security researcher.
Slide 2
Introduction
Slide 3
Today I’m going to talk about phishing, particularly about how easy and quick it is to set up a phishing scam.
I will present examples of phishing kits and their main capabilities.
I will show a statistical and clustering analysis of our collection of phishing kits.
Slide 4
So, how do you set up a phishing campaign in 60 seconds?
To achieve this goal we will need the following components:
Phishing pages
Spam service/server - SMTP infrastructure: To send massive amounts of emails
Email list for spamming - List of emails (cost depending on country, freshness and targets)
Compromised servers to host the phishing pages or hosting services - The attacker needs access to compromised legitimate servers to remove the dependency on hosting services and host the phishing pages.
Each of these components can be purchased in a few seconds on the Russian black market.
In previous research we focused on other components of phishing, such as compromised servers.
In the current research we focused on phishing kits to understand their main capabilities, where they come from and their effect on the phishing market.
Slide 5
How does a standard phishing attack, based on a phishing kit, look?
First of all, the attacker buys a compromised server (or uses a hosting service) and uploads a phishing kit to the server.
Then the attacker, using a spam service, sends a burst of phishing emails to the potential victims.
The victims fall into the phishing trap by visiting the phishing pages and entering their credentials.
The phishing kit processes the credentials and sends them to an external email account.
Slide 6
Now that we understand the basic flow of a phishing attack based on a phishing kit, we will present the high-level flow of our research.
1. The first phase of the project was to find a source of phishing sites. We used two different sources to gather phishing kit samples:
OpenPhish feed (https://openphish.com/) for zero-day samples
Pastes from TechHelpList.com for long-lived phishing campaigns
2. In the second phase we developed a scraper that takes a list of phishing URLs and retrieves phishing kits from the backend of the phishing server.
For each URL, if the phishing site was online and allowed directory listing, we generated a list of paths and tried to locate and download the phishing kit(s).
3. The third phase was the definition and extraction of features from the phishing kits, and normalization of the data.
4. The fourth stage includes statistical analysis of the extracted features and clustering.
5. The fifth stage results in conclusions based on the previous stage.
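To make the second phase more concrete, here is a minimal Python sketch of how a list of candidate paths could be generated from a phishing URL. It is not the scraper we used. The function name `candidate_kit_urls` is hypothetical, and the guess that a kit is left behind as a .zip archive named after the directory it was unpacked into is an assumption made for illustration. The sketch only builds URLs; it does not download anything.

```python
from urllib.parse import urlsplit, urlunsplit


def candidate_kit_urls(phishing_url: str) -> list[str]:
    """Return URLs where a kit archive might have been left behind.

    Assumption (illustrative only): kits are uploaded as .zip archives and
    unpacked in place, so the archive sits next to a directory of the same name.
    """
    parts = urlsplit(phishing_url)
    segments = [s for s in parts.path.split("/") if s]
    # Drop a trailing file name such as "login.php"; keep directories only.
    # Heuristic: a last segment containing a dot is treated as a file.
    if segments and "." in segments[-1]:
        segments.pop()

    candidates = []
    # Walk from the deepest directory up towards the site root.
    for depth in range(len(segments), 0, -1):
        directory = "/" + "/".join(segments[:depth])
        archive_path = directory + ".zip"
        candidates.append(
            urlunsplit((parts.scheme, parts.netloc, archive_path, "", ""))
        )
    return candidates


if __name__ == "__main__":
    for url in candidate_kit_urls("http://example.test/a/docs/login.php"):
        print(url)
```

For the example URL this yields http://example.test/a/docs.zip followed by http://example.test/a.zip. A real collector would also parse the directory listing itself, which this sketch leaves out.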
Slide 7
In total, we collected more than a thousand phishing kits from both sources.
From OpenPhish we collected about six hundred samples, which is 7% of all checked URLs.
From the pastes we collected more than four hundred kits, which is almost 10% of all checked URLs.
Limitations. The main threat to the validity of the statistics presented above is the problem of the “coverage” of the examined kits, i.e., the variety of the recovered kits.
Slide 8
Now let’s discuss the structure of the phishing kits and show some examples, including a Google Docs phishing kit.
Slide 9
The phishing kits contain two types of files: resource files, which are needed to display a copy of the targeted web site, and processing scripts, which are used to save the phished information and send it to the phishers.
Slide 10
The following is an example of a common Google Docs phishing kit, which accounts for about 15 percent of our collection.
The resource files contain Google figures and CSS password validation files.
The PHP files are processing files which store and send the stolen information to the attacker.
The majority of phishing kits contain all the resources required to replicate the targeted web site, including HTML pages, JavaScript and CSS files, images and other media files.
This minimizes the number of requests the kit issues to the legitimate site and, thus, the chances of being detected if the target site analyzes incoming requests.
However, a significant share of the kits still contain links to the target web sites.
Slide 11
The following is the processing code of the Google Docs kit.
The first part checks which email provider was selected by the victim (Gmail, Yahoo, Hotmail, AOL or other).
Then it retrieves the victim’s details, such as browser and IP address, and uses the IP address to resolve the victim’s geolocation.
The next part is the building of the phishing results email message.
It’s interesting that the processing code is signed by attacker ‘NoBody.’
The phishing message contains:
Email provider
Email and password of the victim
IP address
Geolocation (city, region, country and country code)
The results message is exfiltrated in two ways: it is written to a file and sent to the attacker’s email address.
If the email provider is Gmail, the victim is redirected to the next page (verification.php), which lures the victim into entering a recovery email or phone number. Google requires this information to authenticate a login from an unrecognized device.
The last part is redirecting the victim to the legitimate landing page of Google Drive.
Slide 12
Following the example of Google Docs phishing kits, we can now talk about the main capabilities of the phishing kits.
Slide 13
One of the main functions of phishing kits is to automatically send phished information to the attackers.
The vast majority of kits, ninety-eight percent, use email accounts to send data to the attackers.
The remaining 2 percent save the collected information directly on the server.
Slide 14
But what happens when buying from a thief?
About 25 percent of kits contain a hidden recipient, who receives the emails with the phishing results alongside the intended recipients.
We saw multiple techniques for hiding the authors’ email addresses; the most popular are address obfuscation and repeated mail statements.
More info:
To obfuscate email addresses, kit writers use a variety of techniques, ranging from standard encoding and compression algorithms to simple, custom cryptographic methods.
Base64-encoding is a popular obfuscation choice. The email address is encoded using its base64 representation and the built-in base64_decode() function is used to retrieve its original value.
Another commonly-used encoding is ASCII. In this case, the address is obfuscated by substituting each character with the corresponding ASCII value, typically in hexadecimal format.
A function mapping a value to the corresponding character (e.g., the built-in pack() function) is then used to recover the email address.
Among custom techniques, obfuscations based on Caesar ciphers are popular. Each letter of the email address is replaced with the letter that is some fixed number of positions further down in the alphabet. Another common technique is the use of simple permutations.
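From the analyst’s side, undoing these obfuscations is straightforward. The following Python sketch shows one decoder for each of the three techniques above. The function names are hypothetical, the addresses are made-up examples, and the Caesar shift has to be guessed or read from the kit’s code. The hex decoder mirrors what PHP’s pack() with the "H*" format does.

```python
import base64


def decode_base64(value: str) -> str:
    """Reverse base64 obfuscation (PHP kits call base64_decode())."""
    return base64.b64decode(value).decode("ascii")


def decode_hex(value: str) -> str:
    """Reverse hex-encoded ASCII: each pair of hex digits is one character."""
    return bytes.fromhex(value).decode("ascii")


def decode_caesar(value: str, shift: int) -> str:
    """Undo a Caesar cipher that moved each letter `shift` places forward."""
    out = []
    for ch in value:
        if ch.isascii() and ch.isalpha():
            base = ord("a") if ch.islower() else ord("A")
            out.append(chr((ord(ch) - base - shift) % 26 + base))
        else:
            out.append(ch)  # digits, "@" and "." are left untouched
    return "".join(out)


if __name__ == "__main__":
    # All three calls recover the same made-up address, a@b.co.
    print(decode_base64("YUBiLmNv"))
    print(decode_hex("6140622e636f"))
    print(decode_caesar("d@e.fr", 3))
```

When the shift is unknown, trying all 25 possible values and keeping the result that looks like an email address is usually enough.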
Slide 15
We have also seen how attackers are trying to implement techniques to block unwanted access to their phishing kits, as they may want to prevent Google, Yahoo, or security company bots from finding them.
Some of the techniques include:
.htaccess files with a list of blocked IP addresses related to bots from search engines and security companies.
robots.txt files that are used to prevent search engine or security company bots from accessing specific remote directories.
PHP scripts that dynamically check if the remote IP address is allowed to access the phishing pages. These scripts are often included as part of the phishing kit.
17% of the kits contain techniques to block unwanted access, in order to avoid detection by security companies and index services.
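A kit can be checked for these three techniques with a few simple heuristics. The Python sketch below is an illustration, not the rules used in our research: the function name `access_control_techniques`, the regular expressions, and the input format (a mapping from file path to file text) are all assumptions. In particular, the PHP check only looks for a script that reads the remote address and also calls die or exit, so it can both miss real filters and flag harmless scripts.

```python
import re

# Illustrative heuristics only; these patterns are assumptions.
DENY_RULE = re.compile(r"^\s*deny\s+from\s+\S+", re.IGNORECASE | re.MULTILINE)
DISALLOW_RULE = re.compile(r"^\s*disallow\s*:\s*/", re.IGNORECASE | re.MULTILINE)
PHP_REMOTE_ADDR = re.compile(r"\$_SERVER\s*\[\s*['\"]REMOTE_ADDR['\"]\s*\]")
PHP_ABORT = re.compile(r"\b(?:die|exit)\b")


def access_control_techniques(kit_files: dict[str, str]) -> set[str]:
    """Return which blocking techniques appear in a kit.

    `kit_files` maps each file path inside the kit to its text content.
    """
    found = set()
    for path, text in kit_files.items():
        name = path.rsplit("/", 1)[-1].lower()
        if name == ".htaccess" and DENY_RULE.search(text):
            found.add("htaccess")
        elif name == "robots.txt" and DISALLOW_RULE.search(text):
            found.add("robots.txt")
        elif name.endswith(".php"):
            # Reading the IP alone is not enough: kits also log it.
            if PHP_REMOTE_ADDR.search(text) and PHP_ABORT.search(text):
                found.add("php")
    return found


if __name__ == "__main__":
    sample = {
        "kit/.htaccess": "order allow,deny\ndeny from 192.0.2.0/24\nallow from all\n",
        "kit/robots.txt": "User-agent: *\nDisallow: /\n",
        "kit/block.php": "<?php if (in_array($_SERVER['REMOTE_ADDR'], $bad)) { die(); } ?>",
    }
    print(sorted(access_control_techniques(sample)))
```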
Slide 16
The next technique is blacklist evasion, which is based on redirecting each new victim to a newly generated random location.
Slide 17
The last part of the presentation contains our similarity analysis of the phishing kits.
Slide 18
Firstly, let’s describe our research method.
We extracted features that characterize phishing kits
We computed statistics on certain features of the phishing kits
Then we performed unsupervised machine learning on the extracted features
The features we chose:
Files list (metadata of phishing archive)
Author’s signature (results processing file)
Subject (results email)
Sender (results email ‘from’ header)
The clustering was performed in three steps:
Extracting the features that characterize a phishing kit
Cleaning up and normalizing the feature data
Clustering the data so that every cluster of kits has at least one of the features in common
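One way to read "at least one of the features in common" is as connected components: two kits are linked when they share the value of any feature, and a cluster is everything reachable through such links. The Python sketch below implements that reading with a small union-find structure. It is an assumption about the grouping rule, not the algorithm used in our research, and the function name and sample data are hypothetical. Note that under this reading two kits in the same cluster may be connected only through a third kit.

```python
from collections import defaultdict


def cluster_kits(kits: dict[str, dict[str, str]]) -> list[set[str]]:
    """Group kits so that kits sharing any feature value land in one cluster."""
    parent = {kit_id: kit_id for kit_id in kits}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    first_seen = {}  # (feature name, value) -> first kit that had it
    for kit_id, features in kits.items():
        for name, value in features.items():
            if not value:
                continue  # a missing value must not link kits together
            key = (name, value)
            if key in first_seen:
                parent[find(kit_id)] = find(first_seen[key])
            else:
                first_seen[key] = kit_id

    clusters = defaultdict(set)
    for kit_id in kits:
        clusters[find(kit_id)].add(kit_id)
    # Largest clusters first.
    return sorted(clusters.values(), key=len, reverse=True)


if __name__ == "__main__":
    sample = {
        "kit1": {"author": "NoBody", "subject": "result A", "sender": ""},
        "kit2": {"author": "NoBody", "subject": "result B", "sender": ""},
        "kit3": {"author": "", "subject": "result B", "sender": ""},
        "kit4": {"author": "", "subject": "result C", "sender": ""},
    }
    for cluster in cluster_kits(sample):
        print(sorted(cluster))
```

In the sample, kit1 and kit2 share an author signature and kit2 and kit3 share a subject, so the three form one cluster, while kit4 stays on its own.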
Slide 19
The clustering results:
We obtained 230 clusters in total
19 big clusters (with at least 10 kits in each cluster) covering 53 percent of the data
48 clusters (with at least 5 kits in each cluster) covering 73 percent of the data
118 clusters (with at least 2 kits in each cluster) covering 89 percent of the data
Counted cumulatively:
14% of the kits have four identical features
39% of the kits have at least three identical features
56% of the kits have at least two identical features
Counted by the exact number of identical features:
14% of the kits have four identical features
25% of the kits have exactly three identical features
16% of the kits have exactly two identical features
Slide 21
The following are general statistics on the author feature.
Slide 22
We searched for one of the popular signatures and found a few interesting sites.
Slide 23
The following are general statistics on the buyers feature:
8% of buyers appear in at least three different kits (represent 23% of kits)
24% of buyers appear in at least two different kits (represent 46% of kits)
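Figures like these can be computed with a simple count. The Python sketch below assumes that a buyer is identified by the recipient address configured in a kit and that each kit has exactly one such address. Both are simplifications for illustration, and the function name and addresses are hypothetical.

```python
from collections import Counter


def buyer_reuse(kit_recipients: list[str], min_kits: int) -> tuple[float, float]:
    """Return (share of buyers, share of kits) for buyers seen in >= min_kits kits.

    `kit_recipients` holds one recipient address per kit (a simplification;
    a real kit may list several).
    """
    if not kit_recipients:
        return 0.0, 0.0
    # Normalize addresses so that case and stray spaces do not split a buyer.
    counts = Counter(addr.strip().lower() for addr in kit_recipients)
    repeat = {addr: n for addr, n in counts.items() if n >= min_kits}
    buyer_share = len(repeat) / len(counts)
    kit_share = sum(repeat.values()) / len(kit_recipients)
    return buyer_share, kit_share


if __name__ == "__main__":
    recipients = [
        "a@x.test", "a@x.test", "A@x.test",
        "b@x.test", "b@x.test",
        "c@x.test", "d@x.test", "e@x.test",
    ]
    print(buyer_reuse(recipients, min_kits=2))
    print(buyer_reuse(recipients, min_kits=3))
```

In the sample, two of the five buyers appear in at least two kits, and together they account for five of the eight kits.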
Slide 24
In conclusion, kits’ authors minimize the effort and risks associated with deploying the phishing site and attracting victims, and maximize their return on investment by harvesting the work of unaware users.
Slide 25
Click here to subscribe to the Imperva blog for more details on phishing, as well as other application and data security trends: https://www.imperva.com/blog/