Machine Learning Cybersecurity: Harnessing the Power of ML to Safeguard Against Threats

SafeDNS processes several billion requests every day, and our database is updated with new websites on a daily basis. By using DNS filtering, we make sure that even if a user clicks on a phishing link, they will be redirected to a blockpage instead of a malicious resource.

We strive to make our service reliable and high-quality. The SafeDNS machine learning department employs various machine learning algorithms to verify and categorize domains. To ensure stable and high-quality filtering, we employ different approaches and utilize our own and third-party databases.

In this article, we will discuss the basic threat detection methods that allow us to handle the majority of cyber threats.

Detecting a domain that is not in our database with artificial intelligence

When you click on a link that is not in our database, our system launches a series of machine learning models for checks.

If we have not encountered the domain before, the user is redirected to a blockpage and the domain itself is "quarantined" until it is assigned a category. The page content is then scanned.

Based on this scan and a series of additional checks, the website is assigned a specific category. On the SafeDNS website, you can verify a link before clicking on it.

When the category is known, the system acts according to established rules, either blocking access or allowing the user to visit the website.
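
The decision flow described above can be sketched roughly as follows. This is a minimal illustration, not SafeDNS's production logic: the in-memory `CATEGORY_DB`, the `BLOCKED_CATEGORIES` policy, and the `QUARANTINE` set are hypothetical stand-ins for the real database and filtering rules.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"


# Hypothetical stand-ins for the categorization database and a filtering policy.
CATEGORY_DB = {"example.com": "business", "bad.example": "phishing"}
BLOCKED_CATEGORIES = {"phishing", "malware", "botnet"}
QUARANTINE = set()


def decide(domain):
    """Return the filtering action for a queried domain."""
    domain = domain.lower().rstrip(".")
    category = CATEGORY_DB.get(domain)
    if category is None:
        # Unknown domain: hold it for categorization and show the blockpage.
        QUARANTINE.add(domain)
        return Action.BLOCK
    # Known category: apply the established rules.
    return Action.BLOCK if category in BLOCKED_CATEGORIES else Action.ALLOW
```

In this sketch an unknown domain is blocked and remembered in `QUARANTINE` until a category is assigned, while a known domain is allowed or blocked purely by its category.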

Parked Domains

A parked domain is a domain that is listed for sale. This means that the webpage associated with it either has minimal or no content because it is, for example, under development or will soon be transferred to another owner.

In addition to cross-checking other sources, the ML department monitors the NS (name server) addresses of most registrars. Parked domains are often gathered on separate name servers, and we have the addresses of most of these servers. During domain analysis, we look at where the domain resolves; if it resolves to a name server dedicated to parked domains, we assume it is a parked domain and assign it the corresponding category.
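
A simplified version of this name server check might look like the sketch below. The `PARKING_NAME_SERVERS` set and its hostnames are invented for illustration, and the rule that a single match is enough is our assumption. The NS records are passed in as a plain list so the example stays self-contained; in practice they would come from a DNS lookup.

```python
# Hypothetical list of name servers known to be dedicated to parked domains.
PARKING_NAME_SERVERS = {"ns1.parking.example", "ns2.parking.example"}


def looks_parked(name_servers):
    """Return True if any of the domain's name servers is a known parking server."""
    normalized = {ns.lower().rstrip(".") for ns in name_servers}
    # Assumption: one match is enough to treat the domain as parked.
    return not normalized.isdisjoint(PARKING_NAME_SERVERS)
```

For example, `looks_parked(["NS1.Parking.Example."])` returns `True`, while a domain delegated only to ordinary registrar name servers returns `False`.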

The threat posed by parked domains is that they can be acquired by malicious actors at any time and used to distribute malware or control botnets. Therefore, they should always be closely monitored.

Our algorithm regularly checks parked domains for content. If malicious content is detected on a website, it will immediately be categorized accordingly and blocked upon attempted access. The website can be assigned any other category based on its content.

Phishing

Phishing is a form of fraud in which a malicious actor attempts to obtain confidential information, such as login credentials or account data, by impersonating a reputable person or entity through email or other communication channels. According to Forbes, phishing is one of the most prevalent types of cybercrime, with over 500 million phishing attacks reported in 2022. For perspective, that’s over double the number of attacks reported in 2021.

The messages may appear similar to the ones you have received before. It could be an email from a bank, a ticket aggregator, or a notification from a social network. The message contains malware designed to infiltrate the user's computer or a link to malicious websites to deceive and obtain account or credit card information.

Phishing is popular among attackers because it is easier to deceive someone into clicking on a malicious link that appears genuine than to try to breach computer security systems. Attackers disguise their messages to resemble content from various companies, using logos and slightly altered links that may differ from the original by just one letter, for example "gogle.com" or "facebook.me". Knowing that users may already be suspicious of such links, attackers may embed the malicious link in a button, making the actual address invisible at first glance. However, in such cases the malicious resource will still be blocked. If you are reluctant to click on a link, you can copy the link address and verify it using our verification service.

Phishing links can be detected using AI and machine learning methods, including natural language processing. The first step is to check for typosquatting.

There are known methods for creating phishing domain names that resemble legitimate ones but are actually different. This is called typosquatting. Typosquatting occurs when fraudsters intentionally use typos or similar characters to create domain names that look almost identical to the genuine ones.

Some typosquatting techniques include skipping, repeating, adding, or rearranging characters in the domain name. They may also substitute characters with visually similar but distinct characters or use characters located near each other on the keyboard.
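
To make these techniques concrete, the sketch below generates a few typosquatting candidates for a single domain label. It is an illustration only: the `typo_variants` function is hypothetical, the `KEYBOARD_NEIGHBORS` table is a small, partial sample, and only skipping, repeating, swapping adjacent characters, and substituting nearby or look-alike characters are covered.

```python
# Partial, illustrative map of nearby keys and look-alike characters.
KEYBOARD_NEIGHBORS = {"o": "ip0", "g": "fh", "e": "wr3", "a": "sq", "l": "k1"}


def typo_variants(label):
    """Generate simple typosquatting candidates for one domain label."""
    variants = set()
    for i, ch in enumerate(label):
        variants.add(label[:i] + label[i + 1:])       # skip a character
        variants.add(label[:i] + ch + label[i:])      # repeat a character
        if i + 1 < len(label):
            # swap two adjacent characters
            variants.add(label[:i] + label[i + 1] + ch + label[i + 2:])
        for sub in KEYBOARD_NEIGHBORS.get(ch, ""):
            variants.add(label[:i] + sub + label[i + 1:])  # substitute a character
    variants.discard(label)  # the original itself is not a typo
    variants.discard("")
    return variants
```

For the label "google", the result includes "gogle", "gooogle", "goolge", and "g0ogle", among others.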

To detect typosquatting-based phishing domain names, analysis of the web address and domain name is conducted. Specific patterns characteristic of fraudulent web addresses are sought. This approach allows for quick domain name checks without the need to load the content of the web page.

As a baseline method for detection, we use the Levenshtein distance, a metric that measures the difference between two character sequences. It is defined as the minimum number of single-character operations (insertions, deletions, or substitutions) required to transform one character sequence into another. Scammers are more likely to impersonate a domain associated with a well-known brand than a niche company, so as a comparison base we take the most popular domains, totaling around a million.

We establish a threshold value for this distance. Then, we analyze the incoming domain name by calculating the distance to the original names. The smaller the distance, the more similar the domains are, indicating a higher likelihood of phishing.

For example, the Levenshtein distance between Facebook.com and Facebok.com is 1, because a single character has been dropped. The distance between the original domain and itself is 0.
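
The distance calculation and the threshold check can be sketched as follows. The `levenshtein` function is the standard dynamic-programming algorithm; the `POPULAR` list and the `THRESHOLD` value of 2 are assumptions made for illustration and do not reflect the actual comparison base or threshold.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


# Hypothetical stand-ins for the list of popular domains and the threshold.
POPULAR = ["facebook.com", "google.com"]
THRESHOLD = 2


def closest_brand(domain):
    """Return (popular_domain, distance) if domain is suspiciously close, else None."""
    domain = domain.lower()
    best = min(POPULAR, key=lambda known: levenshtein(domain, known))
    distance = levenshtein(domain, best)
    # Distance 0 means the domain is the original itself, not an imitation.
    if 0 < distance <= THRESHOLD:
        return best, distance
    return None
```

Here `closest_brand("Facebok.com")` returns `("facebook.com", 1)`, while the genuine "facebook.com" and an unrelated name such as "example.org" both return `None`. Comparing against a list of around a million names one by one would be slow, so a real system would need some form of indexing or pre-filtering, which this sketch leaves out.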

The second stage involves checking other indicators such as domain age, external sources, content analysis, and more. When a combination of factors suggests the site is a phishing one, it is categorized accordingly, and the user will see a blockpage when attempting to access it. Knowing that a specific domain name belongs to the phishing category, all pages of that site will also be blocked.

Ransomware

Ransomware (or a ransomware program) is a type of malware that prevents or restricts users’ access to their system by either blocking the system screen or blocking users’ files until a ransom is paid. As in the case of phishing, when using our service, the user will not be able to follow a link to the site hosting the ransomware program if that site is in our categorization database. It should be noted that attacks using ransomware can be large-scale. If the company and the ransom amount are big, the attackers will carefully prepare for the attack, taking into account the specifics of that company's security systems. This is why we strongly recommend applying a set of cybersecurity measures and regularly conducting training with employees.

It is worth noting that the volume of ransomware attacks dropped 23% in 2022 compared to the previous year. However, the nature of the attacks has changed and they have become more effective.

Botnets

A botnet is a network of compromised computers infected with malicious software. Cybercriminals use botnet networks consisting of a large number of devices for various malicious activities without the users' knowledge.

Here's how our system works in dealing with botnets:

  • We analyze user traffic and look for requests to known botnets.
  • If connections to command-and-control servers or infected nodes are detected, we consider the traffic from that user/device suspicious.
  • We identify unknown domains in the traffic, which also fall under suspicion by default.
  • The remaining domains are categorized as botnets, and attempts to access such websites are blocked.
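
The first three steps above can be sketched as a small traffic review routine. This is a simplified illustration under our own assumptions: the `KNOWN_BOTNET_DOMAINS` and `KNOWN_DOMAINS` sets, the `(client_id, domain)` input format, and the `review_traffic` function are all hypothetical.

```python
# Hypothetical data: domains already linked to botnets, and all categorized domains.
KNOWN_BOTNET_DOMAINS = {"c2.badnet.example"}
KNOWN_DOMAINS = {"example.com", "c2.badnet.example"}


def review_traffic(queries):
    """Take (client_id, domain) pairs; return (suspicious_clients, suspicious_domains)."""
    domains_by_client = {}
    for client, domain in queries:
        domains_by_client.setdefault(client, set()).add(domain.lower())

    # A client that contacted a known botnet domain is considered suspicious.
    suspicious_clients = {
        client for client, domains in domains_by_client.items()
        if domains & KNOWN_BOTNET_DOMAINS
    }

    # Unknown domains requested by suspicious clients fall under suspicion too.
    suspicious_domains = set()
    for client in suspicious_clients:
        suspicious_domains |= domains_by_client[client] - KNOWN_DOMAINS
    return suspicious_clients, suspicious_domains
```

Note that in this sketch an unknown domain requested only by a clean client is not flagged by this routine; it would still go through the usual check for uncategorized domains.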

In addition, SafeDNS monitors the volume of requests from users. If we observe sudden spikes in network traffic, network administrators will receive notifications about suspicious traffic growth.
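
One common way to detect such spikes is to compare the current request count with the recent average, measured in standard deviations. The sketch below shows that idea only; the `is_spike` function and the threshold of 3 standard deviations are assumptions, not a description of how SafeDNS actually triggers notifications.

```python
from statistics import mean, pstdev


def is_spike(history, current, z_threshold=3.0):
    """Flag `current` if it exceeds the historical mean by more than z_threshold deviations."""
    if len(history) < 2:
        return False  # not enough data to judge
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Perfectly flat history: any increase stands out.
        return current > mu
    return (current - mu) / sigma > z_threshold
```

With a history of `[100, 110, 90, 105, 95]` requests per interval, a reading of 400 is flagged as a spike, while a reading of 108 is not.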

In Q1 2023, attacks witnessed a significant 47% surge compared to the same period in the previous year. This rise was accompanied by a shift towards botnet utilization and an increasing prevalence of smokescreening techniques to conceal multi-vector incidents. Notably, the use of attacks as decoys rose by 28% in comparison to Q1 2022.

DGA

DGA stands for Domain Generation Algorithms. These are algorithms found in various families of malicious software that are used to periodically generate a large number of domain names that can be used as rendezvous points with their command and control servers.

SafeDNS also tracks this category. The detection process is somewhat similar to typosquatting detection. We analyze the domain name by breaking it down into n-grams and compare how closely they match the pool of n-grams from known valid and "white" sites. We also use a range of other parameters for verification. In all cases, the age of the domain plays an important role in our assessment and trust in it.

In this category, there is a risk of blocking technical domains because their names are often generated randomly. To avoid this situation, we additionally check the content on the page before assigning it to a specific category. The age of the created domain is also taken into account. The younger it is, the higher the likelihood that it is a DGA-generated domain.
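
The n-gram comparison can be illustrated with a toy example. This is a sketch under stated assumptions: the `KNOWN_GOOD_LABELS` list is a tiny made-up sample, bigrams (n = 2) are an arbitrary choice, and a real model would combine this score with domain age, content checks, and other parameters as described above.

```python
N = 2  # n-gram size; an assumption for this sketch


def ngrams(text, n=N):
    """Split text into overlapping character n-grams."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Tiny, hypothetical sample standing in for the pool of known valid sites.
KNOWN_GOOD_LABELS = ["google", "facebook", "wikipedia", "amazon", "youtube"]
KNOWN_NGRAMS = {gram for label in KNOWN_GOOD_LABELS for gram in ngrams(label)}


def ngram_match_ratio(label):
    """Share of the label's n-grams that also occur in known valid names."""
    grams = ngrams(label.lower())
    if not grams:
        return 0.0
    return sum(gram in KNOWN_NGRAMS for gram in grams) / len(grams)
```

A name such as "facebook" scores 1.0 against this pool, whereas a random-looking label such as "xqzvkjw" scores 0.0. A low ratio alone is not proof of a DGA domain, which is why the additional checks matter.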

Cryptojacking

Cryptojacking is a type of cybercrime where a criminal secretly uses a victim's computing power to generate cryptocurrency. Over the past years, there has been a sharp increase in cryptojacking cases. In the SonicWall Cyber Threat Report for 2023, researchers from SonicWall Capture Labs reported a 43% increase in cryptojacking attempts compared to the previous year, 2022.

Cryptojacking can remain unnoticed for a long time, as it often targets IoT devices, many of which are easily compromised due to the use of unprotected public networks.

If traffic to cryptojacking-related domains is detected, such traffic is blocked, and all unfamiliar domains are added to the database for verification.

Conclusion

All the threats described above as a rule exhibit fairly typical patterns of cyber attacks. For more targeted threats, we have the Passive DNS service, which helps cybersecurity specialists draw conclusions about potential threats.

We store and gather a history of domain changes, as well as the information about the IP address a particular domain belongs to, along with other relevant information. Based on this, we establish connections between nodes in the global network.

When a new domain enters our database, we compare its registered IP address, connections, and patterns with those in our database. If we see that the IP address has already been compromised (that is, it is associated with or previously hosted malicious domains), all other domains on that IP will be added to the "suspicious" website database and checked for malicious content.
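
The IP-based association can be sketched as follows. The input format (pairs of domain and IP address taken from passive DNS observations) and the `suspicious_by_shared_ip` function are hypothetical; the addresses in the usage note come from documentation-reserved ranges.

```python
from collections import defaultdict


def suspicious_by_shared_ip(observations, malicious_domains):
    """Return domains that share an IP address with a known malicious domain.

    observations: iterable of (domain, ip) pairs from passive DNS history.
    malicious_domains: set of domains already categorized as malicious.
    """
    domains_by_ip = defaultdict(set)
    for domain, ip in observations:
        domains_by_ip[ip].add(domain)

    suspicious = set()
    for domains in domains_by_ip.values():
        if domains & malicious_domains:   # this IP has hosted a malicious domain
            suspicious |= domains         # so every domain on it needs checking
    return suspicious - malicious_domains  # exclude the already-known ones
```

For example, if "bad.example" and "shop.example" were both seen on 203.0.113.5 and "bad.example" is known to be malicious, then "shop.example" is returned for further checks, while a domain seen only on 198.51.100.7 is not.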

Passive DNS historical data also enables security teams to identify patterns of malicious activity and to detect phishing attacks and other targeted threats.

Passive DNS helps identify patterns and enables predictive analysis for attack detection. At first glance, you can discover useful information about a domain. For example, you can view the date of the A record modification and identify changes in the A record.
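
As a small illustration of what can be read from such history, the sketch below lists the points at which a domain's A record changed. The record format (a date string paired with an IP address, sorted by date) and the `a_record_changes` function are assumptions made for this example, not the format of the Passive DNS service.

```python
def a_record_changes(history):
    """Return (date, old_ip, new_ip) for every change in a date-sorted A record history."""
    changes = []
    # Compare each observation with the one before it.
    for (_, previous_ip), (when, ip) in zip(history, history[1:]):
        if ip != previous_ip:
            changes.append((when, previous_ip, ip))
    return changes
```

Given observations of 192.0.2.1 on 2023-01-01 and 2023-02-01 followed by 198.51.100.9 on 2023-03-01, the function reports a single change dated 2023-03-01.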

The domain database is enriched from several dozen external sources, with cross-checks of data. The database is also replenished from daily user traffic. We constantly seek new sources of information in the field of child protection and cybersecurity and actively collaborate with data scientists, government regulators, safe internet associations, and technology companies.

It is worth mentioning that we actively collaborate with government organizations in the field of child protection. We implement lists from IWF (UK), BPjM (Germany), Arachnid (Canada), CTIRU, as well as data from over 100 private and government organizations. We help companies comply with legal requirements and regulations.

The use of DNS filtering is recommended by CISA. It serves as an effective first layer of protection for your company's network against malicious resources. Using DNS filtering in addition to other cybersecurity solutions significantly reduces the risks of data leaks and cyberattacks.