Big Data Analysis with Astraea Technology
Astraea technology is the key “cloud cyberbrain” of the Kaspersky Security Network (KSN), another element of Kaspersky Lab’s multi-layered, next-generation protection.
The system aggregates, in real time, all the statistics and meta-information collected worldwide about suspicious activities and threats, and produces detection decisions on malicious objects. This information then becomes immediately available to all users through Kaspersky Security Network.
Every day, more than 80 million users benefit from the Kaspersky Security Network cloud service. Kaspersky Lab’s products request and receive information on the reputation of objects, and share statistics and meta-information about suspicious objects. This results in a daily stream of hundreds of millions of notifications and hundreds of gigabytes of data.
All of this data is forwarded to an expert filtering and detection system called Astraea. The system checks the incoming data for consistency to prevent even hypothetical attempts at data manipulation. The data is then accumulated in a big data store of objects such as files and URLs, together with the corresponding meta-information and the links between them.
For example, a product could send information about a suspicious object, like:
- Object 0xc9e13b88a6f745096f7cf4b232aad4d41054b32d464c5bed95aa7de216bc22a0
- name of the object is “revised invoice and packing list.docx.exe”
- the object is located in archive “revised invoice and packing list.docx.zip”
- the object was started from filepath c:\windows\temp
- the object is not signed
- etc.
After aggregating the incoming information, it is possible to generate knowledge like:
- When a particular file first became known in the world
- Full list of URLs the file was downloaded from or sent requests to
- Full list of paths where it was ever stored on disk
- Full list of detections against the file, if any occurred
- Full list of processes that started the file
- File prevalence and its change over time
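The aggregation step can be illustrated with a minimal Python sketch. It is not Astraea's actual implementation: the `Notification` and `ObjectKnowledge` types, their field names, and the `aggregate` function are all assumptions made for illustration. The sketch folds individual product reports into per-object knowledge, keyed by the object's hash.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Notification:
    """One hypothetical product report about an object (field names are assumptions)."""
    object_hash: str
    seen_at: datetime
    file_path: Optional[str] = None
    source_url: Optional[str] = None
    parent_process: Optional[str] = None
    detection_name: Optional[str] = None


@dataclass
class ObjectKnowledge:
    """Facts accumulated about one object across all reports."""
    first_seen: Optional[datetime] = None
    urls: set = field(default_factory=set)
    paths: set = field(default_factory=set)
    parents: set = field(default_factory=set)
    detections: set = field(default_factory=set)
    # Reports per calendar day: a crude stand-in for prevalence over time.
    reports_per_day: dict = field(default_factory=lambda: defaultdict(int))


def aggregate(notifications):
    """Fold a stream of notifications into per-object knowledge, keyed by hash."""
    knowledge = defaultdict(ObjectKnowledge)
    for n in notifications:
        k = knowledge[n.object_hash]
        # Keep the earliest timestamp at which the object was reported.
        if k.first_seen is None or n.seen_at < k.first_seen:
            k.first_seen = n.seen_at
        if n.source_url:
            k.urls.add(n.source_url)
        if n.file_path:
            k.paths.add(n.file_path)
        if n.parent_process:
            k.parents.add(n.parent_process)
        if n.detection_name:
            k.detections.add(n.detection_name)
        k.reports_per_day[n.seen_at.date()] += 1
    return knowledge
```

Using sets means that repeated reports of the same path or URL are stored once, while the per-day counter still records how often the object was reported.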
Each object is verified against a large list of indicators created by experts and expert systems. For example, it could be important to check:
- If the file has a double extension at the moment it is run (“MyPhotos.jpg .exe”)
- If the file is located in the folder C:\Windows\System32 although it is packed and has the file attribute “hidden”
- If the file has an outdated extension (say, “.com”, “.pif”, etc.)
- If the filename is very similar to that of a trusted system file, with just a single difference (say, “svcnost.exe”)
- If the file was downloaded by an object that is already known to be malicious
- etc.
After passing through the list of rules, each object receives a calculated risk score, which Astraea uses to make an expert decision on whether the object is malicious or not. Therefore, the more information is collected about an object, the more precise the automatic conclusion can be. In some cases there may still not be enough information about the object to reach a verdict. If so, the rating is recalculated later, once extra information has been collected.
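The indicator checks and the risk score can be sketched roughly as follows. The source does not describe how Astraea combines indicators, so the weighted sum, every weight, both thresholds, the lists of names and extensions, and the `ObjectFacts` type are assumptions chosen only to make the idea concrete.

```python
import ntpath
from dataclasses import dataclass

# Every name, weight, and threshold below is an illustrative assumption.
TRUSTED_NAMES = {"svchost.exe", "explorer.exe", "lsass.exe"}
OUTDATED_EXTENSIONS = {".com", ".pif"}
EXECUTABLE_EXTENSIONS = {".exe", ".scr"} | OUTDATED_EXTENSIONS
DOCUMENT_EXTENSIONS = {".jpg", ".docx", ".pdf"}
MALICIOUS_THRESHOLD = 70  # score at or above which the object is called malicious
MIN_REPORTS = 10          # below this many reports a low score is not trusted yet


@dataclass
class ObjectFacts:
    """Hypothetical aggregated facts about one file."""
    file_name: str
    folder: str = ""
    is_packed: bool = False
    is_hidden: bool = False
    downloaded_by_malicious: bool = False


def has_double_extension(f):
    # "MyPhotos.jpg .exe" splits into stem "myphotos.jpg " and extension ".exe".
    stem, ext = ntpath.splitext(f.file_name.lower())
    inner = ntpath.splitext(stem.rstrip())[1]
    return ext in EXECUTABLE_EXTENSIONS and inner in DOCUMENT_EXTENSIONS


def hidden_packed_in_system32(f):
    in_system32 = f.folder.lower().rstrip("\\") == r"c:\windows\system32"
    return in_system32 and f.is_packed and f.is_hidden


def has_outdated_extension(f):
    return ntpath.splitext(f.file_name.lower())[1] in OUTDATED_EXTENSIONS


def mimics_trusted_name(f):
    # True when the name differs from a trusted name by exactly one substituted character.
    name = f.file_name.lower()
    return any(
        len(name) == len(t) and sum(a != b for a, b in zip(name, t)) == 1
        for t in TRUSTED_NAMES
    )


INDICATORS = [
    (has_double_extension, 40),
    (hidden_packed_in_system32, 30),
    (has_outdated_extension, 15),
    (mimics_trusted_name, 35),
    (lambda f: f.downloaded_by_malicious, 50),
]


def risk_score(f):
    """Sum the weights of all indicators that fire for this object."""
    return sum(weight for check, weight in INDICATORS if check(f))


def verdict(f, reports_seen):
    """Return 'malicious', 'not malicious', or 'undecided' (re-score later)."""
    if risk_score(f) >= MALICIOUS_THRESHOLD:
        return "malicious"
    if reports_seen < MIN_REPORTS:
        return "undecided"
    return "not malicious"
```

In this sketch an "undecided" result corresponds to the case where too little information has been collected: the object would simply be scored again once more reports arrive.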
Once Astraea generates its verdict on an object, it transfers the verdict to the Kaspersky Security Network cloud service, so that it immediately reaches users all over the world.
It is important to note that the system logic is not static: the system is continuously self-trained. In a world where malware writers always verify their code against detection by security solutions and arm it with new techniques, a system of indicators can become outdated, which easily leads to a lower detection rate and more false positives. This means that the indicators, both individually and as a whole list, should be tested for efficiency and updated dynamically, based on information collected from Kaspersky Lab’s database and on expert knowledge.
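One simple way to picture testing an indicator for efficiency is to measure how often it fires on samples already known to be malicious versus samples known to be clean. The sketch below is only an assumption about what such a test could look like. The function names, the labeled-sample format, and the review thresholds are hypothetical and are not taken from Astraea.

```python
def indicator_stats(indicator, labeled_samples):
    """Return (detection_rate, false_positive_rate) for one indicator.

    labeled_samples is a sequence of (sample, is_malicious) pairs.
    """
    hits_malicious = hits_clean = total_malicious = total_clean = 0
    for sample, is_malicious in labeled_samples:
        fired = bool(indicator(sample))
        if is_malicious:
            total_malicious += 1
            hits_malicious += fired
        else:
            total_clean += 1
            hits_clean += fired
    detection_rate = hits_malicious / total_malicious if total_malicious else 0.0
    false_positive_rate = hits_clean / total_clean if total_clean else 0.0
    return detection_rate, false_positive_rate


def indicators_to_review(indicators, labeled_samples, min_detection=0.05, max_fp=0.01):
    """Name the indicators that rarely fire on malware or fire too often on clean objects.

    indicators maps a name to a callable; both thresholds are arbitrary assumptions.
    """
    samples = list(labeled_samples)
    flagged = []
    for name, indicator in indicators.items():
        detection_rate, false_positive_rate = indicator_stats(indicator, samples)
        if detection_rate < min_detection or false_positive_rate > max_fp:
            flagged.append(name)
    return flagged
```

Running such a check periodically against freshly labeled data would show which indicators have lost their value and need to be reweighted, rewritten, or retired.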
Since its launch in 2012, the share of new detections created by Astraea out of the total number of new detections increased from 7.53% to 40.5% by the end of 2016 (323,000 new detections daily), with a total of one billion unique malicious files.