Introducing Behavior Graphs in Joe Sandbox 13

    Today we are proud to release Joe Sandbox 13! This release includes a number of very cool new features, such as:

    • Support for Windows 10
    • 70 new behavior signatures
    • Analysis advice signatures
    • Static unpackers for VBE and SWF
    • Live system performance statistics in the web interface
    • COM Analysis
    • String analysis in compressed files
    • Static file analysis for Flash
    • Static PE file analysis for dropped / downloaded files
    • New tricks to prevent VM-detection
    • ASN detection for IPs 
    • Code obfuscation detection for Hybrid Code Analysis (HCA)
    • Behavior graphs
    • Hybrid Decompilation (HDC) Plugin
    • Big performance improvements

    Besides the Hybrid Decompilation (HDC) technology, we have also developed a new feature called Behavior graphs. Behavior graphs display the behavior of a sample: they show processes, IPs, domains, dropped files and behavior signatures in a connected graph. The graph coloring is simple and intuitive, and the format is clean and well structured.

    We also invested a lot of brain power into shrinking and compressing the graph so that it stays small and clear.

    Below you will find several behavior graphs with their corresponding Joe Sandbox analysis reports:

    Pure Innovation: Hybrid Decompilation with Joe Sandbox DEC

    Joe Security is proud to announce its latest innovative technology - Hybrid Decompilation (HDC). This unique new feature builds upon Hybrid Code Analysis (HCA) to empower the malware analyst with extensive code analysis capabilities. Existing Joe Sandbox reports already include a hybrid low-level disassembly for each relevant function found during the analysis, which combines information from both static and dynamic analyses. Thanks to the Joe Sandbox DEC plugin implementing HDC, Joe Sandbox reports can now also display an equivalent C high-level source code representation for each function, which constitutes a huge boost to the process of reverse engineering.

    Here is a very simple report extract to illustrate the purpose of this Hybrid Decompilation feature.
    Suppose we have the following disassembly for a function in a Joe Sandbox report:

    Joe Sandbox DEC will generate the following corresponding C source code:

    E00406D20(CHAR* _a4) {
        long _t4;
        void* _t6;
        _t6 = CreateMutexA(0, 1, _a4);
        _t4 = GetLastError();
        if(_t4 == 0xb7) {
            if(_t6 != 0) {
                return ReleaseMutex(_t6);
            }
        }
        return _t4;
    }

    As seen in this example, the source code highlights the function parameters and local variables, and makes its control structures and function calls much more explicit.


    Decompilation 101

    The process of translating machine or assembly code to source code is called decompilation. From a high-level perspective, decompilation is the reverse of compilation: it starts with low-level machine code and builds a higher-level representation in several incremental stages. Decompilation is usually much more efficient and gives better results if it can use symbolic information found in the binary file or in associated debug files.
    Decompilation can also be seen as a natural extension to disassembly, and indeed the first stage of a decompilation engine is a disassembler. But besides the usual difficulty of code and data separation during disassembly, the decompilation process must also solve the following issues:

    • Rebuild function prototypes and infer local variables while getting rid of register and stack references.
    • Generate high-level control structures (if, switch/case, do/while/for loops) from basic jumps and compares.
    • Discover calls to known APIs and libraries (such as PE file imports).
    • Retrieve high-level type information (including compound types such as structures and unions). 
    • Assign the correct arguments to function calls.
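    As a rough illustration of the control-structure step, the hand-written C sketch below expresses the same computation twice: once as flat compare-and-jump code with a label and a goto (close to what a disassembler exposes), and once as the do/while loop a decompiler aims to recover. Both function names are hypothetical; the loop mirrors the inlined string-length loops visible in the Rombertik listing later in this post.

```c
#include <stddef.h>

/* Flat form, close to what a disassembler exposes: a label, an
   increment, a compare and a conditional jump back. */
size_t length_flat(const char *s)
{
    const char *p = s;
    if (*p == '\0')
        return 0;
next:
    p = p + 1;
    if (*p != '\0')
        goto next;
    return (size_t)(p - s);
}

/* Structured form: the do/while loop a decompiler aims to rebuild
   from the same compare-and-jump pattern. */
size_t length_structured(const char *s)
{
    const char *p = s;
    if (*p == '\0')
        return 0;
    do {
        p = p + 1;
    } while (*p != '\0');
    return (size_t)(p - s);
}
```

    Both versions compute the same result; the structured one simply states the loop condition where a reader expects it.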

    This schema presents the global architecture of a generic decompilation engine:

    Decompilation builds on techniques developed initially for compilation, such as control and data-flow analyses, register allocation, loop transformation and alias analyses. But decompilation has its own challenges, and it is usually considered extremely difficult to automatically decompile arbitrary machine code, even more so for obfuscated malware code for which no symbolic information is available. Keeping this in mind, the goal of Joe Sandbox DEC is to provide the user with a fast decompilation of the most relevant functions found in the analyzed sample, together with a measurement of the quality of the decompilation.

    Hybrid Decompilation

    Compared to a generic decompilation engine, Hybrid Decompilation introduces three powerful features:

    • Instead of running on the initial PE file, which may be packed or contain hidden code, HDC runs on PE files generated from dynamic memory snapshots which give an accurate picture of the code which is actually executed.
    • HCA provides input information to HDC, such as known Windows API function calls, discovered string values in use and statement execution status. This is akin to retrieving symbolic information and is very useful for achieving better decompilation results.
    • HDC has an extensive knowledge of Windows API types and function prototypes, thus enabling the use of high-level types in the output source code files.

    These features make HDC a big improvement over a purely static decompilation engine:

    • "Better decompilation code coverage": all function entry points discovered by the powerful heuristics of HCA are made available as decompilation entry points.
    • "Better decompilation quality": in particular, knowledge of indirect call targets as provided by HCA makes decompilation both faster and more precise.
    • "Decompiled source code commenting": observed runtime information such as statement execution status and variable values can be added to the decompiled source code in the form of comments.
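    To picture the commenting step, here is a minimal sketch assuming a hypothetical record that pairs each decompiled statement with the execution status observed at runtime. Statements seen executing get a trailing // executed comment, the convention used in the listings in this post. This is an illustration only, not how Joe Sandbox DEC stores or emits its output.

```c
#include <stdio.h>

/* Hypothetical record: one decompiled statement plus the execution
   status reported for it by the dynamic analysis. */
struct stmt {
    const char *text;
    int executed;
};

/* Writes the statement into out, appending a "// executed" comment when
   it was observed at runtime. Returns the length snprintf reports
   (the output is truncated if out_size is too small). */
int annotate(const struct stmt *s, char *out, size_t out_size)
{
    return snprintf(out, out_size, "%s%s", s->text,
                    s->executed ? " // executed" : "");
}
```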

    Some Hybrid Decompilation Source Code Outputs

    Let us now have a look at some actual examples of HDC-generated C source codes to get a taste of the power of Hybrid Decompilation.
    The first decompiled source code is extracted from the sample studied in blog post

    E0040912A(void* __edi, void* __eflags, long _a4) {
     long _v8;
     long _v12;
     long _v16;
     struct _SYSTEMTIME _v32;
     void* _t13;
     void* _t17;
     void* _t28;
     signed int _t29;
     void* _t30;
     void* _t32;
     CHAR* _t35;
     _t32 = __edi;
     _t35 = _a4;
     GetSystemTime( &_v32);
     if(_v32.wMonth >= 0xb && _v32.wYear >= 0x7da) {
      ExitProcess(0); // executed
     _t13 = E004070C0();
     _t40 = _t13;
     if(_t13 != 0) {
      E00408A06(_t29, __eflags, _t35);
     } else {
     E004084F7(_t29, _t40, _t35);
     _t41 =  *0x4011e8 - 1;
     if( *0x4011e8 == 1) {
     E00409029(_t28, _t30, _t32, _t41);
     _t17 = E00408220();
     if(_t17 != 0) {
      return _t17;
     } else {
      if(_v32.wMonth >= 7 && _v32.wYear >= 0x7da) {
       CreateThread(0, 0, E004098A0, 0, 0,  &_a4);
       CreateThread(0, 0, E00407180, 0, 0,  &_v8);
       CreateThread(0, 0, E00407230, 0, 0,  &_v12);
       if( *0x4011dc == 1) {
        CreateThread(0, 0, E00407A80, 0, 0,  &_v16);
      goto L14;

    The source code clearly highlights the condition on the system time under which the sample immediately terminates (see lines 18 to 20). Thanks to HDC, the comment on line 20 tells us that the evasive behavior was triggered during the analyzed sample’s run. Still, the static component of Hybrid Decompilation shows what happens in the non-evasive case. In particular, several threads may be created at lines 44-48, and the corresponding calls to CreateThread have explicit arguments, including a reference to the function executed by each new thread.
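    The constants in the two date checks read more easily in decimal: 0xb is 11 (November) and 0x7da is 2010. The portable C sketch below restates both conditions over a minimal stand-in for the two SYSTEMTIME fields involved; the struct sys_date type and both function names are our own. It only models the date tests and ignores the other conditions on the path to CreateThread, such as the return value of E00408220.

```c
#include <stdbool.h>

/* Minimal stand-in for the two SYSTEMTIME members read by the function;
   the real Windows structure has more fields. */
struct sys_date {
    unsigned short wYear;
    unsigned short wMonth;
};

/* if(_v32.wMonth >= 0xb && _v32.wYear >= 0x7da) ExitProcess(0); */
bool exits_immediately(struct sys_date t)
{
    return t.wMonth >= 11 && t.wYear >= 2010;   /* 0xb, 0x7da */
}

/* if(_v32.wMonth >= 7 && _v32.wYear >= 0x7da) { CreateThread(...) ... }
   Only reachable when the first check did not end the process; other
   conditions on the path (e.g. the result of E00408220) are ignored. */
bool date_allows_threads(struct sys_date t)
{
    return !exits_immediately(t) && t.wMonth >= 7 && t.wYear >= 2010;
}
```

    Taken together, for any year from 2010 on, the sample terminates in November and December, while the thread-creation branch is only reachable from July to October.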
    Our second decompiled source sample is a function belonging to a PE file dropped by the Rombertik malware (see analysis

    E00401960() {
     void _v5;
     void _v6;
     void _v7;
     int _v12;
     int _v16;
     char _v280;
     long _v292;
     long _v308;
     void* _v316;
     char _v572;
     char _v828;
     char* _t36;
     char* _t39;
     void* _t42;
     intOrPtr* _t44;
     int _t46;
     intOrPtr* _t47;
     intOrPtr* _t51;
     int _t54;
     void* _t59;
     _Unknown_base(*)()* _t62;
     void* _t69;
     void* _t71;
     void* _t74;
     int _t77;
     int _t79;
     long _t82;
     void* _t94;
     void* _t98;
     void* _t99;
     void* _t100;
     _v316 = 0x128;
     _t77 = 0x100;
     _t36 =  &_v828;
     goto L1;
     _t79 = 0x100;
     _t39 =  &_v572;
     do {
       *_t39 = 0;
      _t39 = _t39 + 1;
      _t79 = _t79 - 1;
     } while (_t79 != 0);
     _v12 = 0x100;
     CryptStringToBinaryA("aWV4cGxvcmUuZXhl", 0x10, 1,  &_v572,  &_v12, 0, 0);
     while(1) {
      _t42 = CreateToolhelp32Snapshot(2, 0); // executed
      _v12 = _t42;
      Process32First(_t42,  &_v316); // executed
      do {
       _t44 = "chrome.exe";
       do {
        _t44 = _t44 + 1;
       } while ( *_t44 != 0);
       _t46 = StrCmpNA( &_v280, "chrome.exe", _t44 - "chrome.exe"); // executed
       if(_t46 != 0) {
        _t47 =  &_v572;
        if(_v572 == 0) {
         if(StrCmpNA( &_v280,  &_v572, _t47 -  &_v572) != 0) {
          _t51 = "firefox.exe";
          do {
           _t51 = _t51 + 1;
          } while ( *_t51 != 0);
          if(StrCmpNA( &_v280, "firefox.exe", _t51 - "firefox.exe") != 0) {
           goto L39;
          _t99 = OpenProcess(0x1fffff, 0, _v308);
          if(_t99 == 0) {
           goto L39;
          _t59 = GetProcAddress(GetModuleHandleA("kernel32.dll"), "CreateFileW");
          if(_t59 == 0) {
           goto L39;
          _v6 = 0;
          if(ReadProcessMemory(_t99, _t59,  &_v6, 1, 0) != 0 && _v6 != 0xe9) {
           _t62 = E00402690();
           _t100 = _t100 + 8;
           if(_t62 != 0) {
            _t94 = CreateRemoteThread(_t99, 0, 0, _t62, 0, 0, 0);
            if(_t94 != 0) {
             WaitForSingleObject(_t94, 0xffffffff);
          goto L38;
         if(E00402AE0( &_v828) == _v292) {
          goto L39;
         _t99 = OpenProcess(0x1fffff, 0, _v308);
         if(_t99 == 0) {
          goto L39;
         _t69 = GetProcAddress(LoadLibraryA("Wininet.dll"), "HttpSendRequestW");
         if(_t69 == 0) {
          goto L38;
         _v5 = 0;
         if(ReadProcessMemory(_t99, _t69,  &_v5, 1, 0) == 0 || _v5 == 0xe9) {
          goto L38;
         } else {
          goto L35;
        do {
         _t47 = _t47 + 1;
        } while ( *_t47 != 0);
        goto L20;
       _t71 = E00402AE0( &_v828);
       _t82 = _v292;
       if(_t71 == _t82) {
        goto L39;
       _t99 = OpenProcess(0x1fffff, 0, _t82);
       if(_t99 == 0) {
        goto L39;
       _t74 = GetProcAddress(LoadLibraryA("Ws2_32.dll"), "WSASend");
       if(_t74 == 0) {
        goto L38;
       _v7 = 0;
       if(ReadProcessMemory(_t99, _t74,  &_v7, 1, 0) == 0 || _v7 == 0xe9) {
        goto L38;
       } else {
        goto L35;
       _t98 = _v12;
       _t54 = Process32Next(_t98,  &_v316); // executed
      } while (_t54 != 0);
      if(_t98 != 0) {
       CloseHandle(_t98); // executed
      Sleep(0x1388); // executed
      *_t36 = 0;
     _t36 = _t36 + 1;
     _t77 = _t77 - 1;
     if(_t77 != 0) {
      goto L1;
     } else {
      _v16 = 0x100;
      if(CryptStringToBinaryA("ZXhwbG9yZXIuZXhl", 0x10, 1,  &_v828,  &_v16, 0, 0) == 0) {
       _v16 = 0;
      goto L4;

    The decompiled source code makes it clear that this function is in charge of enumerating all processes (infinite while loop starting at line 48) and of looking for browser names such as “iexplore.exe” (call to StrCmpNA at line 62; the browser name is Base64-encoded and decoded by the call to CryptStringToBinaryA on "aWV4cGxvcmUuZXhl" at line 47), “chrome.exe” (line 57) and “firefox.exe” (line 67). Once a process corresponding to a particular browser is found, the function tries to place a hook in a DLL loaded in the browser's memory, using a different target function for each browser (CreateFileW for Firefox at line 74, HttpSendRequestW for Internet Explorer at line 104 and WSASend for Chrome at line 131). Before hooking, the calls to ReadProcessMemory at lines 81, 109 and 136 read the first byte of the target function, and the code only continues if this byte is not 0xE9, the opcode of a relative JMP (i.e. the function does not already start with a jump). The actual hook injection is performed with a call to CreateRemoteThread at line 88.
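    The two Base64 literals in this function are passed to CryptStringToBinaryA with a length of 0x10 (16 characters) and dwFlags set to 1, the value of CRYPT_STRING_BASE64. The portable C sketch below is a plain textbook Base64 decoder, not the CryptoAPI implementation; it shows that "aWV4cGxvcmUuZXhl" decodes to "iexplore.exe" and that the second literal, "ZXhwbG9yZXIuZXhl", decodes to "explorer.exe". The function names are our own.

```c
#include <stddef.h>
#include <string.h>

/* Value of one Base64 character, or -1 if it is outside the alphabet. */
static int b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decodes NUL-terminated Base64 text into out and NUL-terminates it.
   Stops at the first '=' (padding). Returns out, or NULL on an invalid
   character or when out_size is too small. */
char *b64_decode(const char *in, char *out, size_t out_size)
{
    unsigned int acc = 0;   /* bit accumulator; only the low bits matter */
    int bits = 0;           /* number of pending bits in acc */
    size_t len = 0;

    if (out_size == 0)
        return NULL;
    for (; *in != '\0' && *in != '='; in++) {
        int v = b64_value(*in);
        if (v < 0)
            return NULL;
        acc = (acc << 6) | (unsigned int)v;
        bits += 6;
        if (bits >= 8) {                /* a full byte is available */
            bits -= 8;
            if (len + 1 >= out_size)
                return NULL;
            out[len++] = (char)((acc >> bits) & 0xFFu);
        }
    }
    out[len] = '\0';
    return out;
}

/* Convenience check: does b64 decode to the expected text? */
int b64_decodes_to(const char *b64, const char *expected)
{
    char buf[64];
    const char *got = b64_decode(b64, buf, sizeof buf);
    return got != NULL && strcmp(got, expected) == 0;
}
```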

    Our last decompiled source code example is extracted from the Dyre Banking Trojan. This malware achieves persistence by registering as the “Google Update” system service using the following function:

    E00402900(short* _a4) {
     signed int _t2;
     void* _t5;
     int _t13;
     void* _t20;
     void* _t25;
     _t2 = OpenSCManagerW(0, 0, 2);
     _t20 = _t2;
     if(_t20 != 0) {
      WriteConsoleW(0, 0, 0, 0, 0);
      while(1) {
       _t5 = CreateServiceW(_t20, L"googleupdate", L"Google Update Service", 0xf01ff, 0x10, 
                            2, 1, _a4, 0, 0, 0, 0, 0);
       if(_t5 != 0) {
       if(RtlGetLastWin32Error() != 0x431) {
        return CloseServiceHandle(_t20) | 0xffffffff;
       } else {
        _t25 = OpenServiceW(_t20, L"googleupdate", 0xf01ff);
        if(_t25 == 0) {
         goto L7;
        } else {
         _t13 = DeleteService(_t25);
         if(_t13 != 0) {
         } else {
          goto L7;
       goto L9;
      return 0;
     } else {
      return _t2 | 0xffffffff;

    Once a handle to the service manager is obtained (lines 8-10), the sample tries to create a “Google Update Service” (line 13) in a loop starting at line 12. If it succeeds, it exits the loop (line 16); otherwise it checks whether the service creation at line 13 failed with the error code ERROR_SERVICE_EXISTS, 0x431 (line 18). If so, it tries to delete the existing service (lines 22 to 27) and then loops to retry the malicious service creation (line 29).
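    The numeric arguments of the OpenSCManagerW and CreateServiceW calls map to named constants from the Windows SDK headers winsvc.h and winerror.h; _a4 is passed as the service binary path. The sketch below restates the relevant values so that it compiles without Windows headers; start_type_name is a hypothetical helper added for illustration.

```c
#include <stddef.h>

/* Values as defined in the Windows SDK headers winsvc.h and winerror.h,
   restated so that this sketch compiles without Windows headers. */
#define SC_MANAGER_CREATE_SERVICE 0x0002UL  /* OpenSCManagerW(0, 0, 2)       */
#define SERVICE_ALL_ACCESS        0xF01FFUL /* CreateServiceW: 0xf01ff       */
#define SERVICE_WIN32_OWN_PROCESS 0x0010UL  /* service type: 0x10            */
#define SERVICE_AUTO_START        0x0002UL  /* start type: 2                 */
#define SERVICE_ERROR_NORMAL      0x0001UL  /* error control: 1              */
#define ERROR_SERVICE_EXISTS      1073UL    /* 0x431, tested after a failure */

/* Hypothetical helper naming the start-type values 0 to 4. */
const char *start_type_name(unsigned long start_type)
{
    static const char *const names[] = {
        "SERVICE_BOOT_START", "SERVICE_SYSTEM_START", "SERVICE_AUTO_START",
        "SERVICE_DEMAND_START", "SERVICE_DISABLED",
    };
    if (start_type >= sizeof names / sizeof names[0])
        return NULL;
    return names[start_type];
}
```

    Read with these names, the call registers googleupdate as an auto-start service running in its own process, so the service control manager starts it again at every boot.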


    Thanks to its Hybrid Decompilation technology, Joe Sandbox DEC outputs decompiled functions which are much more readable than the associated disassembly, and thus gives quick and precise insight into what a function does. As a whole, reverse engineering a complex malware becomes more efficient: hard-to-decompile functions are pinpointed, so the analyst can concentrate on studying them and fall back on the still available disassembly code only when necessary.

    Meet Joe Security at IT-SA Security Expo, 6 to 8 October in Nürnberg

    This year Joe Security is going to show its products and technologies at the IT-SA Security Expo in Nürnberg, Germany. IT-SA is one of the biggest security expos and conferences in Europe, with over 390 exhibitors.

    Looking to meet the inventors and engineers behind Joe Sandbox? Don't miss this opportunity and visit us at Stand 29.0 in Hall 12.0!

    Looking forward to seeing you in Nürnberg!

    -- The Joe Security Team

    Hacking Team-inspired Anti-VM Trick Spotted in the Wild

    Two days ago we came across an interesting sample (MD5: 9437eabf2fe5d32101e3fbf9f6027880, source: ThreatWave). The sample was unknown at that time and did not look interesting from a dynamic behavior analysis perspective. However, there were some tiny outliers which caught our attention:

    We first ran the sample on a virtual machine. The overall score was suspicious, but some of the behavior signatures (Joe Sandbox's behavior signature set currently includes over 850 signatures) detected several anti-VM, anti-sandbox and anti-debugging tricks.

    To verify whether the sample had detected the virtual machine, we ran it on a native analysis machine. A native analysis machine is a purely physical machine, such as a real laptop or PC. All our products, including Joe Sandbox Cloud, support analysis on physical machines. Unlike virtual machines or emulators (e.g. QEMU or BOCHS), physical machines cannot easily be detected. In addition, you can directly use an existing laptop or PC from your (company) network environment for analysis. Such a machine is a perfect malware analysis system, since it does not differ from a target system. Some analysis results from the run on the physical machine:

    As the report excerpts show, the sample installed itself persistently and also exhibited some very interesting network behavior. We analyzed the anti-VM, anti-sandbox and anti-debugging tricks in more depth. Here is a list of them:

    • HKEY_LOCAL_MACHINE\HARDWARE\DEVICEMAP\Scsi\Scsi Port 0\Scsi Bus 0\Target Id 0\Logical Unit Id 0  Identifier
    Another interesting trick used by the malware is checking for PCI devices unique to virtual machine hardware:

    What is actually compared are the PCI vendor ID strings VEN_80ee (VirtualBox), VEN_1ab8 (Parallels) and VEN_15ad (VMware). This detection appears to be very similar to the one used by Hacking Team, which was also recently added to Pafish:
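    A portable C sketch of the comparison, assuming the device strings have already been collected (how the sample enumerates PCI devices is not covered here, and the function name is hypothetical). It matches case-insensitively because Windows device instance IDs usually spell vendor IDs in upper case, as in VEN_80EE, while the text lists them in lower case.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* The three vendor IDs named in the text, lower case for matching. */
static const struct {
    const char *vendor_id;
    const char *product;
} vm_pci_vendors[] = {
    { "ven_80ee", "VirtualBox" },
    { "ven_1ab8", "Parallels" },
    { "ven_15ad", "VMware" },
};

/* Returns the product whose vendor ID occurs (case-insensitively) in a
   PCI device string such as "PCI\VEN_80EE&DEV_CAFE", or NULL. Strings
   longer than 255 characters are truncated before matching. */
const char *vm_product_for_pci_device(const char *device)
{
    char lower[256];
    size_t i;
    for (i = 0; device[i] != '\0' && i < sizeof lower - 1; i++)
        lower[i] = (char)tolower((unsigned char)device[i]);
    lower[i] = '\0';
    for (i = 0; i < sizeof vm_pci_vendors / sizeof vm_pci_vendors[0]; i++)
        if (strstr(lower, vm_pci_vendors[i].vendor_id) != NULL)
            return vm_pci_vendors[i].product;
    return NULL;
}
```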

    We have updated all our products to evade this detection on virtual machines. Some full Joe Sandbox 12.5.0 analyses:

    The Power of Execution Graphs 2/3


    This is the second part of our three-part “Power of Execution Graphs” blog series. The first part, which introduces Execution Graphs, can be found here.

    As you may recall, Execution Graphs are highly condensed control flow graphs showing which parts of the code have been executed and which have not. Execution Graphs highlight additional attributes such as API calls, thread starts and key decisions.

    Analyzing Packers

    In this blog post, we are going to focus on an interesting sample which we have already analyzed with pure Hybrid Code Analysis (HCA). The sample includes various sandbox detection tricks, including one trick that specifically identifies Joe Sandbox. In the following, we outline how to spot these tricks by using the Execution Graph.

    The analyzed sample relies on packing and encryption as a first layer of evasion. This technique is quite challenging to inspect manually from the PE file and generally poses a major problem for static analysis approaches. Hybrid Code Analysis is resilient against packing and therefore facilitates the analysis of unpacked code.

    Let us start by having a look at the Execution Graph summary tab:

    The first striking fact is that 99% of the code is tagged as “Dynamic/Decrypted”. When looking at the beginning of the Execution Graph in more detail, we notice the following:

    • The code starts by allocating dynamic memory using NtAllocateVirtualMemory native API calls.
    • Once the allocation is performed, the code reaches the node labeled 401065 which is flagged as Unpacker code. At this point, the code is written into the previously allocated memory sections.
    • After execution, the unpacker code branches to the dynamically generated code.
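    The allocate-then-write part of this pattern can be modelled in portable C. This is only a model under stated assumptions: malloc stands in for NtAllocateVirtualMemory, a single-byte XOR key stands in for whatever decoding the real unpacker performs (the post does not describe it), unpack_xor is a hypothetical name, and the final branch into the new buffer is deliberately left out.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical unpacking step: allocate a fresh buffer (malloc stands in
   for NtAllocateVirtualMemory) and write the decoded bytes into it.
   Single-byte XOR is an assumption; the real decoding is not described.
   The branch into the buffer is deliberately not reproduced.
   The caller owns (and must free) the returned buffer. */
unsigned char *unpack_xor(const unsigned char *packed, size_t n,
                          unsigned char key)
{
    unsigned char *dst = malloc(n > 0 ? n : 1);
    if (dst == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        dst[i] = (unsigned char)(packed[i] ^ key);
    return dst;
}
```

    Because the decoded bytes only exist in freshly allocated memory at runtime, a purely static view of the PE file never sees them, which is why the graph tags almost all of this code as dynamic.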

    By clicking on the node 401065, we can check that it indeed contains unpacker code:

    Checking the basic blocks leads us to the unpacker code itself:

    Similarly, by clicking on the following node 164a00, we can see that the corresponding code is located in dynamically allocated memory (as are almost all nodes reachable from this point):

    Please note that nodes are highlighted in different colors depending on whether their code is dynamic or unpacked:

    There are three different branches starting from node 401065. By clicking on node 401065 (which covers several basic blocks) and then following the hyperlink to basic block 4010CC, we jump to the following disassembled code:

    The computed call “call edx” at virtual address 04010E5 represents the execution branching to the unpacked code.

    Sandbox Evasion

    Hybrid Code Analysis found three potential call targets, represented by the three target nodes. The executed code starting at node 164a00 is the most interesting with respect to its behavior.

    The sub-graph below outlines the various evasion tricks:

    • The sample first checks its file name (call to GetModuleFileName) and may stall by sleeping (branch to node 164838) if the name looks suspicious, e.g. in this case a file name like “sample”.
    • After checking the serial ID information of the volume C: (call to GetVolumeInformationA), it may stall again if it matches a given magic value (sandbox detection via disk serial number).
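    The two stall conditions can be summarized in a small decision function. In this sketch the inputs are passed in directly rather than obtained via GetModuleFileName and GetVolumeInformationA, the file name test is a simple case-sensitive substring match, and since the post does not give the magic serial number, HYPOTHETICAL_SANDBOX_SERIAL is a placeholder.

```c
#include <stdbool.h>
#include <string.h>

/* Placeholder: the post does not give the sample's magic serial. */
#define HYPOTHETICAL_SANDBOX_SERIAL 0x12345678UL

/* Mirrors the two stall conditions: a suspicious module file name or a
   known volume serial number for C:. Returns true if the sample would
   stall instead of continuing. */
bool should_stall(const char *module_path, unsigned long volume_serial)
{
    if (strstr(module_path, "sample") != NULL)
        return true;
    return volume_serial == HYPOTHETICAL_SANDBOX_SERIAL;
}
```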

    Here both evasions fail: as the node coloring illustrates, execution proceeds.
    Later in the code, at node 1548dc, the sample tries to detect whether it is running on a virtual machine. To do so, it reads the disk names from the registry key System\CurrentControlSet\Services\Disk\Enum and compares them to well-known product names such as VMWare. Checking disk names for virtualization products is a well-known anti-VM trick which we see in nearly 70% of all samples.
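    The disk-name check can be sketched as a case-insensitive substring search. Only VMware is named in the text; the "vbox" and "qemu" markers are assumed additions of the kind commonly used in such checks, reading the Disk\Enum registry value is left out, and disk_looks_virtual is a hypothetical name.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* "vmware" is named in the text; "vbox" and "qemu" are assumptions. */
static const char *const vm_disk_markers[] = { "vmware", "vbox", "qemu" };

/* disk_id: a disk identifier as read from Disk\Enum (registry access
   omitted). Returns true if it contains a known VM product marker.
   Identifiers longer than 255 characters are truncated. */
bool disk_looks_virtual(const char *disk_id)
{
    char lower[256];
    size_t i;
    for (i = 0; disk_id[i] != '\0' && i < sizeof lower - 1; i++)
        lower[i] = (char)tolower((unsigned char)disk_id[i]);
    lower[i] = '\0';
    for (i = 0; i < sizeof vm_disk_markers / sizeof vm_disk_markers[0]; i++)
        if (strstr(lower, vm_disk_markers[i]) != NULL)
            return true;
    return false;
}
```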

    Finally, one last check has more success and the execution ends up stalling in an endless Sleep loop:

    Selecting the key-decision node 16499d shows us the disassembly, which indicates the trick is related to the registry key AutoItv3CCleanerWIC:

    The code enumerates all software uninstallers, which yields a list of all software installed on the machine. The fingerprint AutoItv3CCleanerWIC is then used to check whether AutoIt, CCleaner and WIC are installed. If so, the sample falls asleep. AutoIt and CCleaner are two additional tools we often install on machines to make administration easier. Most likely, the people behind this malware extracted the fingerprint by using our free Joe Sandbox Cloud Basic online service.
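    How exactly the sample compares the collected list with AutoItv3CCleanerWIC is not shown. One plausible reading of a single concatenated string is that the uninstall entry names are joined in enumeration order and compared as a whole. The sketch below implements that hypothesis only, with the entry names passed in instead of read from the registry; matches_fingerprint is a hypothetical name.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothesis: join the uninstall entry names in enumeration order and
   compare the result with the fingerprint string. */
bool matches_fingerprint(const char *const *names, size_t count,
                         const char *fingerprint)
{
    char joined[512] = "";
    size_t used = 0;
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(names[i]);
        if (used + len >= sizeof joined)
            return false;   /* too long to match; keeps the sketch simple */
        memcpy(joined + used, names[i], len + 1);   /* copies the NUL too */
        used += len;
    }
    return strcmp(joined, fingerprint) == 0;
}
```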

    Process Injection

    Besides the evasion tricks, the Execution Graph can also be browsed for hidden or non-executed functionality, in the form of suspicious sub-graphs. Here is an example:

    This sub-graph outlines a remote process injection technique which was not executed during the analysis but can still easily be found in the graph. The various edges to the lower nodes are error handling paths, e.g. if CreateToolhelp32Snapshot fails, CloseHandle is called directly. The code is quite extensive and spans several functions. Thanks to the condensed and connected graph, it is easy to detect and understand.

    An Execution Graph often consists of a main graph as well as several independent graphs:

    The main graph contains executed nodes (marked in red), while the independent graphs do not contain any executed code. The reason is that it is difficult to generate a completely connected graph. For example, consider the non-executed instruction call eax, where eax is computed beforehand: since the instruction was never executed, it is not possible to determine which code location it would call.

    In order to focus on the main graph, we added a new feature to hide independent graphs. Simply click on the Hide Nodes/Edges label found at the top-left of the Execution Graph panel to hide independent graphs and focus on the main graph. Click again to restore the full view.

    Graph based Signatures

    Of course, manually browsing through the Execution Graph is not the only way to detect evasive behavior. Execution Graph Analysis uses an extensive set of behavior signatures to detect evasion tricks automatically. A nice feature we have recently added is the ability to jump from a signature to the matching Execution Graph nodes by using the links in the report:


    The sample covered in this blog post uses a wide range of techniques to avoid detection by sandboxes. But thanks to Execution Graph Analysis, the following information could be obtained quickly:

    • The execution starts by dynamically generating code. The Execution Graph makes it easy to find both the unpacker code and the newly generated code.
    • The unpacked code uses various evasion tricks that Execution Graph Analysis automatically detected and rated as malicious. The evasion tricks can be further analyzed in-depth by navigating from the signature hits to the Execution Graph nodes and from the nodes to the disassembly code.
    • Besides the detection of evasive behavior, the Execution Graph provides a good way of detecting complex malware functionalities (such as remote process injection) in the form of sub-graphs.
    Stay tuned for our last blog post in our Power of Execution Graphs series! 

    Report available at:

    Dynamically Analyze Office Macros by instrumenting VBE


    As you all know, Microsoft Office documents have become a new attack vector. By embedding macro code, attackers can easily deliver exploit or dropper code to victims by e-mail. Since executable files such as exe, scr or cpl files are usually blocked as e-mail attachments, Office documents remain one of the last options. However, a further obstacle is that macros are often disabled on the victim's host, so the code will not be executed directly. To lure the user into enabling macros, various social engineering tricks are used:

    Macros can very easily be analyzed statically: one parses the document structure, searches for OLE streams and then extracts the VBA code:

    Signatures can be used to detect suspicious API calls inside the code:

    Writing static deobfuscators is a dead end

    Such static signatures have been part of Joe Sandbox since we first saw malicious Office documents with macro payloads. As you may guess, it did not take long before the macro code was no longer easily human-readable but obfuscated at the source code level:

    Such obfuscations are simple and work well to evade static signatures on the code. To recover the clean code, one may develop deobfuscators. However, this is a dead end. First, it is always reactive: you have to understand the obfuscation technique before you can write a deobfuscator. Second, it is very easy to randomize obfuscations. Finally, it takes time and effort to develop each new deobfuscator. For instance, the following code does not use any Chr-based string obfuscation but rather a more complex algorithm (note that all the variables are named after people):
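    To make the fragility concrete, here is a deliberately naive static deobfuscator sketch for the simple Chr-based pattern. It is our own illustration, not a Joe Sandbox component, and decode_chr_chain is a hypothetical name: it recovers Chr(104) & Chr(105) as "hi", but a trivial variation such as Chr(52 + 52) already defeats it, and a different function name or algorithm would require new code.

```c
#include <stdlib.h>
#include <string.h>

/* Deliberately naive static deobfuscator for one pattern only: chains
   such as Chr(104) & Chr(105). Each Chr(<decimal literal>) becomes one
   character; everything else is ignored. out_size must be at least 1. */
char *decode_chr_chain(const char *src, char *out, size_t out_size)
{
    size_t len = 0;
    const char *p = src;

    while ((p = strstr(p, "Chr(")) != NULL && len + 1 < out_size) {
        char *end;
        long v = strtol(p + 4, &end, 10);
        /* only a plain literal directly followed by ')' is understood */
        if (end != p + 4 && *end == ')' && v >= 0 && v <= 255)
            out[len++] = (char)v;
        p += 4;
    }
    out[len] = '\0';
    return out;
}
```

    Here the first Chr call of "Chr(52 + 52) & Chr(105)" is silently skipped and only "i" comes out, which is exactly the kind of gap an obfuscator author can exploit.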

    Dynamically Analyzing VBA Code by instrumenting VBE

    The solution to the obfuscation problem of VBA code is dynamic analysis. We have successfully instrumented the Visual Basic runtime interpreter in order to track code execution. We have already used the same approach to capture JavaScript compilation and DOM modification events in Internet Explorer, which greatly helps to understand obfuscated JavaScript and browser exploits:

    The VBE instrumentation we have added to Joe Sandbox allows us to see live VBA data, for instance string decryption:


    Signatures to detect suspicious strings inside decrypted data:

    The cool thing about the VBE instrumentation is that, as long as the VBA code is executed, it lets us see everything, no matter how sophisticated the obfuscation is. In addition, it enables Joe Sandbox to inspect live execution data for malware written in Visual Basic. Many APTs have a crypter or obfuscation stub written in VB.


    Using pure static analysis to deobfuscate the source code of script languages is a dead end. Developing a deobfuscator takes a lot of time, while it is very easy to randomize or change the obfuscation in order to evade it. Custom dynamic analysis which instruments the script interpreter core does not care about code obfuscation: it sees everything, including decrypted data. This feature facilitates the malware reverse engineering and analysis process, and makes generic detection more robust.

    Full Analysis Report:

    The Power of Execution Graphs Part 1/3


    We have been quite busy and will soon release Joe Sandbox 12. It is one of the biggest releases we have made so far and includes several new features, such as:

    • Execution graphs
    • Yara rule generator (see
    • MITM SSL proxy to inspect HTTPS (credits to Daniel Roethlisberger)
    • 63 behavior signatures
    • Behavior signatures to detect unpacked / dynamic code
    • More than 10 behavior signatures to detect evasive behavior
    • Score algorithm with lower FP and FN
    • System event logging
    • Slim PCAPs
    • Per process memory and CPU stats
    In this and two follow-up blog posts we are going to outline a new feature called Execution Graphs. 

    Evading sandboxes is a key feature of today’s advanced threats. To do so, malware uses various tricks to check whether it is running on an analysis system, such as trying to detect whether the current system is a virtual or emulated machine, or checking whether it is being debugged or analyzed. In such cases, the malware will keep a low profile and avoid exhibiting its actual malicious behavior, potentially evading detection by the malware analysis system. The latest threats also implement generic evasion such as validating user behavior or time and sleep tricks (see blog post and

    Since version 7, released in 2012, Joe Sandbox has implemented a variety of techniques to prevent or detect evasive malware. These include execution on native systems, analysis of non-executed functions through Hybrid Code Analysis (HCA), specific signatures for identifying evasive patterns, and cookbooks.

    In recent months we have seen a strong increase in more sophisticated evasion techniques which are harder to find. We have therefore decided to make this topic a key part of Joe Security’s research roadmap.

    Execution Graphs

    One of the new features we added to Joe Sandbox 12 is Execution Graphs. Execution Graphs have been designed both to spot evasions automatically and to help analysts quickly understand how the malware implements the evasion.

    In general, an Execution Graph is a highly condensed control flow graph with a focus on API-rich paths. Since it is highly compressed, it is easier to understand than a full control flow graph. The graph is composed of nodes representing sections of code and edges corresponding to the control flow (calls, jumps, etc.) of the malware. Each node is labeled with the set of API calls it executes. Nodes are colored to highlight additional properties:

    • Yellow: the node is a program / thread entry point or a top level function
    • Orange: the code has been triggered during execution
    • Red: the code has been unpacked and executed
    • Grey / blackish: the code has not been executed
    Different shapes are used for highlighting graph locations. The diamond-shaped nodes correspond to so-called key decision nodes, in the sense that the process decides at this node to avoid execution of a branch which could lead to interesting key behavior. Thus key decision nodes are especially relevant when browsing the execution graph for evasive behavior. Note that determining whether a decision node is key depends on the execution status of the nodes reachable through its branches (one branch should lead to executed APIs, the other to different non-executed APIs), thus different executions may lead to different key decision nodes.
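    The key-decision rule can be made concrete with a small C sketch. The graph encoding (two successors per node, an executed flag and an API bitmask), the names and the exact test are our own simplifications of the rule stated above, not Joe Sandbox's implementation.

```c
#include <stdbool.h>

#define MAX_NODES 64   /* sketch limit: node indices must stay below this */

/* Hypothetical node encoding: up to two successors (-1 = none), the
   execution status and a bitmask of the APIs the node calls. */
struct node {
    int succ[2];
    bool executed;
    unsigned apis;
};

/* Walks everything reachable from n and splits the APIs found into
   those of executed nodes and those of non-executed nodes. */
static void collect_apis(const struct node *g, int n, bool *seen,
                         unsigned *exec_apis, unsigned *idle_apis)
{
    if (n < 0 || seen[n])
        return;
    seen[n] = true;
    if (g[n].executed)
        *exec_apis |= g[n].apis;
    else
        *idle_apis |= g[n].apis;
    collect_apis(g, g[n].succ[0], seen, exec_apis, idle_apis);
    collect_apis(g, g[n].succ[1], seen, exec_apis, idle_apis);
}

/* Simplified rule: node n is a key decision if one branch reaches
   executed APIs and the other branch reaches only non-executed APIs,
   some of which the first branch does not call. */
bool is_key_decision(const struct node *g, int n)
{
    bool seen_a[MAX_NODES] = { false }, seen_b[MAX_NODES] = { false };
    unsigned exec_a = 0, idle_a = 0, exec_b = 0, idle_b = 0;

    if (g[n].succ[0] < 0 || g[n].succ[1] < 0)
        return false;                    /* not a two-way decision */
    seen_a[n] = seen_b[n] = true;        /* do not walk back through n */
    collect_apis(g, g[n].succ[0], seen_a, &exec_a, &idle_a);
    collect_apis(g, g[n].succ[1], seen_b, &exec_b, &idle_b);
    return (exec_a != 0 && exec_b == 0 && (idle_b & ~exec_a) != 0) ||
           (exec_b != 0 && exec_a == 0 && (idle_a & ~exec_b) != 0);
}
```

    For instance, a decision whose first branch reaches an executed ExitProcess node and whose second branch reaches only non-executed networking APIs is reported as key; if both branches had executed, it would not be, which matches the remark that different executions may yield different key decision nodes.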

    The following figure shows the initial part of the execution graph for our demo sample (MD5: 0af4ef5069f47a371a0caf22ae2006a6). 

    Notice how the first few nodes after the entry point (colored yellow) have an orange/red color, while the other nodes are grey/black? Recall that red coloring indicates that the corresponding code has been executed, while black is used for non-executed code.

    When zooming in on the graph entry node, the following control-flow pattern appears:

    The sample execution graph clearly exhibits a very straightforward evasive behavior: there is a key decision point where the GetSystemTime API is called, followed by another key decision and a call to the ExitProcess API. All these nodes are colored in red and were thus executed, while the part of the graph starting at GetVersionExA was not executed (grey and black color). The full execution graph includes a lot of non-executed malicious behavior not shown here. The green edges represent so-called rich paths, which allow the analyst to track the most API-intensive paths of the execution graph, independently of their actual execution status. A path is considered "intensive" if many APIs that typically appear in malicious code are called along it. Here the rich path leads to a non-executed part of the graph:
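    A simple way to approximate such a rich path is a longest-path search over API weights. In the C sketch below, the node encoding, the weights and the function names are assumptions (the text only says that paths with many malware-typical APIs count as intensive); the approach is a standard longest-weighted-path recursion on an acyclic graph, not necessarily the one Joe Sandbox uses.

```c
/* Hypothetical node encoding: up to two successors (-1 = none) and a
   weight counting the malware-typical APIs the node calls. Execution
   status is deliberately ignored, as for rich paths. */
struct rnode {
    int succ[2];
    int api_weight;
};

/* Largest total API weight along any path starting at node n.
   Assumes an acyclic graph; plain recursion without memoization is
   fine for a small example but exponential in the worst case. */
int richest_path_weight(const struct rnode *g, int n)
{
    int a, b;
    if (n < 0)
        return 0;
    a = richest_path_weight(g, g[n].succ[0]);
    b = richest_path_weight(g, g[n].succ[1]);
    return g[n].api_weight + (a > b ? a : b);
}

/* Successor of n that continues the richest path (-1 at a leaf). */
int rich_successor(const struct rnode *g, int n)
{
    int s0 = g[n].succ[0], s1 = g[n].succ[1];
    if (s0 < 0 || s1 < 0)
        return s0 < 0 ? s1 : s0;
    return richest_path_weight(g, s0) >= richest_path_weight(g, s1) ? s0 : s1;
}
```

    Following rich_successor from the entry node traces the path an analyst would see highlighted, whether or not that code ran.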

    The blue edges represent thread creations, and the yellow nodes are thread entry points. In the given sample each created thread has its own malicious payload:

    • Thread 4098a0: its task is to terminate debugging tools and antivirus software. Function 4095e0 is registered as a callback using the EnumWindows API: it enumerates all top-level windows and checks their titles against strings such as "avast", "avira" or "kaspersky", among many others. If a title matches, the process is killed instantly.

    • Thread 407230 is in charge of persistence and installation behavior.
    • Thread 407180 spreads its main executable to external drives, as shown by its checks for available system drives and its use of API call chains often found in USB drive infection routines (GetDriveType, CopyFile, SetFileAttributes).

    • Thread 407a80: parses remote commands. It is the main payload thread which acts as a broker.

    The structure of the graph, as well as all additional properties such as execution coverage or decision nodes, is passed directly to the signature interface of Joe Sandbox. This makes it possible to write behavior rules that detect evasive behavior.

    We can navigate between the execution graphs and the corresponding assembly code. In the case of sample MD5 0af4ef5069f47a371a0caf22ae2006a6, we can see that the current system time returned by GetSystemTime is checked in the code associated with the key decision nodes, and that, depending on its value, the sample decides either to exit the process or to continue execution:

    Same for the command handler found in thread 407a80:


    Execution graphs are a powerful tool for detecting and understanding evasive behavior. Thanks to their form, coloring and node shapes, evasion patterns can be spotted very efficiently. Since the graph is reduced and simplified, this also works for very complex and extensive code. The structure of the graph and all attributes are fed to the Joe Sandbox signature interface, so evasive behavior can easily be rated and classified within seconds. Since the graph describes the complete behavior and not just the executed path, any behavior can be rated and classified.

    During development, execution graphs have already proven to be very useful. We will therefore present some of our detections of more complex behavior and evasion in two additional blog posts. Stay tuned!

    Example Reports for the sample used in the post: