MACE: Model-Inference-Assisted Concolic Exploration for Protocol and
Vulnerability Discovery, C. Y. Cho, et al., USENIX Security 2011.

Synopsis by Andrew Chi, 2013 December 1

Problem addressed:
==================

Symbolic execution can get stuck in small local state-subspaces (e.g.,
loops in the code). In certain classes of applications such as network
protocols, there exists a finite state machine (FSM) that is a good
model for the overall behavior of the application. In such cases, the
protocol FSM can be used to deepen the coverage of symbolic execution
by forcing it to skip to different parts of the state space. The trick,
as shown in this paper, is to do this (almost) automatically.

Basic approach:
===============

0. Pick implementations of network protocols (i.e., RFB and SMB) where
   the overall model (FSM) has a good chance of being discoverable from
   external observation. Manually create equivalence classes of
   outputs, named "abstract outputs" in the paper.

1. Begin with some seed input/output pairs, such as from a regression
   test suite. Use L* to build a minimal model of the protocol FSM.

2. Run the DART (?) concolic execution engine, exploring the "vicinity"
   of the abstract states in the FSM, creating more input/output pairs.
   Filter out pairs where the output (when mapped to an abstract
   output) has already been seen.

3. Iterate steps 1 and 2 until the FSM model converges. In the
   meantime, record the input/output behavior of step 2. If there were
   any crashes (e.g., critical exceptions), record them, and later
   deduplicate them based on the location of the crash.

Key insight/innovation:
=======================

Pairing concolic exploration with an automatic protocol discovery tool
can produce mutual benefit:

1. The concolic exploration can be "seeded" to start at the protocol
   FSM states, thereby skipping over time sinks such as loops and
   increasing code/behavior coverage.

2. The automatic protocol discovery is helped by symbolic execution,
   which can automatically generate more input/output pairs by taking
   branches in the code that were not traversed previously.

Pros/cons: important problem, assumptions, evaluation, approach
===============================================================

One still has the manual work of deciding between equivalence classes
of outputs. And this seems like an art -- one is trying to decide which
variations in output actually cause traversal of different parts of the
overall state machine.

The Mealy machine will still be incomplete if there is a closed
subspace of messages. As the authors mention, "[MACE] might not
discover some types of messages required to infer the full state
machine of the protocol". Case in point: they discover only 23 of the
67 SMB message types. But I guess it's better than before.

(Section 4.4) The model assumes that 1 input maps to 1 output. When
there are either 0 or >1 outputs, they had to introduce a bit of a
kludge: an artificial no-response message for the first case, and
ignoring all but the first output message for the second. Many
protocols are certainly not a 1-1 mapping between input and output
packets. TCP ACKs are cumulative, so multiple input packets can be
ACKed with one output reply packet. DNS zone transfer behaves
differently again: send me the list of stuff that I haven't seen yet.
The model needs to be stronger.

For a complex protocol such as SMB, there is more internal consistency
required in the protocol messages than this technique is likely to
discover automatically. "The concrete messages generated... often had
invalid message parameters, so the server would simply respond with an
error." This may be a fundamental limitation.

They basically use "does it crash" / "does it hang" to decide whether a
vulnerability has been discovered. In practice, many insidious
vulnerabilities don't crash the application, but rather cause it to
misbehave silently.
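
To make the one-input/one-output workaround from the Section 4.4 point
above concrete, here is a minimal Python sketch. It is not the authors'
code: NO_RESPONSE, normalize_response, abstract_trace, and toy_abstract
are names I made up for illustration, and the first-byte abstraction is
an assumption standing in for the manually built abstract outputs.

```python
from typing import Callable, List, Sequence

# Hypothetical symbol for the artificial "no-response" output.
NO_RESPONSE = "NO_RESPONSE"


def normalize_response(
    outputs: Sequence[bytes],
    abstract: Callable[[bytes], str],
) -> str:
    """Collapse all replies to one input into a single abstract output.

    Zero replies become the artificial no-response symbol; with several
    replies, everything after the first one is ignored.
    """
    if not outputs:
        return NO_RESPONSE
    return abstract(outputs[0])


def abstract_trace(
    replies_per_input: Sequence[Sequence[bytes]],
    abstract: Callable[[bytes], str],
) -> List[str]:
    """Produce one abstract output per input, as a Mealy machine expects."""
    return [normalize_response(r, abstract) for r in replies_per_input]


def toy_abstract(message: bytes) -> str:
    """Assumed toy abstraction: classify a reply by its first byte only."""
    return "ERROR" if message[:1] == b"\xff" else "OK"
```

For example, calling abstract_trace([[b"\xff"], [], [b"\x01", b"\xff"]],
toy_abstract) yields one symbol per input, and the error sent as the
second reply to the third input never reaches the learner. That dropped
reply is exactly the information loss I am complaining about.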

Questions:
==========

Is "dynamic symbolic execution" the same thing as "concolic execution"?

Does the phrase "decision procedures" mean "SAT solvers"?

Section 3.3: Let s = m_0 ... m_{n-1}. Does s_j = m_j, or
s_j = m_0 ... m_j? Their definition was not clear.

Figure 5: Isn't the MACE line guaranteed to be at 100%, since it is the
only basis for comparison? I find this to be a misleading graph...

Ideas for future research:
==========================

Any improvement to the issues I list in the "pros/cons" section. One
lower-hanging fruit might be finding non-crashing security violations,
i.e., checking a security property more sophisticated than "it doesn't
segfault."

"Techniques that automatically reverse-engineer message encryption are
required." I laughed out loud at this. And yet, they cite a paper
(Wang, et al., "ReFormat"). I am very curious how this could be
possible, assuming the crypto is worth anything at all...