Directed Compositional Symbolic Execution for MSP430

Thursday August 31st, 12-1PM @ BA5205

Speaker: Ivan Pustogarov

Title:
Directed Compositional Symbolic Execution for MSP430

Abstract:
As embedded systems become ubiquitous and gradually shift from isolated offline systems to interconnected online ones, their security becomes a major concern. Embedded systems software is usually written in low-level, memory-unsafe programming languages such as C, which makes it particularly susceptible to memory corruption vulnerabilities. In addition, such systems are often equipped with sensors, and the firmware controlling them is commonly designed assuming a benign environment, which makes them susceptible to signal spoofing attacks.
Symbolic execution is an effective approach for identifying and understanding these threats: a combination of symbolic execution and black-box fuzzing can achieve high code coverage, and its ability to automatically generate inputs that drive a program into particular states can be used to better understand possible signal spoofing attacks. Existing symbolic execution tools, however, do not work for firmware code, and in contrast to the large body of research for traditional architectures such as x86, there are few tools for lower-end embedded architectures.
 
In the first part of this talk, I will discuss our experience in building a directed, compositional symbolic execution framework that targets software for the popular MSP430 family of microcontrollers. I will give more details about our modular approach and how we tackled interrupt-driven control flow, extensive use of peripheral devices, and hardware-related memory areas, which are common in embedded programming and frustrate traditional symbolic execution tools. I will then describe how we used our tool to partially automate a signal spoofing attack against a recently proposed gesture recognition system, tricking the firmware into “recognizing” a gesture of the adversary’s choosing. In the second part of the talk, I will briefly describe our preliminary work on handling dynamic memory allocation routines (malloc, calloc, realloc) with symbolic sizes, which are known to significantly increase the number of execution states when executed symbolically “as-is”.
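
As a toy illustration of that state-explosion problem (this is my own sketch, not the speaker's framework; all names are hypothetical), a naive symbolic executor that handles a symbolic allocation size "as-is" must fork one successor state per feasible concrete size:

    # Sketch: naive handling of a symbolic malloc size forks one execution
    # state per feasible concrete value of the size expression.
    from dataclasses import dataclass, field

    @dataclass
    class State:
        constraints: list = field(default_factory=list)  # path condition
        heap: dict = field(default_factory=dict)         # address -> size

    def symbolic_malloc(state, size_var, feasible_sizes, addr=0x2000):
        successors = []
        for n in feasible_sizes:
            s = State(constraints=state.constraints + [f"{size_var} == {n}"],
                      heap=dict(state.heap))
            s.heap[addr] = n
            successors.append(s)
        return successors

    states = symbolic_malloc(State(), "n", feasible_sizes=range(1, 9))
    print(len(states))  # 8 states from one call site; an unconstrained
                        # 16-bit size on MSP430 could yield 65,535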

Bio:
Ivan Pustogarov is a Postdoctoral Researcher at Cornell University, where he works on program analysis for the security of embedded systems. He received his PhD from the University of Luxembourg in 2015, with a focus on network security.
 
His research interests lie in the area of systems security and center on program/binary analysis for embedded systems. His recent project focused on developing program analysis tools and techniques for the popular MSP430 family of microcontrollers. His research on network security includes practical low-resource off-path attacks on the Tor and Bitcoin P2P networks. The flaws described in his publications had a direct impact on users’ security and led to a redesign of parts of the Tor protocol and its core code. His research has been published at S&P, CCS, AsiaCCS, ESORICS, and other venues, and has received attention from media outlets including the BBC, The Register, the Daily Dot, and others.

How to Learn Klingon Without a Dictionary: Detection and Measurement of Black Keywords Used by the Underground Economy

Thursday August 3rd, 12-1PM @ BA5205

Speaker: Prof. Haixin Duan (Tsinghua University)

Title:
How to Learn Klingon Without a Dictionary: Detection and Measurement of Black Keywords Used by the Underground Economy

Abstract:
The online underground economy is an important channel that connects merchants of illegal products with their buyers, and it is constantly monitored by legal authorities. As one common evasion tactic, merchants and buyers together create a vocabulary of jargon (called “black keywords” in this paper) to disguise their transactions (e.g., “smack” is one street name for “heroin” [1]). Understanding black keywords is of great importance for tracking and disrupting the underground economy, but it is also prohibitively difficult: investigators have to infiltrate the inner circle of criminals to learn their meanings, a task both risky and time-consuming. In this talk, Prof. Duan will introduce his team’s attempt to capture and understand these ever-changing black keywords. They investigated underground businesses promoted through blackhat SEO (search engine optimization) and demonstrated that the black keywords targeted by SEOers can be discovered through a fully automated approach. Together with Baidu, the leading search engine in China, Prof. Duan’s team built a system called KDES (Keywords Detection and Expansion System) and applied it to Baidu’s search results. So far, they have identified 478,879 black keywords, clustered under 1,522 core words based on text similarity. They further extracted information such as email addresses, mobile phone numbers, and instant messenger IDs from the pages and domains relevant to the underground businesses. Such information is particularly helpful for understanding the underground economy of China.
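
As a rough, hypothetical illustration of the clustering step (the names and threshold below are mine, not KDES internals), discovered keywords can be grouped under core words by pairwise text similarity:

    # Sketch: cluster candidate black keywords under "core words" using
    # simple string similarity, mirroring the grouping described above.
    from difflib import SequenceMatcher

    def similarity(a, b):
        return SequenceMatcher(None, a, b).ratio()

    def cluster(keywords, threshold=0.6):
        clusters = {}  # core word -> keywords grouped under it
        for kw in keywords:
            for core in clusters:
                if similarity(kw, core) >= threshold:
                    clusters[core].append(kw)
                    break
            else:
                clusters[kw] = [kw]  # kw starts a new cluster as core word
        return clusters

    print(cluster(["smack4sale", "smack-4-sale", "buy smack", "cheap meds"]))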

Bio:
Dr. Haixin Duan is a professor in the Institute for Network Science and Cyberspace at Tsinghua University. He has been a visiting scholar at UC Berkeley and a senior scientist at the International Computer Science Institute (ICSI). Professor Duan focuses his research on network security, including the security of network protocols, intrusion detection, and underground economy detection. His research results have been published at top security conferences such as IEEE Security & Privacy, USENIX Security, CCS, and NDSS; one of his papers won the Distinguished Paper Award at NDSS 2016. Prof. Duan also won the Excellence Talent Award for Cybersecurity (one of ten awardees from both academia and industry in China).

Heterogeneous GPU reallocation

Wednesday July 5th, 1-2PM @ BA5205

Speaker: James Gleeson

Title:
Heterogeneous GPU reallocation

Abstract:
Emerging cloud markets like spot markets and batch computing services scale up services at the granularity of whole VMs. In this paper, we observe that GPU workloads underutilize GPU device memory, leading us to explore the benefits of reallocating heterogeneous GPUs within existing VMs. We outline approaches for upgrading and downgrading GPUs for OpenCL GPGPU workloads, and show how to minimize the chance of cloud operator VM termination by maximizing the heterogeneous environments in which applications can run.
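
A small sketch of the kind of measurement behind the underutilization observation (assuming the pyopencl package; the workload figure is a stand-in):

    # Sketch: compare a workload's peak GPU memory use against each GPU's
    # capacity to spot candidates for reallocation to a smaller device.
    import pyopencl as cl

    workload_bytes = 512 * 1024 * 1024  # hypothetical peak allocation

    for platform in cl.get_platforms():
        for dev in platform.get_devices(device_type=cl.device_type.GPU):
            util = workload_bytes / dev.global_mem_size
            verdict = "downgrade candidate" if util < 0.5 else "keep"
            print(f"{dev.name}: {dev.global_mem_size >> 20} MiB, "
                  f"{util:.0%} used -> {verdict}")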

Bio:
James is a PhD student under Prof. Eyal de Lara. He has done research in mobile security, covering both physical and software attacks on Android phones. His current research interests are in heterogeneous computing in data centers.

Crane: Fast and Migratable GPU Passthrough for OpenCL applications

Wednesday May 17th, 12-1PM @ BA5205

Speaker: James Gleeson

Title:
Crane: Fast and Migratable GPU Passthrough for OpenCL applications

Abstract:
General purpose GPU (GPGPU) computing in virtualized environments leverages PCI passthrough to achieve GPU performance comparable to bare-metal execution. However, GPU passthrough prevents service administrators from performing virtual machine migration between physical hosts.
Crane is a new technique for virtualizing OpenCL-based GPGPU computing that achieves within 5.25% of passthrough GPU performance while supporting VM migration. Crane interposes a virtualization-aware OpenCL library that makes it possible to reclaim and subsequently reassign physical GPUs to a VM without terminating the guest or its applications. Crane also enables continued GPU operation while the VM is undergoing live migration by transparently switching between GPU passthrough operation and API remoting.
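
The interposition idea can be sketched independently of Crane's actual implementation (every name below is hypothetical): a thin dispatch layer routes each OpenCL call either to the local device or to a remote executor, so the mode can be flipped mid-run without the application noticing:

    # Sketch: a virtualization-aware dispatch layer that switches between
    # direct GPU access ("passthrough") and forwarding calls to a remote
    # executor ("API remoting"), e.g. while the VM is being live-migrated.
    import threading

    class GpuDispatcher:
        def __init__(self, local_backend, remote_backend):
            self._local = local_backend    # talks to the physical GPU
            self._remote = remote_backend  # serializes calls over the network
            self._mode = "passthrough"
            self._lock = threading.Lock()

        def set_mode(self, mode):          # called by a migration manager
            with self._lock:
                self._mode = mode

        def enqueue_kernel(self, kernel, args):
            with self._lock:
                backend = (self._local if self._mode == "passthrough"
                           else self._remote)
            return backend.enqueue_kernel(kernel, args)  # caller is unaware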
 

Bio:
James is a PhD student under Prof. Eyal de Lara. He has done research in mobile security, covering both physical and software attacks on Android phones. His current research interests are in heterogeneous computing in data centers.

The Game of Twenty Questions: Do You Know Where to Log?

Thursday May 4th, 12-1PM @ BA5205

Speaker: Xu Zhao

Title:
The Game of Twenty Questions: Do You Know Where to Log?

Abstract:
A production system’s printed logs are often the only source of runtime information available for postmortem debugging, performance profiling, security auditing, and user behavior analytics. Therefore, the quality of this data is critically important. Recent work has attempted to enhance log quality by recording additional variable values, but logging statement placement, i.e., where to place a logging statement, the most challenging and fundamental problem for improving log quality, has not been adequately addressed. This position paper proposes automating the placement of logging statements by measuring how much uncertainty about a program’s execution each statement can eliminate. Guided by ideas from information theory, the authors describe a simple approach that automates logging statement placement. Preliminary results suggest that the algorithm can effectively cover, and further improve on, existing logging statements placed by developers, and that it can compute an optimal log placement that disambiguates the entire function call path with only 0.218% slowdown.
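
The information-theoretic intuition can be made concrete with a toy greedy placement (my own simplification, not the paper's algorithm): treat each candidate log point as a test that partitions the set of possible call paths, and keep adding the point that removes the most remaining ambiguity until every path is uniquely identified:

    # Sketch: greedily place log statements until the set of log points
    # that fired uniquely identifies which call path executed.
    from math import log2

    paths = {                 # call path -> basic blocks it passes through
        "A->B->D": {"A", "B", "D"},
        "A->B->E": {"A", "B", "E"},
        "A->C->E": {"A", "C", "E"},
    }

    def ambiguity(groups):    # expected bits still needed to tell paths apart
        total = sum(len(g) for g in groups)
        return sum(len(g) / total * log2(len(g)) for g in groups)

    def partition(placed):    # paths are indistinguishable if the same
        buckets = {}          # subset of placed log points fires for them
        for name, blocks in paths.items():
            buckets.setdefault(frozenset(placed & blocks), []).append(name)
        return list(buckets.values())

    placed, candidates = set(), set().union(*paths.values())
    while ambiguity(partition(placed)) > 0:
        placed.add(min(candidates - placed,
                       key=lambda b: ambiguity(partition(placed | {b}))))
    print(placed)  # a small set of blocks whose logs disambiguate every path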

Bio:
Xu Zhao is a second-year PhD student at the University of Toronto, under the supervision of Prof. Ding Yuan. His research interests lie in the performance and failure diagnosis of distributed systems. His current work focuses on the automated placement of logging statements and non-intrusive performance profiling for distributed systems.

Challenges and Solutions to Secure Internet Geolocation

Wednesday May 3rd, 12-1PM @ BA5205

Speaker: AbdelRahman Abdou

Title:
Challenges and Solutions to Secure Internet Geolocation

Abstract:
The number of security-sensitive location-aware services on the Internet continues to grow; examples include location-aware authentication, location-aware access policies, fraud prevention, compliance with media licensing, and the regulation of online gambling/voting.
An adversary can evade existing geolocation techniques, e.g., by faking GPS coordinates or employing a non-local IP address through a proxy or virtual private network. In this talk, I will present parts of my PhD work, including Client Presence Verification (CPV), a measurement-based technique designed to verify an assertion about a device’s presence inside a prescribed geographic region. CPV does not identify devices by their IP addresses. Rather, the device’s location is corroborated in a novel way by leveraging geometric properties of triangles, which prevents an adversary from manipulating network delays in its favor. To achieve high accuracy, CPV mitigates Internet path asymmetry using a novel method to deduce one-way application-layer delays to/from the client’s participating device, and it mines these delays for evidence supporting or refuting the asserted location. I will present CPV’s evaluation results, including the granularity of the verified location and the verification time, and summarize some lessons we learned throughout the process.
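
The geometric intuition can be sketched as follows (a heavy simplification of mine, not CPV's actual protocol; the speed constant is a rough assumed bound): three verifiers measure one-way delays to the client and accept only if the asserted location lies inside their triangle and is consistent with all three delay-derived distance bounds:

    # Sketch: accept an asserted location only if it is inside the three
    # verifiers' triangle AND each measured one-way delay is large enough
    # to cover the distance from that verifier to the asserted point.
    from math import dist

    KM_PER_MS = 100  # assumed upper bound on packet travel speed

    def inside_triangle(p, a, b, c):
        def cross(o, u, v):
            return (u[0]-o[0])*(v[1]-o[1]) - (u[1]-o[1])*(v[0]-o[0])
        s1, s2, s3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
        return (s1 >= 0) == (s2 >= 0) == (s3 >= 0)

    def verify(asserted, verifiers, delays_ms):
        if not inside_triangle(asserted, *verifiers):
            return False
        return all(dist(asserted, v) <= d * KM_PER_MS
                   for v, d in zip(verifiers, delays_ms))

    print(verify((1.0, 1.0), [(0, 0), (4, 0), (2, 4)], [0.05, 0.05, 0.05]))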

Bio:
AbdelRahman Abdou is a Post-Doctoral Fellow in the School of Computer Science at Carleton University. He received his PhD (2015) in Systems and Computer Engineering from Carleton University. His research interests include location-aware security, SDN security, authentication, SSL/TLS and using Internet measurements to solve problems related to Internet security.

Consistency Oracle

Friday April 28th, 1-2PM @ BA5205

Speaker: Beom Heyn Kim

Title:
Consistency Oracle

Abstract:
Many modern distributed storage systems emphasize availability and partition tolerance over consistency, leading to many systems that provide only weak data consistency. However, weak data consistency is difficult for both system designers and users to reason about. Formal specifications may offer precise descriptions of consistency behavior, but they are difficult to use and usually require expertise beyond that of the average software developer. In this paper, we propose and describe the consistency oracle, a novel instantiation of a formal specification. A consistency oracle accepts the same interface calls as a distributed storage system, but returns all possible values that may be returned under a given consistency model. Consistency oracles are easy to use and can be applied to test and verify both distributed storage systems and the client software that uses them.
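
A minimal sketch of the idea under one toy model (the interface and consistency model are simplified by me): with eventual consistency and no synchronization, a read may return the value of any prior write, so the oracle returns that whole set, and a test checks that the real system's reply falls within it:

    # Sketch: an oracle with the same get/put interface as a key-value
    # store, whose reads return EVERY value permitted by a toy eventual
    # consistency model (any value previously written to the key).
    class EventualConsistencyOracle:
        def __init__(self):
            self._writes = {}            # key -> all values ever written

        def put(self, key, value):
            self._writes.setdefault(key, []).append(value)

        def get(self, key):
            return set(self._writes.get(key, [None]))  # all possible reads

    oracle = EventualConsistencyOracle()
    oracle.put("x", 1)
    oracle.put("x", 2)
    print(oracle.get("x"))  # {1, 2}: either write may be visible
    # in a test harness: assert system_under_test.get("x") in oracle.get("x")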

Bio:
Ben is a PhD student under Prof. David Lie. His research primarily focuses on consistency verification for distributed systems.

Semantic Aware Online Detection of Resource Anomalies on the Cloud

Wednesday November 23rd, 12-1PM @ BA5205

Speaker: Stelios Sotiriadis

Title:
Semantic Aware Online Detection of Resource Anomalies on the Cloud

Abstract:
As cloud-based platforms become more popular, efficiently managing the costly hardware resources in the cloud environment becomes an essential task for the cloud administrator. Prompt action should be taken whenever hardware resources are faulty, or are configured and utilized in a way that causes application performance degradation and hence poor quality of service. In this paper, we propose a semantic-aware technique based on neural network learning and pattern recognition that provides automated, real-time support for resource anomaly detection. We incorporate application semantics to narrow the scope of the learning and detection phases, enabling our machine learning technique to run online at very low overhead. Because our method runs “life-long” on monitored cloud resource usage, we can leverage administrator feedback on wrong predictions to improve prediction on future runs. This feedback-directed scheme, together with the attached context, helps us achieve an anomaly detection accuracy of up to 98.3% in our experimental evaluation, and it can easily be used in conjunction with other anomaly detection techniques for the cloud.
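
A toy version of the online loop (the model choice, window, and threshold are mine, not the paper's; assumes scikit-learn): predict the next resource sample from a sliding window with a small neural network, flag large prediction errors, and keep learning from new samples, including ones an administrator marks as normal:

    # Sketch: online resource-anomaly detection; large prediction errors
    # are flagged, and every sample (including admin-confirmed normal
    # ones) is fed back via partial_fit so the model learns "life-long".
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    WINDOW, THRESHOLD = 5, 0.15
    model = MLPRegressor(hidden_layer_sizes=(16,))

    def detect(stream):
        history, fitted = [], False
        for sample in stream:
            if len(history) >= WINDOW:
                x = np.array(history[-WINDOW:], dtype=float).reshape(1, -1)
                if fitted and abs(model.predict(x)[0] - sample) > THRESHOLD:
                    yield sample                    # flag for the admin
                model.partial_fit(x, [sample])      # keep learning online
                fitted = True
            history.append(sample)

    cpu = [0.30, 0.31, 0.29, 0.32, 0.30, 0.31, 0.95, 0.30]
    for s in detect(cpu):
        print("possible anomaly:", s)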

Bio:
Stelios Sotiriadis is a research fellow under Prof. Cristiana Amza. His research focuses on the Inter-Cloud Meta-Scheduling (ICMS) framework.

Non-intrusive Performance Profiling of Entire Software Stacks based on the Flow Reconstruction Principle

Thursday October 27th, 12-1PM @ BA5205

Speaker: Xu Zhao

Title:
Non-intrusive Performance Profiling of Entire Software Stacks based on the Flow Reconstruction Principle

Abstract:
Understanding the performance behavior of distributed server stacks at scale is non-trivial. The servicing of a single request can trigger numerous sub-requests across heterogeneous software components, and many similar requests are serviced concurrently and in parallel. When a user experiences poor performance, it is extremely difficult to identify the root cause, as well as the software components and machines that are the culprits. This work describes Stitch, a non-intrusive tool capable of profiling the performance of an entire distributed software stack solely using the unstructured logs output by heterogeneous software components. Stitch is substantially different from all prior related tools in that it is capable of constructing a system model of an entire software stack without building any domain knowledge into Stitch. Instead, it automatically reconstructs the extensive domain knowledge of the programmers who wrote the code; it does this by relying on the Flow Reconstruction Principle, which states that programmers log events such that one can reliably reconstruct the execution flow a posteriori.
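
The core trick can be sketched in a few lines (a drastic simplification of mine, not Stitch itself): pull identifier-like tokens out of unstructured log lines and stitch together lines that share identifiers, with no component-specific parsing:

    # Sketch: reconstruct request flows from unstructured logs by linking
    # lines that mention the same identifier-like tokens.
    import re
    from collections import defaultdict

    ID_PATTERN = re.compile(r"\b(?:req|task|blk)_[0-9a-f]+\b")  # assumed shapes

    logs = [
        "10:00:01 frontend accepted req_1a2b",
        "10:00:02 scheduler req_1a2b -> task_77",
        "10:00:03 worker task_77 reading blk_9c",
        "10:00:04 worker task_77 done",
    ]

    flows = defaultdict(list)   # identifier -> log lines mentioning it
    links = defaultdict(set)    # identifier -> co-occurring identifiers
    for line in logs:
        ids = ID_PATTERN.findall(line)
        for i in ids:
            flows[i].append(line)
            links[i].update(set(ids) - {i})  # ids on one line are related

    # req_1a2b links to task_77, which links to blk_9c: one end-to-end flow.
    print(dict(links))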

Bio:
Xu is a second-year PhD student under Prof. Ding Yuan. His research focuses on debugging performance failures and log analysis in distributed systems.

Don’t Get Caught In the Cold, Warm-up Your JVM: Understand and Eliminate JVM Warm-up Overhead in Data-parallel Systems

Wednesday October 26th, 12-1PM @ BA5205

Speaker: David Lion

Title:
Don’t Get Caught In the Cold, Warm-up Your JVM:
Understand and Eliminate JVM Warm-up Overhead in Data-parallel Systems

Abstract:
Many widely used, latency-sensitive, data-parallel distributed systems, such as HDFS, Hive, and Spark, choose to use the Java Virtual Machine (JVM), despite debate on the overhead of doing so. This paper analyzes the extent and causes of JVM performance overhead in the above-mentioned systems. Surprisingly, we find that the warm-up overhead, i.e., class loading and interpretation of bytecode, is frequently the bottleneck. For example, even an I/O-intensive, 1GB read on HDFS spends 33% of its execution time in JVM warm-up, and Spark queries spend an average of 21 seconds in warm-up.

These findings on JVM warm-up overhead reveal a contradiction between the principle of parallelization, i.e., speeding up long-running jobs by parallelizing them into short tasks, and the need to amortize JVM warm-up overhead through long tasks. We solve this problem with HotTub, a new JVM that amortizes the warm-up overhead over the lifetime of a cluster node instead of over a single job, by reusing a pool of already-warm JVMs across multiple applications. The speed-up is significant: for example, HotTub yields up to 1.8X speed-ups for Spark queries, despite not adhering to the JVM specification in some edge cases.
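
HotTub itself is a modified JVM, but the amortization idea can be shown language-neutrally (a sketch of mine, not HotTub's design): keep a pool of workers that have already paid their initialization cost and hand each job to an already-warm one:

    # Sketch: amortize per-job startup cost by reusing a pool of
    # already-initialized workers (cf. reusing warm JVMs across jobs).
    import queue, time

    class WarmPool:
        def __init__(self, size, init):
            self._pool = queue.Queue()
            for _ in range(size):
                self._pool.put(init())   # warm-up cost paid once per slot

        def run(self, job):
            worker = self._pool.get()    # grab an already-warm worker
            try:
                return job(worker)
            finally:
                self._pool.put(worker)   # return it warm for the next job

    def slow_init():                     # stand-in for class loading + JIT
        time.sleep(0.5)
        return {"warm": True}

    pool = WarmPool(size=2, init=slow_init)          # cost paid up front
    print(pool.run(lambda w: "ran with no per-job warm-up"))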

Bio:
David is a first-year PhD student under Prof. Ding Yuan. His research primarily focuses on Java Virtual Machine performance in data-parallel applications.