Bridging the design and implementation of distributed systems with program analysis

Tuesday August 21st, 12-1PM @ BA5205

Speaker: Ivan Beschastnikh

Title:
Bridging the design and implementation of distributed systems with program analysis

Abstract:
Much of today’s software runs in a distributed context: mobile apps communicate with the cloud, web apps interface with complex distributed backends, and cloud-based systems use geo-distribution and replication for performance, scalability, and fault tolerance. However, distributed systems that power most of today’s infrastructure pose unique challenges for software developers. For example, reasoning about concurrent activities of system nodes and even understanding the system’s communication topology can be difficult.

In this talk I will overview three program analysis techniques developed in my group that address these challenges. First, I will present Dinv, a dynamic analysis technique for inferring likely distributed state properties of distributed systems. By relating state across nodes in the system Dinv infers properties that help reason about system correctness. Second, I will review Dara, a model checker for distributed systems that introduces new techniques to cope with state explosion by combining traditional abstract model checking with dynamic model inference techniques. Finally, I will discuss PGo, a compiler that compiles formal specifications written in PlusCal/TLA+ into runnable distributed system implementations in the Go language. All three projects employ program analysis in the context of distributed systems and aim to bridge the gap between the design and implementations of such systems.

Bio:
Ivan Beschastnikh is an Assistant Professor in the Department of Computer Science at the University of British Columbia. He finished his PhD at the University of Washington in 2013 and received his formative training at the University of Chicago. He has broad research interests that touch on systems and software engineering. His recent projects span distributed systems, program analysis, networks, and security. Visit his homepage to learn more: http://www.cs.ubc.ca/~bestchai/

Breaking Apart the VFS for Managing File Systems

Tuesday July 3rd, 12-1PM @ BA5205

Speaker: Kuei (Jack) Sun

Title:
Breaking Apart the VFS for Managing File Systems

Abstract:
File system management applications, such as data scrubbers, defragmentation tools, resizing tools, and partition editors, are essential for maintaining, optimizing, and administering storage systems. These applications require fine-grained control over file-system metadata and data, such as the ability to migrate a data block to another physical location. Such control is not available with the VFS API, and so these applications bypass the VFS and access and modify file-system metadata directly. As a result, these applications do not work across file systems, and must be developed from scratch for each file system, which involves significant engineering effort and impedes adoption of new file systems.
Our goal is to design an interface that allows these management applications to be written once and be usable for all file systems that support the interface. Our key insight is that these applications operate on common file system abstractions, such as file system objects (e.g., blocks, inodes, and directory entries), and the mappings from logical blocks of a file to their physical locations. We propose the Extended Virtual File System (eVFS) interface that provides fine-grained access to these abstractions, allowing the development of generic file system management applications. We demonstrate the benefits of our approach by building a file-system agnostic conversion tool that performs in-place conversion of a source file system to a completely different destination file system, showing that arbitrary modifications to the file system format can be handled by the interface.

Bio:
Kuei (Jack) Sun is a fourth year PhD student supervised by Prof. Ashvin Goel and Prof. Angela Demke Brown. The focus of his research is on simplifying the development of file-system aware applications, as well as improving the robustness of these applications.

Challenges with Real-World Smartwatch based Audio Monitoring

Wednesday June 6th, 12-1PM @ BA5205

Speaker: Daniyal Liaqat
Title:
Challenges with Real-World Smartwatch based Audio Monitoring

Abstract:
Audio data from a microphone can be a rich source of information. The speech and audio processing community has explored using audio data to detect emotion, depression, Alzheimer’s disease and even children’s age, weight and height. The mobile community has looked at using smartphone based audio to detect coughing and other respiratory sounds and help predict students’ GPA.
However, audio data from these studies tends to be collected in more controlled environments using well placed, high quality microphones or from phone calls. Applying these kinds of analyses to continuous and in-the-wild audio could have tremendous applications, particularly in the context of health monitoring. As part of a health monitoring study, we use smartwatches to collect in-the-wild audio from real patients. In this paper we characterize the quality of the audio data we collected. Our findings include that the smartwatch based audio is good enough to discern speech and respiratory sounds. However, extracting these sounds is difficult because of the wide variety of noise in the signal and current tools perform poorly at dealing with this noise. We also find that the quality of the microphone allows annotators to differentiate the source of speech and coughing, which adds another level of complexity to analyzing this audio.

Automatically Mitigating and Fixing Software Vulnerabilities

Friday February 23rd, 12-1PM @ BA5205

Speaker: Zhen(James) Huang

Title:
Automatically Mitigating and Fixing Software Vulnerabilities

Abstract:
With the rise of smart phones and IoTs, computer systems have become an indispensable part of our lives. Our reliance on computer systems make software security extremely important. However, software security is continuously threatened by software vulnerabilities because software vulnerabilities are commonly used by adversaries to compromise software security, yet manually fixing software vulnerabilities cannot keep pace with the rampant exploits of software vulnerabilities. While it is ideal to fix software vulnerabilities, creating a fix can take time. A faster alternative to fixing software vulnerabilities is mitigating software vulnerabilities via configuration workarounds, which is frequently used in practice to address software vulnerabilities rapidly ahead of the release of security patches. In this talk, I will demonstrate the need for automatic solutions to address software vulnerabilities with a study on the lifecycle and complexity of real-world security patches, and describe systems that I have built to mitigate more software vulnerabilities than configuration workarounds, and to automatically fix real-world software vulnerabilities. These systems leverage novel program analysis techniques to address two main challenges: 1) mitigating large number of software vulnerabilities rapidly and safely, and 2) generating sound security patches for software vulnerabilities involving complex code structure and data structure. I will conclude this talk with future directions on automatically mitigating and fixing software vulnerabilities.

Bio:
Zhen Huang is a Ph.D candidate in the Department of Electrical & Computer Engineering at University of Toronto. His research focuses on automatically mitigating and fixing software vulnerabilities. Using novel program analysis techniques, he has built two systems to address software vulnerabilities. A system called Talos enables software to defend against exploits to software vulnerabilities rapidly, and a system called Senx automatically fixes software vulnerabilities.

Spiffy: Interpreting Metadata for File System Applications

Thursday February 8th, 12-1PM @ BA5205

Speaker: Jack Sun

Title:
Spiffy: Interpreting Metadata for File System Applications

Abstract:
Many file system applications such as defragmentation tools, file system checkers or data recovery tools, operate at the storage layer. Today, developers of these storage applications require detailed knowledge of the file system format, which takes a significant amount of time to learn, often by trial and error, due to insufficient documentation or specification of the format. Furthermore, these applications perform ad-hoc processing of the file-system metadata, leading to bugs and vulnerabilities.

We propose Spiffy, an annotation language for specifying the on-disk format of a file system. File system developers annotate the data structures of a file system, and we use these annotations to generate a library that allows identifying, parsing and traversing file-system metadata, providing support for both offline and online storage applications. This approach simplifies the development of storage applications that work across different file systems because it reduces the amount of file-system specific code that needs to be written.

We have written annotations for the Linux Ext4, Btrfs and F2FS file systems, and developed several applications for these file systems, including a type-specific metadata corruptor, a file system converter, and an online storage layer cache that preferentially caches files for certain users. Our experiments show that applications that use the library to access file system metadata can achieve good performance and are robust against file system corruption errors.

Bio:
Kuei (Jack) Sun is a fourth year PhD student supervised by Prof. Ashvin Goel and Prof. Angela Demke Brown. The focus of his research is on simplifying the development of file-system aware applications, as well as improving the robustness of these applications.

Dancing in the Dark: Private Multi-Party Machine Learning in an Untrusted Setting

Friday December 15th, 12-1PM @ BA5205

Speaker: Clement Fung

Title:
Dancing in the Dark: Private Multi-Party Machine Learning in an Untrusted Setting

Abstract:
The problem of machine learning (ML) over distributed data sources arises in a variety of domains. Unfortunately, today’s distributed ML systems use an unsophisticated threat model:
data sources must trust a central ML process. We propose a brokered learning abstraction that provides data sources with provable privacy guarantees while allowing them to contribute data towards a globally-learned model in an untrusted setting. We realize this abstraction by building on the state of the art in multi-party distributed ML and differential privacy methods to construct TorMentor, a system that is deployed as a hidden Tor service.

Bio:
Clement Fung is a second year master’s student in Computer Science at UBC, supervised by Prof. Ivan Beschastnikh. Originally from Toronto, he completed his undergraduate at the University of Waterloo in 2016. His interests are in privacy-preserving machine learning and distributed systems.

Log20: Fully Automated Optimal Placement of Log Printing Statements under Specified Overhead Threshold

Wednesday October 25th, 12-1PM @ BA5205

Speaker: Xu Zhao

Title:
Log20: Fully Automated Optimal Placement of Log Printing Statements under Specified Overhead Threshold

Abstract:
Complex and unforeseen failures in distributed systems must be diagnosed and replicated in a development environment so that developers can understand the underlying problem and verify the resolution. System logs often form the only source of diagnostic information, and developers reconstruct a failure using manual guesswork. This is an unpredictable and time-consuming process which can lead to costly service outages while a failure is repaired.

This paper describes Pensieve, a tool capable of reconstructing near-minimal failure reproduction steps from log les and system bytecode, without human involvement. Unlike existing solutions that use symbolic execution to search for the entire path leading to the failure, Pensieve is based on the Partial Trace Observation, which states that programmers do not simulate the entire execution to understand the failure, but follow a combination of control and data dependencies to reconstruct a simplified trace that only contains events that are likely to be relevant to the failure. Pensieve follows a set of carefully designed rules to infer a chain of causally dependent events leading to the failure symptom while aggressively skipping unrelated code paths to avoid the path-explosion overheads of symbolic execution models.

Bio:
Xu Zhao is a 3rd-year Ph.D. student at the University of Toronto, under the supervision of Prof. Ding Yuan.
His research interest lies in the area of performance of distributed systems and failure diagnosis.

His current work focuses on automated placement of logging statements and non-intrusive performance profiling for distributed systems.cluding BBC, The Register, Dailydot, and others.

Pensieve: Non-Intrusive Failure Reproduction for Distributed Systems using the Event Chaining Approach

Monday October 23rd, 12-1PM @ BA5205

Speaker: Yongle Zhang

Title:
Pensieve: Non-Intrusive Failure Reproduction for Distributed Systems using the Event Chaining Approach

Abstract:
Complex and unforeseen failures in distributed systems must be diagnosed and replicated in a development environment so that developers can understand the underlying problem and verify the resolution. System logs often form the only source of diagnostic information, and developers reconstruct a failure using manual guesswork. This is an unpredictable and time-consuming process which can lead to costly service outages while a failure is repaired.

This paper describes Pensieve, a tool capable of reconstructing near-minimal failure reproduction steps from log les and system bytecode, without human involvement. Unlike existing solutions that use symbolic execution to search for the entire path leading to the failure, Pensieve is based on the Partial Trace Observation, which states that programmers do not simulate the entire execution to understand the failure, but follow a combination of control and data dependencies to reconstruct a simplified trace that only contains events that are likely to be relevant to the failure. Pensieve follows a set of carefully designed rules to infer a chain of causally dependent events leading to the failure symptom while aggressively skipping unrelated code paths to avoid the path-explosion overheads of symbolic execution models.

Bio:
Yongle Zhang is a PhD student at University of Toronto, working on computer systems and reliability with Prof. Ding Yuan. His current research focuses on system profiling and failure diagnosis.

Directed Compositional Symbolic Execution for MSP430

Thursday August 31th, 12-1PM @ BA5205

Speaker: Ivan Pustogarov

Title:
Directed Compositional Symbolic Execution for MSP430

Abstract:
As embedded systems become ubiquitous and gradually shift from isolated offline to interconnected online systems, their security becomes a major concern. Embedded systems software is usually written in low-level memory-unsafe programming languages such as C which makes them particularly susceptible to memory corruption vulnerabilities.  In addition, such systems are often equipped with sensors and the firmware controlling them is commonly designed assuming a benign environment which makes them susceptible for signal spoofing attacks.
Symbolic execution is an efficient approach that can help in identifying and understanding these threats: combination of symbolic execution and black-box fuzzing can achieve high code coverage; its ability to automatically generate inputs that drive the program into particular states can be used to better understand possible signal spoofing attacks. Existing symbolic execution tools, however, do not work for firmware code and in contrast to the large body of research for traditional architectures such as x86, there are few tools for lower-end embedded architectures. 
 
In the first part of this talk, I will discuss our experience in building a directed, compositional  symbolic execution framework that targets software for the popular MSP430 family of microcontrollers.  I will give a few more details about our modular approach and  how we tackled interrupt-driven control flow, extensive use of peripheral devices, and hardware-related memory areas that are common for embedded programming and which frustrate traditional symbolic execution tools. I will then describe how we used our tool to partially automate a signal spoofing attack against a recently proposed gesture recognition system to trick the firmware into “recognizing” a gesture of the adversary’s choosing.  In the second part of the talk, I will briefly describe our preliminary work on handling dynamic memory allocation routines
(c/re/malloc) with symbolic sizes that are known to significantly increase the number of execution states if symbolically executed “as-is”.

Bio:
Ivan Pustogarov is a Postdoctoral Researcher at Cornell University where he works on program analysis for security of embedded systems. He received his PhD from the University of Luxembourg with the focus on network security in 2015.
 
His research interests lie in the area of system security and center on program/binary analysis for embedded systems. His recent project focused on developing program analysis tools and techniques for the popular MSP430 family of microcontrollers. His research on network security includes practical low-resource off-path attacks on the Tor and Bitcoin P2P networks. The flaws described in his publications had a direct impact on users’ security and caused redesign of parts of the Tor protocol and its core code. His research has been published in S&P, CCS, AsiaCCS, ESORICS etc., and received attention from the media including BBC, The Register, Dailydot, and others.

How to Learn Klingon Without a Dictionary: Detection and Measurement of Black Keywords Used by the Underground Economy

Thursday Aug 3rd, 12-1PM @ BA5205

Speaker: Prof Haixin Duan from Tsinghua

Title:
How to Learn Klingon Without a Dictionary: Detection and Measurement of Black Keywords Used by the Underground Economy

Abstract:
Online underground economy is an important channel that connects the merchants of illegal products and their buyers, which is also constantly monitored by legal authorities. As one common way for evasion, the merchants and buyers together create a vocabulary of jargons (called “black keywords” in this paper) to disguise the transaction (e.g., “smack” is one street name for “heroin” [1]).  Understanding black keywords is of great importance to track and disrupt the underground economy, but it is also prohibitively difficult: the investigators have to infiltrate the inner circle of criminals to learn their meanings, a task both risky and time- consuming. In this talk Prof. Duan  will introduce their attempt towards capturing and understanding the ever-changing black keywords. We investigated the underground business promoted through blackhat SEO (search engine optimization) and demonstrate that the black keywords targeted by the SEOers can be discovered through a fully automated approach. Together with Baidu, the leading search engine in China,  Prof. Duan’s team built a system called KDES (Keywords Detection and Expansion System), and applied it to the search results of Baidu. So far, they have already identified 478,879 black keywords which were clustered under 1,522 core words based on text similarity. They further extracted the information like emails, mobile phone numbers and instant messenger IDs from the pages and domains relevant to the underground business. Such information is helpful to understand the underground economy of China in particular.

Bio:
Dr. Haixin Duan is a professor in the Institute for Network Science and Cyberspace at Tsinghua University. He was once a visiting scholar at UC Berkeley and a senior scientist in International Computer Science Institute(ICSI).  Professor Duan focuses his research on network security, including security of network protocols, intrusion detection and underground economy detection. His research results were published in top security conferences like Security & Privacy, USENIX Security, CCS and NDSS. One of his research won Distinguished Paper Award of NDSS 2016. Prof. Duan won the  Excellence Talent Award for Cybersecurity (one of ten awardees from both academia and industry in China).