Scaling up Binary Analysis via Knowledge-oriented Techniques

Friday June 24th, 11-12PM @ BA5205

Speaker: Zhenkai Liang

Title:
Scaling up Binary Analysis via Knowledge-oriented Techniques

Abstract:
Binary analysis is a fundamental technique in software and system security. It has a wide range of applications, such as vulnerability discovery, attack response, malware analysis, and software testing and debugging. Due to the lack of high-level semantics and complex program behaviors, it is challenging for binary analysis solutions to scale up to large binaries in practice. Existing solutions are often driven by specific tasks, where the practical time limit hinders comprehensive understanding of binaries. Furthermore, it is difficult to integrate the knowledge generated across different solutions. In this talk, we discuss our research in scaling up binary analysis in a knowledge-oriented manner. We believe knowledge abstraction is the key to scaling up binary analysis, where binary analysis solutions generate understandings that can be shared and reused in other solutions. Our investigation includes techniques for knowledge extraction, tools for knowledge integration, and platforms for knowledge accumulation and sharing. The accumulated knowledge not only allows broader and deeper analysis of binaries, but also enables emerging data-driven and learning techniques to be effectively adopted in binary analysis solutions. In this talk, I will also share our experience and reflections on system security education.

Bio:
Zhenkai Liang is an Associate Professor in the School of Computing, National University of Singapore. His main research interests are in system and software security, web security, mobile security, and program analysis. He is also the Co-Lead PI of the National Cybersecurity R&D Lab in Singapore. He has served as a technical program committee member for many system security conferences, including the ACM Conference on Computer and Communications Security (CCS), the USENIX Security Symposium, and the Network and Distributed System Security Symposium (NDSS), as well as a member of the NDSS Steering Group. As a co-author, he received the Best Paper Award at ICECCS 2014, the Best Paper Award at W2SP 2014, the ACM SIGSOFT Distinguished Paper Award at ESEC/FSE 2009, the Best Paper Award at USENIX Security Symposium 2007, and the Outstanding Paper Award at ACSAC 2003. He also won the Annual Teaching Excellence Award of the National University of Singapore in 2014 and 2015. He received his Ph.D. degree in Computer Science from Stony Brook University in 2006, and B.S. degrees in Computer Science and Economics from Peking University in 1999. His website is: https://www.comp.nus.edu.sg/~liangzk/

Towards Automated Post-mortem Debugging of Distributed Systems

Thursday Feb 7th, 12-1PM @ BA5205

Speaker: Xu Zhao

Title:
Towards Automated Post-mortem Debugging of Distributed Systems

Abstract:
Diagnosing failures in the production environment is notoriously difficult and extremely time-consuming. Most existing debugging tools are intrusive: they incur non-negligible performance overhead and are not suitable for failure diagnosis in production. As a result, state-of-the-art production failure diagnosis still relies heavily on the logs generated by conventional printf-debugging.

However, manual debugging with raw logs is a pain, especially when the logs are of low quality or high volume. This talk will focus on how to automate this process. First, I will introduce Log20, a tool that automatically places log printing statements to improve log quality. It solves the problem of “where to log” by enabling developers to choose the right balance between the usefulness and the performance overhead of the logs. Second, I will present the Flow Reconstruction Principle, a principle that developers must follow to be able to reconstruct the system execution flow from the logs. I will also show how to apply this principle in the log analysis tool, Stitch, to diagnose real-world failures.

Bio:
Xu Zhao is a Ph.D. candidate in the Department of Electrical and Computer Engineering, University of Toronto. His research interests are in building automatic tools to enhance software logging and diagnose software failures. Before coming to Toronto, Xu earned his Bachelor’s degree in Computer Science and Engineering from Tsinghua University in 2013. Xu was awarded the Facebook Fellowship in 2018.

On Fault Tolerance, Locality, and Optimality in Locally Repairable Codes

Friday Dec 7th, 1-2PM @ BA5205

Speaker: Gala Yagdar

Title:
On Fault Tolerance, Locality, and Optimality in Locally Repairable Codes

Abstract:
Large-scale storage systems lie at the heart of the big data revolution. As these systems grow in scale and capacity, their complexity grows accordingly, building on new storage media, hybrid memory hierarchies, and distributed architectures. Numerous layers of abstraction hide this complexity from the applications, but also hide valuable information that could improve the system’s performance considerably. I will demonstrate how to bridge this semantic gap in the context of erasure codes, which are used to guarantee data availability and durability.

Locally repairable codes (LRCs) offer tradeoffs between storage overhead and repair cost. They facilitate more efficient recovery scenarios by storing additional parity blocks in the system, but these additional blocks may eventually increase the number of blocks that must be reconstructed. Existing codes differ in their use of the additional parity blocks, but also in their locality semantics and in the parameters for which they are defined. As a result, existing theoretical models cannot be used to directly compare different LRCs to determine which code will offer the best recovery performance, and at what cost.
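
The locality idea above can be illustrated with a toy sketch. It assumes the simplest possible layout: the data blocks are split into groups and each group gets one XOR parity block. The function names and the group size are hypothetical, no global parity blocks are modeled, and this is not any of the specific LRC constructions compared in the talk.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_local_groups(data_blocks, group_size):
    """Split data blocks into groups and append one XOR parity block to each.

    Hypothetical toy layout: real LRCs also add global parity blocks.
    """
    groups = []
    for start in range(0, len(data_blocks), group_size):
        group = list(data_blocks[start:start + group_size])
        groups.append(group + [xor_blocks(group)])
    return groups


def repair(group, lost_index):
    """Rebuild one lost block using only the survivors of its own group.

    Returns the rebuilt block and the number of blocks that had to be read.
    """
    survivors = [blk for i, blk in enumerate(group) if i != lost_index]
    return xor_blocks(survivors), len(survivors)


# k = 6 data blocks of 4 bytes each, split into two local groups of 3.
data = [bytes([i] * 4) for i in range(1, 7)]
groups = encode_local_groups(data, group_size=3)

# Lose the second block of the first group and repair it locally.
rebuilt, blocks_read = repair(groups[0], lost_index=1)
```

In this layout a single lost block is rebuilt by reading 3 blocks instead of all 6 data blocks, at the price of storing one extra parity block per group. This toy tolerates only one failure per group; practical LRCs add global parities to handle more.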

In this study, we performed the first systematic comparison of existing LRC approaches in light of two new metrics: the average degraded read cost, and the normalized repair cost. We show the tradeoff between these costs and the code’s fault tolerance, and that different approaches offer different choices in this tradeoff. Our experimental evaluation on a Ceph cluster deployed on Amazon EC2 further demonstrates that the normalized repair cost metric can reliably identify the LRC approach that would achieve the lowest repair cost in each system setup.

Bio:
Gala Yagdar is from the Technion. This talk is based on joint work with Oleg Kolosov, Matan Liram, Eitan Yaakobi, Itzhak Tamo and Alexander Barg, published in USENIX ATC ’18.

Dealing with Vulnerabilities in Device Drivers

Monday, Oct 15th, 12-1PM @ BA5205

Speaker: Ardalan Amiri Sani

Title:
Dealing with Vulnerabilities in Device Drivers

Abstract:
Vulnerabilities in the device drivers of today’s commodity operating systems (e.g., Android) remain a security concern due to the monolithic structure of the kernel. In this talk, we investigate three methods to mitigate this concern, with a focus on mobile devices. These approaches include a novel tool to find and fix these vulnerabilities, an efficient vetting layer to make exploits harder, and a device driver design possible for I/O devices with virtualization support.

Bio:
Ardalan Amiri Sani is an Assistant Professor in the Computer Science department at UC Irvine. His research is at the intersection of mobile computing, security, and operating systems. His work has appeared in various top-tier conferences such as MobiSys (including a best paper award), USENIX Security, CCS, and ASPLOS. Ardalan received his Ph.D. from Rice University in 2015.

Bridging the design and implementation of distributed systems with program analysis

Tuesday August 21st, 12-1PM @ BA5205

Speaker: Ivan Beschastnikh

Title:
Bridging the design and implementation of distributed systems with program analysis

Abstract:
Much of today’s software runs in a distributed context: mobile apps communicate with the cloud, web apps interface with complex distributed backends, and cloud-based systems use geo-distribution and replication for performance, scalability, and fault tolerance. However, distributed systems that power most of today’s infrastructure pose unique challenges for software developers. For example, reasoning about concurrent activities of system nodes and even understanding the system’s communication topology can be difficult.

In this talk, I will give an overview of three program analysis techniques developed in my group that address these challenges. First, I will present Dinv, a dynamic analysis technique for inferring likely distributed state properties of distributed systems. By relating state across nodes in the system, Dinv infers properties that help reason about system correctness. Second, I will review Dara, a model checker for distributed systems that introduces new techniques to cope with state explosion by combining traditional abstract model checking with dynamic model inference techniques. Finally, I will discuss PGo, a compiler that compiles formal specifications written in PlusCal/TLA+ into runnable distributed system implementations in the Go language. All three projects employ program analysis in the context of distributed systems and aim to bridge the gap between the design and implementation of such systems.

Bio:
Ivan Beschastnikh is an Assistant Professor in the Department of Computer Science at the University of British Columbia. He finished his PhD at the University of Washington in 2013 and received his formative training at the University of Chicago. He has broad research interests that touch on systems and software engineering. His recent projects span distributed systems, program analysis, networks, and security. Visit his homepage to learn more: http://www.cs.ubc.ca/~bestchai/

Breaking Apart the VFS for Managing File Systems

Tuesday July 3rd, 12-1PM @ BA5205

Speaker: Kuei (Jack) Sun

Title:
Breaking Apart the VFS for Managing File Systems

Abstract:
File system management applications, such as data scrubbers, defragmentation tools, resizing tools, and partition editors, are essential for maintaining, optimizing, and administering storage systems. These applications require fine-grained control over file-system metadata and data, such as the ability to migrate a data block to another physical location. Such control is not available with the VFS API, and so these applications bypass the VFS and access and modify file-system metadata directly. As a result, these applications do not work across file systems, and must be developed from scratch for each file system, which involves significant engineering effort and impedes adoption of new file systems.
Our goal is to design an interface that allows these management applications to be written once and be usable for all file systems that support the interface. Our key insight is that these applications operate on common file system abstractions, such as file system objects (e.g., blocks, inodes, and directory entries), and the mappings from logical blocks of a file to their physical locations. We propose the Extended Virtual File System (eVFS) interface that provides fine-grained access to these abstractions, allowing the development of generic file system management applications. We demonstrate the benefits of our approach by building a file-system agnostic conversion tool that performs in-place conversion of a source file system to a completely different destination file system, showing that arbitrary modifications to the file system format can be handled by the interface.

Bio:
Kuei (Jack) Sun is a fourth year PhD student supervised by Prof. Ashvin Goel and Prof. Angela Demke Brown. The focus of his research is on simplifying the development of file-system aware applications, as well as improving the robustness of these applications.

Challenges with Real-World Smartwatch based Audio Monitoring

Wednesday June 6th, 12-1PM @ BA5205

Speaker: Daniyal Liaqat

Title:
Challenges with Real-World Smartwatch based Audio Monitoring

Abstract:
Audio data from a microphone can be a rich source of information. The speech and audio processing community has explored using audio data to detect emotion, depression, Alzheimer’s disease and even children’s age, weight and height. The mobile community has looked at using smartphone-based audio to detect coughing and other respiratory sounds and to help predict students’ GPA.

However, audio data from these studies tends to be collected in more controlled environments using well-placed, high-quality microphones or from phone calls. Applying these kinds of analyses to continuous, in-the-wild audio could have tremendous applications, particularly in the context of health monitoring. As part of a health monitoring study, we use smartwatches to collect in-the-wild audio from real patients. In this paper we characterize the quality of the audio data we collected. Our findings include that the smartwatch-based audio is good enough to discern speech and respiratory sounds. However, extracting these sounds is difficult because of the wide variety of noise in the signal, and current tools perform poorly at dealing with this noise. We also find that the quality of the microphone allows annotators to differentiate the source of speech and coughing, which adds another level of complexity to analyzing this audio.

Automatically Mitigating and Fixing Software Vulnerabilities

Friday February 23rd, 12-1PM @ BA5205

Speaker: Zhen (James) Huang

Title:
Automatically Mitigating and Fixing Software Vulnerabilities

Abstract:
With the rise of smartphones and IoT devices, computer systems have become an indispensable part of our lives. Our reliance on computer systems makes software security extremely important. However, software security is continuously threatened by software vulnerabilities, which adversaries commonly use to compromise it, and manually fixing vulnerabilities cannot keep pace with how rapidly they are exploited. While it is ideal to fix software vulnerabilities, creating a fix can take time. A faster alternative is to mitigate them via configuration workarounds, which are frequently used in practice to address vulnerabilities rapidly ahead of the release of security patches. In this talk, I will demonstrate the need for automatic solutions to address software vulnerabilities with a study on the lifecycle and complexity of real-world security patches, and describe systems that I have built to mitigate more software vulnerabilities than configuration workarounds can, and to automatically fix real-world software vulnerabilities. These systems leverage novel program analysis techniques to address two main challenges: 1) mitigating a large number of software vulnerabilities rapidly and safely, and 2) generating sound security patches for software vulnerabilities involving complex code structures and data structures. I will conclude this talk with future directions on automatically mitigating and fixing software vulnerabilities.

Bio:
Zhen Huang is a Ph.D. candidate in the Department of Electrical & Computer Engineering at the University of Toronto. His research focuses on automatically mitigating and fixing software vulnerabilities. Using novel program analysis techniques, he has built two systems to address software vulnerabilities. A system called Talos enables software to defend rapidly against exploits of software vulnerabilities, and a system called Senx automatically fixes software vulnerabilities.

Spiffy: Interpreting Metadata for File System Applications

Thursday February 8th, 12-1PM @ BA5205

Speaker: Jack Sun

Title:
Spiffy: Interpreting Metadata for File System Applications

Abstract:
Many file system applications, such as defragmentation tools, file system checkers and data recovery tools, operate at the storage layer. Today, developers of these storage applications require detailed knowledge of the file system format, which takes a significant amount of time to learn, often by trial and error, due to insufficient documentation or specification of the format. Furthermore, these applications perform ad-hoc processing of the file-system metadata, leading to bugs and vulnerabilities.

We propose Spiffy, an annotation language for specifying the on-disk format of a file system. File system developers annotate the data structures of a file system, and we use these annotations to generate a library that allows identifying, parsing and traversing file-system metadata, providing support for both offline and online storage applications. This approach simplifies the development of storage applications that work across different file systems because it reduces the amount of file-system specific code that needs to be written.
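
As a rough illustration of the general idea of describing an on-disk format declaratively and deriving the parsing code from that description, here is a minimal sketch. It is not Spiffy's annotation language or its generated library: the field table and the helper names are hypothetical stand-ins written in plain Python. The offsets used are the standard ext2/3/4 superblock layout (superblock at byte 1024, 16-bit magic 0xEF53 at offset 0x38), and the disk image is synthetic.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # the ext2/3/4 superblock starts at byte 1024
EXT4_MAGIC = 0xEF53        # value of s_magic for ext2/3/4

# Hypothetical declarative description: (field name, offset, struct format).
# All fields are little-endian.
SUPERBLOCK_FIELDS = [
    ("s_inodes_count", 0x00, "<I"),
    ("s_blocks_count_lo", 0x04, "<I"),
    ("s_log_block_size", 0x18, "<I"),
    ("s_magic", 0x38, "<H"),
]


def parse_fields(buf, base, fields):
    """Generic parser driven entirely by the field description."""
    return {
        name: struct.unpack_from(fmt, buf, base + offset)[0]
        for name, offset, fmt in fields
    }


def read_superblock(image):
    """Parse the superblock and reject images with a bad magic number."""
    sb = parse_fields(image, SUPERBLOCK_OFFSET, SUPERBLOCK_FIELDS)
    if sb["s_magic"] != EXT4_MAGIC:
        raise ValueError("not an ext2/3/4 superblock")
    sb["block_size"] = 1024 << sb["s_log_block_size"]
    return sb


# Build a tiny synthetic image containing only the fields described above.
image = bytearray(2048)
struct.pack_into("<I", image, SUPERBLOCK_OFFSET + 0x00, 128)
struct.pack_into("<I", image, SUPERBLOCK_OFFSET + 0x04, 1024)
struct.pack_into("<I", image, SUPERBLOCK_OFFSET + 0x18, 2)
struct.pack_into("<H", image, SUPERBLOCK_OFFSET + 0x38, EXT4_MAGIC)

sb = read_superblock(image)
```

Because the parser is driven by the table, supporting another structure means adding a description rather than writing new parsing code, and validity checks such as the magic-number test live in one place.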

We have written annotations for the Linux Ext4, Btrfs and F2FS file systems, and developed several applications for these file systems, including a type-specific metadata corruptor, a file system converter, and an online storage layer cache that preferentially caches files for certain users. Our experiments show that applications that use the library to access file system metadata can achieve good performance and are robust against file system corruption errors.

Bio:
Kuei (Jack) Sun is a fourth year PhD student supervised by Prof. Ashvin Goel and Prof. Angela Demke Brown. The focus of his research is on simplifying the development of file-system aware applications, as well as improving the robustness of these applications.

Dancing in the Dark: Private Multi-Party Machine Learning in an Untrusted Setting

Friday December 15th, 12-1PM @ BA5205

Speaker: Clement Fung

Title:
Dancing in the Dark: Private Multi-Party Machine Learning in an Untrusted Setting

Abstract:
The problem of machine learning (ML) over distributed data sources arises in a variety of domains. Unfortunately, today’s distributed ML systems use an unsophisticated threat model: data sources must trust a central ML process. We propose a brokered learning abstraction that provides data sources with provable privacy guarantees while allowing them to contribute data towards a globally-learned model in an untrusted setting. We realize this abstraction by building on the state of the art in multi-party distributed ML and differential privacy methods to construct TorMentor, a system that is deployed as a hidden Tor service.

Bio:
Clement Fung is a second-year master’s student in Computer Science at UBC, supervised by Prof. Ivan Beschastnikh. Originally from Toronto, he completed his undergraduate degree at the University of Waterloo in 2016. His interests are in privacy-preserving machine learning and distributed systems.