The Game of Twenty Questions: Do You Know Where to Log?

Thursday May 4th, 12-1PM @ BA5205

Speaker: Xu Zhao

Title:
The Game of Twenty Questions: Do You Know Where to Log?

Abstract:
A production system’s printed logs are often the only source of runtime information available for postmortem debugging, performance profiling, security auditing, and user behavior analytics. The quality of this data is therefore critically important. Recent work has attempted to enhance log quality by recording additional variable values. However, logging statement placement, i.e., where to place a logging statement, is the most challenging and fundamental problem for improving log quality, and it has not been adequately addressed so far. This position paper proposes automated placement of logging statements by measuring how much uncertainty about the software can be eliminated. Guided by ideas from information theory, the authors describe a simple approach that automates logging statement placement. Preliminary results suggest that the algorithm can effectively cover, and further improve on, existing logging statements placed by developers. It can compute an optimal log placement that disambiguates the entire function call path with only 0.218% slowdown.
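
To make the information-theoretic idea concrete, here is a minimal toy sketch. It is not the authors' algorithm. It assumes a hypothetical function whose possible execution paths and their probabilities are known, treats a log statement as revealing whether (and in what order) a block ran, and greedily picks the blocks that remove the most remaining uncertainty (entropy) about which path was taken. All names (`paths`, `remaining_entropy`, `greedy_placement`) are illustrative assumptions.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy, in bits, of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def remaining_entropy(paths, logged):
    """Uncertainty about the executed path that is left after reading the logs.

    `paths` maps each path (a tuple of block names) to its probability.
    `logged` is the set of blocks that contain a log statement.
    Paths that print the same sequence of log messages are indistinguishable.
    """
    groups = defaultdict(list)
    for path, p in paths.items():
        signature = tuple(b for b in path if b in logged)
        groups[signature].append(p)
    total = 0.0
    for ps in groups.values():
        mass = sum(ps)
        total += mass * entropy([p / mass for p in ps])
    return total


def greedy_placement(paths, budget):
    """Pick up to `budget` blocks, each time the one that removes the most entropy.

    A greedy heuristic for illustration only; it is not guaranteed optimal.
    """
    blocks = sorted({b for path in paths for b in path})
    logged = set()
    for _ in range(budget):
        candidates = [b for b in blocks if b not in logged]
        if not candidates:
            break
        best = min(candidates, key=lambda b: remaining_entropy(paths, logged | {b}))
        gain = remaining_entropy(paths, logged) - remaining_entropy(paths, logged | {best})
        if gain <= 1e-12:  # no remaining candidate tells us anything new
            break
        logged.add(best)
    return logged


# Hypothetical function with three possible paths through blocks A, B, and C.
paths = {
    ("entry", "A", "exit"): 0.5,
    ("entry", "B", "exit"): 0.25,
    ("entry", "B", "C", "exit"): 0.25,
}
chosen = greedy_placement(paths, budget=2)
print(sorted(chosen), remaining_entropy(paths, chosen))
```

In this toy example, logging at `entry` or `exit` is useless because every path passes through them, while two well-placed statements are enough to distinguish all three paths.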

Bio:
Xu Zhao is a second-year PhD student at the University of Toronto, under the supervision of Prof. Ding Yuan. His research interests lie in the performance of distributed systems and in failure diagnosis. His current work focuses on automated placement of logging statements and non-intrusive performance profiling for distributed systems.

Non-intrusive Performance Profiling of Entire Software Stacks based on the Flow Reconstruction Principle

Thursday October 27th, 12-1PM @ BA5205

Speaker: Xu Zhao

Title:
Non-intrusive Performance Profiling of Entire Software Stacks based on the Flow Reconstruction Principle

Abstract:
Understanding the performance behavior of distributed server stacks at scale is non-trivial. Servicing just a single request can trigger numerous sub-requests across heterogeneous software components, and many similar requests are serviced concurrently and in parallel. When a user experiences poor performance, it is extremely difficult to identify the root cause, as well as the software components and machines that are the culprits. This work describes Stitch, a non-intrusive tool capable of profiling the performance of an entire distributed software stack using only the unstructured logs output by heterogeneous software components. Stitch differs substantially from all prior related tools in that it can construct a system model of an entire software stack without any domain knowledge being built into Stitch. Instead, it automatically reconstructs the extensive domain knowledge of the programmers who wrote the code. It does this by relying on the Flow Reconstruction Principle, which states that programmers log events such that one can reliably reconstruct the execution flow a posteriori.
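
As a rough illustration of reconstructing execution flows from unstructured logs, the sketch below groups log lines that share identifiers. It is a simplified picture under stated assumptions, not Stitch's actual algorithm: the log lines are made up, the identifier format matched by `ID_PATTERN` (for example `req_1`) is hypothetical, and lines are linked with a plain union-find structure.

```python
import re
from collections import defaultdict

# Hypothetical identifier format for this example only, e.g. "req_1" or "task_7".
ID_PATTERN = re.compile(r"\b[a-z]+_\d+\b")


def group_by_shared_ids(lines):
    """Group log lines into flows: lines linked by shared identifiers end up together."""
    parent = {}

    def find(x):
        # Union-find lookup with path compression.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    line_ids = []
    for line in lines:
        ids = ID_PATTERN.findall(line)
        line_ids.append(ids)
        # Identifiers that appear on the same line belong to the same flow.
        for other in ids[1:]:
            union(ids[0], other)

    flows = defaultdict(list)
    for line, ids in zip(lines, line_ids):
        if ids:  # lines without any identifier cannot be attributed to a flow
            flows[find(ids[0])].append(line)
    return list(flows.values())


# Made-up log lines from two interleaved requests.
lines = [
    "10:00:01 received req_1",
    "10:00:01 received req_2",
    "10:00:02 req_1 started task_7",
    "10:00:03 task_7 finished",
    "10:00:03 req_2 rejected",
]
flows = group_by_shared_ids(lines)
for flow in flows:
    print(flow)
```

Here the line mentioning both `req_1` and `task_7` is what ties the later `task_7` message back to the first request, even though the two requests are interleaved in the log.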

Bio:
Xu is a second-year PhD student under Prof. Ding Yuan. His research focuses on debugging performance failures and on log analysis in distributed systems.