CIS Seminars & Events

Spring 2013 Colloquium Series

January 11

Domagoj Babic
Facebook, Inc.
"Synthesis of Symbolic Input-Output Specifications with Sigma*"

Abstract: I will present Sigma*, a novel technique for synthesis of symbolic models of software behavior. Sigma* addresses the challenge of synthesizing models of software by using symbolic conjectures and abstraction. By combining dynamic symbolic execution, to discover symbolic input-output steps of programs, with counterexample-guided abstraction refinement, to over-approximate program behavior, Sigma* transforms an arbitrary source representation of a program into a faithful input-output model. I define a class of stream filters --- programs that process streams of data items --- for which Sigma* converges to a complete model if abstraction refinement eventually builds up a sufficiently strong abstraction. In other words, Sigma* is complete relative to abstraction. Thus, Sigma* enables fully automatic analysis of behavioral properties such as commutativity, reversibility, and idempotence, which is useful for web sanitizer verification and for compiler optimization of stream programs, as I show experimentally. I also show how models inferred by Sigma* can boost the performance of stream programs through parallelized code generation.
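
For a flavor of the behavioral properties involved, here is a toy idempotence check in Python (illustrative only; Sigma* establishes such properties on the learned symbolic model rather than by testing sample inputs, and the sanitizer below is a hypothetical example):

    # Toy check: a web sanitizer should be idempotent, i.e. sanitizing
    # twice should equal sanitizing once. This naive escaper is not.
    def sanitize(s):
        return s.replace("&", "&amp;").replace("<", "&lt;")

    for s in ["a<b", "x&y"]:
        once, twice = sanitize(s), sanitize(sanitize(s))
        print(repr(s), "idempotent:", once == twice)   # False: double-encoding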

February 21

Isil Dillig
Computer Science Dept., College of William & Mary
" Introspective Static Analysis via Abductive Inference"

Abstract: When software verification tools fail to prove the correctness of a program, there are two possibilities: either the program is indeed buggy, or the warning generated by the analysis is a false alarm. In this situation, the burden is on the programmer to manually inspect the warning and decide whether the program contains a real bug. However, this task is quite error-prone and time-consuming, and it remains an impediment to the adoption of static analysis in the real world. In this talk, we present a new "introspective" approach to static analysis that helps programmers understand and diagnose error reports generated by the analysis. When our analysis fails to verify the program, it can meaningfully interact with users by generating small and relevant queries that capture exactly the information the analysis is missing to validate or refute the existence of an error in the program. The inference of such missing facts is an instance of logical abduction, and we present a new algorithm for performing abductive inference in logics that admit quantifier elimination. Since our abduction algorithm computes logically weakest solutions with as few variables as possible, the queries our technique presents to the user capture the minimum information needed to diagnose the error report. A user study we performed involving 56 programmers hired through oDesk indicates that our new technique dramatically improves the usefulness of static analysis to programmers.
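As a minimal illustration of abduction via quantifier elimination (a hypothetical toy, not the talk's algorithm, which additionally minimizes the variables appearing in the solution): suppose the analysis knows x <= 0 but cannot prove x + y <= 5. The weakest missing fact over y alone is "forall x. x <= 0 implies x + y <= 5", which quantifier elimination reduces to y <= 5. With the z3 Python bindings:

    # Toy abduction step via quantifier elimination (illustrative sketch).
    from z3 import Ints, Implies, ForAll, Goal, Tactic

    x, y = Ints('x y')
    known = x <= 0                  # fact the analysis has established
    goal = x + y <= 5               # property it failed to verify

    g = Goal()
    g.add(ForAll([x], Implies(known, goal)))   # weakest psi over y alone
    print(Tactic('qe')(g))                     # a formula equivalent to y <= 5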

Bio: Isil Dillig is currently an assistant professor of computer science at the College of William & Mary. She received her PhD in computer science from Stanford University in 2012, and she also holds bachelor's and master's degrees in computer science from Stanford. Her research interests include program analysis, programming languages, formal methods, and automated logical reasoning.

February 26

Thomas Dillig
Computer Science Dept., College of William & Mary
" Practical Program Verification using Integer Linear Programming"

Abstract: A popular approach to program verification is to represent the abstraction of the program symbolically as a logical formula and use a constraint solver to decide the absence of errors in the program. More precise abstractions of the program result in fewer false alarms, but they also generate more complex formulas that are harder to solve. Therefore, most existing approaches compromise the precision of the abstraction in order to generate formulas that are easier to solve. Unfortunately, this trade-off is unattractive because compromising precision results in many false alarms. In this talk, we present a new algorithm called Cuts-from-Proofs for deciding the satisfiability of the linear integer constraints typically generated by program verification systems. Our new technique generates custom cutting planes by computing proofs of unsatisfiability obtained using Hermite normal forms. Experimental results show that Cuts-from-Proofs outperforms existing algorithms by two orders of magnitude on many instances of linear integer inequality systems. We use this algorithm in the context of verifying memory safety of C and C++ programs and show that it makes it possible to automatically verify memory safety in real programs with very few false alarms.
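To see why integrality admits inferences beyond the real relaxation, consider the textbook divide-by-gcd cut (a small illustrative sketch, not the Cuts-from-Proofs algorithm itself, which derives custom cuts from Hermite-normal-form proofs): any integer solution of 3x + 3y <= 8 also satisfies x + y <= 2, which excludes fractional points such as (4/3, 4/3).

    # Classic normalization cut: divide an integer inequality by the gcd
    # of its coefficients and round the bound down; sound over the integers.
    from functools import reduce
    from math import gcd

    def gcd_cut(coeffs, bound):
        g = reduce(gcd, coeffs)
        return [c // g for c in coeffs], bound // g   # // floors the bound

    print(gcd_cut([3, 3], 8))   # ([1, 1], 2): cuts off (4/3, 4/3)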

Bio: Thomas Dillig is an assistant professor at the College of William & Mary. He obtained a PhD in computer science from Stanford University in 2012. The goal of Tom's research is to make programs more reliable through static reasoning. Specifically, Tom is interested in developing precise static analysis techniques that work on real programs. Tom is also interested in constraint solving and simplification techniques and applying them to program analysis problems.

February 28

Geoff Voelker
Dept. of Computer Science and Engineering, University of California, San Diego
"Exploring the Technical and Economic Factors Underlying Internet Scams"

Abstract: Today, the large-scale compromise of Internet hosts serves as a platform for supporting a range of criminal activity in the so-called Internet underground economy. In this talk I will start by quickly surveying work that our group has performed over the past decade on the problems posed by these threats, and how our research directions have evolved over time in response to them. In the remainder of the talk, I will go into detail on recent work that our group has performed in an end-to-end analysis of the spam value chain. Using extensive measurements over months of diverse spam data, broad crawling of naming and hosting infrastructures, and product purchases from a wide variety of spam-advertised sites, I'll characterize the modern spam ecosystem, including system infrastructure, business models, cost accounting, and consumer demand. I'll end by characterizing the relative prospects for anti-spam interventions at multiple levels, and by presenting initial results of interventions in the payment tier.
This work is part of a long-standing collaborative effort between UCSD and ICSI.
http://www.ccied.org | http://www.sysnet.ucsd.edu/botnets | http://www.sysnet.ucsd.edu/frontier/

March 14

Fred Schneider
Computer Science Dept., Cornell University
"Blueprint for a Science of Cyber-security"

Abstract: Cyber-security today is focused largely on defending against known attacks. We learn about the latest attack and find a patch to defend against it. Our defenses thus improve only after they have been successfully penetrated. This is a recipe to ensure that some attackers succeed --- not a recipe for achieving system trustworthiness. We must move beyond reacting to yesterday's attacks and instead start building systems whose trustworthiness derives from first principles. Yet today we lack such a science base for cyber-security. That science of security would have to include attacks, defense mechanisms, and security properties; its laws would characterize how these relate. This talk will discuss examples of such laws and suggest avenues for future exploration.

For more information on our speaker: http://www.cs.cornell.edu/fbs/

March 19

Zachary Tatlock
Dept. of Computer Science and Engineering, University of California, San Diego
"Securing Software via Design and Proof"

Abstract: As our dependence on software grows, so too do the risks posed by programming errors. In principle, programmers could eliminate the dangerous and exploitable errors that plague modern systems by formally proving their code correct. Unfortunately, the overwhelming burden of constructing such machine-checkable proofs has made formal verification prohibitively expensive for most applications. In this talk I will describe techniques I developed to radically reduce the formal verification proof burden. In particular, I will focus on formal shim verification, a new method to scale formal verification up to large systems. With formal shim verification, a system is partitioned into components which must interact and access resources through a narrow interface known as the shim. By sandboxing all untrusted components and verifying the shim, we can establish formal correctness guarantees for the entire system while only reasoning about a tiny fraction of the code. We applied formal shim verification to guarantee several important security properties in a new, modern web browser dubbed QUARK. I will also briefly discuss my previous work on automated, domain-specific language (DSL) based techniques to reduce the proof burden for formally verifying compiler optimizations.
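A minimal sketch of the shim idea, with all names hypothetical (QUARK's actual kernel is verified in a proof assistant, not written in Python): untrusted components run sandboxed and reach real resources only through one narrow dispatch point, so verifying that dispatch point yields a guarantee about the whole system.

    # Hypothetical shim: the only path from sandboxed components to real
    # resources. Verifying this function (plus the sandbox) covers the
    # entire system, no matter what the untrusted component code does.
    ALLOWED = {"fetch_url", "render_pixels"}   # the entire narrow interface

    def shim_dispatch(msg, handlers):
        if msg["kind"] not in ALLOWED:         # reject off-interface requests
            raise PermissionError("blocked: " + msg["kind"])
        return handlers[msg["kind"]](msg["payload"])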

Bio: Zach Tatlock is a PhD candidate in Computer Science and Engineering at UC San Diego, where he is a member of the Programming Systems group. He received BS degrees in Computer Science and Mathematics from Purdue University. His research draws upon proof assistants, Satisfiability Modulo Theories (SMT) solvers, and type systems to improve software reliability and security in domains ranging from embedded database query languages and compiler optimizations to web browsers.

March 21

Allison Lewko
Microsoft Research New England
"Expanding Capabilities for Functional Encryption"

Abstract: Functional Encryption represents a new vision for designing cryptosystems that simultaneously achieve flexible data sharing and strong security. Using a functional encryption scheme, a data owner can specify an access policy that will be enforced by the encryption scheme itself, allowing only authorized users to decrypt. In this talk, I will present functional encryption schemes supporting expressive access policies while retaining provable security guarantees against powerful adversaries. In particular, I will describe a new decentralized system enabling data encryption across different trust domains.
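In rough shape, the interface looks as follows (a mock sketch with hypothetical names; a real scheme enforces the policy cryptographically rather than with an if-statement): ciphertexts carry an access policy, keys carry attributes, and decryption succeeds exactly when the attributes satisfy the policy.

    # Mock of the functional-encryption API shape only; no cryptography
    # here. In a real scheme the policy check is cryptographically binding.
    def encrypt(policy, plaintext):
        return {"policy": policy, "data": plaintext}

    def keygen(attributes):
        return frozenset(attributes)

    def decrypt(key, ct):
        if ct["policy"](key):                  # policy: attributes -> bool
            return ct["data"]
        raise PermissionError("attributes do not satisfy the policy")

    ct = encrypt(lambda a: "doctor" in a, "record")
    print(decrypt(keygen({"doctor", "cardiology"}), ct))   # prints 'record'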

Bio: Allison Bishop Lewko is a postdoctoral researcher at Microsoft Research New England. She completed her Ph.D. in computer science at the University of Texas at Austin in May 2012. She received her bachelor's degree in mathematics from Princeton University and a master's degree in mathematics from the University of Cambridge. Much of her research is focused on developing advanced cryptosystems with strong security guarantees, though she is also active in the areas of complexity theory, distributed computing, harmonic analysis, and combinatorics.

March 26

Daniel Hsu
Microsoft Research New England
"Fast learning algorithms for discovering the hidden structure in data"

Abstract: A major challenge in machine learning is to reliably and automatically discover hidden structure in data with minimal human intervention. For instance, one may be interested in understanding the stratification of a population into subgroups, the thematic make-up of a collection of documents, or the dynamical process governing a complex time series. Many of the core statistical estimation problems for these applications are, in general, provably intractable for both computational and statistical reasons, and therefore progress is made by shifting the focus to realistic instances that rule out the intractable cases. In this talk, I'll describe a general computational approach for correctly estimating a wide class of statistical models, including Gaussian mixture models, hidden Markov models, Latent Dirichlet Allocation, probabilistic context-free grammars, and several more. The key idea is to exploit the structure of low-order correlations that is present in high-dimensional data. The scope of the new approach extends beyond the purview of previous algorithms, and it leads both to new theoretical guarantees for unsupervised machine learning and to fast, practical algorithms for large-scale data analysis.
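As a small, self-contained example of the low-order-correlation idea (an illustration, not the talk's algorithms): for a mixture of spherical Gaussians with unit variance, E[xx^T] equals the weighted sum of mu_i mu_i^T plus the identity, so subtracting the identity from the empirical second moment leaves a low-rank matrix whose top eigenvectors span the hidden component means, with no EM and no local optima.

    # Recovering the span of hidden mixture means from second moments
    # (illustrative numpy sketch; assumes unit-variance spherical
    # components, so that E[xx^T] - I = sum_i w_i mu_i mu_i^T).
    import numpy as np

    rng = np.random.default_rng(0)
    mus = np.array([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]])   # hidden means
    X = mus[rng.integers(2, size=10000)] + rng.normal(size=(10000, 3))

    M2 = X.T @ X / len(X) - np.eye(3)      # empirical E[xx^T] minus noise
    _, eigvecs = np.linalg.eigh(M2)
    print(eigvecs[:, -2:])                 # ~ spans the two hidden means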

Bio: Daniel Hsu is a postdoc at Microsoft Research New England. Previously, he was a postdoc with the Department of Statistics at Rutgers University and the Department of Statistics at the University of Pennsylvania from 2010 to 2011, supervised by Tong Zhang and Sham M. Kakade. He received his Ph.D. in Computer Science in 2010 from UC San Diego, where he was advised by Sanjoy Dasgupta, and his B.S. in Computer Science and Engineering in 2004 from UC Berkeley. His research interests are in algorithmic statistics and machine learning.

April 4

Saul Gorn Memorial Event
John C. Mitchell
Computer Science Dept., Stanford University
"JavaScript and Web Security: Isolation and Communication in the Programmable Web"

Abstract: The World Wide Web is our most important distributed computer system. In the modern web, pages seen by viewers contain executable programs from many sources, loaded into the browser in increasingly complex ways. This complexity is an important part of the advertising infrastructure, for example, allowing advertising companies to pass information between them and effectively auction off a portion of the user’s screen to the advertiser most interested in reaching that individual. Maps, games, and other apps are also served through sites that do not know what they do or how they work.

In this environment, it is important to allow web designers and site developers to write sophisticated programs that run in end-user browsers. At the same time, some programs may maliciously try to steal information, or worse. Therefore, separate programs from separate sources must be allowed to run and communicate with each other, but must be prevented from carrying out malicious attacks. Over the last several years, we have studied this problem and developed methods for isolating communicating JavaScript programs. Our methods prevent attacks we found on widely used commercial sites and are provably secure, using methods from the mathematical theory of programming languages. In addition, these approaches are relevant to mobile security, because approximately three-quarters of current Android apps use an embedded “browser” to interact with web sites.

Bio: John Mitchell is the Mary and Gordon Crary Family Professor in the Stanford School of Engineering, Professor of Computer Science, and Vice Provost for Online Learning. His research in computer security focuses on cloud security, mobile and web security, privacy, and network security. He has also worked on programming language analysis and design, formal methods, and applications of mathematical logic to computer science. Prof. Mitchell currently leads research projects funded by the US Air Force, the Office of Naval Research, private companies and foundations; he is the Stanford Principal Investigator of the multidisciplinary TRUST NSF Science and Technology Center and Chief Computer Scientist of the DHHS-funded SHARPS project on healthcare security and privacy. He is a consultant and advisor to companies and the author of over 150 research articles and two books.

Read more about the Saul Gorn Lecture Series

April 9

Erin Solovey
Humans and Automation Lab, MIT
"Enhancing Interactive Systems with Real-time Brain Input"

Abstract: Most human-computer interaction techniques cannot fully capture the richness of the user’s thoughts and intentions when interacting with a computer system. For example, when we communicate with other people, we do not simply use words, but also accompanying cues that give the other person additional insight into our thoughts. At the same time, several physiological changes occur that may or may not be detected by the other person. When we communicate with computers, we also generate these additional signals, but the computer cannot sense such signals and therefore ignores them. Detecting these signals in real time and incorporating them into the user interface could improve the communication channel between the computer and the human user with little additional effort required of the user.

In this talk, I will discuss our research demonstrating effective use of brain sensor data to expand the bandwidth between the human and computer. Using a relatively new brain-sensing tool called functional near-infrared spectroscopy (fNIRS), we can detect signals within the brain that indicate various cognitive states. This device provides data on brain activity while remaining portable and non-invasive, which opens new doors for human-computer interaction research. The real-time cognitive state information can be used as an implicit, supplemental input channel to provide the user with a richer and more supportive environment, particularly in challenging or high workload situations such as management of unmanned aerial vehicles, driving, air traffic control, video games, healthcare, education, and anything involving information overload, interruptions or multitasking. In addition, while most of my research has focused on the broader population of healthy users, many of the results would benefit disabled users as well, by providing additional channels of communication in a lightweight manner.

Bio: Erin Solovey is a postdoctoral fellow in the Humans and Automation Lab (HAL) at MIT. Her main research area is human-computer interaction, specifically next-generation interaction techniques. Dr. Solovey has received several awards including the NSF/CRA Computing Innovation Fellowship and a CHI Best Paper Honorable Mention Award, and her work has been covered in MIT Technology Review, Slashdot, Engadget and others. She has also conducted research at MIT Lincoln Laboratory and Microsoft Research. Previously, she was a software engineer at Oracle, and also has experience at several startups. She received her bachelor’s degree in computer science from Harvard University, and her M.S. and Ph.D. in computer science from Tufts University, working in the Human-Computer Interaction Lab.

April 11

Nadia Heninger
Computer Science Dept., Princeton University
"RSA in the real world"

Abstract: I study computer security and applied cryptography using a theoretician's mathematical toolkit. Security vulnerability analysis can often be a painstaking and implementation-specific process. My approach uses cryptographic and algorithmic ideas to reason about the security of deployed systems, to question assumptions underlying the security of these systems, and to understand and model threats.
In this talk, I will use RSA, the world's most widely used public key cryptosystem, as a vehicle to explore the interaction between cryptographic algorithms and real-world usage:

  • Discovering widespread catastrophic failures in the random number generators in network devices by computing the greatest common divisors of millions of RSA public keys collected in the wild (see the sketch after this list).
  • Reconstructing complete private keys using only a few bits of the private key revealed in the course of a side-channel attack.
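
A toy version of the first idea (with tiny illustrative moduli; the actual study ran a batch-GCD computation over millions of real 1024-bit keys rather than pairwise loops): if a bad random number generator causes two RSA moduli to share a prime factor, a single gcd computation factors both.

    # Toy shared-factor check; real moduli are ~1024-bit, these are tiny.
    from math import gcd

    n1 = 3233   # 61 * 53
    n2 = 4189   # 59 * 71  (no prime shared with n1)
    n3 = 3599   # 61 * 59  (shares the prime 61 with n1)

    for a, b in [(n1, n2), (n1, n3)]:
        p = gcd(a, b)
        if p > 1:
            print(a, "=", p, "*", a // p)   # both private keys recoverable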

In addition to their impact on security, many of the ideas arising in the course of this work have surprising connections across computer science, leading to, for example, new algorithms for decoding families of error-correcting codes, applications within theoretical cryptography, and practical privacy-enhancing technologies.

April 16

Ruzica Piskac
Max Planck Institute for Software Systems
"Software Synthesis using Automated Reasoning"

Abstract: Software synthesis is a technique for automatically generating code from a given specification. The goal of software synthesis is to make software development easier while increasing both the productivity of the programmer and the correctness of the produced code. In this talk I will present an approach to synthesis that relies on the use of automated reasoning and decision procedures. First I will describe how to generalize decision procedures into predictable and complete synthesis procedures. Here completeness means that the procedure is guaranteed to find code that satisfies the given specification. I will illustrate the process of turning a decision procedure into a synthesis procedure using linear integer arithmetic as an example. In addition, I will outline a procedure that synthesizes code snippets from specifications implicitly given in the form of type constraints. As there can be multiple solutions, more than one code snippet can be a good candidate; we use an additional weight function to rank the derived snippets and direct the search for them. In practical evaluation, our technique scales to programs with thousands of visible declarations in scope and suggests program expressions of interest within 0.5 seconds. I will conclude with an outlook on possible future research directions and applications of synthesis procedures. I believe that in the future we can make programming easier and more reliable by combining program analysis, software synthesis, and automated reasoning.
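A toy instance of turning a decision procedure for linear integer arithmetic into a synthesis procedure (a hypothetical example in the spirit of the talk, not its actual procedure): from the specification a*x + b == n with input n, the synthesizer emits both a witness program x = (n - b) / a and the precondition (n - b) mod a == 0 under which the specification is satisfiable.

    # Toy LIA synthesis: returns (precondition, program) for the spec
    # a*x + b == n. Completeness means: whenever some integer x satisfies
    # the spec, the emitted program produces one.
    def synthesize_linear(a, b):
        pre = lambda n: (n - b) % a == 0    # exactly when a solution exists
        prog = lambda n: (n - b) // a       # the witness term
        return pre, prog

    pre, prog = synthesize_linear(3, 1)     # spec: 3*x + 1 == n
    assert pre(10) and 3 * prog(10) + 1 == 10
    assert not pre(11)                      # no integer solution for n = 11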

Bio: Ruzica Piskac is a tenure-track faculty member at the Max Planck Institute for Software Systems (MPI-SWS) and head of the Synthesis, Analysis and Automated Reasoning Group. Her research interests span the areas of software verification, automated reasoning, code synthesis, and program termination. A common thread in Ruzica's research is improving software reliability and trustworthiness using formal techniques. Prior to joining MPI-SWS, she was a graduate student at EPFL (2007-2011), where she worked under the supervision of Viktor Kuncak. She was awarded the Patrick Denantes Memorial Prize for her dissertation. She holds a master's degree in computer science from the Max Planck Institute for Computer Science in Saarbruecken, Germany, and a Dipl.-Ing. degree in mathematics from the University of Zagreb, Croatia.

April 18

Quoc V. Le
Computer Science Dept., Stanford University
"Scaling deep learning to 10,000 cores and beyond"

Abstract: Deep learning and unsupervised feature learning offer the potential to transform many domains such as vision, speech, and natural language processing. However, these methods have been fundamentally limited by our computational abilities and have typically been applied to small-sized problems. In this talk, I describe the key ideas that enabled scaling deep learning algorithms to train a large model on a cluster of 16,000 CPU cores (2,000 machines). This network has 1.15 billion parameters, which is more than 100x larger than the next largest network reported in the literature.
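
One of the key systems ideas at this scale is asynchronous, data-parallel training against a shared parameter server, sketched below in toy, sequential form (hypothetical names; the actual system also partitions the model itself across machines and runs many replicas concurrently):

    # Toy parameter-server pattern: replicas pull possibly stale
    # parameters, compute gradients on local data, and push updates
    # back without global synchronization.
    import numpy as np

    class ParameterServer:
        def __init__(self, dim):
            self.w = np.zeros(dim)
        def pull(self):
            return self.w.copy()          # may be stale by the time it's used
        def push(self, grad, lr=0.01):
            self.w -= lr * grad           # applied as updates arrive

    ps = ParameterServer(dim=4)
    true_w = np.ones(4)                   # stands in for the data source
    for step in range(2000):              # each step plays one replica's turn
        w = ps.pull()
        x = np.random.randn(4)            # a replica's local example
        ps.push(2 * (w @ x - true_w @ x) * x)   # gradient of (w.x - y)^2
    print(ps.w)                           # approaches true_w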

Such a network, applied at this scale, is able to learn abstract concepts in a much more general manner than previously demonstrated. Specifically, we find that by training on 10 million unlabeled images, the network produces features that are very selective for high-level concepts such as human faces and cats. Using these features, we also obtain breakthrough performance gains on several large-scale computer vision tasks.

Thanks to its scalability and insensitivity to modalities, our framework has also been used successfully to achieve performance leaps in other domains, such as speech recognition and natural language understanding.

Bio: Quoc V. Le is a PhD student at Stanford and a software engineer at Google. At Stanford and Google, Quoc works on large-scale brain simulation using unsupervised feature learning and deep learning. His recent work in deep learning and big data, which yields state-of-the-art performance on many pattern recognition tasks, was widely distributed and discussed on various technology blogs and news sites. Quoc obtained his undergraduate degree, with First Class Honors and as a Distinguished Scholar, from the Australian National University. As an undergraduate, he worked on kernel methods, ranking, structured output prediction, and optimization. His undergraduate honors thesis won a best paper award at ECML 2007. Quoc was also a researcher at National ICT Australia, Microsoft Research, and the Max Planck Institute for Biological Cybernetics.