CIS Seminars & Events
Spring 2017 Colloquium Series
Unless otherwise noted, our lectures are held weekly on Tuesday and/or Thursday from 3:00 p.m. to 4:15 p.m. in Wu and Chen Auditorium, Levine Hall.
Wednesday, January 18th
Senior Research Microsoft New York
Berger Auditorium, Skirkanich Hall
3:00 pm - 4:15 pm
"Machine Learning for Social Science"
In this talk, I will introduce the audience to the emerging area of computational social science, focusing on how machine learning for social science differs from machine learning in other contexts. I will present two related models -- both based on Bayesian Poisson tensor decomposition -- for uncovering latent structure from count data. The first is for uncovering topics in previously classified government documents, while the second is for uncovering multilateral relations from country-to-country interaction data. Finally, I will talk briefly about the broader ethical implications of analyzing social data.
Hanna Wallach is a Senior Researcher at Microsoft Research New York City and an Adjunct Associate Professor in the College of Information and Computer Sciences at the University of Massachusetts Amherst. She is also a member of UMass's Computational Social Science Institute. Hanna develops machine learning methods for analyzing the structure, content, and dynamics of social processes. Her work is inherently interdisciplinary: she collaborates with political scientists, sociologists, and journalists to understand how organizations work by analyzing publicly available interaction data, such as email networks, document collections, press releases, meeting transcripts, and news articles. To complement this agenda, she also studies issues of fairness, accountability, and transparency as they relate to machine learning. Hanna's research has had broad impact in machine learning, natural language processing, and computational social science. In 2010, her work on infinite belief networks won the best paper award at the Artificial Intelligence and Statistics conference; in 2014, she was named one of Glamour magazine's "35 Women Under 35 Who Are Changing the Tech Industry"; in 2015, she was elected to the International Machine Learning Society's Board of Trustees; in 2016, she was named co winner of the 2016 Borg Early Career Award; and in 2017, she will be program co-chair of the Neural Information Processing Systems conference. She is the recipient of several National Science Foundation grants, an Intelligence Advanced Research Projects Activity grant, and a grant from the Office of Juvenile Justice and Delinquency Prevention. Hanna is committed to increasing diversity and has worked for over a decade to address the underrepresentation of women in computing. She co-founded two projects---the first of their kind---to increase women's involvement in free and open source software development: Debian Women and the GNOME Women's Summer Outreach Program. She also co-founded the
annual Women in Machine Learning Workshop, which is now in its twelfth year.
Electrical Engineering and Computer Sciences
"Resurrecting Laplace's Demon: The Case for Deterministic Models
In this talk, I will argue that deterministic models have historically proved proved extremely valuable in engineering, despite fundamental limits. I examine the role that models play in engineering and contrast it with the role that they play in science, and I argue that determinism is an extraordinarily valuable property in engineering, even more than science. I will then show that deterministic models for cyber-physical systems, which combine computation with physical dynamics, remain elusive in practice. I will argue that the next big advance in engineering methods must include deterministic models for CPS, and I will show that such models are both possible and practical. I will then examine some fundamental limits of determinism, showing that chaos limits its utility for prediction, and that incompleteness means that at least for CPS, nondeterminism is inevitable.
Edward A. Lee is the Robert S. Pepper Distinguished Professor in the Electrical Engineering and Computer Sciences (EECS) department at U.C. Berkeley. His research interests center on design, modeling, and analysis of embedded, real-time computational systems. He is the director of the nine-university TerraSwarm Research Center (http://terraswarm.org), a director of Chess, the Berkeley Center for Hybrid and Embedded Software Systems, and the director of the Berkeley Ptolemy project. From 2005-2008, he served as chair of the EE Division and then chair of the EECS Department at UC Berkeley. He is co-author of six books and hundreds of papers. He has led the development of several influential open-source software packages, notably Ptolemy and its various spinoffs. He received his BS degree in 1979 from Yale University, with a double major in Computer Science and Engineering and Applied Science, an SM degree in EECS from MIT in 1981, and a PhD in EECS from UC Berkeley in 1986. From 1979 to 1982 he was a member of technical staff at Bell Labs in Holmdel, New Jersey, in the Advanced Data Communications Laboratory. He is a co-founder of BDTI, Inc., where he is currently a Senior Technical Advisor, and has consulted for a number of other companies. He is a Fellow of the IEEE, was an NSF Presidential Young Investigator, won the 1997 Frederick Emmons Terman Award for Engineering Education, and received the 2016 Outstanding Technical Achievement and Leadership Award from the IEEE Technical Committee on Real-Time Systems (TCRTS).
Department of Computer Science
Heterogeneous parallelism and specialization have become widely-used design levers for achieving high computer systems performance and power efficiency, from smartphones to datacenters. Unfortunately, heterogeneity greatly increases complexity at the hardware-software interface, and as a result, it brings increased challenges for software reliability, interoperability, and performance portability Over the past three years, my group has explored a set of issues for heterogeneously parallel systems, particularly related to specifying and verifying memory consistency models (MCMs), from high-level languages, down through compilers and operating systems and ISAs, and eventually to heterogeneous platforms comprised of CPUs, GPUs, and accelerators. The suite of tools we have developed (http://check.cs.princeton.edu ) offers comprehensive and fast analysis of memory ordering behavior across multiple system levels. As such, our tools have been used to find bugs in existing and proposed processors and in commercial compilers. They have also been used to identify shortcomings in the specifications of high-level languages (C++11) and instruction set architectures (RISC-V). Although memory models are traditionally considered nitpicky, boring, and even soul-crushing, my talk will show why they are of central importance for both hardware and software people today, and will also look forward to discuss future work related to MCM verification for accelerator-oriented parallelism and IoT devices.
Margaret Martonosi is the Hugh Trumbull Adams '35 Professor of Computer Science at Princeton University, where she has been on the faculty since 1994. Martonosi's research focuses on computer architecture and mobile computing, particularly power-efficient systems. Past projects include the Wattch power modeling tool and the ZebraNet mobile sensor network, which was deployed for wildlife tracking in Kenya. Martonosi is a Fellow of both IEEE and ACM. Her major awards include Princeton University's 2010 Graduate Mentoring Award, the Anita Borg Institute's 2013 Technical Leadership Award, NCWIT's 2013 Undergraduate Research Mentoring Award, and ISCA’s 2015 Long-Term Influential Paper Award.
Computer Science Department
University of California, Los Angeles
Variation in human DNA sequences account for a significant amount of genetic risk factors for common disease such as hypertension, diabetes, Alzheimer's disease, and cancer. Identifying the human sequence variation that makes up the genetic basis of common disease will have a tremendous impact on medicine in many ways. Recent efforts to identify these genetic factors through large scale association studies which compare information on variation between a set of healthy and diseased individuals have been remarkably successful. However, despite the success of these initial studies, many challenges and open questions remain on how to design and analyze the results of association studies. Many of these challenges involve analysis of recently developed
but revolutionary sequencing technologies. In this talk, I will discuss a few of the computational and statistical challenges in the design and analysis of genetic studie
Eleazar Eskin’s research focuses on developing computational methods for analysis of genetic variation. He is currently a Professor in the Computer Science and Human Genetics departments at the University of California Los Angeles. Previously, he was an Assistant Professor in Residence in Computer Science Engineering at the University of California, San Diego. Eleazar completed his Ph. D. in the Computer Science Department of Columbia University in New York City. After graduation, he spent one year in the Computer Science Department at the Hebrew University in Jerusalem, Israel.
Computer Science Dept.
"Theory for New Machine Learning Problems and Applications"
Machine learning has recently achieved great empirical success. This comes along with new challenges, such as sophisticated models that lack rigorous analysis, simple algorithms with practical success on hard optimization problems, and handling large scale datasets under resource constraints. In this talk, I will present some of my work in addressing such challenges.
This first part of the talk focuses on learning semantic representations for text data. Recent advances in natural language processing build upon the approach of embedding words as low dimensional vectors. The fundamental observation that empirically justifies this approach is that these vectors can capture semantic relations. A probabilistic model for generating text is proposed to mathematically explain this observation and existing popular embedding algorithms. It also reveals surprising connections to classical notions such as Pointwise Mutual Information, and allows to design novel, simple, and practical algorithms for applications such as sentence embedding.
In the second part, I will describe my work on distributed unsupervised learning over large-scale data distributed over different locations. For the prototypical tasks clustering, Principal Component Analysis (PCA), and kernel PCA, I will present algorithms that have provable guarantees on solution quality, communication cost nearly optimal in key parameters, and strong empirical performance.
Yingyu Liang is an associate research scholar in the Computer Science Department at Princeton University. His research interests are providing rigorous analysis for machine learning models and designing efficient algorithms for applications. He received a B.S. in 2008 and an M.S. in 2010 in Computer Science from Tsinghua University, and a Ph.D. degree in Computer Science from Georgia Institute of Technology in 2014. He was a postdoctoral researcher in 2014-2016 in the Computer Science Department at Princeton University.
Computer Science Dept.
"Beyond Deductive Inference in Program Analysis"
Osbert Bastani is a Ph.D. student in Computer Science at Stanford University advised by Alex Aiken. He is interested in improving the automation of program analysis tools using techniques from machine learning, artificial intelligence, and program synthesis. His work is motivated by applications to building secure systems, and analyzing software systems that rely on machine learning models.
"Towards versatile visual recognition systems"
To be useful to downstream applications, visual recognition systems have to solve a diverse array of tasks: they need to recognize a large number of categories, localize instances of these categories precisely in the visual field, estimate their pose accurately and so on. This set of requirements is also not fixed a priori and can change over time, requiring recognition systems to learn new tasks quickly and with minimal training. In contrast, visual recognition systems today only produce a shallow understanding of images, restricted to recognizing categories in an image, and expanding this shallow understanding requires the expensive collection of training data.
In this talk I will describe my work on removing this shortcoming. I will show we can build recognition systems that produce richer outputs, such as pixel-precise localization of detected objects, and how we can make progress towards making these systems capable of visual reasoning. Building models for these new goals requires a lot of training data. To reduce this requirement, I will present ways of leveraging past visual experience to learn new tasks, such as recognizing unseen categories, from very little data.
I am currently a post-doctoral scholar in Facebook AI Research (FAIR). Before joining FAIR, I did my PhD with Prof. Jitendra Malik at the University of California, Berkeley, where I was awarded the Berkeley Fellowship and the Microsoft Research Fellowship. My interests are in object recognition in computer vision and machine learning.
'"New Algorithms for High-Dimensional Data"
Skirkanich Hall, Berger Auditorium Rm 13
3:00 pm - 4:15 pm
A popular approach in data analysis is to represent a dataset in a high-dimensional feature space, and reduce a given task to a geometric computational problem. However, most of the classic geometric algorithms scale poorly as the dimension grows and are typically not applicable to the high-dimensional regime. This necessitates the development of new algorithmic approaches that overcome this "curse of dimensionality". In this talk I will give an overview of my work in this area.
* I will describe new algorithms for the high-dimensional Nearest Neighbor Search (NNS) problem, where the goal is to preprocess a dataset so that, given a query object, one can quickly retrieve one or several closest objects from the dataset. Our algorithms improve, for the first time, upon the popular Locality-Sensitive Hashing (LSH) approach.
* Next, I will show how to make several algorithmic ideas underlying the above theoretical results practical. This yields an implementation which is competitive with the state of the art heuristics for the NNS problem. The implementation is a part of FALCONN: a new open-source library for similarity search.
* Finally, I will talk about limits of distance-preserving sketching, i.e., compression methods that approximately preserve the distances between high-dimensional vectors. In particular, we will show that for the well-studied Earth Mover Distance (EMD) such efficient compression is impossible.
The common theme that unifies these and other algorithms for high-dimensional data is the use of "efficient data representations": randomized hashing, sketching, dimensionality reduction, metric embeddings, and others. One goal of the talk is to describe a holistic view of all of these techniques.
Ilya Razenshteyn is a graduate student at the Theory of Computation group of MIT CSAIL advised by Piotr Indyk. He is broadly interested in the theory of algorithms for massive data with a bias towards algorithms which have the potential of being useful for applications. More specific interests include: similarity search, sketching, metric embeddings, high-dimensional geometry, streaming algorithms, and compressed sensing. Ilya graduated with M.Sc. in Mathematics from Moscow State University back in 2012. His awards include Akamai Presidential Fellowship and Simons Foundation Junior Fellowship.
Department of Computer Science
Johns Hopkins University
"Securing Deployed Cryptographic Systems"
In 2015 more than 150 million records and $400 billion were lost due to publicly-reported criminal and nation-state cyberattacks in the United States alone. The failure of our existing security infrastructure motivates the need for improved technologies, and cryptography provides a powerful tool for doing this. There is a misperception that the cryptography we use today is a "solved problem" and the real security weaknesses are in software or other areas of the system. This is, in fact, not true at all, and over the past several years we have seen a number of serious vulnerabilities in the cryptographic pieces of systems, some with large consequences.
In this talk I will discuss two aspects of securing deployed cryptographic systems. I will first talk about the evaluation of systems in the wild, using the example of how to efficiently and effectively recover user passwords submitted over TLS encrypted with RC4, with applications to many methods of web authentication as well as the popular IMAP protocol for email. I will then address my work on developing tools to design and create cryptographic systems and bridge the often large gap between theory and practice by introducing AutoGroup+, a tool that automatically translates cryptographic schemes from the mathematical setting used in the literature to that typically used in practice, giving both a secure and optimal output.
Christina Garman is a Ph.D. student at Johns Hopkins University where she is advised by Professor Matthew Green. Her research interests focus largely on practical and applied cryptography. More specifically, her work has focused on the security of deployed cryptographic systems from all aspects, including the evaluation of real systems, improving the tools that we have to design and create them, and actually creating real, deployable systems. Some of her recent work has been on demonstrating flaws in Apple’s iMessage end to end encryption, cryptographic automation, decentralized anonymous e-cash, and decentralized anonymous credentials. Her work has been publicized in The Washington Post, Wired, and The Economist, and she received a 2016 ACM CCS Best Paper Award.
“Learning to Live with Errors: Architectural Solutions for Memory Reliability at Extreme Scaling"
Skirkanich Hall, Berger Auditorium Rm 13
3:00 pm - 4:15 pm
High capacity and scalable memory systems play a vital role in enabling our desktops, smartphones, and pervasive technologies like Internet of Things (IoT). Unfortunately, memory systems are becoming increasingly prone to faults. This is because we rely on technology scaling to improve memory density, and at small feature sizes, memory cells tend to break easily. Today, memory reliability is seen as the key impediment towards using high-density devices, adopting new technologies, and even building the next Exascale supercomputer. To ensure even a bare-minimum level of reliability, present-day solutions tend to have high performance, power and area overheads. Ideally, we would like memory systems to remain robust, scalable, and implementable while keeping the overheads to a minimum. In this talk, I will discuss how simple cross-layer architectural techniques can provide orders of magnitude higher reliability and enable seamless scalability with negligible overheads. I will also highlight how the fundamentals of memory reliability techniques can be extended into the domains of Security, Low-Power IoT radio controllers, and Quantum Accelerators.
Prashant J. Nair is a Ph.D. candidate in Georgia Institute of Technology where he is advised by Professor Moinuddin Qureshi. Prior to this, he received his MS (2011-2013) from Georgia Institute of Technology and his BE in Electronics Engineering (2005-2009) from University of Mumbai. His research interests include reliability, performance, power and refresh optimizations for current and upcoming memory systems. In these areas, he has authored and co-authored 9 papers in top-tier venues such as ISCA, MICRO, HPCA, ASPLOS and DSN. To highlight the importance of memory reliability to the academic community, he also organized the “Memory Reliability Forum” that was co-located with HPCA-2016. He has served as the Submissions Co-chair of MICRO 2015 and in the ERC of ISCA 2016. During the course of his Ph.D., he has also interned at several reputed industrial labs including AMD, Samsung, Intel and IBM.
Ali Jose' Mashtizadeh
Secure Computer Group
"Systems and Tools for Reliable Software: Replication, Reproducibility, and Security"
The past decade has seen a rapid acceleration in the development of new and transformative applications in many areas including transportation, medicine, ﬁnance, and communication. Most of these applications are made possible by the increasing diversity and scale of hardware and software systems.
While this brings unprecedented opportunity, it also increases the probability of failures and the difﬁculty of diagnosing them. Increased scale and transience has also made management increasingly challenging. Devices can come and go for a variety of reasons including mobility, failure and recovery, and scaling capacity to meet demand.
In this talk, I will be presenting several systems that I built to address the resulting challenges to reliability, management, and security.
Ori is a reliable distributed ﬁle system for devices at the network edge. Ori automates many of the tasks of storage reliability and recovery through replication, taking advantage of fast LANs and low cost local storage in edge networks.
Castor is record/replay system for multi-core applications with predictable and consistently low overheads. This makes it practical to leave record/replay on in production systems, to reproduce difficult bugs when they occur, and to support recovering from hardware failures through fault tolerance.
Cryptographic CFI (CCFI) is a dynamic approach to control flow integrity. Unlike previous CFI systems that rely purely on static analysis, CCFI can classify pointers based on dynamic and runtime characteristics. This limits the attacks to only actively used code paths, resulting in a substantially smaller attack surface.
Electrical Engineering and Computer Science
University of California-Berkeley
"Towards a Theory of Safe and Interactive Autonomy"
Today’s society is rapidly advancing towards cyber-physical systems (CPS) that interact and collaborate with humans, e.g., semi-autonomous vehicles interacting with drivers and pedestrians, medical robots used in collaboration with doctors, or service robots interacting with their users in smart homes. The safety-critical nature of these systems requires us to provide provably correct guarantees about their performance in interaction with humans. The goal of my research is to enable such human-cyber-physical systems (h-CPS) to be safe and interactive. I aim to develop a formalism for design of algorithms and mathematical models that facilitate correct-by-construction control for safe and interactive autonomy.
In this talk, I will first discuss interactive autonomy, where we use algorithmic human-robot interaction to be mindful of the effects of autonomous systems on humans, and further leverage these effects for better safety, efficiency, coordination, and estimation. I will then talk about safe autonomy, where we provide correctness guarantees, while taking into account the uncertainty arising from the environment. Further, I will discuss a falsification algorithm to show robustness with respect to learned human models. While the algorithms and techniques introduced can be applied to many h-CPS applications, in this talk, I will focus on the implications of my work for semi-autonomous driving.
Electrical and Computer Engineering
University of Texas, Austin
"Energy-Efficient Mobile Web Computing"
Mobile computing is experiencing a technological renaissance, and the Web is Florence. Throughout the past decade, the Web has redefined the way people retrieve information, communicate with one another, and extract insights. Although rife with opportunity, the energy-constrained nature of mobile devices is a major roadblock to the potential that next-generation Web technologies promise. In this talk, I will show a path for achieving an energy-efficient mobile Web by rethinking the conventional abstractions across the hardware/software interface along with deep introspection of Web domain knowledge. In particular, I will describe an energy-efficient mobile processor architecture specialized for Web technologies as well as programming language support that empowers Web developers to make calculated trade-offs between energy-efficiency and end-user quality-of-service. Together, they form the core of my hardware-software co-design philosophy toward the next major milestone of the Web evolution: the Watt-Wise Web. As computing systems in the Internet-of-things era increasingly rely on fundamental Web technologies while operating under even more stringent energy constraints, Watt-Wise Web is here to stay.
Yuhao Zhu is a Visiting Research Fellow in the School of Engineering and Applied Sciences at Harvard University and a final year Ph.D. candidate in the Department of Electrical and Computer Engineering at The University of Texas at Austin. He is interested in designing and prototyping better hardware and software systems to make next-generation edge and cloud computing fast, energy-efficient, intelligent, and safe. His dissertation work on energy-efficient mobile computing has been supported by the Google Ph.D. fellowship. His paper awards include the Best of Computer Architecture Letters in 2014 and IEEE Micro TopPicks in Computer Architecture Honorable Mention in 2016.
Computer Engineering Lab
University of Michigan
"Architecting Persistent Memory Systems"
Persistent Memory (PM) technologies (also known as Non-Volatile RAM, e.g., Intel’s 3D XPoint) offer the exciting possibility of disk-like durability with the performance of main memory. Persistent memory systems provide applications with direct access to storage media via processor load and store instructions rather than having to rely on performance sapping software intermediaries like the operating system, aiding the development of high-performance, recoverable software. For example, I envision storage software that provides the safety and correctness of a conventional database management system like PostgreSQL and the performance of an in-memory store like Redis. However, today’s computing systems have been optimized for block storage devices and cannot fully exploit the benefits of PMs. Designing efficient systems for this new storage paradigm requires a careful rethink of computer architectures, programming interfaces, and application software.
While maintaining recoverable data structures in main memory is the central appeal of persistent memories, current systems do not provide efficient mechanisms (if any) to do so. Ensuring the recoverability of these data structures requires constraining the order of PM writes, whereas current architectures are designed to reorder memory accesses, transparent to the programmer, for performance. In this talk, I will introduce recently proposed programming interfaces, called persistency models, that will allow programmers to express the required order of PM writes. Then, I will present my work on developing efficient hardware implementations to enforce the PM write order prescribed by persistency models and tailoring software for these new programming interfaces.
Bio: Aasheesh Kolli is a doctoral candidate in Computer Science and Engineering at the University of Michigan. He investigates application software, programming interfaces, and computer architectures in light of emerging persistent memory technologies. His work has resulted in multiple research papers, including a best paper nomination, at venues like the International Symposium on Microarchitecture (MICRO) and the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).
Computer Science Dept.
University of South Carolina
Congnitive Sciences Dept.
Massachusetts Institute of Technology
The John P. Imlay Jr. Dean of Computing and Professor