I graduated from Brown University in May 2016 (ScB Mathematics – Computer Science).
My research interests include: (1) programming languages and systems for data stream processing; (2) formal verification and testing; and (3) logical foundations of computing.
I am affiliated with the Penn PL Club:
Current research: Correctness and performance guarantees for stream processing systems
Numerous specialized software platforms now exist for processing large quantities of data and responding in real time. Such stream processing systems are popular because they allow the programmer to specify the computation in an intuitive way (e.g., as a high-level query, as a sequence of stream transformations, or as a dataflow graph), and the system will deploy and distribute the computation automatically. Popular modern stream processing systems include Apache Spark Streaming and Apache Flink.
My long-term goal is to make stream processing systems easier to use by providing better guarantees to the developer about correctness and performance. I have past and ongoing projects in these dimensions:
Correctness and reliability: We have built tool support on top of Apache Storm (PLDI 2019) and Apache Flink (in submission) for testing and verification to prevent bugs due to data parallelism. Our work in Apache Storm shows that, by manipulating data streams with extra type information, programmers can statically ensure that their code parallelizes correctly. Our work in Apache Flink focuses on testing, showing that programmers can use similar annotations to automatically test applications for bugs due to parallelism, without having to modify the application source code. Finally, we are currently designing a specialized stream processing system, called Flumina (in submission), which automatically distributes sequential streaming computations while preserving correctness.
Performance: We have proposed techniques for compilation and optimization of streaming computations. For high-level query languages incorporating user-defined stateful and quantitative computations, we show that a new intermediate representation can be used to achieve efficient compilation with static bounds on performance (POPL 2019), and formally study the benefits of related program representations (ICALP 2017, TCS 2019). In the vein of distributed compilation, our Flumina system aims to distribute sequential streaming computations while optimizing for a desired performance metric. In particular, we have written an optimizer tailored to the edge computing and IoT domains, where the metric is to minimize network load.
*equal contribution †authors in alphabetical order
Streamable regular transductions,† R. Alur, D. Fisman, K. Mamouras, M. Raghothaman, and C. Stanford. Theoretical Computer Science, February 2020.
Data-trace types for distributed stream processing systems, K. Mamouras, C. Stanford, R. Alur, Z. Ives, and V. Tannen. Programming Language Design and Implementation (PLDI), June 2019. Video Abstract
Interfaces for stream processing systems,† R. Alur, K. Mamouras, C. Stanford, and V. Tannen. Invited contribution to Principles of Modeling: Festschrift Symposium in honor of Edward A. Lee, October 2017.
Drafts and Submissions
*equal contribution †authors in alphabetical order
Flumina: correct distribution of stateful streaming computations, K. Kallas,* F. Niksic,* C. Stanford,* and R. Alur. In submission, Fall 2019. 2-Minute Elevator Pitch from September 2019
Differential testing for stream processing systems, K. Kallas,* F. Niksic,* C. Stanford,* and R. Alur. In submission, Fall 2019.
Internship at Amazon Web Services, Automated Reasoning Group (ARG) (summer 2019)
I developed tools to automate the security review process at AWS. I leveraged SMT-based technology built at ARG (see Semantic-based Automated Reasoning for AWS Access Policies (FMCAD 2018) and this AWS blog post) to analyze the permissions configurations of cloud resources. I used this analysis in conjunction with other account data to more easily detect AWS account configurations deviating from security best practice.
More information about ARG can be found at Byron Cook’s page.
Formal verification of properties of knowledge (spring 2016)
My undergraduate capstone project at Brown was on verifying the solutions to several epistemic logic puzzles in the Alloy programming language (alternate link). The abstract is here. You can also download the Alloy files and run them yourself; they are here.
Undergraduate research (summer 2015)
I attended the Complexity Across Disciplines Undergraduate Math REU in summer 2015. The work was in graph theory and combinatorics related to computational biology. We studied a particular operation, called a ‘context-directed reversal’, on signed permutations — permutations of where each element is additionally given a sign.
The main result classifies exactly which signed permutations are sortable by context-directed reversals. We are unable to provide a formula for the number of such signed permutations, but we relate signed permutations to a subclass of graphs, and provide a formula for the number of graphs sortable by an analogous graph operation. We prove that asymptotically it is 1/3 of all graphs.
Geo-distributed stream processing systems (spring 2020)
Existing distributed stream processing systems (DSPS) rely on sending data to a central computing cluster or data center to be processed, which means there are fundamental limits to (1) the end-to-end latency and (2) the network bandwidth used to communicate with the data center, particularly when the data to be processed is highly geo-distributed. In this survey, I look at some works which extend DSPS to the geo-distributed setting by executing stream processing jobs over a geo-distributed network of nodes. This area is not well explored, and this selection of works is fairly preliminary. This was for my WPE II exam (a written and oral presentation requirement in the University of Pennsylvania CIS doctoral program). Here is the written report and slides.
Online, no-regret machine learning on large sets of experts (spring 2018)
Classical algorithms for online, no-regret learning from expert advice (e.g. randomized weighted majority) work efficiently only for a small number of experts. Specifically, the streaming algorithm in this setting takes O(n) time to process each data item, where n is the number of experts. For a project in CIS 625 (Computational Learning Theory), we discussed some of the literature that attempts to extend the ideas of no-regret learning to efficient algorithms for large or even infinite classes of experts. To see our view on two very different approaches, see the project report.
SPEED computational complexity estimation (fall 2016)
For a project in CIS 673 (Computer-Aided Verification), I gave a presentation on the paper SPEED: precise and efficient static estimation of program computational complexity. You can read my take on the paper in my project report, or take a look at the slides. See also the SPEED webpage.
Model theory notes (spring 2015)
With a group of 8 other students, I ran a group independent study project (GISP) on model theory, in the spring semester of 2015. The website for this class is still accessible here.
Here are some of the notes I wrote for the class:
(The notes are not entirely free of errors.)