I am on the academic job market this year!
I have been a PhD student in computer science at the University of Pennsylvania since Fall 2016, advised by Rajeev Alur. I expect to graduate in May 2022. You can also view my CV online.
Research
My research lies in programming languages and systems. In particular, I am interested in applications of programming languages and formal methods to data processing, distributed systems, and networks. I am affiliated with the Penn PL Club.
Publications
^{*}equal contribution ^{†}authors in alphabetical order

A Robust Theory of Series-Parallel Graphs.^{†} Rajeev Alur, Caleb Stanford, and Christopher Watson. In submission.

Guided Incremental Dead State Detection. Caleb Stanford and Margus Veanes. In submission.

Stream Processing with Dependency-Guided Synchronization. Konstantinos Kallas,^{*} Filip Niksic,^{*} Caleb Stanford,^{*} and Rajeev Alur, Principles and Practice of Parallel Programming (PPoPP), February 2022. Extended version; 2-Minute Elevator Pitch (September 2019); Poster (October 2019)

Correctness in Stream Processing: Challenges and Opportunities. Caleb Stanford, Konstantinos Kallas, and Rajeev Alur, Conference on Innovative Data Systems Research (CIDR), January 2022. Slides; Video

Symbolic Boolean Derivatives for Efficiently Solving Extended Regular Expression Constraints. Caleb Stanford, Margus Veanes, and Nikolaj Bjørner, Programming Language Design and Implementation (PLDI), June 2021. Slides; Talk (lightning and full)

Synchronization Schemas.^{†} Rajeev Alur, Phillip Hilliard, Zachary Ives, Konstantinos Kallas, Konstantinos Mamouras, Filip Niksic, Caleb Stanford, Val Tannen, and Anton Xue, Invited Contribution to Principles of Database Systems (PODS), June 2021. Invited paper.

DiffStream: Differential Output Testing for Stream Processing Programs. Konstantinos Kallas,^{*} Filip Niksic,^{*} Caleb Stanford,^{*} and Rajeev Alur, Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), November 2020. Slides; Video

Streamable Regular Transductions.^{†} Rajeev Alur, Dana Fisman, Konstantinos Mamouras, Mukund Raghothaman, and Caleb Stanford, Theoretical Computer Science (TCS), February 2020.

Data-Trace Types for Distributed Stream Processing Systems. Konstantinos Mamouras, Caleb Stanford, Rajeev Alur, Zachary Ives, and Val Tannen, Programming Language Design and Implementation (PLDI), June 2019. Video Abstract

Modular Quantitative Monitoring.^{†} Rajeev Alur, Konstantinos Mamouras, and Caleb Stanford, Principles of Programming Languages (POPL), January 2019. Slides; Video

Interfaces for Stream Processing Systems.^{†} Rajeev Alur, Konstantinos Mamouras, Caleb Stanford, and Val Tannen, Principles of Modeling: Festschrift Symposium in honor of Edward A. Lee, October 2017. Invited paper.

Automata-Based Stream Processing.^{†} Rajeev Alur, Konstantinos Mamouras, and Caleb Stanford, International Colloquium on Automata, Languages, and Programming (ICALP), July 2017. Slides
Software

Guided incremental digraphs: A data structure for incrementally tracking live and dead states for SMT applications.

dZ3: A new constraint solver for regular expressions, now the default in Z3. (benchmarks; experimental scripts)

DiffStream: Differential testing for Apache Flink programs.

Flumina: A programming model for online applications with parallelizable synchronization primitives and predictable semantics.

Data transducers: A general-purpose intermediate representation for online monitoring with performance guarantees.
Other Projects

Internship at Microsoft, Research in Software Engineering Group (RiSE) (summer 2020)
I worked on a new solver for regular expression constraints on strings in the satisfiability modulo theories (SMT) solver Z3, based on an incremental (lazy) technique called symbolic derivatives. We support Boolean combinations of regular expression constraints more efficiently than competing solvers. For more information, see our PLDI 2021 paper or try out Z3 here.
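To give a flavor of the derivative idea, here is a minimal sketch of classical character-level Brzozowski derivatives extended with intersection and complement. It is not the symbolic derivative construction from the paper and not how Z3 implements it; the class and function names (`Char`, `And`, `Not`, `deriv`, `matches`, and so on) are hypothetical and chosen only for illustration. The point it shows is that Boolean combinations are handled lazily, by taking one derivative per input character, without building an automaton up front.

```python
from dataclasses import dataclass

# Regex syntax tree. All names here are illustrative assumptions.
@dataclass(frozen=True)
class Empty: pass            # matches no string
@dataclass(frozen=True)
class Eps: pass              # matches only the empty string
@dataclass(frozen=True)
class Char:
    c: str                   # matches the single character c
@dataclass(frozen=True)
class Cat:
    left: object
    right: object
@dataclass(frozen=True)
class Alt:
    left: object
    right: object
@dataclass(frozen=True)
class And:
    left: object
    right: object
@dataclass(frozen=True)
class Not:
    inner: object
@dataclass(frozen=True)
class Star:
    inner: object

def nullable(r):
    """Return True if r matches the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Char)):
        return False
    if isinstance(r, (Cat, And)):
        return nullable(r.left) and nullable(r.right)
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    if isinstance(r, Not):
        return not nullable(r.inner)
    raise TypeError(r)

def deriv(r, a):
    """Regex matching exactly the strings s such that r matches a + s."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Char):
        return Eps() if r.c == a else Empty()
    if isinstance(r, Cat):
        first = Cat(deriv(r.left, a), r.right)
        return Alt(first, deriv(r.right, a)) if nullable(r.left) else first
    if isinstance(r, Alt):
        return Alt(deriv(r.left, a), deriv(r.right, a))
    if isinstance(r, And):
        return And(deriv(r.left, a), deriv(r.right, a))
    if isinstance(r, Not):
        return Not(deriv(r.inner, a))
    if isinstance(r, Star):
        return Cat(deriv(r.inner, a), r)
    raise TypeError(r)

def matches(r, s):
    """Match by taking one derivative per character, then testing nullability."""
    for a in s:
        r = deriv(r, a)
    return nullable(r)

# Example over the alphabet {a, b}: strings that do NOT contain "bb".
any_ab = Star(Alt(Char("a"), Char("b")))
contains_bb = Cat(any_ab, Cat(Char("b"), Cat(Char("b"), any_ab)))
no_bb = And(any_ab, Not(contains_bb))
```

This sketch performs no simplification of intermediate expressions, so terms grow with the input length; a practical solver would normalize derivatives and work over character predicates rather than individual characters.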

Internship at Amazon Web Services, Automated Reasoning Group (ARG) (summer 2019)
I developed tools to automate the security review process at AWS. I leveraged SMT-based technology built at ARG (see Semantic-based Automated Reasoning for AWS Access Policies (FMCAD 2018) and this AWS blog post) to analyze the permissions configurations of cloud resources. I used this analysis in conjunction with other account data to more easily detect AWS account configurations deviating from security best practices. More information about ARG can be found at Byron Cook’s page.

Formal verification of properties of knowledge (spring 2016)
My undergraduate capstone project at Brown was on verifying the solutions to several epistemic logic puzzles in the Alloy programming language. The abstract is here. You can also download the Alloy files and run them yourself; they are here.

Undergraduate research (summer 2015)
I attended the Complexity Across Disciplines Undergraduate Math REU in summer 2015. The work was in graph theory and combinatorics related to computational biology. Here are some slides on our work and a final poster which we presented at the Joint Math Meetings (JMM) in January 2016, winning an Outstanding Presentation Award.
We studied a particular operation, called a ‘context-directed reversal’, on signed permutations, that is, permutations of 1,…,n where each element is additionally given a sign. The main result classifies exactly which signed permutations are sortable by context-directed reversals. We were unable to provide a formula for the number of such signed permutations, but we relate signed permutations to a subclass of graphs, and provide a formula for the number of graphs sortable by an analogous graph operation. We prove that asymptotically it is 1/3 of all graphs.
Exposition

CIS 198: Rust Programming (spring 2021)
I was the instructor for CIS 198: Rust Programming at UPenn, an undergraduate introduction to Rust. The course lecture notes are publicly available.

Geo-distributed stream processing systems (spring 2020)
Existing distributed stream processing systems (DSPS) rely on sending data to a central computing cluster or data center to be processed, which means there are fundamental limits to (1) the end-to-end latency and (2) the network bandwidth used to communicate with the data center, particularly when the data to be processed is highly geo-distributed. In this survey, I look at several works that extend DSPS to the geo-distributed setting by executing stream processing jobs over a geo-distributed network of nodes. This area is not well explored, and some of the works surveyed are fairly preliminary. This was for my WPE II exam (a written and oral presentation requirement in the University of Pennsylvania CIS doctoral program). Here are the written report and slides.

Online, no-regret machine learning on large sets of experts (spring 2018)
Classical algorithms for online, no-regret learning from expert advice (e.g. randomized weighted majority) work efficiently only for a small number of experts. Specifically, the streaming algorithm in this setting takes O(n) time to process each data item, where n is the number of experts. For a project in CIS 625 (Computational Learning Theory), we discussed some of the literature that attempts to extend the ideas of no-regret learning to efficient algorithms for large or even infinite classes of experts. To see our view on two very different approaches, see the project report.
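As a rough illustration of where the O(n) per-item cost comes from, here is a minimal sketch of randomized weighted majority for binary predictions. The function name, its parameters, and the penalty rate `eta` are assumptions made for this example and are not taken from the project report. Each round samples one expert in proportion to its weight and then rescales the weight of every expert that was wrong, so both steps touch all n experts.

```python
import random

def randomized_weighted_majority(expert_predictions, outcomes, eta=0.5, rng=None):
    """Run randomized weighted majority over a stream of rounds.

    expert_predictions: list of rounds, each a list of n expert predictions.
    outcomes: list of true labels, one per round.
    eta: multiplicative penalty in (0, 1) applied to wrong experts (assumed value).
    Returns (number of mistakes made by the learner, final weights).
    """
    rng = rng or random.Random(0)
    n = len(expert_predictions[0])
    weights = [1.0] * n
    mistakes = 0
    for preds, truth in zip(expert_predictions, outcomes):
        # Follow one expert sampled with probability proportional to its weight: O(n).
        chosen = rng.choices(range(n), weights=weights, k=1)[0]
        if preds[chosen] != truth:
            mistakes += 1
        # Shrink the weight of every expert that predicted wrongly: O(n).
        weights = [w * (1 - eta) if p != truth else w
                   for w, p in zip(weights, preds)]
    return mistakes, weights

# Example: expert 0 is always right, expert 1 is always wrong, over three rounds.
rounds = [[1, 0], [0, 1], [1, 0]]
truths = [1, 0, 1]
example_mistakes, example_weights = randomized_weighted_majority(rounds, truths)
```

With thousands or infinitely many experts, storing and updating one weight per expert is no longer feasible, which is the limitation the surveyed approaches try to get around.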

SPEED computational complexity estimation (fall 2016)
For a project in CIS 673 (Computer-Aided Verification), I gave a presentation on the paper SPEED: precise and efficient static estimation of program computational complexity. You can read my take on the paper in my project report, or take a look at the slides.

Model theory notes (spring 2015)
With a group of 8 other students, I ran a group independent study project (GISP) on model theory in the spring semester of 2015. The website for this class may still be accessible here. Here are some of the notes I wrote for the class on natural deduction, filters and ultrafilters, and ultrapowers.