# Big ideas in Computer Science and Engineering

André DeHon
Started: July 31, 1999
Last Update: August 21, 1999
• Mechanical procedures exist for finding the solutions to problems. That is, for many questions/problems, we can write down a series of steps and simple predicates which define precisely how to find the correct solution. This process is completely mechanical, not requiring any "human" judgment to complete.
• We can build physical machines which implement these procedures and perform the calculations for us.
• There are simple, universal models of computing which capture the basic capabilities of these machines (e.g. automata, pushdown automata, Turing Machines).
• The Turing Machine model is "robust" in the sense that "reasonable" additions to it, or alternative formulations of computing models, have the same computational power (the Church-Turing thesis).
• Here "reasonable" means the additions, for the most part, correspond to things we imagine a real machine could support. In particular, there are stronger models when the machine is allowed to do "unreasonable" things like consult an oracle.
• Deterministic/guaranteed procedures do not exist for all problems (Halting Problem, uncomputability). An important component of CS theory is to classify problems as computable or uncomputable.
• Among the problems which are computable, different tasks have different computational difficulty (complexity). Another important component of CS theory allows us to analyze algorithms and assess their complexity. (complexity classes [P, NP, PSPACE, IP, ...], order analysis [O(), Omega(), Theta()])
• techniques for analyzing worst-case complexity of particular algorithms
• techniques for analyzing expected-case complexity
• techniques for 'proving' bounds on the complexity of a computation (reductions)
• classification of a host of well known problem archetypes
• known good algorithms for well known problem archetypes
• common idioms/solution techniques (e.g. divide-and-conquer, linear programming, dynamic programming, graph algorithms)
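
As one concrete illustration of the divide-and-conquer idiom listed above, here is a minimal sketch (Python, chosen purely for illustration): split the problem, solve the halves recursively, and combine the results, for O(n log n) total work.

```python
def merge_sort(items):
    """Divide-and-conquer sort: split, recurse, merge.  O(n log n) comparisons."""
    if len(items) <= 1:                   # base case: trivially sorted
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])        # solve the subproblems independently
    right = merge_sort(items[mid:])
    # combine: merge two sorted halves in linear time
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 9, 1, 5, 6]))     # [1, 2, 5, 5, 6, 9]
```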
• There are alternatives to directly solving hard problems optimally. CS theory also tells us what we can give up in solution-quality guarantees in order to reduce computational complexity.
• approximation algorithms -- can find a solution in less time that is provably within some bound of the optimal solution
• online algorithms -- consequences of receiving problem information incrementally instead of all at once
• polynomial heuristic solutions -- algorithms with shorter runtime which won't necessarily find the optimal solution
• randomized algorithms -- algorithms which find a solution in a given time complexity within a given bound of the optimum, or with a certain probability of being correct. (includes simulated annealing)
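
As a small example of the approximation idea above, here is a hedged Python sketch of the classic 2-approximation for minimum vertex cover: repeatedly pick an uncovered edge and take both of its endpoints. It runs in linear time and the cover it returns is provably at most twice the size of the optimum. (The edge-list input format is just an assumption for the sketch.)

```python
def vertex_cover_2approx(edges):
    """Matching-based 2-approximation for minimum vertex cover.
    edges: iterable of (u, v) pairs.  Returns a set of vertices covering
    every edge, at most twice as large as an optimal cover."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:   # edge not yet covered
            cover.add(u)                        # take both endpoints
            cover.add(v)
    return cover

print(vertex_cover_2approx([(1, 2), (2, 3), (3, 4), (4, 1)]))
```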
• Checking whether you have a valid (acceptable) solution is typically much easier than computing the solution itself.
• When it's not clear how to deterministically find a good (or optimal) solution, the fact that checking is comparatively easy allows us to find a good solution by generating candidates and testing them for validity. An important component of CS art and practice is how to best "search" a space of solutions.
• branch-and-bound
• pruning (knowledge-based, induction)
• ordering heuristics -- (here the heuristic is for time optimization)
• alpha-beta search
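
A minimal branch-and-bound sketch for the 0/1 knapsack problem (illustrative Python, with a deliberately simple optimistic bound): explore include/exclude decisions, but prune any branch whose bound cannot beat the best solution found so far.

```python
def knapsack_bb(items, capacity):
    """items: list of (value, weight).  Returns the best achievable value.
    Branch on include/exclude; prune branches whose optimistic bound
    (current value + all remaining values) cannot beat the incumbent."""
    best = 0

    def search(i, value, room, remaining_value):
        nonlocal best
        best = max(best, value)
        if i == len(items):
            return
        if value + remaining_value <= best:       # bound: cannot improve
            return                                # prune this subtree
        v, w = items[i]
        if w <= room:                             # branch 1: include item i
            search(i + 1, value + v, room - w, remaining_value - v)
        search(i + 1, value, room, remaining_value - v)   # branch 2: exclude it

    search(0, 0, capacity, sum(v for v, _ in items))
    return best

print(knapsack_bb([(60, 10), (100, 20), (120, 30)], 50))   # 220
```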
• The complexity of computations can be used for authentication and privacy. That is, we can select computations to encode/decode data which are "tractable" when all the information is available, but intractable when key information is missing.
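
A toy illustration of this asymmetry (Python; the numbers are far too small for real use and are assumptions of the sketch): modular exponentiation is cheap to compute in the forward direction, but recovering the exponent (the discrete logarithm) appears to require brute-force search, which becomes intractable at realistic key sizes.

```python
p, g = 2_147_483_647, 7      # toy prime modulus and generator (real systems use huge moduli)
secret = 1_234_567           # the "key information"

# Easy direction: anyone holding the secret can compute this quickly.
public = pow(g, secret, p)

# Hard direction: without the secret, we are reduced to brute-force search,
# whose cost grows exponentially with the key size.
def discrete_log(g, target, p):
    x, acc = 0, 1
    while acc != target:
        acc = (acc * g) % p
        x += 1
    return x

print(public, discrete_log(g, public, p) == secret)
```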
• Computations may be distributed among many communicating agents, often reducing the time necessary to compute a result.
• shape of parallelism-time curve
• limits to parallelizability
• techniques for parallelization (thinking about and managing parallelism)
• pipelined/assembly-line (stream)
• CSP / coroutines
• SPMD/vector
• functional units (VLIW)
• BSP (coarse-grained, barrier synchronization)
• visibly atomic
• sequential semantics with speculative concurrency
• assume there will be no dependencies, or that we can predict the changes which will be made by "earlier" operations, and check those assumptions before "committing" the operation.
• e.g. speculative loads, prefetch, instruction issue, optimistic database transaction concurrency, virtual time
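
As a small illustration of distributing a computation among agents, here is a hedged Python sketch of an SPMD-style data-parallel map (only one of the idioms listed above), using the standard-library process pool. How much speedup is actually achieved depends on task granularity and on startup/communication overheads.

```python
from concurrent.futures import ProcessPoolExecutor

def count_primes(bounds):
    """Count primes in [lo, hi) by trial division (deliberately compute-heavy)."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    # Split one big problem into independent chunks and farm them out.
    chunks = [(i, i + 25_000) for i in range(0, 100_000, 25_000)]
    with ProcessPoolExecutor() as pool:
        partials = list(pool.map(count_primes, chunks))   # chunks run concurrently
    print(sum(partials))   # same answer as the sequential computation
```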
• Communication among agents is expensive and is often a major bottleneck limiting practically achievable parallelism.
• algorithms/techniques to minimize the need for communications
• communication costs and effects
• A simple measure of this communication bottleneck is the bisection bandwidth of a network. It places an upper bound on the amount of data a machine can transfer across its bisection per unit time, and hence on the amount of communication possible.
• Since wires take up physical space, the bisection bandwidth also implies a lower bound on the physical area/resources required by a machine.
• network design
• VLSI theory
• Large systems have an increased likelihood of encountering faulty components. Distributed systems may put the control and management of resources out of the hands of a single entity or organization. CS deals with how we build reliable computing systems in the face of unreliable components.
• fault detection and correction
• faulty memory
• noisy/faulty communications
• noisy/faulty computation nodes
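
A minimal sketch of one such technique, triple modular redundancy with majority voting (illustrative Python; the fault model, a single occasionally-flaky compute node, is an assumption of the sketch):

```python
import random

def flaky_add(a, b):
    """A compute node that occasionally returns a corrupted result."""
    result = a + b
    if random.random() < 0.1:            # injected fault: flip one low-order bit
        result ^= 1 << random.randrange(8)
    return result

def tmr_add(a, b):
    """Triple modular redundancy: run three replicas and vote.
    Masks any single faulty replica; with two or more faults the vote
    may fail outright or pass a wrong value."""
    results = [flaky_add(a, b) for _ in range(3)]
    for r in results:
        if results.count(r) >= 2:        # majority vote
            return r
    raise RuntimeError("no majority -- fault not maskable")

print(tmr_add(40, 2))
```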
• The desired computation can be captured precisely and unambiguously. Computer science deals with how we construct languages to describe computations, and how we make them convenient for human use.
• languages
• syntax (grammars)
• semantics (denotational, operational)
• A description which is convenient for the user is not necessarily efficient for implementation on a physical machine.
• verbose symbols and syntax (minimize errors)
• abstraction (hide implementation details, also optimization opportunities)
• generality (written to handle general (more expensive) case)
• intentional redundancy (catch errors early)
• We do not have to emulate the user's description of a computation to implement it correctly. We simply need to implement a computation which gives the same visible result (has the same meaning) as the user's computation (compilation, CAD).
• semantic transformations
• apply all information known at transformation time -- leaving only a residue of computation for data not known until run time
• eliminate redundancy
• collapse across user abstractions
• reorder/reformulate operation to tailor to machine costs
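
A tiny sketch of one such semantic transformation, constant folding over an expression tree (illustrative Python; the AST format is an assumption of the sketch): subtrees whose inputs are all known at transformation time are evaluated now, leaving only a residue that depends on run-time data.

```python
def fold(expr):
    """expr is either a number, a variable name (str), or (op, left, right).
    Evaluate any subtree whose operands are known constants; leave the rest."""
    if not isinstance(expr, tuple):
        return expr                                  # literal or run-time variable
    op, l, r = expr
    l, r = fold(l), fold(r)
    if isinstance(l, (int, float)) and isinstance(r, (int, float)):
        return {"+": l + r, "*": l * r}[op]          # both operands known: compute now
    return (op, l, r)                                # residue for run-time evaluation

# (x * (2 * 3)) + (4 + 5)  ->  ('+', ('*', 'x', 6), 9)
print(fold(("+", ("*", "x", ("*", 2, 3)), ("+", 4, 5))))
```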
• The representation used for data in a computation can have a big effect on efficiency of operation and ease of human comprehension.
• effects on computational and storage efficiency (e.g. arrays and fixed structures vs. tagged lists, red-black trees; sparse vs. dense representations of data)
• easing human comprehension (e.g. rich data structures)
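
A small illustration of the representation point (a minimal Python sketch): the same logical vector stored densely versus sparsely. When most entries are zero, the sparse form uses storage proportional to the number of nonzeros rather than to the full length, at the cost of slightly more work per access.

```python
N = 1_000_000

# Dense representation: one slot per index, cheap random access,
# but storage proportional to N even when almost everything is zero.
dense = [0.0] * N
dense[3] = 2.5
dense[999_999] = 7.1

# Sparse representation: store only the nonzero entries;
# storage is proportional to the number of nonzeros.
sparse = {3: 2.5, 999_999: 7.1}

def dot_sparse(a, b):
    """Dot product touching only entries present in both sparse vectors."""
    return sum(v * b[k] for k, v in a.items() if k in b)

print(sum(dense), dot_sparse(sparse, {3: 4.0}))
```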
• Our physical world allows us to build very large computer systems. The practical limit to the useful size of a computer system (or at least, the size of the function efficiently supported by a computer system) is almost always human comprehension, not the physical capacity required. Consequently, a major concern of computer science is techniques to manage and reduce complexity.
• abstractions/information hiding
• modularity
• problem decomposition, hierarchy
• component isolation
• invariants
• common idioms/paradigms for organization (e.g. procedures, frames, streams, objects, APIs, servers)
• A computing machine can be implemented out of X.
• X=mechanical interactions, relays, tubes, transistors, DNA, molecules...
• common/useful abstractions (e.g. digital abstraction, flops, memory, communication channels)
• disciplines achieving correctness in the face of noise/uncertainty (e.g. voltage levels, timing models and disciplines)
• We can extend our notion of abstraction/information hiding to machine design. In particular, the machine code and operating environment for a machine represents the abstract interface it provides to the outside world. Any implementation which provides the same semantics to the machine code is viable.
• consequently, we have the notion of ISAs or architecture families which all run the same machine code but which admit a variety of implementations (e.g. IBM 360, VAX, MIPS, SPARC, x86).
• different implementations may trade cost for speed (cheaper/slower versus more costly/faster)
• implementations can evolve to exploit new technology without changing the interface (the machine code)
• Machine code is just another language specifying precisely the computation to be performed.
• a computational engine need only provide the intended semantics, leaving it plenty of freedom as to how it implements the semantics.
• like any other language, it can be translated from the input format to another native format (perhaps another machine's native machine format) as long as it provides the original semantics (e.g. softPC, binary translation)
• The engineering side of computer science is about how we minimize the resources we use to perform a computation (or set of computations).
• minimize: time, energy, area (hardware: memory, wires, compute)
• Physical machines have finite/limited real resources.
• Typically, the state of a physical resource can be stored in memory more compactly (in less physical resources) than the physical resource itself. (this is a pragmatic observation about typical implementation technologies in common use.)
• We can provide the abstraction of more physical resources by virtualizing the physical resources. That is, sharing the physical resource among multiple uses over time. To accomplish this, we store the state of a particular usage of the physical resources in cheaper storage.
• e.g. virtual memory, virtual channels, multitasking, time-sharing
• Typically, the abstraction easiest for a user to reason about is that there are "infinite" virtual resources (i.e. a common aphorism is that architectures should have 0, 1, or an infinite number of a given resource type)
• Pragmatically, since we must save state for each virtual instance, we are limited by the finiteness of the physical memory storage medium holding the state. (This, of course, is also the way in which our physical machines fail to really be as powerful as Turing Machines.)
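
A toy sketch of virtualization (illustrative Python, not any real virtual-memory implementation; the names and policy are assumptions): a small "physical" set of frames is shared among many virtual pages by spilling displaced state to a larger, cheaper backing store.

```python
from collections import OrderedDict

class VirtualPages:
    """Present many virtual pages on top of a few 'physical' frames.
    Displaced frames are spilled to a cheaper backing store (here, a dict)."""
    def __init__(self, physical_frames=4):
        self.frames = OrderedDict()      # virtual page -> contents (the scarce resource)
        self.backing = {}                # cheap/slow storage for everything else
        self.capacity = physical_frames

    def access(self, page):
        if page in self.frames:
            self.frames.move_to_end(page)            # already resident
        else:
            if len(self.frames) >= self.capacity:    # evict least-recently-used frame
                victim, data = self.frames.popitem(last=False)
                self.backing[victim] = data
            self.frames[page] = self.backing.pop(page, "fresh page %d" % page)
        return self.frames[page]

mem = VirtualPages(physical_frames=2)
for p in [0, 1, 2, 0, 3, 1]:
    mem.access(p)
print(sorted(mem.frames), sorted(mem.backing))       # only 2 pages resident at once
```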
• Different logical tasks running on the same hardware can be isolated from each other. (e.g. virtual memory, process isolation)
• Uncertainties or variations in the running time of portions of concurrent (or virtually concurrent) tasks can lead to different sequences of events for the same program (and in some cases even the same input data). i.e. the detailed operational behavior is non-determinate.
• sources of non-determinism: e.g.
• variable response time due to variable loading from other tasks sharing some/or all of the required resources
• variable interleaving/scheduling of activation due to scheduling policy, history in scheduling process, or even intentional randomness in scheduling decisions
• data dependent run times for portions of task
• variable latency of operation due to state/history effects (e.g. caching, paging)
• non-determinism can be hard to cope with and adds to complexity, so it is often an effect we try to minimize or eliminate.
• programming constructs, guards, invariants that make non-determinate scheduling invisible to the functional behavior of programs.
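
A small sketch of the point above (Python threads; the artificial yield just widens the window in which interleaving is visible): variable scheduling produces non-determinate results for the unguarded version, while a lock makes the scheduling order invisible to the functional result.

```python
import threading, time

counter = 0
lock = threading.Lock()

def unsafe_increment(times):
    global counter
    for _ in range(times):
        value = counter          # read
        time.sleep(0)            # invite the scheduler to interleave here
        counter = value + 1      # write -- may overwrite another thread's update

def safe_increment(times):
    global counter
    for _ in range(times):
        with lock:               # critical section: scheduling order no longer
            counter += 1         # affects the visible result

for fn in (unsafe_increment, safe_increment):
    counter = 0
    threads = [threading.Thread(target=fn, args=(10_000,)) for _ in range(2)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(fn.__name__, counter)  # unsafe: usually < 20000; safe: always 20000
```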
• A specialized procedure/machine can generally be cheaper/faster than a general machine; i.e. early-bound data/information can be folded into the formulation of the computation to reduce the work necessary.
• (there is room to map out a useful theory and basic principles here)
• many pragmatic examples of this
• special-purpose versus general-purpose hardware, components
• special-case optimizations
• partial evaluation / program specialization
• run-time code generation (e.g. synthesis)
• working hypothesis: implementation cost is a function of information known and native complexity
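
The classic toy example of specialization from the partial-evaluation literature, sketched in Python: when the exponent is known early, a general power routine can be collapsed into straight-line code with no loop and no test (this sketch also illustrates run-time code generation in a crude way).

```python
def power(x, n):
    """General routine: must loop and test because n is unknown until run time."""
    result = 1
    for _ in range(n):
        result *= x
    return result

def specialize_power(n):
    """Early-bound n is folded into the code; only the residue over x remains."""
    body = " * ".join(["x"] * n) if n > 0 else "1"
    return eval("lambda x: " + body)     # e.g. n=4 -> lambda x: x * x * x * x

power4 = specialize_power(4)             # specialization done once, ahead of time
print(power(3, 4), power4(3))            # both 81; the specialized version never loops
```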
• Computations occur at different time scales (rates). To minimize work, when possible, hoist a computation out of a high-rate region into a slower (lower-rate) region.
• trivial example: loop invariants/hoisting
• binding time identification and exploitation
• compilation vs. interpretation; i.e. take time up front to compute a more efficient computational residue to perform at run time
• install-time, run-time specialization
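
The trivial loop-hoisting example above, made concrete (illustrative Python): a computation that does not change inside the loop is moved out to the lower-rate region, so it is done once instead of once per iteration.

```python
import math

def normalize_slow(values):
    # The scale factor is recomputed on every iteration (high-rate region)...
    return [v / math.sqrt(sum(x * x for x in values)) for v in values]

def normalize_fast(values):
    # ...but it is loop-invariant, so hoist it out and compute it once.
    scale = math.sqrt(sum(x * x for x in values))
    return [v / scale for v in values]

data = [3.0, 4.0]
print(normalize_slow(data), normalize_fast(data))   # same result: [0.6, 0.8]
```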
• The data we need to store or transmit only needs to be proportional to its information content.
• compression (minimize storage/transmission) --> information theory
• There is computation involved in compression/decompression, so there is usually a tradeoff between computational costs and storage/transmission costs.
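
A minimal sketch of this storage-versus-computation tradeoff using run-length encoding (illustrative Python; it only wins when the data actually has long runs, i.e., low information content relative to its length):

```python
from itertools import groupby

def rle_encode(s):
    """Spend a little computation to store repeated runs as (char, count) pairs."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]

def rle_decode(pairs):
    """Spend computation again to expand back to the original data."""
    return "".join(ch * count for ch, count in pairs)

data = "aaaaaaaabbbbccd"
packed = rle_encode(data)
print(packed)                          # [('a', 8), ('b', 4), ('c', 2), ('d', 1)]
print(rle_decode(packed) == data)      # True: no information was lost
```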
• The problems we typically see contain structure. Exploiting that structure allows us to build cheaper (typical-case) building blocks than we otherwise could, e.g.:
• locality in memory reference (time and space) --> caching
• locality in communications --> Rent's Rule (limits the necessary bisection bandwidth)
• common sub-expressions --> multi-level logic, exploit structure
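
A small illustration of exploiting locality of reference, using Python's standard-library memoizing cache: repeated references to the same subproblem are served from a cheap cache instead of being recomputed.

```python
from functools import lru_cache

@lru_cache(maxsize=None)        # the cache exploits heavy re-reference of subproblems
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(80))                  # fast: each distinct subproblem is computed only once
print(fib.cache_info())         # without the cache, the call tree would be ~2**80 nodes
```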
• The set of events or cases which a computation or subsystem handles is often highly localized. Typical or average case performance can be improved significantly by exploiting this localization.
• Alternately, good average case performance can be maintained at substantially lower cost.
• Aphorism: make the common case fast (and the fast case common)
• Worst-case performance can suffer as a consequence, so such optimizations may be inappropriate when guaranteed/bounded performance is required for all cases.
• In some cases the worst-case behavior of an algorithm or problem instance for a machine can be avoided (with very high probability) by adding intentional randomization. This allows us to turn worst-case scenarios into average-case scenarios. (e.g. best examples are in routing and retry scenarios for networking. Any deterministic algorithm would have bad worst-case (hot-spot) performance, whereas randomization can avoid this problem without requiring global knowledge of the system state.)
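
A standard illustration of intentional randomization (Python sketch): quicksort with a fixed pivot rule degrades to quadratic behavior on already-sorted or adversarially arranged inputs, while choosing the pivot at random makes every input an average case with high probability.

```python
import random

def quicksort(a):
    """Randomized pivot: no fixed input can reliably trigger the quadratic worst case."""
    if len(a) <= 1:
        return a
    pivot = random.choice(a)                 # the intentional randomization
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return quicksort(less) + equal + quicksort(greater)

print(quicksort(list(range(20, 0, -1))))     # sorted output regardless of input order
```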
• Feedback is the key to diagnosing discrepancies between one's model of the world and reality. This is really just the heart of the scientific method. It should be used by developers to improve programs (debug functional problems, identify and improve performance problems). Moreover, it can be embedded in programs so that they adapt to their data and environment.
• A system built of a fixed set of resources of different types must provide them in a fixed ratio. However, tasks often require resources in different proportions. Consequently, some resources will necessarily go underused or wasted. The trick is usually to make sure the most expensive resource is used efficiently.
• A data structure or computation can either be dynamic or static.
• Static structures and computations can be very efficient when the size and shape of the computation is constant or has little variance.
• Dynamic structures/computations are necessary when the size or scope of the problem is unbounded. They cost more per element or item, but they only have to be as large (as complex) as a particular problem instance.
• more...
• Aphorism: systems can be made more general by adding a level of indirection; this unbounds the set of things that can be done, but adds a cost for the indirection.
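
A last small sketch of the indirection aphorism (illustrative Python; the names are assumptions): routing an operation through a lookup table costs one extra step per call, but the set of supported behaviors is no longer fixed at the call site.

```python
# Direct call: the behavior is bound at the call site and cannot be extended.
def render_text(doc):
    return "text:" + doc

# One level of indirection: callers look the handler up through a table,
# so new formats can be added without touching the call site.
handlers = {"text": render_text}

def render(kind, doc):
    return handlers[kind](doc)        # the extra lookup is the cost of the indirection

handlers["html"] = lambda doc: "<p>" + doc + "</p>"   # extend later, no call-site change
print(render("text", "hi"), render("html", "hi"))
```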

André DeHon