Eric Koskinen
Visiting Assistant Professor

Computer Science Department
Courant Institute of Mathematical Sciences
New York University


ejk@cims.nyu.edu
 
Biography   Research   Publications
 

My research is centered around devising mathematical techniques that aid the development of software that is safe and able to take advantage of multicore architectures for improved performance. The techniques I develop are geared toward forming a foundation for better ways of developing real industrial systems. The following are some representative recent projects.

Representative Projects

Guaranteeing Safety & Liveness of Software

[POPL 2011, CAV 2011 (Award), FMSD 2012, PLDI 2013]

We have recently developed mathematical results and tools that enable engineers to better understand the behavior of their implementations. Temporal logics (e.g. LTL, PSL, SVA, ITL) have become popular as they provide a fairly natural language for engineers and managers to specify safety and liveness properties. While decades of work have lead to tools for verifying temporal logic property of (finite-state) hardware systems, few viable tools exist for (infinite-state) software. Others have focused solely on either safety properties (SLAM, Blast, Astree, etc.) or liveness (RankFinder, Terminator, etc.) and adaptations of finite-state techniques are problematic. In the past few years, we have developed several novel techniques and algorithms that, together, make temporal verification for software a reality. The tool that we built proved a wide variety of nontrivial safety and liveness properties of systems code including the PostgreSQL database server, Apache web server, and Windows OS kernel that previously could not be proved.

Language support for concurrency

[PPoPP 2011, SPAA 2008, SPAA 2008, POPL 2010]

Introducing concurrency into software is often hampered by the interference problem: when one thread is modifying the shared state, modifications may occur due to other concurrently executing threads. A promising solution is for programmers to designate regions of code as atomic, indicating that they should appear to take effect instantaneously from the standpoint of other threads. For example, when modifying a shared hashtable, a thread may want to move a value to a new key without permitting a concurrent thread from observing the intermediate state where the value is not present:

  atomic { ht.remove(4); ht.set(5 => 'bar') }
Support for atomic has gained momentum in the form of transactional memory which will be adopted in GCC 4.7 and enjoys hardware support by IBM and in the Intel Haswell chip. Unfortunately, techniques designed for prototype short-lived transactions do not scale to long-lived transactions that will appear in realistic software. This is because they rely on a crude method of determining when two transactions conflict: they record memory locations that have been read or written and abort a transaction when it has accessed a memory location that has been written by another transaction. In practice, this leads to frequent aborts and thus performance problems. In our work on transactional boosting (PPoPP 2008), we questioned the viability of such systems and proposed an alternative that permits greater performance. In boosting, one characterizes the interaction between transactions by reasoning about whether their effects commute. This sequential, abstract-level reasoning permits far greater parallelism over traditional read/write-based STMs, which vastly over-approximate conflict and thus frequently abort valid transaction interleavings. We later provided a formal semantics for a broader class of transactions called coarse-grained transactions (POPL 2010). This incorporated several previous ad hoc methods of integrating abstract data-type libraries into transactions.

Understanding program performance

[PLDI 2009]

Modern software engineers lack sufficient tools for understanding the performance characteristics of their implementations. Traditional methods such as testing and performance profiling are useful, at best, in determining the average-case scenario for a particular hardware platform. In recent years we have shown how symbolic complexity bounds can be obtained automatically via static analysis. These bounds guide programmers toward efficiency as they are designing algorithms, and make them cognizant of when their algorithm is, say, exponential. We demonstrated that bounds can be obtained even for imperative programs with elaborate control-flow such as nested and multi-path loops. The tool that we built, Speed (PLDI 2009), was the first published result that reported experiences running a bound analysis on a real code-base, and it was quite effective. Working with a product team at Microsoft, we applied the tool to a significant server product with over 670,000 lines of code and were able to discover bounds for 90% of the procedures.

Understanding program behavior at runtime

[EuroSys 2008]

Often the source code of a system is unavailable, either because it is proprietary, or because it is too complex for developers to understand when diagnosing a system. Nonetheless, dynamic analyses can shed light on the behavior of such a system. We built a systems tool called BorderPatrol that monitors components of a distributed system and automatically determines request paths. We showed that our technique works with Zeus (a closed-source commercial product) as well as Apache, PostgreSQL, and TurboGears.

Improving support for atomic sections

[OOPSLA 2013]

A further limitation of existing STMs---optimistic as well as pessimistic---is that they are unable to fully exhaust the parallelism potential because they lack the means to guide concurrent threads toward choices that minimize interference. Thus, transactional memory implementations have to frequently abort and retry threads as they naively trample each other. We have recently found a new way of supporting atomic sections that permits threads to coordinate and exploit their inherent nondeterminism to achieve significantly more parallelism. In this new approach, threads decide what to do based on the potential future behavior of other threads. This stands in contrast with transactional-memory (TM) systems, in which conflict is based on what has been done in the past (operations/commits to the shared state). This form of coordination is serializable (a de facto concurrency correctness criteria) and has a vast performance improvement, often rivaling hand-crafted algorithms.

Ensuring correctness of programs with nondeterministic behavior

[PLDI 2013]

Modern software consists of many possible branching behaviors. Ensuring correctness often involves characterizing universal branching behaviors (that something must happen on every possible next step), existential branching behaviors (that there exists a next step where something is possible), and mixtures of the two. These kinds of behaviors are important in many applications, including planning, games, security analysis, disproving, precondition synthesis, environment synthesis, etc. Yet existing techniques for ensuring the correctness of such properties (in, for example, temporal logics such as CTL, CTL*, and mu-calculus) have limitions that have traditionally restricted their applicability to software (e.g. push-down systems only, universal path quantifiers only, etc). Wr recently introduced an automation strategy that has enabled verification of branching-time properties of programs where previous approaches failed. The method works reliably for properties with non-trivial mixtures of universal and existential temporal operators. The basis of the approach is the observation that existential reasoning can be reduced to universal reasoning if the system's state-space is appropriately restricted. This restriction on the state-space must meet a constraint derived from recent work on proving non-termination. The observation leads to a new route for implementation based on existing tools. We demonstrated that this approach is practical by applying a preliminary implementation to a set of benchmarks drawn from the Windows operating system, the PostgreSQL database server, and the SoftUpdates patching system.