r. alexander milowski, geek

Computational Pipelining Illustration

gap-pipeline.png

gap-pipeline-data.png

gap-pipeline-time.png

Ideas

  1. A logical pipeline specification might just be an XProc pipeline with all custom steps where each step translates to a sub-pipeline.
  2. If a specific sub-pipeline has certain characteristics, the agent will not really run the pipeline. Rather, it will orchestrate running the sub-pipelines independently as resources become available and possibly on other systems (e.g. Google/Hadoop MapReduce style).
  3. Each custom step outputs artifacts to be deposited into a repository regardless of how they are executed.  These artifacts become resources on the web and have a URL for addressing them.
  4. As a result, a pipeline's state can be restored by looking at the artifacts and their metadata contained in the repository for a given execution session.
  5. If done correctly, there should be no difference between running a pipeline that actually runs as a single execution and one where every custom step is actually a separate process run at possibly different times.

New Requirements

  1. We want a logical specification of the process like in the very first diagram for the Integer Gap computation.
  2. We want to hide the data manipulation and other implementation specific steps.
  3. We want to save the data at specific points but not necessarily every bit of data that flows between every implementation step.
  4. We need to be able to stop, re-start, and/or transport computations from place to place depending on the resources or constraints of the local environment.
  5. We need a data repository and overall agent that guides the process stepwise through the logical pipeline depending on what has been returned into the repository.

Problems with Single Pipelines

  • A single pipeline requires that all steps complete for the result to be produced.
  • If a single step has a time or space complexity that exceeds the current capacity of the process, all the results of the steps are lost.
  • If the pipeline allows simultaneous execution, two complex steps may exhaust the resources of the system even though each might execute individually.
  • Also, the data that flows between the steps might be very interesting if the result of the pipeline is interesting.
  • But there is no guarantee that you'll have that data between the steps.

Gap Pipeline Overall Complexity

  • The Gröbner Basis computation is notoriously for taking long to compute (e.g. hours).
  • The irredundant irriducibile decomposition computation takes even a longer amount of time (e.g. days).
  • Further, this decomposition can require quite a bit of space to compute.
  • The use of the decomposition may generate thousands (or millions) of very small linear programs to solve and sort by solution.
  • The whole process suffers from scalability issues due to the time and resources needed to complete the process.

Gap Pipeline Time Complexity

Gap Pipeline data time diagram

Gap Pipeline Data Complexity

Gap Pipeline data complexity diagram
  • The irredundant irriducibile decomposition can be very large.
  • That generates a lot of linear programs to solve.
  • That generates a lot of solutions to sort through.

Gap Computation Pipeline

Gap Computation Pipeline
  • Each step is color coded according to the complexity of the computation.
  • In the middle is a loop over each LP generated from each component of the decomposition.
  • The result is the maximum objective function value (a single rational number) from amongst all the solutions.

An Example: Computing Integer Programming Gaps

  • This computation comes from optimization theory.
  • The Integer Programming Gap[1] is measure between the optimal value of the integer solution and the linear relaxation of the problem.
  • The gap is a measure of the quality of the linear relaxation as an estimate.
  • Often the integer solution is unobtainable but the linear relaxation is easy to compute.
  • ...but is it any good?  That is what the gap measures.
  1. R. A. Milowski. Computing irredundant irreducible decompositions and the scarf complex of large scale monomial ideals. Master’s thesis, San Francisco State University, 2004.

An Illustration of Computational Pipelining

R. Alexander Milowski