Computational Pipelining Illustration
gap-pipeline.png
published 2011-11-30 16:20:43.598-08:00
gap-pipeline-data.png
published 2011-11-30 16:20:40.386-08:00
gap-pipeline-time.png
published 2011-11-30 16:20:37.258-08:00
Ideas
- A logical pipeline specification might just be an XProc pipeline with all
custom steps
where each step translates to a sub-pipeline.
- If a specific sub-pipeline has certain characteristics, the
agent
will not really run the pipeline. Rather, it will orchestrate running the sub-pipelines independently as resources become available and possibly on other systems (e.g. Google/Hadoop MapReduce style).
- Each custom step outputs artifacts to be deposited into a repository regardless of how they are executed. These artifacts become
resources
on the web and have a URL for addressing them.
- As a result, a pipeline's state can be restored by looking at the artifacts and their metadata contained in the repository for a given execution session.
- If done correctly, there should be no difference between running a pipeline that actually runs as a single execution and one where every custom step is actually a separate process run at possibly different times.
New Requirements
- We want a logical specification of the process like in the very first diagram for the Integer Gap computation.
- We want to hide the data manipulation and other implementation specific steps.
- We want to save the
data
at specific points but not necessarily every bit of data that flows between every implementation step.
- We need to be able to stop, re-start, and/or transport computations from place to place depending on the resources or constraints of the local environment.
- We need a data repository and overall
agent
that guides the process stepwise through the logical pipeline depending on what has been returned into the repository.
Problems with Single Pipelines
- A single pipeline requires that all steps complete for the result to be produced.
- If a single step has a time or space complexity that exceeds the current capacity of the process, all the results of the steps are lost.
- If the pipeline allows simultaneous execution, two complex steps may exhaust the resources of the system even though each might execute individually.
- Also, the data that flows between the steps might be very interesting if the result of the pipeline is interesting.
- But there is no guarantee that you'll have that data between the steps.
Gap Pipeline Overall Complexity
- The Gröbner Basis computation is notoriously for taking long to compute (e.g. hours).
- The irredundant irriducibile decomposition computation takes even a longer amount of time (e.g. days).
- Further, this decomposition can require quite a bit of space to compute.
- The use of the decomposition may generate thousands (or millions) of very small linear programs to solve and sort by solution.
- The whole process suffers from scalability issues due to the time and resources needed to complete the process.
Gap Pipeline Time Complexity
Gap Pipeline Data Complexity
An Example: Computing Integer Programming Gaps
- This computation comes from optimization theory.
- The Integer Programming Gap
[1] is measure between the optimal value of the integer solution and the linear relaxation of the problem.
- The gap is a measure of the quality of the linear relaxation as an estimate.
- Often the integer solution is unobtainable but the linear relaxation is easy to compute.
- ...but is it any good? That is what the gap measures.
An Illustration of Computational Pipelining