This part of the implementation manual describes in detail the design
of the various components of the dataflow system, and then how all
these components fit together. It is intended to help a
programmer understand the source code and extend the system if
necessary.
As mentioned earlier, the main components are:
- The user interface that is used to edit a dataflow
- The scheduler that executes the dataflow
- A load/save dataflow module to preserve a dataflow
- A parser to load the module palette
Each of these modules is described in some detail below.
User interface design
The user interface is written in FLTK and was constructed using
Fluid. It provides the user with a palette of modules which can be used
to construct a dataflow. The dataflow is represented as a graph
with lines representing arcs and buttons representing nodes. Nodes
can be moved around or deleted, and connections between modules can be
added or deleted.
This is an Fl_Window subclass that represents the main window.
This static class is used to initialize the module
palette. It contains the module description file parser, which reads in the
module library file and initializes the list of available modules.
A subclass of the Fl_Button widget called ModuleButton was created
to represent nodes in the dataflow. It overrides the draw() method to
show its input and output ports, and the handle() method to facilitate
making connections and moving these objects around. A ModuleButton stores
a pointer to its corresponding Module object in its user_data() pointer.
This panel is the canvas on which the dataflow is created. It is a
subclass of Fl_Scroll and overrides the draw() method to draw the
connections between ModuleButtons. ModuleButtons are added as children
of the DflowPanel.
This is temporary storage for connection endpoints. When a
ModuleButton port is selected, the (ModuleButton, port) combination is
stored with this object. When both endpoints are specified, the
ModuleGraph::connect() method uses these endpoints to run the
This represents a connection, i.e. a
(startModuleButton, startPort, endModuleButton, endPort) tuple.
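The two-step selection of endpoints and the resulting connection tuple can be sketched as follows. This is only an illustration under stated assumptions: the names PendingConnection, Endpoint, selectOutput(), selectInput(), and take() are hypothetical, ModuleButton is reduced to a stand-in struct so the example does not depend on FLTK, and the hand-off to ModuleGraph::connect() is not shown.

```cpp
#include <cassert>

// Hypothetical stand-in for the FLTK ModuleButton widget; only identity matters here.
struct ModuleButton { int id; };

// One end of a connection: a (ModuleButton, port) pair.
struct Endpoint {
    ModuleButton* button;  // null while this end has not been selected
    int port;
};

// A completed connection:
// (startModuleButton, startPort, endModuleButton, endPort).
struct Connection {
    ModuleButton* startButton;
    int startPort;
    ModuleButton* endButton;
    int endPort;
};

// Temporary storage for the endpoints selected so far (names are assumptions).
class PendingConnection {
public:
    PendingConnection() : start_{nullptr, -1}, end_{nullptr, -1} {}

    // Record a selected output port as the start of the connection.
    void selectOutput(ModuleButton* b, int port) {
        assert(b != nullptr);
        start_ = Endpoint{b, port};
    }

    // Record a selected input port as the end of the connection.
    void selectInput(ModuleButton* b, int port) {
        assert(b != nullptr);
        end_ = Endpoint{b, port};
    }

    // True once both endpoints have been specified.
    bool complete() const {
        return start_.button != nullptr && end_.button != nullptr;
    }

    // Build the connection tuple and clear the stored endpoints.
    Connection take() {
        assert(complete());
        Connection c{start_.button, start_.port, end_.button, end_.port};
        start_ = Endpoint{nullptr, -1};
        end_ = Endpoint{nullptr, -1};
        return c;
    }

private:
    Endpoint start_;
    Endpoint end_;
};
```

Clearing the stored endpoints in take() means the same object can be reused for the next connection the user makes.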
Represents an (x, y) point in the DflowPanel.
A subclass of the Fl_Box class which represents the status bar. It has
a display(char *) method used to update the status message.
A static class with a convenience method to update the status bar message.
Interaction with the scheduler
The user interface and scheduler both work off a common data structure
called the ModuleGraph. When modules are added to the dataflow, a new
ModuleButton is created and added to the DflowPanel. Along with this, a
Module instance is added to the ModuleGraph. A pointer to the Module
is stored with the ModuleButton. A connection is made between two
modules by first selecting an output port of the first module and then
an input port of the next module. As part of the connection algorithm,
a new Connection object is added to the list of connections in the
ModuleGraph. In addition to this, pointers are set to the graph data
The user interface hierarchy is better explained in the following
John A Konglathu
Scheduling the dataflow
The dataflow is characterized by the set of operations (modules
that process data) and the dependencies between them. We use the
static model where an operator (denoted by a node in the dataflow
graph) is executable when tokens (data) are present on the input
A module in our interface is defined as an independent program that
takes input from files and produces output that is stored in
files. However, the model can also be extended to accommodate library
functions. Each module (program) is described by a module
descriptor class which defines the module completely. This is similar
to the data type definition in standard programming languages. The
module descriptor consists of the set of optional flags, and the
set of input and output ports. The optional flags are command
line strings that are passed to the program while the ports are
described by file names. The modules that appear in the dataflow graph
are specific instances defined by the module descriptor class. The
module class consists of data that pertain to the connections made in
the dataflow graph, which are consistent with its descriptor.
The module also has a specific function, run, that executes the operation
defined by the corresponding program. This function executes when all
the inputs of the module are available. Since the dataflow graph has
no loops, the scheduler can try to execute all the modules whose
inputs are defined until all the modules are executed. To
accommodate loops, we would need special nodes in the graph.
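The loop-free scheduling strategy described above can be sketched as follows. This is a minimal illustration rather than the system's actual scheduler: the Module struct, its fields, and the schedule() function are hypothetical, dependencies are given as indices into the module list, and run is a callback standing in for launching a program.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical minimal module: which modules it takes input from, and the
// operation to perform once those inputs are available.
struct Module {
    std::vector<std::size_t> inputsFrom;  // indices of the producing modules
    std::function<void()> run;            // stand-in for the module's run function
    bool done;                            // true once the outputs are valid
};

// Repeatedly run every module whose inputs are all valid until each module
// has executed once. Returns false if a full pass makes no progress, which
// can only happen if the graph contains a loop after all.
bool schedule(std::vector<Module>& graph) {
    std::size_t remaining = 0;
    for (const Module& m : graph) {
        if (!m.done) ++remaining;
    }
    while (remaining > 0) {
        bool progressed = false;
        for (Module& m : graph) {
            if (m.done) continue;
            bool ready = true;
            for (std::size_t p : m.inputsFrom) {
                assert(p < graph.size());
                if (!graph[p].done) { ready = false; break; }
            }
            if (!ready) continue;
            m.run();        // execute the operation
            m.done = true;  // its outputs are now valid
            --remaining;
            progressed = true;
        }
        if (!progressed) return false;
    }
    return true;
}
```

Because the graph has no loops, every pass over the list finds at least one runnable module, so the outer loop terminates after at most as many passes as there are modules.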
In the next few sections, we describe the structure of the
classes used in the dataflow backend. The classes are
designed to work with the /usr/Image programs but can easily
be extended to other operations, such as function calls.
Module Descriptor class
The module descriptor describes the module completely. It consists of a set of
input and output ports, a set of flags, and the command used to
execute the program.
For example, the gcc program takes files as input and produces
executables as output. Options like -c determine the type of
Port Descriptor class
In our interface, the ports pertain to data files, but can be used to
accommodate functions as well. The port descriptor defines the type of
data that is present in the port, the command line parameter (if
any) that must be used to run the program, and the port's position in the
command line string.
In the gcc example, the -o flag indicates the file
to place the output in.
The convolve program assumes that the files are passed in the
order input file, kernel file, and output file.
Flag Descriptor class
Like the port descriptor, the flag descriptor carries the string that
must be passed as a command line argument to the program. Unlike port
descriptors, flags are assumed to be position independent.
Again, returning to the gcc example, the -c option
is a flag that can be placed anywhere in the command line.
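To make the roles of the three descriptors concrete, here is a small sketch of how a command line might be assembled from them. The struct layouts, field names, and the buildCommand() function are assumptions made for illustration and are not taken from the source code; the port's data type is omitted for brevity, and flags are placed directly after the command simply because their position does not matter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical port descriptor: an optional command line parameter
// (for example "-o") and the port's position on the command line.
struct PortDescriptor {
    std::string parameter;  // empty when the file name is passed bare
    int position;           // relative order among the file arguments
};

// Hypothetical flag descriptor: a position-independent option string.
struct FlagDescriptor {
    std::string text;  // for example "-c"
};

// Hypothetical module descriptor: command, ports, and flags.
struct ModuleDescriptor {
    std::string command;
    std::vector<PortDescriptor> ports;  // input and output ports together
    std::vector<FlagDescriptor> flags;
};

// Assemble "command flags... files..." with the files ordered by port
// position. files[i] is the file name bound to d.ports[i].
std::string buildCommand(const ModuleDescriptor& d,
                         const std::vector<std::string>& files) {
    assert(files.size() == d.ports.size());

    // Sort port indices by their declared position.
    std::vector<std::size_t> order(d.ports.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::stable_sort(order.begin(), order.end(),
                     [&d](std::size_t a, std::size_t b) {
                         return d.ports[a].position < d.ports[b].position;
                     });

    std::string cmd = d.command;
    for (const FlagDescriptor& f : d.flags) cmd += " " + f.text;
    for (std::size_t i : order) {
        if (!d.ports[i].parameter.empty()) cmd += " " + d.ports[i].parameter;
        cmd += " " + files[i];
    }
    return cmd;
}
```

With a gcc-like descriptor holding the flag -c, an output port with parameter -o at position 1, and a bare input port at position 0, binding the files main.o and main.c to those ports yields the string "gcc -c main.c -o main.o". The file names here are made up for the example.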
The module class defines the module as it appears in the dataflow
graph. It is not only consistent with the module descriptor of the
module that it represents, but also defines the dependencies as
present in the dataflow. The module class, in an abstract sense, is an
instantiation of the module descriptor where the data dependencies are
defined. In our implementation, the module has a reference to the
module descriptor, the command that is used to execute the module and
the set of input and output edges that determine the data dependencies
in the dataflow. The module can execute only when all its inputs are
valid and once execution is complete, it validates all its outputs.
In case of output modules (modules that generate no output), the
execution can be concurrent with the rest of the dataflow.
The module is a data processing entity that processes input data and
generates output data. In our interface, the inputs and outputs are
stored as files. Once a module has successfully completed execution,
the outputs of the module are validated, which in turn can cause other
modules to proceed to completion. The output class has a reference to
an automatically generated file name and the port in the module
descriptor it corresponds to.
The module in the dataflow graph has a set of inputs which must be
valid before the module can be executed. These inputs may be the
outputs of other modules in the dataflow. We choose to store a
reference to the output data of the other module in each module's
input. Once the output of the module is valid, the corresponding input
is also automatically validated. In addition to the reference to the
output, the input class also has the port that the input corresponds
to in the module descriptor.
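The relationship between an input and the output it refers to can be sketched as below. The struct and member names are hypothetical and the real classes may hold more state; the point of the sketch is that an input derives its validity from the referenced output instead of keeping its own copy.

```cpp
#include <cassert>
#include <string>

// Hypothetical output record: generated file name, descriptor port, validity.
struct Output {
    std::string fileName;  // automatically generated in the real system
    int port;              // index of the port in the module descriptor
    bool valid;            // set once the producing module has finished
};

// Hypothetical input record: it refers to the producer's Output, so
// validating the output validates the input with no extra bookkeeping.
struct Input {
    const Output* source;  // null while the input is unconnected
    int port;              // index of the port in the module descriptor

    bool valid() const { return source != nullptr && source->valid; }

    // File to read from: the file the producing module wrote.
    const std::string& fileName() const {
        assert(source != nullptr);
        return source->fileName;
    }
};
```

Storing a reference rather than a copy also means the consumer always sees the producer's generated file name, so the two ends of a connection cannot disagree about which file carries the data.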
Putting it all together
We will now describe the integration of the parts of the system into
the whole interface. The user interface interacts with the user to
construct a dataflow from the module descriptors. The dataflow
consists of a set of modules and the connections between them. The
dataflow is embodied in the inputs of the modules by references to
outputs of other modules.
Once the entire dataflow is defined, the user can execute the
dataflow. The scheduler looks at executable modules (defined by those
modules whose inputs are valid) and runs them using the run method
defined in the module object. Once every module has been executed,
the scheduler transfers control back to the user interface. The
modules are executed exactly once as there are no loops in the
dataflow graph.
The user interface interacts with the scheduler through a static
class called the ModuleGraph. This class has specific methods which
allow the dataflow to be constructed and run. The user interface is
separated from the working of the dataflow.
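As a rough picture of what such a static interface could look like, the sketch below shows only the construction side. ModuleGraph and its connect() method are named in this manual, but every signature here, along with addModule(), moduleCount(), connectionCount(), and the simplified Module and Connection records, is an assumption made for illustration. Running the dataflow is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical records; the real Module and Connection classes hold more state.
struct Module { std::string name; };
struct Connection {
    std::size_t from;  // producing module
    int outPort;
    std::size_t to;    // consuming module
    int inPort;
};

// Static facade: the user interface calls these methods and never touches
// the underlying containers directly.
class ModuleGraph {
public:
    // Add a module instance and return its handle (here, simply an index).
    static std::size_t addModule(const std::string& name) {
        modules().push_back(Module{name});
        return modules().size() - 1;
    }

    // Record a connection from an output port to an input port.
    static void connect(std::size_t from, int outPort,
                        std::size_t to, int inPort) {
        assert(from < modules().size() && to < modules().size());
        connections().push_back(Connection{from, outPort, to, inPort});
    }

    static std::size_t moduleCount() { return modules().size(); }
    static std::size_t connectionCount() { return connections().size(); }

private:
    // Function-local statics avoid separate out-of-class definitions.
    static std::vector<Module>& modules() {
        static std::vector<Module> m;
        return m;
    }
    static std::vector<Connection>& connections() {
        static std::vector<Connection> c;
        return c;
    }
};
```

Keeping the containers private is one way to enforce the separation: the user interface can build the graph only through the published methods.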