Overview

This part of the implementation manual describes the design of each component of the dataflow system and then shows how the components fit together. It should help a programmer understand the source code and extend the system if necessary.

As mentioned earlier the main components are

Each of these modules is described in some detail below.

User interface design

The user interface is written in FLTK and was constructed using Fluid. It provides the user with a palette of modules from which to construct a dataflow. The dataflow is represented as a graph, with lines representing arcs and buttons representing nodes. Nodes can be moved around or deleted, and connections between modules can be added or deleted.

DflowWindow class

This is an Fl_Window subclass that represents the main window.

ModuleBrowser class

This is a static class used to initialize the Module Palette. The module description file parser lives here. It reads the module library file and initializes the list of available modules.

ModuleButton class

ModuleButton, a subclass of the Fl_Button widget, represents a node in the dataflow. It overrides the draw() method to show its input and output ports, and the handle() method to support making connections and moving the button around. A ModuleButton keeps a pointer to its corresponding Module object in its user_data() pointer.

DflowPanel class

This panel is the canvas on which the dataflow is created. It is a subclass of Fl_Scroll and overrides the draw() method to draw the connections between ModuleButtons. ModuleButtons are added as children of the DflowPanel.

ModuleClipboard class

This is a temporary storage for connection endpoints. When a ModuleButton port is selected, the (ModuleButton, port) combination is stored with this object. When both endpoints are specified the ModuleGraph::connect() method uses these endpoints to run the connection algorithm.
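The two-step endpoint selection described above can be sketched as follows. This is only an illustration under stated assumptions, not the actual class: the names Endpoint, ClipboardSketch, select(), ready() and take() are hypothetical, and an integer id stands in for the ModuleButton pointer so that the sketch compiles without FLTK.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for a (ModuleButton, port) pair. The real class
// stores a ModuleButton pointer rather than an integer id.
struct Endpoint {
    int buttonId;
    int port;
};

// Sketch of a clipboard that collects the two endpoints of a connection.
class ClipboardSketch {
public:
    ClipboardSketch() : start_(), end_(), haveStart_(false), haveEnd_(false) {}

    // Record a selected port. The first call stores the start endpoint,
    // the second stores the end endpoint; further calls are ignored.
    void select(const Endpoint& e) {
        if (!haveStart_) {
            start_ = e;
            haveStart_ = true;
        } else if (!haveEnd_) {
            end_ = e;
            haveEnd_ = true;
        }
    }

    // True once both endpoints have been chosen.
    bool ready() const { return haveStart_ && haveEnd_; }

    // Hand both endpoints to the caller and reset the clipboard.
    // Must only be called when ready() is true.
    std::pair<Endpoint, Endpoint> take() {
        assert(ready());
        std::pair<Endpoint, Endpoint> result(start_, end_);
        haveStart_ = false;
        haveEnd_ = false;
        return result;
    }

private:
    Endpoint start_;
    Endpoint end_;
    bool haveStart_;
    bool haveEnd_;
};
```

In this sketch a connect routine would call take() once ready() returns true, which also clears the clipboard for the next connection.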

Connection class

This represents a connection, i.e. a (startModuleButton, startPort, endModuleButton, endPort) tuple.

Point class

Represents an (x, y) point in the DflowPanel.

DflowStatus class

A subclass of the Fl_Box class which represents the status bar. It has a display(char *) method used to update the status message.

StatusBar class

A static class with a convenience method to update the status bar message.

Interaction with the scheduler

The user interface and the scheduler both work off a common data structure called the ModuleGraph. When a module is added to the dataflow, a new ModuleButton is created and added to the DflowPanel, and a Module instance is added to the ModuleGraph. A pointer to the Module is stored with the ModuleButton. A connection is made between two modules by first selecting an output port of the first module and then an input port of the second. As part of the connection algorithm, a new Connection object is added to the list of connections in the ModuleGraph. In addition, the corresponding pointers are set in the graph data structure.
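The bookkeeping performed when a module is added or a connection is made can be sketched as below. This is a simplified illustration, not the ModuleGraph source: the names GraphSketch, ModuleSketch, ConnectionSketch, addModule(), connect(), sourceOf() and connectionCount() are hypothetical, modules are identified by index instead of by pointer, output ports are not validated, and an ordinary object is used where the real ModuleGraph is a static class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Module: only the source of each input port is kept.
struct ModuleSketch {
    // Per input port: index of the producing module, or -1 if unconnected.
    std::vector<int> inputSource;
};

// Hypothetical stand-in for Connection, using module indices instead of buttons.
struct ConnectionSketch {
    int startModule;
    int startPort;
    int endModule;
    int endPort;
};

class GraphSketch {
public:
    // Add a module with the given number of input ports; returns its index.
    int addModule(int inputPorts) {
        assert(inputPorts >= 0);
        ModuleSketch m;
        m.inputSource.assign(static_cast<std::size_t>(inputPorts), -1);
        modules_.push_back(m);
        return static_cast<int>(modules_.size()) - 1;
    }

    // Record a connection and set the link in the destination module.
    // Returns false if the request is invalid or the input port is taken.
    bool connect(int from, int outPort, int to, int inPort) {
        if (!validModule(from) || !validModule(to) || from == to) return false;
        std::vector<int>& slots = modules_[to].inputSource;
        if (inPort < 0 || inPort >= static_cast<int>(slots.size())) return false;
        if (slots[inPort] != -1) return false;  // input port already connected
        connections_.push_back(ConnectionSketch{from, outPort, to, inPort});
        slots[inPort] = from;
        return true;
    }

    std::size_t connectionCount() const { return connections_.size(); }

    // Index of the module feeding the given input port, or -1.
    int sourceOf(int module, int inPort) const {
        return modules_[module].inputSource[inPort];
    }

private:
    bool validModule(int i) const {
        return i >= 0 && i < static_cast<int>(modules_.size());
    }

    std::vector<ModuleSketch> modules_;
    std::vector<ConnectionSketch> connections_;
};
```

The sketch keeps both the connection list and the per-module links, mirroring the two updates described above: the list can be used to draw the arcs, while the links are what a scheduler would follow.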

Structure

The user interface hierarchy is better explained in the following diagram.

John A Konglathu


Scheduling the dataflow

The dataflow is characterized by the set of operations (modules that process data) and the dependencies between them. We use the static model where an operator (denoted by a node in the dataflow graph) is executable when tokens (data) are present on the input edges.

A module in our interface is defined as an independent program that takes input from files and produces output that is stored in files. However, the model can be extended to accommodate library functions as well. Each module (program) is described by a module descriptor class, which defines the module completely. This is similar to a data type definition in standard programming languages. The module descriptor consists of the set of optional flags and the set of input and output ports. The optional flags are command line strings that are passed to the program, while the ports are described by file names. The modules that appear in the dataflow graph are specific instances defined by the module descriptor class. The module class holds the data that pertain to the connections made in the dataflow graph, which are consistent with its descriptor.

The module also has a specific function, run, that executes the operation defined by the corresponding program. This function executes when all the inputs of the module are available. Since the dataflow graph has no loops, the scheduler can keep executing the modules whose inputs are defined until all the modules have been executed. To accommodate loops, we would need special nodes in the graph.
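The scheduling rule for a loop-free graph can be sketched as a repeated sweep over the modules. This is a minimal illustration under stated assumptions, not the scheduler's source: NodeSketch and scheduleSketch() are hypothetical names, dependencies are given as module indices, and marking a node as done stands in for calling the module's run function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a module in the graph.
struct NodeSketch {
    std::vector<int> deps;  // indices of the modules whose outputs feed this one
    bool done;              // true once the module has been executed
    NodeSketch() : done(false) {}
};

// Sweep the graph repeatedly, executing every module whose inputs are all
// available, until a sweep makes no progress. Returns the execution order.
std::vector<int> scheduleSketch(std::vector<NodeSketch>& nodes) {
    std::vector<int> order;
    bool progress = true;
    while (progress) {
        progress = false;
        for (std::size_t i = 0; i < nodes.size(); ++i) {
            if (nodes[i].done) continue;
            bool ready = true;
            for (std::size_t k = 0; k < nodes[i].deps.size(); ++k) {
                int d = nodes[i].deps[k];
                assert(d >= 0 && static_cast<std::size_t>(d) < nodes.size());
                if (!nodes[d].done) {
                    ready = false;
                    break;
                }
            }
            if (ready) {
                // The real system would call the module's run function here.
                nodes[i].done = true;
                order.push_back(static_cast<int>(i));
                progress = true;
            }
        }
    }
    return order;
}
```

Because the graph has no loops, each sweep executes at least one remaining module, so every module runs exactly once. If a loop were present, the modules on it would never become ready and the sweep would stop without executing them.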

In the next few sections, we describe the structure of all the classes used in the dataflow backend. The classes are designed to work with the /usr/Image programs but can easily be extended to other operations such as function calls.

Module Descriptor class

It is used to describe the module completely. It consists of a set of input and output ports, a set of flags, and the command used to execute the program.

For example, the gcc program takes files as input and produces executables as output. Options like -c determine the type of output produced.

Port Descriptor class

In our interface, the ports pertain to data files, but they can be used to accommodate functions as well. The port descriptor defines the type of data present on the port, the command line parameter (if any) that must be used to run the program, and the port's position in the command line string.

In the gcc example, the -o flag indicates the file to place the output in. The convolve program assumes that the files are passed in the order input file, kernel file, and output file.

Flag Descriptor class

Like the port descriptor, the flag descriptor carries the string that must be passed as a command line argument to the program. Unlike port descriptors, flags are assumed position independent.

Again, returning to the gcc example, the -c option is a flag that can be placed anywhere in the command line.
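The way descriptors could be turned into a command line can be sketched as follows. This is an illustration only: PortArg and buildCommandSketch() are hypothetical names, the file names in the usage note are made up, and placing the flags before the ports is just one valid choice, since flags are position independent.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical: one port of a module, already bound to a file name.
struct PortArg {
    int position;           // position of the port among the port arguments
    std::string parameter;  // e.g. "-o"; empty if the port takes no parameter
    std::string fileName;
};

// Assemble "command [flags...] [parameter] file ..." with the ports ordered
// by their declared position.
std::string buildCommandSketch(const std::string& command,
                               const std::vector<std::string>& flags,
                               std::vector<PortArg> ports) {
    assert(!command.empty());
    std::stable_sort(ports.begin(), ports.end(),
                     [](const PortArg& a, const PortArg& b) {
                         return a.position < b.position;
                     });
    std::string line = command;
    for (std::size_t i = 0; i < flags.size(); ++i) {
        line += " " + flags[i];
    }
    for (std::size_t i = 0; i < ports.size(); ++i) {
        if (!ports[i].parameter.empty()) {
            line += " " + ports[i].parameter;
        }
        line += " " + ports[i].fileName;
    }
    return line;
}
```

With the flag -c and the ports (0, no parameter, main.c) and (1, -o, main.o), the sketch yields "gcc -c main.c -o main.o". For convolve, three ports without parameters at positions 0, 1 and 2 come out in the order input file, kernel file, output file, regardless of the order in which they were supplied.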

Module class

The module class defines the module as it appears in the dataflow graph. It is consistent with the module descriptor of the module that it represents, and it also defines the dependencies present in the dataflow. In an abstract sense, the module class is an instantiation of the module descriptor in which the data dependencies are defined. In our implementation, the module has a reference to the module descriptor, the command that is used to execute the module, and the set of input and output edges that determine the data dependencies in the dataflow. The module can execute only when all its inputs are valid, and once execution is complete it validates all its outputs. In the case of output modules (modules that generate no output), execution can be concurrent with the rest of the dataflow.

Output class

The module is a data processing entity that processes input data and generates output data. In our interface, the inputs and outputs are stored as files. Once a module has successfully completed execution, its outputs are validated, which in turn can allow other modules to proceed to completion. The output class has a reference to an automatically generated file name and to the port in the module descriptor that it corresponds to.

Input class

The module in the dataflow graph has a set of inputs which must be valid before the module can be executed. These inputs may be the outputs of other modules in the dataflow. We choose to store, in each module's input, a reference to the output data of the other module. Once that output is valid, the corresponding input is automatically valid as well. In addition to the reference to the output, the input class also holds the port that the input corresponds to in the module descriptor.
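The relationship between inputs and outputs can be sketched as an input that holds a pointer to the upstream output and derives its validity from it. This is a simplified illustration, not the actual classes: OutputSketch, InputSketch and allInputsValid() are hypothetical names, ports are plain integers, and the file name in the test is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the Output class.
struct OutputSketch {
    std::string fileName;  // generated automatically in the real system
    int port;              // port in the module descriptor
    bool valid;            // set once the producing module has completed
};

// Hypothetical stand-in for the Input class.
struct InputSketch {
    const OutputSketch* source;  // upstream output, or null if unconnected
    int port;                    // port in the module descriptor

    // An input is valid exactly when the output it refers to is valid.
    bool valid() const { return source != 0 && source->valid; }
};

// A module may run only when every one of its inputs is valid.
bool allInputsValid(const std::vector<InputSketch>& inputs) {
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        if (!inputs[i].valid()) return false;
    }
    return true;
}
```

Because the input stores no validity flag of its own, nothing has to be propagated when a module finishes: setting valid on the output is enough for every input that refers to it.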

Putting it all together

We will now describe the integration of the parts of the system into the whole interface. The user interface interacts with the user to construct a dataflow from the module descriptors. The dataflow consists of a set of modules and the connections between them. The dataflow is embodied in the inputs of the modules by references to outputs of other modules.

Once the entire dataflow is defined, the user can execute it. The scheduler looks for executable modules (those whose inputs are all valid) and runs them using the run method defined in the module object. Once every module has been executed, the scheduler transfers control back to the user interface. Each module is executed exactly once, as there are no loops in the dataflow.

The user interface interacts with the scheduler through a static class called the ModuleGraph. This class has specific methods that allow the dataflow to be constructed and run. The user interface is thus separated from the workings of the dataflow.

Sadagopan Rajaram