In this series:
- Let's try: Apache Beam part 1 – simple batch
- Let's try: Apache Beam part 2 – draw the graph
Continuing with part 2.
As we know, an Apache Beam pipeline runs like a waterfall, from top to bottom, and never loops back on itself. This is what we call a "DAG", or "Directed Acyclic Graph".
We write Beam code in Python, and with a few steps we can also render that DAG as a figure.
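To make the "top to bottom, no cycle" idea concrete, here is a small sketch in plain Python rather than Beam. The step names are made up, and `graphlib` (standard library, Python 3.9+) is only used to illustrate what a DAG is; it says nothing about how Beam works internally.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical step names; each key lists the steps it depends on.
steps = {
    "Read": set(),
    "Parse": {"Read"},
    "Filter": {"Parse"},
    "Write": {"Filter"},
}

# A DAG always has a valid top-to-bottom order.
order = list(TopologicalSorter(steps).static_order())
print(order)  # ['Read', 'Parse', 'Filter', 'Write']


def has_cycle(graph):
    """Return True if the dependency graph contains a cycle."""
    try:
        list(TopologicalSorter(graph).static_order())
    except CycleError:
        return True
    return False


# Making "Read" depend on "Write" closes a loop, so it is no longer a DAG.
looped = {**steps, "Read": {"Write"}}
print(has_cycle(steps), has_cycle(looped))  # False True
```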
1. Install Graphviz
graphviz is a common package for generating diagrams with the DOT language. We need to install it first, and the installation method depends on your platform. See the full download list at https://graphviz.org/download/
Personally, I prefer using brew:
brew install graphviz
We can check that graphviz has been installed properly with this command.
dot -V # capital `V`
Then we should see its version.
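If we would rather do the same check from Python, for example before running the pipeline, a minimal sketch could look like this. The function name `graphviz_version` is made up for this example, and it only assumes that `dot` is the executable installed above.

```python
import shutil
import subprocess


def graphviz_version():
    """Return the text printed by `dot -V`, or None if `dot` is not on PATH."""
    if shutil.which("dot") is None:
        return None
    result = subprocess.run(
        ["dot", "-V"], capture_output=True, text=True, check=False
    )
    # Collect both streams, since the version line may be written to stderr.
    return (result.stdout + result.stderr).strip()


version = graphviz_version()
print(version if version else "graphviz not found; install it first")
```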
Read more about brew at the link below.
2. Apply RenderRunner in Beam
Now let's go back to our Beam code and update it like this.
We are using RenderRunner to generate a DOT script for graphviz. Read more about this runner at this doc.
We also pass beam.options.pipeline_options.PipelineOptions() as the options parameter; otherwise it won't generate a figure.
Let's say our complete code looks like this.
What we should do next is run it with the parameter --render_output="<path>". For example:
python3 main.py --render_output="dag.png"
We will then get "dag.png" as follows.
However, suppose we name the steps like this.
The figure it generates then shows the names we gave.