D3.js Tips and Tricks: Sankey Diagrams: What is a Sankey Diagram?

The following post is a portion of the D3 Tips and Tricks document which is free to download. To use this post in context, consider it with the others in the blog or just download the pdf and / or the examples from the downloads page:-)

-------------------------------------------------------

What is a Sankey Diagram?

A Sankey diagram is a type of flow diagram where the ‘flow’ is represented by arrows of varying thickness depending on the quantity of flow.

They are often used to visualize energy, material or cost transfers and are especially useful in demonstrating proportionality to a flow where different parts of the diagram represent different quantities in a system.
Probably the most famous example of a Sankey diagram is Charles Minard’s Map of Napoleon’s Russian Campaign of 1812.

From Wikipedia;

“Étienne-Jules Marey first called notice to this dramatic depiction of the fate of Napoleon’s army in the Russian campaign, saying it defies the pen of the historian in its brutal eloquence. Edward Tufte says it “may well be the best statistical graphic ever drawn” and uses it as a prime example in The Visual Display of Quantitative Information.”

Wikipedia has a great explanation of the diagram type and there is a wealth of information dedicated it on the inter-web. I heartily recommend http://www.sankey-diagrams.com/ for all things Sankey!

So it would come as little surprise that Mike Bostock has developed a plugin for Sankey diagrams (http://bost.ocks.org/mike/sankey/) so that we can all enjoy Sankey goodness with lashings of D3.

How d3.js Sankey Diagrams want their data formatted

If we think of Sankey diagrams consisting of ‘nodes’ and ‘links’…

… the data that generates them must be formatted as nodes and links as well.
For instance a JSON file with appropriate data to build the diagram above could look like the following;

{
"nodes":[
{"node":0,"name":"node0"},
{"node":1,"name":"node1"},
{"node":2,"name":"node2"},
{"node":3,"name":"node3"},
{"node":4,"name":"node4"}
],
"links":[
{"source":0,"target":2,"value":2},
{"source":1,"target":2,"value":2},
{"source":1,"target":3,"value":2},
{"source":0,"target":4,"value":2},
{"source":2,"target":3,"value":2},
{"source":2,"target":4,"value":2},
{"source":3,"target":4,"value":4}
]}

In the file above we have 5 nodes (0-5) sequentially numbered and with names appropriate to their position in the list.

The sequential numbering is only for the purpose of highlighting the structure of the data, since when we get D3 running, it will automatically index each of the nodes according to its position. In other words, we could have omitted the “node”:n parts since D3 will know where each node is anyway. The big deal is that WE need to know what each node is as well especially if we’re going to be building the data by hand (doing it dynamically would be cool, but let’s not get ahead of ourselves just yet).

The links part of the data can be broken down into individual source to target ‘links’ that have an associated value (could be a quantity or strength, but at least a numeric value).

The ‘source’ and target numbers are references to the list of nodes. So, “source”:1, “target”:2 means that this link is whatever node appears at position 1 going to whatever node appears at position 2. The important point to make here is that D3 will not be interested in the numerical value of the node, just it’s position in the list (starting at zero).

The next post will examine the d3.js code for a Sankey diagram.

The above description (and heaps of other stuff) is in the D3 Tips and Tricks document that can be accessed from the downloads page of d3noob.org (Hey! It's free. Why not?)