D3.js Tips and Tricks: Sankey Diagrams: A Description of the d3.js Code

Wednesday 20 February 2013

Sankey Diagrams: A Description of the d3.js Code

The following post is a portion of the D3 Tips and Tricks document which is free to download. To use this post in context, consider it with the others in the blog or just download the pdf and / or the examples from the downloads page:-)

-------------------------------------------------------

Description of the code

The code for the Sankey diagram is significantly different to that for a line graph although it shares the same core language and programming methodology.

The code we’ll go through is an adaptation of the version first demonstrated by Mike Bostock so it’s got a pretty good pedigree. I will begin with a version that uses data that is formatted so that it can be used directly with no manipulation, then in subsequent sections I will describe different techniques for getting data from different formats to work.

I found that getting data in the correct format was the biggest hurdle for getting a Sankey diagram to work. I make the assumption that this may be a similar story for others as well. We will start off assuming that the data is perfectly formatted, then look at the case where only the link data is available, then the case where there are just names to work with (no numeric node values) and lastly a version for people with changeable data from a MySQL database.

I won’t try to go over every inch of the code as I did with the previous simple graph example (I’ll skip things like the HTML header) and will focus on the style sheet (CSS) portion and the JavaScript.
The complete code for this will also be available as an appendix and in the downloads section at d3noob.org.

On to the code…

<style>
.node rect {
  cursor: move;
  fill-opacity: .9;
  shape-rendering: crispEdges;
}
.node text {
  pointer-events: none;
  text-shadow: 0 1px 0 #fff;
}
.link {
  fill: none;
  stroke: #000;
  stroke-opacity: .2;
}
.link:hover {
  stroke-opacity: .5;
}
</style>

<body>
<p id="chart">
<script type="text/javascript" src="d3/d3.v3.js"></script>
<script src="js/sankey.js"></script>
<script>

var units = "Widgets";

var margin = {top: 10, right: 10, bottom: 10, left: 10},
    width = 700 - margin.left - margin.right,
    height = 300 - margin.top - margin.bottom;

var formatNumber = d3.format(",.0f"),    // zero decimal places
    format = function(d) { return formatNumber(d) + " " + units; },
    color = d3.scale.category20();

// append the svg canvas to the page
var svg = d3.select("#chart").append("svg")
    .attr("width", width + margin.left + margin.right)
    .attr("height", height + margin.top + margin.bottom)
  .append("g")
    .attr("transform", 
          "translate(" + margin.left + "," + margin.top + ")");

// Set the sankey diagram properties
var sankey = d3.sankey()
    .nodeWidth(36)
    .nodePadding(40)
    .size([width, height]);

var path = sankey.link();

// load the data
d3.json("data/sankey-formatted.json", function(error, graph) {

  sankey
      .nodes(graph.nodes)
      .links(graph.links)
      .layout(32);

// add in the links
  var link = svg.append("g").selectAll(".link")
      .data(graph.links)
    .enter().append("path")
      .attr("class", "link")
      .attr("d", path)
      .style("stroke-width", function(d) { return Math.max(1, d.dy); })
      .sort(function(a, b) { return b.dy - a.dy; });

// add the link titles
  link.append("title")
        .text(function(d) {
            return d.source.name + " → " + 
                d.target.name + "\n" + format(d.value); });

// add in the nodes
  var node = svg.append("g").selectAll(".node")
      .data(graph.nodes)
    .enter().append("g")
      .attr("class", "node")
      .attr("transform", function(d) { 
          return "translate(" + d.x + "," + d.y + ")"; })
    .call(d3.behavior.drag()
      .origin(function(d) { return d; })
      .on("dragstart", function() { 
          this.parentNode.appendChild(this); })
      .on("drag", dragmove));

// add the rectangles for the nodes
  node.append("rect")
      .attr("height", function(d) { return d.dy; })
      .attr("width", sankey.nodeWidth())
      .style("fill", function(d) { 
          return d.color = color(d.name.replace(/ .*/, "")); })
      .style("stroke", function(d) { 
          return d3.rgb(d.color).darker(2); })
    .append("title")
      .text(function(d) { 
          return d.name + "\n" + format(d.value); });

// add in the title for the nodes
  node.append("text")
      .attr("x", -6)
      .attr("y", function(d) { return d.dy / 2; })
      .attr("dy", ".35em")
      .attr("text-anchor", "end")
      .attr("transform", null)
      .text(function(d) { return d.name; })
    .filter(function(d) { return d.x < width / 2; })
      .attr("x", 6 + sankey.nodeWidth())
      .attr("text-anchor", "start");

// the function for moving the nodes
  function dragmove(d) {
    d3.select(this).attr("transform", 
        "translate(" + (
            d.x = Math.max(0, Math.min(width - d.dx, d3.event.x))
        )
        + "," + (
            d.y = Math.max(0, Math.min(height - d.dy, d3.event.y))
        ) + ")");
    sankey.relayout();
    link.attr("d", path);
  }
});

So, going straight to the style sheet bounded by the <style> tags;

.node rect {
  cursor: move;
  fill-opacity: .9;
  shape-rendering: crispEdges;
}

.node text {
  pointer-events: none;
  text-shadow: 0 1px 0 #fff;
}

.link {
  fill: none;
  stroke: #000;
  stroke-opacity: .2;
}

.link:hover {
  stroke-opacity: .5;
}

The CSS in this example sets the appearance of the nodes, their labels and the links, and controls how they respond to the mouse cursor as it moves around the diagram.

The first part…

.node rect {
  cursor: move;
  fill-opacity: .9;
  shape-rendering: crispEdges;
}

… provides the properties for the node rectangles. It changes the icon for the cursor when it moves over the rectangle to one that looks like it will move the rectangle (a range of different cursor icons is listed at http://www.echoecho.com/csscursors.htm), sets the fill colour to mostly opaque and keeps the edges sharp.

The next block…

.node text {
  pointer-events: none;
  text-shadow: 0 1px 0 #fff;
}

… sets the properties for the text at each node. The mouse is told to essentially ignore the text in favour of anything that’s under it (in the case of moving or highlighting something else) and a slight shadow is applied for readability.

The following block…

.link {
  fill: none;
  stroke: #000;
  stroke-opacity: .2;
}

… makes sure that the link has no fill (the link is actually a bendy line with a very thick stroke, which makes the element appear to be a solid block), colours the stroke black (#000) and makes it almost transparent.

The last block…

.link:hover {
  stroke-opacity: .5;
}

… simply changes the opacity of the link when the mouse goes over it so that it’s more visible. If so desired, we could also change the colour of the highlighted link by adding a line such as stroke: red; to this block.

Just before we get into the JavaScript, we do something a little different for d3.js. We tell it to use a plug-in with the following line;

<script src="js/sankey.js"></script>

The concept of a plug-in is that it is a separate piece of code that adds functionality to a core block (which in this case is d3.js). There is a range of plug-ins available and we will need to source the sankey.js file from the repository and place it somewhere our HTML code can access it. In this case I have put it in the js directory that resides in the root directory of the web page.


The start of our JavaScript begins by defining a range of variables that we’ll be using. 

Our units are set as ‘Widgets’ (var units = "Widgets";), which is just a convenient generic (nonsense) term to provide the impression that the flow of items in this case is widgets being passed from one person to another.

We then set our canvas size and margins…

var margin = {top: 10, right: 10, bottom: 10, left: 10},
    width = 700 - margin.left - margin.right,
    height = 300 - margin.top - margin.bottom;

… before setting some formatting.

var formatNumber = d3.format(",.0f"),    // decimal places
    format = function(d) { return formatNumber(d) + " " + units; },
    color = d3.scale.category20();

The formatNumber function acts on a number to set it to zero decimal places in this case. In the original Mike Bostock example it was to three places, but for ‘widgets’ I’m presuming we don’t divide :-).

format is a function that returns a given number formatted with formatNumber as well as a space and our units of choice (‘Widgets’). This is used to display the values for the links and nodes later in the script.
The color = d3.scale.category20(); line is really interesting and provides access to a colour scale that is pre-defined for your convenience! Later in the code we will see it in action.
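To make the format helper a bit more concrete, here is a small sketch that runs without d3.js. It is only a stand-in: toFixed and toLocaleString are my assumption for roughly mimicking what d3.format(",.0f") does (zero decimal places, commas as thousands separators), not the way d3 itself is implemented.

```javascript
// Stand-in for d3.format(",.0f"): zero decimal places plus thousands separators.
// Assumption: toFixed and toLocaleString only approximate d3's formatter here.
var units = "Widgets";

function formatNumber(d) {
  return Number(d.toFixed(0)).toLocaleString("en-US");
}

// Same shape as the format function in the post: number, a space, then the units.
function format(d) {
  return formatNumber(d) + " " + units;
}

console.log(format(2));        // "2 Widgets"
console.log(format(1234.56));  // "1,235 Widgets"
```

This is the text that ends up in the tooltips for the links and nodes later in the script.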

Our next block sites our canvas onto our page in relation to the size and margins we have already defined;

var svg = d3.select("#chart").append("svg")
    .attr("width", width + margin.left + margin.right)
    .attr("height", height + margin.top + margin.bottom)
  .append("g")
    .attr("transform", 
          "translate(" + margin.left + "," + margin.top + ")");

Then we set the variables for our Sankey diagram;

var sankey = d3.sankey()
    .nodeWidth(36)
    .nodePadding(40)
    .size([width, height]);

Without trying to state the obvious, this sets the width of the nodes (.nodeWidth(36)), the padding between the nodes (.nodePadding(40)) and the size of the diagram (.size([width, height]);).

The following line defines the path variable as a pointer to the sankey function that makes the links between the nodes do their clever thing of bending into the right places;

var path = sankey.link();

I make the presumption that this is a defined function within sankey.js. Then we load the data for our sankey diagram with the following line;

d3.json("data/sankey-formatted.json", function(error, graph) {

As we have seen in previous usage of the d3.json, d3.csv and d3.tsv functions this is a wrapper that acts on all the code within it bringing the data in the form of graph to the remaining code.

I think it’s a good time to take a slightly closer look at the data that we’ll be using;

{
"nodes":[
{"node":0,"name":"node0"},
{"node":1,"name":"node1"},
{"node":2,"name":"node2"},
{"node":3,"name":"node3"},
{"node":4,"name":"node4"}
],
"links":[
{"source":0,"target":2,"value":2},
{"source":1,"target":2,"value":2},
{"source":1,"target":3,"value":2},
{"source":0,"target":4,"value":2},
{"source":2,"target":3,"value":2},
{"source":2,"target":4,"value":2},
{"source":3,"target":4,"value":4}
]}

I want to look at the data now, because it highlights how it is accessed throughout this portion of the code. It is split into two different blocks, ‘nodes’ and ‘links’. The subset of variables available under ‘nodes’ is ‘node’ and ‘name’. Likewise under ‘links’ we have ‘source’, ‘target’ and ‘value’. This means that when we want to act on a subset of our data we define which piece by defining the hierarchy that leads to it. For instance, if we want to define an action onto all the links, we would use graph.links (they’re kind of chained together).
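Since getting the data into this shape was the biggest hurdle for me, a quick sanity check before handing it over to the layout can save some head scratching. The following is only a sketch: checkGraph is a hypothetical helper of my own (it is not part of d3.js or sankey.js) and it assumes the index-based format shown above, where source and target are positions in the nodes list.

```javascript
// Hypothetical helper (not part of d3 or sankey.js): checks the shape the
// script expects before the data is handed to the sankey layout.
function checkGraph(graph) {
  var problems = [];
  if (!graph || !Array.isArray(graph.nodes)) problems.push("missing 'nodes' array");
  if (!graph || !Array.isArray(graph.links)) problems.push("missing 'links' array");
  if (problems.length) return problems;

  graph.links.forEach(function(link, i) {
    ["source", "target"].forEach(function(end) {
      var index = link[end];
      // In this data format each end of a link is an index into graph.nodes.
      if (!Number.isInteger(index) || index < 0 || index >= graph.nodes.length) {
        problems.push("link " + i + ": bad " + end + " " + index);
      }
    });
    if (typeof link.value !== "number") {
      problems.push("link " + i + ": value is not a number");
    }
  });
  return problems;
}

var graph = {
  nodes: [{node: 0, name: "node0"}, {node: 1, name: "node1"}, {node: 2, name: "node2"}],
  links: [{source: 0, target: 2, value: 2}, {source: 1, target: 2, value: 2}]
};

console.log(checkGraph(graph));                 // an empty list: nothing wrong
console.log(checkGraph({links: graph.links}));  // reports the missing 'nodes' array
```

Something like this could be called at the top of the d3.json callback, before the data reaches the sankey function.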

Let me take this opportunity to apologise to all those programmers who actually know exactly what is going on here. It’s a mystery to me, but this is how I like to tell myself it works to help me get by :-)

So, now that we have our data loaded, we can assign the data to the sankey function so that it knows how to deal with it behind the scenes;

  sankey
      .nodes(graph.nodes)
      .links(graph.links)
      .layout(32);

In keeping with our previous description of what’s going on with the data, we have told the sankey function that the nodes it will be dealing with are in graph.nodes of our data structure.
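As far as I can tell from reading sankey.js, one of the things the layout does behind the scenes is swap the numeric source and target values for the node objects they point at, and work out a value for each node from its links. That would explain why later code can say d.source.name and d.value. The sketch below is a simplified version of my reading of that step using plain JavaScript; resolveGraph and sum are names I made up and this is not the plug-in's actual source.

```javascript
// Simplified sketch of my reading of the plug-in; not the actual sankey.js source.
function sum(links) {
  return links.reduce(function(total, link) { return total + link.value; }, 0);
}

function resolveGraph(graph) {
  graph.nodes.forEach(function(node) {
    node.sourceLinks = [];  // links leaving this node
    node.targetLinks = [];  // links arriving at this node
  });
  graph.links.forEach(function(link) {
    // Swap the numeric indices for the node objects they point at.
    if (typeof link.source === "number") link.source = graph.nodes[link.source];
    if (typeof link.target === "number") link.target = graph.nodes[link.target];
    link.source.sourceLinks.push(link);
    link.target.targetLinks.push(link);
  });
  graph.nodes.forEach(function(node) {
    // A node is as big as the larger of what flows in and what flows out.
    node.value = Math.max(sum(node.sourceLinks), sum(node.targetLinks));
  });
  return graph;
}

var graph = resolveGraph({
  nodes: [{node: 0, name: "node0"}, {node: 1, name: "node1"}, {node: 2, name: "node2"},
          {node: 3, name: "node3"}, {node: 4, name: "node4"}],
  links: [{source: 0, target: 2, value: 2}, {source: 1, target: 2, value: 2},
          {source: 1, target: 3, value: 2}, {source: 0, target: 4, value: 2},
          {source: 2, target: 3, value: 2}, {source: 2, target: 4, value: 2},
          {source: 3, target: 4, value: 4}]
});

console.log(graph.links[0].source.name);                          // "node0"
console.log(graph.nodes.map(function(n) { return n.value; }));    // values 4, 4, 4, 4, 8
```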

I’m not sure what the .layout(32); portion of the code does, but I’d be interested to know from any more knowledgeable readers. I’ve tried changing the value to no apparent effect and googling has drawn a blank. Inside the sankey.js file the argument is called ‘iterations’ and it is passed to computeNodeDepths(iterations) as part of a sequence of steps: computeNodeLinks, computeNodeValues, computeNodeBreadths, computeNodeDepths and computeLinkDepths. (As a reader points out in the comments below, it appears to be the number of passes the layout makes when trying to place the nodes well.)

Then we add our links to the diagram with the following block of code;

  var link = svg.append("g").selectAll(".link")
      .data(graph.links)
    .enter().append("path")
      .attr("class", "link")
      .attr("d", path)
      .style("stroke-width", function(d) { return Math.max(1, d.dy); })
      .sort(function(a, b) { return b.dy - a.dy; });

This is analogous to the block of code we examined way back when explaining the code of our first simple graph.

We append svg elements for our links based on the data in graph.links, then add in the paths (using the appropriate CSS). We set the stroke width to the width of the value associated with each link or ‘1’, whichever is the larger (by virtue of the Math.max function). As an interesting sideline, if we force this value to ‘10’ thusly…

      .style("stroke-width", 10)

… the graph looks quite interesting.

I have to admit that I don’t know what the sort line (.sort(function(a, b) { return b.dy - a.dy; });) is supposed to achieve. Again, I’d be interested to know from any more knowledgeable readers. I’ve tried changing the values to no apparent effect. (A reader suggests in the comments below that it controls the order in which the links are drawn.)
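For what it’s worth, here is how I understand that comparator now, sketched on a plain array rather than a d3 selection. The dy values are made up for illustration. My understanding is that the selection’s sort reorders the path elements the same way, and since later SVG elements are painted on top of earlier ones, the thickest links would be drawn first and the thinnest last, which should make thin links easier to hover over where they overlap.

```javascript
// Plain-array stand-in for the selection's sort; the dy values are made up.
var links = [{id: "a", dy: 5}, {id: "b", dy: 40}, {id: "c", dy: 12}];

// Same comparator as in the post: larger dy sorts earlier.
var drawOrder = links.slice().sort(function(a, b) { return b.dy - a.dy; });

console.log(drawOrder.map(function(d) { return d.id; }).join(" "));  // "b c a"
```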

The next block adds the titles to the links;

  link.append("title")
        .text(function(d) {
                return d.source.name + " → " + 
                        d.target.name + "\n" + format(d.value); });

This code appends a title element to each link, which shows up as a tooltip on mouse-over and contains the source and target name (with a neat little arrow in between) and the value (which, when passed through the format function, gains the units).

The next block appends the node objects (but not the rectangles or text) and contains the instructions to allow them to be arranged with the mouse.

  var node = svg.append("g").selectAll(".node")
      .data(graph.nodes)
    .enter().append("g")
      .attr("class", "node")
      .attr("transform", function(d) { 
          return "translate(" + d.x + "," + d.y + ")"; })
    .call(d3.behavior.drag()
      .origin(function(d) { return d; })
      .on("dragstart", function() { 
          this.parentNode.appendChild(this); })
      .on("drag", dragmove));

While it starts off in familiar territory with appending the node objects using the graph.nodes data and putting them in the appropriate place with the transform attribute, I can only assume that there is some trickery going on behind the scenes to make sure the mouse can do what it needs to do with the d3.behavior.drag function. There is some excellent documentation on the wiki (https://github.com/mbostock/d3/wiki/Drag-behavior), but I can only presume that it knows what it’s doing :-). The dragmove function is laid out at the end of the code, and we will explain how that operates later.

I really enjoyed the next block;

  node.append("rect")
      .attr("height", function(d) { return d.dy; })
      .attr("width", sankey.nodeWidth())
      .style("fill", function(d) { 
          return d.color = color(d.name.replace(/ .*/, "")); })
      .style("stroke", function(d) { 
          return d3.rgb(d.color).darker(2); })
    .append("title")
      .text(function(d) { 
          return d.name + "\n" + format(d.value); });

It starts off with a fairly standard appending of a rectangle with a height generated by its value ({ return d.dy; }) and a width taken from the node width we set earlier (.attr("width", sankey.nodeWidth())).
Then it gets interesting.

The colours are assigned in accordance with our earlier colour declaration and the individual colours are added to the nodes by finding the first part of the name for each node and assigning it a colour from the palette (the script looks for the first space in the name using a regular expression and drops everything from there on). For instance: ‘Widget X’, ‘Widget Y’ and ‘Widget’ will all be coloured the same even if ‘Widget X’ and ‘Widget Y’ are inputs on the left and ‘Widget’ is a node in the middle.

The stroke around the outside of the rectangle is then drawn in the same shade, but darker. Then we return to the basics where we add the title of the node in a tool tip type effect along with the value for the node.
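The colour-by-first-word trick can be tried out without d3.js. In the sketch below, color is a hypothetical stand-in for an ordinal scale like d3.scale.category20() (each new key gets the next colour, a repeated key gets the colour it had before), the four-colour palette is just an illustrative subset, and colorKey is my own name for the regular expression step.

```javascript
// Hypothetical stand-in for an ordinal colour scale such as d3.scale.category20():
// each new key gets the next colour, repeated keys get the colour they had before.
var palette = ["#1f77b4", "#aec7e8", "#ff7f0e", "#ffbb78"];  // illustrative subset
var seen = new Map();

function color(key) {
  if (!seen.has(key)) seen.set(key, palette[seen.size % palette.length]);
  return seen.get(key);
}

// Same regular expression as the post: drop everything from the first space onwards.
function colorKey(name) {
  return name.replace(/ .*/, "");
}

console.log(colorKey("Widget X"));  // "Widget"
console.log(color(colorKey("Widget X")) === color(colorKey("Widget")));  // true
console.log(color(colorKey("node0")) === color(colorKey("node1")));      // false
```

Note that with names like ‘node0’ and ‘node1’ in our sample data there is no space to find, so every node ends up with its own colour.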

Then we add the titles for the nodes;

  node.append("text")
      .attr("x", -6)
      .attr("y", function(d) { return d.dy / 2; })
      .attr("dy", ".35em")
      .attr("text-anchor", "end")
      .attr("transform", null)
      .text(function(d) { return d.name; })
    .filter(function(d) { return d.x < width / 2; })
      .attr("x", 6 + sankey.nodeWidth())
      .attr("text-anchor", "start");

Again, this looks pretty familiar. We position the text titles carefully to the left of the nodes, all except for those affected by the filter function (return d.x < width / 2;). If the position of the node on the x axis is less than half the width, the title is placed on the right of the node and anchored at the start of the text. Very neat.
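The left-or-right decision is easy to check in isolation. This sketch assumes the numbers used in this post (a 680 pixel wide drawing area once the margins are taken off, and 36 pixel wide nodes); labelPlacement is a hypothetical helper that just returns what the chained .attr calls would end up setting.

```javascript
// Assumed values from this post: a 680px-wide drawing area and 36px-wide nodes.
var width = 680, nodeWidth = 36;

// Hypothetical helper: returns the x offset and text-anchor a label would get.
function labelPlacement(d) {
  var onLeftHalf = d.x < width / 2;
  return onLeftHalf
      ? {x: 6 + nodeWidth, anchor: "start"}  // label sits to the right of the node
      : {x: -6, anchor: "end"};              // label sits to the left of the node
}

console.log(labelPlacement({x: 0}));    // x: 42, anchor: "start"
console.log(labelPlacement({x: 644}));  // x: -6, anchor: "end"
```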

The last block is also pretty neat, and contains a little surprise for those who are so inclined.

  function dragmove(d) {
    d3.select(this).attr("transform", 
       "translate(" + d.x + "," + (
                d.y = Math.max(0, Math.min(height - d.dy, d3.event.y))
            ) + ")");
    sankey.relayout();
    link.attr("d", path);
  }

This declares the function that controls the movement of the nodes with the mouse. It selects the item that it’s operating over (d3.select(this)) and then allows translation in the y axis while maintaining the link connection (sankey.relayout(); link.attr("d", path);).

But that’s not the cool part. A quick look at the code should reveal that if you can move a node in the y axis, there should be no reason why you can’t move it in the x axis as well!

Sure enough, if you replace the code above with this…

  function dragmove(d) {
    d3.select(this).attr("transform", 
        "translate(" + (
            d.x = Math.max(0, Math.min(width - d.dx, d3.event.x))
        )
        + "," + (
            d.y = Math.max(0, Math.min(height - d.dy, d3.event.y))
        ) + ")");
    sankey.relayout();
    link.attr("d", path);
  }

… you can move your nodes anywhere on the canvas.
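The Math.max and Math.min pair in dragmove is what stops a node being dragged off the canvas. Here is that arithmetic on its own as a sketch, assuming the 680 by 280 drawing area from this post; clampPosition is a hypothetical helper and the mouse coordinates are passed in directly rather than read from d3.event.

```javascript
// Assumed drawing area from this post: 700x300 minus 10px margins on each side.
var width = 680, height = 280;

// Hypothetical helper: keeps a node of size dx by dy fully inside the drawing area.
function clampPosition(d, mouseX, mouseY) {
  return {
    x: Math.max(0, Math.min(width - d.dx, mouseX)),
    y: Math.max(0, Math.min(height - d.dy, mouseY))
  };
}

var node = {dx: 36, dy: 100};

console.log(clampPosition(node, -50, 500));  // x: 0, y: 180 (pinned to the left and bottom)
console.log(clampPosition(node, 700, 20));   // x: 644, y: 20 (pinned to the right edge)
```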

I know it doesn’t seem to add anything to the diagram (in fact, it could be argued that there is a certain aspect of detraction), but that doesn’t mean the idea won’t come in handy one day :-). You can find a live version of this on Github via bl.ocks.org.

So, that’s the description for our basic Sankey diagram. From here we will look at different ways to get data formatted for use in them.

The above description (and heaps of other stuff) is in the D3 Tips and Tricks document that can be accessed from the downloads page of d3noob.org (Hey! It's free. Why not?)

53 comments:

scott_southworth, 13 June 2013 at 11:22
Hi, stumbled upon your work here and think it's great to be putting this kind of stuff out there! New to D3, but as a long-time programmer, I thought I'd provide answers for the two questions you posed:

1) the layout function lets you set how many 'passes' are performed by an algorithm trying to optimally place the nodes so they don't overlap. The higher the number, the better the placement -- but the longer it takes to run.

2) the sort function looks like it is choosing the order in which the pieces are drawn on the screen -- note that the mouse only highlights one element at a time when things are overlapping.

Thanks for the blog/book! I'll try to get my company to pay for a copy for us :)
-scott southworth

Unknown, 26 June 2013 at 23:51
Hi,
It is nice work.Thank you for the article.
I want to use sankey for generating visitors flow diagram as in google analytics.

can you please help me how to go about that -- using 1 to many and repetitions

Thanks
Madhu
(madhusudan.k70@gmail.com)

Eric, 24 July 2013 at 06:51
Hey! Great article!

I've been trying to create a sankey diagram of some travel patterns, and every time I run it I get an error telling me that 'nodes' is undefined (despite the fact that my json is defining them). Thoughts?

rolfsf, 23 August 2013 at 10:14
Thanks for laying all these examples out - I'm new to D3 and not a programmer, so it definitely helps to have your explorations!

A question on the sankey diagram... is it fairly easy to disable the mouse drag?

Unknown, 8 October 2013 at 00:59
Hi,
I'm new to D3, D3 i think is brilliant in putting across the data in a very effective visualization. And Noob thank you for breaking down the each script into chunks. This actually helped me a lot in understand what each script is doing

Here i'm directly using a CSV data to generate the customer flow, what i'm trying to do is, add a hyperLink to the Nodes. For some reason, this is not working with the csv data. How ever i'm able to include a Hyperlink to the 'Links' between the nodes and it works fine

Can some one help me out with this, how to add a hyperlink to the nodes of the sankey diagram

Dataset schema
[ source, target , value , linkurl , Nodeurl* ]


Johannes Henseler, 25 November 2013 at 18:44
this is awesome! I will look at this closer … I tried to generate a JSON file from scratch, but working with IDs makes my head explode. With this csv file, I can use excel that can make sure the data is not duplicate.

I will try to change the sankey behaviour to do »auto« links. I need to define a link to collect all the input and deliver it to the target, similar to what timelyportfolio did here in r: http://timelyportfolio.github.io/rCharts_d3_sankey/example_build_network_sankey.html

I don't know how I can do that, but I'll try to go through your code and hopefully understand the sankey logic better to get this done. will post when I know more!

Johannes Henseler, 26 November 2013 at 15:48
yeah, it took me *ages* to figure out where to manipulate the code and a lot of console.logs later, I figured out I only need to change sankey's computeNodeValues() and got this:
http://bl.ocks.org/frischmilch/7667996

Thank you for this great resource as I am a d3 noob too :)

Unknown, 21 January 2014 at 23:36
Thank you for posting this! I've been looking for Sankey tutorial for a long time :)

One question. After I created a separated JSON file with my data, and put it in the same folder as my sankey.html, I always got error message saying ''XMLHttpRequest cannot load file:///blah/blah/sankey.json. Cross origin requests are only supported for HTTP. "

I don't know what's wrong with my code, so I googled it and some people said I need to build server and put my JSON file there because d3.json() only takes http format. I am not sure if it is the right way to do.

http://stackoverflow.com/questions/10752055/cross-origin-requests-are-only-supported-for-http-error-but-im-loading-a-co

Thanks for helping!

baynesmedia, 21 February 2014 at 13:19
Great stuff. Thanks very much for taking the time and trouble to share this with us. Much appreciated.

Unknown, 4 March 2014 at 03:10
Hi is it possible to add an HTML link on each edge?
In our scenario each edge is connected to an XML document (legal document) so we would like to click on the edge and so to navigate to the appropriate document.
Monica

Unknown, 20 July 2014 at 06:27
Nice work.. Could you please help me how to use rest call in creating sankey diagram

Denes Csala, 30 October 2014 at 05:23
I have created a rather complicated sankey, working on the publishing and documentation now. It has what you ask for: http://food.csaladen.es

Denes Csala, 30 October 2014 at 10:01
Whew, thanks :) The work is very close to being completed, I will patch it and release it and write the supporting material for it towards by the end of this year hopefully! The examples on d3noob were crucial in developing,, so thanks!!

Unknown, 27 November 2014 at 09:35
Hi
I really like this library, but wondered if there is a way to change the look of the links to a more angular straight line.

The way I would like is for the link to come from the node 1 10px horizontally and then link to node 2 taking its path straight line from node 1 to node 2 and 10px into node 2 in the same way. Hope this make sense, and would appreciate any feedback.

Many thanks
Matt

Tara Talks, 21 April 2015 at 04:35
Hey,
Great post. Very useful. Thank you.
Can anyone tell me is it possible to have intermediary nodes? Using your example where node 1 would connect to 3 and 4 all in the one stream.
Any ideas?

Thanks,
Tara

Denes Csala, 24 June 2015 at 10:52
Hi Matt, depending on what version of the sankey.js you're using, around line 190 there is a function called sankey.link(). In the function there is a variable called curvature, which is 0.5 by default. Set that to whatever value you'd like!

yyyt, 10 February 2016 at 00:08
Hello Matt, how can I make the label on the node larger (the text)? Thanks!

Unknown, 8 June 2016 at 12:47
Dear D3noob,

I have finished my third year project and I would like to contribute it to the Apache Software Foundation. In order to the my project approved, I am required to get all the licences of all of the software that I have used.

During my research stage, I have found that your implementation of the Sankey Diagram is the tool that fits all of the functional requirements that I was looking for. I have seen on Mike Bostock`s page (http://bl.ocks.org/d3noob/5028304#license) that I need to contact you in order to get a licence.

Therefore, my question is the following:
Can I use your implementation of Sankey Diagram ( http://bl.ocks.org/d3noob/5028304 ) within the project that I want to share with Apache Community?

Many thanks,
Stefan

Anonymous, 9 June 2016 at 03:37
HI,,
Can we have Sankey without links and child nodes, I just need parent nodes...Please help me in this.

Unknown, 14 October 2016 at 06:12
hello.. As my master project I am trying to implement a data visualization tool. My professor asked me to implement special kind of sankey diagram where there could be both horizontal and vertical nodes (I don't mean node movement). I implemented horizontal and vertical separately (left-to-right data flow following Mike's blog and top-to-bottom approach following this link-http://benlogan1981.github.io/VerticalSankey/UBS.html) but no idea how to make a single diagram where there will both type of node together. Please help me!

Ani, 12 February 2017 at 10:59
Hi,
Thank you very much for such a wonderful detailed process flow.

I just looking for some help about the colour of the paths. Can I customise the colours of the paths I.e. different colours for different path and also a border for the paths.

Thanks in advance.

Unknown, 2 July 2017 at 23:41
can this be used to make a loop like flow from A to B and then a portion of it back to A ??

reddie, 4 August 2017 at 04:48
Hi d3noob,
I am not a d3 programmer but i understood pretty much of it. I am working on a sequencing project and tried creating sankeys succesfully, but my concern is whether there is any way to show the sequences based on the time period between the events? As of now all blocks are aligned at the same x-axis position. I want the blocks at a particular node to vary based on a metric like distance between the events? If yes, can you please let me know in which snippet of the code i need to change and what?