Raspberry Pi Pico Tips and Tricks

Friday 22 February 2013

Formatting data for Sankey diagrams in d3.js


The following post is a portion of the D3 Tips and Tricks document which is free to download. To use this post in context, consider it with the others in the blog or just download the pdf  and / or the examples from the downloads page:-)
-------------------------------------------------------

The following is a follow on from the previous posts on generating Sankey diagrams in d3.js and goes over some mechanisms for ingesting data.

From a JSON file with numeric link values

As explained in the previous section, data to form a Sankey diagram needs to be a combination of nodes and links.
{
"nodes":[
{"node":0,"name":"node0"},
{"node":1,"name":"node1"},
{"node":2,"name":"node2"},
{"node":3,"name":"node3"},
{"node":4,"name":"node4"}
],
"links":[
{"source":0,"target":2,"value":2},
{"source":1,"target":2,"value":2},
{"source":1,"target":3,"value":2},
{"source":0,"target":4,"value":2},
{"source":2,"target":3,"value":2},
{"source":2,"target":4,"value":2},
{"source":3,"target":4,"value":4}
]}
As we also noted earlier, the `“node”` entries in the `”nodes”` section of the json file are superfluous and are really only there for our benefit since D3 will automatically index the nodes starting at zero. As a test to check this out we can change our data to the following;
{
"nodes":[
{"name":"Barry"},
{"name":"Frodo"},
{"name":"Elvis"},
{"name":"Sarah"},
{"name":"Alice"}
],
"links":[
{"source":0,"target":2,"value":2},
{"source":1,"target":2,"value":2},
{"source":1,"target":3,"value":2},
{"source":0,"target":4,"value":2},
{"source":2,"target":3,"value":2},
{"source":2,"target":4,"value":2},
{"source":3,"target":4,"value":4}
]}
(for reference this file is saved as sankey-formatted-names-and-numbers.json and the html file is Sankey-formatted-names-and-numbers.html)

This will produce the following graph;
As you can see, essentially the same, but with easier to understand names.

As you can imagine, while the end result is great, the creation of the JSON file manually would be painful at best and doing something similar but with a greater number of nodes / links would be a nightmare.

So let's see if we can make the process a bit easier and more flexible.

From a JSON file with links as names

It would make thing much easier if you are building the data from hand to have nodes with names, and the 'source' and 'target' links have those same name values as identifiers.

In other words a list of unique names for the nodes (and perhaps some details) and a list of the links between those nodes using the names for the nodes.

So, something like this;
{
"nodes":[
{"name":"Barry"},
{"name":"Frodo"},
{"name":"Elvis"},
{"name":"Sarah"},
{"name":"Alice"}
],
"links":[
{"source":"Barry","target":"Elvis","value":2},
{"source":"Frodo","target":"Elvis","value":2},
{"source":"Frodo","target":"Sarah","value":2},
{"source":"Barry","target":"Alice","value":2},
{"source":"Elvis","target":"Sarah","value":2},
{"source":"Elvis","target":"Alice","value":2},
{"source":"Sarah","target":"Alice","value":4}
]}
Once again, D3 to the rescue!

The little piece of code that can do this for us is here;
    var nodeMap = {};
    graph.nodes.forEach(function(x) { nodeMap[x.name] = x; });
    graph.links = graph.links.map(function(x) {
      return {
        source: nodeMap[x.source],
        target: nodeMap[x.target],
        value: x.value
      };
    });
This elegant solution comes from here; http://stackoverflow.com/questions/14629853/json-representation-for-d3-networks and was provided by Chris Pettitt (nice job).

So if we sneak this piece of code into here...  
d3.json("data/sankey-formatted.json", function(error, graph) {

            //  <= Put the code here.

  sankey
      .nodes(graph.nodes)
      .links(graph.links)
      .layout(32);
… and this time we use our JSON file with just names (sankey-formatted-names.json) and our new html file (sankey-formatted-names.html) we find our Sankey diagram working perfectly!
Looking at our new piece of code...
    var nodeMap = {};
    graph.nodes.forEach(function(x) { nodeMap[x.name] = x; });
… the first thing it does is create an object called `nodeMap` (The difference between an array and an object in JavaScript is one that is still a little blurry to me and judging from online comments, I am not alone).

Then for each of the `graph.node` instances (where `x` is a range of numbers from 0 to the last node), we assign each node name to a number.

Then in the next piece of code...
    graph.links = graph.links.map(function(x) {
      return {
        source: nodeMap[x.source],
        target: nodeMap[x.target],
        value: x.value
      };

… we go through all the links we have and for each link, we map the appropriate number to the correct name.

Very clever.

From a CSV with 'source', 'target' and 'value' info only

 In the first iteration of this section I had no solution to creating a Sankey diagram using a csv file as the source of the data.

But cometh the hour, cometh the man. Enter @timelyportfolio who, while claiming no expertise in D3 or JavaScript was able to demonstrate a [solution](http://bl.ocks.org/timelyportfolio/5052095) to exactly the problem I was facing! Well done Sir! I salute you and name the technique the timelyportfolio csv method!

So here's the cleverness that @timelyportfolio demonstrated;

Using a csv file (in this case called `sankey.csv`) that looks like this;

source,target,value
Barry,Elvis,2
Frodo,Elvis,2
Frodo,Sarah,2
Barry,Alice,2
Elvis,Sarah,2
Elvis,Alice,2
Sarah,Alice,4


 We take this single line from our original Sankey diagram code;
d3.json("data/sankey-formatted.json", function(error, graph) {
 And replace it with the following block;
d3.csv("data/sankey.csv", function(error, data) {

  //set up graph in same style as original example but empty
  graph = {"nodes" : [], "links" : []};

    data.forEach(function (d) {
      graph.nodes.push({ "name": d.source });
      graph.nodes.push({ "name": d.target });
      graph.links.push({ "source": d.source,
                         "target": d.target,
                         "value": +d.value });
     });

     // return only the distinct / unique nodes
     graph.nodes = d3.keys(d3.nest()
       .key(function (d) { return d.name; })
       .map(graph.nodes));

     // loop through each link replacing the text with its index from node
     graph.links.forEach(function (d, i) {
       graph.links[i].source = graph.nodes.indexOf(graph.links[i].source);
       graph.links[i].target = graph.nodes.indexOf(graph.links[i].target);
     });

     //now loop through each nodes to make nodes an array of objects
     // rather than an array of strings
     graph.nodes.forEach(function (d, i) {
       graph.nodes[i] = { "name": d };
     });
 The comments in the code (and they are fuller in @timelyportfolio's [original gist solution](http://bl.ocks.org/timelyportfolio/5052095)) explain the operation;
d3.csv("data/sankey.csv", function(error, data) {
 … Loads the csv file from the data directory.
  graph = {"nodes" : [], "links" : []};
 … Declares `graph` to consist of two empty arrays called `nodes` and `links`.
      data.forEach(function (d) {
      graph.nodes.push({ "name": d.source });
      graph.nodes.push({ "name": d.target });
      graph.links.push({ "source": d.source,
                         "target": d.target,
                         "value": +d.value });
     });
 … Takes the `data` loaded with the csv file and for each row loads variables for the `source` and `target` into the `nodes` array then for each row loads variables for the `source` `target` and `value` into the `links` array.
     graph.nodes = d3.keys(d3.nest()
       .key(function (d) { return d.name; })
       .map(graph.nodes));
 … Is a routine that Mike Bostock described on [Google Groups](https://groups.google.com/forum/#!msg/d3-js/pl297cFtIQk/Eso4q_eBu1IJ) that (as I understand it) nests each node name as a key so that it returns with only unique nodes.
     graph.links.forEach(function (d, i) {
       graph.links[i].source = graph.nodes.indexOf(graph.links[i].source);
       graph.links[i].target = graph.nodes.indexOf(graph.links[i].target);
     });
 … Goes through each `link` entry and for each `source` and `target`, it finds the unique index number of that name in the nodes array and assigns the link source and target an appropriate number.

And finally...
     graph.nodes.forEach(function (d, i) {
       graph.nodes[i] = { "name": d };
     });
 … Goes through each node and (in the words of @timelyportfolio) “*make nodes an array of objects rather than an array of strings*” (I don't really know what that means :-(. I just know it works :-).)

There you have it. A Sankey diagram from a csv file. Well played @timelyportfolio!

Both the html file for the diagram (`Sankey.formatted-csv.html`) and the data file (`sankey.csv`) can be found in the downloads section of d3noob.org.

From MySQL as link information only automatically.

So, here we are. Faced with a dilemma of trying to get my csv formatted links into a Sankey diagram. In theory we then need to go through our file, identify all the unique nodes and format the entire blob into JSON for use.

There must be a better way!

Well, I'm not going to claim that this is any better since it's a little like cracking a walnut with a sledgehammer. But to a man with just a sledgehammer, everything’s a walnut.

So, let's use our newly developed MySQL and PHP skills to solve our problem. In fact, let's make it slightly harder for ourselves. Let's imagine that we don't even have a value associated with our data, just a big line of source and target links. Something like this;

source,target
Barry,Elvis
Barry,Elvis
Frodo,Elvis
Frodo,Elvis
Frodo,Sarah
Frodo,Sarah
Barry,Alice
Barry,Alice
Elvis,Sarah
Elvis,Sarah
Elvis,Alice
Elvis,Alice
Sarah,Alice
Sarah,Alice
Sarah,Alice
Sarah,Alice

First thing first, just as we did in the example on using MySQL, import your csv file into a MySQL table which we'll call `sankey1` in database `homedb`.

Now we want to write a query that pulls out all the DISTINCT names that appear it the 'source' and 'target' columns. This will form our 'nodes' portion of the JSON data.
SELECT DISTINCT(`source`) AS name FROM `sankey1`
UNION
SELECT DISTINCT(`target`) AS name FROM `sankey1`
GROUP BY name
This query actually mashes two separate queries together where each returns DISTINCT instances of each `source` and `target` from the source and target columns. By default, the UNION operator eliminates duplicate rows from the result which means we have a list of each node in the table.
Exxxeellennt....... (channelling Mr Burns)

Now we run a separate query that pulls out each distinct 'source' and 'target' combination and the number of times (COUNT(*)) that it occurs.
SELECT `source` AS source, `target`  as target, COUNT(*) as value 
FROM `sankey1`
GROUP BY source, target 
This query gets all the sources and all the targets and groups them by first the source and then the target. Each line is therefore unique and the `COUNT(*)` sums up the number of times that each unique combination occurs.
That was surprisingly easy wasn't it?

MySQL is good like that for the simple jobs, but of course we're a long way from finished since at this stage all we have is what looks like two tables in a spreadsheet.

So now we turn to PHP.

Remember from our previous exposure, we described PHP as the glue that could connect parts of web pages together. In this case we will use it to glue our MySQL database to our JavaScript.

What we need it to do is to carry out our queries and return the information in a format that d3.js can understand. In this instance we will select JSON as it's probably the most ubiquitous and it suits the format of our original manual data.

Let's cut to the chase and look at the code that we'll use.
<?php
    $username = "homedbuser"; 
    $password = "homedbuser";   
    $host = "localhost";
    $database="homedb";
    
    $server = mysql_connect($host, $username, $password);
    $connection = mysql_select_db($database, $server);

    $myquery = "
SELECT DISTINCT(`source`) AS name FROM `sankey1`
UNION
SELECT DISTINCT(`target`) AS name FROM `sankey1`
GROUP BY name
";
    $query = mysql_query($myquery);
    
    if ( ! $myquery ) {
        echo mysql_error();
        die;
    }
    
    $nodes = array();
    
    for ($x = 0; $x < mysql_num_rows($query); $x++) {
        $nodes[] = mysql_fetch_assoc($query);
    }

    $myquery = "
SELECT `source` AS source, `target`  as target, COUNT(*) as value 
FROM `sankey1`
GROUP BY source, target 
";
    $query = mysql_query($myquery);
    
    if ( ! $myquery ) {
        echo mysql_error();
       die;
    }
    
    $links = array();
    
    for ($x = 0; $x < mysql_num_rows($query); $x++) {
        $links[] = mysql_fetch_assoc($query);
    }

echo "{";
echo '"links": ', json_encode($links), "\n";
echo ',"nodes": ', json_encode($nodes), "\n";
echo "}";

    mysql_close($server);
?>
Astute readers will recognise that this is very similar to the script that we used to extract data from the MySQL database for generating a simple line graph. If you haven't checked it out, and you're unfamiliar with PHP, you will want to read that section first.

We declare all the appropriate variables that we will then use to connect to the database, then we connect to the database and run our query.

After that we store the nodes data in an array called `$nodes`.

Then we run our second query (we don't close the connection to the database since we're not finished with it yet).

The second query returns the link results into a second array called `$links` (pretty imaginative).

Now we come to a part that's a bit different. We still need to echo out the data in the same way that was required for our line graph, but in this case we need to add the data together with the associated `links` and `nodes` identifiers.  
echo "{";
echo '"links": ', json_encode($links), "\n";
echo ',"nodes": ', json_encode($nodes), "\n";
echo "}";
(if you look closely, the syntax will produce our a JSON formatted output)

So lastly, we need to call this PHP script from our html file in the same way that we did for the line graph. So amend the html file to change the loading of the JSON data to be from our PHP file thusly;
d3.json("php/sankey.php", function(error, graph) {
And there you have it! So many ways to get the data.

Both the PHP file (sankey.php) and the html file (sankey-mysql-import.html) are available in the downloads section on d3noob.org.

Sankey diagram case study

So armed with all this new found knowledge on building Sankey diagrams, what can you do?

Well, I suppose it all depends on your data set, but remember, Sankey diagrams are good at flows, but they won't do loops / cycles easily (although there has been some good work done in this direction here  and here).

So let's choose a flow.

In this case we'll selected the flow of data that represents a view of global, anthropogenic greenhouse gas (GHG) emissions. The diagram is a re-drawing of the excellent diagram on the World Resources Institute and as such my version pales in comparison to theirs.

However, the aim is to play with the technique, not to emulate :-).

So starting with the data presented in the original diagram, we have to capture the links into a csv file. I did this the hard way (since there didn't appear to be an electronic version of the data) by reading the graph and entering the figures into a csv file. From here we import it into our MySQL database and then convert it into sankey formatted JSON by using our PHP script that we played with in the example of extracting information from a MySQL database. In this case instead of needing to perform a `COUNT(*)` on the data, it's slightly easier since the value is already present.

Then, because we want this diagram to be hosted on Gist and accessible on bl.ocks.org, we run the PHP file directly into the browser so that it just shows the JSON data on the screen. We then save this file with the suffix `.json` and we have our data (in this case the file is named `sankeygreenhouse.json`).
Then we amend our html file to look at our new `.json` file and voila!



Sankeytastic!

You can find this as a live example and with all the code and data on bl.ocks.org.


The above description (and heaps of other stuff) is in the D3 Tips and Tricks document that can be accessed from the downloads page of d3noob.org (Hey! It's free. Why not?)

Wednesday 20 February 2013

Sankey Diagrams: A Description of the d3.js Code


The following post is a portion of the D3 Tips and Tricks document which is free to download. To use this post in context, consider it with the others in the blog or just download the pdf  and / or the examples from the downloads page:-)
-------------------------------------------------------

Description of the code

The code for the Sankey diagram is significantly different to that for a line graph although it shares the same core language and programming methodology.

The code we’ll go through is an adaptation of the version first demonstrated by Mike Bostock so it’s got a pretty good pedigree. I will begin with a version that uses data that is formatted so that it can be used directly with no manipulation, then in subsequent sections I will describe different techniques for getting data from different formats to work.

I found that getting data in the correct format was the biggest hurdle for getting a Sankey diagram to work. I make the assumption that this may be a similar story for others as well. We will start off assuming that the data is perfectly formatted, then where only the link data is available then where there is just names to work with (no numeric node values) and lastly, one that can be used for people with changeable data from a MySQL database.

I won’t try to go over every inch of the code as I did with the previous simple graph example (I’ll skip things like the HTML header) and will focus on the style sheet (CSS) portion and the JavaScript.
The complete code for this will also be available as an appendix and in the downloads section at d3noob.org.

On to the code…
<style>
.node rect {
  cursor: move;
  fill-opacity: .9;
  shape-rendering: crispEdges;
}
.node text {
  pointer-events: none;
  text-shadow: 0 1px 0 #fff;
}
.link {
  fill: none;
  stroke: #000;
  stroke-opacity: .2;
}
.link:hover {
  stroke-opacity: .5;
}
</style>

<body>
<p id="chart">
<script type="text/javascript" src="d3/d3.v3.js"></script>
<script src="js/sankey.js"></script>
<script>

var units = "Widgets";

var margin = {top: 10, right: 10, bottom: 10, left: 10},
    width = 700 - margin.left – margin.right,
    height = 300 - margin.top – margin.bottom;

var formatNumber = d3.format(",.0f"),    // zero decimal places
    format = function(d) { return formatNumber(d) + " " + units; },
    color = d3.scale.category20();

// append the svg canvas to the page
var svg = d3.select("#chart").append("svg")
    .attr("width", width + margin.left + margin.right)
    .attr("height", height + margin.top + margin.bottom)
  .append("g")
    .attr("transform", 
          "translate(" + margin.left + "," + margin.top + ")");

// Set the sankey diagram properties
var sankey = d3.sankey()
    .nodeWidth(36)
    .nodePadding(40)
    .size([width, height]);

var path = sankey.link();

// load the data
d3.json("data/sankey-formatted.json", function(error, graph) {

  sankey
      .nodes(graph.nodes)
      .links(graph.links)
      .layout(32);

// add in the links
  var link = svg.append("g").selectAll(".link")
      .data(graph.links)
    .enter().append("path")
      .attr("class", "link")
      .attr("d", path)
      .style("stroke-width", function(d) { return Math.max(1, d.dy); })
      .sort(function(a, b) { return b.dy - a.dy; });

// add the link titles
  link.append("title")
        .text(function(d) {
            return d.source.name + "" + 
                d.target.name + "\n" + format(d.value); });

// add in the nodes
  var node = svg.append("g").selectAll(".node")
      .data(graph.nodes)
    .enter().append("g")
      .attr("class", "node")
      .attr("transform", function(d) { 
          return "translate(" + d.x + "," + d.y + ")"; })
    .call(d3.behavior.drag()
      .origin(function(d) { return d; })
      .on("dragstart", function() { 
          this.parentNode.appendChild(this); })
      .on("drag", dragmove));

// add the rectangles for the nodes
  node.append("rect")
      .attr("height", function(d) { return d.dy; })
      .attr("width", sankey.nodeWidth())
      .style("fill", function(d) { 
          return d.color = color(d.name.replace(/ .*/, "")); })
      .style("stroke", function(d) { 
          return d3.rgb(d.color).darker(2); })
    .append("title")
      .text(function(d) { 
          return d.name + "\n" + format(d.value); });

// add in the title for the nodes
  node.append("text")
      .attr("x", -6)
      .attr("y", function(d) { return d.dy / 2; })
      .attr("dy", ".35em")
      .attr("text-anchor", "end")
      .attr("transform", null)
      .text(function(d) { return d.name; })
    .filter(function(d) { return d.x < width / 2; })
      .attr("x", 6 + sankey.nodeWidth())
      .attr("text-anchor", "start");

// the function for moving the nodes
  function dragmove(d) {
    d3.select(this).attr("transform", 
        "translate(" + (
            d.x = Math.max(0, Math.min(width - d.dx, d3.event.x))
        )
        + "," + (
            d.y = Math.max(0, Math.min(height - d.dy, d3.event.y))
        ) + ")");
    sankey.relayout();
    link.attr("d", path);
  }
});
So, going straight to the style sheet bounded by the <style> tags;
.node rect {
  cursor: move;
  fill-opacity: .9;
  shape-rendering: crispEdges;
}

.node text {
  pointer-events: none;
  text-shadow: 0 1px 0 #fff;
}

.link {
  fill: none;
  stroke: #000;
  stroke-opacity: .2;
}

.link:hover {
  stroke-opacity: .5;
}
The CSS in this example is mainly concerned with formatting of the mouse cursor as it moves around the diagram.

The first part…
.node rect {
  cursor: move;
  fill-opacity: .9;
  shape-rendering: crispEdges;
}
… provides the properties for the node rectangles. It changes the icon for the cursor when it moves over the rectangle to one that looks like it will move the rectangle (there is a range of different icons that can be defined here http://www.echoecho.com/csscursors.htm), sets the fill colour to mostly opaque and keeps the edges sharp.

The next block…
.node text {
  pointer-events: none;
  text-shadow: 0 1px 0 #fff;
}
… sets the properties for the text at each node. The mouse is told to essentially ignore the text in favour of anything that’s under it (in the case of moving or highlighting something else) and a slight shadow is applied for readability).

The following block…
.link {
  fill: none;
  stroke: #000;
  stroke-opacity: .2;
}
… makes sure that the link has no fill (it actually appears to be a bendy rectangle with very thick edges that make the element appear to be a solid block), colours the edges black (#000) and gives makes the edges almost transparent.

The last block….
.link:hover {
  stroke-opacity: .5;
}
… simply changes the opacity of the link when the mouse goes over it so that it’s more visible. If so desired, we could change the colour of the highlighted link by adding in a line to this block changing the colour like this stroke: red;.

Just before we get into the JavaScript, we do something a little different for d3.js. We tells it to use a plug-in with the following line;

<script src="js/sankey.js"></script>

The concept of a plug-in is that it is a separate piece of code that will allow additional functionality to a core block (which in this case is d3.js). There are a range of plug-ins available and we will need to source the sankey.js file from the repository and place that somewhere where our HTML code can access it. In this case I have put it in the js directory that resides in the root directory of the web page. 


The start of our JavaScript begins by defining a range of variables that we’ll be using. 

Our units are set as ‘Widgets’ (var units = "Widgets";), which is just a convenient generic (nonsense) term to provide the impression that the flow of items in this case is widgets being passed from one person to another.
We then set our canvas size and margins…
var margin = {top: 10, right: 10, bottom: 10, left: 10},
    width = 700 - margin.left – margin.right,
    height = 300 - margin.top – margin.bottom;
… before setting some formatting.
var formatNumber = d3.format(",.0f"),    // decimal places
    format = function(d) { return formatNumber(d) + " " + units; },
    color = d3.scale.category20();
The formatNumber function acts on a number to set it to zero decimal places in this case. In the original Mike Bostock example it was to three places, but for ‘widgets’ I’m presuming we don’t divide :-).

format is a function that returns a given number formatted with formatNumber as well as a space and our units of choice (‘Widgets’). This is used to display the values for the links and nodes later in the script.
The color = d3.scale.category20(); line is really interesting and provides access to a colour scale that is pre-defined for your convenience!. Later in the code we will see it in action.

Our next block sites our canvas onto our page in relation to the size and margins we have already defined;
var svg = d3.select("#chart").append("svg")
    .attr("width", width + margin.left + margin.right)
    .attr("height", height + margin.top + margin.bottom)
  .append("g")
    .attr("transform", 
          "translate(" + margin.left + "," + margin.top + ")");
Then we set the variables for our Sankey diagram;
var sankey = d3.sankey()
    .nodeWidth(36)
    .nodePadding(40)
    .size([width, height]);
Without trying to state the obvious, this sets the width of the nodes (.nodeWidth(36)), the padding between the nodes (.nodePadding(40)) and the size of the diagram(.size([width, height]);).

The following line defines the path variable as a pointer to the sankey function that make the links between the nodes to their clever thing of bending into the right places.;
var path = sankey.link();
I make the presumption that this is a defined function within sankey.js. Then we load the data for our sankey diagram with the following line;
d3.json("data/sankey-formatted.json", function(error, graph) {
As we have seen in previous usage of the d3.jsond3.csv and d3.tsv functions this is a wrapper that acts on all the code within it bringing the data in the form of graph to the remaining code.

I think it’s a good time to take a slightly closer look at the data that we’ll be using;
{
"nodes":[
{"node":0,"name":"node0"},
{"node":1,"name":"node1"},
{"node":2,"name":"node2"},
{"node":3,"name":"node3"},
{"node":4,"name":"node4"}
],
"links":[
{"source":0,"target":2,"value":2},
{"source":1,"target":2,"value":2},
{"source":1,"target":3,"value":2},
{"source":0,"target":4,"value":2},
{"source":2,"target":3,"value":2},
{"source":2,"target":4,"value":2},
{"source":3,"target":4,"value":4}
]}
I want to look at the data now, because it highlights how it is accessed throughout this portion of the code. It is split into two different blocks, ‘nodes’ and ‘links’. The subset of variables available under ‘nodes’ is ‘node’ and ‘name’. Likewise under ‘links’ we have ‘source’, ‘target’ and ‘value’. This means that when we want to act on a subset of our data we define which piece by defining the hierarchy that leads to it. For instance, if we want to define an action onto all the links, we would use graph.links (they’re kind of chained together).

Let me take this opportunity to apologise to all those programmers who actually know exactly what is going on here. It’s a mystery to me, but this is how I like to tell myself it works to help me get by :-)

So, now that we have our data loaded, we can assign the data to the sankey function so that it knows how to deal with it behind the scenes;
  sankey
      .nodes(graph.nodes)
      .links(graph.links)
      .layout(32);
In keeping with our previous description of what’s going on with the data, we have told the sankey function that the nodes it will be dealing with are in graph.nodes of our data structure.

I’m not sure what the .layout(32); portion of the code does, but I’d be interested know from any more knowledgeable readers. I’ve tried changing the values to no apparent affect and googling has drawn a blank. Internally to the sankey.js file it seems to indicate ‘iterations’ while it establishes computeNodeLinks, computeNodeValues, computeNodeBreadths, computeNodeDepths(iterations) and computeLinkDepths.

Then we add our links to the diagram with the following block of code;
  var link = svg.append("g").selectAll(".link")
      .data(graph.links)
    .enter().append("path")
      .attr("class", "link")
      .attr("d", path)
      .style("stroke-width", function(d) { return Math.max(1, d.dy); })
      .sort(function(a, b) { return b.dy - a.dy; });
This is an analogue of the block of code we examined way back in the section that we covered in explaining the code of our first simple graph.

We append svg elements for our links based on the data in graph.links, then add in the paths (using the appropriate CSS). We set the stroke width to the width of the value associated with each link or ‘1’. Whichever is the larger (by virtue of the Math.max function). As an interesting sideline, if we force this value to ‘10’ thusly…
      .style("stroke-width", 10)
… the graph looks quite interesting.

I have to admit that I don’t know what the sort line (.sort(function(a, b) { return b.dy - a.dy; });) is supposed to achieve. Again, I’d be interested know from any more knowledgeable readers. I’ve tried changing the values to no apparent affect.

The next block adds the titles to the links;
  link.append("title")
        .text(function(d) {
                return d.source.name + "" + 
                        d.target.name + "\n" + format(d.value); });
This code appends a text element to each link when moused over that contains the source and target name (with a neat little arrow in between and the value (which when applied with the format function adds the units.

The next block appends the node objects (but not the rectangles or text) and contains the instructions to allow them to be arranged with the mouse.
  var node = svg.append("g").selectAll(".node")
      .data(graph.nodes)
    .enter().append("g")
      .attr("class", "node")
      .attr("transform", function(d) { 
          return "translate(" + d.x + "," + d.y + ")"; })
    .call(d3.behavior.drag()
      .origin(function(d) { return d; })
      .on("dragstart", function() { 
          this.parentNode.appendChild(this); })
      .on("drag", dragmove));
While it starts off in familiar territory with appending the node objects using the graph.nodes data and putting them in the appropriate place with the transform attribute, I can only assume that there is some trickery going on behind the scenes to make sure the mouse can do what it needs to do with the d3.behaviour,drag function. There is some excellent documentation on the wiki (https://github.com/mbostock/d3/wiki/Drag-behavior), but I can only presume that it knows what it’s doing :-). The dragmove function is laid out at the end of the code, and we will explain how that operates later.

I really enjoyed the next block;
  node.append("rect")
      .attr("height", function(d) { return d.dy; })
      .attr("width", sankey.nodeWidth())
      .style("fill", function(d) { 
          return d.color = color(d.name.replace(/ .*/, "")); })
      .style("stroke", function(d) { 
          return d3.rgb(d.color).darker(2); })
    .append("title")
      .text(function(d) { 
          return d.name + "\n" + format(d.value); });
It starts off with a fairly standard appending of a rectangle with a height generated by its value  { return d.dy; } and a width dictated by the sankey.js file to fit the canvas (.attr(“width”, sankey.nodeWidth())`).
Then it gets interesting.

The colours are assigned in accordance with our earlier colour declaration and the individual colours are added to the nodes by finding the first part of the name for each node and assigning it a colour from the palate (the script looks for the first space in the name using a regular expression). For instance: ‘Widget X’, ‘Widget Y’ and ‘Widget’ will all be coloured the same even if the ‘Widget X’ and ‘Widget Y’ are inputs on the left and ‘Widget’ is a node in the middle.

The stroke around the outside of the rectangle is then done the the same shade, but darker. Then we return to the basics where we add the title of the node in a tool tip type effect along with the value for the node.

Then we add the titles for the nodes;
   node.append("text")
      .attr("x", -6)
      .attr("y", function(d) { return d.dy / 2; })
      .attr("dy", ".35em")
      .attr("text-anchor", "end")
      .attr("transform", null)
      .text(function(d) { return d.name; })
    .filter(function(d) { return d.x < width / 2; })
      .attr("x", 6 + sankey.nodeWidth())
      .attr("text-anchor", "start");
Again, this looks pretty familiar. We position the text titles carefully to the left of the nodes carefully. All except for those affected by the filter function (return d.x < width / 2;). Where if the position of the node on the x axis is less than half the width, the title is placed on the right of the node and anchored at the start of the text. Very neat.

The last block is also pretty neat, and contains a little surprise for those who are so inclined.
  function dragmove(d) {
    d3.select(this).attr("transform", 
       "translate(" + d.x + "," + (
                d.y = Math.max(0, Math.min(height - d.dy, d3.event.y))
            ) + ")");
    sankey.relayout();
    link.attr("d", path);
This declares the function that controls the movement of the nodes with the mouse. It selects the item that it’s operating over (d3.select(this)) and then allows translation in the y axis while maintaining the link connection (sankey.relayout(); link.attr("d", path);).

But that’s not the cool part. A quick look at the code should reveal that if you can move a node in the y axis, there should be no reason why you can’t move it in the x axis as well!

Sure enough, if you replace the code above with this…
  function dragmove(d) {
    d3.select(this).attr("transform", 
        "translate(" + (
            d.x = Math.max(0, Math.min(width - d.dx, d3.event.x))
        )
        + "," + (
            d.y = Math.max(0, Math.min(height - d.dy, d3.event.y))
        ) + ")");
    sankey.relayout();
    link.attr("d", path);
… you can move your nodes anywhere on the canvas.

I know it doesn’t seem to add anything to the diagram (in fact, it could be argued that there is a certain aspect of detraction) however, it doesn’t mean that one day the idea doesn’t come in handy :-). You can find a live version of this on Github via bl.ocks.org.

So, that’s the description for our basic Sankey diagram. From here we will look at different ways to get data formatted for use in them.


The above description (and heaps of other stuff) is in the D3 Tips and Tricks document that can be accessed from the downloads page of d3noob.org (Hey! It's free. Why not?)