D3 Tips and Tricks v4

Thursday, 20 February 2014

Grouping and summing data using d3.nest

The following post is a portion of the D3 Tips and Tricks book, which is free to download. To use this post in context, consider it with the others in the blog or just download the book as a pdf, epub or mobi.
----------------------------------------------------------

Grouping and summing data (d3.nest).

Often we will wish to group elements in an array into a hierarchical structure similar to the GROUP BY operator in SQL (but with the scope for multiple levels). This can be achieved using the d3.nest operator. Additionally we will sometimes wish to collapse the elements that we are grouping in a specific way (for instance to sum values). This can be achieved using the rollup function.
Our example uses the following csv file, which consists of a column of dates and a column of corresponding values;
date,value
2011-03-23,3
2011-03-23,2
2011-03-24,3
2011-03-24,3
2011-03-24,6
2011-03-24,2
2011-03-24,7
2011-03-25,4
2011-03-25,5
2011-03-25,1
2011-03-25,4
We will nest the data according to the date and sum the data for each date so that our data is in the equivalent form of;
key,values
2011-03-23,5
2011-03-24,21
2011-03-25,14
We will do this with the following script;
d3.csv("source-data.csv", function(error, csv_data) {
 var data = d3.nest()
  .key(function(d) { return d.date;})
  .rollup(function(d) { 
   return d3.sum(d, function(g) {return g.value; });
  }).entries(csv_data);
...
});
We are assuming the data is in the form of our initial csv file and is named source-data.csv.
The first thing we do is load that file and assign the loaded array the variable name csv_data.
d3.csv("source-data.csv", function(error, csv_data) {
Then we declare that our new array will be named data and we start the nest function;
 var data = d3.nest()
We assign the key for our new array as date. A ‘key’ is a way of saying “This is the thing we will be grouping on”. In other words, our resulting array will have a single entry for each unique date value.
  .key(function(d) { return d.date;})
Then we include the rollup function, which takes all the individual value entries that share each unique date and sums them;
  .rollup(function(d) { 
   return d3.sum(d, function(g) {return g.value; });
Lastly we tell the entire nest function which data array we will be using for our source of data.
  }).entries(csv_data);
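To make the result of these steps concrete, here is a minimal sketch in plain JavaScript (no d3 required) that produces the same shape of output for our sample rows. The helper name nestAndSum is made up for this illustration, and it is only a stand-in for what the nest and rollup steps compute, not how d3 implements them. The values are written as strings because that is how d3.csv delivers them when no row conversion is supplied; d3.sum converts them to numbers, and the sketch does the same with Number().

```javascript
// Rows as d3.csv would deliver them: every field is a string.
var csv_data = [
  { date: "2011-03-23", value: "3" },
  { date: "2011-03-23", value: "2" },
  { date: "2011-03-24", value: "3" },
  { date: "2011-03-24", value: "3" },
  { date: "2011-03-24", value: "6" },
  { date: "2011-03-24", value: "2" },
  { date: "2011-03-24", value: "7" },
  { date: "2011-03-25", value: "4" },
  { date: "2011-03-25", value: "5" },
  { date: "2011-03-25", value: "1" },
  { date: "2011-03-25", value: "4" }
];

// Hypothetical helper (not part of d3): group rows by `date` and sum
// `value` per group, returning entries shaped like { key, values }.
function nestAndSum(rows) {
  var totals = {}; // date -> running total
  var order = [];  // dates in order of first appearance
  rows.forEach(function(d) {
    if (!Object.prototype.hasOwnProperty.call(totals, d.date)) {
      totals[d.date] = 0;
      order.push(d.date);
    }
    totals[d.date] += Number(d.value); // convert the string before adding
  });
  return order.map(function(date) {
    return { key: date, values: totals[date] };
  });
}

var data = nestAndSum(csv_data);
console.log(JSON.stringify(data));
// → [{"key":"2011-03-23","values":5},{"key":"2011-03-24","values":21},{"key":"2011-03-25","values":14}]
```

Each entry carries the grouping value under key and the summed total under values, which matches the table of totals we were aiming for.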
You should note that the fields in our data will have changed name from date and value to key and values. This is a consequence of the nest and rollup process. But never fear, it’s a simple task to rename them if necessary using the following function (which could include a call to parse the date, but I have omitted it for clarity);
data.forEach(function(d) {
 d.date = d.key;
 d.value = d.values;
});
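If we did want to parse the date while renaming, one possible way is sketched below in plain JavaScript. The helper parseDate is a hypothetical name used only for this illustration, and it assumes the keys are always in YYYY-MM-DD form; in a real d3 script one of d3's own time-format parsers would be the more usual choice. The data array here is a hand-written copy of the nested result so that the snippet runs on its own.

```javascript
// Hand-written copy of the nested result from the example above.
var data = [
  { key: "2011-03-23", values: 5 },
  { key: "2011-03-24", values: 21 },
  { key: "2011-03-25", values: 14 }
];

// Hypothetical helper: turn "YYYY-MM-DD" into a Date at local midnight.
function parseDate(s) {
  var parts = s.split("-");
  return new Date(+parts[0], +parts[1] - 1, +parts[2]); // months are zero-based
}

data.forEach(function(d) {
  d.date = parseDate(d.key); // a Date object instead of a string
  d.value = d.values;        // the summed total for that date
});

var first = data[0];
console.log(first.date.getFullYear(), first.date.getMonth() + 1, first.date.getDate(), first.value);
// → 2011 3 23 5
```

The original key and values fields are left in place, so anything that already relies on them keeps working.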

The description above (and heaps of other stuff) is in the D3 Tips and Tricks book that can be downloaded for free (or donate if you really want to :-)).

6 comments:

  1. Hi. I have the following JSON:
    {
    "ReturnCode":0,
    "ReturnMessage":"Success",
    "List":[
    {
    "Client":"Ad",
    "Department":"DP",
    "ProjectId":"12355",
    "ProjectName":"4940"
    },
    {
    "Client":"Ad",
    "Department":"SP",
    "ProjectId":"12355",
    "ProjectName":"4940"
    },
    {
    "Client":"Ad",
    "Department":"Co",
    "ProjectId":"12355",
    "ProjectName":"asdf"
    },
    {
    "Client":"Ad",
    "Department":"Co",
    "ProjectId":"212355",
    "ProjectName":"45ed"
    },
    {
    "Client":"Ad",
    "Department":"Co",
    "ProjectId":"212355",
    "ProjectName":"45ed "
    },
    {
    "Client":"we",
    "Department":" SP ",
    "ProjectId":"123455",
    "ProjectName":"asdf"
    },
    {
    "Client":"we",
    "Department":"Co",
    "ProjectId":"123455",
    "ProjectName":"asdf"
    },
    {
    "Client":"oc",
    "Department":"Co",
    "ProjectId":"24355",
    "ProjectName":"qwe"
    }]
    }
    Here I just need to count the number of projects for each client, like below, using d3.nest:
    [{Key:"Ad",value:2}, {Key:"we",value:1}, {Key:"oc",value:1}]
    Any suggestions?

    Replies
    1. The best suggestion I can make is for you to solve the problem yourself. That way you will gain a better understanding of the process and be able to repeat it in the future. Having said that, it is definitely something that you will need to concentrate on for a while to get right, so I would recommend checking out the 'Mister Nester' page (http://bl.ocks.org/shancarter/raw/4748131/) which is excellent for illustrating the differences in the techniques. Good luck

  2. Thank you for the illustrative example. In the rollup part I think you have g and d reversed. The rollup parameter should be a function over groups of the data, so that sum is called on an array g, and the individual data items are added up, d.value. So we have

    .rollup(function(g) {
    return d3.sum(g, function(d) {return d.value; });

    Programmatically they are the same, but I think it makes more logical sense this way.

    By the way, good call making the other commenter figure out his problem by himself.

    Replies
    1. Great question. This had me thinking for quite a bit. And in fact I had a really interesting answer all lined up before I really understood what your question was stating. You are right. I believe that my code could be misconstrued. Your example is better and more logical. In fact I should take one more step and change the 'd' in `function(d) {return d.value; }` to something completely different like 'v'.

  3. Hey Guys...
    Sorry for the late response. I sorted this out a long time ago with some suggestions from http://stackoverflow.com/questions/32996575/counting-distinct-values-from-json-using-d3-nest/32997817#32997817

    soln:
    d3.json("json/data.json", function(data) {
    console.log(data);

    // Group by client, then by project; each project group rolls up to its row count
    var nested_data = d3.nest()
    .key(function(d) { return d.Client; })
    .key(function(d) { return d.ProjectId; })
    .rollup(function(leaves) { return leaves.length; })
    .entries(data.List);

    // Each client's values array holds one entry per distinct ProjectId
    nested_data.forEach(function(client) { console.log(client.key + '--' + client.values.length); });
    });

  4. Oh, great, thank you for such a wonderful solution, it is very useful, thank you!
