The Author Bundle

Tuesday, 28 July 2015

Clipped Paths in d3.js (AKA clipPath)

The following post is a portion of the book D3 Tips and Tricks which is free to download from Leanpub. To use this post in context, consider it with the others in the blog or just download the the book as a pdf / epub or mobi .
----------------------------------------------------------

Clipped Path (AKA clipPath)

clipPath is the path of a SVG shape that can be used in combination with another shape to remove any parts of the combined shape that doesn’t fall within the clipPath. That sounds slightly confusing, so we will break it down a bit to hopefully clarify the explanation.
Let’s imagine that we want to display the intersection of two shapes. What we will do is define our clipPath which will act as a ‘cookie cutter’ which can cut out the shape we want (we will choose an ellipse). Then we will draw our base shape (which is analogous to the dough) that we will use our cookie cutter on (our dough will be shaped as a rectangle). The intersection of the cookie cutter and the dough is our clipped path.
Our clipPath (cookie cutter) element is an ellipse;
clipPath (cookie cutter)
Our shape that we will be clipping (the dough) is a rectangle;
Rectangle element (dough)
The intersection of the two is the clipped path (shaded grey);
Combination of the ellipse and the rectangle
The graphic examples above are misleading in the sense that the two basic shapes are not actually displayed. All that results from the use of the clipPath is the region that is the intersection of the two.
The clipped path
The following is an example of the code section required to draw the clipped path in conjunction with the HTML file outlined at the start of this chapter. The clipPath element is given the ID ‘ellipse-clip’ and a specified size and location. Then when the rectangle is appended. the clipPath is specified as an attribute (via a URL) using clip-path.


// define the clipPath
holder.append("clipPath")       // define a clip path
    .attr("id", "ellipse-clip") // give the clipPath an ID
  .append("ellipse")          // shape it as an ellipse
    .attr("cx", 175)         // position the x-centre
    .attr("cy", 100)         // position the y-centre
    .attr("rx", 100)         // set the x radius
    .attr("ry", 50);         // set the y radius

// draw clipped path on the screen
holder.append("rect")        // attach a rectangle
    .attr("x", 125)        // position the left of the rectangle
    .attr("y", 75)         // position the top of the rectangle
    .attr("clip-path", "url(#ellipse-clip)") // clip the rectangle
    .style("fill", "lightgrey")   // fill the clipped path with grey
    .attr("height", 100)    // set the height
    .attr("width", 200);    // set the width

This will produce a path as follows;
The clipped path
An example of this in use can bee seen in the difference chart explanation later in the book.


The description above (and heaps of other stuff) is in the D3 Tips and Tricks book that can be downloaded for free (or donate if you really want to :-)).

Saturday, 25 July 2015

Difference Charts with d3.js. Science vs Style

The following post is a portion of the book D3 Tips and Tricks which is free to download from Leanpub. To use this post in context, consider it with the others in the blog or just download the the book as a pdf / epub or mobi .
----------------------------------------------------------

Difference Chart: Science vs Style.

Dear readers, please forgive me for including this example in D3 Tips and Tricks. While it demonstrates a really cool graphing technique, I have chosen to apply it to a topic that has a potential to raise a couple of sets of eyebrows in the form of Messrs Roger Peng and Jeff Leek. Both work at the Johns Hopkins Bloomberg School of Public Health where Roger is an Associate Professor of Biostatistics, and Jeff is an Associate Professor of Biostatistics and Oncology.
While both are doing amazing work to improve peoples health and well-being (amongst other things), both are also authors of highly successful books published by Leanpub. In particular Roger has written R Programming for Data Science and Exploratory Data Analysis with R while Jeff has penned The Elements of Data Analytic Style. As we could anticipate, there is a possibility that there is something of a competitive elementto publishing for both gentlemen as they see the the number of downloads of their books climb ever higher.
While I would hate to promote an increase to these tensions, The opportunity was too attractive given that I had access to some data on the number of downloads that each of the books had been achieving and I really wanted to write about difference charts using d3.js (and the method of sourcing the data for the book Raspberry Pi: Measure, Record, Explore).
So at the risk of providing some form of offence to these fine gentlemen or inciting an increased rivalry, I have forged ahead and hopefully the worst that will happen is that someone interested in d3.js will also find some interesting reading in R Programming for Data Science,Exploratory Data Analysis with R or The Elements of Data Analytic Style. Ultimately we should be left with a graph that will look something like this;

Science vs Style - Daily Leanpub Book Sales

Purpose

A difference chart is a variation on a bivariate area chart. This is a line chart that includes two lines that are interlinked by filling the space between the lines. A difference chart as demonstrated in the example here by Mike Bostock is able to highlight the differences between the lines by filling the area between them with different colours depending on which line is the greater value.
As Mike points out in his example, this technique harks back at least as far as William Playfair when he was describing the time series of exports and imports of Denmark and Norway in 1786.


William Playfair’s Time Series of Exports and Imports of Denmark and Norway

All that remains is for us to work out how d3.js can help us out by doing the job programmatically. The example that I use here is based on that of Mike Bostock’s, with the addition of a few niceties in the form of a legend, a title, and some minor changes.
We will start with a simple example of the code and we will add blocks to finally arrive at the example with Legends and title.

The Code

The following is the code for the simple difference chart. A live version is available online at bl.ocks.org or GitHub. It is also available as the files ‘diff-basic.html’ and ‘downloads.csv’ as a download with the book D3 Tips and Tricks (in a zip file) when you download the book from Leanpub.


<!DOCTYPE html>
<meta charset="utf-8">
<style>

body { font: 10px sans-serif;}

text.shadow {
  stroke: white;
  stroke-width: 2px;
  opacity: 0.9;
}

.axis path,
.axis line {
  fill: none;
  stroke: #000;
  shape-rendering: crispEdges;
}

.x.axis path { display: none; }

.area.above { fill: rgb(252,141,89); }
.area.below { fill: rgb(145,207,96); }

.line {
  fill: none;
  stroke: #000;
  stroke-width: 1.5px;
}

</style>
<body>
<script src="http://d3js.org/d3.v3.min.js"></script>
<script>

var title = "Science vs Style - Daily Leanpub Book Sales";

var margin = {top: 20, right: 20, bottom: 50, left: 50},
    width = 960 - margin.left - margin.right,
    height = 500 - margin.top - margin.bottom;

var parsedtg = d3.time.format("%Y-%m-%d").parse;

var x = d3.time.scale().range([0, width]);
var y = d3.scale.linear().range([height, 0]);

var xAxis = d3.svg.axis().scale(x).orient("bottom");
var yAxis = d3.svg.axis().scale(y).orient("left");

var lineScience = d3.svg.area()
    .interpolate("basis")
    .x(function(d) { return x(d.dtg); })
    .y(function(d) { return y(d["Science"]); });

var lineStyle = d3.svg.area()
    .interpolate("basis")
    .x(function(d) { return x(d.dtg); })
    .y(function(d) { return y(d["Style"]); });

var area = d3.svg.area()
    .interpolate("basis")
    .x(function(d) { return x(d.dtg); })
    .y1(function(d) { return y(d["Science"]); });

var svg = d3.select("body").append("svg")
    .attr("width", width + margin.left + margin.right)
    .attr("height", height + margin.top + margin.bottom)
  .append("g")
    .attr("transform",
        "translate(" + margin.left + "," + margin.top + ")");

d3.csv("downloads.csv", function(error, dataNest) {

  dataNest.forEach(function(d) {
      d.dtg = parsedtg(d.date_entered);
      d.downloaded = +d.downloaded;
  });

  var data = d3.nest()
      .key(function(d) {return d.dtg;})
      .entries(dataNest);

  data.forEach(function(d) {
      d.dtg = d.values[0]['dtg'];
      d["Science"] = d.values[0]['downloaded'];
      d["Style"] = d.values[1]['downloaded'];
  });

  for(i=data.length-1;i>0;i--) {
          data[i].Science   = data[i].Science  -data[(i-1)].Science ;
          data[i].Style     = data[i].Style    -data[(i-1)].Style ;
   }

  data.shift(); // Removes the first element in the array

    x.domain(d3.extent(data, function(d) { return d.dtg; }));
    y.domain([
//      d3.min(data, function(d) {
//          return Math.min(d["Science"], d["Style"]); }),
//      d3.max(data, function(d) {
//          return Math.max(d["Science"], d["Style"]); })
      0,1400
    ]);

    svg.datum(data);

    svg.append("clipPath")
        .attr("id", "clip-above")
      .append("path")
        .attr("d", area.y0(0));

    svg.append("clipPath")
        .attr("id", "clip-below")
      .append("path")
        .attr("d", area.y0(height));

    svg.append("path")
        .attr("class", "area above")
        .attr("clip-path", "url(#clip-above)")
        .attr("d", area.y0(function(d) { return y(d["Style"]); }));

    svg.append("path")
        .attr("class", "area below")
        .attr("clip-path", "url(#clip-below)")
        .attr("d", area.y0(function(d) { return y(d["Style"]); }));

    svg.append("path")
        .attr("class", "line")
        .style("stroke", "darkgreen")
        .attr("d", lineScience);

    svg.append("path")
        .attr("class", "line")
        .style("stroke", "red")
        .attr("d", lineStyle);

    svg.append("g")
        .attr("class", "x axis")
        .attr("transform", "translate(0," + height + ")")
        .call(xAxis);

    svg.append("g")
        .attr("class", "y axis")
        .call(yAxis);

});

</script>
</body>

A sample of the associated csv file (downloads.csv) is formatted as follows;


date_entered,downloaded,book_name
2015-04-19,5481,R Programming for Data Science
2015-04-19,23751,The Elements of Data Analytic Style
2015-04-20,5691,R Programming for Data Science
2015-04-20,23782,The Elements of Data Analytic Style
2015-04-21,6379,R Programming for Data Science
2015-04-21,23820,The Elements of Data Analytic Style
2015-04-22,7281,R Programming for Data Science
2015-04-22,23857,The Elements of Data Analytic Style
2015-04-23,7554,R Programming for Data Science
2015-04-23,23881,The Elements of Data Analytic Style
2015-04-24,9331,R Programming for Data Science
2015-04-24,23932,The Elements of Data Analytic Style

Description

The graph has some portions that are common to the simple line graph example.
We start the HTML file, load some styling for the upcoming elements, set up the margins, time formatting scales, ranges and axes.
Because the graph is composed of two lines we need to declare two separate line functions;


var lineScience = d3.svg.area()
    .interpolate("basis")
    .x(function(d) { return x(d.dtg); })
    .y(function(d) { return y(d["Science"]); });

var lineStyle = d3.svg.area()
    .interpolate("basis")
    .x(function(d) { return x(d.dtg); })
    .y(function(d) { return y(d["Style"]); });

To fill an area we declare an area function using one of the lines as the baseline (y1) and when it comes time to fill the area later in the script we declare y0 separately to define the area to be filled as an intersection of two paths.


var area = d3.svg.area()
    .interpolate("basis")
    .x(function(d) { return x(d.dtg); })
    .y1(function(d) { return y(d["Science"]); });

In this instance we are using the green ‘Science’ line as the y1 line.
The svg area is then set up using the height, width and margin values and we load our csv files with our number of downloads for each book. We then carry out a standard forEach to ensure that the time and numerical values are formatted correctly.
Nesting the data
The data that we are starting with is formatted in a way that we could reasonably expect data to be available in this instance where a value is saved for distinct elements on an element by element basis. This style of recording data makes it easy to add new elements into the data stream or a database rather than relying on having them as discrete columns.


date_entered,downloaded,book_name
2015-04-19,5481,R Programming for Data Science
2015-04-19,23751,The Elements of Data Analytic Style
2015-04-20,5691,R Programming for Data Science
2015-04-20,23782,The Elements of Data Analytic Style
2015-04-21,6379,R Programming for Data Science
2015-04-21,23820,The Elements of Data Analytic Style

In this case, we will need to ‘pivot’ the data to produce a multi-column representation where we have a single row for each date, and the number of downloads for each book as seperate columns as follows;


date_entered,R Programming for Data Science,The Elements of Data Analytic Sty\
le
2015-04-19,5481,23751
2015-04-20,5691,23782
2015-04-21,6379,23820

This can be achieved using the d3 nest function.


  var data = d3.nest()
      .key(function(d) {return d.dtg;})
      .entries(dataNest);

We declare our new array’s name as data and we initiate the nest function;


 var data = d3.nest()

We assign the key for our new array as dtg. A ‘key’ is like a way of saying “This is the thing we will be grouping on”. In other words our resultant array will have a single entry for each unique date (dtg) which will have the values of the number of downloaded books associated with it.


  .key(function(d) {return d.dtg;})

Then we tell the nest function which data array we will be using for our source of data.


  }).entries(dataNest);

Wrangle the data
Once we have our pivoted data we can format it in a way that will suit the code for the visualisation. This involves storing the values for the ‘Science’ and ‘Style’ variables as part of a named index.


  data.forEach(function(d) {
      d.dtg = d.values[0]['dtg'];
      d["Science"] = d.values[0]['downloaded'];
      d["Style"] = d.values[1]['downloaded'];
  });

We then loop through the ‘Science’ and ‘Style’ array to convert the incrementing value of the total number of downloads into a value of the number that have been downloaded each day;


  for(i=data.length-1;i>0;i--) {
          data[i].Science   = data[i].Science  -data[(i-1)].Science ;
          data[i].Style     = data[i].Style    -data[(i-1)].Style ;
   }

Finally because we are adjusting from total downloaded to daily values we are left with an orphan value that we need to remove from the front of the array;


  data.shift();

Cheating with the domain
The observant d3.js reader will have noticed that the setting of the y domain has a large section commented out;


    x.domain(d3.extent(data, function(d) { return d.dtg; }));
    y.domain([
//      d3.min(data, function(d) {
//          return Math.min(d["Science"], d["Style"]); }),
//      d3.max(data, function(d) {
//          return Math.max(d["Science"], d["Style"]); })
      0,1400
    ]);

That’s because I want to be able to provide an ideal way for the graph to represent the data in an appropriate range, but because we are using the basis smoothing modifier, and the data is ‘peaky’, there is a tendency for the y scale to be fairy broad and the resultant graph looks a little lost;


Using automatic range

Alternatively, we could remove the smoothing and let the true data be shown;


Using automatic range and removing the basis smoothing

It should be argued that this is a truer representation of the data, but in this case I feel comfortable sacrificing accuracy for aesthetics (what have I become?).
Therefore, the domain for the y axis is set manually to between 0 and 1400, but feel free to remove that at the point when you introduce your own data :-).
data vs datum
One small line gets its own section. That line is;


    svg.datum(data);

A casual d3.js user could be forgiven for thinking that this doesn’t seem too fearsome a line, but it has hidden depths.
As Mike Bostock explains here, if we want to bind data to elements as a group we would be *.data, but if we want to bind that data to individual elements, we should use *.datum.
It’s a function of how the data is stored. If there is an expectation that the data will be dynamic then data is the way to go since it has the feature of preparing enter and exit selections. If the data is static (it won’t be changing) then datum is the way to go.
In our case we are assigning data to individual elements and as a result we will be using datum.
Setting up the clipPaths
The clipPath operator is used to define an area that is used to create a shape by intersecting one area with another.
In our case we are going to set up two clip paths. One is the area above the green ‘Science’ line (which we defined earlier as being the y1 component of an area selection);


The ‘clip-above’ clip path

This is declared via this portion of the code;


    svg.append("clipPath")
        .attr("id", "clip-above")
      .append("path")
        .attr("d", area.y0(0));

Then we set up the clip path that will exist for the area below the green ‘Science’ line ;


The ‘clip-below’ clip path

This is declared via this portion of the code;


    svg.append("clipPath")
        .attr("id", "clip-below")
      .append("path")
        .attr("d", area.y0(height));

Each of these paths has an ‘id’ which can be subsequently used by the following code.
Clipping and adding the areas
Now we come to clipping our shape and filling it with the appropriate colour.
We do this by having a shape that represents the area between the two lines and applying our clip path for the values above and below our reference line (the green ‘Science’ line). Where the two intersect, we fill it with the appropriate colour. The code to fill the area above the reference line is as follows;


    svg.append("path")
        .attr("class", "area above")
        .attr("clip-path", "url(#clip-above)")
        .attr("d", area.y0(function(d) { return y(d["Style"]); }));

Here we have two lines that are defining the shape between the two science and style lines;


    svg.append("path")
        ....
        ....
        .attr("d", area.y0(function(d) { return y(d["Style"]); }));

If we were to look at the shape that this produces it would look as follows (greyed out for highlighting);


The shape between the science and style lines

We apply a class to the shape so that is filled with the colour that we want;


        .attr("class", "area above")

.. and apply the clip path so that only the areas that intersect the two shapes are filled with the appropriate colour;


        .attr("clip-path", "url(#clip-above)")

Here the intersection of those two shapes is shown as pink;


The intersection of the shapes

Then we do the same for the area below;


    svg.append("path")
        .attr("class", "area below")
        .attr("clip-path", "url(#clip-below)")
        .attr("d", area.y0(function(d) { return y(d["Style"]); }));

With the corresponding areas showing the intersection of the two shapes coloured differently;


The intersection of the shapes

Draw the lines and the axes
The final part of our basic difference chart is to draw in the lines over the top so that they are highlighted and to add in the axes;


    svg.append("path")
        .attr("class", "line")
        .style("stroke", "darkgreen")
        .attr("d", lineScience);

    svg.append("path")
        .attr("class", "line")
        .style("stroke", "red")
        .attr("d", lineStyle);

    svg.append("g")
        .attr("class", "x axis")
        .attr("transform", "translate(0," + height + ")")
        .call(xAxis);

    svg.append("g")
        .attr("class", "y axis")
        .call(yAxis);

Et viola! we have our difference chart!


The basic difference chart

As mentioned earlier, the code for the simple difference chart is available online at bl.ocks.org or GitHub. It is also available as the files ‘diff-basic.html’ and ‘downloads.csv’ as a download with the book D3 Tips and Tricks (in a zip file) when you download the book from Leanpub.

Adding a bit more to our difference chart.

The chart itself is a thing of beauty, but given the subject matter (it’s describing two books after all) we should include a bit more information on what it is we’re looking at and provide some links so that a fascinated viewer of the graphs can read the books!
Add a Y axis label
Because it’s not immediately obvious what we’re looking at on the Y axis we should add in a nice subtle label on the Y axis;


    svg.append("g")
        .attr("class", "y axis")
        .call(yAxis)
      .append("text")
        .attr("transform", "rotate(-90)")
        .attr("y", 6)
        .attr("dy", ".71em")
        .style("text-anchor", "end")
        .text("Daily Downloads from Leanpub");

Add a title
Every graph should have a title. The following code adds this to the top(ish) centre of the chart and provides a white drop-shadow for readability;


    // ******* Title Block ********
    svg.append("text") // Title shadow
      .attr("x", (width / 2))
      .attr("y", 50 )
      .attr("text-anchor", "middle")
      .style("font-size", "30px")
      .attr("class", "shadow")
      .text(title);

    svg.append("text") // Title
      .attr("x", (width / 2))
      .attr("y", 50 )
      .attr("text-anchor", "middle")
      .style("font-size", "30px")
      .style("stroke", "none")
      .text(title);

Adding the legend
A respectable legend in this case should provide visual context of what it is describing in relation to the graph (by way of colour) and should actually name the book. We can also go a little bit further and provide a link to the books in the legend so that potential readers can access them easily.
Firstly the rectangles filled with the right colour, sized appropriately and arranged just right;


    var block = 300;   // rectangle width and position

    svg.append("rect") // Style Legend Rectangle
      .attr("x", ((width / 2)/2)-(block/2))
      .attr("y", height+(margin.bottom/2) )
      .attr("width", block)
      .attr("height", "25")
      .attr("class", "area above");

    svg.append("rect") // Science Legend Rectangle
      .attr("x", ((width / 2)/2)+(width / 2)-(block/2))
      .attr("y", height+(margin.bottom/2) )
      .attr("width", block)
      .attr("height", "25")
      .attr("class", "area below");

Then we add the text (with a drop-shadow) and a link;


    svg.append("text") // Style Legend Text shadow
      .attr("x", ((width / 2)/2))
      .attr("y", height+(margin.bottom/2) + 5)
      .attr("dy", ".71em")
      .attr("text-anchor", "middle")
      .style("font-size", "18px")
      .attr("class", "shadow")
      .text("The Elements of Data Analytic Style");

    svg.append("text") // Science Legend Text shadow
      .attr("x", ((width / 2)/2)+(width / 2))
      .attr("y", height+(margin.bottom/2) + 5)
      .attr("dy", ".71em")
      .attr("text-anchor", "middle")
      .style("font-size", "18px")
      .attr("class", "shadow")
      .text("R Programming for Data Science");

    svg.append("a")
      .attr("xlink:href", "https://leanpub.com/datastyle")
      .append("text") // Style Legend Text
      .attr("x", ((width / 2)/2))
      .attr("y", height+(margin.bottom/2) + 5)
      .attr("dy", ".71em")
      .attr("text-anchor", "middle")
      .style("font-size", "18px")
      .style("stroke", "none")
      .text("The Elements of Data Analytic Style");

    svg.append("a")
      .attr("xlink:href", "https://leanpub.com/rprogramming")
      .append("text") // Science Legend Text
      .attr("x", ((width / 2)/2)+(width / 2))
      .attr("y", height+(margin.bottom/2) + 5)
      .attr("dy", ".71em")
      .attr("text-anchor", "middle")
      .style("font-size", "18px")
      .style("stroke", "none")
      .text("R Programming for Data Science");

I’ll be the first to admit that this could be done more efficiently with some styling via css, but then it would leave nothing for the reader to try :-).
As a last touch we can include the links to the respective books in the shading for the graph itself;


    svg.append("a")
      .attr("xlink:href", "https://leanpub.com/datastyle")
      .append("path")
        .attr("class", "area above")
        .attr("clip-path", "url(#clip-above)")
        .attr("d", area.y0(function(d) { return y(d["Style"]); }));

    svg.append("a")
      .attr("xlink:href", "https://leanpub.com/rprogramming")
      .append("path")
        .attr("class", "area below")
        .attr("clip-path", "url(#clip-below)")
        .attr("d", area.y0(function(d) { return y(d["Style"]); }));

Perhaps not strictly required, but a nice touch none the less.
The final result
And here it is;


The full difference chart

The code for the full difference chart is available online at bl.ocks.org or GitHub. It is also available as the files ‘diff-full.html’ and ‘downloads.csv’ as a download with the book D3 Tips and Tricks (in a zip file) when you download the book from Leanpub.
The description above (and heaps of other stuff) is in the D3 Tips and Tricks book that can be downloaded for free (or donate if you really want to :-)).