## Sunday, 5 April 2015

### Exploring Event Data by Combination Scatter Plot and Interactive Line Graphs

#### Purpose

In the process of implementing a method of measuring and displaying the passage of a cat through a cat-door (as described in the book ‘Raspberry Pi: Measure, Record, Explore’), I built a graph that showed events indicated by date and time on separate axes. It was then that I figured that this would be useful for exploring event data, or any data that exists as a series of date/time stamps that signify a particular ‘thing’ as having occurred. In the cat-door example it was the use of the door by the cat, but the approach is applicable to a huge range of data sets.
One that I thought of straight away was the dates and times that people downloaded the book D3 Tips and Tricks. Leanpub has an API for accessing the history of book activity and I was able to download it and store it in a database for examination.
Ultimately what I developed was a scatter plot that shows the date of the events on the X axis and the time of the events on the Y axis. This was augmented by two line graphs that showed the accumulated sums of each axis on their respective sides.

*Figure: Data Event Exploration*

The full code for this example is available online at bl.ocks.org or GitHub. It is also available as the files ‘book-downloads.html’ and ‘downloads.zip’ as a download with the book D3 Tips and Tricks (in a zip file) when you download the book from Leanpub. The ‘downloads.zip’ file contains downloads.json, which is zipped up because it is otherwise a bit too large for Leanpub. For the ideal viewing experience, check it out in full screen mode.

There is also a separate blog post describing the information that I learned from looking at the data here.
To make the information slightly more accessible, when the user hovers their mouse over the scatter plot, lines are extended from the mouse position to the other two graphs to show how that position relates to them. The appropriate values of date, time and number downloaded by date and by time are presented at the same time.
This graph is a relatively complex combination of a range of different techniques presented in the book, including wrangling and nesting of data, combination of multiple graphs and the use of mouse movement to display tool-tips and additional data.

#### The Code

The code is extremely lengthy, so in lieu of placing it in the book it can be found on bl.ocks.org or GitHub. It is liberally commented to assist readers, and I will describe particular sections of the code below where that will hopefully help further.
##### Wrangling the data
The graph uses four sets of data.

1. The raw event data (an array called `events`)
2. The scatter plot data (an array called `data`)
3. The date graph data (an array called `dataDate`)
4. The time graph data (an array called `dataTime`)

The raw event data is ingested from an external JSON file using the standard `d3.json` call.
The data itself is simply a collection of dates.
```
{"dtg":"2013-01-24 09:10:59"},
{"dtg":"2013-01-24 09:17:37"},
{"dtg":"2013-01-24 09:48:48"},
{"dtg":"2013-01-24 15:01:59"},
{"dtg":"2013-01-24 18:11:44"},
{"dtg":"2013-01-24 18:47:05"},
{"dtg":"2013-01-24 18:47:23"},
{"dtg":"2013-01-24 19:55:53"},
{"dtg":"2013-01-24 22:37:39"},
{"dtg":"2013-01-25 01:22:48"},
{"dtg":"2013-01-25 06:37:38"},
{"dtg":"2013-01-25 08:28:20"},
```
Once loaded, we run a `forEach` over the `events` array to put it in a format for manipulation into the remaining three data sets.
```
// parse and format all the event data
events.forEach(function(d) {
    d.dtg = d.dtg.slice(0,-4)+'0:00'; // get the 10 minute block
    var dtgSplit = d.dtg.split(" ");  // split on the space
    d.date = dtgSplit[0];             // get the date separately
    d.time = dtgSplit[1];             // get the time separately
    d.number_downloaded = 1;          // Number of downloads
});
```
The first thing we do is to `slice` off the last four characters of the `dtg` string and replace them with `0:00`. This leaves us with a set of `dtg` values that are only represented by the 10 minute window in which they were downloaded.
We then `split` the `dtg` string on the space that separates the date and the time and we designate one half `date` and the other half `time`.
Lastly we represent the number of books downloaded for each event as 1 (this helps us sum them up later).
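To see what the `slice` and `split` steps do to a single value, here is a minimal sketch in plain JavaScript. The helper name `toTenMinuteBlock` and the `sample` variable are made up for this illustration and are not part of the original code.

```javascript
// Hypothetical helper (not in the original code): reduce a
// "YYYY-MM-DD HH:MM:SS" string to its 10 minute block and split it up.
function toTenMinuteBlock(dtg) {
    var block = dtg.slice(0, -4) + "0:00"; // drop "7:37", append "0:00"
    var parts = block.split(" ");          // split on the space
    return {
        dtg: block,            // date and time of the 10 minute block
        date: parts[0],        // the date on its own
        time: parts[1],        // the time on its own
        number_downloaded: 1   // each event counts as one download
    };
}

var sample = toTenMinuteBlock("2013-01-24 09:17:37");
console.log(sample.dtg);  // 2013-01-24 09:10:00
console.log(sample.date); // 2013-01-24
console.log(sample.time); // 09:10:00
```

Note that this relies on every `dtg` string having exactly the same fixed-width format as the sample data above.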
Using the `events` data we create the data-set for the scatter plot (`data`) by nesting the information on the 10 minute `dtg` value of date/time and by summing the number of downloads;
```
var data = d3.nest()
    .key(function(d) { return d.dtg;})
    .rollup(function(d) {
        return d3.sum(d,function(g) {return g.number_downloaded; });
    })
    .entries(events);
```
We carry out a similar process for the date…
```
var dataDate = d3.nest()
    .key(function(d) { return d.date;})
    .rollup(function(d) {
        return d3.sum(d,function(g) {return g.number_downloaded; });
    })
    .entries(events);
```
… and the time;
```
var dataTime = d3.nest()
    .key(function(d) { return d.time;})
    .sortKeys(d3.ascending)
    .rollup(function(d) {
        return d3.sum(d,function(g) {return g.number_downloaded; });
    })
    .entries(events);
```
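If the nesting is hard to picture, the following is a rough plain JavaScript equivalent of what the three nest and rollup calls compute: group the events by one field and sum `number_downloaded` within each group. The function `sumByKey` and the `sampleEvents` array are made up for this sketch, and the `{ key, values }` shape of each result is my understanding of how D3 version 3 returns rolled-up entries, so treat that as an assumption.

```javascript
// Hypothetical stand-in for the nest/rollup pattern: group the events by
// one field and sum number_downloaded within each group.
function sumByKey(events, keyName) {
    var totals = {}; // key -> running total
    var order = [];  // keys in the order they were first seen
    events.forEach(function(d) {
        var key = d[keyName];
        if (!Object.prototype.hasOwnProperty.call(totals, key)) {
            totals[key] = 0;
            order.push(key);
        }
        totals[key] += d.number_downloaded;
    });
    // Mirror the { key, values } shape (an assumption based on D3 version 3)
    return order.map(function(key) {
        return { key: key, values: totals[key] };
    });
}

// Made-up sample events, already reduced to 10 minute blocks
var sampleEvents = [
    { dtg: "2013-01-24 18:40:00", date: "2013-01-24", time: "18:40:00", number_downloaded: 1 },
    { dtg: "2013-01-24 18:40:00", date: "2013-01-24", time: "18:40:00", number_downloaded: 1 },
    { dtg: "2013-01-25 01:20:00", date: "2013-01-25", time: "01:20:00", number_downloaded: 1 }
];

console.log(sumByKey(sampleEvents, "dtg"));  // two groups, with totals 2 and 1
console.log(sumByKey(sampleEvents, "date")); // two groups, with totals 2 and 1
console.log(sumByKey(sampleEvents, "time")); // two groups, with totals 2 and 1
```

This sketch keeps the groups in first-seen order. It does not reproduce the `sortKeys(d3.ascending)` step used for `dataTime`, which sorts the groups by time of day.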
##### Sizing Everything Up
The size of the graph is determined by a number of fixed variables which are fairly self explanatory;

- `scatterplotHeight` (which is also the height of the time graph)
- `dateGraphHeight`
- `timeGraphWidth`

But we need to let the width of the scatter plot (and the date graph) be a function of the number of days that have been collected. This variable is handled by;

- `scatterplotWidth`

This set-up is handled in the following block of code;
```
var oneDay = 24*60*60*1000; // hours*minutes*seconds*milliseconds
var dateStart = d3.min(data, function(d) { return d.date; });
var dateFinish = d3.max(data, function(d) { return d.date; });
var numberDays = Math.round(Math.abs((dateStart.getTime() -
    dateFinish.getTime())/(oneDay)));

var margin = {top: 20, right: 20, bottom: 20, left: 50},
    scatterplotHeight = 520,
    scatterplotWidth = numberDays * 1.5,
    dateGraphHeight = 220,
    timeGraphWidth = 220;
```
The overall size of the graphic (`height` and `width`) is therefore a combination of these variables;
```
var height = scatterplotHeight + dateGraphHeight,
    width = scatterplotWidth + timeGraphWidth;
```
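As a worked example of the sizing arithmetic, here is a small sketch that needs no D3. The helper `daysBetween` and the sample date range are made up for illustration. The multiplier of 1.5 and the fixed width of 220 are the values used above.

```javascript
var msPerDay = 24 * 60 * 60 * 1000; // hours*minutes*seconds*milliseconds

// Hypothetical helper: the number of whole days between two Date objects
function daysBetween(start, finish) {
    return Math.round(Math.abs(start.getTime() - finish.getTime()) / msPerDay);
}

// Made-up sample range: 24 January 2013 to 3 February 2013 (UTC)
var sampleStart = new Date(Date.UTC(2013, 0, 24));
var sampleFinish = new Date(Date.UTC(2013, 1, 3));

var sampleDays = daysBetween(sampleStart, sampleFinish);
var sampleScatterWidth = sampleDays * 1.5;         // 1.5 pixels per day
var sampleTotalWidth = sampleScatterWidth + 220;   // plus the time graph width

console.log(sampleDays);         // 10
console.log(sampleScatterWidth); // 15
console.log(sampleTotalWidth);   // 235
```

The `Math.round` call means that a range which is an hour short or long of a whole number of days (for example across a daylight saving change) still gives a whole number of days.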
##### The Scatter Plot
There is no real surprise with the scatter plot itself. The only thing slightly unusual is the use of a time scale for both the X and Y axes;
```
var x = d3.time.scale().range([0, scatterplotWidth]);
var y = d3.time.scale().range([0, scatterplotHeight]);
```
When the circles are drawn, the size of the circle is determined by the radius, which is the number of downloads multiplied by 1.5. I know that this is a bit of a visualization ‘no-no’ because the area of the circle should be representative of the number, not the radius, but I tried it both ways and to my simple way of viewing the data, the radius adjustment provided the best comparison.
```
svg.selectAll(".dot")
    .data(data)
  .enter().append("circle")
    .attr("class", "dot")
    .attr("r", function(d) { return d.number_downloaded*1.5; })
    .style("opacity", 0.3)
    .style("fill", "#e31a1c" )
    .attr("cx", function(d) { return x(d.date); })
    .attr("cy", function(d) { return y(d.time); });
```
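For anyone who wants to try the two approaches to circle sizing, here is a minimal sketch of the difference. The function names are made up for this illustration, and scaling the radius by the square root of the count is one common way of making the area track the number; it is not necessarily the exact formula I used when comparing.

```javascript
var radiusFactor = 1.5; // the multiplier used for the circles above

// Radius grows in direct proportion to the number of downloads
function radiusLinear(n) {
    return n * radiusFactor;
}

// Radius grows with the square root, so the AREA is proportional to n
function radiusByArea(n) {
    return Math.sqrt(n) * radiusFactor;
}

function circleArea(r) {
    return Math.PI * r * r;
}

[1, 4, 9].forEach(function(n) {
    console.log(n + " downloads: linear r = " + radiusLinear(n) +
        ", area-based r = " + radiusByArea(n));
});
```

With the linear version, a circle for 4 downloads has roughly 16 times the area of a circle for 1 download. With the area-based version it has roughly 4 times the area.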
I know that this is a topic of some academic debate, and it is fascinating, so here are both results for comparison;
##### Date and Time Graphs
Both of these graphs are fairly routine. The time graph has the X and Y axes reversed from what would be ordinarily expected, but otherwise not much else to write home about.
##### Mouse Movement Information Display
This portion of the graph is an expansion of the ‘Favorite tool tip’ method from the previous section in this chapter. We expand the number of elements to update dynamically to about 10, each of which is designated with its own `class`.
We append the rectangle to capture the mouse movement over the scatter plot;
```
svg.append("rect")
    .attr("width", scatterplotWidth)
    .attr("height", scatterplotHeight)
    .style("fill", "none")
    .style("pointer-events", "all")
    .on("mouseover", function() { focus.style("display", null); })
    .on("mouseout", function() { focus.style("display", "none"); })
    .on("mousemove", mousemove);
```
We capture the position of the mouse and convert it to figures we can use to compare to our data;
```
function mousemove() {
    var xpos = d3.mouse(this)[0],    // mouse x position in pixels
        x0 = x.invert(xpos),         // the date at that x position
        y0 = d3.mouse(this)[1],      // mouse y position in pixels
        y1 = y.invert(y0),           // the time at that y position
        date1 = d3.mouse(this)[0];   // x position again, in pixels
    // (the focus.select statements that place the text and lines go here)
}
```
And then we place our dynamic text and lines with our `focus.select` statements.
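The `invert` calls in `mousemove` are what turn a pixel position back into a date or a time. To show the idea without D3, here is a minimal sketch of a linear time scale with an `invert` method. The function `makeTimeScale` and the sample domain are made up for this illustration, and a real D3 time scale does more than this (ticks and formatting, for instance).

```javascript
// Hypothetical minimal linear time scale: maps dates to pixels and back
function makeTimeScale(domainStart, domainEnd, rangeStart, rangeEnd) {
    var t0 = domainStart.getTime(),
        t1 = domainEnd.getTime();

    // date -> pixel position
    function scale(date) {
        var fraction = (date.getTime() - t0) / (t1 - t0);
        return rangeStart + fraction * (rangeEnd - rangeStart);
    }

    // pixel position -> date
    scale.invert = function(pixel) {
        var fraction = (pixel - rangeStart) / (rangeEnd - rangeStart);
        return new Date(t0 + fraction * (t1 - t0));
    };

    return scale;
}

// Made-up domain: 10 days drawn across 15 pixels (1.5 pixels per day)
var sampleX = makeTimeScale(
    new Date(Date.UTC(2013, 0, 24)),
    new Date(Date.UTC(2013, 1, 3)),
    0, 15);

console.log(sampleX(new Date(Date.UTC(2013, 0, 29)))); // 7.5
console.log(sampleX.invert(7.5).toISOString());        // 2013-01-29T00:00:00.000Z
```

A mouse position halfway across the plot therefore comes back as the date halfway through the range, which is the value that can then be compared against the data.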
##### Labeling
The last order of business is to place some labels.
The location of labeling in this example is an interesting problem in itself. I’m personally torn between the desire to maintain simplicity and to ensure clarity. Hopefully what I have is enough to satisfy both requirements, but as always, each user and requirement will differ, so label as desired.
If there are additional parts of the code that you would like explained, please feel free to get in touch.