We’re pleased to present our second edition of the Rhiza Tech Talk series with a new post by senior software engineer Matt Pickell. In this post, Matt talks about his process for placing dynamic shields on major roads for our geography-based data visualizations.
At Rhiza, we pull multiple datasets together to create very simple visualizations. Using vector tile data from Mapbox allows us to have a lot of control over these visualizations and produce customizable, compelling maps.
Vector tiles are packaged with separate layers containing all information needed to create a single map tile. One of these layers contains information for labeling the roads. The detail contained in this layer is sufficient to interpret the labels around pivots in the roads — making the text bend with the road. Label placement on maps is an interesting topic! The specific topic I will describe here is a little simpler: Placing shields on major roads.
When originally conceiving how to implement this we tested a very simple approach of summarizing the line-label segments into individual points and placing a road label at each point. For each label feature, take the center point in the line and place the shield there..
There were many issues with this approach; It did not consider the road holistically and resulted in poorly placed and frequently overlapping shields. The assumption was that the vector data spaced labels in some fashion…which it does not.
After taking a closer look at the data provided, we found that if you concatenate the line segments (matching on label text) in the road label layer you have basically recreated the road itself. Using this information we developed a method of label placement that considers each road as a whole and places labels by looking at all roads together. The result is cleanly spaced shields with no overlap, created on the fly as the user interacts with the map.
The first thing we need to do is to transform the label data into a new data structure grouping geojson features by road. Data manipulation can be performed on the data as well so that labels are formatted as desired, etc. The process discussed here starts from the geojson output of another process that has already downloaded all tiles, extracted the vector tile data, and filtered it based on the granularity of roads we want. So the geojson feature collection we have now contains the road label data from all tiles on the visible map. (These processes are performed in the client using web-workers.)
In order to split the data by road, we iterate over the geojson collection and check each label text. We’re specifically looking for highway shields so the strings are checked for validity using a simple regular expression:
As we check these labels, we run into complexities like a road that is named ‘US 22;US 30;I 376’. For cleaner placement of the shields, features like this are cloned: one feature with a label ‘US 22;US 30’, and one feature with the label ‘I 376’. The ‘US’ shields in the example represent a combined highway and are left together, placed together, and later styled to sit side-by-side on the map. If we encounter a route like ‘I 376 Business’, we have the choice to drop it or shorten it to ‘I 376B’, depending on what we want to show.
In an attempt to minimize what we process later, each valid line segment is checked against/placed in an rtree (we are using the rbush implementation. An rtree is used for spacial indexing, so that we can quickly and efficiently determine when two bounding-boxes overlap.) If a collision is detected in the rtree, then we have a line segment that already covers this section of road. A simple resolution here is to just take the larger (i.e., more points) segment and drop the shorter. You could also check for overlap and combine the segments, dropping the overlap, but we’re not drawing the roads with this information — we’re only trying to approximate the road so we can fit a line to it later. Our main goal here is to remove redundant line segments hoping reduce processing later.
The final step, then, is to extract the data from the rtree and place it into a ‘lookup’ mapping by road label. The result now is a map of road label -> array of road label features. This is a clean format for looking at each road as a whole in the next step: Placement.
When we place labels for each road, we want two things
- The labels should be spaced well along the individual road. Equal spacing, and only X shields per road length (where X is a value passed in to the web worker)
- Across all visible roads no labels should overlap
To achieve both of these with the best placement we prioritize how the road shields will be placed: interstates are more important than US highways, US highways trump the state roads. We then use another rtree in order to prevent any overlap. By prioritizing the processing order, more important labels are placed in the rtree first and have a better placement.
Now comes some fun with math in order to figure out the shield placement! At the end of the day, we simply need an X/Y position and label for each shield. To determine how many shields a road gets, we have a single variable passed in to the web-worker for how many shields we want per some linear-length on the visual map. This is just a general scaling to give an idea of how many shields we want per road, and the specific linear-length assumed here is “width of the map.”
In the prioritized order, we run each individual road through a placement process:
- Determine the bounding box of the road based on the points in the line segments. This will give us the endpoints that we need the shield(s) to be inside of when placed. We can also calculate the length of the line of best fit inside this bounding box in order to determine how many shields this road segment gets. This bounding box is clipped at the edges of the visible map if it extends outside so we’re not placing shields in non-visible space.
- Find the first-degree polynomial line of best fit and r-squared value for the road. Using a first-degree polynomial to represent the line does a pretty good job. The r-squared value tells us how well the line fits the data in the line segments. The point of using a line to approximate the road is that it is much easier to measure distances for placement of shields.
- If the r-squared value is too low, we are not going to get good placement of shields on the road. In this situation, the road features are sorted (by either latitude or longitude, depending on some interrogation of the data), and broken into X segments, where X is the number of shields that will be placed on that specific road. For each segment, a bounding box and line of best fit is calculated.
- Each road or road segment has its shields placed into the rtree. For each segment, points along the line of best fit are selected based on the number of shields to be placed on that road. For each point, an array is built with the actual points on the road, sorted by distance from the selected point. In order, each point is tested against the rtree to determine if it is a valid placement. Once a valid one is found, the label is placed and we move on to process the next road.
Note on r-squared
Calculating the r-squared for the line is important in order to make sure we actually are improving our placement, and not creating something worse.. Here is an example of I 79 near Pittsburgh. You can see that the best fit line is really not a good fit at all, and it has a very low r-squared.
When broken down in to sub-pieces, we get much better lines.
Finally, the shield-placement rtree now contains the only relevant information needed. The information in the tree is extracted into a new geojson feature collection and passed back to main ui thread for rendering on the chart. This rendering is fast and light since the web-worker reduced the road-label layer down to a collection of points. The result is a much cleaner, more intelligent (and dynamic) layout of the shields.