Consistent width for geom_bar in the event of missing data
RGgplot2R Problem Overview
Is there a way to set a constant width for geom_bar()
in the event of missing data in the time series example below? I've tried setting width
in aes()
with no luck. Compare May '11 to June '11 width of bars in the plot below the code example.
colours <- c("#FF0000", "#33CC33", "#CCCCCC", "#FFA500", "#000000" )
iris$Month <- rep(seq(from=as.Date("2011-01-01"), to=as.Date("2011-10-01"), by="month"), 15)
colours <- c("#FF0000", "#33CC33", "#CCCCCC", "#FFA500", "#000000" )
iris$Month <- rep(seq(from=as.Date("2011-01-01"), to=as.Date("2011-10-01"), by="month"), 15)
d<-aggregate(iris$Sepal.Length, by=list(iris$Month, iris$Species), sum)
d$quota<-seq(from=2000, to=60000, by=2000)
colnames(d) <- c("Month", "Species", "Sepal.Width", "Quota")
d$Sepal.Width<-d$Sepal.Width * 1000
g1 <- ggplot(data=d, aes(x=Month, y=Quota, color="Quota")) + geom_line(size=1)
g1 + geom_bar(data=d[c(-1:-5),], aes(x=Month, y=Sepal.Width, width=10, group=Species, fill=Species), stat="identity", position="dodge") + scale_fill_manual(values=colours)
R Solutions
Solution 1 - R
Some new options for position_dodge()
and the new position_dodge2()
, introduced in ggplot2 3.0.0 can help.
You can use preserve = "single"
in position_dodge()
to base the widths off a single element, so the widths of all bars will be the same.
ggplot(data = d, aes(x = Month, y = Quota, color = "Quota")) +
geom_line(size = 1) +
geom_col(data = d[c(-1:-5),], aes(y = Sepal.Width, fill = Species),
position = position_dodge(preserve = "single") ) +
scale_fill_manual(values = colours)
Using position_dodge2()
changes the way things are centered, centering each set of bars at each x axis location. It has some padding
built in, so use padding = 0
to remove.
ggplot(data = d, aes(x = Month, y = Quota, color = "Quota")) +
geom_line(size = 1) +
geom_col(data = d[c(-1:-5),], aes(y = Sepal.Width, fill = Species),
position = position_dodge2(preserve = "single", padding = 0) ) +
scale_fill_manual(values = colours)
Solution 2 - R
The easiest way is to supplement your data set so that every combination is present, even if it has NA
as its value. Taking a simpler example (as yours has a lot of unneeded features):
dat <- data.frame(a=rep(LETTERS[1:3],3),
b=rep(letters[1:3],each=3),
v=1:9)[-2,]
ggplot(dat, aes(x=a, y=v, colour=b)) +
geom_bar(aes(fill=b), stat="identity", position="dodge")
This shows the behavior you are trying to avoid: in group "B", there is no group "a", so the bars are wider. Supplement dat
with a dataframe with all the combinations of a
and b
:
dat.all <- rbind(dat, cbind(expand.grid(a=levels(dat$a), b=levels(dat$b)), v=NA))
ggplot(dat.all, aes(x=a, y=v, colour=b)) +
geom_bar(aes(fill=b), stat="identity", position="dodge")
Solution 3 - R
I had the same problem but was looking for a solution that works with the pipe (%>%
). Using tidyr::spread
and tidyr::gather
from the tidyverse
does the trick. I use the same data as @Brian Diggs, but with uppercase variable names to not end up with double variable names when transforming to wide:
library(tidyverse)
dat <- data.frame(A = rep(LETTERS[1:3], 3),
B = rep(letters[1:3], each = 3),
V = 1:9)[-2, ]
dat %>%
spread(key = B, value = V, fill = NA) %>% # turn data to wide, using fill = NA to generate missing values
gather(key = B, value = V, -A) %>% # go back to long, with the missings
ggplot(aes(x = A, y = V, fill = B)) +
geom_col(position = position_dodge())
Edit:
There actually is a even simpler solution to that problem in combination with the pipe. Use tidyr::complete
gives the same result in one line:
dat %>%
complete(A, B) %>%
ggplot(aes(x = A, y = V, fill = B)) +
geom_col(position = position_dodge())