ggplot2: Creating a custom plot with two different geoms

This past week at work I had to create some plots showing the max, min, and median of a measure across the levels of a qualitative variable, along with the max and min of the same measure within a subgroup of the dataset. To illustrate what I mean, I took a fun dataset from the Data and Story Library and recreated the plot that I made at work.
The data are supposed to represent the results of a study done on how long it took subjects to complete a pen and paper maze while they were either smelling a floral scent or not.  Other than the time to completion (the main variable of interest), some other variables are recorded, like sex and whether or not the subject is a smoker.
I represent the min, max, and median time that men and women took to complete the maze with the “crossbar” geom, which shows obvious overlap in performance across the genders (although men were apparently a bit more variable in their performance than women). The seemingly interesting result is represented by the dots, which show the min and max values for the smokers within each gender. While smoking doesn’t appear to have an effect on performance amongst the women, it seems to be associated with much slower performance amongst the men in the study. Then again, I just re-checked the data, which show that there are only 2 male smokers and 6 female smokers, so the comparison seems pretty unreliable.
Stepping away from the study itself, what I like here is that you can call up several geoms in the same plot, passing different data subsets to each geom.  It allows for useful customization so that you can tackle problems that aren’t so cut and dried.
Finally, I realize that before putting this kind of graph into a report, I really should create a little legend telling the reader that the ends of the boxes are the max and min, the middle lines are the medians, and the dots are the max and min of the subset.
Following is the code and the plot I created:


library(plyr)      # ddply
library(reshape2)  # melt
library(ggplot2)
# Min, median, and max of the trial 3 time by sex, then min and max for smokers only (long format)
scents = read.table("clipboard", header=TRUE, sep="\t")
strial3.by.sex.wide = ddply(scents, 'Sex', function (x) quantile(x$S.Trial.3, c(0,.5,1), na.rm=TRUE))
strial3.by.sex.smokers = melt(ddply(subset(scents, Smoker == "Y"), 'Sex', function (x) quantile(x$S.Trial.3, c(0,1), na.rm=TRUE)), id.vars="Sex", variable.name="Percentile", value.name="Time")
# Each geom gets its own data frame; opts()/theme_text() are theme()/element_text() in current ggplot2
ggplot() +
  geom_crossbar(data=strial3.by.sex.wide, aes(x=Sex, y=`50%`, ymin=`0%`, ymax=`100%`), fill="#bcc927", width=.75) +
  geom_point(data=strial3.by.sex.smokers, aes(x=Sex, y=Time), size=3) +
  theme(legend.title=element_text(size=10, face="bold"), legend.text=element_text(size=10),
        axis.text.x=element_text(size=10), axis.text.y=element_text(size=10, hjust=1),
        axis.title.x=element_text(size=12, face="bold"), axis.title.y=element_text(size=12, angle=90, face="bold")) +
  scale_y_continuous(name="Time to Completion")
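Since the real data come in from the clipboard, here is a small sketch of the two summary tables on made-up numbers, in case you want to see their shape without the dataset. It uses base R instead of ddply, and everything in it (the toy data frame, its values, and the summarise_range helper) is invented for illustration and is not the study data.

```r
# Made-up stand-in for the scents data; the values are invented.
toy <- data.frame(
  Sex       = c("F", "F", "F", "F", "M", "M", "M", "M"),
  Smoker    = c("Y", "N", "Y", "N", "N", "Y", "N", "Y"),
  S.Trial.3 = c(40, 35, 52, 47, 30, 75, 44, 68)
)

# Hypothetical helper: min, median, and max of S.Trial.3 for each sex,
# returned as one row per sex with columns named "0%", "50%", and "100%".
summarise_range <- function(d) {
  stats <- do.call(rbind, lapply(split(d$S.Trial.3, d$Sex), function(v)
    quantile(v, c(0, .5, 1), na.rm = TRUE)))
  data.frame(Sex = rownames(stats), stats, check.names = FALSE, row.names = NULL)
}

by.sex     <- summarise_range(toy)                         # feeds the crossbar
by.smokers <- summarise_range(subset(toy, Smoker == "Y"))  # feeds the dots
print(by.sex)
print(by.smokers)
```

The first table plays the role of the crossbar data and the second the role of the dot data; for the dots you would only plot the "0%" and "100%" columns.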

Bar Graph Colours That Work Well

Ever since I started using ggplot2 more often at work in order to do graphs, I’ve realized something about the use of colour in bar graphs vs. dot plots: When I’m looking at a graph displayed on the brilliant Viewsonic monitor I’m using at work, the same relatively intense colours that work well in a dot plot start to bother me in a bar graph.  Take the bar graph immediately below for example.  The colour choice is not a bad one, but there’s something about the intensity of the colours that makes me want to find a new set of colours somewhat more soothing to my eyes.

The first resource I found was a “Color Encyclopedia” website called Color Hex, where I started looking for colours that seemed more soothing and could be used to compare 3 bars against one another in a bar graph. You can search for colour names, colours according to their hexadecimal values, or even browse their list of “web safe colors”. I stumbled upon the particular purple displayed in the graph below, and the site simply gave me the other colours in the triad as suggestions.

Looking at this triad of colours, I’m actually quite pleased, but I still didn’t really understand why these colours worked, or how to select a new triad that wouldn’t bother me. I shuffled through many different colours on the Color Hex website, and nothing else seemed to work for me, as I wasn’t selecting colours based on any theory.

Then I stumbled upon an article by the good people at Perceptual Edge. They seemed to confirm my earlier observation that the same intense colours that work well in dot plots don’t work so well in bar plots. Their solution is a simple one: choose from a list of colours of medium intensity. On page 6 of the document linked above, they offer 8 different hues that look nice in a bar plot. All I had to do to use these colours in the plots below was take a screenshot of the document, bring it into Inkscape, and hover the eye-dropper tool over the colours to get the hexadecimal colour values. If you’re interested in using the values, I typed them out at the bottom of my post. Now take a look at the graphs below:

The two graphs above follow the same principle that I had unknowingly touched upon when I chose the colours from the Color Hex website: stick with medium intensity, and your eyes won’t be jarred by the colour contrast.  I like that!
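If you want a rough number to go with “medium intensity”, one option is to look at HSV saturation. This is only a sketch: treating saturation as a stand-in for intensity is my own assumption and not something the Perceptual Edge document defines, and the fully saturated comparison colours are my own choice. The three medium values are the ones used in the bar graph code further down.

```r
# Three of the medium-intensity values used in the bar graph code in this post.
medium <- c("#599ad3", "#f9a65a", "#9e66ab")
# Fully saturated red, green, and blue as a baseline (my own choice).
intense <- c("#ff0000", "#00ff00", "#0000ff")

# Hypothetical helper: convert hex strings to HSV.
# The rows of the result are named h, s, and v.
to.hsv <- function(hex) rgb2hsv(col2rgb(hex))

# Saturation of each set of colours.
print(round(to.hsv(medium)["s", ], 2))
print(to.hsv(intense)["s", ])
```

The medium colours all come out with saturation well below 1, while the pure primaries sit at exactly 1.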

Anyway, below I show you the code I used to manually input the hexadecimal colour values into my ggplot bar graphs, and the list of 8 hexadecimal colour values corresponding to the colour boxes on page 6 of the Perceptual Edge document. The variables a, c, and b were just variable names from a mock data frame (e in the code) that I cooked up for the purpose of the plots.

colours = c("#599ad3", "#f9a65a", "#9e66ab")
ggplot(e, aes(x=a, y=c, fill=b)) + geom_bar(stat="identity", position="dodge") + coord_flip() + scale_fill_manual(values=colours)

#727272
#f1595f
#79c36a
#599ad3
#f9a65a
#9e66ab
#cd7058
#d77fb3
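
To save retyping, the eight values above can be kept in one vector and handed to scale_fill_manual as needed. This is just a convenience sketch; the names pe.colours and pick.colours are made up for illustration.

```r
# The eight medium-intensity values listed above, in the same order.
pe.colours <- c("#727272", "#f1595f", "#79c36a", "#599ad3",
                "#f9a65a", "#9e66ab", "#cd7058", "#d77fb3")

# Hypothetical helper: return the first n colours of the palette,
# stopping early if n is outside the range the palette can supply.
pick.colours <- function(n, palette = pe.colours) {
  stopifnot(n >= 1, n <= length(palette))
  palette[seq_len(n)]
}

print(pick.colours(3))
```

In a plot, this would be used as scale_fill_manual(values=pick.colours(3)). To get the blue, orange, and purple triad from my code above instead, index the vector directly with pe.colours[4:6].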