Components of ggplot

  • Functions such as geom_point() are really wrappers that call the layer() function
ggplot(data) + layer(
mapping = NULL,
data = NULL,
geom = "point",
stat = "identity",
position = "identity"
)
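
To make the relationship concrete, here is a minimal sketch that builds the same scatterplot twice, once with geom_point() and once with layer(). It assumes the `mpg` dataset that ships with ggplot2 and uses layer_data() to compare what each version computes; the variable names are only for illustration.

```r
library(ggplot2)

# Shorthand: geom_point() fills in geom, stat and position for us.
p_short <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# Longhand: the same layer spelled out with layer().
p_long <- ggplot(mpg, aes(displ, hwy)) +
  layer(
    mapping = NULL,        # inherit aes() from ggplot()
    data = NULL,           # inherit data from ggplot()
    geom = "point",
    stat = "identity",
    position = "identity"
  )

# Compare the data each plot computes for its first layer.
same <- all.equal(layer_data(p_short), layer_data(p_long))
same
```

In everyday use the shorthand is preferable; layer() is mostly useful for seeing which pieces a geom_ function chooses for you.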
  • Components of a ggplot: data, layer, geom, mapping, stat, position

  • data must be a data frame

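  • Functions like geom_point() need aesthetic mappings, given with aes(), which specify how variables in the data frame are mapped to visual properties of the plot. For a single layer, writing aes() in ggplot(data, aes()) has the same effect as specifying it in geom_point(aes(), data = data); with several layers, only the aes() given in ggplot() is shared by every layer.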

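# Inside aes(), every argument is either aesthetic = column or aesthetic = constant
# For aesthetic = constant, aes effectively adds a column to the data filled with that value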
aes()
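
The difference between a mapping given in ggplot() and one given in a single layer shows up as soon as there are two layers. The sketch below assumes the `mpg` dataset from ggplot2 and uses a linear smooth purely for illustration.

```r
library(ggplot2)

# Colour mapped in ggplot(): both layers inherit it, so one line is fitted per drv.
p_shared <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Colour mapped only in geom_point(): the smooth layer sees no grouping,
# so a single line is fitted to all the data.
p_local <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = drv)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Number of groups the second (smooth) layer was computed for.
n_shared <- length(unique(layer_data(p_shared, 2)$group))
n_local  <- length(unique(layer_data(p_local, 2)$group))
```

Here `n_shared` counts one group per value of drv, while `n_local` counts a single group.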
  • geom functions
geom_blank(): display nothing. Most useful for adjusting axes limits using data.
geom_point(): points.
geom_path(): paths.
geom_ribbon(): ribbons, a path with vertical thickness.
geom_segment(): a line segment, specified by start and end position.
geom_rect(): rectangles.
geom_polygon(): filled polygons.
geom_text(): text.
# One discrete variable
geom_bar(): display distribution of discrete variable.
# One continuous variable
geom_histogram(): bin and count continuous variable, display with bars.
geom_density(): smoothed density estimate.
geom_dotplot(): stack individual points into a dot plot.
geom_freqpoly(): bin and count continuous variable, display with lines.
# Two continuous variables
geom_point(): scatterplot.
geom_quantile(): smoothed quantile regression.
geom_rug(): marginal rug plots.
geom_smooth(): smoothed line of best fit.
geom_text(): text labels.
# Functions for showing a distribution
geom_bin2d(): bin into rectangles and count.
geom_density2d(): smoothed 2d density estimate.
geom_hex(): bin into hexagons and count.
# Two variables, at least one of them discrete
geom_count(): count number of points at distinct locations.
geom_jitter(): randomly jitter overlapping points.
# One discrete, one continuous
geom_bar(stat = "identity"): a bar chart of precomputed summaries.
geom_boxplot(): boxplots.
geom_violin(): show density of values in each group.
# One time variable, one continuous
geom_area(): area plot.
geom_line(): line plot.
geom_step(): step plot.
# Display uncertainty (I did not fully understand this group)
geom_crossbar(): vertical bar with center.
geom_errorbar(): error bars.
geom_linerange(): vertical line.
geom_pointrange(): vertical line with center.
# Drawing maps
geom_map()
# Three variables
geom_contour(): contours.
geom_tile(): tile the plane with rectangles.
geom_raster(): fast version of geom_tile() for equal sized tiles.
  • stat: transforms the data
stat_bin(): geom_bar(), geom_freqpoly(), geom_histogram()
stat_bin2d(): geom_bin2d()
stat_bindot(): geom_dotplot()
stat_binhex(): geom_hex()
stat_boxplot(): geom_boxplot()
stat_contour(): geom_contour()
stat_quantile(): geom_quantile()
stat_smooth(): geom_smooth()
stat_sum(): geom_count()

stat_ecdf(): compute an empirical cumulative distribution plot.
stat_function(): compute y values from a function of x values.
stat_summary(): summarise y values at distinct x values.
stat_summary2d(), stat_summary_hex(): summarise binned values.
stat_qq(): perform calculations for a quantile-quantile plot.
stat_spoke(): convert angle and radius to position.
stat_unique(): remove duplicated rows.
# ggplot
# (fun replaces fun.y, which is deprecated since ggplot2 3.3.0)
ggplot(mpg, aes(trans, cty)) +
geom_point() +
stat_summary(geom = "point", fun = "mean")
# equivalent to
ggplot(mpg, aes(trans, cty)) +
geom_point() +
geom_point(stat = "summary", fun = "mean")
# position
position_stack(): stack overlapping bars (or areas) on top of each other.
position_fill(): stack overlapping bars, scaling so the top is always at 1.
position_dodge(): place overlapping bars (or boxplots) side-by-side.
position_nudge(): move points by a fixed offset.
position_jitter(): add a little random noise to every position.
position_jitterdodge(): dodge points within groups, then add a little random noise.
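
The position adjustments are easiest to compare on one bar chart. The following is a minimal sketch assuming the `mpg` dataset from ggplot2; the string shortcuts "stack", "fill" and "dodge" stand for position_stack(), position_fill() and position_dodge().

```r
library(ggplot2)

base <- ggplot(mpg, aes(class, fill = drv))

p_stack <- base + geom_bar(position = "stack")  # counts stacked on top of each other
p_fill  <- base + geom_bar(position = "fill")   # stacked, rescaled to proportions
p_dodge <- base + geom_bar(position = "dodge")  # bars placed side by side

# With position = "fill" the top of the tallest stack sits at 1.
top_fill <- max(layer_data(p_fill)$ymax)
```

Printing `p_stack`, `p_fill` and `p_dodge` one after another shows the same counts arranged three different ways.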
  • scale: controls how data values turn into what you see, such as size, colour, position or shape; scales also produce the axes and legends

  • Legend
labs(color='xx')
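
As a small illustration of how a scale and its legend fit together, the sketch below assumes the `mpg` dataset from ggplot2; the colour choices and the legend title "Drive train" are arbitrary examples.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point() +
  # The scale decides which colour each value of drv gets.
  scale_colour_manual(values = c("4" = "red", "f" = "blue", "r" = "darkgreen")) +
  # labs() renames the legend that the colour scale draws.
  labs(colour = "Drive train")

# Colours actually used for the points after the scale has been applied.
used_colours <- unique(layer_data(p)$colour)
```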
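aes(x=x, y=y, color='a column', size = 'a column', shape='a column', group='a column', fill='discrete', weight='')
# To use fill with a continuous variable, group is needed

# Note: of the two lines below, the first produces a legend and the second does not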
ggplot(mpg, aes(displ, hwy)) + geom_point(aes(colour = "blue"))
ggplot(mpg, aes(displ, hwy)) + geom_point(colour = "blue")
# An aes given in ggplot() applies to every layer; a layer can specify its own aes to override the one from ggplot()
ggplot(data, aes(x,y,group=z)) + geom_line(aes(y,z,group=a))
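
The legend difference comes from mapping versus setting. A minimal sketch, assuming the `mpg` dataset from ggplot2, that inspects the colours each version ends up drawing:

```r
library(ggplot2)

# Mapped: "blue" is treated as data, so a scale picks the colour and draws a legend.
p_mapped <- ggplot(mpg, aes(displ, hwy)) + geom_point(aes(colour = "blue"))

# Set: the points are drawn literally in blue, with no scale and no legend.
p_set <- ggplot(mpg, aes(displ, hwy)) + geom_point(colour = "blue")

mapped_colour <- unique(layer_data(p_mapped)$colour)  # chosen by the default discrete scale
set_colour    <- unique(layer_data(p_set)$colour)     # "blue"
```

So a constant inside aes() is a one-level variable that happens to be named "blue", which is why the points in the first plot are not actually blue.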
  • Scales: colour, or size, or shape. Scales draw a legend or axes
  • coord
  • Facet: split the plot into panels
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
facet_wrap(~class)
  • Theme

geom_ functions

  • The geoms below all need x and y in aes()
geom_area()
geom_bar(stat = "identity")
geom_line()
geom_point()
# polygons
geom_polygon()
geom_boxplot()
geom_violin()
geom_rect(), geom_tile() and geom_raster() draw rectangles.
# ggplot does not support 3d layers, but it does support tile plots and bubble plots
geom_contour()
geom_raster()

#
geom_bin2d()
geom_hex()


# Data with uncertainty
• Discrete x, range: geom_errorbar(), geom_linerange()
• Discrete x, range & center: geom_crossbar(), geom_pointrange()
• Continuous x, range: geom_ribbon()
• Continuous x, range & center: geom_smooth(stat = "identity")
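
The uncertainty geoms all draw an interval from ymin to ymax at each x, optionally with the centre value y marked. The sketch below uses a made-up summary table (the numbers in `df` are hypothetical) in which each group has an estimate and a standard error.

```r
library(ggplot2)

# Hypothetical summary data: an estimate (y) with a standard error (se) per group.
df <- data.frame(
  x  = c("a", "b", "c"),
  y  = c(18, 11, 16),
  se = c(1.2, 0.5, 1.0)
)

base <- ggplot(df, aes(x, y, ymin = y - se, ymax = y + se))

p_errorbar   <- base + geom_errorbar(width = 0.2)  # range only
p_pointrange <- base + geom_pointrange()           # range plus a point at the centre
p_crossbar   <- base + geom_crossbar(width = 0.4)  # range as a box with a centre line

# The interval limits the layer actually draws.
limits <- layer_data(p_pointrange)[c("ymin", "ymax")]
```

Which geom to use is mostly a matter of whether the centre value should be shown and how heavy the mark should look.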



  • Adding annotations: use geom_text() or annotate(), giving the position on the plot and the content via label
yrng <- range(economics$unemploy)
xrng <- range(economics$date)
caption <- paste(strwrap("Unemployment rates in the US have
varied a lot over the years", 40), collapse = "\n")
ggplot(economics, aes(date, unemploy)) +
geom_line() +
geom_text(
aes(x, y, label = caption),
data = data.frame(x = xrng[1], y = yrng[2], caption = caption),
hjust = 0, vjust = 1, size = 4 )
# or
ggplot(economics, aes(date, unemploy)) +
geom_line() +
annotate("text", x = xrng[1], y = yrng[2], label = caption,
hjust = 0, vjust = 1, size = 4 )