1 Objectives of today’s exercise

We will extract and transform sociodemographic data of local areas in Paris. At the end of the class, we will produce maps of Paris, coloured by sociodemographic variables. For example, we will be able to see which areas of Paris have the highest number of qualified professionals, the highest number of immigrants, or the highest number of young people.

We aim to create two data frames, one by IRIS and the other by arrondissement, that look something like this:

We then aim to plot this data on maps of Paris.

2 R and coding basics

2.1 Classes of objects

There are five basic classes of objects:

logical (e.g., TRUE, FALSE)
integer (e.g., 213L, -3L; the L suffix tells R to store the number as an integer)
numeric (real or decimal) (e.g., 2, 2.0, -4.89, pi)
complex (e.g., 1 + 0i, 1 + 4i)
character (e.g., "hello", "AA231@_:)")

You can find the class of an object by using the class() function and you can change the class of an object by using the functions as.numeric(), as.logical() and as.character(). The numeric equivalent of FALSE is 0 and of TRUE is 1. In the other direction, 0 converts to FALSE and any other number converts to TRUE.

as.numeric(FALSE)

## [1] 0

as.logical(43)

## [1] TRUE
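As a quick check of the conversions described above, the sketch below inspects a few classes with class(). The object names are made up for illustration. The L suffix marks a literal as an integer; a whole number typed without it is stored as numeric.

```r
# A whole number typed without a suffix is stored as numeric
class(213)
# The L suffix creates an integer
class(213L)

# character -> numeric, then arithmetic
n <- as.numeric("213") + 1
n

# 0 converts to FALSE, any other number converts to TRUE
zero_is_false <- as.logical(0)
negative_is_true <- as.logical(-2.5)
c(zero_is_false, negative_is_true)
```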

2.1.1 Numeric, integer and complex values

You can do operations on them:

addition (+)

6 + 2

## [1] 8

subtraction (-)

6 - 2

## [1] 4

division (/)

6 / 2

## [1] 3

multiplication (*)

6 * 2

## [1] 12

exponent (^)

6^2

## [1] 36

It is often useful to store the result of a computation in an object:

result <- (6^2) / 4 + 17.85

If you want to see the value of an object:

result

## [1] 26.85

2.1.2 Character values

They need to be surrounded by quotation marks (' or "). They need not be letters.

MyMessage <- "Welcome to PPD! _@&"
MyMessage

## [1] "Welcome to PPD! _@&"

Without quotation marks, R will think that you refer to an object.

Hello

## Error in eval(expr, envir, enclos): object 'Hello' not found

You can combine several character objects

paste("My name", "is", "Léa", " ! :D")

## [1] "My name is Léa  ! :D"

or extract part of one

substr("Bonjour", 2, 5)

## [1] "onjo"
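A few more base R string functions are often useful alongside paste() and substr(). The short sketch below is only an illustration with made-up values: sep changes the separator used by paste(), nchar() counts characters, and strsplit() cuts a string at a separator. strsplit() returns a list, so we take its first element with [[1]].

```r
# sep controls what paste() puts between the pieces
label <- paste("Paris", "14", sep = "-")
label                                   # "Paris-14"

# nchar() counts the characters of a string
n_char <- nchar("Bonjour")              # 7

# substr() keeps positions 1 to 3
start <- substr("Bonjour", 1, 3)        # "Bon"

# strsplit() splits a string at every comma
parts <- strsplit("Monday,Tuesday,Wednesday", split = ",")[[1]]
parts
```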

2.1.3 Logical values and statements

In logical statements, e.g. “if A is equal to B, then apply function Y”, we use the following notation. Specifically, we use a double equals sign for ‘equal to’.

Statement	Meaning
`==`	equal to
`>=`, `<=`	greater than or equal to, less than or equal to
`>`, `<`	greater than, less than
`!=`	not equal to
`&`	and
`|`	or

Keep in mind that parentheses matter!

1 == 1 | (2 == 2 & 1 == 2)

## [1] TRUE

(1 == 1 | 2 == 2) & 1 == 2

## [1] FALSE
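The reason parentheses matter is that, without them, R evaluates & before |. The following minimal sketch (object names are ours) makes that order visible:

```r
# Without parentheses, & is evaluated before |
no_parens <- TRUE | FALSE & FALSE

# so the line above is equivalent to this one
and_first <- TRUE | (FALSE & FALSE)

# and different from this one
or_first <- (TRUE | FALSE) & FALSE

c(no_parens, and_first, or_first)   # TRUE TRUE FALSE
```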

Basic usage of logical values:

1==1

## [1] TRUE

is.numeric(MyMessage)

## [1] FALSE

123 > pi

## [1] TRUE

2.2 Data structures

R has a number of basic data structures. A data structure is either homogeneous (all elements are of the same data type) or heterogeneous (elements can be of more than one data type).

Dimension	Homogeneous	Heterogeneous
1	Vector	List
2	Matrix	Data Frame
3+	Array	nested Lists

2.2.1 Vectors

In R, a vector is a sequence of objects that have the same class. To create a vector you should list its elements separated by commas inside c():

days <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

Vectors are ordered: you can recover elements of a vector using their position in the sequence:

days[4]

## [1] "Thursday"

Conversely, the function match() allows you to recover the position(s) of a specific element in a vector:

match("Friday", days)

## [1] 5

You can do basic computations with vectors:

4 + c(10, 20, 30)

## [1] 14 24 34

c(1, 2, 3) * 4

## [1]  4  8 12

4 ^ c(1, 2, 3)

## [1]  4 16 64

c(1, 2, 3) ^ 4

## [1]  1 16 81

In R, logical operators also work with vectors:

x = c(1, 3, 5, 7, 8, 9)
x > 3

## [1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE

x == 3

## [1] FALSE  TRUE FALSE FALSE FALSE FALSE

They are also useful for subsetting:

x[x > 3]

## [1] 5 7 8 9

max(x)

## [1] 9

which(x == max(x))

## [1] 6
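Logical subsetting can be combined with the operators from section 2.1.3. The sketch below reuses the same vector x and shows a few common patterns; the object names on the left are only illustrative.

```r
x <- c(1, 3, 5, 7, 8, 9)

# combine two conditions with &
middle <- x[x > 3 & x < 9]        # 5 7 8

# a negative index drops the element at that position
without_first <- x[-1]            # 3 5 7 8 9

# sum() of a logical vector counts the TRUE values
n_above_3 <- sum(x > 3)           # 4

# %in% tests whether each element belongs to a set of values
is_member <- x %in% c(3, 8)
is_member
```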

The length() function gives you the number of elements in a vector:

length(days)

## [1] 7

The rep() function generates vectors by repeating things:

rep(c(1, 2, 3), 3)

## [1] 1 2 3 1 2 3 1 2 3

rep(c("a", "b", "c"), each = 2)

## [1] "a" "a" "b" "b" "c" "c"

The seq() function allows you to create vectors with sequences:

seq(0, 100, 5)

##  [1]   0   5  10  15  20  25  30  35  40  45  50  55  60  65  70  75  80  85  90
## [20]  95 100

Sequences of consecutive integers can be easily produced using the “:” sign

1:8

## [1] 1 2 3 4 5 6 7 8

You can append a string to each element of a vector with the function paste() (and the function paste0(), which is a shortcut for paste(..., sep=""))

paste("A", 1:4)

## [1] "A 1" "A 2" "A 3" "A 4"

paste0("B", 1:4)

## [1] "B1" "B2" "B3" "B4"

You can also merge all the elements of a vector:

days

## [1] "Monday"    "Tuesday"   "Wednesday" "Thursday"  "Friday"    "Saturday" 
## [7] "Sunday"

paste0(days, collapse=" ")

## [1] "Monday Tuesday Wednesday Thursday Friday Saturday Sunday"

2.2.2 Matrices

Matrices have rows and columns containing a single data type. In a matrix, the order of rows and columns is important. (This is not the case for data frames, which we will see later.)

Matrices can be created using the matrix function.

x = 1:9
x

## [1] 1 2 3 4 5 6 7 8 9

X = matrix(x, nrow = 3, ncol = 3)
X

##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

By default the matrix function fills your data into the matrix column by column. But we can also tell R to fill rows instead:

Y = matrix(x, nrow = 3, ncol = 3, byrow = TRUE)

We can also create a matrix of a specified dimension where every element is the same, in this case 0.

Z = matrix(0, 2, 5)
Z

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    0    0    0    0    0
## [2,]    0    0    0    0    0

Like vectors, matrices can be subsetted using square brackets, []. However, since matrices are two-dimensional, we need to specify both a row and a column when subsetting.

Y[2, 3]

## [1] 6

X[1, ]

## [1] 1 4 7

Y[c(1,2), 2]

## [1] 2 5
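To make the row and column logic concrete, here is a small sketch on the column-filled matrix from above. It also shows what happens with a single index (R then counts down the columns) and the drop = FALSE argument, which keeps the result as a matrix instead of simplifying it to a vector.

```r
X <- matrix(1:9, nrow = 3, ncol = 3)
dim(X)                                # 3 3

# row 2, column 3
elem <- X[2, 3]                       # 8

# a whole column is simplified to a vector
col2 <- X[, 2]                        # 4 5 6

# drop = FALSE keeps a 3 x 1 matrix
col2_matrix <- X[, 2, drop = FALSE]
dim(col2_matrix)                      # 3 1

# a single index treats the matrix as one long vector, column by column
single <- X[2]                        # 2
```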

Matrices can also be created by combining vectors as columns, using cbind, or combining vectors as rows, using rbind.

x = 1:9
x

## [1] 1 2 3 4 5 6 7 8 9

rev(x)

## [1] 9 8 7 6 5 4 3 2 1

rep(1, 9)

## [1] 1 1 1 1 1 1 1 1 1

rbind(x, rev(x), rep(1, 9))

##   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## x    1    2    3    4    5    6    7    8    9
##      9    8    7    6    5    4    3    2    1
##      1    1    1    1    1    1    1    1    1

cbind(col_1 = x, col_2 = rev(x), col_3 = rep(1, 9))

##       col_1 col_2 col_3
##  [1,]     1     9     1
##  [2,]     2     8     1
##  [3,]     3     7     1
##  [4,]     4     6     1
##  [5,]     5     5     1
##  [6,]     6     4     1
##  [7,]     7     3     1
##  [8,]     8     2     1
##  [9,]     9     1     1

The usual computations are done element by element:

X + Y

##      [,1] [,2] [,3]
## [1,]    2    6   10
## [2,]    6   10   14
## [3,]   10   14   18

X - Y

##      [,1] [,2] [,3]
## [1,]    0    2    4
## [2,]   -2    0    2
## [3,]   -4   -2    0

X * Y

##      [,1] [,2] [,3]
## [1,]    1    8   21
## [2,]    8   25   48
## [3,]   21   48   81

X / Y

##           [,1] [,2]     [,3]
## [1,] 1.0000000 2.00 2.333333
## [2,] 0.5000000 1.00 1.333333
## [3,] 0.4285714 0.75 1.000000

Matrix multiplication uses %*%. Other matrix functions include t() which gives the transpose of a matrix and solve() which returns the inverse of a square matrix if it is invertible.

X  %*% Y

##      [,1] [,2] [,3]
## [1,]   66   78   90
## [2,]   78   93  108
## [3,]   90  108  126

t(X)

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
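The matrix X above is singular (its determinant is 0), so it has no inverse. To illustrate solve(), the sketch below therefore uses a small invertible matrix A that we make up for the purpose. It also shows the two-argument form solve(A, b), which solves the linear system A x = b.

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # columns are (2, 1) and (1, 3)
det(A)                                 # 5, up to rounding

# the inverse times the matrix gives the identity, up to rounding
A_inv <- solve(A)
is_identity <- isTRUE(all.equal(A_inv %*% A, diag(2)))
is_identity

# solve the system 2x + y = 3, x + 3y = 5
b <- c(3, 5)
sol <- solve(A, b)
sol                                    # 0.8 1.4
```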

2.2.3 Arrays

A vector is a one-dimensional array. A matrix is a two-dimensional array. In R you can create arrays of arbitrary dimensionality N. Here is how:

d = 1:16
d1 = array(data = d,dim = c(4,2,2))
d2 = array(data = d,dim = c(4,2,2,3))  # will recycle 1:16
d1

## , , 1
## 
##      [,1] [,2]
## [1,]    1    5
## [2,]    2    6
## [3,]    3    7
## [4,]    4    8
## 
## , , 2
## 
##      [,1] [,2]
## [1,]    9   13
## [2,]   10   14
## [3,]   11   15
## [4,]   12   16

d1 is simply two (4,2) matrices laid on top of each other, as if there were two pages. Similarly, d2 has two pages along its third dimension and 3 entries along a fourth dimension. And so on. You can subset an array like you would a vector or a matrix, taking care to index each dimension:

d1[ ,1,1]  # all elements from col 1, page 1

## [1] 1 2 3 4

d1[2:3, , ]  # rows 2:3 from all pages

## , , 1
## 
##      [,1] [,2]
## [1,]    2    6
## [2,]    3    7
## 
## , , 2
## 
##      [,1] [,2]
## [1,]   10   14
## [2,]   11   15
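Two further operations on the same array may help: picking out a single value by giving one index per dimension, and using apply() to compute something for each page. This is only a sketch, and apply() is not needed for the rest of this class.

```r
d1 <- array(data = 1:16, dim = c(4, 2, 2))
dim(d1)                            # 4 2 2

# row 2, column 1, page 2
one_value <- d1[2, 1, 2]           # 10

# sum of all elements on each page (the third dimension)
page_sums <- apply(d1, 3, sum)     # 36 100
page_sums
```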

2.2.4 Lists

A list is a one-dimensional heterogeneous data structure. So it is indexed like a vector with a single integer value (or with a name), but each element can contain an element of any type. Lists are extremely useful and versatile objects, so make sure you understand their usage:

# creation without fieldnames
list(10, "Bonjour", FALSE)

## [[1]]
## [1] 10
## 
## [[2]]
## [1] "Bonjour"
## 
## [[3]]
## [1] FALSE

# creation with fieldnames
ex_list = list(
  a = c(1, 2, 3, 4),
  b = TRUE,
  c = "PPD Master",
  d = function(arg = 42) {print("Hello everyone!")},
  e = diag(3)
)

Lists can be subset using two syntaxes, the $ operator, and square brackets []. The $ operator returns a named element of a list. The [] syntax returns a list, while the [[]] returns an element of a list.

ex_list[1] returns a list containing the first element.
ex_list[[1]] returns the first element of the list, in this case, a vector.

ex_list$e

##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1

ex_list[1:2]

## $a
## [1] 1 2 3 4
## 
## $b
## [1] TRUE

ex_list[1]

## $a
## [1] 1 2 3 4

ex_list[[1]]

## [1] 1 2 3 4

ex_list[c("e", "a")]

## $e
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
## 
## $a
## [1] 1 2 3 4

ex_list["e"]

## $e
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1

ex_list[["e"]]

##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1

ex_list$d(arg = 1)

## [1] "Hello everyone!"
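The difference between [] and [[]] is easiest to see by checking the class of what each returns. The sketch below uses a shortened, made-up version of the list above (called ex) and also shows how to add and remove elements.

```r
ex <- list(a = c(1, 2, 3, 4), b = TRUE, c = "PPD Master")

class(ex[1])       # "list": single brackets keep the list wrapper
class(ex[[1]])     # "numeric": double brackets return the element itself

# so functions that expect a vector need [[ ]] or $
mean_a <- mean(ex[["a"]])   # 2.5

# add a new named element, then remove another one
ex$f <- "new element"
ex$b <- NULL
names(ex)          # "a" "c" "f"
```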

2.2.5 Data frames

A data frame is the most common way that we store and interact with data in economics.

data = data.frame(x = 1:10,
                  y = c(rep("Hello", 9), "Goodbye"),
                  z = rep(c(TRUE, FALSE), 5))

Unlike a matrix, a data frame is not required to have the same data type for each element. A data frame is a list of vectors, and each vector has a name. So, each vector must contain the same data type, but the different vectors can store different data types. Note, however, that all vectors must have the same length (which is the main difference from a list).

Again, we access any given column with the $ operator (as a vector):

data

##     x       y     z
## 1   1   Hello  TRUE
## 2   2   Hello FALSE
## 3   3   Hello  TRUE
## 4   4   Hello FALSE
## 5   5   Hello  TRUE
## 6   6   Hello FALSE
## 7   7   Hello  TRUE
## 8   8   Hello FALSE
## 9   9   Hello  TRUE
## 10 10 Goodbye FALSE

data$y

##  [1] "Hello"   "Hello"   "Hello"   "Hello"   "Hello"   "Hello"   "Hello"  
##  [8] "Hello"   "Hello"   "Goodbye"

length(data$x) == length(data$y) &
  length(data$y) == length(data$z)

## [1] TRUE

nrow(data)

## [1] 10

ncol(data)

## [1] 3

names(data)

## [1] "x" "y" "z"

We can use different functions to get to know what is in a data frame:

head() which displays the n first observations of a data frame

head(data) #default

##   x     y     z
## 1 1 Hello  TRUE
## 2 2 Hello FALSE
## 3 3 Hello  TRUE
## 4 4 Hello FALSE
## 5 5 Hello  TRUE
## 6 6 Hello FALSE

head(data, n=2)

##   x     y     z
## 1 1 Hello  TRUE
## 2 2 Hello FALSE

str() which displays the structure of the data frame

str(data)

## 'data.frame':    10 obs. of  3 variables:
##  $ x: int  1 2 3 4 5 6 7 8 9 10
##  $ y: chr  "Hello" "Hello" "Hello" "Hello" ...
##  $ z: logi  TRUE FALSE TRUE FALSE TRUE FALSE ...

You can subset data frames like matrices using square brackets [ , ], or you can use the function subset().

data[data$z == FALSE, c("x", "y")] # [row condition, column selection]

##     x       y
## 2   2   Hello
## 4   4   Hello
## 6   6   Hello
## 8   8   Hello
## 10 10 Goodbye

subset(data, subset = y == "Hello", select = c("x", "z"))

##   x     z
## 1 1  TRUE
## 2 2 FALSE
## 3 3  TRUE
## 4 4 FALSE
## 5 5  TRUE
## 6 6 FALSE
## 7 7  TRUE
## 8 8 FALSE
## 9 9  TRUE
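A few more everyday operations on the same data frame are sketched below: creating a new column, keeping all columns for selected rows, extracting a column as a vector, and combining conditions in subset(). The new names (x_squared, last_rows, y_values, both) are only illustrative.

```r
data <- data.frame(x = 1:10,
                   y = c(rep("Hello", 9), "Goodbye"),
                   z = rep(c(TRUE, FALSE), 5))

# create a new column from an existing one
data$x_squared <- data$x^2

# leave the column position empty to keep all columns
last_rows <- data[data$x > 8, ]

# [[ ]] returns a column as a vector, like $
y_values <- data[["y"]]

# conditions can be combined inside subset()
both <- subset(data, z & x <= 5, select = c(x, x_squared))
both
```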

3 Set-up

3.1 R projects and structuring your code

R projects are good for managing your data and scripts in a particular folder on your computer. Using RStudio, click File -> New project to create a new R project in a new or existing folder. A good name for a new folder is something like “Class1”, which you can save somewhere logical on your computer, such as in a folder called “Introduction_to_R”.

Within the folder “Class1”, create 3 subfolders, “Data”, “Scripts” and “Output”. We will save all R code in the folder “Scripts”. A key advantage of using R projects is that all paths leading to our input data and output files will be relative to the location of the R project (the folder “Class1”).

Create a new R script by clicking File -> New file -> R Script. This should be saved in the folder “Scripts”. You can call this script something like “cleaning_paris_data”.

3.2 Commenting code

It is always a good idea to comment lines of code. Use # at the start of a line to write a comment, or to deactivate a line of code so that it does not run.

3.3 Installing packages

R comes with a number of built-in functions and datasets, but one of the main strengths of R as an open-source project is its package system. Packages add additional functions and data. Often, if you want to do something in R that is not available by default, there probably exists a package that does it. You can find all packages listed on the Comprehensive R Archive Network (CRAN).

To install a package, use the install.packages("package_name") function. This requires an internet connection. Once a package is installed, it must be loaded into your current R session before being used by using the library("package_name") function. Once you close R, all the packages are closed. The next time you open R, you do not have to install the package again, but you do have to load any packages you intend to use by invoking library(). Thus, the first lines of our script will be:

### Installing and loading packages
# install.packages("tidyverse")
# install.packages("sf")
library("tidyverse")
library("sf")

A useful piece of code to install packages only if they are not already installed, then load them, is:

### installs if necessary and loads tidyverse and sf, another package which we will be using today
list.of.packages <- c("tidyverse", "sf")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages, repos = "http://cran.us.r-project.org")

invisible(lapply(list.of.packages, library, character.only = TRUE))

4 Reading data

All the data for today’s exercise can be downloaded from here. Although we provide sources, you do not need to download the data from a given source.

For this exercise, we use the French population data at the IRIS level (50k units in metropolitan France) that can be downloaded from the Insee website. The shapefiles for the IRIS can be downloaded from here. The shapefiles for the arrondissements in Paris can be downloaded from here.

4.1 Reading delimited data

This data is in the form of a .csv file (comma-separated values). It can be read using the function read_delim from the tidyverse package readr. To the left of the <- sign is the new R object we wish to define, to the right is how we wish to define it.

df <- read_delim(file = "Data/base-ic-evol-struct-pop-2013.csv", delim = ",", col_names = TRUE, skip = 5, locale = locale(encoding = "UTF-8"))

The options for the function read_delim can be found by typing ?read_delim in the console. Here, we just present a few frequently used options.

Argument	Description
file (required)	path to file (relative to R project)
delim (required)	delimiter
col_names (TRUE by default)	TRUE if first line is column names, else FALSE or a vector of column names
skip (0 by default)	the number of lines to skip at the start
locale	control the regional options, importantly the encoding

We can check that our dataframe df is how we want it to be by typing View(df) in the console, or by clicking on the data frame in the “Environment” panel.
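To see what skip and col_names do without needing the Insee file, the sketch below writes a tiny made-up .csv to a temporary file and reads it back. The file contents and the object name toy are invented for illustration. The col_types argument, which is not used above, fixes the column classes explicitly so that the codes are read as character.

```r
library(readr)

# A small file whose first two lines are notes, not data
tmp <- tempfile(fileext = ".csv")
writeLines(c("Source: made-up example",
             "Extracted for illustration only",
             "IRIS,COM,P13_POP",
             "751010101,75101,1200",
             "751145501,75114,2300"),
           tmp)

# skip = 2 jumps over the notes; the next line provides the column names
toy <- read_delim(file = tmp, delim = ",", col_names = TRUE, skip = 2,
                  col_types = cols(IRIS = col_character(),
                                   COM = col_character(),
                                   P13_POP = col_double()),
                  locale = locale(encoding = "UTF-8"))
toy
```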

Encoding matters

4.2 Other types of data

There are other functions in tidyverse packages that can be used to read most other classic types of data, for example: read_csv (package readr), read_xls (package readxl), and read_dta, read_sas and read_sav (package haven). These functions work similarly.

4.3 The tibble in R

A tibble is an R object used to store data sets. It is the more modern version of a data frame. A tibble consists of rows and columns, where each column contains one of the five basic classes of data.

When our data set was imported from .csv, R recognized character and numeric columns. We will later learn how to change column types.

Each of the columns may be accessed by its name, e.g. df$IRIS, or by its number, e.g. df[,2].
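One detail worth knowing is that the two ways of accessing a column do not return the same kind of object on a tibble. The sketch below uses a made-up two-row tibble tb: $ and [[ ]] return a vector, while [ , 2] returns a one-column tibble.

```r
library(tibble)

tb <- tibble(IRIS = c("751010101", "751145501"),
             P13_POP = c(1200, 2300))

by_name <- tb$P13_POP      # a numeric vector
by_position <- tb[, 2]     # still a tibble, with one column
as_vector <- tb[[2]]       # a numeric vector again

is.data.frame(by_position) # TRUE
identical(by_name, as_vector)
```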

5 The dplyr pipe function

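The pipe operator, written %>%, is made available by the package dplyr in the tidyverse (it originates in the package magrittr). It passes the object on its left to the function on its right as that function's first argument, which lets us chain several transformations of a tibble in a readable way. A cheatsheet for the dplyr package can be found on the homepage of this course, here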

The pipe function

5.1 Selecting rows and columns: select and filter

We want to select only the columns IRIS, COM, TYP_IRIS, P13_POP and the age variables P13_POP0014 through to P13_POP75P. We want to select the rows that denote data from Paris only. To select columns, we use the function select. To select rows, we use the function filter.

iris <- df %>% 
  filter(DEP=="75") %>%
  select(IRIS, COM, TYP_IRIS, P13_POP, P13_POP0014:P13_POP75P)

The order of these two lines matters: if we select the columns first, the variable DEP is no longer available and we cannot use it to filter the rows. It is also possible to deselect variables by putting a minus sign before the variable, e.g. select(-COM).
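The same logic can be tried on a tiny made-up tibble, which makes it easy to check the result by eye. In the sketch below, toy and its values are invented; only the column names imitate the Insee data.

```r
library(dplyr)

toy <- tibble(DEP = c("75", "75", "92"),
              IRIS = c("A", "B", "C"),
              P13_POP = c(100, 0, 250),
              P13_POP0014 = c(20, 0, 50))

# keep the rows for Paris, then a range of columns
paris <- toy %>%
  filter(DEP == "75") %>%
  select(IRIS, P13_POP:P13_POP0014)

# several conditions in filter() must all hold; -DEP drops one column
paris_inhabited <- toy %>%
  filter(DEP == "75", P13_POP > 0) %>%
  select(-DEP)

paris
paris_inhabited
```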

5.1.1 Renaming columns

Columns can be renamed by using rename(new_name=old_name), or by integrating the new names into the select function, e.g. select(new_name_1=old_name_1, new_name_2=old_name_2).
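Both ways of renaming are sketched below on a made-up tibble. The difference is that rename keeps all the other columns, whereas select keeps only the columns it lists.

```r
library(dplyr)

toy <- tibble(IRIS = c("A", "B"),
              COM = c("75101", "75114"),
              P13_POP = c(100, 250))

# rename() changes one name and keeps every column
renamed <- toy %>%
  rename(population = P13_POP)

# select() renames and keeps only the listed columns
selected <- toy %>%
  select(iris_code = IRIS, population = P13_POP)

names(renamed)    # "IRIS" "COM" "population"
names(selected)   # "iris_code" "population"
```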

5.2 Mutating variables

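We now wish to convert all the population variables to percentages of the total population (stored as proportions between 0 and 1, in new columns with the suffix _pc), and the TYP_IRIS variable to a factor. To modify one, many or all columns, we use the functions mutate, mutate_at or mutate_all.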

iris <- df %>% 
  filter(DEP=="75") %>%
  select(IRIS, COM, TYP_IRIS, P13_POP, P13_POP0014:P13_POP75P) %>%
  mutate(TYP_IRIS = as.factor(TYP_IRIS)) %>%
  mutate_at(vars(P13_POP0014:P13_POP75P), funs(pc=./P13_POP))

5.2.1 Conditional mutations

We notice that there are some IRIS for which the population is 0. In these cases, when we divide by 0, we obtain the result NaN (not a number). We wish to convert these values to 0. We can use mutate_if to only mutate columns satisfying a particular condition, and we can use the ifelse function to replace NaN by 0. The three arguments of the ifelse function are:

Logical statement
Action to take if logical statement is true
Action to take if logical statement is false

iris <- df %>% 
  filter(DEP=="75") %>%
  select(IRIS, COM, TYP_IRIS, P13_POP, P13_POP0014:P13_POP75P) %>%
  mutate(TYP_IRIS = as.factor(TYP_IRIS)) %>%
  mutate_at(vars(P13_POP0014:P13_POP75P), funs(pc=./P13_POP)) %>%
  mutate_if(is.numeric, funs(ifelse(is.nan(.), 0, .)))
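In dplyr 1.0.0 and later, the function across() supersedes mutate_at and mutate_if, and funs() has been deprecated since dplyr 0.8.0. If the code above produces deprecation warnings on your installation, the following sketch shows one possible equivalent. It runs on a small made-up tibble (toy) rather than on the Insee data, and assumes dplyr 1.0.0 or later.

```r
library(dplyr)

toy <- tibble(IRIS = c("A", "B"),
              P13_POP = c(100, 0),
              P13_POP0014 = c(20, 0),
              P13_POP75P = c(10, 0))

toy_pc <- toy %>%
  # divide each age column by the total and store it under a new name
  mutate(across(P13_POP0014:P13_POP75P,
                ~ .x / P13_POP,
                .names = "{.col}_pc")) %>%
  # 0 / 0 gives NaN: replace it by 0 in every numeric column
  mutate(across(where(is.numeric), ~ ifelse(is.nan(.x), 0, .x)))

toy_pc
```

The .names argument reproduces the _pc suffix that funs(pc = ...) creates in the code above.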

5.2.2 Some basic string operations

Here we learn two simple functions for string variables, substr and paste0.

Say we wish to convert the column COM into a more readable string, e.g. instead of “75114”, we wish to write “Paris 14”. We use the function substr to extract from the 4th to the 5th position of the string, and paste0 to concatenate strings.

iris <- df %>% 
  filter(DEP=="75") %>%
  select(IRIS, COM, TYP_IRIS, P13_POP, P13_POP0014:P13_POP75P) %>%
  mutate(TYP_IRIS = as.factor(TYP_IRIS)) %>%
  mutate_at(vars(P13_POP0014:P13_POP75P), funs(pc=./P13_POP)) %>%
  mutate_if(is.numeric, funs(ifelse(is.nan(.), 0, .))) %>%
  mutate(name_arrd = substr(COM, 4, 5)) %>%
  mutate(name_arrd = paste0("Paris ", name_arrd))
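Because substr and paste0 work on whole vectors, the two steps can be checked outside the pipeline on a few made-up codes. The last line is an optional variation, not part of the code above: converting to integer removes the leading zero.

```r
COM <- c("75101", "75114", "75120")

# characters 4 to 5 hold the arrondissement number
arrd_number <- substr(COM, 4, 5)
arrd_number                       # "01" "14" "20"

name_arrd <- paste0("Paris ", arrd_number)
name_arrd                         # "Paris 01" "Paris 14" "Paris 20"

# optional: drop the leading zero
name_arrd_short <- paste0("Paris ", as.integer(arrd_number))
name_arrd_short                   # "Paris 1" "Paris 14" "Paris 20"
```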

5.3 Grouping and aggregating variables

We now wish to group the IRISes by arrondissement, in order to obtain aggregated statistics of the population by arrondissement. Using the function group_by, we can group the variables by COM, which indicates the arrondissement. We can use the function summarise_all, which works in the same way as mutate_all, to aggregate our data by group. After this aggregation, we need to ungroup our data frame.

arrd <- iris %>% 
  select(COM, P13_POP, P13_POP0014:P13_POP75P) %>%
  group_by(COM) %>%
  summarise_all(funs(sum(.))) %>%
  ungroup %>%
  mutate_at(vars(P13_POP0014:P13_POP75P), funs(pc=./P13_POP)) %>%
  mutate_if(is.numeric, funs(ifelse(is.nan(.), 0, .)))

The final two lines are the same as before.
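To see what the grouping step does, here is a sketch on three made-up IRIS belonging to two arrondissements. It uses summarise with across(), which in dplyr 1.0.0 and later plays the role of summarise_all; the tibble toy and its values are invented.

```r
library(dplyr)

toy <- tibble(COM = c("75101", "75101", "75114"),
              P13_POP = c(100, 50, 200),
              P13_POP75P = c(10, 5, 40))

toy_arrd <- toy %>%
  group_by(COM) %>%
  # one row per arrondissement, each column summed within the group
  summarise(across(everything(), sum)) %>%
  ungroup()

toy_arrd
```

The two IRIS of 75101 are collapsed into one row with a population of 150, and 75114 keeps its single row.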

5.4 Changing from wide to long, and long to wide

Our data is currently in wide format. To change it from wide to long format, we use the function gather, and to change it from long to wide format, we use the function spread.

long <- arrd %>%
  gather(key = population_variable, value = value, -COM)

wide <- long %>%
  spread(key = population_variable, value = value)
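In tidyr 1.0.0 and later, gather and spread are superseded by pivot_longer and pivot_wider. They still work, but the newer functions are the ones the tidyr documentation recommends. The sketch below shows the same round trip on a small made-up tibble (wide_toy), assuming a recent version of tidyr.

```r
library(tidyr)
library(tibble)

wide_toy <- tibble(COM = c("75101", "75114"),
                   P13_POP = c(150, 200),
                   P13_POP75P = c(15, 40))

# wide -> long: one row per arrondissement and variable
long_toy <- pivot_longer(wide_toy,
                         cols = -COM,
                         names_to = "population_variable",
                         values_to = "value")

# long -> wide: back to one column per variable
back_to_wide <- pivot_wider(long_toy,
                            names_from = population_variable,
                            values_from = value)

long_toy
back_to_wide
```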

5.5 Writing data

We can write data in .csv format using write_csv. We can also use .rds format (r dataset) in order to preserve the tibble attributes, such as which variables are factor variables.

iris <- df %>% 
  filter(DEP=="75") %>%
  select(IRIS, COM, TYP_IRIS, P13_POP, P13_POP0014:P13_POP75P) %>%
  mutate(TYP_IRIS = as.factor(TYP_IRIS)) %>%
  mutate_at(vars(P13_POP0014:P13_POP75P), funs(pc=./P13_POP)) %>%
  mutate_if(is.numeric, funs(ifelse(is.nan(.), 0, .))) %>%
  mutate(name_arrd = substr(COM, 4, 5)) %>%
  mutate(name_arrd = paste0("Paris ", name_arrd)) %>%
  write_csv("Output/iris.csv") %>%
  write_rds("Output/iris.rds") 

arrd <- iris %>% 
  select(COM, P13_POP, P13_POP0014:P13_POP75P) %>%
  group_by(COM) %>%
  summarise_all(funs(sum(.))) %>%
  ungroup %>%
  mutate_at(vars(P13_POP0014:P13_POP75P), funs(pc=./P13_POP)) %>%
  mutate_if(is.numeric, funs(ifelse(is.nan(.), 0, .))) %>%
  write_csv("Output/arrd.csv") %>%
  write_rds("Output/arrd.rds")
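The difference between the two formats can be checked with a round trip. The sketch below writes a made-up data frame to temporary files and reads it back: the factor column comes back as a factor from .rds, whereas .csv only stores text. The col_types argument is set here so that read_csv reads every column as character without guessing.

```r
library(readr)

toy <- data.frame(IRIS = c("A", "B"),
                  TYP_IRIS = factor(c("H", "A")))

csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")
write_csv(toy, csv_path)
write_rds(toy, rds_path)

from_csv <- read_csv(csv_path, col_types = cols(.default = col_character()))
from_rds <- read_rds(rds_path)

class(from_csv$TYP_IRIS)   # "character": the factor information is lost
class(from_rds$TYP_IRIS)   # "factor": preserved
```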

5.6 Joins

There are four key types of joins.

Function	Meaning
`left_join(a, b, by="x")`	Join matching rows from b to a
`right_join(a, b, by="x")`	Join matching rows from a to b
`inner_join(a, b, by="x")`	Join data retaining rows in both sets
`full_join(a, b, by="x")`	Join data retaining all rows

We will apply a join with geographical data, in order to display our variables on a map.
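The four joins differ only in which rows they keep, which is easiest to see on two small made-up tibbles that share some, but not all, values of x:

```r
library(dplyr)

a <- tibble(x = c("A", "B", "C"), pop = c(100, 200, 300))
b <- tibble(x = c("B", "C", "D"), area = c(1.5, 2.5, 3.5))

left  <- left_join(a, b, by = "x")    # rows A, B, C (area is NA for A)
right <- right_join(a, b, by = "x")   # rows B, C, D (pop is NA for D)
inner <- inner_join(a, b, by = "x")   # rows B, C only
full  <- full_join(a, b, by = "x")    # rows A, B, C, D

c(nrow(left), nrow(right), nrow(inner), nrow(full))   # 3 3 2 4
```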

5.6.1 Import geographical data

Shapefiles are a common format of geographical data. We can import them using the package sf, which is not part of the tidyverse, but follows the same syntax. We select only the variable corresponding to the IRIS code, and call this IRIS to match our other data set. We then apply a right_join to join the geographical data to the iris tibble that we have created, which keeps every row of iris.

irisshp <- read_sf(dsn = "Data/iris", layer = "CONTOURS-IRIS") %>%
  select(IRIS=CODE_IRIS) %>%
  right_join(iris, by="IRIS")

5.6.2 Plot data

In the next class, we will plot data in a much nicer way using ggplot2. However, for now, we will simply use the plot function.

We wish to plot a demography variable, such as the percentage of people over 75 years old, on a map of Paris. We select only the variable of interest then use the function plot.

iristoplot <- irisshp %>%
  # mutate(P13_POP75P_pc=ifelse(TYP_IRIS=="H", P13_POP75P_pc, NA)) %>%  ### optional line to exclude IRISes with no or few inhabitants
  select(P13_POP75P_pc) 

plot(iristoplot)

In order to save the plots, use the following code.

### to save plot use these two lines
# dev.copy(pdf, 'Output/age.pdf')
# dev.off()
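An alternative to copying the current plot is to open a pdf device first, draw the plot into it, and then close the device. The sketch below does this with a made-up plot and a temporary file so that it runs anywhere; in the exercise you would pass a path such as 'Output/age.pdf' and call plot(iristoplot) instead.

```r
# path in a temporary folder, for illustration only
out_file <- file.path(tempdir(), "example_plot.pdf")

pdf(out_file)                                   # open the pdf device
plot(1:10, (1:10)^2, main = "Example plot")     # draw into the file
dev.off()                                       # close the device and write the file

file.exists(out_file)
```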

The same plot by arrondissement is given by the following code.

arrdshp <- read_sf(dsn = "Data/arrondissements", layer = "arrondissements") %>%
  select(COM=c_arinsee) %>%
  mutate(COM=as.character(COM)) %>%
  left_join(arrd, by="COM") %>%
  select(P13_POP75P_pc)

plot(arrdshp)

Class 1: Introduction to data wrangling with the tidyverse