# swirl – R Programming – Lesson 5 – Missing Values

> library(swirl)

| Hi! Type swirl() when you are ready to begin.

> swirl()

| Welcome to swirl! Please sign in. If you’ve been here before, use the same name as you did then. If you are
| new, call yourself something unique.

What shall I call you? bernhardhack

| Please choose a course, or type 0 to exit swirl.

1: R Programming

2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

1: Basic Building Blocks 2: Workspace and Files 3: Sequences of Numbers

4: Vectors 5: Missing Values 6: Subsetting Vectors

7: Matrices and Data Frames 8: Logic 9: Functions

10: lapply and sapply 11: vapply and tapply 12: Looking at Data

13: Simulation 14: Dates and Times 15: Base Graphics

Selection: 5

| | 0%

| Missing values play an important role in statistics and data analysis. Often, missing values must not be
| ignored, but rather they should be carefully studied to see if there’s an underlying pattern or cause for
| their missingness.

…

|===== | 5%

| In R, NA is used to represent any value that is ‘not available’ or ‘missing’ (in the statistical sense). In
| this lesson, we’ll explore missing values further.

…

|========== | 10%

| Any operation involving NA generally yields NA as the result. To illustrate, let’s create a vector c(44, NA,
| 5, NA) and assign it to a variable x.

> x <- c(44, NA, 5, NA)
| Your dedication is inspiring!
|=============== | 15%
| Now, let's multiply x by 3.
> x*3

[1] 132 NA 15 NA

| Nice work!

|===================== | 20%

| Notice that the elements of the resulting vector that correspond with the NA values in x are also NA.

…
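
A short aside on the word "generally" in the message above. The sketch below reuses the lesson's `x` and shows NA propagating through arithmetic and summaries, how `na.rm = TRUE` drops missing values, and a few cases where R can determine the result even though one operand is NA. The variable names other than `x` are my own, chosen for illustration.

```r
# Same vector as in the lesson.
x <- c(44, NA, 5, NA)

# Arithmetic is element-wise, so each NA stays NA.
tripled <- x * 3

# Summaries are NA as well, unless missing values are dropped explicitly.
total_with_na    <- sum(x)                # NA
total_without_na <- sum(x, na.rm = TRUE)  # 49
mean_without_na  <- mean(x, na.rm = TRUE) # 24.5

# Exceptions: the result is known no matter what the missing value is.
power_zero  <- NA^0        # 1
or_true     <- NA | TRUE   # TRUE
and_false   <- NA & FALSE  # FALSE

print(tripled)
print(c(total_with_na, total_without_na, mean_without_na))
print(c(power_zero, or_true, and_false))
```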

|========================== | 25%

| To make things a little more interesting, let's create a vector containing 1000 draws from a standard normal
| distribution with y <- rnorm(1000).
> y <- rnorm(1000)
| You're the best!
|=============================== | 30%
| Next, let's create a vector containing 1000 NAs with z <- rep(NA, 1000).
> z <- rep(NA, 1000)
| Perseverance, that's the answer.
|==================================== | 35%
| Finally, let's select 100 elements at random from these 2000 values (combining y and z) such that we don't
| know how many NAs we'll wind up with or what positions they'll occupy in our final vector -- my_data <-
| sample(c(y, z), 100).
> my_data <- samplec(c(y, z), 100)
Error: could not find function "samplec"
> my_data <- sample(c(y, z), 100)
| Nice work!
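
Because `rnorm()` and `sample()` are random, the values and the number of NAs differ on every run, so your output will not match the one shown further down. A minimal sketch, assuming you want a repeatable result: call `set.seed()` first. The seed value 42 and the helper `make_data()` are my own choices, not part of the lesson.

```r
# Hypothetical helper: rebuilds the lesson's my_data from a given seed.
make_data <- function(seed) {
  set.seed(seed)              # fix the random number generator state
  y <- rnorm(1000)            # 1000 standard normal draws
  z <- rep(NA, 1000)          # 1000 missing values
  sample(c(y, z), 100)        # 100 elements drawn without replacement
}

my_data <- make_data(42)      # 42 is an arbitrary seed
my_data_again <- make_data(42)

# The same seed gives the same vector, NAs in the same positions.
print(identical(my_data, my_data_again))
print(length(my_data))
```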
|========================================= | 40%
| Let's first ask the question of where our NAs are located in our data. The is.na() function tells us whether
| each element of a vector is NA. Call is.na() on my_data and assign the result to my_na.
> my_ny <- is.na(my_data)
| Not quite! Try again. Or, type info() for more options.
| Assign the result of is.na(my_data) to the variable my_na.
> my_na <- is.na(my_data)
| You are amazing!
|============================================== | 45%
| Now, print my_na to see what you came up with.
> my_na

[1] FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

[19] FALSE FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE TRUE TRUE TRUE FALSE FALSE TRUE

[37] TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE FALSE FALSE TRUE TRUE

[55] FALSE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE TRUE

[73] TRUE FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE FALSE TRUE

[91] FALSE TRUE TRUE FALSE FALSE TRUE TRUE FALSE TRUE TRUE

| That’s the answer I was looking for.

|==================================================== | 50%

| Everywhere you see a TRUE, you know the corresponding element of my_data is NA. Likewise, everywhere you see a
| FALSE, you know the corresponding element of my_data is one of our random draws from the standard normal
| distribution.

…
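
A logical vector of 100 elements is hard to read by eye. As a small sketch on a made-up six-element vector (`small` is hypothetical, not from the lesson), `which()` turns the TRUEs into positions, negating the mask keeps only the observed values, and `anyNA()` answers the yes/no question directly.

```r
# Hypothetical vector standing in for my_data.
small <- c(0.5, NA, -1.2, NA, NA, 2.0)

is_missing <- is.na(small)          # FALSE TRUE FALSE TRUE TRUE FALSE

na_positions <- which(is_missing)   # 2 4 5
observed     <- small[!is_missing]  # 0.5 -1.2 2.0
has_missing  <- anyNA(small)        # TRUE

print(na_positions)
print(observed)
print(has_missing)
```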

|========================================================= | 55%

| In our previous discussion of logical operators, we introduced the `==` operator as a method of testing for
| equality between two objects. So, you might think the expression my_data == NA yields the same results as
| is.na(). Give it a try.

> my_data == NA

[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[37] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[73] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

| Your dedication is inspiring!

|============================================================== | 60%

| The reason you got a vector of all NAs is that NA is not really a value, but just a placeholder for a quantity
| that is not available. Therefore the logical expression is incomplete and R has no choice but to return a
| vector of the same length as my_data that contains all NAs.

…
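
To see the difference on something smaller than my_data, here is a sketch with a made-up three-element vector (`small` is hypothetical). Comparing anything with NA asks "is this equal to an unknown value?", and the answer is itself unknown, even for `NA == NA`. `is.na()` is the function that asks "is this missing?".

```r
# Hypothetical vector: two observed values and one missing value.
small <- c(1.5, NA, -0.3)

eq_result <- small == NA   # NA NA NA: every comparison is unknown
na_result <- is.na(small)  # FALSE TRUE FALSE: the test we actually want
na_vs_na  <- NA == NA      # NA: even two NAs cannot be called equal

print(eq_result)
print(na_result)
print(na_vs_na)
```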

|=================================================================== | 65%

| Don’t worry if that’s a little confusing. The key takeaway is to be cautious when using logical expressions
| anytime NAs might creep in, since a single NA value can derail the entire thing.

…
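
A few concrete ways an NA can derail a logical expression, sketched on a made-up vector (`small` is hypothetical). An NA in a logical index produces an NA in the result, `any()` returns NA when no element is TRUE but one is unknown, and `if (NA)` stops with an error ("missing value where TRUE/FALSE needed"), which is why the sketch guards with `isTRUE()` instead of running it.

```r
# Hypothetical vector with one missing value.
small <- c(1.5, NA, -0.3)

positive <- small > 0                    # TRUE NA FALSE

kept_with_na    <- small[positive]         # 1.5 NA: the NA index leaks through
kept_without_na <- small[which(positive)]  # 1.5: which() keeps only TRUEs

any_large       <- any(small > 2)                # NA: unknown because of the NA
any_large_known <- any(small > 2, na.rm = TRUE)  # FALSE

# isTRUE() is a safe guard for if(): it is FALSE for NA.
guard <- isTRUE(any_large)               # FALSE

print(kept_with_na)
print(kept_without_na)
print(c(any_large, any_large_known, guard))
```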

|======================================================================== | 70%

| So, back to the task at hand. Now that we have a vector, my_na, that has a TRUE for every NA and FALSE for
| every numeric value, we can compute the total number of NAs in our data.

…

|============================================================================= | 75%

| The trick is to recognize that underneath the surface, R represents TRUE as the number 1 and FALSE as the
| number 0. Therefore, if we take the sum of a bunch of TRUEs and FALSEs, we get the total number of TRUEs.

…
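
The coercion described above can be made visible with `as.integer()`. In this sketch (`flags` is a made-up vector), `sum()` counts the TRUEs and, by the same logic, `mean()` gives the proportion of TRUEs.

```r
# Hypothetical logical vector.
flags <- c(TRUE, FALSE, TRUE, TRUE)

as_numbers <- as.integer(flags)  # 1 0 1 1
n_true     <- sum(flags)         # 3
share_true <- mean(flags)        # 0.75

print(as_numbers)
print(n_true)
print(share_true)
```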

|================================================================================== | 80%

| Let’s give that a try here. Call the sum() function on my_na to count the total number of TRUEs in my_na, and
| thus the total number of NAs in my_data. Don’t assign the result to a new variable.

> sum(my_na)

[1] 62

| You are doing so well!

|======================================================================================== | 85%

| Pretty cool, huh? Finally, let’s take a look at the data to convince ourselves that everything ‘adds up’.
| Print my_data to the console.

> my:data

Error: object ‘my’ not found

> my_data

[1] -0.16629368 -0.42129496 NA NA NA -1.51308034 0.70290182 -0.43921687 NA
[10] 0.82686778 NA NA NA NA NA NA NA NA
[19] 1.00981458 -0.69336608 0.67911988 NA -1.15797177 NA -0.78783302 NA -2.17044504
[28] -0.39462761 NA -0.19135562 NA NA NA -0.80854687 -2.47246396 NA
[37] NA NA NA 0.74824429 0.28179468 -0.26805840 NA NA NA
[46] NA NA 1.02802970 -0.92091188 NA -0.14571645 -0.63697098 NA NA
[55] 0.63796534 NA NA NA NA NA -1.00566234 NA NA
[64] NA 1.03687648 NA NA 2.17104308 NA -0.46387743 NA NA
[73] NA 0.86804538 NA NA NA -0.64092080 -0.03314492 NA NA
[82] NA NA 0.02434736 NA 1.32072814 NA NA -0.36630120 NA
[91] 1.05074646 NA NA -1.92568166 -1.30438332 NA NA -0.49430828 NA
[100] NA

| Excellent job!
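
Counting NAs by eye in a printout like this is error-prone. A minimal sketch of letting R do the bookkeeping instead: the missing and observed counts must add up to the length of the vector. The data is regenerated here so the block runs on its own; the seed is an arbitrary assumption, so the counts will differ from the 62 NAs in the session above.

```r
# Rebuild a vector the same way the lesson does (seed is arbitrary).
set.seed(1)
my_data <- sample(c(rnorm(1000), rep(NA, 1000)), 100)

n_missing  <- sum(is.na(my_data))    # how many NAs
n_observed <- sum(!is.na(my_data))   # how many real draws

# Every element is either missing or observed, so this must be TRUE.
adds_up <- (n_missing + n_observed) == length(my_data)

print(c(missing = n_missing, observed = n_observed))
print(adds_up)
```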

|============================================================================================= | 90%

| Now that we’ve got NAs down pat, let’s look at a second type of missing value -- NaN, which stands for ‘not a
| number’. To generate NaN, try dividing (using a forward slash) 0 by 0 now.

> 0/0

[1] NaN

| You are doing so well!
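
One detail worth knowing when both kinds of missing value appear together: `is.na()` is TRUE for NA and for NaN, while `is.nan()` is TRUE only for NaN. A short sketch on a made-up vector (`values` is hypothetical):

```r
# Hypothetical vector: a number, an NA, and a NaN produced by 0/0.
values <- c(1, NA, 0 / 0)

na_mask  <- is.na(values)   # FALSE TRUE TRUE: NaN counts as missing
nan_mask <- is.nan(values)  # FALSE FALSE TRUE: only the real NaN

print(na_mask)
print(nan_mask)
```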

|================================================================================================== | 95%

| Let’s do one more, just for fun. In R, Inf stands for infinity. What happens if you subtract Inf from Inf?

> Inf-Inf

[1] NaN

| Excellent job!
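
A few more Inf cases, as a sketch for experimenting at the console. Dividing a nonzero number by zero gives a signed infinity, arithmetic with an undetermined result gives NaN, and `is.finite()` is a convenient single test that rejects Inf, -Inf, NaN and NA at once. The vector `mixed` is made up for illustration.

```r
pos_inf <- 1 / 0      # Inf
neg_inf <- -1 / 0     # -Inf
diff    <- Inf - Inf  # NaN: the result is undetermined
total   <- Inf + Inf  # Inf
product <- Inf * 0    # NaN

# Hypothetical vector mixing ordinary and special values.
mixed <- c(1, Inf, -Inf, NaN, NA)

finite_mask   <- is.finite(mixed)    # TRUE FALSE FALSE FALSE FALSE
infinite_mask <- is.infinite(mixed)  # FALSE TRUE TRUE FALSE FALSE

print(c(pos_inf, neg_inf, diff, total, product))
print(finite_mask)
print(infinite_mask)
```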

|=======================================================================================================| 100%

| Would you like to receive credit for completing this course on Coursera.org?
