> library(swirl)
| Hi! Type swirl() when you are ready to begin.
> swirl()
| Welcome to swirl! Please sign in. If you’ve been here before, use the same name as you did then. If you are
| new, call yourself something unique.
What shall I call you? bernhardhack
| Please choose a course, or type 0 to exit swirl.
1: R Programming
2: Take me to the swirl course repository!
Selection: 1
| Please choose a lesson, or type 0 to return to course menu.
 1: Basic Building Blocks      2: Workspace and Files   3: Sequences of Numbers
 4: Vectors                    5: Missing Values        6: Subsetting Vectors
 7: Matrices and Data Frames   8: Logic                 9: Functions
10: lapply and sapply         11: vapply and tapply    12: Looking at Data
13: Simulation                14: Dates and Times      15: Base Graphics
Selection: 5
| | 0%
| Missing values play an important role in statistics and data analysis. Often, missing values must not be
| ignored, but rather they should be carefully studied to see if there’s an underlying pattern or cause for
| their missingness.
…
|===== | 5%
| In R, NA is used to represent any value that is ‘not available’ or ‘missing’ (in the statistical sense). In
| this lesson, we’ll explore missing values further.
…
|========== | 10%
| Any operation involving NA generally yields NA as the result. To illustrate, let’s create a vector c(44, NA,
| 5, NA) and assign it to a variable x.
> x <- c(44, NA, 5, NA)
| Your dedication is inspiring!
|=============== | 15%
| Now, let's multiply x by 3.
> x*3
[1] 132 NA 15 NA
| Nice work!
|===================== | 20%
| Notice that the elements of the resulting vector that correspond with the NA values in x are also NA.
…
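Note: the propagation rule can be checked beyond multiplication. The sketch below reuses the lesson's vector x; the remaining calls are base R, and the last three lines show a few cases behind the word "generally", where the result is determined no matter what the missing value would have been.

```r
x <- c(44, NA, 5, NA)

# Arithmetic and comparisons propagate NA element by element.
x + 1                   # 45 NA  6 NA
x > 10                  # TRUE NA FALSE NA

# A summary becomes NA as soon as one element is missing ...
sum(x)                  # NA
mean(x)                 # NA

# ... unless missing values are dropped explicitly with na.rm = TRUE.
sum(x, na.rm = TRUE)    # 49
mean(x, na.rm = TRUE)   # 24.5

# Exceptions: the answer does not depend on the unknown value.
NA^0                    # 1
NA | TRUE               # TRUE
NA & FALSE              # FALSE
```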
|========================== | 25%
| To make things a little more interesting, let's create a vector containing 1000 draws from a standard normal
| distribution with y <- rnorm(1000).
> y <- rnorm(1000)
| You're the best!
|=============================== | 30%
| Next, let's create a vector containing 1000 NAs with z <- rep(NA, 1000).
> z <- rep(NA, 1000)
| Perseverance, that's the answer.
|==================================== | 35%
| Finally, let's select 100 elements at random from these 2000 values (combining y and z) such that we don't
| know how many NAs we'll wind up with or what positions they'll occupy in our final vector -- my_data <-
| sample(c(y, z), 100).
> my_data <- samplec(c(y, z), 100)
Error: could not find function "samplec"
> my_data <- sample(c(y, z), 100)
| Nice work!
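Note: because rnorm() and sample() are random, every run of this step gives a different my_data. A minimal sketch for making the step repeatable is shown below; the call to set.seed() and the seed value 42 are assumptions added for illustration and are not part of the lesson.

```r
set.seed(42)                       # arbitrary seed, for reproducibility only

y <- rnorm(1000)                   # 1000 standard normal draws
z <- rep(NA, 1000)                 # 1000 missing values
my_data <- sample(c(y, z), 100)    # 100 elements, drawn without replacement

length(my_data)                    # 100

# rep(NA, 1000) is a logical vector; c(y, z) coerces its NAs to numeric NAs.
class(z)                           # "logical"
typeof(c(y, z))                    # "double"
```

sample() draws without replacement by default, so each of the 2000 combined elements can appear at most once in my_data.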
|========================================= | 40%
| Let's first ask the question of where our NAs are located in our data. The is.na() function tells us whether
| each element of a vector is NA. Call is.na() on my_data and assign the result to my_na.
> my_ny <- is.na(my_data)
| Not quite! Try again. Or, type info() for more options.
| Assign the result of is.na(my_data) to the variable my_na.
> my_na <- is.na(my_data)
| You are amazing!
|============================================== | 45%
| Now, print my_na to see what you came up with.
> my_na
[1] FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[19] FALSE FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE TRUE TRUE TRUE FALSE FALSE TRUE
[37] TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE FALSE FALSE TRUE TRUE
[55] FALSE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE TRUE
[73] TRUE FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE FALSE TRUE
[91] FALSE TRUE TRUE FALSE FALSE TRUE TRUE FALSE TRUE TRUE
| That’s the answer I was looking for.
|==================================================== | 50%
| Everywhere you see a TRUE, you know the corresponding element of my_data is NA. Likewise, everywhere you see a
| FALSE, you know the corresponding element of my_data is one of our random draws from the standard normal
| distribution.
…
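Note: a logical vector of length 100 is hard to read by eye. The sketch below uses a small hypothetical vector v (not from the lesson) to show a few base R helpers that build on is.na().

```r
v <- c(1.5, NA, -0.3, NA, NA, 2.0)   # hypothetical example vector

is.na(v)           # FALSE  TRUE FALSE  TRUE  TRUE FALSE
which(is.na(v))    # 2 4 5   (positions of the missing values)
anyNA(v)           # TRUE    (is at least one value missing?)
v[!is.na(v)]       # 1.5 -0.3  2.0   (keep only the observed values)
```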
|========================================================= | 55%
| In our previous discussion of logical operators, we introduced the `==` operator as a method of testing for
| equality between two objects. So, you might think the expression my_data == NA yields the same results as
| is.na(). Give it a try.
> my_data == NA
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[37] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[73] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
| Your dedication is inspiring!
|============================================================== | 60%
| The reason you got a vector of all NAs is that NA is not really a value, but just a placeholder for a quantity
| that is not available. Therefore the logical expression is incomplete and R has no choice but to return a
| vector of the same length as my_data that contains all NAs.
…
|=================================================================== | 65%
| Don’t worry if that’s a little confusing. The key takeaway is to be cautious when using logical expressions
| anytime NAs might creep in, since a single NA value can derail the entire thing.
…
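Note: the same caution applies when a comparison is used for subsetting. The sketch below uses a small hypothetical vector v (not from the lesson) to show how an NA in a logical index leaks into the result, and two common ways to avoid that.

```r
v <- c(3, NA, 7)          # hypothetical example vector

NA == NA                  # NA: two unknown values cannot be compared
v == NA                   # NA NA NA
v == 3                    # TRUE NA FALSE

# An NA in a logical index produces an NA in the result.
v[v == 3]                 # 3 NA

# which() drops NAs from the index; alternatively, test is.na() first.
v[which(v == 3)]          # 3
v[!is.na(v) & v == 3]     # 3
```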
|======================================================================== | 70%
| So, back to the task at hand. Now that we have a vector, my_na, that has a TRUE for every NA and FALSE for
| every numeric value, we can compute the total number of NAs in our data.
…
|============================================================================= | 75%
| The trick is to recognize that underneath the surface, R represents TRUE as the number 1 and FALSE as the
| number 0. Therefore, if we take the sum of a bunch of TRUEs and FALSEs, we get the total number of TRUEs.
…
|================================================================================== | 80%
| Let’s give that a try here. Call the sum() function on my_na to count the total number of TRUEs in my_na, and
| thus the total number of NAs in my_data. Don’t assign the result to a new variable.
> sum(my_na)
[1] 62
| You are doing so well!
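Note: the counting trick works because logical values are coerced to 0 and 1 in arithmetic. The sketch below uses a small hypothetical vector flags (not from the lesson); the same calls could be applied to my_na.

```r
flags <- c(TRUE, FALSE, TRUE, TRUE)   # hypothetical example vector

as.integer(flags)   # 1 0 1 1
sum(flags)          # 3     (number of TRUEs)
sum(!flags)         # 1     (number of FALSEs)
mean(flags)         # 0.75  (proportion of TRUEs)
```

Applied to the session above, mean(my_na) would give the share of missing values, which is 0.62 for 62 NAs out of 100 elements.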
|======================================================================================== | 85%
| Pretty cool, huh? Finally, let’s take a look at the data to convince ourselves that everything ‘adds up’.
| Print my_data to the console.
> my:data
Error: object ‘my’ not found
> my_data
[1] -0.16629368 -0.42129496 NA NA NA -1.51308034 0.70290182 -0.43921687 NA
[10] 0.82686778 NA NA NA NA NA NA NA NA
[19] 1.00981458 -0.69336608 0.67911988 NA -1.15797177 NA -0.78783302 NA -2.17044504
[28] -0.39462761 NA -0.19135562 NA NA NA -0.80854687 -2.47246396 NA
[37] NA NA NA 0.74824429 0.28179468 -0.26805840 NA NA NA
[46] NA NA 1.02802970 -0.92091188 NA -0.14571645 -0.63697098 NA NA
[55] 0.63796534 NA NA NA NA NA -1.00566234 NA NA
[64] NA 1.03687648 NA NA 2.17104308 NA -0.46387743 NA NA
[73] NA 0.86804538 NA NA NA -0.64092080 -0.03314492 NA NA
[82] NA NA 0.02434736 NA 1.32072814 NA NA -0.36630120 NA
[91] 1.05074646 NA NA -1.92568166 -1.30438332 NA NA -0.49430828 NA
[100] NA
| Excellent job!
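Note: the printout above shows 38 numeric values, which together with the 62 NAs reported by sum(my_na) accounts for all 100 elements. The same check can be done in code rather than by eye; the sketch below uses a small hypothetical vector v (not from the lesson).

```r
v <- c(0.8, NA, NA, -1.2, NA)        # hypothetical example vector

n_missing <- sum(is.na(v))           # 3
n_present <- sum(!is.na(v))          # 2

# Missing and observed counts must add up to the vector length.
n_missing + n_present == length(v)   # TRUE
```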
|============================================================================================= | 90%
| Now that we’ve got NAs down pat, let’s look at a second type of missing value -- NaN, which stands for ‘not a
| number’. To generate NaN, try dividing (using a forward slash) 0 by 0 now.
> 0/0
[1] NaN
| You are doing so well!
|================================================================================================== | 95%
| Let’s do one more, just for fun. In R, Inf stands for infinity. What happens if you subtract Inf from Inf?
> Inf-Inf
[1] NaN
| Excellent job!
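Note: NA, NaN and Inf are distinguished by different base R test functions. The sketch below is an illustration added to the transcript and uses only base R calls.

```r
# NaN counts as missing for is.na(), but NA is not NaN.
is.na(NaN)                      # TRUE
is.nan(NA)                      # FALSE
is.nan(0 / 0)                   # TRUE

# Dividing a nonzero number by zero gives an infinity, not NaN.
1 / 0                           # Inf
-1 / 0                          # -Inf
Inf / Inf                       # NaN

# is.finite() is FALSE for NA, NaN and both infinities.
is.finite(c(1, NA, NaN, Inf))   # TRUE FALSE FALSE FALSE
```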
|=======================================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?