Swirl – R porgramming – Lesson 7 – Matrices and Data Frames

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

1: Basic Building Blocks 2: Workspace and Files 3: Sequences of Numbers
4: Vectors 5: Missing Values 6: Subsetting Vectors
7: Matrices and Data Frames 8: Logic 9: Functions
10: lapply and sapply 11: vapply and tapply 12: Looking at Data
13: Simulation 14: Dates and Times 15: Base Graphics

Selection: 7
| | 0%

| In this lesson, we’ll cover matrices and data frames. Both represent ‘rectangular’ data types, meaning that
| they are used to store tabular data, with rows and columns.


|=== | 3%
| The main difference, as you’ll see, is that matrices can only contain a single class of data, while data
| frames can consist of many different classes of data.


|====== | 6%
| Let’s create a vector containing the numbers 1 through 20 using the `:` operator. Store the result in a
| variable called my_vector.

> my_vector <- c(1:20) | Give it another try. Or, type info() for more options. | You learned about the `:` operator in the lesson on sequences. If you wanted to create a vector containing the | numbers 1, 2, and 3 (in that order), you could use either c(1, 2, 3) or 1:3. In this case, we want the numbers | 1 through 20 stored in a variable called my_vector. Also, remember that you don't need the c() function when | using `:`. > my_vector(1:20)
Error: could not find function “my_vector”
> my_vector <- 1:20 | That's the answer I was looking for. |========= | 8% | View the contents of the vector you just created. > my_vector
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

| Excellent work!
|=========== | 11%
| The dim() function tells us the ‘dimensions’ of an object. What happens if we do dim(my_vector)? Give it a
| try.

> dim(my_vector)
NULL

| You nailed it! Good job!
|============== | 14%
| Clearly, that’s not very helpful! Since my_vector is a vector, it doesn’t have a `dim` attribute (so it’s just
| NULL), but we can find its length using the length() function. Try that now.

> length(my_vector)
[1] 20

| All that hard work is paying off!
|================= | 17%
| Ah! That’s what we wanted. But, what happens if we give my_vector a `dim` attribute? Let’s give it a try. Type
| dim(my_vector) <- c(4, 5). > dim(my_vector) <- c(4, 5) | Nice work! |==================== | 19% | It's okay if that last command seemed a little strange to you. It should! The dim() function allows you to get | OR set the `dim` attribute for an R object. In this case, we assigned the value c(4, 5) to the `dim` attribute | of my_vector. ... |======================= | 22% | Use dim(my_vector) to confirm that we've set the `dim` attribute correctly. > dim(my_vector)
[1] 4 5

| All that hard work is paying off!
|========================== | 25%
| Another way to see this is by calling the attributes() function on my_vector. Try it now.

> attributes(my_vector)
$dim
[1] 4 5

| That’s correct!
|============================= | 28%
| Just like in math class, when dealing with a 2-dimensional object (think rectangular table), the first number
| is the number of rows and the second is the number of columns. Therefore, we just gave my_vector 4 rows and 5
| columns.


|=============================== | 31%
| But, wait! That doesn’t sound like a vector any more. Well, it’s not. Now it’s a matrix. View the contents of
| my_vector now to see what it looks like.

> my_vector
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20

| All that practice is paying off!
|================================== | 33%
| Now, let’s confirm it’s actually a matrix by using the class() function. Type class(my_vector) to see what I
| mean.

> class(my_vector)
[1] “matrix”

| Your dedication is inspiring!
|===================================== | 36%
| Sure enough, my_vector is now a matrix. We should store it in a new variable that helps us remember what it
| is. Store the value of my_vector in a new variable called my_matrix.

> my_matrix <- my_vector | You're the best! |======================================== | 39% | The example that we've used so far was meant to illustrate the point that a matrix is simply an atomic vector | with a dimension attribute. A more direct method of creating the same matrix uses the matrix() function. ... |=========================================== | 42% | Bring up the help file for the matrix() function now using the `?` function. > ?matric()
Error in .helpForCall(topicExpr, parent.frame()) :
no methods for ‘matric’ and no documentation for it as a function
> ?matrix()
starting httpd help server … done

| Almost! Try again. Or, type info() for more options.

| The command ?matrix will do the trick.

> ?matrix

| That’s a job well done!
|============================================== | 44%
| Now, look at the documentation for the matrix function and see if you can figure out how to create a matrix
| containing the same numbers (1-20) and dimensions (4 rows, 5 columns) by calling the matrix() function. Store
| the result in a variable called my_matrix2.

> my_matrix2 <- matrix(data = 1:20, nrow = 4, ncol = 5, byrow = FALSE) | You got it right! |================================================= | 47% | Finally, let's confirm that my_matrix and my_matrix2 are actually identical. The identical() function will | tell us if its first two arguments are the same. Try it out. > identical(my_matrix, mymatrix2)
Error in identical(my_matrix, mymatrix2) : object ‘mymatrix2’ not found
> identical(my_matrix, my_matrix2)
[1] TRUE

| Keep working like that and you’ll get there!
|==================================================== | 50%
| Now, imagine that the numbers in our table represent some measurements from a clinical experiment, where each
| row represents one patient and each column represents one variable for which measurements were taken.


|====================================================== | 53%
| We may want to label the rows, so that we know which numbers belong to each patient in the experiment. One way
| to do this is to add a column to the matrix, which contains the names of all four people.


|========================================================= | 56%
| Let’s start by creating a character vector containing the names of our patients — Bill, Gina, Kelly, and
| Sean. Remember that double quotes tell R that something is a character string. Store the result in a variable
| called patients.

> patients <- c("Bill", "Gina", "Kelly", "Sean") | You got it right! |============================================================ | 58% | Now we'll use the cbind() function to 'combine columns'. Don't worry about storing the result in a new | variable. Just call cbind() with two arguments -- the patients vector and my_matrix. > cbind(pateints, my_matrix)
Error in cbind(pateints, my_matrix) : object ‘pateints’ not found
> cbind(patients, my_matrix)
patients
[1,] “Bill” “1” “5” “9” “13” “17”
[2,] “Gina” “2” “6” “10” “14” “18”
[3,] “Kelly” “3” “7” “11” “15” “19”
[4,] “Sean” “4” “8” “12” “16” “20”

| All that practice is paying off!
|=============================================================== | 61%
| Something is fishy about our result! It appears that combining the character vector with our matrix of numbers
| caused everything to be enclosed in double quotes. This means we’re left with a matrix of character strings,
| which is no good.


|================================================================== | 64%
| If you remember back to the beginning of this lesson, I told you that matrices can only contain ONE class of
| data. Therefore, when we tried to combine a character vector with a numeric matrix, R was forced to ‘coerce’
| the numbers to characters, hence the double quotes.


|===================================================================== | 67%
| This is called ‘implicit coercion’, because we didn’t ask for it. It just happened. But why didn’t R just
| convert the names of our patients to numbers? I’ll let you ponder that question on your own.


|======================================================================== | 69%
| So, we’re still left with the question of how to include the names of our patients in the table without
| destroying the integrity of our numeric data. Try the following — my_data <- data.frame(patients, my_matrix) > my_data <- data.frame(patients, my_matrix) | That's correct! |========================================================================== | 72% | Now view the contents of my_data to see what we've come up with. > my_data
patients X1 X2 X3 X4 X5
1 Bill 1 5 9 13 17
2 Gina 2 6 10 14 18
3 Kelly 3 7 11 15 19
4 Sean 4 8 12 16 20

| Your dedication is inspiring!
|============================================================================= | 75%
| It looks like the data.frame() function allowed us to store our character vector of names right alongside our
| matrix of numbers. That’s exactly what we were hoping for!


|================================================================================ | 78%
| Behind the scenes, the data.frame() function takes any number of arguments and returns a single object of
| class `data.frame` that is composed of the original objects.


|=================================================================================== | 81%
| Let’s confirm this by calling the class() function on our newly created data frame.

> class(my_data)
[1] “data.frame”

| You are amazing!
|====================================================================================== | 83%
| It’s also possible to assign names to the individual rows and columns of a data frame, which presents another
| possible way of determining which row of values in our table belongs to each patient.


|========================================================================================= | 86%
| However, since we’ve already solved that problem, let’s solve a different problem by assigning names to the
| columns of our data frame so that we know what type of measurement each column represents.


|============================================================================================ | 89%
| Since we have six columns (including patient names), we’ll need to first create a vector containing one
| element for each column. Create a character vector called cnames that contains the following values (in order)
| — “patient”, “age”, “weight”, “bp”, “rating”, “test”.

> names <- c("patient", "age", "weight", "bp", "rating", "test") | You're close...I can feel it! Try it again. Or, type info() for more options. | Make sure all of the names are lowercase, surrounded by double quotes, and separated with commas. Don't forget | to use the c() function. > cnames <- c("patient", "age", "weight", "bp", "rating", "test") | You nailed it! Good job! |============================================================================================== | 92% | Now, use the colnames() function to set the `colnames` attribute for our data frame. This is similar to the | way we used the dim() function earlier in this lesson. > colnames(my_data)
[1] “patients” “X1” “X2” “X3” “X4” “X5”

| You almost had it, but not quite. Try again. Or, type info() for more options.

| Try colnames(my_data) <- cnames. > colnames(my_data) <- cnames | You are really on a roll! |================================================================================================= | 94% | Let's see if that got the job done. Print the contents of my_data. > my_data
patient age weight bp rating test
1 Bill 1 5 9 13 17
2 Gina 2 6 10 14 18
3 Kelly 3 7 11 15 19
4 Sean 4 8 12 16 20

| That’s correct!
|==================================================================================================== | 97%
| In this lesson, you learned the basics of working with two very important and common data structures —
| matrices and data frames. There’s much more to learn and we’ll be covering more advanced topics, particularly
| with respect to data frames, in future lessons.


|=======================================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: Yes
2: No

Swirl – r programming – Lesson 6 – Subsetting vectors

1: R Programming
2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

1: Basic Building Blocks 2: Workspace and Files 3: Sequences of Numbers
4: Vectors 5: Missing Values 6: Subsetting Vectors
7: Matrices and Data Frames 8: Logic 9: Functions
10: lapply and sapply 11: vapply and tapply 12: Looking at Data
13: Simulation 14: Dates and Times 15: Base Graphics

Selection: 6
| | 0%

| In this lesson, we’ll see how to extract elements from a vector based on some conditions that we specify.


|=== | 3%
| For example, we may only be interested in the first 20 elements of a vector, or only the elements that are not
| NA, or only those that are positive or correspond to a specific variable of interest. By the end of this
| lesson, you’ll know how to handle each of these scenarios.


|===== | 5%
| I’ve created for you a vector called x that contains a random ordering of 20 numbers (from a standard normal
| distribution) and 20 NAs. Type x now to see what it looks like.

> x
[1] NA NA NA -0.72353476 -0.89198458 NA NA -0.23626102 NA
[10] NA 1.07340403 -0.34040992 1.20108728 -0.67248237 NA NA NA NA
[19] 0.35705287 2.57339120 0.30426816 NA 1.07870859 0.04976609 NA NA 0.93535528
[28] 0.73660119 NA 1.26329875 0.79243764 0.75671912 0.67751338 NA NA NA
[37] NA NA -0.33559365 1.29938969

| You nailed it! Good job!
|======== | 8%
| The way you tell R that you want to select some particular elements (i.e. a ‘subset’) from a vector is by
| placing an ‘index vector’ in square brackets immediately following the name of the vector.


|=========== | 10%
| For a simple example, try x[1:10] to view the first ten elements of x.

> x[1:10]
[1] NA NA NA -0.7235348 -0.8919846 NA NA -0.2362610 NA
[10] NA

| You got it!
|============= | 13%
| Index vectors come in four different flavors — logical vectors, vectors of positive integers, vectors of
| negative integers, and vectors of character strings — each of which we’ll cover in this lesson.


|================ | 15%
| Let’s start by indexing with logical vectors. One common scenario when working with real-world data is that we
| want to extract all elements of a vector that are not NA (i.e. missing data). Recall that is.na(x) yields a
| vector of logical values the same length as x, with TRUEs corresponding to NA values in x and FALSEs
| corresponding to non-NA values in x.


|================== | 18%
| What do you think x[is.na(x)] will give you?

1: A vector with no NAs
2: A vector of length 0
3: A vector of TRUEs and FALSEs
4: A vector of all NAs

Selection: 4

| Perseverance, that’s the answer.
|===================== | 21%
| Prove it to yourself by typing x[is.na(x)].

> x[is.na(x)]
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

| Keep working like that and you’ll get there!
|======================== | 23%
| Recall that `!` gives us the negation of a logical expression, so !is.na(x) can be read as ‘is not NA’.
| Therefore, if we want to create a vector called y that contains all of the non-NA values from x, we can use y
| <- x[!is.na(x)]. Give it a try. > y <- x[!is.na(x)] | Keep up the great work! |========================== | 26% | Print y to the console. > y
[1] -0.72353476 -0.89198458 -0.23626102 1.07340403 -0.34040992 1.20108728 -0.67248237 0.35705287 2.57339120
[10] 0.30426816 1.07870859 0.04976609 0.93535528 0.73660119 1.26329875 0.79243764 0.75671912 0.67751338
[19] -0.33559365 1.29938969

| That’s a job well done!
|============================= | 28%
| Now that we’ve isolated the non-missing values of x and put them in y, we can subset y as we please.


|================================ | 31%
| Recall that the expression y > 0 will give us a vector of logical values the same length as y, with TRUEs
| corresponding to values of y that are greater than zero and FALSEs corresponding to values of y that are less
| than or equal to zero. What do you think y[y > 0] will give you?

1: A vector of all NAs
2: A vector of length 0
3: A vector of all the positive elements of y
4: A vector of TRUEs and FALSEs
5: A vector of all the negative elements of y

Selection: 3

| You are doing so well!
|================================== | 33%
| Type y[y > 0] to see that we get all of the positive elements of y, which are also the positive elements of
| our original vector x.

> y[y > 0]
[1] 1.07340403 1.20108728 0.35705287 2.57339120 0.30426816 1.07870859 0.04976609 0.93535528 0.73660119
[10] 1.26329875 0.79243764 0.75671912 0.67751338 1.29938969

| Your dedication is inspiring!
|===================================== | 36%
| You might wonder why we didn’t just start with x[x > 0] to isolate the positive elements of x. Try that now to
| see why.

> x[x > 0]
[1] NA NA NA NA NA NA NA 1.07340403 1.20108728
[10] NA NA NA NA 0.35705287 2.57339120 0.30426816 NA 1.07870859
[19] 0.04976609 NA NA 0.93535528 0.73660119 NA 1.26329875 0.79243764 0.75671912
[28] 0.67751338 NA NA NA NA NA 1.29938969

| Perseverance, that’s the answer.
|======================================== | 38%
| Since NA is not a value, but rather a placeholder for an unknown quantity, the expression NA > 0 evaluates to
| NA. Hence we get a bunch of NAs mixed in with our positive numbers when we do this.


|========================================== | 41%
| Combining our knowledge of logical operators with our new knowledge of subsetting, we could do this —
| x[!is.na(x) & x > 0]. Try it out.

> x[!is.na(x) & x > 0]
[1] 1.07340403 1.20108728 0.35705287 2.57339120 0.30426816 1.07870859 0.04976609 0.93535528 0.73660119
[10] 1.26329875 0.79243764 0.75671912 0.67751338 1.29938969

| You nailed it! Good job!
|============================================= | 44%
| In this case, we request only values of x that are both non-missing AND greater than zero.


|================================================ | 46%
| I’ve already shown you how to subset just the first ten values of x using x[1:10]. In this case, we’re
| providing a vector of positive integers inside of the square brackets, which tells R to return only the
| elements of x numbered 1 through 10.


|================================================== | 49%
| Many programming languages use what’s called ‘zero-based indexing’, which means that the first element of a
| vector is considered element 0. R uses ‘one-based indexing’, which (you guessed it!) means the first element
| of a vector is considered element 1.


|===================================================== | 51%
| Can you figure out how we’d subset the 3rd, 5th, and 7th elements of x? Hint — Use the c() function to
| specify the element numbers as a numeric vector.

> x[c(3, 5, 7)]
[1] NA -0.8919846 NA

| You’re the best!
|======================================================= | 54%
| It’s important that when using integer vectors to subset our vector x, we stick with the set of indexes {1, 2,
| …, 40} since x only has 40 elements. What happens if we ask for the zeroth element of x (i.e. x[0])? Give it
| a try.

> x[0]
numeric(0)

| You are doing so well!
|========================================================== | 56%
| As you might expect, we get nothing useful. Unfortunately, R doesn’t prevent us from doing this. What if we
| ask for the 3000th element of x? Try it out.

> x[300]
[1] NA

| Not quite, but you’re learning! Try again. Or, type info() for more options.

| Request the 3000th element of x (which does not exist) with x[3000].

> x[3000]
[1] NA

| Excellent job!
|============================================================= | 59%
| Again, nothing useful, but R doesn’t prevent us from asking for it. This should be a cautionary tale. You
| should always make sure that what you are asking for is within the bounds of the vector you’re working with.


|=============================================================== | 62%
| What if we’re interested in all elements of x EXCEPT the 2nd and 10th? It would be pretty tedious to construct
| a vector containing all numbers 1 through 40 EXCEPT 2 and 10.


|================================================================== | 64%
| Luckily, R accepts negative integer indexes. Whereas x[c(2, 10)] gives us ONLY the 2nd and 10th elements of x,
| x[c(-2, -10)] gives us all elements of x EXCEPT for the 2nd and 10 elements. Try x[c(-2, -10)] now to see
| this.

> x[c(-2, -10)]
[1] NA NA -0.72353476 -0.89198458 NA NA -0.23626102 NA 1.07340403
[10] -0.34040992 1.20108728 -0.67248237 NA NA NA NA 0.35705287 2.57339120
[19] 0.30426816 NA 1.07870859 0.04976609 NA NA 0.93535528 0.73660119 NA
[28] 1.26329875 0.79243764 0.75671912 0.67751338 NA NA NA NA NA
[37] -0.33559365 1.29938969

| You got it right!
|===================================================================== | 67%
| A shorthand way of specifying multiple negative numbers is to put the negative sign out in front of the vector
| of positive numbers. Type x[-c(2, 10)] to get the exact same result.

> x[-c(2, 10)]
[1] NA NA -0.72353476 -0.89198458 NA NA -0.23626102 NA 1.07340403
[10] -0.34040992 1.20108728 -0.67248237 NA NA NA NA 0.35705287 2.57339120
[19] 0.30426816 NA 1.07870859 0.04976609 NA NA 0.93535528 0.73660119 NA
[28] 1.26329875 0.79243764 0.75671912 0.67751338 NA NA NA NA NA
[37] -0.33559365 1.29938969

| You are doing so well!
|======================================================================= | 69%
| So far, we’ve covered three types of index vectors — logical, positive integer, and negative integer. The
| only remaining type requires us to introduce the concept of ‘named’ elements.


|========================================================================== | 72%
| Create a numeric vector with three named elements using vect <- c(foo = 11, bar = 2, norf = NA). > vect <- c(foo = 11, bar = 2, norf = NA) | You are quite good my friend! |============================================================================= | 74% | When we print vect to the console, you'll see that each element has a name. Try it out. > vect
foo bar norf
11 2 NA

| You are really on a roll!
|=============================================================================== | 77%
| We can also get the names of vect by passing vect as an argument to the names() function. Give that a try.

> names(vect)
[1] “foo” “bar” “norf”

| Excellent job!
|================================================================================== | 79%
| Alternatively, we can create an unnamed vector vect2 with c(11, 2, NA). Do that now.

> vect <- c(11, 2, NA) | Not quite! Try again. Or, type info() for more options. | Create an ordinary (unnamed) vector called vect2 that contains c(11, 2, NA). > vect2 <- c(11, 2, NA) | You got it right! |===================================================================================== | 82% | Then, we can add the `names` attribute to vect2 after the fact with names(vect2) <- c("foo", "bar", "norf"). | Go ahead. > names(vect2) <- c("foo", "bar", "norf") | That's the answer I was looking for. |======================================================================================= | 85% | Now, let's check that vect and vect2 are the same by passing them as arguments to the identical() function. > idenitical(vect, vect2)
Error: could not find function “idenitical”
> identical(vect, vect2)
[1] TRUE

| You’re the best!
|========================================================================================== | 87%
| Indeed, vect and vect2 are identical named vectors.


|============================================================================================ | 90%
| Now, back to the matter of subsetting a vector by named elements. Which of the following commands do you think
| would give us the second element of vect?

1: vect[“2”]
2: vect[bar]
3: vect[“bar”]

Selection: 1

| Not quite right, but keep trying.

| If we want the element named “bar” (i.e. the second element of vect), which command would get us that?

1: vect[“bar”]
2: vect[bar]
3: vect[“2”]

Selection: 2

| Try again. Getting it right on the first try is boring anyway!

| If we want the element named “bar” (i.e. the second element of vect), which command would get us that?

1: vect[“2”]
2: vect[“bar”]
3: vect[bar]

Selection: 2

| Excellent work!
|=============================================================================================== | 92%
| Now, try it out.

> vect[“bar”]
bar
2

| You are quite good my friend!
|================================================================================================== | 95%
| Likewise, we can specify a vector of names with vect[c(“foo”, “bar”)]. Try it out.

> vect[c(“foo”, “bar”)]
foo bar
11 2

| You are really on a roll!
|==================================================================================================== | 97%
| Now you know all four methods of subsetting data from vectors. Different approaches are best in different
| scenarios and when in doubt, try it out!


|=======================================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

1: Yes
2: No

Swirls – R programming – Lesson 5 – Missing Values

> library(swirl)

| Hi! Type swirl() when you are ready to begin.

> swirl()

| Welcome to swirl! Please sign in. If you’ve been here before, use the same name as you did then. If you are
| new, call yourself something unique.

What shall I call you? bernhardhack

| Please choose a course, or type 0 to exit swirl.

1: R Programming
2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

1: Basic Building Blocks 2: Workspace and Files 3: Sequences of Numbers
4: Vectors 5: Missing Values 6: Subsetting Vectors
7: Matrices and Data Frames 8: Logic 9: Functions
10: lapply and sapply 11: vapply and tapply 12: Looking at Data
13: Simulation 14: Dates and Times 15: Base Graphics

Selection: 5
| | 0%

| Missing values play an important role in statistics and data analysis. Often, missing values must not be
| ignored, but rather they should be carefully studied to see if there’s an underlying pattern or cause for
| their missingness.


|===== | 5%
| In R, NA is used to represent any value that is ‘not available’ or ‘missing’ (in the statistical sense). In
| this lesson, we’ll explore missing values further.


|========== | 10%
| Any operation involving NA generally yields NA as the result. To illustrate, let’s create a vector c(44, NA,
| 5, NA) and assign it to a variable x.

> x <- c(44, NA, 5, NA) | Your dedication is inspiring! |=============== | 15% | Now, let's multiply x by 3. > x*3
[1] 132 NA 15 NA

| Nice work!
|===================== | 20%
| Notice that the elements of the resulting vector that correspond with the NA values in x are also NA.


|========================== | 25%
| To make things a little more interesting, lets create a vector containing 1000 draws from a standard normal
| distribution with y <- rnorm(1000). > y <- rnorm(1000) | You're the best! |=============================== | 30% | Next, let's create a vector containing 1000 NAs with z <- rep(NA, 1000). > z <- rep(NA, 1000) | Perseverance, that's the answer. |==================================== | 35% | Finally, let's select 100 elements at random from these 2000 values (combining y and z) such that we don't | know how many NAs we'll wind up with or what positions they'll occupy in our final vector -- my_data <- | sample(c(y, z), 100). > my_data <- samplec(c(y, z), 100) Error: could not find function "samplec" > my_data <- sample(c(y, z), 100) | Nice work! |========================================= | 40% | Let's first ask the question of where our NAs are located in our data. The is.na() function tells us whether | each element of a vector is NA. Call is.na() on my_data and assign the result to my_na. > my_ny <- is.na(my_data) | Not quite! Try again. Or, type info() for more options. | Assign the result of is.na(my_data) to the variable my_na. > my_na <- is.na(my_data) | You are amazing! |============================================== | 45% | Now, print my_na to see what you came up with. > my_na
[1] FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[19] FALSE FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE TRUE TRUE TRUE FALSE FALSE TRUE
[37] TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE FALSE FALSE TRUE TRUE
[55] FALSE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE TRUE
[73] TRUE FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE TRUE FALSE TRUE
[91] FALSE TRUE TRUE FALSE FALSE TRUE TRUE FALSE TRUE TRUE

| That’s the answer I was looking for.
|==================================================== | 50%
| Everywhere you see a TRUE, you know the corresponding element of my_data is NA. Likewise, everywhere you see a
| FALSE, you know the corresponding element of my_data is one of our random draws from the standard normal
| distribution.


|========================================================= | 55%
| In our previous discussion of logical operators, we introduced the `==` operator as a method of testing for
| equality between two objects. So, you might think the expression my_data == NA yields the same results as
| is.na(). Give it a try.

> my_data == NA
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[37] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[73] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

| Your dedication is inspiring!
|============================================================== | 60%
| The reason you got a vector of all NAs is that NA is not really a value, but just a placeholder for a quantity
| that is not available. Therefore the logical expression is incomplete and R has no choice but to return a
| vector of the same length as my_data that contains all NAs.


|=================================================================== | 65%
| Don’t worry if that’s a little confusing. The key takeaway is to be cautious when using logical expressions
| anytime NAs might creep in, since a single NA value can derail the entire thing.


|======================================================================== | 70%
| So, back to the task at hand. Now that we have a vector, my_na, that has a TRUE for every NA and FALSE for
| every numeric value, we can compute the total number of NAs in our data.


|============================================================================= | 75%
| The trick is to recognize that underneath the surface, R represents TRUE as the number 1 and FALSE as the
| number 0. Therefore, if we take the sum of a bunch of TRUEs and FALSEs, we get the total number of TRUEs.


|================================================================================== | 80%
| Let’s give that a try here. Call the sum() function on my_na to count the total number of TRUEs in my_na, and
| thus the total number of NAs in my_data. Don’t assign the result to a new variable.

> sum(my_na)
[1] 62

| You are doing so well!
|======================================================================================== | 85%
| Pretty cool, huh? Finally, let’s take a look at the data to convince ourselves that everything ‘adds up’.
| Print my_data to the console.

> my:data
Error: object ‘my’ not found
> my_data
[1] -0.16629368 -0.42129496 NA NA NA -1.51308034 0.70290182 -0.43921687 NA
[10] 0.82686778 NA NA NA NA NA NA NA NA
[19] 1.00981458 -0.69336608 0.67911988 NA -1.15797177 NA -0.78783302 NA -2.17044504
[28] -0.39462761 NA -0.19135562 NA NA NA -0.80854687 -2.47246396 NA
[37] NA NA NA 0.74824429 0.28179468 -0.26805840 NA NA NA
[46] NA NA 1.02802970 -0.92091188 NA -0.14571645 -0.63697098 NA NA
[55] 0.63796534 NA NA NA NA NA -1.00566234 NA NA
[64] NA 1.03687648 NA NA 2.17104308 NA -0.46387743 NA NA
[73] NA 0.86804538 NA NA NA -0.64092080 -0.03314492 NA NA
[82] NA NA 0.02434736 NA 1.32072814 NA NA -0.36630120 NA
[91] 1.05074646 NA NA -1.92568166 -1.30438332 NA NA -0.49430828 NA
[100] NA

| Excellent job!
|============================================================================================= | 90%
| Now that we’ve got NAs down pat, let’s look at a second type of missing value — NaN, which stands for ‘not a
| number’. To generate NaN, try dividing (using a forward slash) 0 by 0 now.

> 0/0
[1] NaN

| You are doing so well!
|================================================================================================== | 95%
| Let’s do one more, just for fun. In R, Inf stands for infinity. What happens if you subtract Inf from Inf?

> Inf-Inf
[1] NaN

| Excellent job!
|=======================================================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?

Reading list grid


It’s a boy!

Top