# swirl – R Programming – Lesson 6 – Subsetting Vectors

1: R Programming

2: Take me to the swirl course repository!

Selection: 1

| Please choose a lesson, or type 0 to return to course menu.

1: Basic Building Blocks 2: Workspace and Files 3: Sequences of Numbers

4: Vectors 5: Missing Values 6: Subsetting Vectors

7: Matrices and Data Frames 8: Logic 9: Functions

10: lapply and sapply 11: vapply and tapply 12: Looking at Data

13: Simulation 14: Dates and Times 15: Base Graphics

Selection: 6

| | 0%

| In this lesson, we’ll see how to extract elements from a vector based on some conditions that we specify.

…

|=== | 3%

| For example, we may only be interested in the first 20 elements of a vector, or only the elements that are not

| NA, or only those that are positive or correspond to a specific variable of interest. By the end of this

| lesson, you’ll know how to handle each of these scenarios.

…

|===== | 5%

| I’ve created for you a vector called x that contains a random ordering of 20 numbers (from a standard normal

| distribution) and 20 NAs. Type x now to see what it looks like.

> x

[1] NA NA NA -0.72353476 -0.89198458 NA NA -0.23626102 NA

[10] NA 1.07340403 -0.34040992 1.20108728 -0.67248237 NA NA NA NA

[19] 0.35705287 2.57339120 0.30426816 NA 1.07870859 0.04976609 NA NA 0.93535528

[28] 0.73660119 NA 1.26329875 0.79243764 0.75671912 0.67751338 NA NA NA

[37] NA NA -0.33559365 1.29938969

| You nailed it! Good job!

|======== | 8%

| The way you tell R that you want to select some particular elements (i.e. a ‘subset’) from a vector is by

| placing an ‘index vector’ in square brackets immediately following the name of the vector.

…

|=========== | 10%

| For a simple example, try x[1:10] to view the first ten elements of x.

> x[1:10]

[1] NA NA NA -0.7235348 -0.8919846 NA NA -0.2362610 NA

[10] NA

| You got it!

|============= | 13%

| Index vectors come in four different flavors — logical vectors, vectors of positive integers, vectors of

| negative integers, and vectors of character strings — each of which we’ll cover in this lesson.

…

|================ | 15%

| Let’s start by indexing with logical vectors. One common scenario when working with real-world data is that we

| want to extract all elements of a vector that are not NA (i.e. missing data). Recall that is.na(x) yields a

| vector of logical values the same length as x, with TRUEs corresponding to NA values in x and FALSEs

| corresponding to non-NA values in x.

…

|================== | 18%

| What do you think x[is.na(x)] will give you?

1: A vector with no NAs

2: A vector of length 0

3: A vector of TRUEs and FALSEs

4: A vector of all NAs

Selection: 4

| Perseverance, that’s the answer.

|===================== | 21%

| Prove it to yourself by typing x[is.na(x)].

> x[is.na(x)]

[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

| Keep working like that and you’ll get there!

|======================== | 23%

| Recall that `!` gives us the negation of a logical expression, so !is.na(x) can be read as ‘is not NA’.

| Therefore, if we want to create a vector called y that contains all of the non-NA values from x, we can use y

| <- x[!is.na(x)]. Give it a try.
> y <- x[!is.na(x)]
| Keep up the great work!
|========================== | 26%
| Print y to the console.
> y

[1] -0.72353476 -0.89198458 -0.23626102 1.07340403 -0.34040992 1.20108728 -0.67248237 0.35705287 2.57339120

[10] 0.30426816 1.07870859 0.04976609 0.93535528 0.73660119 1.26329875 0.79243764 0.75671912 0.67751338

[19] -0.33559365 1.29938969

| That’s a job well done!
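
The logical-index steps above can be repeated on a smaller vector where the result is easy to check by eye. This is a minimal sketch; the vector `v` and the other names are made up for illustration and are not part of the lesson.

```r
# Hypothetical example vector (not the lesson's x)
v <- c(1.5, NA, -0.3, NA, 2.0)

# is.na() returns one logical value per element of v
missing_mask <- is.na(v)          # FALSE TRUE FALSE TRUE FALSE

# Indexing with the mask keeps the positions where it is TRUE
only_missing <- v[missing_mask]   # NA NA

# Negating the mask keeps the non-missing values instead
not_missing <- v[!missing_mask]   # 1.5 -0.3 2.0
```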

|============================= | 28%

| Now that we’ve isolated the non-missing values of x and put them in y, we can subset y as we please.

…

|================================ | 31%

| Recall that the expression y > 0 will give us a vector of logical values the same length as y, with TRUEs

| corresponding to values of y that are greater than zero and FALSEs corresponding to values of y that are less

| than or equal to zero. What do you think y[y > 0] will give you?

1: A vector of all NAs

2: A vector of length 0

3: A vector of all the positive elements of y

4: A vector of TRUEs and FALSEs

5: A vector of all the negative elements of y

Selection: 3

| You are doing so well!

|================================== | 33%

| Type y[y > 0] to see that we get all of the positive elements of y, which are also the positive elements of

| our original vector x.

> y[y > 0]

[1] 1.07340403 1.20108728 0.35705287 2.57339120 0.30426816 1.07870859 0.04976609 0.93535528 0.73660119

[10] 1.26329875 0.79243764 0.75671912 0.67751338 1.29938969

| Your dedication is inspiring!

|===================================== | 36%

| You might wonder why we didn’t just start with x[x > 0] to isolate the positive elements of x. Try that now to

| see why.

> x[x > 0]

[1] NA NA NA NA NA NA NA 1.07340403 1.20108728

[10] NA NA NA NA 0.35705287 2.57339120 0.30426816 NA 1.07870859

[19] 0.04976609 NA NA 0.93535528 0.73660119 NA 1.26329875 0.79243764 0.75671912

[28] 0.67751338 NA NA NA NA NA 1.29938969

| Perseverance, that’s the answer.

|======================================== | 38%

| Since NA is not a value, but rather a placeholder for an unknown quantity, the expression NA > 0 evaluates to

| NA. Hence we get a bunch of NAs mixed in with our positive numbers when we do this.

…
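
To see where the extra NAs come from, it may help to look at the logical vector itself. A small sketch, again using a made-up vector `v` rather than the lesson's x:

```r
# Hypothetical example vector (not the lesson's x)
v <- c(1.5, NA, -0.3, NA, 2.0)

# Comparing NA with anything yields NA, so the index vector has NAs in it
cmp <- v > 0           # TRUE NA FALSE NA TRUE

# An NA in a logical index produces an NA in the result
with_na <- v[cmp]      # 1.5 NA NA 2.0
```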

|========================================== | 41%

| Combining our knowledge of logical operators with our new knowledge of subsetting, we could do this —

| x[!is.na(x) & x > 0]. Try it out.

> x[!is.na(x) & x > 0]

[1] 1.07340403 1.20108728 0.35705287 2.57339120 0.30426816 1.07870859 0.04976609 0.93535528 0.73660119

[10] 1.26329875 0.79243764 0.75671912 0.67751338 1.29938969

| You nailed it! Good job!

|============================================= | 44%

| In this case, we request only values of x that are both non-missing AND greater than zero.

…
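
The combined condition works because `FALSE & NA` is `FALSE`: wherever the value is missing, `!is.na()` is already FALSE, so the NA from the comparison never reaches the index. The sketch below shows this on a made-up vector `v`. The `which()` variant is an alternative the lesson does not use, included only for comparison.

```r
# Hypothetical example vector (not the lesson's x)
v <- c(1.5, NA, -0.3, NA, 2.0)

# FALSE & NA is FALSE, so missing positions are dropped rather than kept as NA
na_and <- FALSE & NA              # FALSE

keep <- !is.na(v) & v > 0         # TRUE FALSE FALSE FALSE TRUE
positives <- v[keep]              # 1.5 2.0

# Alternative (not from the lesson): which() returns only the TRUE positions
positives_alt <- v[which(v > 0)]  # 1.5 2.0
```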

|================================================ | 46%

| I’ve already shown you how to subset just the first ten values of x using x[1:10]. In this case, we’re

| providing a vector of positive integers inside of the square brackets, which tells R to return only the

| elements of x numbered 1 through 10.

…

|================================================== | 49%

| Many programming languages use what’s called ‘zero-based indexing’, which means that the first element of a

| vector is considered element 0. R uses ‘one-based indexing’, which (you guessed it!) means the first element

| of a vector is considered element 1.

…

|===================================================== | 51%

| Can you figure out how we’d subset the 3rd, 5th, and 7th elements of x? Hint — Use the c() function to

| specify the element numbers as a numeric vector.

> x[c(3, 5, 7)]

[1] NA -0.8919846 NA

| You’re the best!
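
A short sketch of positive integer indexing on a made-up vector `v`, where the position of each value is obvious. The repeated-index line goes slightly beyond the lesson text and is included only to show that an index vector is just a list of positions.

```r
# Hypothetical example vector (not the lesson's x)
v <- c(10, 20, 30, 40, 50)

first <- v[1]              # one-based: the first element is 10
picked <- v[c(3, 5)]       # 30 50
first_three <- v[1:3]      # 10 20 30

# A position may appear more than once in the index vector
repeated <- v[c(1, 1, 2)]  # 10 10 20
```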

|======================================================= | 54%

| It’s important that when using integer vectors to subset our vector x, we stick with the set of indexes {1, 2,

| …, 40} since x only has 40 elements. What happens if we ask for the zeroth element of x (i.e. x[0])? Give it

| a try.

> x[0]

numeric(0)

| You are doing so well!

|========================================================== | 56%

| As you might expect, we get nothing useful. Unfortunately, R doesn’t prevent us from doing this. What if we

| ask for the 3000th element of x? Try it out.

> x[300]

[1] NA

| Not quite, but you’re learning! Try again. Or, type info() for more options.

| Request the 3000th element of x (which does not exist) with x[3000].

> x[3000]

[1] NA

| Excellent job!

|============================================================= | 59%

| Again, nothing useful, but R doesn’t prevent us from asking for it. This should be a cautionary tale. You

| should always make sure that what you are asking for is within the bounds of the vector you’re working with.

…
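
Since R returns an empty vector or NA instead of raising an error, one option is to check the bounds yourself before indexing. The helper `safe_index()` below is hypothetical (it is not part of R or swirl) and is only a sketch of that idea.

```r
# Hypothetical example vector (not the lesson's x)
v <- c(10, 20, 30)

zero <- v[0]      # numeric(0): index 0 selects nothing
beyond <- v[10]   # NA: position 10 does not exist

# Hypothetical helper: stop with an error if any index is out of bounds
safe_index <- function(vec, i) {
  stopifnot(all(i >= 1), all(i <= length(vec)))
  vec[i]
}

ok <- safe_index(v, 2)                       # 20
bad <- try(safe_index(v, 10), silent = TRUE) # error captured by try()
```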

|=============================================================== | 62%

| What if we’re interested in all elements of x EXCEPT the 2nd and 10th? It would be pretty tedious to construct

| a vector containing all numbers 1 through 40 EXCEPT 2 and 10.

…

|================================================================== | 64%

| Luckily, R accepts negative integer indexes. Whereas x[c(2, 10)] gives us ONLY the 2nd and 10th elements of x,

| x[c(-2, -10)] gives us all elements of x EXCEPT for the 2nd and 10th elements. Try x[c(-2, -10)] now to see

| this.

> x[c(-2, -10)]

[1] NA NA -0.72353476 -0.89198458 NA NA -0.23626102 NA 1.07340403

[10] -0.34040992 1.20108728 -0.67248237 NA NA NA NA 0.35705287 2.57339120

[19] 0.30426816 NA 1.07870859 0.04976609 NA NA 0.93535528 0.73660119 NA

[28] 1.26329875 0.79243764 0.75671912 0.67751338 NA NA NA NA NA

[37] -0.33559365 1.29938969

| You got it right!

|===================================================================== | 67%

| A shorthand way of specifying multiple negative numbers is to put the negative sign out in front of the vector

| of positive numbers. Type x[-c(2, 10)] to get the exact same result.

> x[-c(2, 10)]

[1] NA NA -0.72353476 -0.89198458 NA NA -0.23626102 NA 1.07340403

[10] -0.34040992 1.20108728 -0.67248237 NA NA NA NA 0.35705287 2.57339120

[19] 0.30426816 NA 1.07870859 0.04976609 NA NA 0.93535528 0.73660119 NA

[28] 1.26329875 0.79243764 0.75671912 0.67751338 NA NA NA NA NA

[37] -0.33559365 1.29938969

| You are doing so well!
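
The two negative-index spellings can be compared directly on a made-up vector `v`. The last line shows a restriction the lesson does not mention: positive and negative indexes cannot be mixed in a single index vector.

```r
# Hypothetical example vector (not the lesson's x)
v <- c(10, 20, 30, 40, 50)

a <- v[c(-2, -4)]   # drop the 2nd and 4th elements: 10 30 50
b <- v[-c(2, 4)]    # same result, minus sign applied to the whole vector

# Mixing positive and negative indexes is an error; try() captures it
mixed <- try(v[c(-1, 2)], silent = TRUE)
```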

|======================================================================= | 69%

| So far, we’ve covered three types of index vectors — logical, positive integer, and negative integer. The

| only remaining type requires us to introduce the concept of ‘named’ elements.

…

|========================================================================== | 72%

| Create a numeric vector with three named elements using vect <- c(foo = 11, bar = 2, norf = NA).
> vect <- c(foo = 11, bar = 2, norf = NA)
| You are quite good my friend!
|============================================================================= | 74%
| When we print vect to the console, you'll see that each element has a name. Try it out.
> vect

foo bar norf

11 2 NA

| You are really on a roll!

|=============================================================================== | 77%

| We can also get the names of vect by passing vect as an argument to the names() function. Give that a try.

> names(vect)

[1] "foo"  "bar"  "norf"

| Excellent job!

|================================================================================== | 79%

| Alternatively, we can create an unnamed vector vect2 with c(11, 2, NA). Do that now.

> vect <- c(11, 2, NA)
| Not quite! Try again. Or, type info() for more options.
| Create an ordinary (unnamed) vector called vect2 that contains c(11, 2, NA).
> vect2 <- c(11, 2, NA)
| You got it right!
|===================================================================================== | 82%
| Then, we can add the `names` attribute to vect2 after the fact with names(vect2) <- c("foo", "bar", "norf").
| Go ahead.
> names(vect2) <- c("foo", "bar", "norf")
| That's the answer I was looking for.
|======================================================================================= | 85%
| Now, let's check that vect and vect2 are the same by passing them as arguments to the identical() function.
> idenitical(vect, vect2)

Error: could not find function "idenitical"

> identical(vect, vect2)

[1] TRUE

| You’re the best!

|========================================================================================== | 87%

| Indeed, vect and vect2 are identical named vectors.

…
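
The effect of the `names` attribute on `identical()` can be checked before and after the names are assigned. A minimal sketch; `named` and `plain` are illustrative names standing in for vect and vect2.

```r
# Illustrative stand-ins for the lesson's vect and vect2
named <- c(foo = 11, bar = 2, norf = NA)
plain <- c(11, 2, NA)

# Same values, but plain has no names yet, so the vectors differ
before <- identical(named, plain)   # FALSE

# Adding the names attribute after the fact makes them identical
names(plain) <- c("foo", "bar", "norf")
after <- identical(named, plain)    # TRUE
```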

|============================================================================================ | 90%

| Now, back to the matter of subsetting a vector by named elements. Which of the following commands do you think

| would give us the second element of vect?

1: vect["2"]

2: vect[bar]

3: vect["bar"]

Selection: 1

| Not quite right, but keep trying.

| If we want the element named “bar” (i.e. the second element of vect), which command would get us that?

1: vect["bar"]

2: vect[bar]

3: vect["2"]

Selection: 2

| Try again. Getting it right on the first try is boring anyway!

| If we want the element named “bar” (i.e. the second element of vect), which command would get us that?

1: vect["2"]

2: vect["bar"]

3: vect[bar]

Selection: 2

| Excellent work!

|=============================================================================================== | 92%

| Now, try it out.

> vect["bar"]

bar

2

| You are quite good my friend!

|================================================================================================== | 95%

| Likewise, we can specify a vector of names with vect[c("foo", "bar")]. Try it out.

> vect[c("foo", "bar")]

foo bar

11 2

| You are really on a roll!
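
A short sketch of name-based indexing, including why `vect["2"]` was the wrong answer in the quiz: a character index is matched against names, not positions, and no element is named "2". The `[[` form goes slightly beyond the lesson and is shown only as a way to get the bare value without its name.

```r
vect <- c(foo = 11, bar = 2, norf = NA)

one <- vect["bar"]             # named vector: bar = 2
two <- vect[c("foo", "bar")]   # named vector: foo = 11, bar = 2

# No element is named "2", so the result is NA
no_match <- vect["2"]

# Double brackets return the single value without its name
bare <- vect[["bar"]]          # 2
```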

|==================================================================================================== | 97%

| Now you know all four methods of subsetting data from vectors. Different approaches are best in different

| scenarios and when in doubt, try it out!

…

|=======================================================================================================| 100%

| Would you like to receive credit for completing this course on Coursera.org?

1: Yes

2: No
