Variables in R programming
In R, variables are names bound to objects that store data your programs can read and change. R treats data, functions, and results alike as objects. Variables are usually created with the leftward arrow operator (<-), although = also works. For example, x <- 15 creates a variable named x with the value 15. Unlike C or Java, R is dynamically typed, so you do not declare a variable’s type in advance; assigning a new object to a name changes the type it holds. R also overwrites a variable’s existing value without warning.
Variable names, or identifiers, must follow specific rules. They can contain letters, digits, periods (.), and underscores (_), and they must begin with a letter or with a period that is not followed by a digit. R is case-sensitive, so my_variable and My_Variable are separate variables. Reserved words such as if, TRUE, or NULL cannot be used as variable names.
Scope determines where a variable is visible. Variables created inside a function are local to that function’s environment and cannot be read by other functions, whereas variables created at the command line are global and can be read from inside functions. Under R’s lexical scoping rules, R looks for a variable in the current environment first and then in its parent environments. The workspace, which you can list with ls(), holds the objects you have created in the global environment during a session.
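As a minimal sketch of creation, dynamic typing, and silent overwriting (the name x comes from the text; the replacement values are only illustrative):

```r
# Create a variable with the leftward arrow operator.
x <- 15
print(class(x))   # "numeric"

# No type declaration exists: assigning a different kind of object
# to the same name simply changes what the name refers to.
x <- "fifteen"
print(class(x))   # "character"

# R replaces the previous value without any warning or prompt.
x <- TRUE
print(class(x))   # "logical"
```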
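The visibility rules can be illustrated with a short sketch. The names rate, add_tax, and tax_amount are hypothetical, and the code is assumed to run at the top level of a fresh session:

```r
# A variable created at the top level is global.
rate <- 0.2

add_tax <- function(price) {
  # `rate` is not defined inside the function, so R finds it
  # in the enclosing (here: global) environment.
  tax_amount <- price * rate   # local: exists only during the call
  price + tax_amount
}

total <- add_tax(100)
print(total)                  # 120

# The local variable is not visible outside the function.
print(exists("tax_amount"))   # FALSE

# List the objects currently in the workspace.
print(ls())
```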
The Assignment Process
Assignment in R stores data or the result of an expression in a named object for later use. The most common assignment operator is <-, a less-than sign followed by a minus sign that together form a leftward arrow. The command a <- 1 creates an object named a and stores the value 1 in it. The other assignment operators are = and ->. The = operator is a later addition that behaves like <- at the top level, but many users prefer <- to avoid confusion with the equality test ==. The -> operator assigns rightward: c(TRUE,1) -> var.3 assigns the vector to var.3. To see a variable’s value, type its name. If an object with the same name already exists, R overwrites it without warning.
The context of an assignment matters. Commands typed at the command line run in the global environment, and new objects are saved there. When a function is called, R creates a temporary runtime environment in which to execute the function’s code. Formal arguments and local variables created in the function are stored in this temporary environment and do not affect global objects with the same names. This prevents functions from accidentally overwriting global objects.
To deliberately modify an object in a different environment from within a function, use assign(), specifying the object’s name, its new value, and the target environment, such as the global environment. Alternatively, use the super assignment operator <<-, which searches the enclosing environments for the first existing variable with that name and writes to it, or creates the variable in the global environment if none is found.
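The three operators can be compared side by side. This is a small sketch; a and var.3 come from the text, while b and the replacement string are illustrative:

```r
a <- 1                # leftward assignment, the most common form
b = 2                 # `=` also assigns when used at the top level
c(TRUE, 1) -> var.3   # rightward assignment stores the vector in var.3

print(a)              # 1
print(var.3)          # 1 1  (TRUE is coerced to the number 1)

a <- "replaced"       # silently overwrites the earlier value of a
print(a == "replaced")  # TRUE: `==` tests equality, it does not assign
```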
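The difference between local assignment, assign(), and <<- can be sketched as follows. The names counter, bump_local, bump_assign, and bump_super are hypothetical, and the code is assumed to run at the top level so that the global environment is the enclosing one:

```r
counter <- 0

bump_local <- function() {
  counter <- counter + 1   # creates a local copy; the global is untouched
  counter
}

bump_assign <- function() {
  # Explicitly write to the global environment.
  assign("counter", counter + 1, envir = globalenv())
}

bump_super <- function() {
  # <<- looks outward for an existing `counter` and overwrites it.
  counter <<- counter + 1
}

local_result <- bump_local()
print(local_result)   # 1
print(counter)        # 0  (global unchanged)

bump_assign()
print(counter)        # 1

bump_super()
print(counter)        # 2
```

Both techniques bypass the protection that local environments give, so they are usually reserved for cases where changing outside state is the explicit goal.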
Rules and Conventions for Naming Variables
Variables, or objects, store data in R and have unique names called identifiers. Identifiers must follow certain rules. Letters, digits, periods (.), and underscores (_) are valid characters. A name must begin with a letter or with a period that is not followed by a digit, so variable names cannot begin with a digit or an underscore. Names cannot contain special symbols such as ^, !, $, @, +, -, /, or *, and reserved words such as if, for, or TRUE cannot be used as names. Because R is case-sensitive, my_var and My_var are separate objects.
These strict rules are supplemented by conventions. For multi-word variable names, separate the words with periods or use “camelCaps” (e.g., aVariableName), which starts with a lowercase letter and capitalizes each subsequent word. To avoid confusion, do not reuse the names of built-in functions such as mean or plot for variables. To see which names are in use in your workspace, use ls(). Remember that R overwrites a variable without warning when you assign it a new value.
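One way to check the naming rules is to compare a name with the result of base R’s make.names(), which rewrites invalid names into valid ones. In this sketch, a name is treated as valid when make.names() leaves it unchanged; the candidate names are illustrative:

```r
candidates <- c("my_var", "My_var", ".hidden", "total.2024",
                "2fast", "_temp", ".5x", "if", "TRUE", "a-b")

# make.names() returns a syntactically valid version of each name,
# so an unchanged name was already valid.
is_valid <- candidates == make.names(candidates)
print(data.frame(name = candidates, valid = is_valid))

# Case sensitivity: these are two separate objects.
my_var <- 1
My_var <- 2
print(my_var == My_var)   # FALSE
```

The first four candidates pass; the rest fail because they start with a digit or underscore, start with a period followed by a digit, are reserved words, or contain a special symbol.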
Creating Objects to Store Data
R offers several data structures for different kinds of data. Simple objects such as atomic vectors are the building blocks of more complex structures.
Atomic Vectors: The Basic Building Block
Most data structures in R are built on the atomic vector, the simplest data structure. An atomic vector is an ordered, one-dimensional collection whose elements all have the same “mode” (type). R treats a single number as an atomic vector of length one. There are six types of atomic vector: double (sometimes called “numeric”), integer, character, logical, complex, and raw. The type matters because R can do arithmetic on numeric vectors but not on character vectors.
Numeric Vectors: The most basic and ubiquitous data structure in R is the numeric vector, a one-dimensional, ordered collection of numbers. As with every atomic vector, all elements must have the same type or mode. Most numeric vectors are created with c() to combine values, the colon operator (:) to build integer sequences, or seq() for more general arithmetic progressions.
R has two main numeric vector types: doubles and integers. Doubles, the default and often simply called numerics, store positive and negative numbers with or without a decimal part. Because computers store decimal values with finite precision, doubles can carry tiny floating-point errors. Integers hold whole numbers only and are created by appending an L to the number (e.g., 4L).
Character Vectors: Character vectors are atomic vectors that store text. Each element of a character vector is a string. You create a string by enclosing text in single (') or double (") quotes; R prints strings with double quotes either way. To create a vector from multiple strings, use the c() function, as in text <- c("Hello", "World"). R treats everything inside quotes as a character string, even numbers like "1" and logicals like "TRUE". A single string is itself a vector of length one with the mode “character”.
Logical Vectors: Logical vectors are atomic vectors that hold the Boolean values TRUE and FALSE. Like other vectors, they are ordered. R recognizes the abbreviations T and F, but writing the full words is recommended because T and F are ordinary variables that can be reassigned. Logical vectors, like other atomic vectors, store only one type of data; both their mode and their type are “logical”.
If you try to combine different data types in one vector, R coerces them to a single common type. When any element is a character string, every element is converted to character.
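A brief sketch of the three construction methods and of the double/integer distinction follows. The variable names are illustrative:

```r
v1 <- c(1, 2.5, -3)          # combine values with c()
v2 <- 1:5                    # colon operator builds an integer sequence
v3 <- seq(0, 1, by = 0.25)   # seq() builds an arithmetic progression

print(typeof(v1))   # "double"
print(typeof(v2))   # "integer"
print(typeof(4))    # "double"  (plain numbers are doubles)
print(typeof(4L))   # "integer" (the L suffix requests an integer)
print(v3)

# Floating-point error: the square of sqrt(2) is not exactly 2.
exact_match <- sqrt(2)^2 == 2
print(exact_match)                      # FALSE
# all.equal() compares within a small tolerance instead.
close_match <- isTRUE(all.equal(sqrt(2)^2, 2))
print(close_match)                      # TRUE
```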
Example:
# Numeric vectors
num_vec <- c(1, 2.5, 4L, 7.8)
print(num_vec)
# Character vectors
char_vec <- c("Hello", "World", "123", "TRUE")
print(char_vec)
# Logical vectors
log_vec <- c(TRUE, FALSE, T, F)
print(log_vec)
# Mixing types → R coerces all to character
mixed_vec <- c(1, TRUE, "text")
print(mixed_vec)
Output:
[1] 1.0 2.5 4.0 7.8
[1] "Hello" "World" "123"   "TRUE"
[1]  TRUE FALSE  TRUE FALSE
[1] "1"    "TRUE" "text"
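Coercion to character is the last step of a more general ordering: when types are mixed, R converts to the most flexible type present (logical, then integer, then double, then character). The following sketch, with illustrative variable names, uses typeof() to show each step:

```r
t1 <- typeof(c(TRUE, 2L))        # logical + integer
t2 <- typeof(c(2L, 3.5))         # integer + double
t3 <- typeof(c(3.5, "a"))        # double + character
print(c(t1, t2, t3))             # "integer" "double" "character"

# A single value is already a vector of length one.
print(length(42))                # 1

# mode() groups integers and doubles together; typeof() separates them.
print(mode(1L))                  # "numeric"
print(typeof(1L))                # "integer"

# Coercion also explains why logicals can be summed: TRUE counts as 1.
n_true <- sum(c(TRUE, FALSE, TRUE))
print(n_true)                    # 2
```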
More Complex Objects: Matrices, Lists, and Data Frames
Vectors are important, but more complex structures are often needed to represent data accurately.
Matrices: A matrix is a two-dimensional, rectangular structure that stores values in rows and columns. All elements of a matrix must have the same type, such as numeric, character, or logical. A matrix is a special case of an atomic vector with a dimension attribute, dim, a vector of length two giving the number of rows and columns. Matrices are the two-dimensional case of arrays, which are vectors with a dim attribute of any length.
Lists: Lists are one of R’s fundamental data structures; they organize R objects into one-dimensional collections. The main advantage of lists over atomic vectors, matrices, and arrays is their flexibility: a single list can hold numeric vectors, character strings, matrices, factors, other lists, and even functions. This makes them a versatile storage tool and a building block for more complex objects. Create lists with the list() function, separating the components with commas. Name your components for easy reference.
Data Frames: Like a spreadsheet, a data frame is a two-dimensional, tabular data structure, and it is the structure most commonly used for data analysis in R. A data frame is a special kind of list whose components are equal-length vectors that form the columns of the table. Unlike matrices, data frames are flexible because different columns can hold different types, such as numeric, character, or logical values. Within any one column, all values must have the same type.
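The relationship between a vector and a matrix can be seen by building the same matrix two ways. This is a minimal sketch with illustrative names m and v:

```r
# Build a 2 x 3 matrix; values fill column by column.
m <- matrix(1:6, nrow = 2, ncol = 3)
print(m)
print(dim(m))        # 2 3

# The same result from a plain vector plus a dim attribute.
v <- 1:6
dim(v) <- c(2, 3)
same <- identical(m, v)
print(same)          # TRUE
print(is.matrix(v))  # TRUE

# Index by row and column.
print(m[2, 3])       # 6
```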
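A short sketch of a named list holding several kinds of object follows. The list name record and its component names are hypothetical:

```r
record <- list(
  id        = 7L,                        # a single integer
  tags      = c("a", "b"),               # a character vector
  scores    = matrix(1:4, nrow = 2),     # a matrix
  nested    = list(flag = TRUE),         # another list
  summarise = function(x) mean(x)        # even a function
)

print(length(record))            # 5
print(names(record))

# Named components can be reached with $ or [[ ]].
print(record$tags[2])            # "b"
print(record[["nested"]]$flag)   # TRUE
avg <- record$summarise(c(2, 4, 6))
print(avg)                       # 4
```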
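The following sketch builds a small data frame with one column of each type. The column names and values are made up for illustration, and stringsAsFactors = FALSE is set explicitly so that text stays character in older R versions too:

```r
df <- data.frame(
  name   = c("Ann", "Bob", "Cy"),
  age    = c(31, 45, 28),
  member = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

str(df)                     # compact overview of columns and types
print(is.list(df))          # TRUE: a data frame is a special list
col_classes <- sapply(df, class)
print(col_classes)          # each column has its own single type

# Columns are equal-length vectors, so they can index one another.
member_ages <- df$age[df$member]
print(member_ages)          # 31 28
```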
Managing Your Variables
Managing variables well requires understanding how R’s environment system creates, stores, lists, removes, and modifies them. R stores data and functions as named objects in active memory. The workspace is the collection of objects created during a session in the global environment, the user’s primary environment. In RStudio, all objects in the global environment appear in the Environment (or Workspace) pane.
Several essential functions list, remove, and save variables. ls() or objects() lists all objects in the workspace; pass a pattern argument to ls() to find variables with matching names. rm() removes variables, for example rm(x, y), and rm(list = ls()) removes every object from the workspace. save.image() saves the whole workspace and save() saves individual objects so that they persist between sessions; either can be reloaded later with load(). Writing and saving commands in R scripts (plain text files with a .R extension) produces a repeatable record of your work that can be run later with the source() function.
R’s scoping rules control where variables are visible and where they can be modified. When a function is called, R creates a temporary runtime environment. Variables created in the function are local to this environment and are discarded when the function finishes, so functions cannot unintentionally overwrite global variables. If a function needs a non-local variable, R searches up the hierarchy of parent environments until it finds the object or reaches the empty environment. To modify global variables from within a function, use the super assignment operator (<<-) or the assign() function with a target environment, such as envir = globalenv().
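The workflow of listing, saving, removing, reloading, and sourcing can be sketched in a few lines. All object names are illustrative, temporary files stand in for real paths, and the code is assumed to run at the top level of a session:

```r
x <- 1
y <- 2
x_backup <- 3

# List only the objects whose names begin with "x".
matches <- ls(pattern = "^x")
print(matches)

# Save one object to a temporary file, remove it, then restore it.
rdata_file <- tempfile(fileext = ".RData")
save(y, file = rdata_file)
rm(x, y)
y_gone <- !exists("y", inherits = FALSE)
print(y_gone)               # TRUE: y no longer exists
load(rdata_file)
print(y)                    # 2

# Write a one-line script and run it with source().
script_file <- tempfile(fileext = ".R")
writeLines("z <- 10 * 2", script_file)
source(script_file)
print(z)                    # 20

# Clean up the temporary files.
unlink(c(rdata_file, script_file))
```

save.image() works like save() but writes every object in the workspace, and rm(list = ls()) clears the workspace entirely, so both are best used deliberately.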
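The search through parent environments can be made visible with nested functions. In this sketch every name (level, outer_fn, inner_fn, reads_level, caller) is hypothetical:

```r
level <- "global"

outer_fn <- function() {
  level <- "outer"          # local to this call of outer_fn()
  inner_fn <- function() {
    level                   # not defined here: R looks in outer_fn(), then upward
  }
  inner_fn()
}

# Lookup follows where a function was defined, not where it is called.
reads_level <- function() level
caller <- function() {
  level <- "caller"         # local to caller(); invisible to reads_level()
  reads_level()
}

from_outer  <- outer_fn()
from_caller <- caller()
print(from_outer)    # "outer"
print(from_caller)   # "global"
print(level)         # "global": neither call changed the global value
```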