Hypothesis Testing in R
R's powerful built-in t.test function is used for hypothesis testing, especially for continuous data. This function performs one- and two-sample t-tests and can be called with a single vector, two vectors, or formula notation (e.g., y ~ x). Important arguments include alternative, which determines the form of the alternative hypothesis as "two.sided", "less", or "greater", and mu, which sets the null value for the mean.
Understanding Null and Alternative Hypotheses, P-values, and Significance Levels
Formally expressed assertions (hypotheses), a computed statistic, a matching probability value (p-value), and a criterion for decision-making (significance level) are the basic components of a hypothesis test.
Null and Alternative Hypotheses
Two opposing claims are formally stated at the outset of hypothesis testing: the null hypothesis and the alternative hypothesis. The null hypothesis (H0), also known as the baseline or "no-change" hypothesis, is assumed to be true throughout the testing procedure. For tests involving means or differences, the null hypothesis is frequently an equality, such as declaring that the true mean equals a given value or that there is no difference between two means.
When the researcher tests a hypothesis or scenario against the null hypothesis, it is called the alternative hypothesis (HA). In general, the alternative hypothesis is described as an inequality with respect to the null value. Three different kinds of alternative hypotheses influence the test’s design:
Lower-tailed test: A lower-tailed test is defined by an alternative hypothesis expressed as a "less-than" statement, represented by the operator <, asserting that the true parameter value falls below the null value. For such a test, the p-value is computed as a left-hand tail (lower-tail) probability from the sampling distribution.
Upper-tailed test: A one-sided upper-tailed test is defined by an alternative hypothesis expressed as a "greater-than" statement using >, asserting that the true parameter value exceeds the null value. The test assumes the null hypothesis is true and looks for data above the null value to support the alternative claim; the p-value is therefore derived as a right-hand tail (upper-tail) probability from the sampling distribution.
Two-tailed test: A two-tailed (or two-sided) test is characterized by an alternative hypothesis stating that the true parameter value differs from the null value in either direction. It is used when a researcher wants to detect either a positive or a negative divergence from the null value. The p-value for a two-sided test is the sum of the probabilities in the sampling distribution's left and right tails, which equals twice the area of one tail when the sampling distribution is symmetric, such as the normal or t-distribution.
The definition of the hypotheses is determined entirely by the particular question under investigation.
Example:
# Hypothesis Testing Example in R
data <- c(148, 152, 149, 151, 147, 153, 150, 149, 152, 151)
mu_0 <- 150
cat("=== Hypothesis Testing ===\n")
lower <- t.test(data, mu = mu_0, alternative = "less")
upper <- t.test(data, mu = mu_0, alternative = "greater")
two <- t.test(data, mu = mu_0, alternative = "two.sided")
cat("\nLower-tailed (H1: mean < 150): p =", round(lower$p.value, 4))
cat(ifelse(lower$p.value < 0.05, " → Reject H0", " → Fail to reject H0"), "\n")
cat("Upper-tailed (H1: mean > 150): p =", round(upper$p.value, 4))
cat(ifelse(upper$p.value < 0.05, " → Reject H0", " → Fail to reject H0"), "\n")
cat("Two-tailed (H1: mean ≠ 150): p =", round(two$p.value, 4))
cat(ifelse(two$p.value < 0.05, " → Reject H0", " → Fail to reject H0"), "\n")
Output:
=== Hypothesis Testing ===
Lower-tailed (H1: mean < 150): p = 0.6245 → Fail to reject H0
Upper-tailed (H1: mean > 150): p = 0.3755 → Fail to reject H0
Two-tailed (H1: mean ≠ 150): p = 0.7509 → Fail to reject H0
The Test Statistic
A test statistic is computed once the sample data are collected and the hypotheses are stated. It is a rescaled (standardized) version of the sample statistic of interest: the difference between the observed sample statistic and the null value hypothesized in H0, divided by the sample statistic's standard error. Only two things influence the magnitude of the resulting p-value: the sampling distribution of the statistic and the test statistic's extremity (its distance from zero).
For testing a single mean, for example, the test statistic follows the Student's t-distribution, because the sample standard deviation is used as an estimate of the population standard deviation. The t-distribution is comparable to the standard normal distribution but is typically employed for analyzing statistics that are estimated from a sample. The specific variant of the t-distribution is selected by its degrees of freedom, a number usually tied to the sample size.
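The scaling described above can be checked by hand. This sketch reuses the illustrative sample from the earlier example (any numeric vector works the same way):

```r
# Same illustrative sample as the earlier example
data <- c(148, 152, 149, 151, 147, 153, 150, 149, 152, 151)
mu_0 <- 150

# Test statistic: (observed sample mean - null value) / standard error
t_manual <- (mean(data) - mu_0) / (sd(data) / sqrt(length(data)))

# t.test() reports the same standardized statistic
t_builtin <- unname(t.test(data, mu = mu_0)$statistic)

all.equal(t_manual, t_builtin)  # TRUE
```

The agreement confirms that t.test is computing exactly the standardized difference described above.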
The P-value
The p-value is the primary probability output used to measure the strength of the evidence against the null hypothesis. It is formally defined as the probability of observing the computed test statistic, or something more extreme, if the null hypothesis (H0) is true.
The alternative hypothesis determines which way “more extreme” goes:
- In a lower-tailed test, the p-value is a left-hand tail probability.
- In an upper-tailed test, the p-value is a right-hand tail probability.
- In a two-sided test, the left-hand and right-hand tail probabilities are added; if the sampling distribution, such as the normal or t-distribution, is symmetric, the result is simply double the area in one of the tails.
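These three tail areas can be recovered directly from the t cumulative distribution function with pt(). The sketch below reuses the illustrative sample from the earlier example:

```r
# Same illustrative sample as the earlier example
data <- c(148, 152, 149, 151, 147, 153, 150, 149, 152, 151)
mu_0 <- 150
t_stat <- (mean(data) - mu_0) / (sd(data) / sqrt(length(data)))
df <- length(data) - 1

p_lower <- pt(t_stat, df)                      # left-hand tail area
p_upper <- pt(t_stat, df, lower.tail = FALSE)  # right-hand tail area
p_two   <- 2 * pt(-abs(t_stat), df)            # twice one tail (symmetric)

# Each value matches the p-value t.test() reports for that alternative
c(p_lower, p_upper, p_two)
```

Note that p_lower and p_upper sum to 1, and p_two is twice the smaller of the two, exactly as the tail-area definitions imply.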
One important interpretation is that more extreme test statistics produce smaller p-values, meaning there is stronger statistical evidence against the presumed truth of H0.
The Significance Level and Decision Making
A predetermined cutoff point for rejecting H0 is set by the significance level α. Typical conventional values are α = 0.05 and α = 0.01. The decision rule is uncomplicated:
If the p-value is less than α: A hypothesis test is considered statistically significant if the p-value falls below the significance level. In that case the null hypothesis is rejected and the alternative hypothesis is accepted. The decision to reject therefore depends on the pre-selected value of α.
If the p-value is greater than or equal to α: When the p-value meets or exceeds the significance level, the statistical decision is that there is insufficient evidence against the null hypothesis, and the null hypothesis is retained over the alternative.
Statistical inference relies on this decision rule, but retaining the null hypothesis does not prove it is correct. It only means that the observed sample data do not warrant rejection under the assumption that H0 is true. When the p-value is at or above α, the result is not statistically significant.
The significance level also defines the probability of a Type I error, which is incorrectly rejecting a null hypothesis that is actually true. If the null hypothesis is indeed true, test statistics extreme enough to trigger rejection are expected at a rate equal to α.
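The claim that rejections occur at rate α under a true null can be checked by simulation. In this sketch the sample size, the number of replications, and α = 0.05 are all illustrative choices:

```r
set.seed(1)    # illustrative seed for reproducibility
alpha <- 0.05  # chosen significance level

# Draw 10,000 samples from a population where H0 (mu = 0) really is true
p_values <- replicate(10000, t.test(rnorm(20), mu = 0)$p.value)

# The long-run rejection rate should be close to alpha
mean(p_values < alpha)
```

The observed proportion of rejections hovers near 0.05, matching the Type I error rate the significance level promises.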
Criticisms of Hypothesis Testing
Despite being a cornerstone of frequentist statistics, hypothesis testing faces justified criticism. Critics argue that the conclusion of a hypothesis test, whether to accept or reject the null hypothesis, relies solely on a significance level that is typically set arbitrarily at 0.05 or 0.01, so a tiny change in the p-value can flip the decision to reject or retain. No p-value can absolutely "prove" either hypothesis; it only quantifies evidence against the null hypothesis. Rejecting the null hypothesis therefore does not disprove it, but rather suggests that the sample data favor the alternative. Due to the overuse and misuse of p-values in applied research, certain features of this style of statistical inference have been deemphasized in recent years. Especially when many hypothesis tests are performed, the risk of Type I errors (falsely rejecting a true null hypothesis) rises.
Performing One- and Two-sample t-tests with t.test()
For continuous data, the t-test is a popular statistical test. Both one- and two-sample t-tests are carried out in R using the t.test function. The t statistic linked to the t-test was first proposed by William Sealy Gosset for the purpose of evaluating brewing quality. The t.test function accepts a variety of arguments, including mu (the null value for the mean), alternative (which specifies the direction of HA as "two.sided", "less", or "greater"), paired (a logical value), and var.equal (a logical value indicating the variance assumption). It returns an object of class "htest" containing the computed t-statistic, degrees of freedom, p-value, and confidence interval.
One-Sample T-Test
A one-sample t-test is used to compare the mean of a single vector of data to a predetermined null value. The function is called in R by providing the data vector and the null mean. For instance, you might want to check whether the true mean lies below a null value. The test statistic is calculated as the difference between the sample mean and the null value, scaled by the estimated standard error. Under the required conditions, it follows a t-distribution with degrees of freedom equal to the sample size minus one.
For a single-sample t-test with alternative = "less", R computes the p-value as a left-hand (lower-tail) t-distribution probability. P-values below the significance level lead to rejection of H0. The function also returns a confidence interval. When a one-sided alternative is selected, the returned interval is usually a one-sided confidence bound reflecting the direction of the alternative hypothesis; to obtain the more familiar fully bounded two-sided interval, alternative = "two.sided" must be set explicitly.
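The difference between the one-sided bound and the two-sided interval is easy to see directly. This sketch reuses the illustrative sample from the earlier example:

```r
# Same illustrative sample as the earlier example
data <- c(148, 152, 149, 151, 147, 153, 150, 149, 152, 151)

# One-sided test: HA says the true mean is below 150
res_less <- t.test(data, mu = 150, alternative = "less")
res_less$conf.int  # one-sided bound: (-Inf, upper limit)

# Explicit two-sided call gives the familiar fully bounded interval
res_two <- t.test(data, mu = 150, alternative = "two.sided")
res_two$conf.int
```

The one-sided interval's lower limit is -Inf, reflecting that the alternative only constrains the mean from above.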
Two-Sample T-Tests (Independent Samples)
The two-sample t-test is used to compare the means of two different, independent groups. The standard null hypothesis is that the two means are equal, i.e., that there is no difference. There are two ways to supply this test in R: either provide two separate data vectors, or use formula notation (y ~ x) with a data frame argument. The var.equal argument, which encodes the assumption about the two population variances, is crucial to the test's interpretation.
Welch Two-Sample T-test (Unpooled Variance): When the population variances cannot be presumed equal, the default test carried out by t.test (var.equal = FALSE) applies; this approach is commonly called Welch's t-test. Because the test allows different true variances, its degrees of freedom are calculated by the more complicated Welch-Satterthwaite equation and need not be integer values, as the output examples show.
Pooled Variance T-test: If there is good reason to believe the population variances are equal, setting var.equal = TRUE makes the test use a pooled estimate of the variance, which increases the test's precision. A straightforward rule of thumb for checking this assumption is to verify that the ratio of the larger sample standard deviation to the smaller is less than 2. With pooled variance, the t-distribution's degrees of freedom equal the sum of the two sample sizes minus two.
The function interprets the direction of the test based on the order in which the data vectors are supplied to the x and y arguments, so care must be taken when the hypotheses call for a one-sided comparison.
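Both variants can be sketched side by side. The two small groups below are made-up illustrative values:

```r
# Two small made-up groups; all values are illustrative
group_a <- c(5.1, 4.9, 5.4, 5.0, 5.2, 4.8)
group_b <- c(5.6, 5.8, 5.5, 5.9, 5.7, 5.4)

# Default: Welch's t-test (var.equal = FALSE), unpooled variance
welch <- t.test(group_a, group_b)
welch$parameter  # degrees of freedom, generally non-integer

# Rule of thumb before pooling: SD ratio under 2
max(sd(group_a), sd(group_b)) / min(sd(group_a), sd(group_b))

# Pooled-variance version: df = n1 + n2 - 2
pooled <- t.test(group_a, group_b, var.equal = TRUE)
pooled$parameter  # 6 + 6 - 2 = 10
```

Here the SD ratio is well under 2, so pooling would be defensible; the pooled test's degrees of freedom come out to exactly 10, while Welch's are computed from the Welch-Satterthwaite equation.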
Two-Sample T-Tests (Paired Samples)
A specific instance of the two-sample test arises when the two sets of measurements are paired or dependent. This is the case when the observations are made on the same subject (for example, a measurement taken before and after therapy) or are otherwise clearly connected.
Instead of treating the samples as completely distinct, the test used when data are paired concentrates on the true mean of the individual paired differences. In effect, the test becomes a single-sample t-test on the differences. To request it, set the t.test function's optional logical parameter paired to TRUE. The test statistic follows a t-distribution with degrees of freedom equal to the number of pairs minus one. If the true difference (for example, from before to after) is predicted to be less than zero, indicating a reduction, alternative = "less" is used.
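The equivalence between the paired test and a one-sample test on the differences can be demonstrated directly. The before/after measurements below are illustrative values, imagined as ten subjects measured twice:

```r
# Illustrative before/after measurements on the same ten subjects
before <- c(85, 90, 78, 92, 88, 76, 95, 89, 84, 91)
after  <- c(82, 87, 78, 90, 85, 75, 91, 88, 80, 89)

# Paired test: HA says the true mean difference (after - before) is below zero
paired_res <- t.test(after, before, paired = TRUE, alternative = "less")

# Equivalent one-sample t-test on the individual differences
diff_res <- t.test(after - before, mu = 0, alternative = "less")

all.equal(paired_res$p.value, diff_res$p.value)  # TRUE
```

The two calls produce identical statistics, degrees of freedom, and p-values, confirming that pairing reduces the problem to a one-sample test.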
Instead of a t-test, the Mann-Whitney U test (also known as the Wilcoxon rank-sum test) can be employed if the data clearly depart from normality.
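The rank-sum test is available in base R as wilcox.test, which accepts two vectors just like t.test. The skewed values below are illustrative:

```r
# Skewed illustrative data where the t-test's normality assumption is doubtful
group_a <- c(1.2, 1.5, 1.1, 1.8, 9.4, 1.3)
group_b <- c(2.1, 2.6, 2.2, 2.8, 11.7, 2.4)

# Wilcoxon rank-sum (Mann-Whitney U) test: no normality assumption required
res <- wilcox.test(group_a, group_b)
res$statistic  # rank-sum statistic W
res$p.value
```

Because the test operates on ranks rather than raw values, the extreme observations in each group do not distort it the way they would distort a t-test.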