C Character set
The C character set comprises all the symbols that may appear in a C or C++ program. These characters are the basic building blocks of your programs. The compiler groups them into syntactic units known as tokens, which form the language's fundamental vocabulary.
In general, the C character set consists of:
- Letters: A-Z in uppercase and a-z in lowercase.
- Digits: All numbers between 0 and 9.
- Special characters: Punctuation and operator symbols, including #, (, ), {, }, [, ], <, >, +, -, *, /, %, ^, &, |, ~, !, =, ., ;, :, ?, ', ", \, _, and $, as well as the comma.
- White space: Blank space, newline, carriage return, horizontal tab, and vertical tab. White space separates tokens where needed; apart from that, and outside character and string literals, the compiler generally ignores it.
Computers represent characters internally as numeric codes. A character set, in this sense, is the collection of characters a computer uses together with the integer code assigned to each. The C standard does not require a particular character set, but ASCII (American Standard Code for Information Interchange), or a superset of it, is what nearly every modern implementation uses.
ASCII consists of control codes and a collection of printable characters, and Unicode includes ASCII as a subset. Although 256-character extended sets are sometimes discussed as well, the name ASCII properly refers to the fundamental 7-bit set with values 0–127. Every ASCII character has an integer value: 'A', 'a', and '0' are 65, 97, and 48, respectively. The letters A through Z, the letters a through z, and the digits 0 through 9 each occupy contiguous character codes.
In C++, individual characters can be stored in the char type. C++ also provides char16_t and char32_t for UTF-16 and UTF-32 code units, and wchar_t for wide characters. A prefix on a character or string literal selects its type: u8 for UTF-8, u for UTF-16 (char16_t), U for UTF-32 (char32_t), and L for wide characters (wchar_t).
Escape sequences represent certain special or non-printing characters. Each is introduced by the backslash (\), known as the escape character. Although written with two characters (such as \n), an escape sequence represents a single character. Common escape sequences include:
- \n: Newline
- \t: Horizontal tab
- \v: Vertical tab
- \b: Backspace
- \r: Carriage return
- \f: Form feed
- \a: Alert (BEL)
- \\: Backslash
- \?: Question mark
- \': Single quote
- \": Double quote
A character can also be written by its numeric code, as \ooo (one to three octal digits) or \xhh (one or more hexadecimal digits).
The following simple C program prints a range of characters alongside their decimal codes, showing each character's ASCII integer value:
#include <stdio.h> // Standard input/output library

int main(void) {
    int ch; // Holds each character code in turn

    printf("ASCII Chart (partial):\n");
    printf("Value\tCharacter\n"); // Header for clarity
    printf("-----------------\n");

    // Loop over the printable ASCII range: 32 (space) to 126 (tilde).
    // Values 128-255 lie outside ASCII; what they display depends on
    // the platform's extended character set.
    for (ch = 32; ch <= 126; ch++) {
        printf("%d\t%c\n", ch, ch); // Decimal value (%d) and character (%c)
    }

    return 0; // Indicate successful execution
}
The for loop in this program iterates over the integer values 32 through 126. Inside the loop, the %d and %c format specifiers print the numeric value and the corresponding character, respectively. Because C treats characters internally as small integers, the loop displays the ASCII character for every integer value in the given range.