C Tokens
Tokens are the fundamental components that the compiler recognises in C programming. A token is the smallest unit of a C program that is meaningful to the compiler, so one could consider tokens the language's basic vocabulary. The compiler does not break a token down any further into component parts. Tokens are often separated by white space, but white space is not required between them: in a+b, the three tokens a, +, and b are written with no spaces at all.
The compiler first groups the program's characters into tokens. It then checks, against the grammar of the language, that these tokens can be combined into valid constructs such as declarations, expressions, and statements.
Tokens in ANSI C fall into six different categories:
Keywords: Words reserved by the language, each with a fixed, predefined meaning. Keywords cannot be used as variable names or other identifiers. ANSI C (C89/C90) defines 32 keywords; later standards such as C99 and C11 add a few more. Examples include char, const, return, void, if, switch, case, for, while, auto, int, float, and so on.
Identifiers: Names given to program components such as variables, arrays, functions, constants, structures, and unions. Programmers choose identifiers to give program objects distinctive names. An identifier consists of letters, digits, and the underscore character (_), and must begin with a letter or an underscore. Because C is case-sensitive, uppercase and lowercase letters in identifiers are treated as different. For instance, Madam and madam are regarded as distinct identifiers.
Identifiers should preferably be meaningful, which improves readability and documentation. Standard library functions such as printf and scanf are already declared by the standard headers, and programmers should not redefine them. The identifier main is special because it is where execution of a C program begins.
Constants: Tokens whose values are fixed and do not change while the program is running. C offers several kinds of constants, including integer, floating-point, character, and string constants. Numeric constants, which include both integer and floating-point constants, represent numbers. A character constant is a single character enclosed in single quotes, such as 'A'. A string constant is an arbitrary sequence of characters, which may include white space, enclosed in double quotes, such as "hello world".
String Constants: Character sequences enclosed in double quotes. Although a string constant is stored as an array of characters terminated by a null character ('\0'), the compiler treats the whole sequence as a single token.
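Operators: Symbols that tell the compiler which arithmetic, relational, logical, or assignment operation to perform on data. Examples include +, -, *, /, =, >, and so on. Some symbols have more than one meaning depending on context; for example, * denotes multiplication between two operands but pointer dereference when placed before a single operand.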
Punctuators: Symbols such as semicolons, commas, parentheses, square brackets, and braces that group or separate language elements. Parentheses following a name usually indicate a function, as in main().
Operators, punctuators, and white space (such as blanks, newlines, and tabs) all act as separators that help the compiler tell one token from the next. Outside character and string constants, the compiler ignores white space except where it is needed to separate tokens, as in int count, where the space keeps int and count from merging into the single identifier intcount.
For example, the statement int sum = number1 + number2; breaks down as follows:
- “int” is a keyword.
- The programmer is responsible for selecting the identifiers sum, number1, and number2.
- + and = are operators.
- ; is a punctuator.
- The blank spaces between the tokens serve as separators.
Understanding tokens is essential to understanding how source code is parsed and processed by the C compiler.