In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable.
A string is generally understood as a data type and is often implemented as an array of bytes (or words) that stores a sequence of elements, typically characters, using some character encoding.
A string may also denote more general arrays or other sequence (or list) data types and structures.
Declaration
char message[81]="Good Morning!";
Here, "Good Morning!" is a string literal, and message is a string variable.
Null-terminated
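The null character NUL ('\0'), a byte whose value is zero, marks the end of a string. It should not be confused with the NULL macro, which is a null pointer constant.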
An example of a null-terminated string stored in a 10-byte buffer, along with its ASCII (or more modern UTF-8) representation as 8-bit hexadecimal numbers is:
F | R | A | N | K | NUL | k | e | f | w |
---|---|---|---|---|---|---|---|---|---|
0x46 | 0x52 | 0x41 | 0x4E | 0x4B | 0x00 | 0x6B | 0x65 | 0x66 | 0x77 |
The length of the string in the above example, "FRANK", is 5 characters, but it occupies 6 bytes. Characters after the terminator do not form part of the representation; they may be either part of other data or just garbage.
Basic operations
Find the length of a string
Copy a string to another string variable
Compare two strings
Concatenate two strings
Insert into or delete from a string
Reverse a string
String literals may not contain embedded newlines; this proscription somewhat simplifies parsing of the language. To include a newline in a string, the backslash escape \n may be used, as below.
Backslash escapes
If you wish to include a double quote inside the string, that can be done by escaping it with a backslash (\), for example, "This string contains \"double quotes\".". To insert a literal backslash, one must double it, e.g. "A backslash looks like this: \\".
Backslashes may be used to enter control characters, etc., into a string:
Escape | Meaning |
---|---|
\\ | Literal backslash |
\" | Double quote |
\' | Single quote |
\n | Newline (line feed) |
\r | Carriage return |
\b | Backspace |
\t | Horizontal tab |
\f | Form feed |
\a | Alert (bell) |
\v | Vertical tab |
\? | Question mark (used to escape trigraphs) |
%% | Percent sign; printf format strings only (note that \% is non-standard and not always recognised) |
\ooo | Character with octal value ooo |
\xhh | Character with hexadecimal value hh |
Character constants
Individual character constants are single-quoted, e.g. 'A', and have type int (in C++, char). The difference is that "A" represents a null-terminated array of two characters, 'A' and '\0', whereas 'A' directly represents the character value (65 if ASCII is used). The same backslash-escapes are supported as for strings, except that (of course) " can validly be used as a character without being escaped, whereas ' must now be escaped.
A character constant cannot be empty (i.e. '' is invalid syntax), although a string may be (it still has the null terminating character). Multi-character constants (e.g. 'xy') are valid, although rarely useful; they let one store several characters in an integer (e.g. 4 ASCII characters can fit in a 32-bit integer, 8 in a 64-bit one). Because the value of a multi-character constant is implementation-defined (the order in which the characters are packed into an int is not specified), portable use of multi-character constants is difficult.