When 4 + 1 Equals 8: An Advanced Take On Pointers In C

In our first part on pointers, we covered the basics and common pitfalls of pointers in C. If we had to break it down into one sentence, the main principle of pointers is that they are simply data types storing a memory address, and as long as we make sure that we have enough memory allocated at that address, everything is going to be fine.

In this second part, we are going to continue with some more advanced pointer topics, including pointer arithmetic, pointers with another pointer as underlying data type, and the relationship between arrays and pointers. But first, there is one particular pointer we haven’t talked about yet.

The one proverbial exception to the rule that pointers are just memory addresses is the most (in)famous pointer of all: the NULL pointer. NULL is commonly defined as the preprocessor macro ((void *) 0), and we can assign it like any other pointer value.


// regular referencing, ptr1 points to address of value
int *ptr1 = &value;
// regular pointer, ptr2 points to address of value as well
int *ptr2 = ptr1;
// uninitialized pointer, ptr3 points to unknown location
int *ptr3;
// NULL pointer, ptr4 points to (void *) 0
int *ptr4 = NULL;

While it looks like NULL is just pointing to address zero, in reality, it is a special indicator that the pointer isn’t pointing to any valid data, but is quite literally pointing to nothing. Dereferencing such a pointer is undefined behavior as far as the C standard is concerned, but on most desktop and server operating systems it fails predictably with a segmentation fault. If we kept the pointer uninitialized, anything could happen when we dereference it, with a segmentation fault being one of the better outcomes.

It is always good practice to initialize otherwise uninitialized pointers with NULL, which makes it explicit that they don’t point anywhere yet. Checking if (ptr != NULL) then lets us easily determine whether a pointer has a valid value yet or not. And since a null pointer evaluates as false and any other pointer value as true in C, we can write it even shorter as if (ptr).
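As a minimal sketch of that habit, the helper below only dereferences its argument after the NULL check has passed. The name read_or_default is made up for this illustration and is not a standard function.

```c
#include <stddef.h>

// Hypothetical helper: returns the int that ptr points to,
// or fallback if ptr is NULL and therefore points to nothing.
int read_or_default(const int *ptr, int fallback) {
    if (ptr != NULL) {      // could be shortened to: if (ptr)
        return *ptr;        // safe, ptr was checked first
    }
    return fallback;
}
```

Calling read_or_default(NULL, -1) returns -1, while passing the address of an int holding 42 returns 42.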

Pointer Arithmetic

Other than NULL, the concept remains that pointers are simply memory addresses — in other words: numbers. And like any other number, we can perform some basic arithmetic operations with them. But we wouldn’t talk about it if there wasn’t more to it, so let’s see for ourselves what happens when we add 1 to a couple of different pointer types.


char *cptr = (char *) 0x1000;
int *iptr = (int *) 0x2000;
struct foo *sptr = (struct foo *) 0x3000;

printf("char 0x%02zx %p %p\n", sizeof(char), (void *) cptr, (void *) (cptr + 1));
printf("int 0x%02zx %p %p\n", sizeof(int), (void *) iptr, (void *) (iptr + 1));
printf("struct 0x%02zx %p %p\n", sizeof(struct foo), (void *) sptr, (void *) (sptr + 1));

We have three different pointer types, and we print each type’s size as a hexadecimal number, the address currently stored in its pointer variable, and that address after adding one. On a system with a 4-byte int and a 16-byte struct foo, the output looks like this:


char 0x01 0x1000 0x1001
int 0x04 0x2000 0x2004
struct 0x10 0x3000 0x3010

Unlike regular numbers, adding 1 to a pointer will increment its value (a memory address) by the size of its underlying data type. To simplify the logic behind this, think of pointer arithmetic the same way you think about array indexing. If we declare an array of ten integers int numbers[10], we have a variable that has reserved enough memory to hold ten int values. With int taking up 4 bytes, numbers is 40 bytes in total, with each entry 4 bytes apart. To access the fifth element, we simply write numbers[4] and don’t need to worry about data type sizes or addresses. With pointer arithmetic, we do the exact same thing, except the array index becomes the integer we add to the pointer: numbers + 4 is the address of the fifth element, and *(numbers + 4) is its value.
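The equivalence between indexing and pointer arithmetic can be sketched with two small helpers. Both names, element_at and same_address, are hypothetical and exist only for this example.

```c
#include <stddef.h>

// Hypothetical helper: reads element `index` using pointer arithmetic only.
// base + index advances by index * sizeof(int) bytes, exactly like base[index].
int element_at(const int *base, size_t index) {
    return *(base + index);
}

// Hypothetical helper: returns 1 if indexing and pointer arithmetic
// arrive at the same address, which they always do.
int same_address(const int *base, size_t index) {
    return &base[index] == base + index;
}
```

With int numbers[10] filled with multiples of ten, element_at(numbers, 4) and numbers[4] both yield 40.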

Apart from adding an integer to a pointer, we can also subtract one, and as long as they’re the same type, we can subtract a pointer from another pointer. In the latter case, the result will be the number of elements of the pointer’s underlying data type that fully fit in the memory area between the two pointers.


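int *iptr1 = (int *) 0x1000;
int *iptr2 = (int *) 0x1008;
// difference counted in ints
printf("%td\n", iptr2 - iptr1);
// difference counted in bytes: cast to char * before subtracting
printf("%td\n", (char *) iptr2 - (char *) iptr1);

Since an int is four bytes here, we can fully fit two of them in the 8-byte offset, so the first subtraction outputs 2. To get the distance in bytes instead, we cast both pointers to char * before subtracting, and the second line outputs 8. Note that sizeof(iptr2 - iptr1) would not tell us the offset: sizeof only reports the size of the expression’s type, ptrdiff_t, which merely happens to be 8 bytes on typical 64-bit systems.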
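To try pointer subtraction on memory we actually own, here is a sketch with two hypothetical helpers, ints_between and bytes_between. It assumes both pointers refer to elements of the same array, which is the only case the C standard defines.

```c
#include <stddef.h>

// Number of ints between two pointers into the same array.
ptrdiff_t ints_between(const int *first, const int *last) {
    return last - first;
}

// Number of bytes between the same two pointers: cast to char * first,
// because char is the one type whose size is always 1.
ptrdiff_t bytes_between(const int *first, const int *last) {
    return (const char *) last - (const char *) first;
}
```

For &numbers[0] and &numbers[2] of an int array, the first helper returns 2 and the second returns 2 * sizeof(int).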

That’s pretty much all there is to know about the basics of pointer arithmetic. Trying any arithmetic other than adding an integer, or subtracting either an integer or another pointer of the same type, will result in a compiler error. Adding two pointers or multiplying a pointer, for example, is not allowed.

Pointer Cast and Arithmetic

The beauty of pointers is that we can cast them to any other pointer type, and if we do so during an arithmetic operation, we add plenty of flexibility in comparison to array indexing. Let’s see how the rules apply if we cast an int * to a char * and add 3 to it.


int value = 123;
int *iptr = &value;
char *cptr1 = (char *) (iptr + 3);
char *cptr2 = (char *) iptr + 3;
printf("iptr %p\ncptr1 %p\ncptr2 %p\n", (void *) iptr, (void *) cptr1, (void *) cptr2);

For simplicity, let’s pretend value is located at address 0x1000, so we will get the following output:


iptr 0x1000
cptr1 0x100c
cptr2 0x1003

We can see a clear difference between those two additions, which is caused by C’s operator precedence. When we assign cptr1, iptr is still an int * at the time of the addition, resulting in an address offset to fit three ints, i.e. 12 bytes. But when we assign cptr2, we don’t use parentheses, and the operator precedence leads to a higher priority for the cast operation. By the time the addition is performed, iptr is already a char *, resulting in a three byte offset.

Keep in mind that we don’t have any allocated memory beyond value’s size, so we shouldn’t dereference cptr1. Dereferencing cptr2 on the other hand will be fine, and will essentially extract the fourth byte of value. If for some reason you wanted to extract whatever resides 11 bytes into a struct array’s third element and turn it into a float, *((float *) ((char *) (struct_array + 2) + 11)) will get you there, provided your platform tolerates the unaligned access.
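The cast-then-add pattern can be wrapped in a small function. This is only a sketch: byte_at is a made-up name, and it uses unsigned char rather than char so the returned byte is never negative.

```c
#include <stddef.h>

// Hypothetical helper: returns the byte located `offset` bytes into an object.
// The cast happens before the addition, so the offset counts in bytes.
// The caller must make sure offset is smaller than the object's size.
unsigned char byte_at(const void *object, size_t offset) {
    return *((const unsigned char *) object + offset);
}
```

Which part of an int ends up at offset 3 depends on the machine’s byte order, so byte_at(&value, 3) gives the most significant byte on a little-endian system with 4-byte ints and the least significant one on a big-endian system.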

Incrementing While Dereferencing

Another typical thing we do with pointers is dereference them. But what happens if we increment and dereference a pointer in the same expression? Once again, it’s mostly a question of operator precedence and how generous we are with parentheses. Taking both prefix and postfix increment into account, we end up with four different options:


char buf[MUCH_BYTES];
char *ptr = buf;

// increment ptr and dereference its (now incremented) value
char c1 = *++ptr; // ptr = ptr + 1; c1 = *ptr;
// dereference ptr and increment the dereferenced value
char c2 = ++*ptr; // *ptr = *ptr + 1; c2 = *ptr;
// dereference current ptr value and increment ptr afterwards
char c3 = *ptr++; // c3 = *ptr; ptr = ptr + 1;
// dereference current ptr value and increment the dereferenced value - now we need parentheses
char c4 = (*ptr)++; // c4 = *ptr; *ptr = *ptr + 1;

If you’re not fully sure about the operator precedence, or don’t want to wonder about it every time you read your code, you can always add parentheses and avoid ambiguity — or enforce the execution order as we did in the fourth line. If you want to sneak subtle bugs into a codebase, leaving out the parentheses and testing the reader’s attention to operator precedence is a good bet.
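As a sketch of the two postfix forms in a loop, the hypothetical functions below (sum_ints and increment_all are names invented here) walk an int array once with *p++ and once with (*p)++.

```c
#include <stddef.h>

// Sum `count` ints by walking a pointer: *p++ reads, then advances.
int sum_ints(const int *p, size_t count) {
    int total = 0;
    while (count-- > 0) {
        total += *p++;      // same as: total += *p; p = p + 1;
    }
    return total;
}

// Add one to every element: (*p)++ changes the value, p++ moves on.
void increment_all(int *p, size_t count) {
    while (count-- > 0) {
        (*p)++;             // increments the int that p points to
        p++;                // moves p to the next int
    }
}
```

Swapping the two forms by accident compiles without complaint in increment_all, which is exactly why the parentheses are worth the extra keystrokes.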

A common use case for incrementing while dereferencing is iterating over a “string”. C doesn’t really know the concept of an actual string data type, but works around it by using a null-terminated char array as alternative. Null-terminated means that the array’s last element is one additional NUL character to indicate the end of the string. NUL, not to be confused with the NULL pointer, is simply ASCII character 0x00 or '\0'. As a consequence, a string of length n requires an array of size n + 1 bytes.

So if we look through a string and find the NUL, we know we have reached its end. And since C evaluates any value that’s 0 as false, we can implement a function that returns the length of a given string with a simple loop:


// simplified version; the real strlen in <string.h> returns size_t
int strlen(const char *string) {
    int count = 0;
    while (*string++) {
        count++;
    }
    return count;
}

With every loop iteration, we dereference string’s current memory location to check if its value is NUL, and increment string itself afterwards, i.e. move the pointer to the next char’s address. For as long as dereferencing yields a character with a value other than zero, we increment count and return it at the end.

As a side note, the pointer manipulation happens and stays inside that function. C always uses call by value when passing parameters to a function, so calling strlen(ptr) will create a copy of ptr when passing it to the function. The copy references the same address at first, but incrementing it leaves the original pointer unchanged.
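To see the call-by-value behavior for yourself, here is a variant of the length loop under a hypothetical name, string_length, chosen so it doesn’t clash with the standard library’s strlen. It returns size_t, as the real strlen does.

```c
#include <stddef.h>

// Hypothetical variant of the length loop; only the local copy
// of the pointer is incremented, never the caller's variable.
size_t string_length(const char *string) {
    size_t count = 0;
    while (*string++) {     // advances the local copy only
        count++;
    }
    return count;
}
```

After const char *ptr = "hello"; followed by a call to string_length(ptr), the variable ptr still points to the 'h'.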

Pointers and Arrays

Coming back to arrays, we’ve seen earlier how pointer arithmetic and array indexing are closely related and how buf[n] is identical to *(buf + n). The reason that both expressions are identical is that in C, an array decays internally into a pointer to its first element, &array[0]. So whenever we pass an array to a function, we really just pass a pointer of the array’s type, which means the following two function declarations will be identical:


void func1(char buf[]);
void func2(char *buf);

However, once an array decays into a pointer, its size information is gone. Using sizeof(buf) inside either of those two functions will yield the size of a char * and not the array size. A common solution is to pass the array size as an additional parameter to the function, or to agree on a dedicated terminator value, like the NUL at the end of strings.
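The loss of size information, and the usual fix, can be sketched as follows. Both function names are hypothetical.

```c
#include <stddef.h>

// Inside a function, buf is just a pointer: sizeof reports the pointer size.
size_t size_seen_by_callee(char *buf) {
    return sizeof(buf);
}

// Passing the length explicitly restores the missing information.
void fill_buffer(char buf[], size_t len, char value) {
    for (size_t i = 0; i < len; i++) {
        buf[i] = value;
    }
}
```

For char data[32], sizeof(data) is 32 at the call site, while size_seen_by_callee(data) returns sizeof(char *), typically 8 on a 64-bit system. That is why fill_buffer(data, sizeof(data), 'x') passes the size along.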

Multi-dimensional Arrays and Pointers

Note that the array-to-pointer decay happens only once, to the outermost dimension of the array. char buf[] decays to char *buf, and a two-dimensional char buf[][N] decays to char (*buf)[N], a pointer to an array of N chars, but not to char **buf. (The inner dimension N has to be given; char buf[][] is not valid C.) However, if we have an array of pointers declared in the first place, char *buf[], then it will decay into char **buf. As an example, we can declare C’s main() function with either a char *argv[] or a char **argv parameter; there is no difference, and it’s mainly a matter of taste which one to choose.
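The difference between a true two-dimensional array and an array of pointers shows up in the parameter types. In this sketch, COLS and both function names are assumptions made for illustration.

```c
#include <stddef.h>

#define COLS 4  // hypothetical row length, chosen for illustration

// A char grid[][COLS] argument decays to a pointer to rows of COLS chars.
char cell_in_grid(char (*grid)[COLS], size_t row, size_t col) {
    return grid[row][col];
}

// An array of pointers, char *rows[], decays to char **rows instead.
char cell_in_rows(char **rows, size_t row, size_t col) {
    return rows[row][col];
}
```

Both functions use the same rows[row][col] syntax, but the compiler computes the address differently: the first multiplies row by COLS, the second loads a pointer from memory first. Passing a char grid[2][COLS] to cell_in_rows is therefore a type error.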

Note that all this applies only to already declared arrays. Once an array is declared, pointers give us an alternative way to access it, but we cannot replace the array declaration itself with a simple pointer, because only the array declaration reserves memory for the elements.

Pointers to Pointers

As we have well established, pointers can point to any kind of data type, which includes other pointer types. When we declare char **ptr, we declare nothing but a pointer whose underlying data type is just another pointer, instead of a regular data type. As a result, dereferencing such a double pointer will give us a char * value, and dereferencing it twice will get us to the actual char.

The other way around, &ptr gives us the pointer’s address, just like with any other pointer, except the address will be of type char ***, and on and on it goes. As stated earlier, C uses call by value when passing parameters to a function, but adding an extra layer of pointers can be used to simulate call by reference.
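A small sketch of simulated call by reference: the hypothetical function skip_spaces receives the address of the caller’s pointer, so it can move that pointer rather than a copy of it.

```c
// Hypothetical helper: advances the caller's pointer past leading spaces.
// It receives the address of the pointer, so writing to *cursor
// changes the caller's variable, not a local copy.
void skip_spaces(const char **cursor) {
    while (**cursor == ' ') {
        (*cursor)++;        // parentheses needed: increment the pointer
    }                       // that cursor points to, not cursor itself
}
```

After const char *text = "   pointer"; and skip_spaces(&text), text points to the 'p'.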

Double Pointer Memory Arrangements

Let’s return to main()’s argv parameter, which we use to retrieve the command line arguments we pass to the executable itself. In memory, those arguments are stored one by one as null-terminated char arrays, along with an additional array of char * values storing the address of each of those char arrays. To illustrate this, let’s print each and every address we can associate with argv.


#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
    int i;

    for (i = 0; i < argc; i++) {
        printf("&argv[%d] %p with argv[%d] at %p len %zu '%s'\n",
               i, (void *) &argv[i], i, (void *) argv[i], strlen(argv[i]), argv[i]);
    }
    // print once more to see what is stored after the arguments -- better not dereference it
    printf("&argv[%d] %p with argv[%d] at %p\n", i, (void *) &argv[i], i, (void *) argv[i]);

    return 0;
}

Along with argv, we get argc passed to main(), which tells us the number of entries in argv. And as a reminder about array decay, argv[i] is equal to &argv[i][0].

Simplifying the addresses, the output will look like this:


$ ./argv some arguments
&argv[0] 0x1c38 with argv[0] at 0x2461 len 6 './argv'
&argv[1] 0x1c40 with argv[1] at 0x2468 len 4 'some'
&argv[2] 0x1c48 with argv[2] at 0x246d len 9 'arguments'
&argv[3] 0x1c50 with argv[3] at (nil)
$

We can see that the pointer array behind argv starts at address 0x1c38, and its entries point to the argument strings, which are stored one after another starting from address 0x2461. Since incrementing a pointer is always relative to the size of its underlying data type, incrementing argv adds the size of a pointer to the memory offset, here 8 bytes.

Another thing we can see is a NULL pointer at the very end of argv. This follows the same principle as the null-termination of strings, indicating the end of the array. That means we don’t necessarily need the argument counter parameter argc to iterate through the command line arguments, we could also just loop through argv until we find the NULL pointer.

Let’s see how this looks in practice by rewriting our previous example accordingly. To leave argv itself unaffected, we copy it to another char ** variable.


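#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
    int i;
    char **ptr = argv;

    (void) argc; // not needed, the NULL entry marks the end

    for (i = 0; *ptr; i++, ptr++) {
        printf("&argv[%d] %p with argv[%d] at %p len %zu '%s'\n",
               i, (void *) ptr, i, (void *) *ptr, strlen(*ptr), *ptr);
    }
    printf("&argv[%d] %p with argv[%d] at %p\n", i, (void *) ptr, i, (void *) *ptr);
    return 0;
}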

Whether we access argv via array indexing or pointer arithmetic, the output will be identical.
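The NULL-terminated walk also works as a standalone function. This is a sketch with a hypothetical name, count_arguments, that recovers the argument count without ever looking at argc.

```c
#include <stddef.h>

// Hypothetical helper: counts entries in a NULL-terminated pointer array
// such as argv, without looking at argc.
int count_arguments(char **argv) {
    int count = 0;
    while (*argv != NULL) { // stops at the terminating NULL pointer
        count++;
        argv++;             // moves the local copy to the next char *
    }
    return count;
}
```

Called with the argv of ./argv some arguments, it returns 3, the same value main() receives in argc.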

To Be Continued

To summarize our second part on pointers in C: pointer arithmetic always happens relative to the underlying data type, operator precedence needs to be considered or tackled with parentheses, and pointers can point to other pointers, which can point to yet other pointers, as deep as we want.

In the next and final part, we are going to have a look at possibly the most exciting and most confusing of pointers: the function pointer.



from Hackaday https://ift.tt/2EZVRET