section9_introdunction_to_structs

Fábio Gaspar edited this page Feb 16, 2019 · 3 revisions

Introduction to structures

Motivation

Assume you want to store and manage data describing people. For each person you are interested in their age, name, and social identifier. Declaring separate variables for those properties for each person is obviously impractical. You are probably thinking of using arrays! And that is a good initial thought. However, you can't store mixed types of data in a single array!

A possible workaround is defining three arrays:

  1. One for storing ages
  2. One for storing names
  3. One for storing the social identifier

Moreover, you can enforce that a person A is stored at the same index in each array. If person A is at index 2, then you can get the age, name and id from the arrays at index 2. In the example below: ages[2], names[2] and social_id[2].

unsigned int ages[10];
char names[10][50];
unsigned long social_id[10];

While this is a possible solution, it is not the most practical, especially as the number of properties needed to represent the entity grows.

This kind of problem is common, and fortunately C has a proper way to deal with it.

Structures

In C you can declare a structure, which is simply a list of declarations enclosed in braces. As a result, you can group different types of data in a single variable.

struct Person {
	unsigned int age;
	char name[50];
	unsigned long social_id;
};

In the code above, you can see a structure definition. It creates a structure called Person, and every Person structure has an age, a name and a social_id.

More generically, the structure syntax is:

struct <structure tag> {
	...
};

You use the struct keyword, followed by an optional name, the structure tag, which is useful for re-using the kind of structure being declared. Finally, between braces, you list the variables, which are called members.

With a structure declaration we are defining a new type, something we haven't done before! It's important to understand that we haven't yet declared a variable of type struct Person; we are just defining the type, the template or shape of the structure.

Declaring structure variables

Once you define the schema of a structure and provide a structure tag, you can declare variables that follow the structure shape.

For example:

struct Person p;

The variable p follows the Person schema, therefore has members age, name and social_id.

You can also declare a structure and immediately declare variables after the closing brace. The example below declares the Person structure as before, but also declares two variables, Maria and Joao.

struct Person {
	unsigned int age;
	char name[50];
	unsigned long social_id;
} Maria, Joao;

Automatic structure

What this page calls an automatic structure is a structure declaration that doesn't have a structure tag (such structures are more commonly called untagged or anonymous structures). For instance, if you know up front that you only need a few variables following some schema and re-usability is not a concern, then you can omit the structure tag and declare the variables right after the closing brace.

struct {
	int x;
	int y;
} p1, p2, p3;

In the example, the structure declaration has no tag, therefore I can't refer to it again later. However, three variables, p1, p2 and p3, are declared, and each one has the members x and y.

Structure declaration scope

Typically, when you define a new structure type you want that definition to be visible everywhere. In such cases, you should declare the structure outside any function or in a header file, which we haven't covered yet.

However, you can also declare structures inside functions, but then the definition is only visible inside the function where the declaration is present. This is useful for untagged structures.

int main() {
	
	struct Person {
		unsigned int age;
		char name[50];
		unsigned long social_id;
	} Maria, Joao;
}

void foo() {
	struct Person p; /* error: struct Person is only defined inside main */
}

In this example, the program won't compile because the Person structure is not known inside the foo function. Therefore, the structure definition should be placed outside any function to be globally visible.

struct Person {
	unsigned int age;
	char name[50];
	unsigned long social_id;
};

int main() {
	struct Person Maria, Joao;	
}

void foo() {
	struct Person p;
}

Constant initialization

We have seen how to declare structures. In this section we cover how to initialize structures with constant values, pretty much like you saw with arrays.

Considering the untagged structure for storing point coordinates, you can use constant initialization as follows:

struct {
	int x;
	int y;
} pt = {1,-5};

This is also possible when declaring tagged structures.

struct Person {
	unsigned int age;
	char name[50];
	unsigned long social_id;
};

struct Person p = {20, "Helena", 123467};

Some final notes regarding constant struct initialization:

  • The order of the values must follow the order in which the structure members are declared.
  • You don't have to initialize ALL members. You can initialize just the first N members. In ANSI C (C89) you don't have the flexibility to initialize members by name; C99 added designated initializers for that. Notice that if you only initialize the first N members, the remaining ones, if any, are automatically initialized with zero values. What a zero value means in practice depends on the data type:

  Data type    Default value
  Integers     0
  Floating     0.0
  Pointers     NULL
  Chars        '\0'

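Note: If you don't initialize a structure at all, a variable declared inside a function does not get zero values. Its members hold junk. (Variables declared outside any function, or declared static, are zero-initialized.)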

Accessing members

At this point, you should know the different ways to declare and initialize structures. The next step is to understand how you can read and write to structure members. That is possible with the . operator.

struct {
	int x;
	int y;
} pt;

pt.x = 5;
pt.y = 0;

printf("x: %d \t y: %d\n", pt.x, pt.y);

Nested structures

Inside a structure you can have members of any type, including other structures you define.

Let's say we want to represent a rectangle with two point coordinates. The points are opposite corners, so they are enough to know the height, width and position of the rectangle in the Cartesian plane. We can re-use the coordinate structure shown previously.

struct Point {
	int x, y;
};

struct Rectangle {
	struct Point p1, p2;
};

struct Rectangle rect;
rect.p1.x = 0;
rect.p1.y = 0;
rect.p2.x = 5;
rect.p2.y = 5;

The variable rect is of type struct Rectangle, which has two members: p1 and p2. Both are of type struct Point, which also has two members: the integers x and y.

You can access the individual coordinates of each point by using the . operator sequentially. You take the rect variable and access the point p1 or p2: rect.p1. At that point, you are accessing a structure of type struct Point, which has members of its own. As a result, you can write rect.p1.x.

Structure pointers

Just like any other type in C, you can have pointers to structures.

struct Point {
	int x, y;
};

struct Point p1;
struct Point *ptr = &p1;

The variable ptr is a pointer to struct Point. Recall that the * operator dereferences the pointer, giving the data it points to; in this case *ptr is, for practical purposes, a struct Point. Therefore, in order to access the members through a pointer you can write (*ptr).x. The parentheses are required because the precedence of the structure member operator . is higher than that of the dereference operator *. Writing *ptr.x would mean *(ptr.x): accessing the member x of a structure named ptr, where x is a pointer that you are dereferencing. The example below shows a case where that reading is the intended one: spaghetti is a structure whose member ptr is a pointer.

#include <stdio.h>
#include <stddef.h>

int main() {
	int demo = 10;
	struct {
		int *ptr;
		struct {
			char a;
			int b;
		} hello_darkness_my_old_friend;
	} spaghetti = {NULL, {'A', 123}};

	spaghetti.ptr = &demo;

	printf("%d\n", *spaghetti.ptr);
}
10

Dereferencing pointers to structures in order to access members is very common, and the (*ptr).x syntax presented above is not the cleanest. Thankfully, C has an operator to make this task easier, ->. The last two statements below are equivalent.

struct Point *pt;
(*pt).x;
pt->x;

Both . and -> associate from left to right. Moreover, these two operators, together with the () function call and [] subscript operators, are at the top of the precedence hierarchy. The example below shows a common mistake. You might think you are incrementing the pt pointer, but instead you are incrementing the x member: the two statements are equivalent.

++pt->x;
++(pt->x);

To increment the pointer first and then access the member, write (++pt)->x.

Legal operations

The following are legal operations with structures:

  • Copying and assigning as a unit
  • Taking the address with & operator
  • Accessing members

Accessing members was covered previously, and so was taking the address. Thus, the final topic to cover is copying structures.

struct Student {
	char name[50];
	char school[100];
	int age;
};

struct Student S1 = {"Beatriz Pinto", "Faculdade Engenharia Universidade do Porto", 21};

In the example a simple structure Student is declared and a variable of that type is initialized.

If you declare a new variable and assign S1 to it, you are creating a copy of S1. That means that when you later edit S1 you aren't affecting the new variable. This is not new, it works exactly as in the other cases covered with primitive types, but it's worth mentioning again, especially if you have experience with other languages where assignment often copies a reference and not the value.

struct Student S2 = S1;

printf("Student 1 Name: %s\n", S1.name);
printf("Student 2 Name: %s\n", S2.name);

strcpy(S1.name, "Francisco"); /* needs <string.h>; an array can't be assigned with = */

printf("Student 1 Name: %s\n", S1.name);
printf("Student 2 Name: %s\n", S2.name);

Structures and Functions

Just like any primitive type in C, structures are passed by value in function arguments. That means that if you want a function to modify a structure it receives as an argument, you are forced to use pointers.

In fact, structures should almost always be passed by pointer. If they are big, creating a copy of the structure for every function call can hurt performance. Therefore, you generally see functions expecting pointers to structures and not the structures themselves. If you are writing a function that receives a structure through a pointer but doesn't modify it, it's good practice to add the const qualifier.

You can also return a structure declared inside a function: a copy of it is created and returned. This means you can have functions designed for instantiating structures, which is handy. Notice that in this case you can't return a pointer to the structure, unless you use dynamic memory allocation.

The following example creates a structure inside the function and returns its address. You might think this is good practice for the sake of performance. Why create a copy when the structure already exists, right? However, don't forget that once the function returns, all automatic data created inside it, including its arguments, is destroyed! As a result, the address you are returning points to junk, and there's a high chance you get an invalid memory access error.

#include <stdio.h>

struct Demo {
	int a, b, c;
};

struct Demo* create_demo(int a, int b, int c) {
	struct Demo d = {a, b, c};
	return &d; /* BUG: d stops existing when the function returns */
}

int main() {
	struct Demo *d = create_demo(1, 245, -100);
	printf("%d %d %d\n", d->a, d->b, d->c);
}

The example above has undefined behavior. In practice it often results in a segmentation fault, the error you know and love!

Self referential structures