This lab shows some ways to perform monitoring tasks, using kernel modules, scheduler notifications, and file system operations to communicate information between user-space and kernel-space.
- Use a kernel module to dynamically add code to a running kernel (a minimal module skeleton is sketched after this list).
- Implement callbacks for scheduler notifications, in order to determine how and when a user program is preempted by another task.
- Create a kernel-level monitoring framework that leverages scheduler notifications and linked lists to maintain a detailed log of all of the preemptions experienced by a process.
- Implement device file I/O callbacks in order to communicate kernel-level information to user-space.
- Evaluate the operation of your monitoring framework through experimentation with user-level programs and different kernel configuration options.
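As a reference point for the first objective, the canonical minimal loadable module looks like the following. This is standard module boilerplate, not the lab's actual code; the sched_monitor name is a placeholder:

```c
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>

/* Runs when the module is loaded with insmod. */
static int __init sched_monitor_init(void)
{
	printk(KERN_INFO "sched_monitor: loaded\n");
	return 0;	/* a nonzero return would abort the load */
}

/* Runs when the module is removed with rmmod. */
static void __exit sched_monitor_exit(void)
{
	printk(KERN_INFO "sched_monitor: unloaded\n");
}

module_init(sched_monitor_init);
module_exit(sched_monitor_exit);
MODULE_LICENSE("GPL");
```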
- MMU
- OF
- ARM_VIRT_EXT
- ARM_ARCH_TIMER
- KVM
- VIRTUALIZATION
- ARM_LPAE
- VIRTUALIZATION does not depend on any other options.
- ARM_LPAE depends on CPU_32v7, which is already enabled.
- CONFIG_VIRTUALIZATION=y, CONFIG_ARM_LPAE=y, CONFIG_KVM=y in the .config file
- tracker = kmalloc(sizeof(struct preemption_tracker), GFP_KERNEL);
- preempt_notifier_init(&tracker->notifier, &notifier_ops);
- preempt_notifier_unregister(&tracker->notifier);
- preempt_notifier_register(&tracker->notifier);
- preempt_notifier_unregister(&tracker->notifier);
- All the error messages and the returned error numbers
- [ 1361.314113] process 1322 (monitor) closed /dev/sched_monitor
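The close message above is printed from the device's release callback. Here is a minimal sketch (not the writeup's exact code) of how the calls listed earlier might fit into the open and release file operations; preempt_notifier_register/unregister would be called analogously from whatever entry points enable and disable monitoring:

```c
#include <linux/fs.h>
#include <linux/list.h>
#include <linux/preempt.h>
#include <linux/slab.h>

/* Per-process tracking state; one is allocated per open of the device.
 * Names (preemption_tracker, notifier_ops, monitor_sched_in/out) follow
 * this writeup; the list_head field is an assumption. */
struct preemption_tracker {
	struct preempt_notifier notifier;
	struct list_head entries;	/* log of preemption entries */
};

static void monitor_sched_in(struct preempt_notifier *pn, int cpu);
static void monitor_sched_out(struct preempt_notifier *pn,
			      struct task_struct *next);

static struct preempt_ops notifier_ops = {
	.sched_in  = monitor_sched_in,
	.sched_out = monitor_sched_out,
};

static int sched_monitor_open(struct inode *inode, struct file *filp)
{
	struct preemption_tracker *tracker;

	tracker = kmalloc(sizeof(struct preemption_tracker), GFP_KERNEL);
	if (!tracker)
		return -ENOMEM;

	preempt_notifier_init(&tracker->notifier, &notifier_ops);
	INIT_LIST_HEAD(&tracker->entries);
	filp->private_data = tracker;	/* retrievable in later callbacks */
	return 0;
}

static int sched_monitor_release(struct inode *inode, struct file *filp)
{
	struct preemption_tracker *tracker = filp->private_data;

	/* Assumes monitoring was enabled; stop receiving sched_in/sched_out
	 * notifications before freeing the tracker. */
	preempt_notifier_unregister(&tracker->notifier);
	kfree(tracker);
	return 0;
}
```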
Each entry in the preemption log records the following fields (collected into the struct sketched below):
- unsigned long long on_core_time - time the process ran before being preempted
- unsigned long long wait_time - time the process waited before running again
- unsigned long long schedin_time - time at which the process was scheduled in
- unsigned long long schedout_time - time at which the process was scheduled out
- unsigned int on_which_core - the CPU the process was running on
- char* name - name of the task that preempted the process
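As a C struct, these fields might look like the following; the struct name and the struct list_head linkage are assumptions based on the lab's linked-list requirement, not taken from the writeup:

```c
#include <linux/list.h>

/* One node in the preemption log; preemption_entry is a hypothetical name. */
struct preemption_entry {
	unsigned long long on_core_time;	/* ns on core before preemption */
	unsigned long long wait_time;		/* ns waited before running again */
	unsigned long long schedin_time;	/* timestamp when scheduled in */
	unsigned long long schedout_time;	/* timestamp when scheduled out */
	unsigned int on_which_core;		/* CPU the process ran on */
	char *name;				/* task that preempted the process */
	struct list_head list;			/* linkage into the tracker's log */
};
```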
When the scheduler monitor device is opened, we register the tracker as the head of the list of entries. Then, whenever the process is preempted, the monitor_sched_out function creates a new entry, saves the current time in schedout_time, records the name of the task that preempted the process, and computes on_core_time and wait_time as follows:
- on_core_time = current time - previous entry's scheduled-in time
- wait_time = previous entry's scheduled-in time - previous entry's scheduled-out time
         out     in               out     in
|********|.......|***************|........|*************|
|--------------------------------------------------------> time
  running   wait       running      wait      running
Treating one complete bracket (a scheduled-out/scheduled-in pair) as one node: the wait time of the current node is the previous entry's scheduled-in time minus the previous entry's scheduled-out time, and the run time of the current node is the current preemption time minus the previous entry's scheduled-in time. We append the node to the list. Then, in the monitor_sched_in function, when the process is scheduled back in, we record the CPU it now runs on and the scheduled-in time. Both callbacks are sketched below.
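A minimal sketch of the two callbacks under those assumptions. The writeup does not name the clock used, so the nanosecond-resolution sched_clock() here is an assumption, as is the GFP_ATOMIC allocation (the callbacks run in scheduler context, where sleeping allocations are not allowed):

```c
#include <linux/sched.h>
#include <linux/sched/clock.h>
#include <linux/slab.h>

/* Called when our process is about to be scheduled off a core. Assumes the
 * list was seeded with an initial entry when monitoring was enabled, so
 * list_last_entry() is always valid. */
static void monitor_sched_out(struct preempt_notifier *pn,
			      struct task_struct *next)
{
	struct preemption_tracker *tracker =
		container_of(pn, struct preemption_tracker, notifier);
	struct preemption_entry *prev =
		list_last_entry(&tracker->entries, struct preemption_entry, list);
	struct preemption_entry *entry;
	unsigned long long now = sched_clock();	/* assumed ns clock */

	entry = kmalloc(sizeof(*entry), GFP_ATOMIC);
	if (!entry)
		return;

	entry->schedout_time = now;
	entry->on_core_time  = now - prev->schedin_time;
	entry->wait_time     = prev->schedin_time - prev->schedout_time;
	entry->name          = kstrdup(next->comm, GFP_ATOMIC);
	list_add_tail(&entry->list, &tracker->entries);
}

/* Called when our process is scheduled back onto a core: fill in the CPU
 * and scheduled-in time on the entry created at the last sched_out. */
static void monitor_sched_in(struct preempt_notifier *pn, int cpu)
{
	struct preemption_tracker *tracker =
		container_of(pn, struct preemption_tracker, notifier);
	struct preemption_entry *entry =
		list_last_entry(&tracker->entries, struct preemption_entry, list);

	entry->on_which_core = cpu;
	entry->schedin_time  = sched_clock();
}
```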
[ 1786.502343] sched_out for process monitor
[ 1786.502351] sched_out:: core:9835040 bwn:163488 cpu:0 comm:in:imklog
[ 1786.502541] sched_in for process monitor
[ 1786.502549] sched_in:: core:9835040 btw:163488 pmt:1520642634110050401 cpu:3 comm:in:imklog
[ 1786.511112] length::86
[ 1786.511118] Deleting node while writing:: core:9835040 btw:163488 cpu:3 comm:in:imklog
[ 1786.511126] copied::0
[ 1786.511288] length::86
[ 1786.511296] Deleting node while writing:: core:9767019 btw:231457 cpu:3 comm:in:imklog
[ 1786.511303] copied::0
[ 1786.511360] Deleting node : 9958267 44115 3 rcu_preempt
[ 1786.511366] Deleting node : 9941132 61354 3 in:imklog
[ 1786.511353] Process 1379 (monitor) closed /dev/sched_monitor
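The length::, copied::, and "Deleting node while writing" lines above come from the device's read path, which drains the log to user space; copy_to_user() returns the number of bytes it could not copy, so copied::0 indicates success. A minimal sketch of such a read callback, reusing the hypothetical names from the earlier sketches (the exact output format is an assumption):

```c
#include <linux/fs.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/uaccess.h>

/* Drain one log entry per read() call: format the oldest entry, copy it
 * to the user buffer, then delete the node. */
static ssize_t sched_monitor_read(struct file *filp, char __user *buf,
				  size_t count, loff_t *ppos)
{
	struct preemption_tracker *tracker = filp->private_data;
	struct preemption_entry *entry;
	char line[128];
	int len;

	if (list_empty(&tracker->entries))
		return 0;	/* EOF once the log has been drained */

	entry = list_first_entry(&tracker->entries,
				 struct preemption_entry, list);
	len = scnprintf(line, sizeof(line), "%llu %llu %u %s\n",
			entry->on_core_time, entry->wait_time,
			entry->on_which_core, entry->name);
	if (count < (size_t)len)
		return -EINVAL;
	if (copy_to_user(buf, line, len))	/* returns bytes NOT copied */
		return -EFAULT;

	list_del(&entry->list);
	kfree(entry->name);
	kfree(entry);
	return len;
}
```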
pi@anniechaehonglee:~/Desktop/userspace $ sudo ./monitor 1
on_core_time:9793342
wait_time:213177
which_cpu:3
name:rs:main Q:Reg
Monitor ran to completion!
When run under the time command with no other programs running, the dense_mm program exits in ~30 seconds.
Program summary:
avg run time: 132128510ns
avg wait time: 172198ns
num of migrations: 5
total preemptions: 213
==> (132128510 ns + 172198 ns) x 213 preemptions ≈ 28.18s, matching the measured runtime
Program summary:
avg run time: 27301267ns
avg wait time: 223898ns
num of migrations: 1
total preemptions: 1071
Program summary:
avg run time: 40132231ns
avg wait time: 239317ns
num of migrations: 29
total preemptions: 1131
Program summary:
avg run time: 61237747ns
avg wait time: 256241ns
num of migrations: 10
total preemptions: 640
When dense_mm runs alongside other programs, the process is preempted more often, migrates more, its average run time shrinks, and its average wait time grows. When all four cores are reserved, the numbers of preemptions and migrations increase. However, when the matrix program is given a higher priority, it migrates less, is preempted less often (though still more than when nothing else is running), and receives a longer average run time. As long as other programs are running, the average wait time stays within a similar range.
rcu_preempt appears near the beginning and the end of the list. When the matrix size is below 200, the task does not appear; above 300, it does. It seems that when a process is preempted by this task, the wait time is around 200000ns. We suspect this task preempts a process once the process has been waiting too long.
Program summary:
avg run time: 75822688ns
avg wait time: 86479ns
num of migrations: 4
total preemptions: 378
Program summary:
avg run time: 6607306ns
avg wait time: 157224ns
num of migrations: 9
total preemptions: 4286
Program summary:
avg run time: 18925892ns
avg wait time: 181875ns
num of migrations: 7
total preemptions: 567
Program summary:
avg run time: 174382447ns
avg wait time: 147339ns
num of migrations: 14
total preemptions: 506
When the matrix program is run on a 1000HZ-configured Linux kernel, the average run time and wait time are significantly smaller than when it is run on a 100HZ-configured kernel. This is because a process is preempted sooner on a kernel with a higher tick frequency, so both the run time and the wait time decrease.
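For reference, the tick frequency compared here is selected at kernel build time by the HZ Kconfig options: CONFIG_HZ_100=y with CONFIG_HZ=100 for the 100HZ kernel, and CONFIG_HZ_1000=y with CONFIG_HZ=1000 for the 1000HZ kernel.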