WEBVTT 00:00.000 --> 00:10.000 Okay. Thank you. Yes. I'll talk about the verification of Linux kernel code. So I'll start 00:10.000 --> 00:17.000 out with what is program verification. Program verification involves the source code and some 00:17.000 --> 00:23.000 specification of what the source code is supposed to do. And then our goal is to be sure that 00:23.000 --> 00:29.000 the specification describes what the source code actually does. So ideally this checking 00:29.000 --> 00:35.000 should be done automatically. So here's a simple example. This was presented at a Linux 00:35.000 --> 00:41.000 conference a couple of years ago. Here we have the swap function, an extremely simple function. 00:41.000 --> 00:45.000 So what would we like to know about this function? We might need to think about what the 00:45.000 --> 00:49.000 function does a little bit. It takes two arguments. Those two arguments are pointers. And the 00:49.000 --> 00:53.000 idea is we want to swap the values that are referenced by those two pointers. So clearly we have to 00:53.000 --> 00:58.000 be able to dereference those pointers. And we have to be able to write new values back into 00:58.000 --> 01:04.000 them. So then those are our requirements on the inputs. And the goal for what the function 01:04.000 --> 01:10.000 should actually do is that the values at those two places should be interchanged. So that's 01:10.000 --> 01:14.000 what we want to ensure. So that's kind of a high-level English-language specification of what 01:14.000 --> 01:18.000 should happen. What we would like to have is some kind of formal representation that we can 01:18.000 --> 01:25.000 drop into some tool so that it can be checked. So the first property is that the two pointers 01:25.000 --> 01:31.000 are readable and writable. So we use the notation of Frama-C, which is a tool for 01:31.000 --> 01:38.000 checking properties of C code. So basically we need to require that the two input parameters, 01:38.000 --> 01:43.000 the pointers, are valid, meaning that we can dereference them. And we can also write 01:43.000 --> 01:50.000 to the memory that is referenced by those pointers. And the next one is that we want that 01:50.000 --> 01:58.000 the final value that is pointed to by P should be the original value of Q. So we can express 01:58.000 --> 02:05.000 that by saying "ensures", which means a requirement on the output, that the final dereference 02:05.000 --> 02:12.000 of the pointer P is equivalent to the original value that was referenced by Q. And 02:12.000 --> 02:19.000 similarly we want the final Q value to be the same as the P value. So the final value 02:19.000 --> 02:25.000 dereferenced by Q should be the original value dereferenced by P. So so far things 02:25.000 --> 02:32.000 seem fairly simple. Now we put all this into a comment that is before the source code. We get 02:32.000 --> 02:38.000 a single file and now we can drop this into the Frama-C tool. And in a few seconds it tells us 02:38.000 --> 02:43.000 that all the goals were proved. And so now we have confidence that our swap function is doing 02:43.000 --> 02:48.000 what it's supposed to, or at least what the specifications that we wrote said it is supposed 02:48.000 --> 02:53.000 to do. So the question is, I mean, the swap function is very nice, but it's pretty simple; we probably 02:53.000 --> 02:58.000 already have some confidence that it works. So the question we want to ask is: could we do 02:58.000 --> 03:05.000 this for the Linux kernel?
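To make this concrete, here is roughly what such an annotated swap function looks like in ACSL, the annotation language used by Frama-C. This is a sketch reconstructed from the description above, not necessarily the exact code that was on the slide. \valid expresses that a pointer can be both read and written, and \old refers to the value a location had on entry to the function.

    /*@ requires \valid(p) && \valid(q);
        assigns *p, *q;
        ensures *p == \old(*q);
        ensures *q == \old(*p);
    */
    void swap(int *p, int *q)
    {
            /* Standard three-step swap through a temporary. */
            int tmp = *p;
            *p = *q;
            *q = tmp;
    }

Running this through the WP plugin, with something like "frama-c -wp swap.c", should discharge all of the proof goals in a few seconds.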
So there is kind of an ongoing discussion about requirements 03:05.000 --> 03:12.000 and documentation and how we can somehow improve the correctness and understandability 03:12.000 --> 03:20.000 of Linux kernel source code. So this sits somewhere in that space. Probably the 03:20.000 --> 03:24.000 biggest question we ask is: do we want to do this for the entire Linux kernel, for little 03:24.000 --> 03:29.000 parts of it, for very important parts of it, for parts that we don't understand? Those are all very 03:29.000 --> 03:36.000 open questions. So, the positive aspects of writing these specifications and checking these specifications: 03:36.000 --> 03:41.000 just thinking about what the specification should be can lead you to find bugs. So later on in this 03:41.000 --> 03:46.000 talk I will describe some experiments that we've done, and in several of these experiments, 03:46.000 --> 03:50.000 just in writing the specifications and checking them, we have actually found some bugs that 03:50.000 --> 03:56.000 have been submitted to the Linux kernel and accepted, or maybe found some potential bugs that were in 03:57.000 --> 04:03.000 old code that doesn't exist anymore, and so on. So writing specifications means you 04:03.000 --> 04:08.000 have to somehow think thoroughly about what your code does. It's kind of a different 04:08.000 --> 04:12.000 point of view than when you're just trying to run it, or do some test or something, 04:12.000 --> 04:18.000 and so it can highlight some issues. The specifications also provide an unambiguous notation 04:18.000 --> 04:22.000 for what the function is supposed to do. If you write something out in English, 04:23.000 --> 04:27.000 the person who writes it has their intended meaning of the English words, and the person who 04:27.000 --> 04:31.000 reads it might understand something different. There are lots of things you can write 04:31.000 --> 04:36.000 in English, in somewhat loose everyday language, that are kind of open to 04:36.000 --> 04:41.000 misinterpretation afterwards. So if you write things out in a more formal way there's perhaps 04:41.000 --> 04:46.000 less confusion, and the specifications, if they're written out in a formal way, can be checked 04:46.000 --> 04:51.000 mechanically against the source code. And that can happen also over time: if you make 04:51.000 --> 04:56.000 some changes to your function, maybe you just do some refactoring of your function that you 04:56.000 --> 05:00.000 expect doesn't change anything at all, then your specifications might still continue 05:00.000 --> 05:05.000 to work; or you might make some small changes and then maybe your specifications will need 05:05.000 --> 05:10.000 adjustment, but you still have some kind of control, some kind of assurance that what the 05:10.000 --> 05:15.000 function was doing before is still what it's going to be doing.
On the other hand, on the negative 05:15.000 --> 05:20.000 side, creating these specifications is hard. We're not going to write any of these specifications here, 05:20.000 --> 05:25.000 but you'll see them, and they're quite big and complex. And then the question is, I 05:25.000 --> 05:30.000 mean, I just argued that maybe this whole specification idea is useful as the code changes, 05:30.000 --> 05:35.000 but still, in general, can we really hope to maintain these things over time? Developers are 05:35.000 --> 05:39.000 mostly interested in improving their code; they don't really want to maintain this 05:39.000 --> 05:44.000 kind of mysterious formal language. And of course there's no magic bullet: if your 05:44.000 --> 05:49.000 specifications say the wrong thing, then you'll be checking that your code does the wrong thing, 05:49.000 --> 05:55.000 and you won't really have advanced things very much. In practice I think there's some kind of 05:55.000 --> 06:01.000 check on that, in that if you write bad specifications they won't match your code, and then you'll 06:01.000 --> 06:07.000 kind of be forced to iterate until they match up, and then you might be closer to reality, but there's 06:07.000 --> 06:12.000 always the risk that the specifications are wrong and you ensure that your code does the wrong thing. 06:12.000 --> 06:17.000 So far we have done three experiments. One of them is the function should_we_balance; 06:17.000 --> 06:23.000 it's part of the Linux kernel scheduler's load balancer. Then select_idle_core, a function 06:23.000 --> 06:28.000 that is used when a task wakes up and you want to figure out where to place it on the machine. 06:28.000 --> 06:34.000 And ftrace_event_enable_disable, which is part of the event tracing infrastructure. 06:34.000 --> 06:39.000 So somebody might argue that maybe load balancing and task scheduling are really core 06:40.000 --> 06:47.000 aspects of an operating system kernel and maybe the code for enabling tracing is not really all that 06:47.000 --> 06:54.000 important, but actually we picked the third case because some human actually wrote specifications 06:54.000 --> 06:59.000 for it in natural language, and so it was an opportunity to compare them. 06:59.000 --> 07:05.000 So let's start with should_we_balance and load balancing. The idea of load balancing is that perhaps 07:05.000 --> 07:10.000 we have some CPU that would like to take on some more work, maybe because it's idle, maybe 07:10.000 --> 07:15.000 because it just doesn't have enough stuff to do, and what it's going to do is search across the entire 07:15.000 --> 07:20.000 machine, or in general across some set of CPUs, to see if there's some other work available that 07:20.000 --> 07:27.000 it can steal. So this is beneficial because you'll get more work done faster, but it takes time, 07:28.000 --> 07:34.000 because we have to look at all those different CPUs, look at their data structures, 07:34.000 --> 07:42.000 maybe count some things, and so on. And so what we want to do is just let the first CPU do the 07:42.000 --> 07:49.000 stealing and not all of them at the same time. So should_we_balance is sort of a gatekeeper: 07:49.000 --> 07:56.000 each CPU asks should I balance, should we balance, and then sometimes it will return true, 07:56.000 --> 08:03.000 yes, you'll get to do load balancing, and sometimes it won't. Here is a recent version of the 08:03.000 --> 08:09.000 function; it's not a huge function, it's not a small function, it's just what it is.
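The function itself is not reproduced in this transcript, but the loop discussed next has roughly the shape of the self-contained toy below. In the kernel this is written with the for_each_cpu_and iterator and cpumask_andnot over a struct cpumask; the toy uses a plain bit mask instead, and the details (eight "CPUs", clearing the sibling cpu ^ 1) are only illustrative assumptions.

    #include <stdio.h>

    int main(void)
    {
            unsigned int mask = 0xffu;          /* "CPUs" 0..7 start as candidates */

            for (unsigned int cpu = 0; cpu < 8; cpu++) {
                    if (!(mask & (1u << cpu)))
                            continue;           /* not, or no longer, in the set */

                    printf("considering cpu %u\n", cpu);

                    /* Side effect on the set being iterated over: drop this
                     * "CPU" and its sibling from the mask so they are not
                     * examined again. */
                    mask &= ~((1u << cpu) | (1u << (cpu ^ 1)));
            }
            return 0;
    }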
08:09.000 --> 08:15.000 The most interesting part of it is this loop in the middle. This loop is quite interesting for 08:15.000 --> 08:21.000 a couple of reasons. It has this for_each_cpu_and thing, and that's a special-purpose construct; it's like a 08:21.000 --> 08:26.000 for loop, but actually it's iterating over a mask of CPUs, so you can think of it as sort 08:26.000 --> 08:31.000 of a set of CPUs, but it's actually implemented as a bit mask, and it's going to go through 08:31.000 --> 08:38.000 the mask and see which CPUs are under consideration and then do the loop body only for those 08:38.000 --> 08:44.000 CPUs. So that's fairly easy to understand. The more frightening thing about this loop is that 08:44.000 --> 08:53.000 here in the middle we have this cpumask_andnot, and this is a side-effecting 08:53.000 --> 09:00.000 operation: basically what this is doing is removing some CPUs from the set that we're 09:00.000 --> 09:07.000 iterating over. So this makes it harder to reason about what the thing is doing, because it's not just going 09:07.000 --> 09:16.000 from zero to ten; it's going over a changing set of things. So here are the specifications; 09:16.000 --> 09:21.000 these are much more intimidating than the ones that we had for the swap function. 09:21.000 --> 09:26.000 Basically we have a bunch of issues that we want to consider, for example is there any 09:26.000 --> 09:33.000 idle CPU, are we the first idle CPU, are there no idle CPUs, and there are some other things, like 09:33.000 --> 09:37.000 maybe the CPU has actually been hot-unplugged, and then we obviously don't want to steal work for it. 09:37.000 --> 09:43.000 So there are various issues that are taken into account, and then we have the specifications that 09:43.000 --> 09:49.000 should be respected in each of those cases. So these are what are called the pre- and post-conditions: 09:49.000 --> 09:54.000 under what circumstances the function should be called and what behavior should 09:54.000 --> 10:00.000 happen if we call it in those cases. Also, if you're familiar with verification at all, you 10:00.000 --> 10:06.000 know you have to write loop invariants. A loop invariant deals with the issue that 10:06.000 --> 10:11.000 this is all happening before run time, so we have no idea how many times the loop is going to iterate. 10:11.000 --> 10:19.000 So we need some human to provide information about some property that is going to hold no matter how many times the loop occurs (a small generic example is shown below). 10:19.000 --> 10:26.000 So this is a whole other pile of loop invariants that we need to take into account. 10:27.000 --> 10:34.000 So what we are interested in in this particular experiment is how the code changes over time and what the impact is on the specifications. 10:34.000 --> 10:42.000 And actually this function has changed basically ten times in the last ten years that it has existed. 10:42.000 --> 10:49.000 So the beginning part of this experiment was very positive. 10:50.000 --> 10:55.000 The changes were things like the first one, which fixed a bug by changing a not-equal to an equal. 10:55.000 --> 11:04.000 Then there were some refactorings, some changing of names, some realization that a function is not necessary so some other function can be used instead. 11:04.000 --> 11:14.000 And all of those changes were very simple, and either they had no impact on the specifications or they had only a very minor impact; a few lines had to change.
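As a side note on the loop invariants just mentioned, here is a minimal, generic ACSL example of what such annotations look like, wrapped in an invented find function. This only illustrates the concept; the actual should_we_balance invariants additionally have to describe the shrinking cpumask.

    /*@ requires n >= 0 && \valid_read(a + (0 .. n-1));
        assigns \nothing;
        ensures \result == -1 || (0 <= \result < n && a[\result] == v);
    */
    int find(const int *a, int n, int v)
    {
            int i;
            /*@ loop invariant 0 <= i <= n;
                loop invariant \forall integer j; 0 <= j < i ==> a[j] != v;
                loop assigns i;
                loop variant n - i;
            */
            for (i = 0; i < n; i++)
                    if (a[i] == v)
                            return i;
            return -1;
    }

The loop invariant clauses state properties that hold on every iteration, however many times the loop runs, and the loop variant gives a quantity that strictly decreases, which lets the tool conclude that the loop terminates.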
11:14.000 --> 11:21.000 And you can see that the verification time is also reasonably fast and doesn't change very much. 11:21.000 --> 11:31.000 But then here, at the end, in 2023, things start to go more like this. 11:31.000 --> 11:41.000 This is where someone had the idea of adding in this optimization of throwing out CPUs that we don't actually have to look at again. 11:41.000 --> 11:52.000 So that makes the whole analysis of the iterations more complicated, because we have to take into account that things have disappeared along the way. 11:52.000 --> 12:03.000 The first change here, number eight, is about preferring fully idle cores; before, it was just checking whether the CPU itself is idle. 12:03.000 --> 12:10.000 We don't actually change the set, but we have an extra loop in the middle which is going to check all the hyperthreads. 12:10.000 --> 12:19.000 It's not very desirable if you have an idle CPU but its hyperthread sibling is busy; that's not really the ideal place to put new work, because that core is actually already doing work. 12:19.000 --> 12:29.000 So we prefer fully idle cores, and that adds an extra loop in the middle, and that adds about 10 or 12 seconds to the verification time. 12:29.000 --> 12:40.000 And then here, at this point, we add the optimization that if we have checked all the hyperthreads and we find that one of them is not idle, we don't need to check them again. 12:40.000 --> 12:53.000 So this causes an increase in the verification time, and it causes a lot of added complexity in the specifications; it took us a long time to figure out what the specifications should actually be. 12:53.000 --> 13:07.000 But you'll notice that these cases here are in red, and that this case here was actually in a release, in two releases of the Linux kernel, and what that means is that the person adding this optimization actually introduced a bug. 13:07.000 --> 13:21.000 So there was some benefit to sitting there and thinking about these specifications and thinking about what the function is really doing, because we actually found a bug that was in a release. 13:21.000 --> 13:36.000 Another issue we have here is that these are the proving times that we have now; when we originally did this work we had proving times that were much, much higher, especially for the complicated cases. 13:36.000 --> 13:46.000 And then we invited someone who works on Frama-C to come and help us do it better, and we found that choosing the right representations of certain things is very important. 13:46.000 --> 13:58.000 It's kind of impractical if you have to actually bring in a person who maintains the tool to figure out how to do it; maybe there could be some kind of 13:58.000 --> 14:03.000 libraries of good representations and so on. 14:03.000 --> 14:13.000 The next case I will go through much more quickly. This is select_idle_core, the function that places a task on an idle core. Here we have again the same looping property: 14:13.000 --> 14:22.000 we want to iterate over a bunch of CPUs, and we have again the property of updating the set of CPUs that we are working on at the same time. 14:22.000 --> 14:35.000 The iteration is more complex than the previous one: the previous one started at zero and went to the end; this one starts in the middle, goes to the end, and wraps around and works on the beginning.
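To picture the wrap-around iteration being described, here is a self-contained toy of the pattern; in the kernel this is expressed with the for_each_cpu_wrap macro over a cpumask, while the toy below just uses plain indices and invented values for n and target.

    #include <stdio.h>

    int main(void)
    {
            int n = 8, target = 5;
            int i = target;

            /* Visit every slot exactly once, starting in the middle at
             * 'target', running to the end, then wrapping around to the
             * beginning and stopping when we get back to 'target'. */
            do {
                    printf("checking slot %d\n", i);
                    if (++i == n)
                            i = 0;          /* fell off the end: wrap around */
            } while (i != target);
            return 0;
    }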
14:35.000 --> 14:43.000 So you have a whole extra set of specifications saying we're in this region, or we're in that region, or we've reached the end. 14:43.000 --> 14:46.000 So it just adds more work. 14:46.000 --> 14:52.000 Here, what we found very helpful was to introduce some macros that hide some of the complexity. 14:52.000 --> 15:02.000 For example, one of them is figuring out exactly which CPUs we can consider, and one of them is just hiding all of this forall quantifier stuff. 15:02.000 --> 15:16.000 You do this stuff over and over again; intellectually it doesn't contribute much, but it takes up a lot of space, and it makes it very hard to figure out what is going wrong when things go wrong. 15:16.000 --> 15:38.000 So the last function we consider is this ftrace one. This code is much more complex, but you might think maybe it should be easy, because actually if you look at it carefully, what we see are lots of ifs, of very different kinds, but there are actually no loops, so we have no loop invariants to figure out. 15:38.000 --> 16:02.000 So that should be good. On the other hand, these ifs all lead to assignments of different variables, fields of data structures and so on, and then there's a very scary place here where we have an if that depends on variables that we have updated along the way, which makes specifying and proving everything much more complicated. 16:02.000 --> 16:25.000 But then the question is, this is a very complex function, so what kind of specifications do we want? The first attempt was just to thread through every single path through the function, figure out what things it tests and what things it updates. This is just a small extract of the specification, just one particular case, and we ended up with 16:25.000 --> 16:36.000 more than 350 lines of specification. This function does the enabling and disabling of tracing, and this is only for the disabling part; 350 lines is completely unenlightening. 16:36.000 --> 16:45.000 We also discovered that Frama-C is not very good at bit masks, which the original implementation uses extensively, so this approach is somehow not very good. 16:45.000 --> 16:55.000 As I mentioned, I chose this function because some human had written some specifications for it, and this is a little extract of what that specification looks like: 16:55.000 --> 17:00.000 if we have this variable we do this and this and this, and so on. 17:00.000 --> 17:14.000 So obviously these are perhaps understandable to humans, to other users, to other developers, but they're not checkable, they're not complete, and they're actually not entirely correct either. 17:15.000 --> 17:26.000 But we took those specifications and wrote them out in the formal notation, and we end up with something that's much simpler than what I showed previously. 17:26.000 --> 17:35.000 Now, this is only half of the disable case, but it's only 25 lines instead of 250 lines, so that's pretty good. 17:35.000 --> 17:50.000 On the other hand, it's still not complete; it's based on something that's not complete, so we're not going to end up with something complete. On the other hand, it has been checked, so we filled things in, figured out what was going wrong, and added the extra cases, so actually in a way it covers everything that's important.
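This idea of covering every case exactly once, which comes back in a moment, has direct support in ACSL: a contract can be split into named behaviors, and Frama-C can check that the behaviors are complete and disjoint. Here is a tiny generic sketch of that mechanism, using an invented abs_val function rather than kernel code.

    #include <limits.h>

    /*@ requires x > INT_MIN;
        behavior negative:
          assumes x < 0;
          ensures \result == -x;
        behavior non_negative:
          assumes x >= 0;
          ensures \result == x;
        complete behaviors;
        disjoint behaviors;
    */
    int abs_val(int x)
    {
            /* Plain absolute value; the precondition rules out INT_MIN,
             * whose negation would overflow. */
            return x < 0 ? -x : x;
    }

The complete behaviors clause generates a proof goal that every allowed input falls under some behavior, and disjoint behaviors that it falls under at most one, which is the same kind of case-coverage argument as above.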
17:51.000 --> 18:02.000 On the other hand, this formalized version is not very satisfying; it seems like we just have some random information about some random variables, and we don't have any kind of complete definition of what's going on. 18:02.000 --> 18:19.000 So in the third attempt, I thought about each variable, and about all of the things that affect that variable, and put them together in one particular place, so if somebody wants to understand a particular variable then they have something specific they can use. 18:19.000 --> 18:29.000 And we can also use Frama-C to check that everything we describe is complete and that the cases are all disjoint from each other, so we have some confidence that we're doing something reasonable. 18:29.000 --> 18:48.000 And then the last thing I started thinking about is: okay, all of those things we've seen before describe what the function does, but they don't describe what the impact is for the users. So we also want to look at a bunch of function calls that should interface with each other and maintain some invariants over time. 18:49.000 --> 18:58.000 So, in conclusion, what did we learn? From should_we_balance, that we can easily maintain these things over time; from select_idle_core, we learned that abstraction is important. 18:58.000 --> 19:07.000 And from the last one, we learned that specifications really need to give insight to users about how they should use the functions, what the functions should do, and so on. 19:07.000 --> 19:18.000 In the future we would like to do more experiments, think about how much of this is practical to automate (so far everything has been written by hand), and think about the trade-off between understandability, maintainability, and correctness. 19:19.000 --> 19:21.000 So thank you.