Monday, March 19, 2007


Paging and thrashing

When I talked about polling, in computer terms, I said a little about paging. But for those who don't already know all about it, more explanation is in order.

Any computer has a certain amount of main memory, commonly known these days as RAM (Random Access Memory — we used to call it core memory, but that's another story) installed in it. In the earlier days of computers, the amount of main memory that was available was quite small (it was expensive and physically large, and it took a lot of electricity to power it), and programs were limited in that the program and all its active data had to fit into main memory. The most limited computer I ever used had 2000 words available. With constraints like that, we had to be creative.

One way to deal with limited memory was, of course, to go to great effort in programming to keep things small. Even the smallest piece of memory was critical, and we made our algorithms tight and our data compact. This is, in fact, one of the reasons we used two-digit years, and thus set things up for “the Y2K problem”. You're welcome. Another way was to use overlay techniques: part of the program might only be needed when it started up, and that area of memory could later be overlaid with other parts of the program, or with data.

Those overlay techniques caused a lot of program bugs, and the bugs were often hard to find.

Virtual memory was developed to address these problems. The concept of virtual memory is that we actually have other, albeit slower, ways of storing data, and we can use them in conjunction with main memory to create the appearance of more memory than we actually have. Any program could do this by itself, of course, simply by keeping its own data on magnetic disk, managing its own overlays, and so on. But taking care of it in the operating system — and having it assisted by hardware features, making it faster and less error-prone — is what really made it workable.

Virtual memory was available long before I knew what a computer was, actually, but it took a long time for it to mature and move into widespread use, so I did have to deal with programming without it, even on mainframe computers. And the first PCs did not have virtual memory features in their hardware. That first came with the Intel 80286 processor, and was fully implemented, with a paging system, in the 80386. Windows 3.0 was the first version of Windows that took advantage of virtual memory.

To understand paging, let's look at pages in a book. Open the nearest book to a page in the middle, and look it over. I'll wait; come back soon.

Welcome back. Here's what you saw: letters made words, which made paragraphs, which made pages. The words and paragraphs were different sizes, but the pages were all the same size, and the book contained many pages. That's sort of how computer memory is broken down too. For ease of discussion, what I say here will apply to 32-bit Intel processors. Other processors have used other byte lengths, word sizes, and so on, but the numbers I use here are the most common today.

A bit is the smallest unit — a binary digit — and it's either 0 or 1, off or on, no or yes, false or true. A byte is, due to a somewhat arbitrary decision, eight bits. We can think of a byte as the analogue to a letter in the book, and, indeed, each letter in English is represented by one byte. One bit can have 2¹ = 2 possible values (0 or 1); one byte can therefore have 2⁸ = 256 possible values (0 through 255). One piece of data can be one or more bytes — a number, say, the item number in a store's inventory, is usually represented by four bytes, while a string of letters, such as the item's description, is represented by one byte per letter (in English character sets). We can think of each data item as a word in the book. Data items are aggregated into data structures or objects, and we can think of those as corresponding to paragraphs in the book.
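
If you'd like to see those sizes in code, here's a minimal C sketch (the struct and its field names are just my own illustration, and the sizes assume a 32-bit Intel system like the ones I'm describing):

    #include <stdio.h>

    /* A "paragraph": a structure that aggregates data items (the "words"). */
    struct InventoryItem {
        int  itemNumber;          /* a number: four bytes here     */
        char description[32];     /* a string: one byte per letter */
    };

    int main(void)
    {
        printf("a byte can hold %d values\n", 1 << 8);   /* 2 to the 8th: 256 */
        printf("an int takes %u bytes\n", (unsigned)sizeof(int));
        printf("the struct takes %u bytes\n",
               (unsigned)sizeof(struct InventoryItem));
        return 0;
    }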

Computer memory is divided into pages of 4096 (2¹²) bytes, and these fixed-size things correspond to pages in the book. Note that a page in our book can have several paragraphs on it, but one long paragraph could span more than one page. The same is true in the computer version: often, many data objects will fit on a page, but it's entirely common for one data object (or even one data item) to be larger than a page. We'll get back to that point later, with a story.
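
Because 4096 is a power of two, finding which page an address falls on is just bit arithmetic. Here's a hedged C sketch of that (the function names are mine, not any real system's API), including an object that spans two pages:

    #include <stdio.h>

    #define PAGE_SIZE 4096u                    /* 2 to the 12th bytes */

    /* Which page an address falls on, and where on that page it falls. */
    static unsigned page_number(unsigned addr) { return addr >> 12; }
    static unsigned page_offset(unsigned addr) { return addr & (PAGE_SIZE - 1); }

    int main(void)
    {
        unsigned start = 0x3F00;           /* an object starting near a page's end */
        unsigned end   = start + 512 - 1;  /* ...and 512 bytes long                */

        printf("object starts on page %u at offset %u\n",
               page_number(start), page_offset(start));
        printf("object ends on page %u at offset %u\n",
               page_number(end), page_offset(end));
        /* It starts on page 3 and ends on page 4: one object, two pages. */
        return 0;
    }

That bit arithmetic — shift by 12 for the page, mask with 4095 for the offset — is exactly why page sizes are powers of two.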

Put simply, then, paging is the practice of taking memory pages that aren't in active use and writing them to a special file on the hard disk, where the operating system can find them when they're needed again.[1] While that's conceptually easy, it's actually very complex in practice, and a lot of hardware design and programming has gone into making it work efficiently and seamlessly. The computer system has to decide when a page should be paged out, and must make sure it's paged in at the right time. It has to manage the paging subsystem to be sure it knows where to find each page, to avoid paging out pages that will be needed again very soon, and to make sure that empty pages are ready for use when they're needed.
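
To give the flavour of the bookkeeping involved, here's a much-simplified C sketch, loosely in the spirit of the 80386's page table entries; the real entries have more fields and a different exact layout, and all the names here are my own:

    #include <stdio.h>

    /* A much-simplified page table entry; a real 80386 entry has more
       fields and a different layout, but the idea is the same. */
    struct PageTableEntry {
        unsigned present : 1;   /* is the page in main memory right now? */
        unsigned dirty   : 1;   /* written to since it was paged in?     */
        unsigned frame   : 20;  /* which physical 4096-byte frame has it */
    };

    /* Conceptually, every memory reference does this; in reality the
       hardware does it, and the OS gets involved only on a fault. */
    unsigned translate(struct PageTableEntry *pte, unsigned offset)
    {
        if (!pte->present) {
            /* Page fault: the OS finds the page in the paging file,
               reads it into a free frame, updates the entry, retries. */
        }
        return (pte->frame << 12) | offset;
    }

    int main(void)
    {
        struct PageTableEntry pte = { 1, 0, 7 };  /* present, in frame 7 */
        printf("address 0x%x\n", translate(&pte, 0x123));   /* 0x7123 */
        return 0;
    }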

When a particular location in memory is needed, either for a program itself or for its data, we say that the location is referenced. When a program tries to reference a location on a page that's paged out, that's called a page fault. A good paging subsystem tries to minimize page faults because they cause delays while your program waits for the page it needs (retrieving a page from the hard drive is relatively slow, compared with having the page in memory already). It may do that by considering your program's reference patterns, trying to anticipate the pages you'll need soon, and paging them in early, before you need them and while you're waiting for other things (such as network responses).

But the program can also do things to make it easier (or harder) on the paging subsystem. A program that has high locality of reference gets fewer page faults and is easier for an anticipatory paging system to analyze. That is, if the memory that your program references is usually close to the memory it has referenced recently, things work better all around. Think of it as the difference between reading the book one paragraph at a time, sequentially... and reading it by flitting from one page to another, randomly, reading one paragraph on each page. If you've read paragraph 2 on page 217 and you later want to read paragraph 3 on that page, you can see which way makes it easier to do.
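
Here's a toy C simulation of my own devising (not how any real pager works, but enough to show the effect): a pretend memory of eight page frames, referenced first with good locality and then with terrible locality.

    #include <stdio.h>

    #define PAGE_SIZE 4096u
    #define FRAMES    8           /* a tiny pretend memory: 8 page frames */

    static unsigned resident[FRAMES];   /* which page each frame holds */
    static int next_victim = 0;         /* FIFO replacement pointer    */

    /* Reference one address; return 1 if it page-faults, 0 if it hits. */
    static int reference(unsigned addr)
    {
        unsigned page = addr / PAGE_SIZE;
        int i;
        for (i = 0; i < FRAMES; i++)
            if (resident[i] == page) return 0;     /* already in memory */
        resident[next_victim] = page;              /* fault: page it in */
        next_victim = (next_victim + 1) % FRAMES;
        return 1;
    }

    int main(void)
    {
        unsigned a;
        int i, faults = 0;
        for (i = 0; i < FRAMES; i++) resident[i] = 0xFFFFFFFFu; /* empty */

        /* Good locality: 100,000 consecutive byte references. */
        for (a = 0; a < 100000; a++) faults += reference(a);
        printf("sequential: %d faults\n", faults);   /* 25: pages 0-24 */

        /* Bad locality: 100,000 references, each a full page apart. */
        for (i = 0; i < FRAMES; i++) resident[i] = 0xFFFFFFFFu;
        faults = 0;
        for (a = 0; a < 100000u * PAGE_SIZE; a += PAGE_SIZE)
            faults += reference(a);
        printf("scattered:  %d faults\n", faults);   /* all 100,000 fault */
        return 0;
    }

With only eight frames, the sequential walk faults just 25 times (once each time it crosses onto a new page), while the scattered walk faults on every single one of its 100,000 references.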

The worst case I've ever seen of bad locality of reference was in a FORTRAN program in the 1980s. I noticed that our mainframe system was running horribly slowly (actually, someone probably complained and I was checking it out), and I found that one program's working set was enormous, many thousands of pages. I went to see the user who was running the program, and we had a look. In it, there was something like this:


      INTEGER IARRAY(10000, 10000)
      DO 20 I = 1, 10000
         DO 10 J = 1, 10000
C           ...reference IARRAY(I, J)...
   10    CONTINUE
   20 CONTINUE

The program was accessing a large array of integers by stepping across the rows: item (1, 1), then item (1, 2), then (1, 3), and so on, eventually getting to (1, 10000) and then (2, 1). The trouble is that, while most other commonly used programming languages store their arrays that way (in row-major order), FORTRAN stores them in column-major order: (1, 1), then (2, 1), then (3, 1), then ..., then (10000, 1), then (1, 2). What that meant was that the programmer intended to step through the array one number after another, but he was actually skipping some 10,000 entries each time, and then bouncing back to the beginning after hitting 10,000 of them. It turned out that every reference to the array resulted in a page fault, and his program was causing the paging subsystem to thrash — to page the same pages in and out constantly, resulting in little productive work for the computer system in general.
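
The arithmetic makes the problem plain. Here's a small C sketch of where each language puts element (i, j) of that array (the function names are mine; C really is row-major, and the other function mimics FORTRAN's column-major layout):

    #include <stdio.h>

    #define N 10000   /* a 10000 x 10000 array, as in the program above */

    /* Element (i, j)'s distance from the start of the array, counted in
       elements (0-based here for simplicity; FORTRAN counts from 1, but
       the differences come out the same). */
    long c_offset(long i, long j)       { return i * N + j; }  /* row-major    */
    long fortran_offset(long i, long j) { return j * N + i; }  /* column-major */

    int main(void)
    {
        /* Stepping from (1, 1) to (1, 2), as that program did: */
        printf("C moves %ld elements\n",
               c_offset(1, 2) - c_offset(1, 1));                /* 1     */
        printf("FORTRAN moves %ld elements\n",
               fortran_offset(1, 2) - fortran_offset(1, 1));    /* 10000 */
        return 0;
    }

In FORTRAN's layout, two consecutive references in that loop were 10,000 four-byte integers apart: 40,000 bytes, nearly ten pages away every single time.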

For the easiest fix, I suggested that the programmer just find every place in his program that referenced IARRAY, and that he swap the subscripts everywhere. So IARRAY(I, J) became IARRAY(J, I), for example. He did that and ran the program again... and we didn't even notice it was there — back to a normal-sized working set of a few dozen pages.

It's possible that modern FORTRAN compilers will warn you if you do something like that. For some reason, though, I doubt it.
 


[1] For a more lighthearted description of paging, look at “The Paging Game”, an old bit of humour about “the Thing King”, which newbies might not know about. Google will find it for you.

1 comment:

scouter573 said...

The Thing King - great! I've never seen it before.

In return, I offer a technical review of The Story of Ping.

Enjoy!