Discussion:
Strange behaviour of mmap() in OS X
Alexander Oberdörster
2003-06-21 10:31:01 UTC
mmap() seems to allocate and retain memory very aggressively in OS X.
When I mmap() a large file (~200 MB in size) read-only and access it
sequentially from start to end, the RSIZE of my program quickly grows
to the size of the file. If the free physical memory is insufficient to
accommodate the whole file, the system starts swapping heavily: first
other data is paged out (running applications etc), then (after my
application has finished and when the paged-out applications are needed
again) it has to be swapped back in. During this time, the system is
very unresponsive. The whole process can take several minutes.
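
For reference, a minimal sketch of the access pattern described above: map a file read-only and touch every byte from start to end. The helper name sum_mapped_file and the use of MAP_SHARED are assumptions made for illustration; the original test program is not shown in this thread.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map `path` read-only and sum its bytes in order.
 * Returns 0 on success (sum stored in *out), -1 on error. */
int sum_mapped_file(const char *path, uint64_t *out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap() rejects a zero length */
        close(fd);
        *out = 0;
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const unsigned char *base = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping remains valid after close */
    if (base == MAP_FAILED)
        return -1;

    uint64_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += base[i];             /* first touch of each page faults it in */

    munmap((void *)base, len);
    *out = sum;
    return 0;
}
```

Every page touched by the loop stays mapped until munmap(), which is why the resident size can grow to the size of the file.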

Has anyone else observed this behaviour? Is there a workaround?
(besides the obvious 'don't use memory mapped files')

Note that other UNIXes I tested (Linux, IRIX) seem to handle mmap()
more gracefully and the system remains fully usable during and after
the mmap(). The amount of swapping is minimal. This was not due to the
amount of free memory: The Linux box has about 256 MB RAM with ca. 150
MB free, the IRIX machine has only 64 MB with ca. 30 MB free. The Mac
(a PowerBook 12") has 640 MB RAM, 180 MB were unused (inactive). All
data was gathered with 'top'.

I know that I can give the VM manager hints about the memory usage (I
use it sequentially), but madvise() seems to be broken in OS X. I
have this impression from previous posts to several mailing lists and
my own experience.

The OS version is 10.2.6. I'm using mmap() as recommended by Apple on
the following web page:

http://developer.apple.com/techpubs/macosx/Essentials/Performance/
FilesNetworksThreads/Reading_Lar_ile_Mapping.html


Alexander Oberdörster
Marcel Weiher
2003-06-21 11:08:01 UTC
[cross-posted from macosx-dev to darwin-dev]
Post by Alexander Oberdörster
mmap() seems to allocate and retain memory very aggressively in OS X.
Yes. I filed a bug on that exact behavior about 2 years ago. Maybe
you should file one as well.
Post by Alexander Oberdörster
When I mmap() a large file (~200 MB in size) read-only and access it
sequentially from start to end, the RSIZE of my program quickly grows
to the size of the file. If the free physical memory is insufficient
to accommodate the whole file, the system starts swapping heavily:
first other data is paged out (running applications etc), then (after
my application has finished and when the paged-out applications are
needed again) it has to be swapped back in. During this time, the
system is very unresponsive. The whole process can take several
minutes.
Yup. In fact, I managed to completely lock up my test system with
about 4-5 iterations of such a test. It is really completely
unacceptable behavior for a modern VM implementation. In effect, you
are in the same situation as if you didn't have virtual memory.
Ridiculous.
Post by Alexander Oberdörster
Has anyone else observed this behaviour? Is there a workaround?
(besides the obvious 'don't use memory mapped files')
Well, there are some system calls that *should* help, but the last I
checked they didn't (and weren't even hooked up properly, see xnu).
This may have changed in the meantime.
Post by Alexander Oberdörster
Note that other UNIXes I tested (Linux, IRIX) seem to handle mmap()
more gracefully and the system remains fully usable during and after
the mmap(). The amount of swapping is minimal.
There shouldn't really be any swap-out at all.
Post by Alexander Oberdörster
This was not due to the amount of free memory: The Linux box has
about 256 MB RAM with ca. 150 MB free, the IRIX machine has only 64 MB
with ca. 30 MB free. The Mac (a PowerBook 12") has 640 MB RAM, 180 MB
were unused (inactive). All data was gathered with 'top'.
I know that I can give the VM manager hints about the memory usage (I
use it sequentially), but madvise() seems to be broken in OS X. I
have this impression from previous posts to several mailing lists and
my own experience.
Yup.
Post by Alexander Oberdörster
The OS version is 10.2.6.
Well, that kills the theory that matters might have improved in the
meantime.

Marcel
--
Marcel Weiher Metaobject Software Technologies
***@metaobject.com www.metaobject.com
Metaprogramming for the Graphic Arts. HOM, IDEAs, MetaAd etc.
Jim Magee
2003-06-21 21:58:00 UTC
Post by Marcel Weiher
Post by Alexander Oberdörster
Note that other UNIXes I tested (Linux, IRIX) seem to handle mmap()
more gracefully and the system remains fully usable during and after
the mmap(). The amount of swapping is minimal.
There shouldn't really be any swap-out at all.
Any pageout algorithm that automatically punishes mapped file pages
(simply because they don't have to be cleaned before recycling in many
cases) is inherently unfair. Especially since mapped file pages are
already "punished" on the front-end of the page replacement algorithm.
Consider the fact that you can zero fill lots of pages in the amount of
time it takes to read just one back from a disk (even after read-ahead
type algorithms are applied). So we already favor adding anonymous
pages _into_ the page pool. If we also favored keeping anonymous pages
in the pool longer (because we almost always have to swap them to
replace them), then there would be a very heavy bias towards those
types of pages.

So, without explicit intervention/hinting, I wholeheartedly disagree
with your assertion about "there shouldn't really be any swapping." If
the next likely page to replace needs to be swapped in order to bring
in the next page needed from a mapped file, we should do the swapping.
Of course, if there is hinting...we should be able to do a better job
of choosing a page to replace (see below).
Post by Marcel Weiher
Post by Alexander Oberdörster
This was not due to the amount of free memory: The Linux box has
about 256 MB RAM with ca. 150 MB free, the IRIX machine has only 64
MB with ca. 30 MB free. The Mac (a PowerBook 12") has 640 MB RAM, 180
MB were unused (inactive). All data was gathered with 'top'.
I know that I can give the VM manager hints about the memory usage (I
use it sequentially), but madvise() seems to be broken in OS X. I
have this impression from previous posts to several mailing lists and
my own experience.
The BSD madvise() API is mostly forward-looking (and all future effects
on the memory as a result of the call are optional/advisory only). So,
even if it does nothing, it's not, technically, broken. And we do
more than nothing when these are called. It's just that we may not be
as aggressive as some other vendors. But we had some reason to our
madness:

Specifically, in your case, marking something as "sequential" still
doesn't give us quite enough information to avoid doing "really stupid"
things in some cases. That is, how far back from the current "fault"
should we start deactivating pages? If it's too far back, we still
spill over the available memory and at least have to swap some. If
it's not far enough back, the application could re-access the page
right after we deactivated it. That's because while pre-fetching in the
sequential case, we rarely are working in the VM system with the same
page the thread is currently working on. We could add some artificial
overhead to cause faults and detect where the thread "is". But that's
assuming a straight [single] linear progression through the pages. With
vector code, etc, you often make several passes through small[ish] data
chunks and then move on. So, "sequential" isn't all that meaningful
these days.

Instead, we decided to rely on specific "past-looking" feedback. The
msync(MS_INVALIDATE) call can be used to inform us which pages you
absolutely don't need anymore. It's very similar to
madvise(MADV_DONTNEED). But it has much more predictable scheduling
behavior in the face of actually finding dirty pages in the range.
That is: whose thread is used to do the cleaning to make the pages
available? Many assume the madvise() call will return somewhat
immediately, and the cleaning of any pages will happen asynchronously.
But that asynchronous behavior can affect the rest of your media
scheduling in unpredictable ways. If you use the synchronous msync()
call, you have greater control over all that.

So, we concentrated on the synchronous (and therefore much more
predictable) msync() first.
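
A minimal sketch of that pattern, under stated assumptions: a sequential reader that hands back each window of pages with msync() once it has finished with them. The helper name and the window_bytes parameter are hypothetical, and combining MS_SYNC with MS_INVALIDATE is an assumption based on the POSIX rule that one of MS_SYNC or MS_ASYNC be given; only MS_INVALIDATE is named above.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: sum the bytes of `path` sequentially and, after each
 * `window_bytes` consumed, tell the VM system those pages are no longer needed.
 * Returns 0 on success (sum stored in *out), -1 on error. */
int sum_mapped_file_releasing(const char *path, size_t window_bytes, uint64_t *out)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    if (pagesz <= 0)
        return -1;
    size_t page = (size_t)pagesz;

    /* Round the window down to whole pages so every msync() is page-aligned. */
    size_t window = (window_bytes / page) * page;
    if (window == 0)
        window = page;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {
        close(fd);
        *out = 0;
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const unsigned char *base = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (base == MAP_FAILED)
        return -1;

    uint64_t sum = 0;
    size_t released = 0;            /* bytes [0, released) were handed back */
    for (size_t i = 0; i < len; i++) {
        sum += base[i];
        if (i + 1 - released >= window) {
            /* Treated as a hint here: a failure is ignored, not fatal. */
            (void)msync((void *)(base + released), window,
                        MS_SYNC | MS_INVALIDATE);
            released += window;
        }
    }

    munmap((void *)base, len);
    *out = sum;
    return 0;
}
```

The window size is the tuning knob: too small and the call overhead adds up, too large and the resident set grows again before pages are released. A call such as sum_mapped_file_releasing(path, 8 * 1024 * 1024, &sum) would release pages every 8 MB.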

--Jim
Alexander Oberdörster
2003-06-21 11:46:02 UTC
Post by Marcel Weiher
Yes. I filed a bug on that exact behavior about 2 years ago. Maybe
you should file one as well.
I will try. What's the official way to do this? Is mmap() Darwin stuff?
Did you try to contact Apple developers directly? I intended to post my
previous message in the Apple Mailing Lists (darwin-developer, to be
exact), but the mailing list daemons seem to be broken at the moment,
so I can't join.
Post by Marcel Weiher
It is really completely unacceptable behavior for a modern VM
implementation.
I agree. Kind of explains part of the slowness of the system in general.
Post by Marcel Weiher
Well, there are some system calls that *should* help, but the last I
checked they didn't (and weren't even hooked up properly, see xnu).
Can you tell me where exactly? I looked for mmap() & friends in the
Darwin Source code, but couldn't find my way around.
Post by Marcel Weiher
Well, that kills the theory that matters might have improved in the
meantime.
We can always hope for 10.3. Ha ha.


Alex
Shawn Erickson
2003-06-21 13:55:01 UTC
Post by Alexander Oberdörster
Post by Marcel Weiher
Yes. I filed a bug on that exact behavior about 2 years ago. Maybe
you should file one as well.
I will try.
Do, or do not. There is no try. (a short little green guy told me that
once)

Bug reports are just about the only way to ensure they know about and
will track the issue.
Post by Alexander Oberdörster
What's the official way to do this?
http://developer.apple.com/bugreporter/index.html

-Shawn
Alexander Oberdörster
2003-06-21 14:48:01 UTC
Post by Shawn Erickson
Bug reports are just about the only way to ensure they know about and
will track the issue.
Ok, done.


Alexander
Alexander Oberdörster
2003-06-21 23:09:01 UTC
Post by Jim Magee
The msync(MS_INVALIDATE) call can be used to inform us which pages you
absolutely don't need anymore. It's very similar to
madvise(MADV_DONTNEED).
Thanks for the hint. With msync(MS_INVALIDATE) every now and then, I
have no problems with mmap() any more.

I'm just a bit puzzled why other OSes show acceptable performance
without this kind of hint.
Post by Jim Magee
The BSD madvise() API is mostly forward-looking (and all future
effects on the memory as a result of the call are optional/advisory
only). So, even if it does nothing, it's not, technically, broken.
My problem with madvise() was that it returns with EINVAL when the len
exceeds a certain size. Maybe I don't understand madvise(), but for
example mmap()ing a 200 MB file and then madvise()ing with len =
filesize worked fine in Linux and IRIX, but doesn't in OS X.


Alexander
