A quick and dirty description of the guile garbage collector, and the rgc. Version 1, by Greg Harvey *The current collector **Location The garbage collector currently included with guile is located in the gc.c and gc.h files in the libguile directory. The GC*MARK macros are located in tags.h **Description Guile currently employs a conservative mark and sweep collector. If you don't know what that means, there are some links to net-available gc literature at my guile website http://home.thezone.net/~gharvey/guile/guile.html This is quite terse, and I fully intend to get around to a more formal document (which is, in fact, already under way, but will take a little while, particularly since I'd rather be working on the gc right now than writing ;). It may prove useful for people wanting to get cracking on guile's gc, however, it has been hastily written, and the source is still the definitive guide to how it works. Also, note that I am in no way an official on guile, and the gc work I'm doing is not endorsed by the guile dev team or the FSF. What this means is that: 1) There's no solid reason to believe that any gc work I do will end up in the official guile distribution. 2) I hold no sway over any decisions on what does and doesn't become part of guile, and in any case, I wouldn't oppose a better gc than one that I produce. Everybody wins if guile gets a better gc. 3) If there's something wrong here, it's my fault, and you should let me know about it. **Implementation **Entry The main entry point to the garbage collector is scm_igc. ***scm_igc(what) scm_igc runs the gc proper (what is a char * that indicates the reason for the garbage collection). It calls scm_gc_start, to do any initialization that is required for gc (in the current collector, this just gets the start time, and clears the variables keeping track of the number of cells freed). It then locks the heap, empties out the dead entries of the continuation stack, conservatively marks the current register set, the c stack (or stacks, in the case of threads), and walks down the scm_sys_protects array, setting marks on each object contained there (these include the symbol tables, the permanent and protected objects). Then, the spines of weak_vectors (who we have been keeping track of during the garbage collection) are marked, and finally, scm_gc_sweep is called, to free all the unmarked objects and remove marks from the live ones. Finally, we unlock the heap, call scm_gc_end, then return. **Marking Marking is initiated from scm_igc, and employs two main functions: scm_mark(cell), which precisely marks from cell, and scm_mark_locations(chunk, len), which searches for apparent cell pointers in the chunk, setting marks on any potential cells found. Also used in marking is scm_mark_weak_vector_spines, which marks the back bone alist's used by weak hash tables to hold keys and values. ***SCM_GCMARKP(cell), SCM_SETGCMARK(cell), SCM_CLRGCMARK(cell) These three macros (located in tags.h), implement the marking for certain types of objects (the specifics can be found in scm_gc_mark; these objects are the first in the list). The representation of the mark on these objects is a 1 bit in the cdr of the object (and you have to use SCM_GCCDR(cell) to get the cdr of one of these objects during collection). There also exist *GC8* mark macros, which fill the same function, but leave a mark in bit 8 of the car of the object. ***scm_mark(cell) This is the workhorse of the group. It consists of a large switch statement on the type of the object it is called with, setting the marks of all live objects. ***scm_mark_locations(chunk, len) For each of the possible pointers from chunk to chunk+len, this checks if the value could be a pointer to one of the heap segments. If it is, this possible pointer is passed to scm_mark. In the operation of the gc, this is called on the c stack (or each stack of a thread, if threading is enabled) ***scm_mark_weak_vector_spines() This function assures that the back bone alists used by weak hashes are not freed by the garbage collector. **Sweeping ***scm_gc_sweep This is the main entry point to the sweeping phase. The sweeping operation works as follows: We walk through each cell of each heap segment. If the cell isn't marked, we free storage associated with the type of object that the cell represents (for example, with vectors we free the vector storage), then we set the cell to scm_tc_free_cell, and put it on the freelist. After all of the segments are swept, we fix up the weak vectors & hashes. For weak vectors, this means setting all entries that have been collected to SCM_BOOL_F. For weak hashes, this means shifting the alists, so that the alist is now composed of only live pairs. *The rgc The rgc is a modification to the current garbage collector, and is basically a test bed for the generational garbage collector. Notable differences between the rgc and the current gc: **The rgc does not keep marks directly on the objects. This means that there's no need for GCCDR in gc code, and that the different marking mechanism's for different objects is extinct. **The rgc attempts to gain performance by only selectively collecting segments. The current gc does a full collect every collection, while the rgc only collects a fraction on most collections (however, without a more intelligent way of tracing objects, the rgc doesn't provide a lot of benefits, since collections will tend to occur more often, and a full trace is necessary). **The rgc provides many more configurable options, both through envrionment variables, and with a scheme interface for specifing, for example, how often a full collection is run, the number of cells that must be collected to prevent allocation. **The rgc preallocates larger chunks of heap memory, and will dole them out in arbitrarily small portions. This allows us to use a smaller heap space, while not getting eaten alive with malloc costs. However, this currently makes it impossible to free empty heap segments. * Rgc performance In general, the current performance of the rgc is a bit worse than the current collector. **It wins: in cases where a lot of local garbage is created, then dies quickly. **It loses majorly: when a large allocation is taking place. It should be noted that the rgc is, for better or worse, dead. It has served it's purpose of exposing problematic areas of guile, as well as a testbed for things like the chunklets.