Memory Hierarchy and Access Time
This page takes a more in-depth look at the Raspberry Pi memory hierarchy. Every level of the memory hierarchy has a capacity and a speed. Capacities are relatively easy to find by querying the operating system or by studying the ARM1176 Technical Reference Manual. Speed, however, is not as easy to find and usually must be measured. I use a simple pointer chasing technique (sketched just below) to characterize the behavior of each level in the hierarchy. The technique also reveals the behavior of memory-related performance counter events at each level.

The Raspberry Pi implements five levels in its memory hierarchy, summarized below. The highest level consists of virtual memory pages that are maintained in secondary storage. Raspbian Wheezy keeps its swap space in the file /var/swap on the SDHC card. That is enough space for 25,600 4KB pages. You are allowed as many pages as will fit into the preallocated swap space.
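Here is a minimal sketch of the pointer chasing idea, in C. It is illustrative only, not the exact benchmark behind the measurements discussed on this page; the array size and stride are assumptions that you would vary to target different levels of the hierarchy. Each element stores the index of the next element to visit, so every load depends on the previous one and the loop exposes the latency of whichever level holds the working set.

```c
/* Minimal pointer chasing sketch (illustrative; not the exact benchmark
 * used for the measurements discussed here). Each array element holds the
 * index of the next element to visit, so every load depends on the
 * previous one and the loop exposes the access latency of whatever level
 * of the hierarchy the working set fits into. */
#include <stdio.h>
#include <stdlib.h>

#define NITEMS  (1 << 20)   /* assumed working set: 1M elements */
#define STRIDE  16          /* assumed stride between chased elements */

int main(void)
{
    size_t *chain = malloc(NITEMS * sizeof *chain);
    if (chain == NULL) return 1;

    /* Build a circular chain with a fixed stride (in elements). */
    for (size_t i = 0; i < NITEMS; i++)
        chain[i] = (i + STRIDE) % NITEMS;

    /* Chase the pointers; the dependent loads serialize memory accesses. */
    size_t index = 0;
    for (long n = 0; n < 10L * NITEMS; n++)
        index = chain[index];

    printf("final index: %zu\n", index);   /* keep the chase live */
    free(chain);
    return 0;
}
```

Timing the chase loop and dividing by the number of loads gives an estimate of the average access time for a working set of that size.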
The Raspberry Pi has either 256MB (Model A) or 512MB (Model B) of primary memory. This is enough room for 65,536 or 131,072 physical pages, respectively, if all of primary memory were available for paging. It isn't all available for user-space applications because the Linux kernel needs space for its own code and data. Linux also supports huge pages, but that's a separate topic for now. The vmstat command displays information about virtual memory usage; please refer to its man page for usage. vmstat is a good tool for troubleshooting paging-related performance issues since it shows page in and page out statistics.

The processor in the Raspberry Pi is the Broadcom BCM2835. The BCM2835 does have a unified level 2 (L2) cache. However, the L2 cache is dedicated to the VideoCore GPU; memory references from the CPU side are routed around the L2 cache. The BCM2835 has two level 1 (L1) caches: a 16KB instruction cache and a 16KB data cache.
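The primary memory capacity and page size quoted above are easy to confirm by querying the operating system. Here is a minimal sketch using sysconf(); note that _SC_PHYS_PAGES and _SC_AVPHYS_PAGES are Linux/glibc extensions rather than strict POSIX.

```c
/* Query the page size and physical page counts from the OS via sysconf().
 * A quick way to confirm the capacities discussed above.
 * _SC_PHYS_PAGES and _SC_AVPHYS_PAGES are Linux/glibc extensions. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long page_size   = sysconf(_SC_PAGESIZE);      /* bytes per page */
    long phys_pages  = sysconf(_SC_PHYS_PAGES);    /* total physical pages */
    long avail_pages = sysconf(_SC_AVPHYS_PAGES);  /* currently free pages */

    printf("page size:       %ld bytes\n", page_size);
    printf("physical pages:  %ld (%lld MB)\n", phys_pages,
           (long long)phys_pages * page_size / (1024 * 1024));
    printf("available pages: %ld (%lld MB)\n", avail_pages,
           (long long)avail_pages * page_size / (1024 * 1024));
    return 0;
}
```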
The analysis below concentrates on the data cache. The data cache is 4-way set associative. Each way in an associative set stores a 32-byte cache line. The cache can handle up to four active references to the same set without conflict. If all four ways in a set are valid and a fifth reference is made to the set, then a conflict occurs and one of the four ways is victimized to make room for the new reference.

The data cache is virtually indexed and physically tagged. Cache lines and tags are stored separately in DATARAM and TAGRAM, respectively. Virtual address bits [11:5] index the TAGRAM and DATARAM. Given a 16KB capacity, 32-byte lines and four ways, there must be 128 sets. Virtual address bits [4:0] are the offset into the cache line. The data MicroTLB translates a virtual address to a physical address and sends the physical address to the L1 data cache.
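Before following the physical address into the tag comparison, here is a small sketch of how the virtual address bits select a set and a line offset for this 16KB, 4-way, 32-byte-line geometry. The example address is arbitrary; the field boundaries follow directly from the numbers above.

```c
/* Decompose a virtual address for a 16KB, 4-way set associative cache
 * with 32-byte lines: 128 sets, so bits [4:0] are the line offset and
 * bits [11:5] are the set index, as described above. */
#include <stdio.h>
#include <stdint.h>

#define LINE_SIZE   32u
#define NUM_WAYS    4u
#define CACHE_SIZE  (16u * 1024u)
#define NUM_SETS    (CACHE_SIZE / (LINE_SIZE * NUM_WAYS))   /* 128 sets */

int main(void)
{
    uint32_t vaddr = 0x0001234Cu;   /* arbitrary example address */

    uint32_t offset = vaddr & (LINE_SIZE - 1);         /* bits [4:0]  */
    uint32_t set    = (vaddr / LINE_SIZE) % NUM_SETS;   /* bits [11:5] */

    printf("number of sets: %u\n", NUM_SETS);
    printf("vaddr 0x%08x -> set %u, offset %u\n",
           (unsigned)vaddr, (unsigned)set, (unsigned)offset);
    return 0;
}
```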
The L1 data cache compares the physical address with the tag and determines hit/miss status and the correct way. The load-to-use latency is three (3) cycles for an L1 data cache hit.

The BCM2835 implements a two-level translation lookaside buffer (TLB) structure for virtual to physical address translation. There are two MicroTLBs: a 10-entry data MicroTLB and a 10-entry instruction MicroTLB. The MicroTLBs are backed by the main TLB (i.e., the second-level TLB). The MicroTLBs are fully associative. Each MicroTLB translates a virtual address to a physical address in one cycle when the page mapping information is resident in the MicroTLB (that is, on a MicroTLB hit). The main TLB is a unified TLB that handles misses from the instruction and data MicroTLBs. It is a 64-entry, 2-way set associative structure. Main TLB misses are handled by a hardware page table walker. A page table walk requires at least one additional memory access to find the page mapping information in main memory.
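One practical consequence for pointer chasing experiments is how little address space the TLBs can map at once. Assuming 4KB pages, a back-of-the-envelope calculation like the one below shows roughly 40KB of coverage for the data MicroTLB and roughly 256KB for the main TLB, which is why larger working sets start paying for MicroTLB misses, main TLB misses and page table walks.

```c
/* Rough TLB coverage arithmetic, assuming 4KB pages: how much address
 * space each TLB level can map before misses (and page table walks)
 * start to show up in a pointer chasing loop. */
#include <stdio.h>

int main(void)
{
    const long page_size        = 4 * 1024;  /* 4KB pages */
    const long microtlb_entries = 10;        /* data MicroTLB */
    const long main_tlb_entries = 64;        /* main (second-level) TLB */

    printf("data MicroTLB coverage: %ld KB\n",
           microtlb_entries * page_size / 1024);   /* 40 KB  */
    printf("main TLB coverage:      %ld KB\n",
           main_tlb_entries * page_size / 1024);   /* 256 KB */
    return 0;
}
```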