FIELD OF INVENTION
This invention relates to computer graphics processing, and more specifically to computer graphics processing using two or more architecturally distinct graphics processors.
BACKGROUND OF INVENTION
Many computing devices utilize high-performance graphics processors to present high quality graphics. High performance graphics processors consume a great deal of power (electricity), and consequently generate a great deal of heat. In portable computing devices, the designers of such devices must trade off market demands for graphics performance against the power consumption capabilities of the device (performance vs. battery life). Some laptop computers are beginning to solve this problem by introducing two GPUs in one laptop, one a low-performance, low-power consumption GPU and the other a high-performance, high-power consumption GPU, and letting the user decide which GPU to use.
Often, the two GPUs are architecturally dissimilar. By architecturally dissimilar, it is meant that the graphical input formatted for one GPU will not work with the other GPU. Such architectural dissimilarity may be due to the two GPUs having different instruction sets or different display list formats that are architecture specific.
Unfortunately, architecturally dissimilar GPUs are not capable of cooperating with one another in a manner that allows seamless context switching between them. Therefore a problem arises in computing devices that use two or more architecturally dissimilar GPUs in that in order to switch from one GPU to another the user must stop what they are doing, select a different GPU, and then reboot the device.
This is somewhat awkward even with a laptop computer and considerably more awkward with hand-held portable computing devices such as mobile internet access devices, cellular telephones, hand-held gaming devices, and the like.
It would be desirable to allow the context switching to be hidden from the user and performed automatically in the background. Unfortunately, no solution is presently available that allows for dynamic, real-time context switching between architecturally distinct GPUs. The closest prior art is the Apple MacBook Pro, from Apple Computer of Cupertino, Calif., which contains two architecturally distinct GPUs but does not allow dynamic context switches between them. Another prior art solution is the Scalable Link Interface (SLI) architecture developed by nVidia Corporation of Santa Clara, Calif. This architecture lets a user run one or more GPUs in parallel, but only for the purpose of increasing performance, not to reduce power consumption. Also, this solution requires the two GPUs to be synchronized when the system is enabled, again requiring some amount of user intervention.
It is within this context that embodiments of the current invention arise.
DESCRIPTION OF THE SPECIFIC EMBODIMENTS
Embodiments of the present invention utilize a graphics processing system and method that allows two or more architecturally distinct GPUs with varying power consumption profiles to be combined so that certain graphics processing operations may transition seamlessly between the two GPUs without user intervention or even the user's knowledge. This is accomplished using an architecture-neutral display list instruction set in software, and having a specialized piece of hardware (the "GPU Context Controller") sit between the GPUs that translates the architecture-neutral instruction set into the architecture-specific instruction set of the given GPU.
According to an embodiment of the present invention, a graphics processing system, e.g., as shown in FIG. 1, may be configured to implement certain portions of a graphics processing method, e.g., as described below with respect to FIG. 2A and FIG. 2B.
The system 100 may include a central processing unit (CPU) 101, a memory 102, a first graphics processing unit (GPU) 103, a second GPU 104 and a GPU context controller 105. The memory 102 is coupled to the CPU 101. The memory 102 may store applications and data for use by the CPU 101. The memory 102 may be in the form of an integrated circuit (e.g., Random Access Memory (RAM), Dynamic Random Access Memory (DRAM), Read-Only Memory (ROM), and the like). By way of example, and not by way of limitation, the memory 102 may be in the form of RAM.
A computer program 106 may be stored in the memory 102 in the form of instructions that can be executed on the CPU 101. The instructions of the program 106 may be configured to implement, amongst other things, certain parts of a graphical processing method that involves a context switch between the first and second graphics processing units 103, 104. The program 106 may perform physics simulations, vertex processing and other calculations related to drawing one or more images. The program 106 may also determine which of the GPUs 103, 104 is to be used for rendering the one or more images.
The GPUs 103, 104 receive input (e.g., data and/or instructions) resulting from the computations performed by the program 106 and further process the input to render the one or more images on a display 110. Each of the GPUs 103, 104 may have a corresponding associated video RAM (VRAM) 107A, 107B. Each VRAM 107A, 107B allows the CPU 101 to process an image at the same time a GPU 103, 104 reads it out to a display controller 108 coupled to the display 110. By way of example, the VRAM 107A, 107B may be implemented in the form of dual ported RAM that allows multiple reads or writes to occur at the same time, or nearly the same time. Each VRAM 107A, 107B may contain both input (e.g., textures) and output (e.g., buffered frames). Each VRAM 107A, 107B may be implemented as a separate local hardware component of the corresponding GPU. Alternatively, each VRAM 107A, 107B may be virtualized as part of the main memory 102.
The GPUs 103, 104 are, in general, architecturally dissimilar. As noted above, the term "architecturally dissimilar" means that graphical input formatted for one GPU 103 will not work with the other GPU 104 and vice versa. By way of example, and not by way of limitation, the two GPUs may have different instruction sets, different display list formats, or both. In addition, in some embodiments, the two GPUs 103, 104 may have different processing performance and power consumption characteristics.
To facilitate fast context switching between the two GPUs 103, 104, the program 106 generates the input, e.g., a display list, for the GPUs 103, 104 in an architecture-neutral format. As used herein, the term "architecture-neutral format" refers generally to a format that does not depend on a specific processor architecture of a particular GPU. The input is sent to the GPU Context Controller 105, which may be implemented in hardware, e.g., as an application specific integrated circuit (ASIC), or in software, e.g., as a logic block of coded instructions running on the CPU.
The GPU Context Controller 105 may be implemented as a just-in-time compiler, which compiles the input from the architecture-neutral format into a format that is specific to one of the GPUs 103, 104 or the other. The GPU that is to receive the compiled input is referred to herein as the active GPU. The GPU that does not receive the compiled input is referred to herein as the inactive GPU. The GPU Context Controller 105 translates architecture-neutral display list instructions to the architecture-specific display list instruction set of the active GPU. The resulting instructions are then sent to the active GPU for rendering. The inactive GPU is shut down while the active GPU is in use. Shutting down the inactive GPU can provide a considerable reduction in power consumption.
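By way of illustration only, the translation step might be sketched in Python as follows. The neutral opcodes (CLEAR, DRAW_TRIANGLES), the two native encodings, and the function name translate_display_list are hypothetical and do not appear in the embodiments described above. The sketch assumes a one-to-one opcode mapping, whereas a real just-in-time compiler may emit several native instructions for each neutral instruction.

```python
from typing import Dict, List, Tuple

# A neutral instruction is an (opcode name, operands) pair (hypothetical format).
NeutralInstr = Tuple[str, Tuple[int, ...]]
NativeInstr = Tuple[int, Tuple[int, ...]]

# Hypothetical per-GPU opcode tables: the same neutral opcode maps to a
# different native encoding on each architecture.
OPCODE_TABLES: Dict[str, Dict[str, int]] = {
    "gpu_a": {"CLEAR": 0x01, "DRAW_TRIANGLES": 0x10},
    "gpu_b": {"CLEAR": 0xA0, "DRAW_TRIANGLES": 0xB4},
}


def translate_display_list(display_list: List[NeutralInstr],
                           active_gpu: str) -> List[NativeInstr]:
    """Translate neutral instructions into the active GPU's native opcodes."""
    table = OPCODE_TABLES[active_gpu]
    return [(table[op], operands) for op, operands in display_list]


# The same neutral display list yields different native lists per GPU.
neutral = [("CLEAR", (0,)), ("DRAW_TRIANGLES", (0, 36))]
native_a = translate_display_list(neutral, "gpu_a")
native_b = translate_display_list(neutral, "gpu_b")
```

Because the program 106 only ever produces the neutral list, changing the active GPU in this sketch amounts to selecting a different table.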
In addition to translating the instruction set, the GPU Context Controller 105 may monitor power consumption metrics for the active GPU to determine which of the GPUs 103, 104 should be used as the active GPU. The GPU Context Controller 105 may also dynamically perform context switches between the two GPUs 103, 104 based on active load, anticipated load and/or direct selection messages from the CPU 101. Context switches may be performed by reading the GPU state from one GPU, translating the state to the format of the other, and then uploading the state to the other GPU. If necessary, the Context Controller 105 may transfer VRAM contents from one GPU to another. This requires the architecture-neutral display list to reference VRAM contents by virtual address instead of direct address. After a context switch the GPU Context Controller 105 may instruct the video display controller 108 to switch the VRAM address for framebuffer access.
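The role of virtual addressing in a VRAM transfer can be sketched as follows. This is a minimal Python illustration under stated assumptions: the class VramMap, the function transfer_vram, and the packing of blocks back to back in the destination are all hypothetical. The point it shows is that a display list referring to a virtual address remains valid after the contents move to a different physical offset in the other VRAM.

```python
from typing import Dict


class VramMap:
    """Maps virtual addresses used by the neutral display list to
    physical VRAM offsets of one GPU (hypothetical helper)."""

    def __init__(self) -> None:
        self._table: Dict[int, int] = {}

    def bind(self, virtual_addr: int, physical_addr: int) -> None:
        self._table[virtual_addr] = physical_addr

    def resolve(self, virtual_addr: int) -> int:
        return self._table[virtual_addr]


def transfer_vram(src: bytearray, src_map: VramMap,
                  dst: bytearray, dst_map: VramMap,
                  blocks: Dict[int, int], dst_base: int = 0) -> None:
    """Copy each block (virtual address -> size in bytes) into the
    destination VRAM and rebind its virtual address there."""
    cursor = dst_base
    for vaddr, size in blocks.items():
        start = src_map.resolve(vaddr)
        dst[cursor:cursor + size] = src[start:start + size]
        dst_map.bind(vaddr, cursor)
        cursor += size


# A 4-byte "texture" at physical offset 32 of VRAM A, virtual address 0x1000.
vram_a, vram_b = bytearray(64), bytearray(64)
map_a, map_b = VramMap(), VramMap()
vram_a[32:36] = b"TEX0"
map_a.bind(0x1000, 32)

transfer_vram(vram_a, map_a, vram_b, map_b, {0x1000: 4}, dst_base=8)
```

After the transfer, virtual address 0x1000 resolves to offset 8 in the second VRAM, so the display list itself needs no change.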
The system described above may implement a graphics processing method according to an embodiment of the present invention. By way of example, and not by way of limitation, a computer-implemented graphics processing method 200 may proceed as illustrated in FIG. 2A. Specifically, the CPU 101 may produce graphics input for a GPU, as indicated at 201. The CPU 101 may produce graphics input for a sequence of frames, processing each frame in the order in which it is to be displayed on the display device 110. As described above, the graphics input includes an architecture-neutral display list 202. The GPU Context Controller 105 translates the display list 202 into an architecture specific format for the active GPU, as indicated at 203. In the example illustrated in FIG. 2A, GPU A 103 is active and GPU B 104 is inactive.
The GPU Context Controller 105 sends the translated display list 204 to the active GPU A 103 for processing, as indicated at 205. GPU A 103 processes the translated display list, as indicated at 207, and generates output for rendering. The output is sent to the display controller 108 for rendering on the display device 110, as indicated at 209.
To facilitate optimum power consumption, the GPU Context Controller 105 may monitor the power consumption of the active GPU, as indicated at 211, for the purpose of determining whether or not to perform a context switch. The GPU Context Controller 105 may also wait for a signal from the CPU 101 indicating that a context switch between the currently active GPU and the currently inactive GPU should be performed. If one or more criteria for performing a context switch are met, as indicated at 213, the GPU Context Controller 105 may perform a context switch, as indicated at 215. The GPU Context Controller 105 may then deactivate GPU A, e.g., by shutting it down, if it is to be no longer active after the context switch.
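The per-frame flow of FIG. 2A can be summarized in a short Python sketch. It is a toy model under stated assumptions: the class ControllerSketch, its fields, the placeholder translation, and the 2 W threshold in the example are hypothetical, and the switch criterion is supplied by the caller because the embodiments leave the exact criteria open.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ControllerSketch:
    """Toy model of the per-frame flow of FIG. 2A; all names are hypothetical."""
    # Decides whether to switch, given the active GPU's name and measured power.
    switch_criteria: Callable[[str, float], bool]
    active: str = "gpu_a"
    inactive: str = "gpu_b"
    submitted: List[str] = field(default_factory=list)

    def translate(self, display_list: List[str]) -> List[str]:
        # 203: placeholder translation that tags each instruction with its target.
        return [f"{self.active}:{instr}" for instr in display_list]

    def process_frame(self, display_list: List[str], measured_watts: float,
                      cpu_requests_switch: bool = False) -> bool:
        # 205: send the translated display list to the active GPU.
        self.submitted.extend(self.translate(display_list))
        # 211, 213: monitor power and test the switch criteria.
        switch = cpu_requests_switch or self.switch_criteria(
            self.active, measured_watts)
        if switch:
            # 215: swap roles; the formerly active GPU may then be shut down.
            self.active, self.inactive = self.inactive, self.active
        return switch


# Hypothetical criterion: leave the high-power GPU when it draws under 2 W.
ctrl = ControllerSketch(lambda gpu, watts: gpu == "gpu_a" and watts < 2.0)
first = ctrl.process_frame(["CLEAR", "DRAW"], measured_watts=6.0)
second = ctrl.process_frame(["CLEAR", "DRAW"], measured_watts=1.5)
third = ctrl.process_frame(["CLEAR"], measured_watts=1.0)
```

In this run only the second frame triggers a switch, so the third frame is translated for the other GPU.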
FIG. 2B illustrates an example of a context switch 220. In this example, GPU A 103 is initially active and GPU B 104 is initially inactive. As indicated at 222, a context switch is triggered. There are a number of different ways of triggering a context switch. One way, as indicated above, is based on monitoring of power consumption of the active GPU. For example, GPU A and GPU B may have different power consumption and processing capabilities. By way of example, and not by way of limitation, GPU A may be a high power GPU and GPU B may be a low power GPU having lower power consumption than GPU A and a maximum processing capacity that is less than a maximum processing capacity of GPU A. In such a case, the GPU Context Controller 105 may be configured (e.g., programmed) to perform a context switch from GPU A to GPU B if GPU A is active and operating at a processing capacity that is less than or equal to the maximum processing capacity of GPU B.
Alternatively, if GPU A is the lower power GPU and GPU B is the high power GPU, the GPU Context Controller 105 may perform a context switch from GPU A to GPU B if GPU A is operating at its maximum processing capacity, and a frame render time is decreasing.
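The two trigger conditions just described can be written as simple predicates. The following Python sketch is illustrative only: GpuProfile, the capacity units, and the function names are hypothetical. How the frame render time condition is measured is not specified here, so the sketch takes it as a flag computed by the caller.

```python
from dataclasses import dataclass


@dataclass
class GpuProfile:
    name: str
    max_capacity: float  # hypothetical units, e.g. work units per frame


def should_switch_to_low_power(low: GpuProfile, current_load: float) -> bool:
    """High-power GPU is active: switch if its current load is less than or
    equal to the maximum processing capacity of the low-power GPU."""
    return current_load <= low.max_capacity


def should_switch_to_high_power(low: GpuProfile, current_load: float,
                                frame_time_condition_met: bool) -> bool:
    """Low-power GPU is active: switch if it is operating at its maximum
    capacity and the caller reports that the frame render time condition holds."""
    return current_load >= low.max_capacity and frame_time_condition_met


low_power = GpuProfile("low_power_gpu", max_capacity=40.0)
down_ok = should_switch_to_low_power(low_power, current_load=35.0)
down_no = should_switch_to_low_power(low_power, current_load=60.0)
up_ok = should_switch_to_high_power(low_power, 40.0, frame_time_condition_met=True)
up_no = should_switch_to_high_power(low_power, 40.0, frame_time_condition_met=False)
```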
In some implementations, it may be desirable for the GPU Context Controller 105 to wait for active GPU A 103 to finish processing a currently processing frame, as indicated at 223 and 225, before implementing a context switch. The GPU Context Controller 105 may wait, as indicated at 224, until processing is finished, as indicated at 226. To implement the context switch, the GPU Context Controller 105 may read a state 227 of the active GPU A 103, as indicated at 228. The state may then be translated into a translated GPU state 229 that is in a format suitable for use by GPU B 104, as indicated at 230. The GPU Context Controller 105 may activate GPU B 104, as indicated at 232. Activation of GPU B 104 may take place either before or after translating the state of GPU A 103. Once GPU B 104 is activated, the translated GPU state 229 may be transferred to GPU B 104, as indicated at 234. In some embodiments, the GPU Context Controller 105 may optionally read the contents 233 of the VRAM 107A of GPU A 103 and transfer them to the VRAM 107B of GPU B 104, as indicated at 236. Once the GPU Context Controller 105 has extracted from GPU A 103 the information necessary for the context switch, GPU A 103 may be deactivated, as indicated at 238. The GPU Context Controller 105 may then process the next frame, as indicated at 240. Subsequent processing would involve translating the display list 202 from the CPU 101 into the architecture specific format for GPU B 104 and sending the resulting translated display list 204 to GPU B 104 for processing.
It is noted that the order of operations shown in FIG. 2B is meant as an example and is not the only possible order. For example, it is possible to deactivate GPU A before activating GPU B if the necessary information for performing the context switch (e.g., state 227 and VRAM contents 233) has been extracted from GPU A and stored, e.g., in memory 102.
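One possible ordering of the context switch of FIG. 2B is sketched below in Python. The class SimGpu, its fields, and the state keys (blend_mode, depth_test, BLEND, DEPTH) are hypothetical stand-ins; real GPU state is far richer, and waiting for a frame is modeled here by a single method call rather than actual synchronization.

```python
from typing import Callable, Dict

State = Dict[str, int]


class SimGpu:
    """Toy stand-in for a GPU; all fields are hypothetical."""

    def __init__(self, name: str, vram_size: int = 8) -> None:
        self.name = name
        self.active = False
        self.frame_in_flight = False
        self.state: State = {}
        self.vram = bytearray(vram_size)

    def finish_frame(self) -> None:
        self.frame_in_flight = False


def context_switch(src: SimGpu, dst: SimGpu,
                   translate_state: Callable[[State], State],
                   copy_vram: bool = True) -> None:
    if src.frame_in_flight:               # 224, 226: wait for the current frame
        src.finish_frame()
    saved = dict(src.state)               # 228: read the state of the active GPU
    translated = translate_state(saved)   # 230: translate to the other format
    dst.active = True                     # 232: activate the other GPU
    dst.state = translated                # 234: transfer the translated state
    if copy_vram:
        dst.vram[:] = src.vram            # 236: optional VRAM transfer
    src.active = False                    # 238: deactivate the old GPU


def a_to_b(state: State) -> State:
    """Hypothetical state translation from GPU A's format to GPU B's."""
    return {"BLEND": state["blend_mode"], "DEPTH": state["depth_test"]}


gpu_a, gpu_b = SimGpu("gpu_a"), SimGpu("gpu_b")
gpu_a.active, gpu_a.frame_in_flight = True, True
gpu_a.state = {"blend_mode": 2, "depth_test": 1}
gpu_a.vram[0:3] = b"tex"

context_switch(gpu_a, gpu_b, a_to_b)
```

Because the state and VRAM contents are captured before deactivation, the activation and deactivation steps in this sketch could be reordered, consistent with the alternative ordering noted above.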
The above-described approach to reducing power consumption requirements in a GPU is a considerable departure from current power-reducing measures. Current power-reducing measures in modern GPUs involve "power stepping," in which parts of the GPU are disabled based on load. While these measures may have a small impact on power consumption, they do not have as great an effect as disabling an entire GPU. Using two architecturally distinct GPUs is also a bold approach, because it involves the production of an architecture-neutral display list.
A graphics processing apparatus may be configured in accordance with embodiments of the present invention in any of a number of ways. By way of example, FIG. 3 is a more detailed block diagram illustrating a graphics processing apparatus 300 according to an embodiment of the present invention. By way of example, and without loss of generality, the graphics processing system 300 may be implemented as part of a computer system, such as a personal computer, video game console, personal digital assistant, cellular telephone, hand-held gaming device, portable internet device or other digital device.
The apparatus 300 generally includes a central processing unit (CPU) 301, a memory 302, two or more graphics processing units (GPU) 304A, 304B, and a GPU Context Controller 305. The apparatus 300 may further include a display controller 308 coupled to a display device 310.
The apparatus 300 may also include well-known support functions 311, such as input/output (I/O) elements 312, power supplies (P/S) 313, a clock (CLK) 314 and cache 315. The apparatus 300 may further include a storage device 316 that provides non-volatile storage for software instructions 317 and data 318. By way of example, the storage device 316 may be a fixed disk drive, removable disk drive, flash memory device, tape drive, CD-ROM, DVD-ROM, Blu-ray, HD-DVD, UMD, or other optical storage devices.
The CPU 301 may include one or more processing cores. By way of example and without limitation, the CPU 301 may be a parallel processor module, such as a Cell Processor. An example of a Cell Processor architecture is described in detail, e.g., in Cell Broadband Engine Architecture, copyright International Business Machines Corporation, Sony Computer Entertainment Incorporated, Toshiba Corporation, Aug. 8, 2005, a copy of which may be downloaded at http://cell.scei.co.jp/, the entire contents of which are incorporated herein by reference.
The CPU 301 may be configured to run software applications and optionally an operating system. The software applications may include graphics processing software 303, portions of which may be stored in the memory 302 and loaded into registers of the CPU 301 and/or GPU Context Controller 305 for execution.
The CPU 301 and GPU Context Controller 305 may be configured to implement the operations described above with respect to FIG. 2A and FIG. 2B. Specifically, the graphics processing software 303 may include instructions that, upon execution, cause the CPU 301 to produce graphics input 309 for the GPUs 304A, 304B. The graphics input 309 may be in a format having an architecture-neutral display list. The GPU Context Controller 305 may be configured to translate instructions in the architecture-neutral display list into an architecture specific format for one of the GPUs 304A, 304B or the other depending on which one of them is active. The GPU Context Controller 305 may also be configured to determine whether to perform a context switch between the two GPUs 304A, 304B, to perform the context switch, and to shut down the GPU that is inactive after the context switch.
There are a number of ways in which the GPU Context Controller 305 may be configured to perform the above-described tasks. In general, the GPU Context Controller 305 may be configured to execute software instructions of the graphics processing program 303. By way of example, the GPU Context Controller 305 may be implemented as a dedicated separate processor component that is completely independent of the CPU 301. Alternatively, the GPU Context Controller 305 may be implemented within the CPU 301. For example, if the CPU 301 has a multi-core or parallel processor architecture having multiple processor elements, the functions of the GPU Context Controller 305 may be implemented through instructions executed on one or more of these processor elements. Alternatively, the functions of the GPU Context Controller 305 may be implemented through a software thread of the program 303 that runs on the CPU 301. Thus, although the GPU Context Controller 305 is shown as a separate block in FIG. 3, embodiments of the present invention encompass implementation of the GPU Context Controller 305 and/or its functions on the CPU 301.
The GPUs 304A, 304B may be architecturally dissimilar, as described above. Each graphics processing unit (GPU) 304A, 304B may include a graphics memory 307A, 307B such as a video RAM. Each graphics memory 307A, 307B may include a display memory (e.g., a frame buffer) used for storing pixel data for each pixel of an output image. Each graphics memory 307A, 307B may be integrated in the same device as the corresponding GPU 304A, 304B, connected as a separate device with the corresponding GPU 304A, 304B, and/or implemented within the memory 302. Pixel data may be provided to either graphics memory 307A, 307B directly from the CPU 301 or via the GPU Context Controller 305. Alternatively, the CPU 301 or GPU Context Controller 305 may provide the active GPU 304A or 304B with data and/or instructions defining the desired output images, from which the active GPU may generate the pixel data of one or more output images. The data and/or instructions defining the desired output images may be stored in memory 302 and/or graphics memory 307A, 307B. In one embodiment, one or both GPUs 304A, 304B may be configured (e.g., by suitable programming or hardware configuration) with 3D rendering capabilities for generating pixel data for output images from instructions and data defining the geometry, lighting, shading, texturing, motion, and/or camera parameters for a scene. The GPUs 304A, 304B may further include one or more programmable execution units capable of executing shader programs.
As noted above, only one of the GPUs 304A, 304B is active at a time. The active GPU may periodically output pixel data for an image from the corresponding graphics memory to be displayed on the display device 310. The display device 310 may be any device capable of displaying visual information in response to a signal from the apparatus 300, including CRT, LCD, plasma, and OLED displays. The display controller 308 may convert the pixel data to signals that the display device 310 uses to generate visible images. The display controller 308 may provide the display device 310 with analog or digital signals. By way of example, the display device 310 may include a cathode ray tube (CRT) or flat panel screen that displays visible text, numerals, graphical symbols or images.
One or more user interface devices 320 may be used to communicate user inputs from one or more users to the apparatus 300. By way of example, one or more of the user interface devices 320 may be coupled to the apparatus 300 via the I/O elements 312. Examples of suitable user interface devices 320 include keyboards, computer mice, joysticks, touch pads, touch screens, light pens, still or video cameras, and/or microphones.
The apparatus 300 may include a network interface 325 to facilitate communication via an electronic communications network 327. The network interface 325 may be configured to implement wired or wireless communication over local area networks and wide area networks such as the Internet. The apparatus 300 may send and receive data and/or requests for files via one or more message packets 326 over the network 327.
In addition, the apparatus 300 may optionally include one or more audio speakers that produce audible or otherwise detectable sounds. To facilitate generation of such sounds, the apparatus 300 may further include an audio processor 330 adapted to generate analog or digital audio output from instructions and/or data provided by the CPU 301, memory 302, and/or storage 316.
The components of the apparatus 300, including the CPU 301, memory 302, GPUs 304A, 304B, GPU Context Controller 305, support functions 311, data storage 316, user input devices 320, network interface 325, and audio processor 350, may be operably connected to each other via one or more data buses 360. These components may be implemented in hardware, software or firmware or some combination of two or more of these.
According to another embodiment, instructions for carrying out graphics processing as described above may be stored in a computer readable storage medium. By way of example, and not by way of limitation, FIG. 4 illustrates an example of a computer-readable storage medium 400. The storage medium contains computer-readable instructions stored in a format that can be retrieved and interpreted by a computer processing device. By way of example, and not by way of limitation, the computer-readable storage medium 400 may be a computer-readable memory, such as random access memory (RAM) or read only memory (ROM), a computer readable storage disk for a fixed disk drive (e.g., a hard disk drive), or a removable disk drive. In addition, the computer-readable storage medium 400 may be a flash memory device, a computer-readable tape, a CD-ROM, a DVD-ROM, a Blu-ray, HD-DVD, UMD, or other optical storage medium.
The storage medium 400 contains graphics processing instructions 401 including one or more instructions 402 for producing graphics input in a format having an architecture-neutral display list, and one or more instructions 403 for translating instructions in an architecture-neutral display list into GPU-specific instructions. The medium 400 may also optionally include one or more power monitoring instructions 404, one or more context switch determination instructions 406, one or more context switch instructions 408 and one or more inactive GPU shutoff instructions 410. The power monitoring instructions 404 may be configured for monitoring power consumption and/or performance of a GPU, e.g., as described above with respect to item 211 of FIG. 2A. The context switch determination instructions 406 may be configured for determining whether one or more criteria for triggering a context switch are met, as discussed above with respect to 213 of FIG. 2A and 222 of FIG. 2B. The context switch instructions 408 may be configured for performing a context switch between two GPUs, e.g., as described above with respect to 224, 226, 228, 230, 232, 234, 236, 238, and 240 of FIG. 2B. The inactive GPU shutoff instructions 410 may be configured for shutting off a GPU that is inactive after a context switch, e.g., as described above with respect to 217 of FIG. 2A.
Embodiments of the present invention as described herein may be extended to enable dynamic load balancing between two or more graphics processors for the purpose of increasing performance at the cost of power, but with architecturally similar GPUs (not identical GPUs as with SLI). By way of example, and not by way of limitation, a context switch may be performed between the two similar GPUs based on which one would have the higher performance for processing a given set of GPU input. Performance may be based, e.g., on an estimated amount of time or number of processor cycles to process the input.
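A performance-based selection of this kind might be sketched as follows in Python. The per-opcode cycle costs, the GPU names, and the functions estimate_cycles and pick_faster_gpu are hypothetical; an actual estimate would depend on the hardware and on how the input is profiled.

```python
from typing import Dict, List, Tuple

# Hypothetical cost model: estimated processor cycles per opcode on each GPU.
COST_TABLES: Dict[str, Dict[str, int]] = {
    "gpu_a": {"CLEAR": 100, "DRAW_TRIANGLES": 5},
    "gpu_b": {"CLEAR": 50, "DRAW_TRIANGLES": 8},
}

# GPU input summarized as (opcode, repeat count) pairs.
Workload = List[Tuple[str, int]]


def estimate_cycles(workload: Workload, gpu: str) -> int:
    """Estimate the number of cycles a GPU needs to process the input."""
    costs = COST_TABLES[gpu]
    return sum(costs[op] * count for op, count in workload)


def pick_faster_gpu(workload: Workload) -> str:
    """Choose the GPU with the lowest estimated cycle count for this input."""
    return min(COST_TABLES, key=lambda gpu: estimate_cycles(workload, gpu))


geometry_heavy = [("CLEAR", 1), ("DRAW_TRIANGLES", 1000)]
clear_heavy = [("CLEAR", 10), ("DRAW_TRIANGLES", 10)]
choice_1 = pick_faster_gpu(geometry_heavy)  # 5100 vs. 8050 estimated cycles
choice_2 = pick_faster_gpu(clear_heavy)     # 1050 vs. 580 estimated cycles
```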
If two GPUs are sufficiently architecturally similar, graphical input formatted for one GPU will work with the other GPU and vice versa. In such a case, it would not be necessary to generate the input in an architecture neutral format and translate it to an architecture specific format.
Another solution would be to have the CPU interpret the architecture neutral instruction set and have the GPU Context Controller completely shut down the GPU. Graphics performance might severely degrade but potentially less power would be consumed. According to this solution the CPU would take over the processing tasks handled by the GPU. In such a case, this solution may be implemented in a system with just one GPU. Specifically, the CPU could take over for the GPU by performing a context switch between the GPU and the CPU.
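By way of illustration, CPU interpretation of an architecture-neutral display list might look like the following Python sketch. The two opcodes (CLEAR, FILL_RECT), their operand layouts, and the function render_on_cpu are hypothetical, and the frame buffer is modeled as a list of rows of integer color values rather than real display memory.

```python
from typing import List, Tuple

Instr = Tuple[str, Tuple[int, ...]]


def render_on_cpu(display_list: List[Instr],
                  width: int, height: int) -> List[List[int]]:
    """Interpret a neutral display list in software and return a frame buffer."""
    fb = [[0] * width for _ in range(height)]
    for op, args in display_list:
        if op == "CLEAR":
            (color,) = args
            for row in fb:
                for x in range(width):
                    row[x] = color
        elif op == "FILL_RECT":
            x0, y0, w, h, color = args
            # Clip the rectangle to the frame buffer bounds.
            for y in range(max(y0, 0), min(y0 + h, height)):
                for x in range(max(x0, 0), min(x0 + w, width)):
                    fb[y][x] = color
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return fb


frame = render_on_cpu(
    [("CLEAR", (1,)), ("FILL_RECT", (1, 1, 2, 1, 9))], width=4, height=3)
```

The same display list that would otherwise be translated for a GPU is consumed directly here, which is why no second GPU is needed for this fallback.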
SRC= http://www.freepatentsonline.com/y2010/0253690.html
DYNAMIC CONTEXT SWITCHING BETWEEN ARCHITECTURALLY DISTINCT GRAPHICS PROCESSORS