# Future Works in LLVM Register Allocation ### Talk Overview - I. Introduction - 2. Upcoming Changes - 3. PBQP Lower PHI-instructions to copies ### PHI Lowering ### PHI Lowering ### PHI Lowering #### **BB1**: %x1 = . %xP = %x1 #### **BB2**: %x2 = .%xP = %x2 #### BB3: $$%x3 = %xP$$ Lower PHI-instructions to copies Lower Three-Address Instructions ### Two-Address Instructions ### Two-Address Instructions $$x3 = x2 + x1$$ ### Two-Address Instructions $$x3 = x2 + x1$$ $$x3 = x2$$ $x3 += x1$ Lower three-address instructions Construct live intervals ### Live Intervals # ### Live Intervals ### Live Intervals #### Construct Live Intervals Aggressively eliminate copies # Coalescing # Coalescing Aggressively eliminate copies Compute register assignment Apply register assignment ### Improvements - New and better optimizations - New allocators - Cleaner infrastructure ## I. Optimizations ``` ... // stuff ... = <expr> ... // more stuff ... = <expr> ``` # Splitting # Splitting # Splitting • "Linear Scan" is not, in fact, linear - "Linear Scan" is not, in fact, linear - We want something faster - "Linear Scan" is not, in fact, linear - We want something faster - Priority coloring? - "Linear Scan" is not, in fact, linear - We want something faster - Priority coloring? - Linear Scan? - "Linear Scan" is not, in fact, linear - We want something faster - Priority coloring? - Linear Scan? - Need to tidy the infrastructure ### 3. Cleaner Infrastructure Modify code in-place Modify code in-place ### Improvements - New and better optimizations - New allocators - Cleaner infrastructure # Upcoming Changes ### Upcoming Changes - Live index renumbering - Improved splitting - Better def/kill tracking for values ``` BB: %x2 = %x1 add %x3, %x2 ``` ``` BB: %x2 = %x1 add %x3, %x2 ``` ``` BB: %x2 = %x1 %x3 = \dots add %x3, %x2 ``` ``` BB: %x2 = %x1 %x3 = ... add %x3, %x2 ``` ``` BB: P_1 P_2 %x2 = %x1 P_3 %x3 = \dots add %x3, %x2 P_4 P_5 P<sub>6</sub> ``` ``` BB: P_1 P<sub>2</sub> %x2 = %x1 P_3 %x3 = \dots add %x3, %x2 P_4 P_5 P6 ``` $P_1 < P_2 < P_3 < P_4 < P_5 < P_6$ ``` BB: P_1 P<sub>2</sub> %x2 = %x1 P_3 %x3 = \dots P_7 add %x3, %x2 P_4 P_5 P6 ``` $$P_1 < P_2 < P_3 < P_4 < P_5 < P_6$$ ``` BB: P_1 P_2 %x2 = %x1 P_3 %x3 = \dots \mathbf{P}_{7} add %x3, %x2 P_4 P_5 P_6 ``` $$P_1 < P_2 < P_3 < P_7 < P_4 < P_5 < P_6$$ • unsigned → LiveIndex Index Renumbering • Break multi-value intervals into component values. - Break multi-value intervals into component values. - Each value gets a 2nd chance at allocation. - Break multi-value intervals into component values. - Each value gets a 2nd chance at allocation. - ... but not a 3rd. - Break multi-value intervals into component values. - Each value gets a 2nd chance at allocation. - ... but not a 3rd. - 13% reduction in static memory references on test case (a pathological SSE kernel). ## Better Def/Kill Tracking ## Better Def/Kill Tracking #### For Values • Defined by a PHI - Track the def block. ## Better Def/Kill Tracking #### For Values - Defined by a PHI Track the def block. - Killed by a PHI Track the appropriate predecessor. #### Future Work Better value def/kill tracking LiveIndex renumbering Improved splitting Rewrite-in-place New allocators # PBQP ### **PBQP** Partitioned Boolean Quadratic Problems - Discrete optimization problems - NP-complete - Subclass solvable in linear time ## Irregular Architectures - Multiple register classes. - Register aliasing. - Register pairing. • ... Y X Z For Register Allocation: Nodes represent virtual registers. Options reflect storage locations. Option costs: Typically zero cost for registers, spill cost estimate for stack slot. Edge costs: Depends on the constraint. ## Example Interference on a Regular Architecture ## Example I Interference on a Regular Architecture ## Example I #### Interference on a Regular Architecture | | Sp | AL | AH | ВЬ | CL | |----|----|----------|----------|----------|----| | Sp | 0 | 0 | 0 | 0 | 0 | | AX | 0 | <b>∞</b> | <b>∞</b> | 0 | 0 | | ВХ | 0 | 0 | 0 | <b>∞</b> | 0 | #### Coalescing | | Sp | AX | BX | |----|----|----|----| | Sp | 0 | 0 | 0 | | AX | 0 | -c | 0 | | ВХ | 0 | 0 | -c | #### Register Pairing (Ri, Ri+1) Sp s0 s1 s2 s3 Sp O O O O O O ∞ ∞ s2 s1 ``` regalloc -> pbqp solution = solve pbqp solution -> allocation ``` ``` regalloc -> pbqp solution = solve pbqp solution -> allocation ``` ``` regalloc -> pbqp solution = solve pbqp solution -> allocation ``` ``` regalloc -> pbqp solution = solve pbqp solution -> allocation ``` llvm/lib/CodeGen/RegAllocPBQP.cpp ### How does it work? Solver uses a graph reduction algorithm. Reduce problem to the empty graph with reduction rules, then reconstruct it. #### **PROS** - Ideal for Irregularity - Very Simple - Reasonable quality | PROS | CONS | |---------------------------------------------------------------------------------------------|-------------| | <ul> <li>Ideal for Irregularity</li> <li>Very Simple</li> <li>Reasonable quality</li> </ul> | • Sloooooow | | PROS | CONS | |------------------------------------------------------------------------------------------------------------------------------------------------|-------------| | <ul> <li>Ideal for Irregularity</li> <li>Very Simple</li> <li>Reasonable quality</li> <li>Perfect opportunity</li> <li>for a coffee</li> </ul> | • Siocessow | - Improved Optimizations. - New Allocators. - Cleaner Architecture. - PBQP. the end.