Portability by Automatic Translation: Two Case Studies
Yishai A. Feldman
The Interdisciplinary Center Herzliya, Israel

First Case: Bogart
Large-Scale Translation from Assembly Language to C
Joint Work with Doron A. Friedman
The Problem

- 400,000 lines of IBM 370 assembly code
- Customers downsizing mainframes
- Hand-optimized code over 15 calendar years
- Live system

Success Criteria

- Portability
- Efficiency
- Minimum manual work

BUT

- Readability is not important

Difficult Assembly Features

- Registers
- Condition code
- Untyped language
- Unstructured code
- Large unstructured memory areas
- Portability:
    - Different byte order
    - Different word size
    - Different pointer size
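The byte-order and word-size items above can be made concrete with a small C sketch. The helpers `load_be32` and `load_be16` are hypothetical names, not part of Bogart; they assume the translated program must read data laid out in the mainframe's big-endian order, whatever the byte order of the host.

```c
#include <stdint.h>

/* Hypothetical helper (not from Bogart): read a 32-bit big-endian word
   one byte at a time, so the result does not depend on host byte order.
   The final unsigned-to-signed conversion assumes the usual
   two's-complement wraparound. */
static int32_t load_be32(const unsigned char *p)
{
    uint32_t v = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    return (int32_t)v;
}

/* Same idea for a 16-bit halfword. */
static int16_t load_be16(const unsigned char *p)
{
    uint16_t v = (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
    return (int16_t)v;
}
```

A plain `*(int32_t *)p` would return different values on big-endian and little-endian hosts for the same bytes, which is why untyped memory areas are hard to port.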
The Simulating Translator

Assembly:

             STM   R14,R12,12(R13)
             LR    R12,R15
             USING HORNER,R12
             ST    R13,SAV+4
             LA    R13,SAV
             LA    R7,COEF
             L     R5,0(R7)
             LA    R9,0
    LOOP     CR    R9,R2
             BNL   OUT
             LA    R9,1(R9)
             LA    R7,4(R7)
             MR    R4,R3
             A     R5,0(R7)
             B     LOOP
    OUT      LR    R0,R5
             LM    R1,R12,24(R13)
             BR    R14

Simulating translator output (C):

    void HORNER(tagSAPReg *Reg)
    {
        T_stm(14, 12, ((Reg[13].ucp + 12)), Reg);
        Reg[12].sw = Reg[15].sw;
        SAV[1] = Reg[13].sw;
        Reg[13].pv = &(SAV[0]);
        Reg[7].pv = &(COEF[0]);
        Reg[5].sw = *(sWord *)Reg[7].ucp;
        Reg[9].sw = 0;
    LOOP:
        if ((Reg[9].sw) >= Reg[2].sw) goto OUT;
        Reg[9].sw += 1;
        Reg[7].sw += 4;
        T_mult(&Reg[4], Reg[3].sw);
        Reg[5].sw += *((sWord *)(Reg[7].ucp));
        goto LOOP;
    OUT:
        Reg[0].sw = Reg[5].sw;
        T_lm(1, 12, ((Reg[13].ucp + 24)), Reg);
        return;
    }
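The simulated code reads each register through several views (`.sw`, `.uw`, `.ucp`, `.pv`). The slides do not show the definition of `tagSAPReg` or `T_mult`, so the following is only a guess at what such a register file and multiply helper might look like. `SimReg` and `sim_mult` are hypothetical stand-ins.

```c
#include <stdint.h>

typedef int32_t  sWord;   /* signed 32-bit word, as on the 370 */
typedef uint32_t uWord;   /* unsigned 32-bit word */

/* Assumed layout: one simulated register viewed as a signed word,
   an unsigned word, a byte pointer, or an untyped pointer. */
typedef union SimReg {
    sWord          sw;
    uWord          uw;
    unsigned char *ucp;
    void          *pv;
} SimReg;

/* Sketch of an MR-style multiply on an even/odd register pair:
   the odd register holds the multiplicand, and the 64-bit product
   is split across the pair (high half even, low half odd). */
static void sim_mult(SimReg *even, sWord multiplier)
{
    int64_t  product = (int64_t)even[1].sw * (int64_t)multiplier;
    uint64_t bits = (uint64_t)product;

    even[0].sw = (sWord)(uint32_t)(bits >> 32);  /* high 32 bits */
    even[1].sw = (sWord)(uint32_t)bits;          /* low 32 bits */
}
```

With a union like this, a statement such as `Reg[7].sw += 4` advances a pointer stored through `.pv` only when words and pointers have the same size and representation, which is exactly the pointer-size portability problem listed earlier.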
Bogart: Better Optimizing General-purpose Abstract Representation Translator

Translation by Abstraction, Transformation, and Reimplementation

- Control-flow and data-flow analysis
- Typing by constraint propagation
- Automatic cliche recognition

Stages: Abstraction -> Transformation -> Re-implementation
The Code Produced by Bogart

Simulating translator output:

    void HORNER(tagSAPReg *Reg)
    {
        T_stm(14, 12, ((Reg[13].ucp + 12)), Reg);
        Reg[12].sw = Reg[15].sw;
        SAV[1] = Reg[13].sw;
        Reg[13].pv = &(SAV[0]);
        Reg[7].pv = &(COEF[0]);
        Reg[5].sw = *(sWord *)Reg[7].ucp;
        Reg[9].sw = 0;
    LOOP:
        if ((Reg[9].sw) >= Reg[2].sw) goto OUT;
        Reg[9].sw += 1;
        Reg[7].sw += 4;
        T_mult(&Reg[4], Reg[3].sw);
        Reg[5].sw += *((sWord *)(Reg[7].ucp));
        goto LOOP;
    OUT:
        Reg[0].sw = Reg[5].sw;
        T_lm(1, 12, ((Reg[13].ucp + 24)), Reg);
        return;
    }

Bogart output:

    sWord HORNER(sWord r2sw, sWord r3sw)
    {
        sWord r5sw;
        sWordPtr r7swp;
        sWord r9sw;

        r7swp = (sWord *)(&COEF[0]);
        r5sw = *r7swp;
        r9sw = 0;
        while (r9sw < r2sw) {
            r9sw++;
            r7swp++;
            r5sw = r5sw*r3sw + *r7swp;
        }
        return r5sw;
    }
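To see what the Bogart version computes, here is a self-contained variant that can be compiled on its own. The coefficient values and the readable names (`horner`, `n`, `x`, `acc`) are made up for this sketch; the loop has the same shape as the Bogart output.

```c
#include <stdint.h>

typedef int32_t sWord;

/* Sample coefficient table; the values are invented for this sketch. */
static const sWord COEF[] = { 1, 2, 3 };

/* n multiply-add steps over COEF, starting from COEF[0]:
   the classic Horner evaluation of a polynomial of degree n at x. */
static sWord horner(sWord n, sWord x)
{
    const sWord *p = COEF;
    sWord acc = *p;
    sWord i = 0;

    while (i < n) {
        i++;
        p++;
        acc = acc * x + *p;
    }
    return acc;
}
```

With this table, `horner(2, 2)` evaluates x^2 + 2x + 3 at x = 2 and yields 11. The register save and restore calls, the base-register setup, and the 64-bit multiply helper of the simulated version have all disappeared.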
Accumulating Compound Expressions

Assembly:

             L     R11,NIG
             L     R5,0(R4)
             BAL   R14,INCTAB
             LPR   R7,R0
             MR    R2,R7
             LR    R0,R3
             BR    R14

Simulating translator:

    Reg[11].sw = NIG;
    Reg[5].sw = *(sWord *)Reg[4].ucp;
    INCTAB(Reg);
    Reg[7].sw = labs(Reg[0].sw);
    T_mult(&Reg[2], Reg[7].sw);
    Reg[0].sw = Reg[3].sw;
    return;

Bogart:

    return r3sw * labs(INCTAB(NIG,(*r4swp)));

Example: Condition Code Support

Assembly:

    CGLOOP   CH    R7,0(R5,R4)
             SRL   R2,1
             BH    CGADD
             BNH   CGSUB

Simulating translator:

    if (Reg[7].sw == *((sHalf *)((Reg[4].ucp+Reg[5].sw))))
        __CC = _CZero;
    else if (Reg[7].sw < *((sHalf *)((Reg[4].ucp+Reg[5].sw))))
        __CC = _COne;
    else
        __CC = _CTwo;
    Reg[2].uw >>= 1;
    if (__CC & 0x4) goto CGADD;
    if (__CC & 0x3) goto CGSUB;

Bogart:

    r2sh >>= 1;
    temp = r4ucp + r5sh;
    if (r7sh > temp) goto CGADD;
    if (r7sh <= temp) goto CGSUB;

The Plan Representation

    CGR      CH    R2,GMF
             BL    CONGR
             SRL   R2,1
             B     CGR
    CONGR    ....
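The masks `0x4` and `0x3` in the simulated code suggest a one-hot encoding of the condition code. That encoding is an inference, since the values of `_CZero`, `_COne`, and `_CTwo` are not shown. A minimal sketch under that assumption, with hypothetical names:

```c
#include <stdint.h>

/* Assumed one-hot condition-code values, inferred from the masks
   0x4 (branch on high) and 0x3 (branch on not high). */
enum { CC_EQUAL = 0x1, CC_LOW = 0x2, CC_HIGH = 0x4 };

/* Simulate a compare instruction: record how a relates to b. */
static int cc_compare(int32_t a, int32_t b)
{
    if (a == b) return CC_EQUAL;
    if (a < b)  return CC_LOW;
    return CC_HIGH;
}

/* Simulate a conditional branch: taken when the current
   condition code is one of those selected by the mask. */
static int cc_branch_taken(int cc, int mask)
{
    return (cc & mask) != 0;
}
```

The simulation must store the code because other instructions may sit between the compare and the branch. Bogart can instead move the comparison to each branch; here that is safe because SRL leaves the condition code unchanged.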
Global Type Analysis by Constraint Propagation
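As a rough illustration of typing by constraint propagation (not Bogart's actual algorithm, which the slides do not detail), the sketch below spreads known types across "these two values must have the same type" constraints until nothing changes. The type set, the `Constraint` struct, and `propagate` are all hypothetical.

```c
/* A deliberately tiny type lattice for the sketch. */
enum Type { T_UNKNOWN, T_INT, T_PTR };

/* Values a and b (indices into the type table) must share a type,
   for example because one register was copied into another. */
struct Constraint { int a, b; };

/* Propagate known types across the constraints until a fixed point.
   Returns 0 on success, -1 if two different known types collide. */
static int propagate(enum Type *types,
                     const struct Constraint *cs, int ncs)
{
    int changed = 1;

    while (changed) {
        changed = 0;
        for (int i = 0; i < ncs; i++) {
            enum Type *ta = &types[cs[i].a];
            enum Type *tb = &types[cs[i].b];

            if (*ta == *tb) continue;
            if (*ta == T_UNKNOWN)      { *ta = *tb; changed = 1; }
            else if (*tb == T_UNKNOWN) { *tb = *ta; changed = 1; }
            else return -1;            /* conflicting known types */
        }
    }
    return 0;
}
```

In a real translator a conflict would not simply be an error: a register legitimately holds an address at one point and an integer at another, so the analysis has to work on individual values rather than on whole registers.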
Time and Space Comparison: Bogart and Simulating Translator

[Table: time (sec.) and space (bytes) for the Simulator and for Bogart on the routines BIN, HORNER, RANDOM, and SAPDBMS.]

(SAPDBMS is a central Sapiens module)
Time Performance on Several Platforms (for example routine BIN)

[Table: columns IBM 370, RS/6000, AS/400, PC (DOS); rows Original Assembly, Simulator, Bogart, Hand Crafted. Surviving cell values, in order of appearance: 0.32, -, -, -, failed, 1.26, 1, 1, 1, 1.]
Results

- Bogart produces more portable code
- Bogart supports a larger portion of the source language
- Bogart requires less manual work in code preparation
- Bogart produces more efficient code in terms of time and space performance

Conclusions

- Translation by abstraction produces better results than simulation on all criteria
- Simulation is simpler and faster to implement
- Simulated code is easier for the programmers to debug
- The advantages of the abstraction approach grow in the long term
- “Research-then-transfer” versus “Industry-as-laboratory” (Colin Potts, 1993)
New Developments: Bogart Falcon 2000
Second Case: MIDAS
Automatic High-Quality Reengineering of Database Programs by Temporal Abstraction
Joint Work with Yossi Cohen

The Problem: Legacy Database Software

- Much legacy software is DP, many database-related programs
- Conversion from older models (indexed-sequential, hierarchical, network) to relational/object-oriented databases
- Need to convert:
    - Schema
    - Data
    - Software
Network vs. Relational Databases
Network Database Program (1)

    MOVE 0 TO STATUS1.
    PERFORM UNTIL STATUS1 IS NOT EQUAL TO ZERO
        FETCH NEXT STUDENT WITHIN DEPT-OF-STUDENT
            AT END MOVE 1 TO STATUS1
        IF STATUS1 IS EQUAL TO 0 THEN
            IF STUDENT-DEGREE IS EQUAL TO 2 THEN
                MOVE 0 TO GRADES-SUM
                MOVE 0 TO GRADES-COUNT
                PERFORM SUM-STUDENT-GRADES
                DIVIDE GRADES-SUM BY GRADES-COUNT
                    GIVING GRADES-AVG
                IF GRADES-AVG > 95 THEN
                    DISPLAY..., GRADES-AVG
                END-IF
            END-IF
        END-IF
    END-PERFORM.

Network Database Program (2)

    SUM-STUDENT-GRADES.
        MOVE 0 TO STATUS2
        PERFORM UNTIL STATUS2 IS NOT EQUAL TO ZERO
            FETCH NEXT GRADES WITHIN STUDENT-OF-GRADES
                AT END MOVE 1 TO STATUS2
            IF STATUS2 IS EQUAL TO 0 THEN
                ADD GRD-GRADE TO GRADES-SUM
                ADD 1 TO GRADES-COUNT
            END-IF
        END-PERFORM.

Naive Translation (1)

    EXEC SQL DECLARE CRS1 CURSOR FOR
        SELECT... FROM STUDENT
        WHERE DEPT-NAME = :DEPT-NAME
    END-EXEC.
    EXEC SQL DECLARE CRS2 CURSOR FOR
        SELECT... FROM GRADES
        WHERE STUDENT-ID = :STUDENT-ID
    END-EXEC.
Naive Translation (2)

    MOVE 0 TO STATUS1
    EXEC SQL OPEN CRS1 END-EXEC
    PERFORM UNTIL STATUS1 IS NOT EQUAL TO 0
        EXEC SQL FETCH CRS1 INTO... END-EXEC
        IF SQL-STATUS = SQL-NOT-FOUND THEN
            MOVE 1 TO STATUS1
        END-IF
        IF STATUS1 IS EQUAL TO 0 THEN
            IF STUDENT-DEGREE IS EQUAL TO 2 THEN
                MOVE 0 TO GRADES-SUM
                MOVE 0 TO GRADES-COUNT
                PERFORM SUM-STUDENT-GRADES
                DIVIDE GRADES-SUM BY GRADES-COUNT GIVING GRADES-AVG
                IF GRADES-AVG > 95 THEN
                    DISPLAY..., GRADES-AVG
                END-IF
            END-IF
        END-IF
    END-PERFORM.
    EXEC SQL CLOSE CRS1 END-EXEC.

Naive Translation (3)

    SUM-STUDENT-GRADES.
        MOVE 0 TO STATUS2
        EXEC SQL OPEN CRS2 END-EXEC.
        PERFORM UNTIL STATUS2 IS NOT EQUAL TO 0
            EXEC SQL FETCH CRS2 INTO... END-EXEC
            IF SQL-STATUS = SQL-NOT-FOUND THEN
                MOVE 1 TO STATUS2
            END-IF
            IF STATUS2 IS EQUAL TO 0 THEN
                ADD GRD-GRADE TO GRADES-SUM
                ADD 1 TO GRADES-COUNT
            END-IF
        END-PERFORM.
        EXEC SQL CLOSE CRS2 END-EXEC.
MIDAS: Translation by Abstraction, Transformation, and Re-implementation

Network DB code -> (Abstraction) -> Plan -> (Temporal Abstraction) -> Temporal Plan -> (Re-implementation) -> Relational DB code
MIDAS Translation

    EXEC SQL DECLARE CRS1 CURSOR FOR
        SELECT STUDENT.STUDENT-ID, FIRST-NAME, LAST-NAME, AVG(GRADE)
        FROM STUDENT, GRADES
        WHERE DEGREE = 2
          AND DEPT-NAME = :DEPT-NAME
          AND GRADES.STUDENT-ID = STUDENT.STUDENT-ID
        GROUP BY STUDENT.STUDENT-ID, FIRST-NAME, LAST-NAME
        HAVING AVG(GRADE) > 95
    END-EXEC.
    PERFORM UNTIL SQL-STATUS = SQL-NOT-FOUND
        EXEC SQL FETCH CRS1 INTO... END-EXEC
        DISPLAY..., GRADES-AVG
    END-PERFORM
The Internal Representation: Query Graphs

- Temporal abstraction
- Generate / Join
- Filter
- Map
- Aggregate
- Wide-spectrum formalism
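The operators above can be read off the student/grades example. The sketch below is not MIDAS's query-graph representation; it is a hypothetical in-memory C version of the same computation, with each step labeled by the operator it corresponds to. The struct layouts and the function name `honors` are invented for illustration.

```c
#include <stddef.h>

struct Student { int id; int degree; };
struct Grade   { int student_id; int grade; };

/* For each student with the given degree, average that student's
   grades and keep the student if the average exceeds the threshold.
   Writes the qualifying ids to out and returns how many there are. */
static size_t honors(const struct Student *st, size_t nst,
                     const struct Grade *gr, size_t ngr,
                     int degree, double threshold, int *out)
{
    size_t found = 0;

    for (size_t i = 0; i < nst; i++) {            /* Generate */
        if (st[i].degree != degree) continue;     /* Filter */

        long sum = 0, count = 0;
        for (size_t j = 0; j < ngr; j++) {        /* Join */
            if (gr[j].student_id != st[i].id) continue;
            sum += gr[j].grade;                   /* Aggregate */
            count++;
        }
        /* Filter on the aggregate, like HAVING AVG(GRADE) > threshold. */
        if (count > 0 && (double)sum / (double)count > threshold)
            out[found++] = st[i].id;
    }
    return found;
}
```

Once the two nested loops are viewed as generate, filter, join, and aggregate steps rather than as control flow, they can be rearranged and handed to the database as a single query, which is what the MIDAS translation does.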
Filtering Transformation
Join-Down Transformation
Accumulation Transformation
Performance Results
Conclusions

- Translation by abstraction, transformation, and re-implementation demonstrated in two domains
- Query graphs as abstraction for database operations
- Adapts to different schema transformations
- Scalability

Conclusions and Future Work

- Appropriate domain
    - Same host language
    - Few cliches give wide coverage
    - Important commercially
- Generalizations
    - Other legacy models
    - OODB / 4GL as targets
Questions? Papers can be downloaded from