Examining Teradata To Google BigQuery Migration

2 min read
Aug 29, 2019 12:00:00 AM
Cloud migration is hot nowadays. Enterprises are considering options to migrate on-premises data and applications to cloud (AWS/GCP/Azure) to get the benefits of quick deployments, pay-per-use models and flexibility. Recently, I got a chance to work on data migration from Teradata to Google BigQuery where the task at hand was to convert Teradata BTEQ's to BigQuery SQLs. SQL, on a higher level, stays the same with few differences. In this post, I will cover a few generic SQLs (written in Teradata) with a conversion to Google BigQuery just to give a glimpse of syntax differences between two platforms as all the differences cannot be covered in one blog post.  
Task Teradata SQL BigQuery SQL
Select Rows with a particular rank SELECT col1, col2 FROM <DB>.<TBL> QUALIFY ROW_NUMBER() OVER ( PARTITION BY col1 ORDER BY <DATE> DESC) = 1 SELECT * EXCEPT(rank) FROM ( SELECT col1, col2, ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY <DATE> DESC) as rank FROM <DB>.<TBL> ) where rank=1
Concatenate multiple data types select ' String 1 ' || ' String 2 '; --Multiple data types can be concatenated e.g. integer, varchar, decimal etc. select 'String 1 '|| 2 select concat(' String 1 ',' String 2 '); -- with limitation, can only concat STRING or BYTES OR cast data types first and then concatenate select concat('String 1 ',cast(2 as STRING));
Truncate Table Delete from <DB>.<TBL> OR Delete from <DB>.<TBL> ALL Delete from <DB>.<TBL> WHERE TRUE
Combine two result sets with no duplicate row SELECT * from <DB>.<TBL1> union SELECT * from <DB>.<TBL2> SELECT * from <DB>.<TBL1> union distinct SELECT * from <DB>.<TBL2>
Timestamp Difference to get Interval select (timestamp '2019-08-26 10:50:00.000000' - timestamp '2019-08-25 07:40:00.000000') DAY(4) to SECONDOutput:1 03:10:00.000000 where: 1 = day(s) 03 = Hour(s) 10 = Minute(s) 00 = Second(s) 000000 = Milliseconds WITH RESULT_IN_SECONDS AS ( SELECT TIMESTAMP_DIFF(TIMESTAMP '2019-08-26 10:50:00.000000',TIMESTAMP '2019-08-25 07:40:00.000000',SECOND) AS SECONDS ) SELECT CONCAT(CAST(DAYS AS STRING),' ',HOURS,':',MINUTES,':',SECONDS,'.000000') FROM ( SELECT CAST(DAYS AS INT64) AS DAYS , FORMAT("%02d",CAST(HOURS AS INT64)) AS HOURS, FORMAT("%02d",CAST(MINUTES AS INT64)) AS MINUTES, FORMAT("%02d",CAST((REM_SEC_2 - (MINUTES * 60)) AS INT64)) AS SECONDS FROM ( SELECT DAYS,HOURS, FLOOR(REM_SEC_2/60) AS MINUTES, REM_SEC_2 FROM ( SELECT DAYS,HOURS, (REM_SEC - (HOURS * 3600 ) ) AS REM_SEC_2 FROM ( SELECT DAYS,FLOOR((SECONDS - (DAYS * 86400))/3600) AS HOURS,SECONDS - (DAYS * 86400) AS REM_SEC FROM ( SELECT SECONDS, FLOOR(SECONDS/86400) AS DAYS FROM RESULT_IN_SECONDS )) ) ) )As there is no interval data type in Google BigQuery so above manipulation can be performed to get same 'Interval' output as we get in Teradata
Search strings with text '_AB' treating '_' as data not a wildcard character select * from ( select 'GGG_ABC' AS TXT )A where TXT LIKE '%\_AB%' ESCAPE '\' select * from ( select 'GGG_ABC' AS TXT )A where TXT LIKE '%\\_AB%'
Date Difference/Subtraction select date '2019-08-26' - date '2019-08-25' Default behavior is to return difference in days select DATE_DIFF(DATE '2019-08-26', DATE '2019-08-25',DAY) In BigQuery we have to specify DAY as second parameter to get difference in Days
 
On this page

Ready to unlock value from your data?

With Pythian, you can accomplish your data transformation goals and more.