Examining Teradata To Google BigQuery Migration
Cloud migration is hot nowadays. Enterprises are considering options to migrate on-premises data and applications to cloud (AWS/GCP/Azure) to get the benefits of quick deployments, pay-per-use models and flexibility. Recently, I got a chance to work on data migration from Teradata to Google BigQuery where the task at hand was to convert Teradata BTEQ's to BigQuery SQLs. SQL, on a higher level, stays the same with few differences. In this post, I will cover a few generic SQLs (written in Teradata) with a conversion to Google BigQuery just to give a glimpse of syntax differences between two platforms as all the differences cannot be covered in one blog post.
Task | Teradata SQL | BigQuery SQL |
Select Rows with a particular rank | SELECT col1, col2 FROM <DB>.<TBL> QUALIFY ROW_NUMBER() OVER ( PARTITION BY col1 ORDER BY <DATE> DESC) = 1 | SELECT * EXCEPT(rank) FROM ( SELECT col1, col2, ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY <DATE> DESC) as rank FROM <DB>.<TBL> ) where rank=1 |
Concatenate multiple data types | select ' String 1 ' || ' String 2 '; --Multiple data types can be concatenated e.g. integer, varchar, decimal etc. select 'String 1 '|| 2 | select concat(' String 1 ',' String 2 '); -- with limitation, can only concat STRING or BYTES OR cast data types first and then concatenate select concat('String 1 ',cast(2 as STRING)); |
Truncate Table | Delete from <DB>.<TBL> OR Delete from <DB>.<TBL> ALL | Delete from <DB>.<TBL> WHERE TRUE |
Combine two result sets with no duplicate row | SELECT * from <DB>.<TBL1> union SELECT * from <DB>.<TBL2> | SELECT * from <DB>.<TBL1> union distinct SELECT * from <DB>.<TBL2> |
Timestamp Difference to get Interval | select (timestamp '2019-08-26 10:50:00.000000' - timestamp '2019-08-25 07:40:00.000000') DAY(4) to SECONDOutput:1 03:10:00.000000 where: 1 = day(s) 03 = Hour(s) 10 = Minute(s) 00 = Second(s) 000000 = Milliseconds | WITH RESULT_IN_SECONDS AS ( SELECT TIMESTAMP_DIFF(TIMESTAMP '2019-08-26 10:50:00.000000',TIMESTAMP '2019-08-25 07:40:00.000000',SECOND) AS SECONDS ) SELECT CONCAT(CAST(DAYS AS STRING),' ',HOURS,':',MINUTES,':',SECONDS,'.000000') FROM ( SELECT CAST(DAYS AS INT64) AS DAYS , FORMAT("%02d",CAST(HOURS AS INT64)) AS HOURS, FORMAT("%02d",CAST(MINUTES AS INT64)) AS MINUTES, FORMAT("%02d",CAST((REM_SEC_2 - (MINUTES * 60)) AS INT64)) AS SECONDS FROM ( SELECT DAYS,HOURS, FLOOR(REM_SEC_2/60) AS MINUTES, REM_SEC_2 FROM ( SELECT DAYS,HOURS, (REM_SEC - (HOURS * 3600 ) ) AS REM_SEC_2 FROM ( SELECT DAYS,FLOOR((SECONDS - (DAYS * 86400))/3600) AS HOURS,SECONDS - (DAYS * 86400) AS REM_SEC FROM ( SELECT SECONDS, FLOOR(SECONDS/86400) AS DAYS FROM RESULT_IN_SECONDS )) ) ) )As there is no interval data type in Google BigQuery so above manipulation can be performed to get same 'Interval' output as we get in Teradata |
Search strings with text '_AB' treating '_' as data not a wildcard character | select * from ( select 'GGG_ABC' AS TXT )A where TXT LIKE '%\_AB%' ESCAPE '\' | select * from ( select 'GGG_ABC' AS TXT )A where TXT LIKE '%\\_AB%' |
Date Difference/Subtraction | select date '2019-08-26' - date '2019-08-25' Default behavior is to return difference in days | select DATE_DIFF(DATE '2019-08-26', DATE '2019-08-25',DAY) In BigQuery we have to specify DAY as second parameter to get difference in Days |
Share this
Previous story
← How To Reclaim ASM Disks
You May Also Like
These Related Stories
Comparing Data Science at an AI Startup and a Consulting Company
![](https://www.pythian.com/hubfs/Imported_Blog_Media/data-scientist-team.jpg)
Comparing Data Science at an AI Startup and a Consulting Company
Aug 6, 2019
2
min read
Changes to DBCA patch application behaviour causes PDB cloning to fail
Changes to DBCA patch application behaviour causes PDB cloning to fail
Mar 11, 2016
12
min read
Free and Easy Schema diff for Oracle
Free and Easy Schema diff for Oracle
Oct 9, 2009
2
min read
No Comments Yet
Let us know what you think