Microsoft Azure Technical Track

Examining Teradata To Google BigQuery Migration

2 min read

Aug 29, 2019 12:00:00 AM

Cloud migration is hot nowadays. Enterprises are considering options to migrate on-premises data and applications to cloud (AWS/GCP/Azure) to get the benefits of quick deployments, pay-per-use models and flexibility. Recently, I got a chance to work on data migration from Teradata to Google BigQuery where the task at hand was to convert Teradata BTEQ's to BigQuery SQLs. SQL, on a higher level, stays the same with few differences. In this post, I will cover a few generic SQLs (written in Teradata) with a conversion to Google BigQuery just to give a glimpse of syntax differences between two platforms as all the differences cannot be covered in one blog post.

Task	Teradata SQL	BigQuery SQL
Select Rows with a particular rank	SELECT col1, col2 FROM <DB>.<TBL> QUALIFY ROW_NUMBER() OVER ( PARTITION BY col1 ORDER BY <DATE> DESC) = 1	SELECT * EXCEPT(rank) FROM ( SELECT col1, col2, ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY <DATE> DESC) as rank FROM <DB>.<TBL> ) where rank=1
Concatenate multiple data types	select ' String 1 ' \|\| ' String 2 '; --Multiple data types can be concatenated e.g. integer, varchar, decimal etc. select 'String 1 '\|\| 2	select concat(' String 1 ',' String 2 '); -- with limitation, can only concat STRING or BYTES OR cast data types first and then concatenate select concat('String 1 ',cast(2 as STRING));
Truncate Table	Delete from <DB>.<TBL> OR Delete from <DB>.<TBL> ALL	Delete from <DB>.<TBL> WHERE TRUE
Combine two result sets with no duplicate row	SELECT * from <DB>.<TBL1> union SELECT * from <DB>.<TBL2>	SELECT * from <DB>.<TBL1> union distinct SELECT * from <DB>.<TBL2>
Timestamp Difference to get Interval	select (timestamp '2019-08-26 10:50:00.000000' - timestamp '2019-08-25 07:40:00.000000') DAY(4) to SECONDOutput:1 03:10:00.000000 where: 1 = day(s) 03 = Hour(s) 10 = Minute(s) 00 = Second(s) 000000 = Milliseconds	WITH RESULT_IN_SECONDS AS ( SELECT TIMESTAMP_DIFF(TIMESTAMP '2019-08-26 10:50:00.000000',TIMESTAMP '2019-08-25 07:40:00.000000',SECOND) AS SECONDS ) SELECT CONCAT(CAST(DAYS AS STRING),' ',HOURS,':',MINUTES,':',SECONDS,'.000000') FROM ( SELECT CAST(DAYS AS INT64) AS DAYS , FORMAT("%02d",CAST(HOURS AS INT64)) AS HOURS, FORMAT("%02d",CAST(MINUTES AS INT64)) AS MINUTES, FORMAT("%02d",CAST((REM_SEC_2 - (MINUTES * 60)) AS INT64)) AS SECONDS FROM ( SELECT DAYS,HOURS, FLOOR(REM_SEC_2/60) AS MINUTES, REM_SEC_2 FROM ( SELECT DAYS,HOURS, (REM_SEC - (HOURS * 3600 ) ) AS REM_SEC_2 FROM ( SELECT DAYS,FLOOR((SECONDS - (DAYS * 86400))/3600) AS HOURS,SECONDS - (DAYS * 86400) AS REM_SEC FROM ( SELECT SECONDS, FLOOR(SECONDS/86400) AS DAYS FROM RESULT_IN_SECONDS )) ) ) )As there is no interval data type in Google BigQuery so above manipulation can be performed to get same 'Interval' output as we get in Teradata
Search strings with text '_AB' treating '_' as data not a wildcard character	select * from ( select 'GGG_ABC' AS TXT )A where TXT LIKE '%\_AB%' ESCAPE '\'	select * from ( select 'GGG_ABC' AS TXT )A where TXT LIKE '%\\_AB%'
Date Difference/Subtraction	select date '2019-08-26' - date '2019-08-25' Default behavior is to return difference in days	select DATE_DIFF(DATE '2019-08-26', DATE '2019-08-25',DAY) In BigQuery we have to specify DAY as second parameter to get difference in Days

On this page

Ready to unlock value from your data?

With Pythian, you can accomplish your data transformation goals and more.

Speak with Pythian consultants now →

What We Do

Examining Teradata To Google BigQuery Migration

Share this

Share this

More resources

Comparing Data Science at an AI Startup and a Consulting Company

Datascape Podcast Episode 47: March 2021 Cloud Update Show

How Google BigQuery can help marketing teams do more with their data

Ready to unlock value from your data?