I am unable to apply a proper regex on customtarget column in bigquery.
With normal MSSQL:
SELECT * from mytable where CustomTargeting like = '%u=%' -- is all okay
With Bigquery(legacy-sql) :
SELECT REGEXP_EXTRACT(CustomTargeting, r'[^u=\d]') as validate_usersfrom [project:dataset.impressions_4213_20181112] Limit 10
Error:
Exactly one capturing group must be specified
Update:
Yet couldn't get substring u ='anystring'
How can I extract data where CustomTargeting ='%u=somestring%'?
Best Answer
For BigQuery Legacy SQL
In SELECT
statement list you can useSELECT REGEXP_EXTRACT(CustomTargeting, r'(?:^|;)u=(\d*)')
In WHERE
clause - you can useWHERE REGEXP_MATCH(CustomTargeting, r'(?:^|;)u=(\d*)')
So, you query can look like
#legacySQLSELECT CustomTargeting, REGEXP_EXTRACT(CustomTargeting, r'(?:^|;)u=(\d*)') FROM [project:dataset.impressions_4213_20181112]WHERE REGEXP_MATCH(CustomTargeting, r'(?:^|;)u=(\d*)')
For BigQuery Standard SQL
Same for SELECT
But different for WHERE
- WHERE REGEXP_CONTAINS(CustomTargeting, r'(?:^|;)u=(\d*)')
#standardSQLSELECT CustomTargeting, REGEXP_EXTRACT(CustomTargeting, r'(?:^|;)u=(\d*)') FROM `project.dataset.impressions_4213_20181112`WHERE REGEXP_CONTAINS(CustomTargeting, r'(?:^|;)u=(\d*)')
Update - To address provided data example:
Regular expression updated from r'^u=(\d*)')
to r'(?:^|;)u=(\d*)')
- hope it is self-descriptive, but if not - it makes match to be searched either at the begonning of string or after ;