JSON Shredding nested array

Teradata Employee

JSON Shredding nested array

Hi,

 

Consider a JSON document with nested arrays:

{ "id": 1,
  "librairies": [
    { "id": 100, "books": [
        { "id": 2082, "authors": [ { "id": 1, "name": "tolkien" } ] },
        { "id": 4947, "authors": [ { "id": 2, "name": "kimball" } ] }
      ]},
    { "id": 556, "books": [
        { "id": 4455, "authors": [ { "id": 10, "name": "merle" } ] },
        { "id": 4947, "authors": [ { "id": 2, "name": "kimball" } ] }
      ]}
  ] }

An array of authors within an array of books, within an array of libraries.

 

Using Teradata functions, how can I shred all levels at once to get an output like this?

library_id | book_id | author_name

 

Is the only way to nest JSON_TABLE calls? Thanks.

5 REPLIES
Teradata Employee

Re: JSON Shredding nested array

Hi Maxime,

 

I'm in line with your findings: the only solution I found was nesting JSON_TABLE calls.

Data

create multiset volatile table mvt_json, no log
( mvt_pi    byteint
, col       json(3000) inline length 3000
)
unique primary index (mvt_pi)
on commit preserve rows;

insert into mvt_json values (0, new json('{ "id": 1,
  "librairies": [
    { "id": 100, "books": [
        { "id": 2082, "authors": [ { "id": 1, "name": "tolkien" } ] },
        { "id": 4947, "authors": [ { "id": 2, "name": "kimball" } ] }
      ]},
    { "id": 556, "books": [
        { "id": 4455, "authors": [ { "id": 10, "name": "merle" } ] },
        { "id": 4947, "authors": [ { "id": 2, "name": "kimball" }, { "id": 3, "name": "toto" } ] }
      ]}
  ] }', latin));

collect stats column (mvt_pi) on mvt_json;
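
As a quick optional sanity check that the document parses and the root-level id is reachable, something like this should return the root id (1):

select mvt_pi, col.JSONExtractValue('$.id') as root_id
  from mvt_json;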

Query

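-- level 1: explode librairies[*]; the "fromRoot" : true entry reaches the document-level id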
with cte_librairies (row_id, json_id, librairy_id, books) as
(
select row_id, json_id, librairy_id, books
  from JSON_Table
       (  on (select mvt_pi, col from mvt_json)
       using rowexpr('$.librairies[*]')
             colexpr( '[ {"jsonpath" : "$.id"   , "type" : "integer", "fromRoot" : true}
                       , {"jsonpath" : "$.id"   , "type" : "integer"}
                       , {"jsonpath" : "$.books", "type" : "varchar(5000)"}
                       ]')
       ) as lib (row_id, json_id, librairy_id, books)
)
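-- level 2: one row per book; the books varchar pass-through is recast to JSON so JSON_Table can iterate it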
  ,  cte_books (row_id, authors, json_id, librairy_id, book_id) as
(
select row_id, authors, json_id, librairy_id, book_id
  from JSON_Table
       (  on (select row_id, cast(books as json) as books, json_id, librairy_id from cte_librairies)
       using rowexpr('$')
             colexpr( '[ {"jsonpath" : "$.id"     , "type" : "integer"}
                       , {"jsonpath" : "$.authors", "type" : "varchar(5000)"}
                       ]')
       ) as boo (row_id, book_id, authors, json_id, librairy_id)
)
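-- level 3: explode each authors array to reach author_id and author_name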
select row_id, json_id, librairy_id, book_id, author_id, author_name
  from JSON_Table
       (  on (select row_id, cast(authors as json) as authors, json_id, librairy_id, book_id from cte_books)
       using rowexpr('$')
             colexpr( '[ {"jsonpath" : "$.id"  , "type" : "integer"}
                       , {"jsonpath" : "$.name", "type" : "varchar(10)"}
                       ]')
       ) as aut (row_id, author_id, author_name, json_id, librairy_id, book_id);

row_id  json_id  librairy_id  book_id  author_id  author_name
------  -------  -----------  -------  ---------  -----------
     0        1          100     2082          1  tolkien
     0        1          100     4947          2  kimball
     0        1          556     4455         10  merle
     0        1          556     4947          2  kimball
     0        1          556     4947          3  toto

I was hoping to get the relevant ids from the parent levels starting from this path:

$.librairies[*].books[*].authors[*]

But it seems JSONPath has no such feature (no way to reach back to parent levels).
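
For reference, the single-pass call one might hope for would look something like this (hypothetical sketch, untested; even if the row expression were accepted, colexpr can only see the current row and, via fromRoot, the document root, so librairy_id and book_id stay out of reach):

select row_id, author_id, author_name
  from JSON_Table
       (  on (select mvt_pi, col from mvt_json)
       using rowexpr('$.librairies[*].books[*].authors[*]')
             colexpr( '[ {"jsonpath" : "$.id"  , "type" : "integer"}
                       , {"jsonpath" : "$.name", "type" : "varchar(10)"}
                       ]')
       ) as aut (row_id, author_id, author_name);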

Teradata Employee

Re: JSON Shredding nested array

Hi,

Thanks for helping, that's what I was guessing. This part here is going to be a problem for me:

 

                       , {"jsonpath" : "$.authors", "type" : "varchar(5000)"}

You always have to declare a temporary data type like this one for the inner arrays, and according to the manual, JSON_Table output columns are limited to non-LOB character types (meaning VARCHAR(64000) CHARACTER SET LATIN at most).

 

On my current project, I may run into inner arrays larger than 64 KB... I guess the shredding will have to happen outside Teradata...
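
A rough way to check whether the limit actually bites (assuming the usual cast from the JSON type to CLOB) would be to measure the serialized length of the inner arrays:

-- measure how large each books array is once serialized; values near 64000 would overflow
select mvt_pi
     , character_length(cast(col.JSONExtract('$.librairies[*].books') as clob)) as books_len
  from mvt_json;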

 

I'll try asking engineering and post a follow-up here.

Cheers,

Teradata Employee

Re: JSON Shredding nested array

Yep, that's a limit.

No luck with JSON_SHRED_BATCH, or is the bug you encountered the blocker?

Teradata Employee

Re: JSON Shredding nested array

I didn't try JSON_SHRED_BATCH, but the manual says the same about output data types.

 

It doesn't seem to be a bug, but a (lack of) feature!

 

Maybe I could scratch my head splitting the JSON with regexes and such, but it doesn't seem worth it.

 

Teradata Employee

Re: JSON Shredding nested array

Following your sample, I found another way to split the nested arrays (without the size issue)...

 

select X.mvt_pi
     , X.col.JSONExtract('$.' || a.JSONKeys) as betslip_splitted
  from (select mvt_pi, col from mvt_json) X
     , (select distinct JSONKeys
          from JSON_KEYS ( on (select col from mvt_json)
                           using quotes('N') ) as JSON_DATA
         where JSONKeys like '%authors[%]'
           and JSONKeys not like '%authors[%].%'
       ) a
 where betslip_splitted is not null
 order by X.mvt_pi;

I had to use the JSONExtract method since it's the only one able to output a LOB data type...

 

The ugly part is the product join between JSON_KEYS and the base table... sadly JSON_KEYS doesn't output extra columns such as the row id...

 

If this works within an AFTER INSERT trigger, it may be less ugly...
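
A rough, untested sketch of that idea (tgt_authors is a hypothetical target table, and whether Teradata allows the JSON_KEYS table operator inside a trigger body is exactly the open question):

-- hypothetical target table for the exploded author objects
create table tgt_authors
( mvt_pi       byteint
, author_json  json(64000) inline length 10000
);

create trigger trg_shred_authors
after insert on mvt_json
referencing new table as new_rows
for each statement
( insert into tgt_authors
  select X.mvt_pi
       , X.col.JSONExtract('$.' || a.JSONKeys)
    from new_rows X
       , (select distinct JSONKeys
            from JSON_KEYS ( on (select col from new_rows)
                             using quotes('N') ) as jk
           where JSONKeys like '%authors[%]'
             and JSONKeys not like '%authors[%].%'
         ) a
   where X.col.JSONExtract('$.' || a.JSONKeys) is not null
);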