What does the life of a query in QueryGrid look like?

QueryGrid
Highlighted
Teradata Employee

What does the life of a query in QueryGrid look like?

I'm looking to understand the lifecycle of a query in querygrid.  What are the steps it goes through?


Accepted Solutions
Teradata Employee

Re: What does a query life cycle look like?

A query has 2 main phases.

  1. Metadata phase
    1. Query is submitted to SQL parsing component (e.g. PE, coordinator, HiveServer2)
    2. Initiator/Contract function is invoked and connects to the QGL process on the node
    3. Local QGL connects to a single QGL process on the remote system
    4. Remote QGL starts driver on node
    5. Remote driver submits metadata SQL to remote SQL parsing component
    6. Driver receives metadata query result, sends it back through the remote QGL to the local QGL which returns it to the local SQL parsing component
    7. Local SQL parsing component sends metadata information to all workers (e.g. AMP, Presto worker thread, Hive executor)
  2. Execution and data transfer phase
    1. Each worker spins up an instance of the QG connector local function which connect to the QGL local to the worker’s node
    2. Remote driver issues data movement query (i.e. the query equivalent to user’s SQL request) to remote SQL parsing component
    3. Remote system performs normal execution of query and spins up an instance of the QG connector remote function
    4. As the last step of the query execution, the results are passed as input to the QG connector remote function, which translates data in a system-specific format to the universal QG format
    5. Each instance of the connector remote function (equal to number of workers) writes to their own buffer it shares with the QGL (by default – the number of buffers per node can be configured). 
    6. When the buffer is full, it signals the QGL which reads the buffer and sends the buffer to a local node QGL.
      1. The number of buffers a remote QGL sends concurrently is configured via the transfer concurrency property.
      2. When the next worker’s (on the same node) buffer is full and signals the remote QGL, it will send this buffer to a different local QGL (round robin).
    7. A local QGL thread reads the buffer from the network and passes it on to a different thread to write to a shared memory buffer, then signals one of the instances of the local connector function
    8. The local connector function instance reads the buffer and feeds it to its controlling worker for further processing (e.g. joins, aggregations, etc)
  3. QG_query_lifecycle.png
1 ACCEPTED SOLUTION
2 REPLIES
Teradata Employee

Re: What does a query life cycle look like?

A query has 2 main phases.

  1. Metadata phase
    1. Query is submitted to SQL parsing component (e.g. PE, coordinator, HiveServer2)
    2. Initiator/Contract function is invoked and connects to the QGL process on the node
    3. Local QGL connects to a single QGL process on the remote system
    4. Remote QGL starts driver on node
    5. Remote driver submits metadata SQL to remote SQL parsing component
    6. Driver receives metadata query result, sends it back through the remote QGL to the local QGL which returns it to the local SQL parsing component
    7. Local SQL parsing component sends metadata information to all workers (e.g. AMP, Presto worker thread, Hive executor)
  2. Execution and data transfer phase
    1. Each worker spins up an instance of the QG connector local function which connect to the QGL local to the worker’s node
    2. Remote driver issues data movement query (i.e. the query equivalent to user’s SQL request) to remote SQL parsing component
    3. Remote system performs normal execution of query and spins up an instance of the QG connector remote function
    4. As the last step of the query execution, the results are passed as input to the QG connector remote function, which translates data in a system-specific format to the universal QG format
    5. Each instance of the connector remote function (equal to number of workers) writes to their own buffer it shares with the QGL (by default – the number of buffers per node can be configured). 
    6. When the buffer is full, it signals the QGL which reads the buffer and sends the buffer to a local node QGL.
      1. The number of buffers a remote QGL sends concurrently is configured via the transfer concurrency property.
      2. When the next worker’s (on the same node) buffer is full and signals the remote QGL, it will send this buffer to a different local QGL (round robin).
    7. A local QGL thread reads the buffer from the network and passes it on to a different thread to write to a shared memory buffer, then signals one of the instances of the local connector function
    8. The local connector function instance reads the buffer and feeds it to its controlling worker for further processing (e.g. joins, aggregations, etc)
  3. QG_query_lifecycle.png
Teradata Employee

Re: What does the life of a query in QueryGrid look like?

There are 2 phases to a query.

  1. Metadata phase
    1. Query is submitted to SQL parsing component (e.g. PE, coordinator, HiveServer2)
    2. Initiator/Contract function is invoked and connects to the QGL process on the node
    3. Local QGL connects to a single QGL process on the remote system
    4. Remote QGL starts driver on node
    5. Remote driver submits metadata SQL to remote SQL parsing component
    6. Driver receives metadata query result, sends it back through the remote QGL to the local QGL which returns it to the local SQL parsing component
    7. Local SQL parsing component sends metadata information to all workers (e.g. AMP, Presto worker thread, Hive executor)
  2. Execution phase
    1. Each worker spins up an instance of the QG connector local function which connect to the QGL local to the worker’s node
    2. Remote driver issues data movement query (i.e. the query equivalent to user’s SQL request) to remote SQL parsing component
    3. Remote system performs normal execution of query and spins up an instance of the QG connector remote function
    4. As the last step of the query execution, the results are passed as input to the QG connector remote function, which translates data in a system-specific format to the universal QG format
    5. Each instance of the connector remote function (equal to number of workers) writes to their own buffer it shares with the QGL (by default – the number of buffers per node can be configured). 
    6. When the buffer is full, it signals the QGL which reads the buffer and sends the buffer to a local node QGL.
      1. The number of buffers a remote QGL sends concurrently is configured via the transfer concurrency property.
      2. When the next worker’s (on the same node) buffer is full and signals the remote QGL, it will send this buffer to a different local QGL (round robin).
    7. A local QGL thread reads the buffer from the network and passes it on to a different thread to write to a shared memory buffer, then signals one of the instances of the local connector function
    8. The local connector function instance reads the buffer and feeds it to its controlling worker for further processing (e.g. joins, aggregations, etc)