I am new to the TD world. Can anyone please help me understand the difference between a TPA node and a non-TPA node?
Also, as I understand it, TD uses PDE to implement vprocs. But vprocs are just software (a piece of code), and I read that there is no physical connection between a processor and a vproc. So physically, how does TD achieve parallelism when there is just a single physical processor on a node? Does it use a time-sliced scheduling algorithm (e.g. round robin) on the node's processor, which gives the impression of vprocs and parallel processing? Please help me understand this concept thoroughly.
Please do correct me if I am wrong somewhere.
A TPA node and a non-TPA node can be differentiated by the fact that a TPA node (TPA stands for Trusted Parallel Application) is running PDE (Parallel Database Extensions). The PDE layer is what implements the different vproc types (AMP, PE, GTW, TVS, etc.) and supports the communications infrastructure between the virtual processors (the Bynet) and the I/O to the storage subsystem.
If a node is not running PDE (if it is not installed or is at a PDE state of DOWN/HARDSTOP or DOWN/STANDBY), then that node is not participating in the current "configuration". A vproc may be eligible to be migrated to a different node in the same clique if its defined node is not available at the time the database configuration is built (as may be the case when there is a hardware failure or node panic).
Most of the node types that run Teradata utilize the SUSE Linux operating system. The Linux scheduler is used to schedule run time on the node's processors for all of the Linux and PDE-managed processes and threads (in PDE these threads are called "tasks"). Most modern Teradata nodes have 8 to 24 processors available and up to 512 GB of RAM.
I encourage you to download and review the Teradata publication "Introduction to Teradata" (Publication #: B035-1091-151K). This publication should be able to answer most of your questions on Teradata's architecture.
The point I am not getting is the communication between vprocs and processors on the node at the physical layer, and the scheduling of multiple user queries via vprocs on the node's processors to get the parallelism effect.
Regarding "communication between vprocs and processors on Node on physical layer," a vproc (software) does not "communicate" with a processor (CPU or core); rather, the processor (core) executes the vproc (code). The Scheduler is software in the database system and in the operating system (Linux) that determines when to dispatch each of the vprocs and on which cores.
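Since the original question asked about round robin, here is a toy sketch of how a single core can be time-shared among several software tasks. It is only an illustration: the names vproc_task and run_on_one_core are made up for this example, and the real Linux scheduler is preemptive, spans multiple cores, and is far more sophisticated than plain round robin.

```python
from collections import deque

def vproc_task(name, steps):
    """Hypothetical unit of vproc work; yields after each step (one time slice)."""
    for step in range(1, steps + 1):
        yield f"{name} step {step}"

def run_on_one_core(tasks):
    """Toy round-robin scheduler: one core, one task runs per time slice."""
    ready = deque(tasks)
    timeline = []
    while ready:
        task = ready.popleft()
        try:
            timeline.append(next(task))  # the core executes this task's code
            ready.append(task)           # not finished yet: back of the queue
        except StopIteration:
            pass                         # task finished, drop it from the queue
    return timeline

timeline = run_on_one_core([
    vproc_task("AMP-0", 2),
    vproc_task("AMP-1", 3),
    vproc_task("PE-0", 1),
])
for entry in timeline:
    print(entry)
```

The point of the sketch is that the tasks never "talk to" the core. The core simply runs whichever task the scheduler hands it next, so the tasks appear to make progress side by side.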
There are three levels of parallel operation. Within a node there are several physical processors or cores, and they can all be executing something at the same time; this is one level. Each core can also be working on several tasks (some of which are vprocs) at the same time, because when one of them is waiting (e.g. for an I/O or a lock) another one can be executing logic; that is another level of parallelism. These two forms of parallelism do not scale linearly, however, because if enough tasks are added to a node they start to step on each other's toes, resulting in waits. Thus Teradata Engineering does extensive research to determine the optimal combinations of cores, memory sizes, I/O configuration and number of AMPs for each different type of node and core.
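The second level (one task runs while another waits) can be seen in a small experiment. This is a rough sketch with made-up numbers: time.sleep stands in for an I/O or lock wait, and ordinary Python threads stand in for PDE tasks. It says nothing about actual Teradata timings.

```python
import time
from concurrent.futures import ThreadPoolExecutor

IO_WAIT_SECONDS = 0.05  # made-up stand-in for a disk read or a lock wait

def task(task_id):
    time.sleep(IO_WAIT_SECONDS)  # while this task waits, the core is free
    return task_id * task_id     # a little "logic" once the wait is over

def timed(run):
    """Run a callable and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = run()
    return result, time.perf_counter() - start

task_ids = list(range(8))

# One task at a time: every wait is paid in full, back to back.
serial_result, serial_seconds = timed(lambda: [task(i) for i in task_ids])

def run_overlapped():
    # Many tasks in flight: their waits overlap instead of adding up.
    with ThreadPoolExecutor(max_workers=len(task_ids)) as pool:
        return list(pool.map(task, task_ids))

overlapped_result, overlapped_seconds = timed(run_overlapped)

print(f"serial:     {serial_seconds:.2f}s")
print(f"overlapped: {overlapped_seconds:.2f}s")
```

Both runs produce the same results, but the overlapped run should finish much sooner because the waits happen concurrently. If each task did heavy computation instead of waiting, adding more tasks would stop helping once the cores were busy, which is the "stepping on each other's toes" effect described above.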
The third level of parallelism, and what makes Teradata linearly scalable, is the BYnet and shared-nothing architecture. The BYnet is proprietary software (and traditionally hardware) that the vprocs use to communicate with each other. This largely involves scheduling and running execution plans for SQL statements, with each vproc doing its own subset of the work in parallel. So back to the original point, vprocs do communicate amongst themselves, but they execute on physical processors. I hope this helps clarify things a little.
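As a rough picture of the shared-nothing idea, the sketch below gives each of a few pretend AMPs its own private slice of the rows, lets each one total only its own slice, and then combines the partial results. Everything here is an assumption made for illustration: NUM_AMPS, amp_for, the CRC32 hash and the sample rows are invented, and the thread pool merely stands in for AMPs working independently. It is not Teradata's actual hashing or BYnet protocol.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_AMPS = 4  # hypothetical; chosen only to keep the example small

def amp_for(key):
    """Pick the owning AMP from a stable hash of the row key (toy stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_AMPS

def distribute(rows):
    """Give every row to exactly one AMP; no AMP sees another AMP's rows."""
    amps = [[] for _ in range(NUM_AMPS)]
    for key, amount in rows:
        amps[amp_for(key)].append((key, amount))
    return amps

def amp_partial_sum(local_rows):
    """Work done by one AMP, using only the rows it owns."""
    return sum(amount for _, amount in local_rows)

rows = [(f"order-{i}", i * 10) for i in range(1, 101)]
amps = distribute(rows)

# Each AMP computes its partial result independently of the others.
with ThreadPoolExecutor(max_workers=NUM_AMPS) as pool:
    partials = list(pool.map(amp_partial_sum, amps))

total = sum(partials)  # combining step, done once the partials are in
print("rows per AMP:", [len(local) for local in amps])
print("total:", total)
```

Because no AMP needs another AMP's rows to do its part, adding more AMPs (and nodes) spreads the same work over more workers, which is the intuition behind the scalability claim above.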