Enabling Anaconda Python on Aster

Goal

Several people have asked me about the best way to install Python and additional packages in their Aster environment.

Aster relies heavily on Python for its internal code and processing. For that reason we must never touch the base install in the /home/beehive/toolchain directories. This post describes the correct procedure for installing your own Python, including the deep learning packages Theano, TensorFlow and Keras.

We leverage the standard Anaconda distribution because it includes around 500 python packages out of the box:

Anaconda package list | Continuum Analytics: Documentation 

 

Installation Process

1. Download Python 3.6 Anaconda from https://www.continuum.io/downloads

 

 

 

2. Perform these installation steps on the queen and all workers

 

Note: while the queen does not require a python install, it is suggested to perform the same steps on all nodes

 

 

Assumptions:

  • we will NOT use the root account to install python
  • anaconda will be installed under a new user "pythonu" to avoid any possible conflicts.
  • the pythonu account will not have a home directory.
  • anaconda will be installed in the /opt file system because other aster/teradata tools live there
  • the shell environment variables will not be modified
  • anaconda also offers an unattended installation; refer to the continuum.io website for instructions, since our example uses the interactive installation method
  • the python directory structure will be owned by the extensibility account because the Aster MR functions execute under that low privilege user
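
For readers who prefer the unattended route mentioned in the assumptions, here is a minimal sketch rather than a tested procedure. It assumes the installer's -b (batch mode, no prompts) and -p (install prefix) options, and it reuses the installer file name and target directory from this post. The function names are hypothetical.

```python
import subprocess


def batch_install_command(installer, prefix):
    """Build the argv for a non-interactive Anaconda install."""
    # -b: batch mode, the installer asks no questions (assumed option)
    # -p: directory to install into (assumed option)
    return ["bash", installer, "-b", "-p", prefix]


def run_batch_install(installer, prefix):
    """Run the installer and raise CalledProcessError if it fails."""
    subprocess.run(batch_install_command(installer, prefix), check=True)


# Values taken from the interactive example in this post.
command = batch_install_command(
    "/tmp/Anaconda3-4.3.1-Linux-x86_64.sh", "/opt/anaconda/ana3"
)
print(" ".join(command))
```

As with the interactive method, the command would be run as pythonu on every node, after /opt/anaconda has been created and handed over to that account.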

Installation steps, to be repeated for all workers:

 

<login as root>

useradd pythonu    # create a new user account to install python

cd /opt

mkdir anaconda

chown pythonu:users /opt/anaconda

ncli node clonefile /tmp/Anaconda3-4.3.1-Linux-x86_64.sh    # copy the anaconda distribution to all worker nodes

su - pythonu

cd /tmp

bash Anaconda3-4.3.1-Linux-x86_64.sh    # perform the interactive installation

 

 

 

It is crucial to answer the prompts correctly:

 

Directory: /opt/anaconda/ana3

When prompted to update .bashrc with a new PATH always answer No

 

 

Sample Installation output:

 

Welcome to Anaconda3 4.3.1 (by Continuum Analytics, Inc.)

 

In order to continue the installation process, please review the license

agreement.

Please, press ENTER to continue

>>>

================

Anaconda License

================

 

Copyright 2016, Continuum Analytics, Inc.

 

All rights reserved under the 3-clause BSD License:

 

Redistribution and use in source and binary forms, with or without

modification, are permitted provided that the following conditions are met:

 

* Redistributions of source code must retain the above copyright notice,

this list of conditions and the following disclaimer.

 

* Redistributions in binary form must reproduce the above copyright notice,

this list of conditions and the following disclaimer in the documentation

and/or other materials provided with the distribution.

 

* Neither the name of Continuum Analytics, Inc. nor the names of its

contributors may be used to endorse or promote products derived from this

software without specific prior written permission.

 

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"

AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE

IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE

ARE DISCLAIMED. IN NO EVENT SHALL CONTINUUM ANALYTICS, INC. BE LIABLE FOR

ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL

DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR

SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER

CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT

LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY

OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH

DAMAGE.

 

Notice of Third Party Software Licenses

=======================================

 

Anaconda contains open source software packages from third parties. These

are available on an "as is" basis and subject to their individual license

agreements. These licenses are available in Anaconda or at

http://docs.continuum.io/anaconda/pkg-docs . Any binary packages of these

third party tools you obtain via Anaconda are subject to their individual

licenses as well as the Anaconda license. Continuum reserves the right to

change which third party tools are provided in Anaconda.

 

In particular, Anaconda contains re-distributable, run-time, shared-library

files from the Intel (TM) Math Kernel Library ("MKL binaries"). You are

specifically authorized to use the MKL binaries with your installation of

Anaconda. You are also authorized to redistribute the MKL binaries with

Anaconda or in the conda package that contains them. If needed,

instructions for removing the MKL binaries after installation of Anaconda

are available at http://www.continuum.io.

 

Cryptography Notice

===================

This distribution includes cryptographic software. The country in which you

currently reside may have restrictions on the import, possession, use,

and/or re-export to another country, of encryption software. BEFORE using

any encryption software, please check your country's laws, regulations and

policies concerning the import, possession, or use, and re-export of

encryption software, to see if this is permitted. See the Wassenaar

Arrangement <http://www.wassenaar.org/> for more information.

 

Continuum Analytics has self-classified this software as Export Commodity

Control Number (ECCN) 5D002.C.1, which includes information security

software using or performing cryptographic functions with asymmetric

algorithms. The form and manner of this distribution makes it eligible for

export under the License Exception ENC Technology Software Unrestricted

(TSU) exception (see the BIS Export Administration Regulations, Section

740.13) for both object code and source code.

 

The following packages are included in this distribution that relate to

cryptography:

 

openssl

The OpenSSL Project is a collaborative effort to develop a robust,

commercial-grade, full-featured, and Open Source toolkit implementing the

Transport Layer Security (TLS) and Secure Sockets Layer (SSL) protocols as

well as a full-strength general purpose cryptography library.

 

pycrypto

A collection of both secure hash functions (such as SHA256 and RIPEMD160),

and various encryption algorithms (AES, DES, RSA, ElGamal, etc.).

 

pyopenssl

A thin Python wrapper around (a subset of) the OpenSSL library.

 

kerberos (krb5, non-Windows platforms)

A network authentication protocol designed to provide strong authentication

for client/server applications by using secret-key cryptography.

 

cryptography

A Python library which exposes cryptographic recipes and primitives.

 

Do you approve the license terms? [yes|no]

>>> yes

 

Anaconda3 will now be installed into this location:

/home/pythonu/anaconda3

 

  - Press ENTER to confirm the location

  - Press CTRL-C to abort the installation

  - Or specify a different location below

 

[/home/pythonu/anaconda3] >>> /opt/anaconda/ana3

PREFIX=/opt/anaconda/ana3

installing: python-3.6.0-0 ...

installing: _license-1.1-py36_1 ...

installing: alabaster-0.7.9-py36_0 ...

installing: anaconda-client-1.6.0-py36_0 ...

installing: anaconda-navigator-1.5.0-py36_0 ...

installing: anaconda-project-0.4.1-py36_0 ...

installing: astroid-1.4.9-py36_0 ...

installing: astropy-1.3-np111py36_0 ...

installing: babel-2.3.4-py36_0 ...

installing: backports-1.0-py36_0 ...

installing: beautifulsoup4-4.5.3-py36_0 ...

installing: bitarray-0.8.1-py36_0 ...

installing: blaze-0.10.1-py36_0 ...

installing: bokeh-0.12.4-py36_0 ...

installing: boto-2.45.0-py36_0 ...

installing: bottleneck-1.2.0-np111py36_0 ...

installing: cairo-1.14.8-0 ...

installing: cffi-1.9.1-py36_0 ...

installing: chardet-2.3.0-py36_0 ...

installing: chest-0.2.3-py36_0 ...

installing: click-6.7-py36_0 ...

installing: cloudpickle-0.2.2-py36_0 ...

installing: clyent-1.2.2-py36_0 ...

installing: colorama-0.3.7-py36_0 ...

installing: configobj-5.0.6-py36_0 ...

installing: contextlib2-0.5.4-py36_0 ...

installing: cryptography-1.7.1-py36_0 ...

installing: curl-7.52.1-0 ...

installing: cycler-0.10.0-py36_0 ...

installing: cython-0.25.2-py36_0 ...

installing: cytoolz-0.8.2-py36_0 ...

installing: dask-0.13.0-py36_0 ...

installing: datashape-0.5.4-py36_0 ...

installing: dbus-1.10.10-0 ...

installing: decorator-4.0.11-py36_0 ...

installing: dill-0.2.5-py36_0 ...

installing: docutils-0.13.1-py36_0 ...

installing: entrypoints-0.2.2-py36_0 ...

installing: et_xmlfile-1.0.1-py36_0 ...

installing: expat-2.1.0-0 ...

installing: fastcache-1.0.2-py36_1 ...

installing: flask-0.12-py36_0 ...

installing: flask-cors-3.0.2-py36_0 ...

installing: fontconfig-2.12.1-2 ...

installing: freetype-2.5.5-2 ...

installing: get_terminal_size-1.0.0-py36_0 ...

installing: gevent-1.2.1-py36_0 ...

installing: glib-2.50.2-1 ...

installing: greenlet-0.4.11-py36_0 ...

installing: gst-plugins-base-1.8.0-0 ...

installing: gstreamer-1.8.0-0 ...

installing: h5py-2.6.0-np111py36_2 ...

installing: harfbuzz-0.9.39-2 ...

installing: hdf5-1.8.17-1 ...

installing: heapdict-1.0.0-py36_1 ...

installing: icu-54.1-0 ...

installing: idna-2.2-py36_0 ...

installing: imagesize-0.7.1-py36_0 ...

installing: ipykernel-4.5.2-py36_0 ...

installing: ipython-5.1.0-py36_0 ...

installing: ipython_genutils-0.1.0-py36_0 ...

installing: ipywidgets-5.2.2-py36_1 ...

installing: isort-4.2.5-py36_0 ...

installing: itsdangerous-0.24-py36_0 ...

installing: jbig-2.1-0 ...

installing: jdcal-1.3-py36_0 ...

installing: jedi-0.9.0-py36_1 ...

installing: jinja2-2.9.4-py36_0 ...

installing: jpeg-9b-0 ...

installing: jsonschema-2.5.1-py36_0 ...

installing: jupyter-1.0.0-py36_3 ...

installing: jupyter_client-4.4.0-py36_0 ...

installing: jupyter_console-5.0.0-py36_0 ...

installing: jupyter_core-4.2.1-py36_0 ...

installing: lazy-object-proxy-1.2.2-py36_0 ...

installing: libffi-3.2.1-1 ...

installing: libgcc-4.8.5-2 ...

installing: libgfortran-3.0.0-1 ...

installing: libiconv-1.14-0 ...

installing: libpng-1.6.27-0 ...

installing: libsodium-1.0.10-0 ...

installing: libtiff-4.0.6-3 ...

installing: libxcb-1.12-1 ...

installing: libxml2-2.9.4-0 ...

installing: libxslt-1.1.29-0 ...

installing: llvmlite-0.15.0-py36_0 ...

installing: locket-0.2.0-py36_1 ...

installing: lxml-3.7.2-py36_0 ...

installing: markupsafe-0.23-py36_2 ...

installing: matplotlib-2.0.0-np111py36_0 ...

installing: mistune-0.7.3-py36_0 ...

installing: mkl-2017.0.1-0 ...

installing: mkl-service-1.1.2-py36_3 ...

installing: mpmath-0.19-py36_1 ...

installing: multipledispatch-0.4.9-py36_0 ...

installing: nbconvert-4.2.0-py36_0 ...

installing: nbformat-4.2.0-py36_0 ...

installing: networkx-1.11-py36_0 ...

installing: nltk-3.2.2-py36_0 ...

installing: nose-1.3.7-py36_1 ...

installing: notebook-4.3.1-py36_0 ...

installing: numba-0.30.1-np111py36_0 ...

installing: numexpr-2.6.1-np111py36_2 ...

installing: numpy-1.11.3-py36_0 ...

installing: numpydoc-0.6.0-py36_0 ...

installing: odo-0.5.0-py36_1 ...

installing: openpyxl-2.4.1-py36_0 ...

installing: openssl-1.0.2k-1 ...

installing: pandas-0.19.2-np111py36_1 ...

installing: partd-0.3.7-py36_0 ...

installing: path.py-10.0-py36_0 ...

installing: pathlib2-2.2.0-py36_0 ...

installing: patsy-0.4.1-py36_0 ...

installing: pcre-8.39-1 ...

installing: pep8-1.7.0-py36_0 ...

installing: pexpect-4.2.1-py36_0 ...

installing: pickleshare-0.7.4-py36_0 ...

installing: pillow-4.0.0-py36_0 ...

installing: pip-9.0.1-py36_1 ...

installing: pixman-0.34.0-0 ...

installing: ply-3.9-py36_0 ...

installing: prompt_toolkit-1.0.9-py36_0 ...

installing: psutil-5.0.1-py36_0 ...

installing: ptyprocess-0.5.1-py36_0 ...

installing: py-1.4.32-py36_0 ...

installing: pyasn1-0.1.9-py36_0 ...

installing: pycosat-0.6.1-py36_1 ...

installing: pycparser-2.17-py36_0 ...

installing: pycrypto-2.6.1-py36_4 ...

installing: pycurl-7.43.0-py36_2 ...

installing: pyflakes-1.5.0-py36_0 ...

installing: pygments-2.1.3-py36_0 ...

installing: pylint-1.6.4-py36_1 ...

installing: pyopenssl-16.2.0-py36_0 ...

installing: pyparsing-2.1.4-py36_0 ...

installing: pyqt-5.6.0-py36_2 ...

installing: pytables-3.3.0-np111py36_0 ...

installing: pytest-3.0.5-py36_0 ...

installing: python-dateutil-2.6.0-py36_0 ...

installing: pytz-2016.10-py36_0 ...

installing: pyyaml-3.12-py36_0 ...

installing: pyzmq-16.0.2-py36_0 ...

installing: qt-5.6.2-3 ...

installing: qtawesome-0.4.3-py36_0 ...

installing: qtconsole-4.2.1-py36_1 ...

installing: qtpy-1.2.1-py36_0 ...

installing: readline-6.2-2 ...

installing: redis-3.2.0-0 ...

installing: redis-py-2.10.5-py36_0 ...

installing: requests-2.12.4-py36_0 ...

installing: rope-0.9.4-py36_1 ...

installing: ruamel_yaml-0.11.14-py36_1 ...

installing: scikit-image-0.12.3-np111py36_1 ...

installing: scikit-learn-0.18.1-np111py36_1 ...

installing: scipy-0.18.1-np111py36_1 ...

installing: seaborn-0.7.1-py36_0 ...

installing: setuptools-27.2.0-py36_0 ...

installing: simplegeneric-0.8.1-py36_1 ...

installing: singledispatch-3.4.0.3-py36_0 ...

installing: sip-4.18-py36_0 ...

installing: six-1.10.0-py36_0 ...

installing: snowballstemmer-1.2.1-py36_0 ...

installing: sockjs-tornado-1.0.3-py36_0 ...

installing: sphinx-1.5.1-py36_0 ...

installing: spyder-3.1.2-py36_0 ...

installing: sqlalchemy-1.1.5-py36_0 ...

installing: sqlite-3.13.0-0 ...

installing: statsmodels-0.6.1-np111py36_1 ...

installing: sympy-1.0-py36_0 ...

installing: terminado-0.6-py36_0 ...

installing: tk-8.5.18-0 ...

installing: toolz-0.8.2-py36_0 ...

installing: tornado-4.4.2-py36_0 ...

installing: traitlets-4.3.1-py36_0 ...

installing: unicodecsv-0.14.1-py36_0 ...

installing: wcwidth-0.1.7-py36_0 ...

installing: werkzeug-0.11.15-py36_0 ...

installing: wheel-0.29.0-py36_0 ...

installing: widgetsnbextension-1.2.6-py36_0 ...

installing: wrapt-1.10.8-py36_0 ...

installing: xlrd-1.0.0-py36_0 ...

installing: xlsxwriter-0.9.6-py36_0 ...

installing: xlwt-1.2.0-py36_0 ...

installing: xz-5.2.2-1 ...

installing: yaml-0.1.6-0 ...

installing: zeromq-4.1.5-0 ...

installing: zlib-1.2.8-3 ...

installing: anaconda-4.3.1-np111py36_0 ...

installing: conda-4.3.14-py36_0 ...

installing: conda-env-2.6.0-0 ...

Python 3.6.0 :: Continuum Analytics, Inc.

creating default environment...

installation finished.

Do you wish the installer to prepend the Anaconda3 install location

to PATH in your /home/pythonu/.bashrc ? [yes|no]

[no] >>> no

 

You may wish to edit your .bashrc or prepend the Anaconda3 install location:

 

$ export PATH=/opt/anaconda/ana3/bin:$PATH

 

Thank you for installing Anaconda3!

 

Share your notebooks and packages on Anaconda Cloud!

Sign up for free: https://anaconda.org

 

3. Set up virtual python environment

 

Anaconda easily supports switching between multiple python versions. The setup depends on the network connectivity of the Aster environment.

 

If the Aster system has internet access on the workers:

/opt/anaconda/ana3/bin/conda create -n python36 python=3.6 anaconda               # create new environment and install all anaconda packages

 

If the Aster system has no internet access on the workers:

/opt/anaconda/ana3/bin/conda create -n python36 --clone root          # clone the root environment

 

 

You can install multiple versions, for example: /opt/anaconda/ana3/bin/conda create --name python27 python=2.7.13

 

4. Activate virtual environment

 

 

<login as pythonu>

/opt/anaconda/ana3/bin/conda info --envs                     # review available environments

source /opt/anaconda/ana3/bin/activate python36               # activate python 3.6 environment

/opt/anaconda/ana3/bin/conda list                                # list all installed packages
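
As a cross-check on what conda info --envs reports, the sketch below lists environments by looking at the directory tree directly. It assumes conda's default layout, in which named environments are created under <install prefix>/envs/<name>. The function name is hypothetical.

```python
import os


def list_envs(prefix):
    """Map environment name -> path of its python binary, for envs that have one."""
    envs_dir = os.path.join(prefix, "envs")
    found = {}
    if not os.path.isdir(envs_dir):
        return found
    for name in sorted(os.listdir(envs_dir)):
        python_bin = os.path.join(envs_dir, name, "bin", "python")
        if os.path.exists(python_bin):
            found[name] = python_bin
    return found


# Prints nothing if the prefix or its envs directory does not exist on this node.
for name, python_bin in list_envs("/opt/anaconda/ana3").items():
    print(name + "\t" + python_bin)
```

Running this on each worker is a quick way to confirm that the python36 environment exists everywhere before moving on to the test.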

 

 

 

 

5. Reset ownership of python directory structure

 

This step is required to allow Aster MR functions to properly access the new python environment.

 

su -    # switch to root

chown -R extensibility:extensibility /opt/anaconda
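
To confirm that the ownership change reached every file, a small check like the following can be run on each worker. It is a sketch, not part of the original procedure, and the helper names are hypothetical. It walks the tree and reports anything that is not owned by the given uid.

```python
import os
import pwd


def uid_of(user):
    """Look up the numeric uid of a local account; raises KeyError if absent."""
    return pwd.getpwnam(user).pw_uid


def paths_not_owned_by(root, uid):
    """Return root and every entry below it whose owner is not uid."""
    mismatched = []
    # lstat inspects a symbolic link itself rather than the file it points to
    if os.lstat(root).st_uid != uid:
        mismatched.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_uid != uid:
                mismatched.append(path)
    return mismatched


# Example, to be run on a worker after the chown:
#   bad = paths_not_owned_by("/opt/anaconda", uid_of("extensibility"))
#   print(len(bad), "paths still have a different owner")
```

An empty result means the whole directory structure belongs to the extensibility account.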

 

 

6. Perform test

 

To test our new installation we will install a python test script and invoke a sql script using act or Teradata Studio.

 

python_test.py:

 

 

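#!/usr/bin/python
# The imports only verify that the Anaconda packages load; they are otherwise unused.
import sys
import getopt
import pickle

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Emit one tab-separated row that matches the two output columns in python_test.sql.
print('hello' + '\t' + 'bye')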

 

 

python_test.sql:

 

select *

from stream (

     on (select 1)

     script('source /opt/anaconda/ana3/bin/activate python36; python python_test.py')

     outputs('test1 varchar', 'test2 varchar')

)

;

 

 

Steps to verify that the python install is correct:

  1. Save the python script in /tmp on the queen.
  2. Invoke act to install the file: act -U beehive -c "\install /tmp/python_test.py"
  3. Invoke act to run the sql test: act -U beehive -f python_test.sql

 

Expected output:

 test1 | test2
-------+-------
 hello | bye
(1 rows)
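
The two columns in the expected output come from the single tab character in the script's print statement: the line the script writes is split on the tab and mapped onto the columns named in the outputs clause. A minimal sketch of a script that makes this explicit follows. It assumes that input rows reach the script on standard input as tab-separated lines, which this post does not show. The function names are hypothetical.

```python
import sys


def format_row(values):
    """Join one output row into the tab-separated form used by python_test.py."""
    return "\t".join(str(v) for v in values)


def parse_row(line):
    """Split one input line into column values (assumed tab-separated)."""
    return line.rstrip("\n").split("\t")


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Write one output row per input row: the first input column and a constant."""
    for line in stdin:
        columns = parse_row(line)
        stdout.write(format_row([columns[0], "bye"]) + "\n")


# When installing this as a script, add a call to main() as the last line.
```

With the query from python_test.sql, whose input is the single row produced by select 1, this sketch would emit one row with two columns, so the same outputs clause applies.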

 

 

If you get permission or module loading errors, review the previous steps and verify that you have correctly set the permissions on the python directory structure. Also verify that you are using the correct paths in the sql script.
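
When chasing module loading errors, it helps to see which interpreter actually runs and which packages it cannot find. The following diagnostic is a sketch with hypothetical function names. It checks only the top-level packages that python_test.py imports, and it can be run after activating the environment as shown in step 4.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def diagnose(names):
    """Collect the interpreter location and any modules that are not importable."""
    lines = ["interpreter\t" + str(sys.executable), "prefix\t" + sys.prefix]
    lines += ["missing\t" + name for name in missing_modules(names)]
    return lines


# Top-level packages imported by python_test.py.
for line in diagnose(["numpy", "sklearn", "pandas"]):
    print(line)
```

If the interpreter path printed here is not under /opt/anaconda/ana3, the activate step did not take effect and the script is running under a different python.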

 

 

Do not forget to repeat these steps for all the workers.

 

Installing additional python packages 

Activate your python environment as shown earlier and run these commands to add key deep learning packages on each worker:

conda install -c conda-forge theano=0.9.0

conda install -c conda-forge tensorflow=1.1.0

conda install -c conda-forge keras=2.0.2
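
After installing, it is worth confirming on each worker that the three packages import and report the versions requested above. The check below is a sketch under the assumption that each package exposes a __version__ attribute. The helper names are hypothetical.

```python
import importlib

# Versions requested by the conda install commands in this post.
EXPECTED = {"theano": "0.9.0", "tensorflow": "1.1.0", "keras": "2.0.2"}


def installed_version(name):
    """Return the module's __version__, 'unknown' if it has none, None if import fails."""
    try:
        module = importlib.import_module(name)
    except Exception:
        # A broken install can fail with errors other than ImportError.
        return None
    return getattr(module, "__version__", "unknown")


def mismatches(expected):
    """Map each package whose installed version differs from the expected one to what was found."""
    result = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        if found != wanted:
            result[name] = found
    return result


if __name__ == "__main__":
    for name, found in mismatches(EXPECTED).items():
        print(name + "\texpected " + EXPECTED[name] + "\tfound " + str(found))
```

No output means all three packages are present at the expected versions in the active environment.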
