Guest Post

New Machine Learning Feature of Azure and Python – All You Need To Know

What is Azure?

Microsoft Azure is an ever-growing set of cloud services designed to help your organization meet the challenges it faces. It lets you build, manage, and deploy applications across a huge global network using your favorite infrastructures and tools.

machine learning with azure and python

Photo credit- Tutorial Graphics

What is Python?

Python is an open source, multi-platform programming language that is powerful and easy to master. It is widely used and supported.

Running Python machine learning scripts in Azure Machine Learning Studio 

This topic describes the design principles behind the current support for Python scripts in Azure Machine Learning.  The main features provided are presented by SEO agency and Magento Ecommerce agency, including:

  • Execution of basic usage scenarios
  • Rating of an experience in a web service
  • Support for importing existing code
  • Exporting visualizations
  • Supervised feature selection
  • Taking into account certain limitations

Python is an indispensable tool in the toolbox of many data scientists. It has:

  • a concise and elegant syntax
  • cross-platform support
  •  a large set of powerful libraries
  • Advanced development tools.

Python is used in all phases of a workflow commonly used in machine learning modeling: 

  • Ingestion and data processing
  • Building features
  • Model training
  • Validation of the models
  • Deployment of models

Azure Machine Learning Studio supports the integration of Python scripts into multiple parts of a machine learning experience as well as seamlessly publishing them as web services on Microsoft Azure.

 Principles of designing Python scripts in Machine Learning

The main Python interface in Azure Machine Learning Studio is accessible through the Execute Python Script module.

The Execute Python Script module in Azure ML Studio accepts up to three entries and generates up to two outputs (see next section), such as its R analog, the Execute R script module. The Python code, which must be executed, is entered in the parameter box as a specially named entry point function, called azureml_main.

Here are the keys design principles used to implement this module:

1. Must be idiomatic for Python users.

Most Python users factor their code as functions within modules. As a result, the insertion of many executable instructions into a higher level module is relatively rare. The script box also uses a specially named Python function rather than a sequence of instructions. The objects exposed in the function are standard Python library types, such as Pandas data frames and NumPy arrays.

2. Must have high fidelity between local executions and cloud executions.

The main server used to execute Python code is based on Anaconda, a widely used Python cross-platform scientific distribution. It comes with nearly 200 of the main Python packages. As a result, data scientists can debug and qualify their code in their local Anaconda environment compatible with Azure Machine Learning. They can then use an existing development environment, such as the IPython notebook or Python Tools for Visual Studio, to run the code as part of an Azure ML experiment. The azureml_main entry point is a vanilla Python function that  can be created without a specific Azure ML code or software development kit installed.

see also: How to Get Your First Data Science Job: Interview with Michael Galarnyk

3. Must be seamlessly composed of other Azure Machine Learning modules.

The Execute Python Script module accepts as inputs and outputs the standard Azure Machine Learning datasets. The underlying infrastructure effectively and seamlessly establishes a bridge between the Azure ML and Python runtimes.

Python can, therefore, be used in combination with existing Azure ML workflows, including those using R and SQLite.

A data scientist can compose workflows that:

  • use Python and Pandas for preprocessing and data cleansing
  • feed data for SQL transformation, joining multiple datasets for the purpose of forming functionalities
  • Form models using Azure Machine Learning algorithms
  • Evaluate and post-process the results using R.

Basic usage scenarios for Python scripts in ML

In this section, we look at some common uses of the Run Python Script module.

Inputs to the Python module are exposed as Pandas data frames.

The function must return a single Pandas data image packaged in a Python sequence such as a tuple, list, or NumPy array.

The first element of this sequence is then returned to the first output port of the module.  

Converting input and output types

Input datasets in Azure ML are converted to data frames in Pandas.

The output data frames are converted back into Azure ML datasets.

The conversions to execute are as follows:

1.   Numeric and string columns are converted as-is, and missing values in a data set are converted to “NA” values in Pandas. This conversion is also done in the other direction (NA values in Pandas are converted to missing values in Azure ML).

2.   Index vectors in Pandas are not supported in Azure ML. The input data arrays in the Python function always have a 64-bit numeric index between 0 and the number of rows minus 1.

3.   Azure ML datasets cannot have duplicate column names and non-string column names. If an output data frame contains non-numeric columns, the infrastructure calls str on column names. Similarly, duplicate column names are automatically truncated so that each name is unique. The suffix (2) is added to the first duplicate, the suffix (3) to the second, and so on.

Setting up Python scripts

The Execute Python Script modules used in a scoring experiment are called when they are published as a web service.

For example, the figure below illustrates a scoring experiment that holds the code to evaluate a single Python expression.

The Execute Python Script modules used in a scoring experiment to evaluate a single Python expression

Azure machine learning python script

Import existing Python script modules

Many data scientists integrate existing Python scripts into Azure ML experiments. Instead of requiring the concatenation and pasting of all the code in a single script box, the Python Script Runtime module accepts a zip file containing the Python modules at the third input port.

The file is decompressed by the runtime infrastructure at runtime, and the content is added to the Python interpreter’s library path. The function of the azureml_main entry point can then import these modules directly. Load the zip file as a dataset in Azure Machine Learning Studio. Then create and run an experiment that uses Python code in the Hello.zip file by attaching it to the third input port of the Python Script Execute module, as shown in this figure.

zip file as a dataset in Azure Machine Learning Studio

azure machine learning python programUsing visualizations

Charts created using MatplotLib, which can be viewed on the browser, can be returned by the Python Script Run module.

But graphics are not automatically redirected to images, such as when you use R.

Because of this, the user must explicitly save graphics as PNG files if they need to be returned in Azure Machine Learning.

To generate images from MatplotLib, you must follow the following procedure:

  •  Switch the main server to “AGG” from the default Qt based converter.
  •   create a new figure object
  • get the axis and generate on it all the graphs
  •   save the figure to a PNG file

This process is mentioned in the Figure  below, which creates a point cloud matrix using the scatter_matrix function in Pandas.

point cloud matrix using the scatter_matrix function in Pandas.

Advanced examples

The Anaconda environment installed in Azure Machine Learning contains common packages such as NumPy, SciPy, and Scikits-Learn. These packages can be used effectively for various data processing tasks in an automatic learning pipeline.

For example, the following experiments and script illustrate the use of set learners in Scikits-Learn to calculate the importance ratings of a dataset’s features.

The notations can be used to perform a supervised feature selection before being fed into another machine learning model. 

Limitations

The Execute Python Script module currently has the following limitations:

1. Execution in a sandbox.

The Python runtime is currently running in a sandbox and, therefore, does not allow access to the network or local file system permanently.

All locally saved files are isolated and deleted after the module finishes. Python code cannot access most directories on the machine that executes them, except for the current directory and its subdirectories.

2. Lack of sophisticated development and debugging support.

The Python module does not support IDE features such as Intellisense and Debug. In addition, if the module fails at runtime, the complete tree of Python procedure calls is available. But it must be displayed in the output log of the module.

We recommend that you develop and debug your Python scripts in an environment such as IPython, and then import the code into the module.

3. Single data frame output.

The Python entry point is only allowed to return a single data frame as an output. It is currently impossible to return arbitrary Python objects such as models formed directly when running Azure Machine Learning. As with Execute R script, which has the same restrictions, it is often possible to store objects in an array of bytes and then return them in a data frame.

4. Cannot customize Python installation.

Adding custom Python modules is currently only possible using the zip file mechanism described earlier. Although this is feasible for small modules, the procedure is quite heavy for large modules (especially those consisting of native DLLs) or a large number of modules.

Conclusions

The Execute Python Script module enables a data scientist to embed existing Python code into cloud-based machine learning workflows in Azure Machine Learning and make it work seamlessly as part of a web service.

The Python script module naturally interacts with other Azure Machine Learning modules.

It can be used for many tasks, from exploration to data preprocessing, to feature extraction, evaluation, and post-processing of results.

The runtime of the main server used for execution is based on Anaconda; a well tested and widely used Python distribution.

This main server makes it easier for you to integrate existing code elements into the cloud.

We plan to provide additional functionality for the Python Script Run module such as the ability to train and make models operational in Python and to add better code development and debugging support in Azure Machine Learning Studio.

 SQL Server Machine Learning Services

SQL Server Microsoft Machine Learning Services lets you run, train, and deploy machine learning models using R or Python.

You can use data located locally and in SQL Server databases.

Use Microsoft Machine Learning Services when you need to train or deploy models locally or in Microsoft SQL Server databases.

Models created with Machine Learning Services can be deployed using the Azure Machine Learning Template Management service.

Microsoft Machine Learning Server

Microsoft Machine Learning Server is an enterprise server for hosting and managing parallel and distributed workloads of R and Python processes. Microsoft Machine Learning Server runs on Linux, Windows, Hadoop, and Apache Spark. It is also available on HDInsight.

It provides a runtime for solutions created using Microsoft Machine Learning packages and extends Python and R open source with support for the following scenarios:

  • High-performance analytics
  • Statistical analysis
  • Automatic learning
  • Very large data sets

Other features are provided through the proprietary packages that are installed with the server.

For development, you can use as R Tools IDE for Visual Studio and Python Tools for Visual Studio.

  •  Use Microsoft Machine Learning Server when you need to:
  • Build and deploy templates created with R and Python on a server
  • Distribute scalable R and Python training on a Hadoop or Spark cluster.  

Azure Data Science Virtual Machine (DSVM)

The Data Science Virtual Machine (DSVM) is a custom virtual machine environment of the Microsoft Azure cloud, specifically designed for data science. It includes many popular data science tools and other tools are preinstalled and preconfigured to accelerate the creation of intelligent applications for advanced analysis.

The DSVM virtual machine is available in Windows Server 2016 and 2012, and on a Linux virtual machine, it is available on Ubuntu 16.04 LTS and OpenLogic CentOS 7.4 distributions.

Use the DSVM virtual machine when you need to run or host your tasks on the same node, or if you need to remotely ramp up processing on a single computer. The DSVM image is supported as a target for the Azure Machine Learning Experiment and Azure Machine Learning Model Management services.

Spark MLLib in HDInsight

 Spark MLLib in HDInsight allows you to create templates as part of Spark tasks that run on large data.

Spark allows you to easily transform and prepare data, and then support the creation of the model in one task.

Models created through Spark MLLib can be deployed, managed, and monitored through the Azure Machine Learning Template Management service.

Learning Executions can be distributed and managed with the Azure Machine Learning Experimentation Service.

Spark can also be used to scale up data preparation tasks created in Machine Learning Workbench.

NOTE: Use Spark when you need to scale up data processing and create models as part of a data pipeline. You can create Spark tasks in Scala, Java, Python, or R.

AI Training Batch

Azure Batch AI Training helps you experiment with your AI models in any framework, then scale them on clustered GPUs.

Describe the specifications of your task and the configuration to be performed, and we take care of the rest.

Batch AI Training allows you to support deep learning tasks on clustered GPUs, using frameworks such as:

  • Cognitive Toolkit
  • Caffe
  • Layering
  • TensorFlow

The Azure Machine Learning Template Management service can be used to remove Batch AI Training templates for deployment, management, and monitoring. Batch AI Training will be integrated with Azure Machine Learning Experimentation in the future.

Microsoft Cognitive Toolkit (CNTK)

Microsoft Cognitive Toolkit is a unified deep learning resource kit that describes neural networks as computational steps in a directed graph.

In this oriented graph, the terminal nodes represent input values or network parameters, while the other nodes represent matrix operations on their inputs.

The Cognitive Toolkit allows you to easily build and combine popular model types such as flow transfer DNN, convolutional networks (CNN), and recurrent networks (RNN / LSTM).

It implements Stochastic Gradient Descent (SGD) training, backpropagation of errors, with automatic distinction and parallelization on several GPUs and servers.

Use Cognitive Toolkit when you want to generate a model with deep learning. Cognitive Toolkit can be used in all previous services.

Azure Cognitive Services

Azure Cognitive Services brings together some thirty APIs to create applications that use natural communication methods.

These APIs allow your applications to see, hear, speak, understand and interpret user needs with just a few lines of code.

Easily add smart features to your applications, for example:

  • Emotion and feelings detection
  • Vision and voice recognition
  • Language understanding (LUIS)
  • Knowledge and research

Cognitive Services can be used to develop applications on devices and platforms. APIs are getting better and easier to configure.

The Brain behind this Post.

junaid ali qureshi 150x150 - New Machine Learning Feature of Azure and Python - All You Need To Know

 Junaid Ali Qureshi is a digital marketing specialist who has helped several businesses gain traffic, outperform competition and generate profitable leads. His current ventures include Progostech, Magentodevelopers.online, eLabelz, Smart Leads.ae, Progos Tech and eCig.