Microsoft has released a Machine Learning library for Apache Spark, aimed at data scientists.
The aim is to speed up experimentation and help data scientists apply advanced machine learning and deep learning techniques to large datasets.
Spark's own Machine Learning Library (MLlib) is built to make machine learning scalable and easy to use, providing common algorithms for classification, regression, clustering and collaborative filtering.
According to Microsoft, customers already using SparkML have found it a helpful platform for building scalable machine learning models, but have still struggled with its low-level APIs.
To address this, Microsoft has built its library on the primary Machine Learning API for Spark, the DataFrame-based API in the spark.ml package.
By doing this, the Machine Learning library for Apache Spark simplifies the tasks needed to build models, while also offering more consistent APIs for handling different types of data, such as text or categorical values.
Ahead of this, Microsoft also released a new Spark connector for Azure Cosmos DB. It is designed to support real-time data science, machine learning, advanced analytics and exploration over globally distributed data in Azure Cosmos DB.
Azure Cosmos DB is Microsoft’s multi-model database service for mission-critical applications. By connecting Apache Spark to the database, Microsoft gives customers the opportunity to solve fast-moving data science problems.
In a blog post, Denny Lee, PPM, Azure Cosmos DB, said: “With the updated Spark connector for Azure Cosmos DB data models: Documents, Tables and Graphs.”
Apache Spark with Azure Cosmos DB is what drives machine learning, data science, artificial intelligence and advanced analytics.
Microsoft has also made the Machine Learning library for Apache Spark available on GitHub as an open source project, giving customers easier access.