Before MLflow, the industry had no standard process or end-to-end infrastructure for developing and productionizing machine learning applications in a simple and consistent way. With MLflow, organizations can package their code as reproducible runs, execute and compare hundreds of parallel experiments, and leverage any hardware or software platform for training, tuning, hyperparameter search and more. Additionally, organizations can deploy and manage models in production on a variety of clouds and serving platforms. As a testament to MLflow’s design as an open platform, RStudio’s contribution extends the MLflow platform to the large community of data scientists who use RStudio and the R programming language.
“In many organizations machine learning workflows are far too ad-hoc, with no systematic tracking of experiments, inadequate protocols around reproducibility, and no consistent way to package and deploy models. MLflow helps address these issues in a uniform fashion across languages and frameworks,” said JJ Allaire, chief executive officer at RStudio. “Integration of R with MLflow will significantly broaden the reach of the project by allowing a broader community to use and contribute to MLflow.”
Since MLflow launched only four months ago, community engagement and contributions have led to an impressive array of new features and integrations, including:
“With MLflow, data science teams can systematically package and reuse models across frameworks, track and share experiments locally or in the cloud, and deploy models virtually anywhere,” said Matei Zaharia, original creator of Apache Spark, chief technologist at Databricks, and tech lead of MLflow. “The flurry of interest and contributions we’ve seen from the data science community validates the need for an open source framework to streamline the machine learning lifecycle.”