Why Is Linux Popular for Machine Learning? Top Distributions To Use

Janus Atienza

Linux is becoming the go-to operating system for machine learning and artificial intelligence workloads. The Linux Foundation reports that LF AI, the foundation’s AI initiative, “has been growing at the rate of one new project per month”. Moreover, the foundation considers that the future of open source lies in its applications in the AI ecosystem and data communities.

If you are wondering whether to use Linux for your new machine learning project, this post is for you. We’ll explore Linux use cases for AI and machine learning projects and the most popular distributions.

Advantages of Linux for Machine Learning

One clear advantage of Linux is that it carries no licensing fee. Organizations that run frameworks such as TensorFlow and PyTorch at scale can build systems with tens of thousands of processors without paying licensing fees on those processors.

Most ML servers run Linux, so it is the logical option if you want to write for the most common server OS. Support for PyTorch and TensorFlow on Windows has historically been more limited than on Linux. A developer may find that getting a project to work on Windows takes extra effort, in some cases compiling software from source, and so prefer to stay on Linux.
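Before committing to a platform, it can help to check what the current machine actually offers. The following is a minimal sketch, not a tool shipped by either framework: the helper name describe_ml_environment is hypothetical, and it only reports whether the torch and tensorflow packages can be found, without importing them.

```python
import importlib.util
import platform


def describe_ml_environment(packages=("torch", "tensorflow")):
    """Report the OS and whether each package is installed (hypothetical helper)."""
    report = {"os": platform.system(), "python": platform.python_version()}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in describe_ml_environment().items():
        print(f"{key}: {value}")
```

Running the same script on a Linux server and on a Windows workstation gives a quick side-by-side view of where the frameworks are already available.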

Linux also offers high performance, which is essential for data-heavy processes like AI and ML. The Linux kernel is open and highly configurable, which enables developers to tune the stack to work better for their data centers. DagsHub, a leader in machine learning projects, uses Linux for a diverse pool of projects.
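As one illustration of how exposed the kernel is to tuning, Linux publishes many runtime parameters as plain files under /proc/sys. The sketch below reads one of them. The function name read_kernel_parameter is an assumption made for this example, vm.swappiness is just a commonly cited parameter, and the simple dot-to-path mapping does not cover every parameter name. Changing a value normally requires administrator rights and is not shown here.

```python
from pathlib import Path


def read_kernel_parameter(name, root="/proc/sys"):
    """Return the value of a sysctl-style parameter such as 'vm.swappiness'.

    `root` is a parameter only so the function can be pointed at a test
    directory; on a real Linux system the default location applies.
    """
    # 'vm.swappiness' maps to the file <root>/vm/swappiness.
    path = Path(root).joinpath(*name.split("."))
    return path.read_text().strip()


if __name__ == "__main__":
    try:
        print("vm.swappiness =", read_kernel_parameter("vm.swappiness"))
    except OSError as exc:
        # Not on Linux, or the parameter does not exist on this kernel.
        print("could not read parameter:", exc)
```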

When programmers require a stable development environment, they often prefer Linux, which gives stability to server-side systems. On a Windows machine, forced restarts after updates can interrupt a long-running job. Linux servers generally restart only when the administrator chooses, and you won’t see a “blue screen of death” on them. The update structure of Linux distributions promotes stability and gives long-term support to the packages.

Being open-source is a definite advantage for Linux. Additionally, the OS is robust, and its compilers and linkers have fewer bugs. Linux is versatile and supports a wide variety of hardware, with multitasking and multi-user support. With Linux, it is easier to set up a development environment and install dependencies. Finally, you cannot overlook the vast Linux community, where you can find many free development tools and libraries within your reach.
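As a small illustration of setting up a development environment, the sketch below creates an isolated Python environment with the standard library venv module. The function name create_project_env and the .venv directory are assumptions made for this example, not a convention required by any framework.

```python
import venv
from pathlib import Path


def create_project_env(env_dir, with_pip=True):
    """Create an isolated virtual environment and return its path."""
    env_path = Path(env_dir)
    # venv.create builds the directory layout and, optionally, installs pip.
    venv.create(env_path, with_pip=with_pip)
    return env_path


# Example usage (hypothetical project layout):
#     create_project_env(".venv")
# Then activate it from a shell with:  source .venv/bin/activate
```

Project dependencies can then be installed inside the environment without touching the system-wide Python installation.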

Uses of Linux in AI and ML

Machine learning and deep learning are, by definition, data-centric. Training models requires accessing and manipulating massive data sets. Logically, data scientists and ML developers prefer frameworks that make it easy to manipulate the data and implement complex statistical analysis and libraries. Because Linux offers higher performance, most ML frameworks, such as TensorFlow, Caffe, and PyTorch, are built to work with it.

ML workloads involve training models and, more recently, running inference engines with trained models. Since these workflows typically rely on Python environments, it is easier to run them on general-purpose systems such as Linux. This also enables running ML on mobile devices that use Linux-based systems.

Because of Linux’s benefits, innovation teams often start development in Linux, choosing open-source frameworks.

What Is the Best Linux Distribution for Machine Learning?

Distributions based on the Linux kernel have the advantage of being user-friendly and easy to deploy. There are currently hundreds of Linux distributions available for any type of use case and system. They can be ready to use or distributed as source code.
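With so many distributions in circulation, it is useful to confirm which one a given machine is running. Most current distributions describe themselves in the file /etc/os-release. The sketch below parses that file. The function names are assumptions made for this example, and the parser is deliberately simplified (it strips surrounding quotes but does not handle escape sequences).

```python
def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        info[key] = value.strip().strip("\"'")
    return info


def current_distribution(path="/etc/os-release"):
    """Return the parsed os-release data, or an empty dict if unavailable."""
    try:
        with open(path, encoding="utf-8") as handle:
            return parse_os_release(handle.read())
    except OSError:
        return {}


if __name__ == "__main__":
    info = current_distribution()
    print(info.get("PRETTY_NAME", "unknown (no /etc/os-release found)"))
```

On Python 3.10 and later, platform.freedesktop_os_release() in the standard library returns similar information and may be preferable to a hand-written parser.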

How do you choose the right one for your project? Here we compiled a list of the most popular distributions.

Fedora

This Red Hat-sponsored operating system distributes its software under free and open-source licenses. The goal of Fedora is to create a customizable platform for hardware, clouds, and containers.

You can choose from Fedora’s five editions: Server for servers, CoreOS for cloud and container workloads, IoT for Internet of Things devices, Silverblue for an immutable desktop, and Workstation for personal computers.

You can learn more about Fedora here.

Arch Linux

According to its project lead, Levente Polyak, Arch Linux focuses on keeping it simple, offering a pragmatic, user-centered, and versatile operating system. Its do-it-yourself approach lets the user decide which components and services to install and manage. Because of that, this distro requires solid Linux knowledge.

One of the best features of Arch Linux is its rolling-release model, meaning that new kernel and application versions are rolled out as soon as they are released. You can learn more about Arch Linux here.

CentOS

This community-driven, free, open-source distribution is derived from Red Hat Enterprise Linux (RHEL). Because it was built from RHEL sources, CentOS Linux rolled out Red Hat updates shortly after they were released, usually within 24 hours, and shared the same stability, interoperability, consistency, and security as Red Hat. CentOS Linux has since reached end of life, and the project continues as CentOS Stream, which tracks just ahead of RHEL releases.

The latest versions offer features helpful for data scientists, like virtualization, installation and image creation, and networking. You can learn more about CentOS here.

Ubuntu

This open-source, Linux-based operating system was first released in 2004 and is still the best choice for beginners because of its ease of use. The platform can be used for desktops, cloud, IoT, and enterprise servers. Its consistent release schedule, with a new version every six months, makes it a dynamic and secure option. The default desktop installation comes with a browser (Firefox), an office suite (LibreOffice), and programs like Thunderbird and Transmission.

Ubuntu is popular with data professionals because of its large community and because it is easy to use and deploy. You can learn more about Ubuntu here.

“The bottom line – unless you know what you’re doing, just use Ubuntu. It’s the most user friendly, widely supported, and easy to install” (Effective Linux & Bash for Data Scientists).

Conclusion

Choosing Linux for your next ML or AI project can offer many benefits: it is open-source, it enables fast model training, and it comes with significant community support, fast update releases, and more.

Which distribution you choose will depend on your use case, how much Linux knowledge you have, and the complexity of your project. Ultimately, you can only gain from opting for a Linux OS for your next project.