mes6@njit.edu   

Disclaimer : This website is going to be used for Academic Research Purposes.

The Perfect Data Science Laptop in 2023


One of the most overlooked aspects of starting to learn data science is which laptop to choose. It is a major decision, because many of the techniques used to train machine learning models depend on the capabilities of the equipment you choose. Needless to say, this can quickly become a daunting task once we start to enumerate everything we need to take into account, as well as the myriad of options available to meet our needs. The basic components we need to look at are the processor, the RAM, the storage, and the GPU.

The Processor: Need for Speed

The processor is the most crucial part of our laptop, as it is the component that handles our computations. There are several things to consider here, since performance is heavily influenced by the type of architecture (x86 or ARM), the number of cores (either physical or virtual), the cache memory, and the brand. The first consideration is that choosing between an x86 and an ARM architecture is not a neutral decision. ARM was originally designed for mobile devices to optimize energy usage, but it eventually made its way into laptops, and more manufacturers are following that trend. The most notable ARM processor for data science applications is the M2, which is used in the latest MacBooks. Given the difference in architecture, a translation step is required to run applications designed for x86 processors.

In some applications, this extra step can slow the processor down, but in general, comparisons of the M1 and M2 chips used in MacBook Air and MacBook Pro laptops show performance comparable to flagship processors such as the Intel Core i7-1165G7 or the AMD Ryzen 9 5950X. Great gaming laptops like the Razer Blade 14 use the AMD Ryzen 9 5900HX, which offers 8 cores and 16 threads, with a 3.3 GHz base clock speed and a 4.6 GHz max boost that can be used during heavy computations.

In the same range of processing power but on the Intel side, we find the Dell Alienware X17, which has an 11th Gen Intel Core i7-11800H with 8 cores, a massive 24 MB L3 cache, and up to 4.6 GHz of clock speed. Besides raw clock speed, we should also consider the number of cores and the number of threads the processor can manage. In data science and data engineering in general, some operations can be parallelized across multiple threads or processes, which means that each core of the processor, either physical or simulated, receives a chunk of the data to work on. After the processing is done, we retrieve the results and aggregate them, which can massively speed up our pipelines, especially when dealing with complicated feature engineering on big datasets or when working with unstructured data.
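
As a minimal sketch of this split, process, and aggregate pattern, the following Python example uses ProcessPoolExecutor from the standard library. The function names and the sum-of-squares workload are hypothetical stand-ins for real feature engineering, and the number of workers defaults to the logical core count reported by os.cpu_count().

```python
import os
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(values, n_chunks):
    """Split a list into n_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(values), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """CPU-bound work on one chunk (hypothetical stand-in for feature engineering)."""
    return sum(v * v for v in chunk)


def parallel_sum_of_squares(values, workers=None):
    """Send each chunk to a separate worker process, then aggregate the results."""
    workers = workers or os.cpu_count() or 1
    chunks = split_into_chunks(values, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(sum_of_squares, chunks))
    return sum(partial_results)


if __name__ == "__main__":
    data = list(range(1_000_000))
    print(parallel_sum_of_squares(data))
```

Processes are used here rather than threads because, in standard CPython builds, the global interpreter lock keeps CPU-bound threads from running Python code in parallel. The more cores the processor has, the more chunks can be worked on at the same time.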

Finally, the other important aspect to consider is the cache memory. The cache is a small, fast memory inside the processor that sits between the RAM and the actual processing units. It is measured in MB, and the least we should consider is 8 MB, a threshold greatly surpassed by high-end Intel i7 and i9 processors, which can have 24 MB of cache memory. Now that we have a clear understanding of our requirements for the processor, we can move on to the next component.

The RAM: At Your Fingertips

When tackling common data science courses, you will spend quite a lot of time working with tools like Pandas, an excellent Python library for tabular data. The problem is that this library has one big bottleneck: all the data we read with it is loaded into RAM. This can be very problematic when we work with files that are gigabytes in size, as we will need to work on chunks rather than on the entire dataset.

This is not the only task limited by the amount of available RAM. Tasks like deep learning generally involve a training step in which the algorithm is exposed to data. The more data that can be held in memory at the same time, the less time the entire learning process will take.

By now you might be asking yourself how much RAM your laptop should have and what else to take into account. First of all, in 2023 the minimum amount of RAM that your laptop should have is 8 GB. One could argue for 4 GB, but the operating system alone will eat about 3 GB, leaving you just 1 GB to work with, so let’s keep the minimum at 8. If you are just starting out and you are a student on a budget, there are good options like the Lenovo IdeaPad 5, which already comes with 16 GB of RAM and a great i7 processor.

It is also important to consider the data transfer speed of the RAM, which should be at least 2666 MHz. Most recent DDR4 modules go as high as 3200 MHz, and logically, the faster the bus speed, the faster the processing will be.
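
To make the sizing question concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a purely numeric table stored as 64-bit values (8 bytes each) and reuses the 3 GB operating system figure mentioned above. The function names and the headroom multiplier are assumptions for illustration, not measured values.

```python
def estimate_table_size_gb(n_rows, n_columns, bytes_per_value=8):
    """Rough in-memory size of a numeric table in GB (1 GB = 1024**3 bytes)."""
    return n_rows * n_columns * bytes_per_value / 1024**3


def fits_in_ram(table_gb, total_ram_gb, os_overhead_gb=3.0, headroom=2.0):
    """Check whether a table fits in the RAM left over after the operating system.

    headroom is an assumed multiplier: intermediate copies created during
    processing often need more memory than the raw table itself.
    """
    available_gb = total_ram_gb - os_overhead_gb
    return table_gb * headroom <= available_gb


if __name__ == "__main__":
    size_gb = estimate_table_size_gb(n_rows=50_000_000, n_columns=20)
    print(f"Estimated table size: {size_gb:.2f} GB")
    for ram_gb in (4, 8, 16, 32):
        print(f"{ram_gb:>2} GB RAM -> fits: {fits_in_ram(size_gb, ram_gb)}")
```

Text columns and mixed types usually take more space than this estimate suggests, so treat the result as a lower bound.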

The Storage: Size and Speed

Storage is often an afterthought when discussing the requirements of data science laptops, although it has a great impact. Let’s go back to the earlier example of reading big files for machine learning and analysis of tabular data. One way to overcome this bottleneck is by using Dask, a Python library that reads data from storage in partitions as it is needed, rather than loading the whole dataset into memory at once. So it all comes down to the strategy you choose for dealing with larger files. If we process data in batches read from local storage, then the choices we make about the storage and its read speed will have a great impact on performance.

So what should we consider, and what options do we have in terms of storage? More important than the amount of storage is the storage technology. It is highly recommended to use an SSD, as it can read data up to 10 times faster than a common HDD. If the laptop you’ve selected doesn’t have an SSD, you can usually upgrade it easily. A good option in that case would be the Kingston SA400S37/480G A400, whose 480 GB is more than enough. If you want to go further, you can get the Samsung 870 QVO, which has 1 TB and a good price-to-storage ratio. If you want all the power available, you can choose the Kingston NV1 NVMe, which is 4 times faster than common SATA SSDs.
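
The batch strategy described above can be sketched with nothing but the Python standard library. This is an illustration of the idea, not how Dask works internally: the file is streamed from storage one batch at a time, and only running totals are kept in memory. The file name, column names, and helper functions are hypothetical. Pandas offers the same pattern through the chunksize parameter of read_csv.

```python
import csv
import os
import tempfile
from itertools import islice


def read_in_batches(path, batch_size):
    """Yield lists of rows (as dicts) so only one batch is in memory at a time."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        while True:
            batch = list(islice(reader, batch_size))
            if not batch:
                break
            yield batch


def mean_of_column(path, column, batch_size=1000):
    """Compute a column mean by accumulating running totals across batches."""
    total, count = 0.0, 0
    for batch in read_in_batches(path, batch_size):
        total += sum(float(row[column]) for row in batch)
        count += len(batch)
    return total / count if count else float("nan")


def write_sample_csv(path, n_rows):
    """Create a small hypothetical dataset so the example is self-contained."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "value"])
        for i in range(n_rows):
            writer.writerow([i, i * 0.5])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        sample_path = os.path.join(tmp_dir, "sample.csv")
        write_sample_csv(sample_path, n_rows=10_000)
        print(mean_of_column(sample_path, "value", batch_size=1000))
```

Because every batch has to be read from disk, the read speed of the drive directly limits how fast a loop like this can run.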

The GPU: Enabling Deep Learning

Most deep learning applications require intensive computation during training. To speed things up, developers have made it possible to use graphics cards to accelerate training. With CUDA, a technology available on Windows and Linux machines, we can speed up training in most deep learning frameworks. While in practice there are only two major manufacturers of graphics cards, Nvidia and AMD, an Nvidia GPU is the recommended choice because CUDA was developed by Nvidia and only runs on its GPUs. So if you are planning to undertake deep learning tasks, it is advisable to consider Nvidia GPUs from the GTX 1650 onwards.

The MSI GF75 is a good option when you are on a budget. It comes with a GTX 1650, an Intel i7, and 16 GB of RAM. If you want to go further, the ASUS ROG Zephyrus S15 GX502LWS-HF020T comes with a powerful Intel Core i7-10750H processor, an NVIDIA GeForce RTX 2070 SUPER graphics card, a generous 32 GB of DDR4 RAM, and a 1 TB SSD.
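
If you already have a machine at hand, one quick way to check for an Nvidia GPU is to query the nvidia-smi command-line tool that ships with the Nvidia driver. The sketch below is a minimal example under that assumption. The function name is hypothetical, and an empty list only means the tool was not found or did not run, not that deep learning is impossible on the machine.

```python
import shutil
import subprocess


def list_nvidia_gpus():
    """Return one 'name, total memory' string per GPU, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []  # driver tools not installed, or no Nvidia GPU present
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
            timeout=10,
        )
    except (subprocess.SubprocessError, OSError):
        return []  # the tool exists but failed or timed out
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_nvidia_gpus()
    if gpus:
        for gpu in gpus:
            print("Found:", gpu)
    else:
        print("No Nvidia GPU detected through nvidia-smi.")
```

Deep learning frameworks provide their own checks as well, which are the better choice once a framework is installed.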

Conclusion

At the end of the day, the best way to choose your data science laptop is to understand the requirements you will need to meet in the future. The answer lies in a balance between your needs and the budget you are working with. Nevertheless, there are good options out there, and it’s important to choose wisely, as your choice will affect your performance in your work and career. As a rule of thumb, pick the best processor and the most RAM possible, then select a good SSD and, when possible, an Nvidia GPU. After that, you will be on the right track to succeed in your data science projects.
