MATLAB simplifies working with big data by integrating with your existing big data storage and adapting to your data processing needs based on the resources available.
With MATLAB, you can:
“High-performance computing with MATLAB enables us to process previously unanalyzed big data. We translate what we learn into an understanding of how human activities affect the health of ecosystems to inform responsible decisions about what humans do in the ocean and on land.”
Dr. Christopher Clark, Cornell University
You can use MATLAB to read data from large collections of files, databases, data platforms, and cloud storage systems. Datastores in MATLAB let you access data that does not fit into the memory of a single computer or is distributed across multiple files. These datastores support various file formats (CSV, Parquet, MDF, etc.) and storage systems (AWS S3, Azure Blob, HDFS, databases, data platforms). You can also create your own datastores for custom file formats.
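As a minimal sketch of the datastore idea, the script below writes two small CSV files to a temporary folder and then reads them back one chunk at a time with `tabularTextDatastore`. The folder name, file names, and the `id` and `value` columns are made up for illustration; in practice the location would point to your own files or cloud storage.

```matlab
% Create two small CSV files in a temporary folder (illustrative data only).
folder = fullfile(tempdir, "datastore_demo");
if ~isfolder(folder)
    mkdir(folder);
end
t1 = table((1:3)', [10; 20; 30], 'VariableNames', {'id', 'value'});
t2 = table((4:6)', [40; 50; 60], 'VariableNames', {'id', 'value'});
writetable(t1, fullfile(folder, "part1.csv"));
writetable(t2, fullfile(folder, "part2.csv"));

% Point a single datastore at every CSV file in the folder.
ds = tabularTextDatastore(fullfile(folder, "*.csv"));

% Read the collection chunk by chunk and keep only a running sum,
% so the full data set never has to be in memory at once.
total = 0;
while hasdata(ds)
    chunk = read(ds);                  % a table holding part of the data
    total = total + sum(chunk.value);
end
disp(total)
```

The same loop works unchanged whether the datastore covers two files or thousands, because only one chunk is in memory at any time.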
With MATLAB, you can perform data analysis and data engineering on big data efficiently. MATLAB supports predicate pushdown for Parquet files, so you can filter big data at the source. Once the data is read, you can transform and combine data from different datastores for preprocessing and data engineering.
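The following sketch shows one way filtering at the source might look, assuming a recent MATLAB release in which `parquetDatastore` exposes a `RowFilter` property and `rowfilter` is available. The file name, the `id` and `value` columns, the threshold, and the `scaled` variable are all hypothetical.

```matlab
% Write a small Parquet file to a temporary location (illustrative data only).
file = fullfile(tempdir, "pushdown_demo.parquet");
T = table((1:6)', [10; 20; 30; 40; 50; 60], 'VariableNames', {'id', 'value'});
parquetwrite(file, T);

% Build a row filter from the variable names and attach it to the datastore,
% so that only rows with value > 25 are imported from the file.
pds = parquetDatastore(file);
rf = rowfilter(["id" "value"]);
pds.RowFilter = rf.value > 25;

% Add a preprocessing step that is applied to each chunk as it is read.
% The derived variable "scaled" is a made-up example.
tds = transform(pds, @(t) addvars(t, t.value / 100, 'NewVariableNames', 'scaled'));

% Read the filtered, transformed data.
filtered = readall(tds);
disp(filtered)
```

Setting the filter on the datastore rather than on the table after reading is the design choice that matters here: rows that fail the condition do not need to be loaded in the first place.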
MATLAB tall arrays use a lazy evaluation framework, which lets you run code written for in-memory tables and timetables on big data without rewriting it. Tall arrays support hundreds of data manipulation, mathematical, statistical, and machine learning functions, which you can use for simple statistical analysis or for developing predictive models on big data.
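To illustrate lazy evaluation, here is a minimal sketch that wraps a datastore in a tall table, queues a calculation, and triggers it with `gather`. The sample files and the `id` and `value` columns are invented for the example, and `mapreducer(0)` is used only to keep the run in the local MATLAB session.

```matlab
% Create sample CSV files in a temporary folder (illustrative data only).
folder = fullfile(tempdir, "tall_demo");
if ~isfolder(folder)
    mkdir(folder);
end
t1 = table((1:3)', [10; 20; 30], 'VariableNames', {'id', 'value'});
t2 = table((4:6)', [40; 50; 60], 'VariableNames', {'id', 'value'});
writetable(t1, fullfile(folder, "part1.csv"));
writetable(t2, fullfile(folder, "part2.csv"));

% Run tall array calculations in the local session, not on a parallel pool.
mapreducer(0);

% Wrap the datastore in a tall table. No data is read at this point.
ds = tabularTextDatastore(fullfile(folder, "*.csv"));
tt = tall(ds);

% This looks like ordinary table code, but the result is still unevaluated.
avgDeferred = mean(tt.value);

% gather triggers the actual pass over the data and returns an in-memory value.
avg = gather(avgDeferred);
disp(avg)
```

Queuing several operations before a single `gather` call lets MATLAB combine them into fewer passes over the data, which is usually the main cost with large files.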