Linux: How to Split Files

Splitting a large file into smaller parts is useful when storage is limited or when you need to transfer the file over the internet. On the command line, the split command does exactly this. Here is how to use it.

Let's say we have a large file named mysql21-20230314.sql.gz, and we want to split it into parts of 15MB each (strictly 15 MiB, since split treats the M suffix as 1024 x 1024 bytes). Here's the command you can use:

split -b 15M mysql21-20230314.sql.gz mysql21-20230314.sql.gz_

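The first argument is the input file and the second is the prefix for the output files. split writes parts of at most 15MB each and appends a two-letter suffix to the prefix, starting from aa, then ab, and so on. The last part holds whatever data remains, so it is usually smaller. The original file is left untouched. In our example, it produced two files: mysql21-20230314.sql.gz_aa and mysql21-20230314.sql.gz_ab.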

You can see the file sizes and timestamps using the ls command:

ls -lh mysql21-20230314.sql.gz_*

The output will look like this:

-rw-r--r-- 1 root root  15M Mar 21 14:28 mysql21-20230314.sql.gz_aa
-rw-r--r-- 1 root root 7.5M Mar 21 14:28 mysql21-20230314.sql.gz_ab

The first part, mysql21-20230314.sql.gz_aa, is a full 15MB, and the second part, mysql21-20230314.sql.gz_ab, holds the remaining 7.5MB, so the original file was about 22.5MB.
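
Split parts are only useful if they can be put back together, so here is a minimal sketch of the full round trip. It assumes GNU coreutils and uses a throwaway file, sample.bin, which is a hypothetical stand-in for the real dump. The parts are rejoined with cat and checked against the original with cmp.

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)

# Create a 2.5 MiB stand-in file (sample.bin is a made-up name).
head -c 2621440 /dev/urandom > "$workdir/sample.bin"

# Split it into parts of at most 1 MiB: sample.bin_aa, _ab, _ac.
split -b 1M "$workdir/sample.bin" "$workdir/sample.bin_"

# The shell expands the glob in sorted order (aa, ab, ac),
# so cat writes the parts back in their original sequence.
cat "$workdir"/sample.bin_* > "$workdir/restored.bin"

# cmp prints nothing and exits with status 0 when both files are byte-identical.
cmp "$workdir/sample.bin" "$workdir/restored.bin" && echo "parts rejoin cleanly"
```

The same pattern should apply to the example above: concatenating mysql21-20230314.sql.gz_aa and mysql21-20230314.sql.gz_ab in that order gives back the original archive.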

Another useful feature of the split command is that it can read from standard input. This means you can combine split with other commands through a pipe and split data on the fly, without first writing the whole stream to disk.

For example, if the data comes from another process or command, you can pipe it directly into split like this:

some_data_stream | split -b 100M - data_part_

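Here some_data_stream is a placeholder for whatever command produces the data. The single dash in place of the file name tells split to read from standard input. split cuts the incoming data into 100MB parts and names the output files with the prefix data_part_ (data_part_aa, data_part_ab, and so on) until the stream ends.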
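
To try the piped form without a real data source, the sketch below uses head reading from /dev/zero as a stand-in for some_data_stream. The directory variable streamdir and the sizes are assumptions chosen only to keep the example small.

```shell
# Temporary directory for the output parts.
streamdir=$(mktemp -d)

# Stand-in producer: write 2.5 MiB of zero bytes to stdout.
# The "-" argument makes split read from the pipe instead of a file.
head -c 2621440 /dev/zero | split -b 1M - "$streamdir/data_part_"

# List the parts: two full 1 MiB files and one 512 KiB remainder.
ls -l "$streamdir"

# Concatenating the parts restores the stream for any consumer;
# here wc simply counts the total bytes across all parts.
cat "$streamdir"/data_part_* | wc -c
```

Because split only ever holds one part's worth of output open at a time, this works for streams that are too large to store in a single file first.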

With the split command, large files become much easier to handle: you can store, transfer, and process the data in smaller parts whenever needed.

2022-12-13 22:50:21 | NOTE | 0 Comments

