Microsoft Word - HW4-Memory Map and RadixSort Float Point Numbers.docx
Radix Sort Float Point Numbers with Memory Map
Implement a C program that sorts a set of 4-byte float point values in ascending order using radix
sort. The values are saved in a file. The program should read/write the file through memory
mapping. When the program finishes, the sorted values should be saved in the same file.
Submission Instructions
1. Submit: 1) your C program, and 2) one or two screenshots in jpg format showing that your program
eally works.
2. Name your scripts using the pattern NJITID#.c, and name your screenshots using the pattern
NJITID#_index.jpg. NJITID# is the eight-digit NJIT ID (Not your UCID, Rutgers students also have
NJIT IDs). Index is the index number showing the order of the screenshots (e.g., 1 or 2).
3. Submit individual files. DO NOT SUBMIT A ZIP FILE.
Objectives
• To gain more experience with using bitwise operations
• To understand the binary format of float point numbers
• To gain more understanding on radix-sort
• To learn how to read/write files using memory mapping
Requirements and Instructions
• Your program should take one argument, which is the pathname of the file containing the data
to be sorted. For example, to sort the float point values saved in ./file5k, you can use the
following command:
.
adixsort ./file5k
• The number of float-point values saved in the file can be calculated using file size and the size
of each float point value (i.e., 4 bytes). Thus, there is no need to specify the number of values.
• To access the data in the file, your program needs to use memory mapping. Avoid using
conventional calls, e.g., read(), fread(), write(), or fwrite(), to read/write the file. Refer to the
slides on Linux File and Directory Operations, particularly the two examples in the Memory-
mapped files part, for how to read/write files using memory mapping.
• Your program must use radix-sort and work directly on the binary data (i.e., avoid converting
the data into strings or characters). To use binary operations to extract bits from a float-point
number, you need to use a union to include two types, float and int. For example, the following
code is to extract the least significant bit from float point number 0.1.
unsigned int b;
union ufi {
float f;
int i;
} u;
u.f = 0.1;
= u.i & 0x1;
• Since the program sorts the numbers using binary format, your program only needs two
uckets to help with sorting. The memory space that maps the file can be used to merge the
data. This reduces memory space consumption. At the same time, when the last round of radix-
sort, the file automatically saves the sorted values.
• Some of the float point values are negative. Special attention is needed to handle the problem
caused by sign bits. Refer to the slides on Radix Sort for the methods handling signed float
point numbers.
• You can compile gendata.c attached with this assignment and use it to generate random
values and save them into a file. The executable file can also be found in
in in the virtual
machine. The program also reports the sum of the values. For example, to generate 5000
andom values and save them into ./file5kvalues, you can use the following command
./gendata 5000 ./file5kvalues
• You can compile checkdata.c attached with this assignment and use it to check whether
the float point values have been sorted in ascending order. The executable file can also be
found in
in in the virtual machine. The tool also calculates a sum of the values in the file.
Thus, you can compare the sum with the sum reported by gendata. The two sums should be
very similar with minor numerical e
or caused by limited precisions.
./checkdata ./file5kvalues
• Optimize your implementation. For example, to copy a large number of numbers, you can use
memcpy instead of copying the numbers one by one.
Testing
Test 1. The program can co
ectly sort 1 million float point values in a file within 1 minute, and
the file can pass the test with the checkdata program (i.e., sorted; the sum of sorted values is close
to the sum of unsorted values given by gendata when the file was created (difference <5%))
Step 1: Generate a file containing 1 million float point values using gendata, and write
down the sum of these values reported by gendata:
gendata XXXXXXXXXX ./file1mvalues
Step 2: run the program to sort the values:
time ./major_hw3_1 ./file1mvalues
Step 3: check whether the values have been sorted using checkdata, and write down the
sum reported by checkdata:
checkdata ./ file1mvalues
Step 4: compare the sum reported by gendata and the sum reported by checkdata.
Test 2. The program can co
ectly sort 100 million float point values in a file within 2 minute, and
the file can pass the test with the checkdata program (i.e., sorted; the sum of sorted values is close
to the sum of unsorted values given by gendata when the file was created (difference <5%))
Step 1: Generate a file containing 100 million float point values using gendata, and write
down the sum of these values reported by gendata:
gendata XXXXXXXXXX ./file100mvalues
Step 2: run the program to sort the values:
time ./major_hw3_1 ./file100mvalues
Step 3: check whether the values have been sorted using checkdata, and write down the
sum reported by checkdata:
checkdata ./ file100mvalues
Step 4: compare the sum report by gendata and the sum reported by checkdata.
CS 288 Intensive Programming in Linux
Memory-mapped files
Alternative: use mmap()
Associate virtual addresses with the data
Increasingly used
Four steps
open the file
map the file
use the file (read/write the co
esponding memory region)
unmap the file and close the file
3
Mmap Diagram
Code
Code
Data
Heap
Stack
Stack
File A
Process
File B
4
Map a file using mmap
#include
void * mmap(void *addr, size_t len, int prot,
XXXXXXXXXXint flags,int fildes, off_t off);
Addr: address where we want to map it (use 0 if you don’t know)
Len: how much we want mapped
Prot: PROT_READ for read permission, PROT_WRITE for write permission, PROT_EXEC for execute permission.
Flags: use MAP_SHARED for files if you want to save changes to files.
Fildes: file descripter of the file being mapped
Off: start offset in file
#include #include #include #include #include #include #include #define FILEPATH "/tmp/mmapped.bin"
#define NUMINTS (1000)
#define FILESIZE (NUMINTS * sizeof(int))
int main(int argc, char *argv[])
{
int i;
int fd;
int *map; /* mmapped a
ay of int's *
fd = open(FILEPATH, O_RDONLY);
if (fd == -1) {
pe
or("E
or opening file for reading");
exit(EXIT_FAILURE);
}
map = mmap(0, FILESIZE, PROT_READ, MAP_SHARED, fd, 0);
if (map == MAP_FAILED) {
close(fd);
pe
or("E
or mmapping the file");
exit(EXIT_FAILURE);
}
/* Read the file int-by-int from the mmap
*
for (i = 0; i printf("%d: %d\n", i, map[i]);
}
if (munmap(map, FILESIZE) == -1) {
pe
or("E
or un-mmapping the file");
}
close(fd);
return 0;
}
#include #include #include #include #include #include #include #define FILEPATH "/tmp/mmapped.bin"
#define NUMINTS (1000)
#define FILESIZE (NUMINTS * sizeof(int))
int main(int argc, char *argv[])
{
int i; int fd; int result; int *map;
/* Open a file for writing.*
fd = open(FILEPATH, O_RDWR | O_CREAT | O_TRUNC, (mode_t)0600);
if (fd == -1) {
pe
or("E
or opening file for writing");
exit(EXIT_FAILURE);
}
/*Stretch the file size to the size of the (mmapped) a
ay of ints *
result = lseek(fd, FILESIZE-1, SEEK_SET);
if (result == -1) {
close(fd);
pe
or("E
or calling lseek() to 'stretch' the file");
exit(EXIT_FAILURE);
}
result = write(fd, "", 1);
if (result != 1) {
close(fd);
pe
or("E
or writing last byte of the file");
exit(EXIT_FAILURE);
}
/* Now the file is ready to be mmapped.*
map = mmap(0, FILESIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if (map == MAP_FAILED) {
close(fd);
pe
or("E
or mmapping the file");
exit(EXIT_FAILURE);
}
/* Now write int's to the file as if it were memory (an a
ay of ints).
*
for (i = 0; i map[i] = 2 * i;
}
/* Don't forget to free the mmapped memory
*
if (munmap(map, FILESIZE) == -1) {
pe
or("E
or un-mmapping the file");
/* Decide here whether to close(fd) and exit() or not. Depends... *
}
/* Un-mmaping doesn't close the file, so we still need to do that.
*
close(fd);
return 0;
}
CS288 Intensive Programming in Linux
Sorting
A fundamental application for computers
Done to make finding data (searching) faste
Many different algorithms for sorting
Simple sorting algorithms run in quadratic time O(N2)
u
le sort
selection sort
insertion sort
Conventional sorting algorithms: https:
www.toptal.com/developers/sorting-algorithms
Though the order is “ascending” by default, it is not difficult to figure out how to change the order to