...making Linux just a little more fun!
By Hyouck "Hawk" Kim
When using a message queue together with sockets or other file-descriptor-based Unix facilities, the biggest inconvenience is that message queues do not support the select() system call. So Unix programmers usually solve the I/O multiplexing issue in a simple but ugly way, like this:
select on socket with timeout;
wait on a message queue with IPC_NOWAIT
Certainly, the above implementation is ugly. I don't like it. Another solution might be to adopt multi-threading. But in this article I want to show you a fun approach: implementing a new system call called msgqToFd(). I'm not trying to provide a full-fledged, bug-free kernel implementation. I just want to present my experiment. This article might be interesting to readers who like to play with the GNU/Linux kernel source.
Here is its signature.
int msgqToFd(int msgq_id)
It returns a file descriptor corresponding to a message queue, which can be used with select().
If any error happens, it returns -1.
An application can use the call like
q_fd = msgqToFd(msgq_id);
select(q_fd + 1, &rset, NULL, NULL, NULL);
r = msgrcv(msgq_id, &msg, sizeof(msg.buffer), 0, 0);
A file descriptor is associated with a file structure. In the file structure, there is a set of operations supported by this file type, called file_operations. In the file_operations structure, there is an entry named poll. What the generic select() call does is call this poll() function to get the status of a file (or socket or whatever), as the name suggests.
In general, select() works like this:
loop
    for each file descriptor in the set
        call the file's poll() to get a mask
        if (mask & can_read or mask & can_write or mask & exception)
            set the bit for this fd to record that the file is readable/writable or there is an exception
            retval++
    if (retval != 0)
        break
    sleep until woken up or the timeout expires
For the detailed implementation of select(), please take a look at sys_select() and do_select() in fs/select.c of the standard kernel source code.
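To make the loop above more concrete, here is a toy user-space model of a single scan over the descriptor set. Everything in it (struct toy_file, toy_poll(), toy_select_scan() and the CAN_READ/CAN_WRITE/EXCEPTION bits) is invented for illustration. The real do_select() works on struct file, uses the POLL* masks, and sleeps between scans.

```c
#include <stddef.h>

/* Illustrative readiness bits; the kernel uses POLLIN, POLLOUT, POLLPRI, etc. */
enum { CAN_READ = 1, CAN_WRITE = 2, EXCEPTION = 4 };

/* Toy stand-in for struct file: just a poll callback and some state. */
struct toy_file {
    unsigned (*poll)(const struct toy_file *f);
    int pending;   /* number of items waiting to be read */
};

/* Example poll callback: readable when data is pending, writable otherwise. */
unsigned toy_poll(const struct toy_file *f)
{
    return f->pending > 0 ? CAN_READ : CAN_WRITE;
}

/*
 * One scan of the descriptor set.
 * wanted[i] holds the events the caller asked about for files[i];
 * ready[i] receives the subset of those events that is currently true.
 * Returns the number of files with at least one ready event.
 */
int toy_select_scan(const struct toy_file *files, const unsigned *wanted,
                    unsigned *ready, size_t n)
{
    int retval = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned mask = files[i].poll(&files[i]);

        ready[i] = mask & wanted[i];
        if (ready[i])
            retval++;
    }
    return retval;
}
```

When the scan returns 0, a real select() would put the process to sleep and scan again after a wake-up or a timeout. That is the step for which the wait queues described next are needed.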
Another thing we need to understand is poll_wait(). It puts the current process on a wait queue provided by a kernel facility such as a file, pipe, socket or, in our case, a message queue.
Please note that the current process may wait on several wait queues at once by calling select().
The system call should return a file descriptor corresponding to a message queue. The file descriptor should point to a file structure which contains file_operations for message queue.
To do that, sys_msgqToFd() does
with msqid, locate the corresponding struct msg_queue
allocate a new inode by calling get_msgq_inode()
allocate a new file descriptor with get_unused_fd()
allocate a new file structure with get_empty_filp()
initialize inode, file structure
set file's file_operations with msgq_file_ops
set file's private_data with msq->q_perm.key
install fd and file structure with fd_install()
return the new fd
Please take a look at the source files provided with this article.
msgq_poll() implementation is pretty simple.
What it does is
With file->private_data, which is a key for a message queue, locate the corresponding message queue
put current process into the message queue's wait queue by calling poll_wait()
if the message queue is empty (msq->q_qnum == 0), set the mask as writable (this is debatable, but let's forget about it for now). If not, set the mask as readable
return the mask
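The last two steps can be modelled in user space as follows. This is only a sketch of the decision logic: msgq_poll_mask() is a made-up name, and the exact POLL* bits are an assumption, since the text only says "readable" and "writable". The real msgq_poll() also has to look up the queue and call poll_wait() first.

```c
#define _GNU_SOURCE
#include <poll.h>

/*
 * User-space model of the decision made at the end of msgq_poll().
 * q_qnum mirrors msq->q_qnum, the number of messages in the queue.
 */
unsigned int msgq_poll_mask(unsigned long q_qnum)
{
    if (q_qnum == 0)
        return POLLOUT | POLLWRNORM;   /* empty queue: reported as writable */
    return POLLIN | POLLRDNORM;        /* at least one message: readable */
}
```

With this rule a queue is never reported as readable and writable at the same time, which keeps the logic simple at the cost of hiding free space in a partly filled queue.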
To support poll() on a message queue, we need to modify existing message queue source code.
The modification includes
Adding a wait queue head to struct msg_queue, on which processes calling select() are queued. The wait queue head must also be initialized when a message queue is created. Please take a look at struct msg_queue and newque() in msg.c.
Whenever a new message is inserted into a message queue, any process waiting on the message queue (by calling select()) should be woken up. Take a look at sys_msgsnd() in msg.c.
When a message queue is removed or its properties are changed, all the processes waiting on the message queue (by calling select()) should be woken up. Take a look at sys_msgctl() and freeque() in msg.c.
To allocate a new inode and file structure, we need to set up some file system related structures for VFS to operate properly. For this purpose, we need additional initialization code to register a new file system and set a few things up. Take a look at msg_init() in msg.c.
All the changes are wrapped in #ifdef MSGQ_POLL_SUPPORT, so they should be easy to identify.
To allocate a file structure, we need to set up the file's f_vfsmnt and f_dentry properly. Otherwise you'll see some oops messages printed out on your console. For VFS to work correctly with this new file structure, we need some additional setup, which was already explained briefly above.
Since we support only poll() for the file_operations, we don't have to care about every detail of the file system setup code. All we need is a properly set up f_dentry and f_vfsmnt. Most of the related code is copied from pipe.c.
To add a new system call, two things need to be done.
The first step is to add the new system call at the kernel level, which we have already done.
In the GNU/Linux kernel, all System V IPC related calls are dispatched through sys_ipc() in arch/i386/kernel/sys_i386.c. sys_ipc() uses a call number to identify the specific system call requested. To dispatch the new system call properly, we have to define a new call number (which is 25) for sys_msgqToFd() and modify sys_ipc() to call sys_msgqToFd(). For reference, please take a look at arch/i386/kernel/entry.S in the standard kernel source and sys_ipc() in the sys_i386.c provided with this article.
The second step is to add a stub function for user-level applications. Actually, all the system call stub functions are provided by GLIBC, and to add a new system call you have to modify GLIBC, build your own copy and install it. Oh hell, NO THANKS!!! I don't want to do that and I don't want you to do that either. To solve the problem, I did some copy and paste from GLIBC. If you look at user/syscall_stuff.c provided with this article, there is a function named msgqToFd(), which is the stub for the msgqToFd() system call.
What it does is simply
return INLINE_SYSCALL(ipc, 5, 25, key, 0, 0, NULL);
Here is a brief description of the macro.
ipc : system call number for sys_ipc(). ipc is expanded as __NR_ipc, which is 117.
5 : number of arguments for this macro.
25 : call number for sys_msgqToFd()
key : an argument to sys_msgqToFd()
INLINE_SYSCALL sets up the arguments properly and invokes interrupt 0x80 to switch to kernel mode and invoke the system call.
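For readers who would rather not copy macros out of GLIBC, roughly the same stub can probably be written with the generic syscall() wrapper. This is a sketch under assumptions, not the code from user/syscall_stuff.c: the name msgq_to_fd() and the MSGQ_TO_FD_CALL macro are made up here, and SYS_ipc is only defined on architectures that multiplex System V IPC through ipc() (such as i386), so the sketch falls back to ENOSYS elsewhere.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Call number the article assigns to sys_msgqToFd() inside sys_ipc(). */
#define MSGQ_TO_FD_CALL 25L

/*
 * Hypothetical stub built on syscall(2) instead of INLINE_SYSCALL.
 * Returns the new file descriptor, or -1 with errno set on failure.
 * It can only succeed on a kernel carrying the article's patch.
 */
int msgq_to_fd(int msgq_id)
{
#ifdef SYS_ipc
    /* Same argument layout as INLINE_SYSCALL(ipc, 5, 25, key, 0, 0, NULL). */
    return (int)syscall(SYS_ipc, MSGQ_TO_FD_CALL, (long)msgq_id, 0L, 0L, NULL);
#else
    /* No ipc() multiplexer on this architecture. */
    (void)msgq_id;
    errno = ENOSYS;
    return -1;
#endif
}
```

On an unpatched kernel, call number 25 is unknown to sys_ipc(), so the stub is expected to return -1.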
I'm not so sure about the practical usability of this modification.
I just wanted to see whether this kind of modification was possible.
Besides that, I want to talk about a few issues that need to be addressed.
If two or more threads or processes are accessing a message queue, and one is waiting on the message queue with msgrcv() while another is waiting with select(), then the former process/thread will always receive the new message. Take a look at pipelined_send() in msg.c.
For the writability test, msgq_poll() sets the mask as writable only if the message queue is empty. Actually, we could set the mask as writable whenever the message queue is not full, and there would be no big difference. But I chose this implementation for simplicity.
Let's think about this scenario.
In this kind of case, what should we do? A correct solution would be to close the fd when the queue is removed. But this is impossible, since a message queue can be removed by any process that has the right to do so. This means the process removing the message queue may not hold a file descriptor associated with it, even if the queue has been mapped to a file descriptor by some other process.
Additionally, if the same queue (with the same key) is created again, the mapping will still be maintained.
Efficiency problem. All the processes waiting on the wait queue by calling select() will be woken up when there is a new message. Eventually only one process will receive the message and all the other processes will go back to sleep.
I used GNU/Linux kernel 2.4.20 on x86 for this experiment.
To build a new kernel with this modification, I suggest you copy
msg.c to ipc/msg.c
msg.h to include/linux/msg.h
sys_i386.c to arch/i386/kernel/sys_i386.c
and build and install it!!!!
Before running the test programs, please be sure to make key files: