
I am working on a simple client-server program in C in which multiple clients will be connected to a single server.

Clients will submit operations/actions to the server and the server will process these requests. These operations may be expensive and/or long-running, so ideally I would like to have a thread pool on the server that can process requests concurrently rather than blocking the main thread.

In addition, I thought that using poll (I can't use epoll since I need to stay POSIX compliant) might be better performance-wise than creating a new thread per socket connection (this stems from the C10K problem: https://en.wikipedia.org/wiki/C10k_problem).

So in theory the server might look like the following pseudocode:

int main()
{
  // Pretend these are initialized in some manner
  ThreadPool thread_pool;
  Socket server_listening_socket;
  PolledFileDescriptors list_of_polled_fds;

  // The first pollfd will be the listening socket which looks for read events on it
  list_of_polled_fds[0].fd = server_listening_socket;
  list_of_polled_fds[0].events = POLLIN;

  while (true)
  {
    // Call poll on our list of file descriptors with unlimited timeout (-1)
    poll(list_of_polled_fds, number_of_fds, -1);
    for (int i = 0; i < number_of_fds; i++)
    {
      // We received a read event on this file descriptor
      if (list_of_polled_fds[i].revents & POLLIN)
      {
 
        // The listening socket has an event (meaning a new connection was created) 
        if (i == 0)
        {
          Socket client_socket = accept(server_listening_socket, ...);
          AddClientConnectionToListOfPollFds(&list_of_polled_fds, client_socket);
        }

        // A connected client has an event (data was sent over the socket)
        else
        {  
          ThreadPoolTask task = {
            .argument = list_of_polled_fds[i].fd, // the connected client's file descriptor
            .function = SomeFunctionToReadDataFromSocketAndProcessIt
          };
          AddTaskToThreadPool(&thread_pool, &task);
        }
      }
    }
  }

  return 0;
}
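
For reference, here is a more concrete sketch of just the poll/accept part I have in mind, using the real struct pollfd API with a fixed-size array. run_server, MAX_FDS, and the commented-out thread-pool hand-off are placeholders, and error recovery is mostly omitted:

#include <poll.h>
#include <stdio.h>
#include <sys/socket.h>

#define MAX_FDS 1024

int run_server(int listen_fd)
{
  struct pollfd fds[MAX_FDS];
  nfds_t nfds = 1;

  // Slot 0 watches the listening socket for incoming connections
  fds[0].fd = listen_fd;
  fds[0].events = POLLIN;

  for (;;)
  {
    // Block until at least one descriptor is readable
    if (poll(fds, nfds, -1) < 0)
    {
      perror("poll");
      return -1;
    }

    for (nfds_t i = 0; i < nfds; i++)
    {
      if (!(fds[i].revents & POLLIN))
        continue;

      if (i == 0)
      {
        // New connection on the listening socket
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd >= 0 && nfds < MAX_FDS)
        {
          fds[nfds].fd = client_fd;
          fds[nfds].events = POLLIN;
          nfds++;
        }
      }
      else
      {
        // Data available on a connected client socket:
        // this is where the fd would be handed to the thread pool
        // AddTaskToThreadPool(&thread_pool, ...);
      }
    }
  }
}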

Now, with this high-level design, I have a few concerns.

Single Message Causes Multiple Events

  • Suppose the client tries to send the server a message of 10 bytes, but for some reason the bytes get split into 2 TCP packets.
  • The first packet will come in on the client socket and this will cause poll to detect an event.
  • It will then place this socket into a task on the thread pool which will read and process the data.
  • The second packet then comes in and causes poll to do the same thing.
  • Now I have 2 tasks in my thread pool that correspond to the same socket and for what should be the same "message".

How should I manage this? Should I just keep track of which sockets are currently being worked on in the thread pool and not add a socket again if a task for it already exists?

If I prevent the same socket from being added to the thread pool twice, then if a single client sends 2 independent requests, I will not be able to process them in parallel; I will have to wait for the first message to finish before processing the next one.

What is a good mechanism for detecting whether multiple poll events belong to a single client message, so that I can avoid adding redundant tasks to my thread pool while still processing multiple requests from the same client simultaneously?

1 Answer


Here is the crux of your problem:

  • App-level record boundaries matter to your app.
  • TCP is a byte stream: segment boundaries and PSH (push) flags won't necessarily correspond to record boundaries.

As with everything else in computer science, solve it with another level of indirection.

w.l.o.g. I will assume that app-level records have the format: (length, message).
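
For concreteness, writing one such record might look like the following minimal sketch, assuming a 4-byte length prefix in network byte order (write_full and send_record are just illustrative names):

#include <arpa/inet.h>  /* htonl */
#include <stdint.h>
#include <unistd.h>

/* Write exactly n bytes, looping over short writes. Returns 0 on success. */
static int write_full(int fd, const void *buf, size_t n)
{
  const char *p = buf;
  while (n > 0)
  {
    ssize_t w = write(fd, p, n);
    if (w <= 0)
      return -1;
    p += w;
    n -= (size_t)w;
  }
  return 0;
}

/* Send one (length, message) record: a 4-byte length in network byte
   order followed by the payload bytes. */
static int send_record(int fd, const void *msg, uint32_t len)
{
  uint32_t net_len = htonl(len);

  if (write_full(fd, &net_len, sizeof net_len) != 0)
    return -1;
  return write_full(fd, msg, len);
}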

Use Kafka or some lighter-weight IPC mechanism to create an app-specific message queue. Spawn a few threads that wait for TCP connections to become readable. Once that happens, the worker

  • acquires a per-socket mutex (or marks the connection "busy"),
  • reads length N,
  • reads N bytes of message (which can possibly be blocking reads),
  • atomically submits the N-byte message to the queue,
  • releases mutex, and
  • returns to the idle pool.

Hardly any processing happens here. These are strictly I/O tasks, whose responsibility is to enforce record boundaries.

Then a larger pool of threads pulls from the queue to do the "real" work.
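
Here is a minimal sketch of that read path, assuming the 4-byte network-order length prefix from above. read_full, read_one_record, and enqueue_message are illustrative names, and the per-socket mutex (or "busy" flag) and connection-teardown handling are left out:

#include <arpa/inet.h>  /* ntohl */
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Read exactly n bytes, looping over short reads. Returns 0 on success,
   -1 on error or if the peer closed the connection. */
static int read_full(int fd, void *buf, size_t n)
{
  char *p = buf;
  while (n > 0)
  {
    ssize_t r = read(fd, p, n);
    if (r <= 0)
      return -1;
    p += r;
    n -= (size_t)r;
  }
  return 0;
}

/* I/O worker body for one readable connection: pull exactly one
   (length, message) record off the socket and hand it to the work queue. */
static int read_one_record(int fd)
{
  uint32_t net_len;
  if (read_full(fd, &net_len, sizeof net_len) != 0)
    return -1;

  uint32_t len = ntohl(net_len);
  char *msg = malloc(len ? len : 1);  /* len may be 0; avoid malloc(0) quirks */
  if (msg == NULL || read_full(fd, msg, len) != 0)
  {
    free(msg);
    return -1;
  }

  /* enqueue_message(queue, msg, len);  the queue now owns msg */
  return 0;
}

Note that read_one_record does no application-level work at all; its only job is to restore record boundaries before anything reaches the work queue.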
