Silent Try_catch in high load socket server or not?

Posted on

Problem

I’m working on a socket application that will listen to incoming messages from about 400 devices. They all send a message at least once per minute. Now I have a working program and it has been running for a few days without problems as a test with about 10 devices, but I keep thinking I missed something and it could fail at one point for some reason.

Now below is a stripped version of the code. It listens for clients and saves the raw data it receives to a database. It is important that it runs all the time.

Overall question: Excluding external factors like loss of internet connection/power, could this program run indefinitely in the current version?

My main concerns:

  • Missing an occasional message is acceptable. Is it wise to have a try..catch around most of the code that just logs the error and continue? I know I should handle errors case by case, but since I have no idea what to expect, how can I handle it?
  • Once started, could the _tcpListener stop listening for some reason or crash the while loop?
  • I have the _db on the main thread and handle all updates there. Is this better then a new db connection for each thread? (e.g.: using (var db = new entity) { updatedb })
  • If we would upscale to more devices (e.g.: 1000 or even 5000), what would eventually be the downfall?

The code shows I just save the raw data, but there are actually about 7 types of messages. The form contains 7 listboxes, one for each type. I want to show the last 1000 messages for each type. But I would also like to be able to filter the last 1000 records based on an input field. This part is handled in the Updatelist method.

Although this works, it just feels wrong. I have to update the entire list every time a message is received. Is there a better way of doing this?

namespace MyCommunication
{
  public partial class Form1 : Form
  {
    private const int MaxLengthList = 1000;
    private readonly List<string> _listRaw = new List<string>();

    private MyEntities _db;
    private TcpListener _tcpListener;
    private Thread _listenThread;

    public Form1()
    {
      InitializeComponent();

      if (InitServer())
          StartServer();
    }

    delegate void UpdatelistDelegate(ListBox box, List<string> list, string message);
    private void Updatelist(ListBox box, List<string> list, string message)
    {
      if (InvokeRequired)
      {
        Invoke(new UpdatelistDelegate(Updatelist), new object[] { box, list, message });
        return;
      }

      if (!string.IsNullOrEmpty(message))
      {
        list.Add(message);
        while (list.Count() > MaxLengthList)
          list.RemoveAt(0);
      }

      //filter input field.
      if (tbFilter.Text.Length > 0)
      {
        var filter = new List<String>();
        filter.AddRange(list.Where(e => e.Contains(tbFilter.Text)));
        box.DataSource = null;
        box.DataSource = filter;
      }
      else
      {
        box.DataSource = null;
        box.DataSource = list;
      }
    }

    delegate void UpdateRawDataDelegate(RawData data);
    private void UpdateRawData(RawData data)
    {
      if (data == null) return;
      if (InvokeRequired) { Invoke(new UpdateRawDataDelegate(UpdateRawData), new object[] { data }); return; }

      try
      {
        _db.RawData.Add(data);
        _db.SaveChanges();

       //example of update listbox
       Updatelist(lbRawData, _listRaw, "Added rawdata");
      }
      catch (Exception ex)
      {
        //log error
        UpdateError(ex);
      }

    }

    private Boolean InitServer()
    {
      try
      {
        _db = new MyEntities();
        _tcpListener = new TcpListener(IPAddress.Any, 5000);
        _listenThread = new Thread(ListenForClients);
      }
      catch (Exception ex)
      {
        //log error
        UpdateError(ex);
        return false;
      }

      return true;
    }

    private void StartServer()
    {
      _listenThread.Start();      
    }

    private void ListenForClients()
    {
      try
      {
        _tcpListener.Start();

        while (true)
        {
          var client = _tcpListener.AcceptTcpClient();
          var clientThread = new Thread(HandleClientComm);
          clientThread.Start(client);
        }
      }
      catch (Exception)
      {
        //log error
        UpdateError(ex);
      }
    }

    private void HandleClientComm(object client)
    {     
      var tcpClient = (TcpClient)client;
      var clientStream = tcpClient.GetStream();

      var received = new byte[966]; //966 max length of message
      int bytesRead = 0;

      try
      {
        bytesRead = clientStream.Read(received, 0, 966);
      }
      catch (Exception ex)
      {
        UpdateError(ex);
      }

      if (bytesRead <= 16)
      {
        tcpClient.Close();
        return; //nothing to see here (needs atleast 16 bytes for header and footer)
      }

      var data = new RawData
      {
        DateFormat = DateTime.Now,
        Data = received
      };

      UpdateRawData(data);

      try
      {
        //do whatever needs to be done with the received message
      }
      catch (Exception ex)
      {
        //log error
        UpdateError(ex);
      }

    }

  }
}

Solution

Each time your server receives a message, it creates a new thread to handle the message and then ends the thread after handling one message. This is not scalable, because creating and ending threads is so expensive.

IMHO, a better approach would be to have a thread that listens for messages and places them on a queue. Create and maintain a pool of threads which pops messages off the queue and deals with them.

The total number of threads should be equal to the number of CPU cores, so that the OS does not waste time context switching between threads.

You will need to monitor the queue length. If it start to grow, then your server is being overwhelmed and you will need to execute some way of reducing the workload.

Remember that the pool of worker threads needs to access the queue so that only one thread can access the queue at a time. Some frameworks provide a thread safe queue ( very cool! ) but if you do not have one of these you will need to protect the queue with a mutex or similar – don’t try to create your own thread safe queue ( that is a huge challenge )


A more detailed problem that I believe exists in your code: message framing. As your server becaomes busy, messages will begin to arrive together and your read will start reading more than one message at a time. You can also expect that you will get fractions of the next message being read along with the previous message. I do not see any code to deal with this. Your message protocol needs to have an unambiguous marker of where messages start and end, and you need code to repair framing errors.

In my experience, even if you have just one client and one server, if you run the connection anywhere near its bandwidth, you will start to get framing errors. In this case, with multiple devices sending asynchronous messages, you will get framing errors from time to time with even moderate loads. It is best to be always ready to handle them, because they never show up in trials, but then when your code gets out into the real world …

Visualize this:

  1. A message arrives from one of your devices

  2. You code asks for a new thread to be created – with the result a lot of thrashing around, memory allocation and context switching begins.

  3. Meanwhile another message arrives

  4. Finally, the new thread is ready to read … and gets more than the single message expected.

I got a few comments on your code. First of all, try to separate your UI code from your backend code.

You should be using the using statement here

private void HandleClientComm(object client)
{     
  var tcpClient = (TcpClient)client;
  using(var clientStream = tcpClient.GetStream()){
   // code goes here
 }
}

Always try to put your constants in constant variables

var received = new byte[966]; // dont do that

use constants instead

private const int messageLength  = 966;

and use it afterwards

var received = new byte[messageLength];

There is some redundant code

if (tbFilter.Text.Length > 0)
  {
    var filter = new List<String>();
    filter.AddRange(list.Where(e => e.Contains(tbFilter.Text)));
    box.DataSource = null;
    box.DataSource = filter;
  }
  else
  {
    box.DataSource = null; // no need for that 
    box.DataSource = list;
  }

can be reworked for this

  if(tbFilter.Text.Length > 0)
  {
    list  = list.Where(e => e.Contains(tbFilter.Text));
    box.DataSource = null;
  }
  box.DataSource = null; // I doubt thats needed
  box.DataSource = list;

Consider using a higher abstraction for dealing with concurrency . Task for instance. Threads are very expensive and low level. With Tasks, you can easily report the progress. [Have a look at async programming in .Net 1. You have to be a bit careful when writing concurrent programs.

Coupling of UI and business logic:

All of the socket handling is done in private methods of your form class. This makes it very difficult to test without actually starting up the application. Instead, if you move the business logic of processing a message into a different class, you can write automated tests that verify the code does what you want for good and bad messages.


Magic numbers:

You define MaxLengthList as a constant, which is the correct thing to do. However, you don’t do this with any of the other values (socket timeout, required bytes, …).


Exception handling:

If the socket read fails, it is very unlikely that you will be able to continue any further. However, your catch block does not stop the code from executing further.

ListenForClients references ex but does not define the variable.

Even for applications that are meant to be fault tolerant, I don’t like many instances of catching Exception. This will catch all exceptions including NullReferenceException and OutOfMemoryException. When you make a call to a specific method, you should know the “normal” exceptions it might throw and you should handle those directly. An IOException when reading from a socket and a NullReferenceException when calling Read() are two very different things and should be treated differently.


Consistent if statements:

Your code has all of the following if formats.

  • One line, no braces for true clause
  • One line, braces for true clause (technically two statements with second being return)
  • Multi line, braces for true clause, two statements on different lines with second being return
  • Multi line, braces for true clause, many statements on different lines
  • Two line, no braces for single line true (style also used for while loops)

Pick one style and be consistent. I prefer to always be multi-line and always include braces. If you need to add a line to a block, the braces are already there and you won’t have to worry about the second line executing unconditionally.


There is no way to stop you while(true) loop.

A minor note about TCP. You are not guaranteed to get the whole message at once. You may receive just part of the message — and then it may take an arbitrary amount of time for you to get the next part (which may or may not complete the message). This is rare but happens consistently as network load increases (when the network detects congestion, automatic throttling occurs and the client doesn’t send all the data in one packet). It’s not as much of an issue in your case since you’ve said you can tolerate missing some messages (and hopefully the client can tolerate your slamming the connection shut).

Leave a Reply

Your email address will not be published. Required fields are marked *