2

Recently I got interested in scheduling problems or rather dynamic scheduling problem. The problem is that I want to develop some kind of layer in my application which will be polling circa about 50-100 e-mail boxes on regular basis. I do not have any control on IMAP/POP3 servers, I cannot customize or modify them in any way so I would receive some kind of event about incoming mail - and that's the core of my problem. I have to query those e-mail boxes in order to find out whether there are some mails to be fetched and processed or not. The important thing is that those mailboxes differ in terms of ratio of number of incoming mails in the unit of time, so I would like to check them on different, optimized basis, e. g. the most stressed mailbox I would like to check more often and the one that is not so popular I would check rarely.

I did some high level research on my own and was thinking about using:

  • Priority queue
  • Genetic algorithm
  • Predicting incoming data based on historical data (some kind of regression?)

But I am stuck - I am not sure that I am choosing the best solution, because the genetic algorithm seems to be the most appealing for me, but I have some doubts. My genome would be a set of time intervals for jobs (check single email box) it could be initialized by random values at first and a fitness function could use statistics of the result of the previous iteration (number of actual mails in mailbox on the check, time consumed by processing those emails). But I wonder whether this approach is good enough, because it would "evolve" very slow.

I am designing relatively simple workflow:

  1. Check mailboxes on regular basis using dynamic scheduling
  2. Put event on a queue
  3. Workers receive events and process emails in given mailbox
  4. Historical entry about batch size, total time of processing it is inserted into DB.
  5. Scheduler adjust priorities/time intervals basing on the historical data.

Do you have any experience with this kind of problem? Should I treat it as CSP or Job Shop Scheduling problem? I would be grateful for any insight!

1 Answer 1

1

I think a simpler approach may be enough. First determine on a minimum and maximum time to wait between checking a mailbox, like 5 minutes and 1 hour.

Start with a a waiting period of 10 minutes for each mailbox. When new mail is found reduce the time by 50%, when no mail found increase the time by 10%. And Make sure the times stay between the allowed range.

This algorithm quickly responds to new data and will check those sources more frequently and slowly reduce the load when no data needs to be processed.

You can adjust the increase/decrease percentages to match the desired behavior.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.