For years, NS (Nationale Spoorwegen, the Dutch national railways) has used mathematical models to forecast the number of passengers per train. In general these models worked well, but deviations were sometimes difficult to explain, and it was not always apparent how a prediction had been arrived at. NS wanted to make the system gradually clearer and more consistent, and brought in one organisation to do this: CQM, which has put NS on the right track.

Forecasting the number of passengers per train is very important

Forecasting the number of passengers per train is very important to NS for keeping customer satisfaction high. The forecast determines the deployment of rolling stock, and thus transport costs and passenger comfort: enough passengers must be able to find a seat. Detailed passenger forecasts make it possible to balance rolling stock deployment against travel comfort, and thereby financial sustainability against customer satisfaction. But what if a forecast for an individual train suddenly deviates? For example, what if the model forecasts more travelers on one specific train in the normally quiet month of June than in the busy month of September? In such cases, NS wants to be able to explain the anomaly. That wasn’t possible with the previous system, so NS decided to simplify it. The name of the project: Borealis. The nerve-wracking challenge: to deliver the first forecasts using a new, transparent methodology within just a few weeks.
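
The June-versus-September example can be turned into a simple plausibility check. The sketch below is only an illustration and not the method used by NS or CQM: the record layout, the train identifiers, the passenger numbers and the function name `flag_seasonal_anomalies` are all assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical forecast records: (train_id, month, forecast_passengers).
# The values are invented for illustration and are not NS data.
forecasts = [
    ("train_A", "June", 410),
    ("train_A", "September", 380),
    ("train_B", "June", 250),
    ("train_B", "September", 320),
]


def flag_seasonal_anomalies(records, quiet_month="June", busy_month="September"):
    """Return the ids of trains whose quiet-month forecast exceeds the busy-month one."""
    # Group the forecasts per train so that the two months can be compared.
    by_train = defaultdict(dict)
    for train_id, month, passengers in records:
        by_train[train_id][month] = passengers

    flagged = []
    for train_id, months in sorted(by_train.items()):
        # Only compare trains that have a forecast for both months.
        if quiet_month in months and busy_month in months:
            if months[quiet_month] > months[busy_month]:
                flagged.append(train_id)
    return flagged


print(flag_seasonal_anomalies(forecasts))  # ['train_A']
```

A check like this only points to the forecasts that need explaining. Finding the cause still requires knowing which data and models produced them.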

Data science with big data issues

This is a typical data science project with Big Data issues. Every two months we deliver a release with forecasts for half a year, produced in several steps. One release consists of some 125,000 predicted train-route-day combinations. The input is much larger: nearly 200 million numbers from multiple data sources. An increasingly important source is the check-in and check-out data from passengers’ OV smart cards (OV stands for Openbaar Vervoer, public transport). Another is the annual count of passengers carried out on all trains. We also use data from the timetable. The first release wasn’t optimal, but it was adequate. The most important thing was that everyone was confident Borealis would prove to be the right approach, precisely because we now know exactly what data and models we’re using and can therefore understand the cause of any deviations.

New forecasting methodology a giant leap forward

Borealis project leader Raoul Klein Kranenbarg of NS sees the new forecasting methodology as a vital improvement for both travelers and NS: "First, because the data from the OV smart card can now, anonymously of course, be optimally utilized by the system. Since checking in and out with the smart card became standard, this has been our main data source. So it’s good that in recent years, through Borealis, we’ve prepared the model to make increasingly robust forecasts. In the model we now also use reports from travelers via the NS App and Customer Services. This helps us get a clearer picture of overcrowded trains. With Borealis we also have a clearer picture of empty trains. But the most important thing is that the whole system has over time become much clearer, and much more consistent." The intention is that NS will start to run Borealis entirely in-house this year, with guidance from CQM.

Would you like to predict capacity and costs?

Whether it is for small, medium or big data, CQM can help you! Contact Marnix Zoutenbier, who can tell you all about it.

