By Sander Prenen
When you’re visiting a city for a day, you don’t want to spend half of your time stuck in traffic, and Antwerp is no different. Therefore, the city of Antwerp introduced the ‘Velo’ bike-sharing system in 2011: go to one of the 300 stations spread throughout the city, rent a bike and return it whenever and wherever you want. But what if you park your car just outside the city center, walk to the nearest bike station and find it empty? You would have to walk to the next station, which may well be empty too…
I want to ride ‘A’ bicycle
The website of the Velo project has a neat tool that shows the number of bikes at each station. You may think that this solves your problem, but you can usually only check the site once you have already parked your car. Furthermore, someone else may take the last velo between the moment you check the site and the moment you reach the station.
It would be much better to know in advance whether the station of your choice will be empty when you arrive. This is exactly the problem that was tackled in cooperation with Digipolis. During April and May 2020, Digipolis collected data from all 300 velo stations, and this data was used to build a model that predicts whether a station will be empty.
Maybe we should take a step back first, because there is no need for this model if the stations are never empty, right? The data we were provided shows that stations do in fact run empty. As you can see in the following graph, some stations are even empty almost 10 percent of the time.
In this graph, every dot represents a station. Keep in mind that a full station can also be a problem when you want to return your velo, but since this happens less frequently, we only focus on stations being empty.
It should also be noted that April and May were both months during which the lockdown was in force in Belgium. It is therefore not unrealistic to expect that under normal circumstances, the stations would be empty much more often.
An ML model a day keeps the doctor away
Let’s dive into the technicalities of creating this model. Since no station is empty more than 11 percent of the time, every station is non-empty at least 89 percent of the time. In other words, we are dealing with an imbalanced dataset: samples of an empty station are heavily outnumbered by samples of a non-empty one.
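To make the imbalance concrete, here is a minimal pandas sketch that turns raw station counts into a binary “empty” target and measures how rare that class is. The column names (`station`, `bikes`) and the tiny toy log are assumptions for illustration, not the actual Digipolis data format.

```python
import pandas as pd

# Hypothetical snapshot log: one row per (station, moment) with the number of
# bikes present. Column names and values are illustrative assumptions.
snapshots = pd.DataFrame({
    "station": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "bikes":   [0,   3,   5,   2,   4,   6,   1,   7],
})

# Binary classification target: 1 when the station is empty, 0 otherwise.
snapshots["empty"] = (snapshots["bikes"] == 0).astype(int)

# Share of snapshots in which each station was empty (one dot per station).
empty_share = snapshots.groupby("station")["empty"].mean()

# Overall share of the minority class ("empty") in the training data.
minority_ratio = snapshots["empty"].mean()

print(empty_share)
print(f"empty samples: {minority_ratio:.1%}")
```

With real data, a minority share around 10 percent or less is what makes a plain classifier tempted to always answer “not empty”.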
To deal with this problem, we use a combination of oversampling and undersampling called SMOTETomek. The name may sound like some crazy Russian superhero, but in reality it is a really simple trick to balance an imbalanced dataset.
First, the SMOTE algorithm creates new synthetic samples of the minority class (a station being empty) by interpolating between existing minority samples and their nearest minority-class neighbors. Then Tomek-link removal undersamples the data by deleting pairs of data points that belong to different classes yet are each other’s nearest neighbors. The data you end up with is balanced and can be used to train the model.
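The two steps above can be sketched from scratch in NumPy. This is a toy illustration on synthetic data, assuming class 1 means “empty”; the function names are hypothetical, and it builds full distance matrices, so it only suits small inputs. In practice, the imbalanced-learn library ships a ready-made version of this combination as `imblearn.combine.SMOTETomek`, used via its `fit_resample(X, y)` method.

```python
import numpy as np

# Hypothetical from-scratch SMOTE + Tomek links, for illustration only.

def smote(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic minority samples, each on the segment between a
    random minority sample and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(rng)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    k = min(k, len(X_min) - 1)
    neighbors = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    partner = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # random position along each segment
    return X_min[base] + gap * (X_min[partner] - X_min[base])


def tomek_keep_mask(X, y):
    """True for every sample that is NOT part of a Tomek link, i.e. a pair of
    opposite-class points that are each other's nearest neighbor."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)
    mutual = nn[nn] == np.arange(len(X))
    return ~(mutual & (y[nn] != y))


def smote_tomek(X, y, k=5, rng=None):
    """Oversample class 1 with SMOTE up to the size of class 0, then drop
    both members of every Tomek link."""
    n_new = max(int((y == 0).sum() - (y == 1).sum()), 0)
    X_syn = smote(X[y == 1], n_new, k=k, rng=rng)
    X_bal = np.vstack([X, X_syn])
    y_bal = np.concatenate([y, np.ones(n_new, dtype=y.dtype)])
    keep = tomek_keep_mask(X_bal, y_bal)
    return X_bal[keep], y_bal[keep]


# Usage on synthetic data: 90 "not empty" vs 10 "empty" samples.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (90, 2)), rng.normal(2, 1, (10, 2))])
y = np.array([0] * 90 + [1] * 10)
X_res, y_res = smote_tomek(X, y, rng=1)
print(np.bincount(y_res))  # equal counts: each link removes one sample per class
```

Because every Tomek link pairs one sample of each class, removing links keeps the oversampled data balanced while cleaning up the fuzzy border between the two classes.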
But if you’re travelling with a large group, knowing whether a station is empty is not always enough: you would like to know exactly how many velos are in the station.
Therefore, a second model was created in the same way, but using regression instead of classification. SMOTETomek has done its duty and can stay at home, because there are no classes to balance when predicting a number of bikes; instead, a simple random forest regression model is used.
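A minimal sketch of such a random forest regressor with scikit-learn follows. The article does not list its features or hyperparameters, so the features here (hour of day, weekday, current number of bikes), the synthetic target and `n_estimators=100` are all assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)

# Hypothetical features per snapshot: hour of day, weekday (0 = Monday) and
# the number of bikes currently in the station (capacity assumed to be 30).
n = 500
hour = rng.integers(0, 24, n)
weekday = rng.integers(0, 7, n)
bikes_now = rng.integers(0, 31, n)
X = np.column_stack([hour, weekday, bikes_now])

# Synthetic target: bikes one hour later, loosely tied to the current count.
bikes_later = np.clip(bikes_now + rng.integers(-3, 4, n), 0, 30)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, bikes_later)

# Predict for a station at 08:00 on a Monday that currently holds 2 bikes.
pred = model.predict([[8, 0, 2]])[0]
# A forest averages training targets, so the output is a float; round it
# for display as a bike count.
print(round(pred))
```

Since a random forest regressor predicts averages of observed targets, its output never leaves the range seen in training, which conveniently keeps predicted bike counts between zero and the station capacity.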
Show me the bicycles!
Enough technical stuff for now. As a tourist, you probably don’t care how the model that tells you whether a bike is available works, and you don’t want to run the calculations yourself either. Therefore, a map was created, just like the one on the Velo website, but with a few improvements.
For up to four hours in the future, you can see the chance that each station will be empty and how many bikes will be available. So before firing up the engine at home, check the map first to see where you can park your car with velos nearby.
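The article does not describe how the four-hour forecasts are produced. One common approach, shown here purely as an assumption, is to train one model per horizon, with targets built by shifting each station’s own time series h hours into the future. The sketch below assumes hourly snapshots and hypothetical column names; a classifier trained on `empty_in_{h}h` could then supply the map’s chance of an empty station through scikit-learn’s `predict_proba`.

```python
import pandas as pd

# Hypothetical hourly log for two stations (column names are assumptions).
log = pd.DataFrame({
    "station": ["A"] * 6 + ["B"] * 6,
    "hour":    list(range(6)) * 2,
    "bikes":   [3, 1, 0, 0, 2, 4,   5, 5, 4, 0, 1, 2],
})
log = log.sort_values(["station", "hour"])

# Regression target per horizon: the station's bike count h hours later.
# Shifting within each station keeps one station's future out of another's rows.
for h in range(1, 5):
    log[f"bikes_in_{h}h"] = log.groupby("station")["bikes"].shift(-h)

# The last hours of each station have no known future yet; drop them for training.
train = log.dropna().copy()

# Classification target per horizon: will the station be empty h hours later?
for h in range(1, 5):
    train[f"empty_in_{h}h"] = (train[f"bikes_in_{h}h"] == 0).astype(int)

print(train)
```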