CP-SAT is great at solving small and medium-sized problems. But what if you have a really big problem on your hands? One option might be to use a special kind of algorithm known as a "meta-heuristic", like a genetic algorithm. But these can be hard to set up and might not even give you good results.
Sometimes you will see new algorithms with cool-sounding names in scientific papers. While tempting, these are often just small twists on older methods and might leave out key details that make them work. If you are interested, there is a discussion of this issue in the paper "Metaheuristics – The Metaphor Exposed" by Sörensen.
The good news? You do not have to implement an algorithm that simulates the mating behavior of forest frogs to solve your problem. If you already know how to use CP-SAT, you can stick with it to solve big problems without adding unnecessary complications. Even better? This technique, called Large Neighborhood Search, frequently outperforms those more exotic metaheuristics in practice.
Many traditional methods generate several "neighbor" options around an existing solution and pick the best one. However, making each neighbor solution takes time, limiting how many you can examine.
Large Neighborhood Search (LNS) offers a more efficient approach. Instead of generating neighbors one by one, LNS formulates a "mini-problem" that modifies parts of the current solution. This often involves randomly selecting some variables, resetting them, and using CP-SAT (or a similar tool) to find the optimal new values within the context of the remaining solution. This method, known as "destroy and repair," allows for a broader exploration of neighbor solutions without constructing each one individually.
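To make this concrete, here is a minimal sketch of one destroy-and-repair round with CP-SAT. The helper `build_model`, assumed to return a fresh model together with its list of variables, and the list-based solution representation are illustrative assumptions, not a fixed API:

```python
import random

from ortools.sat.python import cp_model


def lns_round(build_model, solution, destroy_fraction=0.2, time_limit=1.0):
    # "Destroy": pick a random subset of variables to free up.
    model, variables = build_model()  # assumed helper: fresh model + its variables
    n_destroy = max(1, int(destroy_fraction * len(variables)))
    destroyed = set(random.sample(range(len(variables)), n_destroy))
    # Fix all untouched variables to their current values ...
    for i, var in enumerate(variables):
        if i not in destroyed:
            model.Add(var == solution[i])
    # ... and "repair" by letting CP-SAT optimally reassign the freed ones.
    solver = cp_model.CpSolver()
    solver.parameters.max_time_in_seconds = time_limit
    status = solver.Solve(model)
    if status in (cp_model.OPTIMAL, cp_model.FEASIBLE):
        return [solver.Value(var) for var in variables]
    return solution  # no repair found in time: keep the old solution
```

Because most variables are fixed, each subproblem stays small enough for CP-SAT to solve quickly, yet a single round implicitly scans exponentially many neighbors of the current solution.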
Moreover, LNS can easily be mixed with other methods like genetic algorithms. If you are already using a genetic algorithm, you could supercharge it by applying CP-SAT to find the best possible crossover of two or more existing solutions. It is like genetic engineering, but without any ethical worries!
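As a sketch of this idea, reusing the hypothetical `build_model` helper from above: fix every variable on which the two parents agree and let CP-SAT optimally resolve the disagreements.

```python
from ortools.sat.python import cp_model


def optimal_crossover(build_model, parent_a, parent_b, time_limit=1.0):
    model, variables = build_model()  # assumed helper as in the sketch above
    # Fix variables on which both parents agree; leave disagreements free.
    for var, a, b in zip(variables, parent_a, parent_b):
        if a == b:
            model.Add(var == a)
    solver = cp_model.CpSolver()
    solver.parameters.max_time_in_seconds = time_limit
    status = solver.Solve(model)
    if status in (cp_model.OPTIMAL, cp_model.FEASIBLE):
        return [solver.Value(var) for var in variables]
    return parent_a  # fall back to one parent if nothing was found in time
```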
When looking into the logs of CP-SAT, you may notice that it uses LNS itself to find better solutions.
```
8 incomplete subsolvers: [feasibility_pump, graph_arc_lns, graph_cst_lns, graph_dec_lns, graph_var_lns, rins/rens, rnd_cst_lns, rnd_var_lns]
```
Why does it not suffice to just run CP-SAT if it already solves the problem with LNS? The reason is that CP-SAT has to be relatively problem-agnostic. It has no way of knowing the structure of your problem and thus cannot use this information to improve the search. You, on the other hand, know a lot about your problem and can use this knowledge to implement a more efficient version.
Literature:
- General paper on LNS variants: Pisinger and Ropke, "Large Neighborhood Search", 2010.
- A generic variant (RINS) that is also used by CP-SAT: Danna et al., "Exploring relaxation induced neighborhoods to improve MIP solutions", 2005.
We will now look into some examples to see this approach in action.
You are given a knapsack with a weight limit $C$ and a set of items $I$, each with a weight and a value. The goal is to select a subset of items whose total weight stays within the limit while maximizing the total value.
This is one of the simplest NP-hard problems and can be solved with a dynamic programming approach in pseudo-polynomial time. CP-SAT is also able to solve many large instances of this problem in an instant. However, its simple structure makes it a good example to demonstrate the use of Large Neighborhood Search, even if the algorithm will not be of much use for this problem.
A simple idea for the LNS is to delete some elements from the current solution, compute the remaining capacity after deletion, select some additional items from the remaining items, and try to find the optimal solution to fill the remaining capacity with the deleted items and the newly selected items. Repeat this until you are happy with the solution quality. The number of items you delete and select can be fixed such that the problem can be easily solved by CP-SAT. You can find a full implementation under examples/lns_knapsack.ipynb.
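The notebook contains the full implementation; the following is only a condensed sketch of a single round. Here, `values`, `weights`, and `capacity` describe the instance and `solution` is the set of currently packed item indices (these names are assumptions for illustration, not the notebook's exact API):

```python
import random

from ortools.sat.python import cp_model


def knapsack_lns_round(values, weights, capacity, solution, n_delete=5, n_add=10):
    # Destroy: remove n_delete random items from the current solution.
    deleted = random.sample(sorted(solution), min(n_delete, len(solution)))
    kept = solution - set(deleted)
    # Candidates for the repair: the deleted items plus n_add fresh items.
    outside = [i for i in range(len(values)) if i not in solution]
    candidates = deleted + random.sample(outside, min(n_add, len(outside)))
    remaining_capacity = capacity - sum(weights[i] for i in kept)
    # Repair: optimally refill the freed capacity with CP-SAT.
    model = cp_model.CpModel()
    x = {i: model.NewBoolVar(f"x_{i}") for i in candidates}
    model.Add(sum(weights[i] * x[i] for i in candidates) <= remaining_capacity)
    model.Maximize(sum(values[i] * x[i] for i in candidates))
    solver = cp_model.CpSolver()
    solver.Solve(model)  # the subproblem is tiny; it solves in milliseconds
    return kept | {i for i in candidates if solver.Value(x[i])}
```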
Let us look at an example run here:
Instance:
Initial solution of value 442:
We will now repeatedly delete 5 items from the current solution and try to fill
the newly gained capacity with an optimal solution built from the deleted items
and 10 additional items. Note that this approach essentially considers up to
$2^{15}=32768$ neighboring solutions in each iteration, although only the best
of them is actually computed.
Round 1 of LNS algorithm:
- Deleting the following 5 items from the solution: $\{I_{0}, I_{7}, I_{8}, I_{9}, I_{6}\}$
- Repairing solution by considering the following subproblem:
  - Subproblem: $C=72$, $I=\{I_{6},I_{9},I_{86},I_{13},I_{47},I_{73},I_{0},I_{8},I_{7},I_{38},I_{57},I_{11},I_{60},I_{14}\}$
- Computed the following solution of value 244 for the subproblem: $\{I_{8}, I_{9}, I_{13}, I_{38}, I_{47}\}$
- Combining $\{I_{1}, I_{2}, I_{3}, I_{4}, I_{5}\} \cup \{I_{8}, I_{9}, I_{13}, I_{38}, I_{47}\}$
- New solution of value 455: $\{I_{1}, I_{2}, I_{3}, I_{4}, I_{5}, I_{8}, I_{9}, I_{13}, I_{38}, I_{47}\}$

Round 2 of LNS algorithm:
- Deleting the following 5 items from the solution: $\{I_{3}, I_{13}, I_{2}, I_{9}, I_{1}\}$
- Repairing solution by considering the following subproblem:
  - Subproblem: $C=78$, $I=\{I_{13},I_{9},I_{84},I_{41},I_{15},I_{42},I_{74},I_{16},I_{3},I_{1},I_{2},I_{67},I_{50},I_{89},I_{43}\}$
- Computed the following solution of value 275 for the subproblem: $\{I_{1}, I_{15}, I_{43}, I_{50}, I_{84}\}$
- Combining $\{I_{4}, I_{5}, I_{8}, I_{38}, I_{47}\} \cup \{I_{1}, I_{15}, I_{43}, I_{50}, I_{84}\}$
- New solution of value 509: $\{I_{1}, I_{4}, I_{5}, I_{8}, I_{15}, I_{38}, I_{43}, I_{47}, I_{50}, I_{84}\}$

Round 3 of LNS algorithm:
- Deleting the following 5 items from the solution: $\{I_{8}, I_{43}, I_{84}, I_{1}, I_{50}\}$
- Repairing solution by considering the following subproblem:
  - Subproblem: $C=79$, $I=\{I_{84},I_{76},I_{34},I_{16},I_{37},I_{20},I_{8},I_{43},I_{50},I_{1},I_{12},I_{35},I_{53}\}$
- Computed the following solution of value 283 for the subproblem: $\{I_{8}, I_{12}, I_{20}, I_{37}, I_{50}, I_{84}\}$
- Combining $\{I_{4}, I_{5}, I_{15}, I_{38}, I_{47}\} \cup \{I_{8}, I_{12}, I_{20}, I_{37}, I_{50}, I_{84}\}$
- New solution of value 526: $\{I_{4}, I_{5}, I_{8}, I_{12}, I_{15}, I_{20}, I_{37}, I_{38}, I_{47}, I_{50}, I_{84}\}$

Round 4 of LNS algorithm:
- Deleting the following 5 items from the solution: $\{I_{37}, I_{4}, I_{20}, I_{5}, I_{15}\}$
- Repairing solution by considering the following subproblem:
  - Subproblem: $C=69$, $I=\{I_{37},I_{4},I_{20},I_{15},I_{82},I_{41},I_{56},I_{76},I_{85},I_{5},I_{32},I_{57},I_{7},I_{67}\}$
- Computed the following solution of value 260 for the subproblem: $\{I_{5}, I_{7}, I_{15}, I_{20}, I_{37}\}$
- Combining $\{I_{8}, I_{12}, I_{38}, I_{47}, I_{50}, I_{84}\} \cup \{I_{5}, I_{7}, I_{15}, I_{20}, I_{37}\}$
- New solution of value 540: $\{I_{5}, I_{7}, I_{8}, I_{12}, I_{15}, I_{20}, I_{37}, I_{38}, I_{47}, I_{50}, I_{84}\}$

Round 5 of LNS algorithm:
- Deleting the following 5 items from the solution: $\{I_{38}, I_{12}, I_{20}, I_{47}, I_{37}\}$
- Repairing solution by considering the following subproblem:
  - Subproblem: $C=66$, $I=\{I_{20},I_{47},I_{37},I_{86},I_{58},I_{56},I_{54},I_{38},I_{12},I_{39},I_{68},I_{75},I_{66},I_{2},I_{99}\}$
- Computed the following solution of value 254 for the subproblem: $\{I_{12}, I_{20}, I_{37}, I_{47}, I_{75}\}$
- Combining $\{I_{5}, I_{7}, I_{8}, I_{15}, I_{50}, I_{84}\} \cup \{I_{12}, I_{20}, I_{37}, I_{47}, I_{75}\}$
- New solution of value 560: $\{I_{5}, I_{7}, I_{8}, I_{12}, I_{15}, I_{20}, I_{37}, I_{47}, I_{50}, I_{75}, I_{84}\}$
Simply removing a portion of the solution and then trying to fix it is not the most effective approach. In this section, we will explore various neighborhoods for the Traveling Salesman Problem (TSP). The geometry of TSP not only permits advantageous neighborhoods but also offers visually appealing representations. When you have several neighborhood strategies, they can be dynamically integrated using an Adaptive Large Neighborhood Search (ALNS).
The image illustrates an optimization process for a tour that needs to traverse the green areas, factoring in turn costs, within an embedded graph (mesh). The optimization involves choosing specific regions (highlighted in red) and calculating the optimal tour within them. As iterations progress, the initial tour generally improves, although some iterations may not yield any enhancement. Regions in red are selected due to the high cost of the tour within them. Once optimized, the center of that region is added to a tabu list, preventing it from being chosen again.
How can you determine the appropriate size of a region to select? You have two main options: conduct preliminary experiments or adjust the size adaptively during the search. For the adaptive variant, simply allocate a time limit for each iteration: if the solver fails to solve the subproblem to optimality within that limit, decrease the region size; if it succeeds, increase it. Using exponential growth and shrink factors lets the size converge quickly to a good value. Note, however, that this assumes the subproblems are of comparable difficulty; otherwise, you may need additional safeguards, such as hard bounds on the region size.
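A minimal sketch of this adaptive scheme follows; it assumes `status` comes from a CP-SAT solve of the regional subproblem, and the growth factor and size bounds are arbitrary starting points to tune:

```python
from ortools.sat.python import cp_model


def adapt_region_size(size, status, growth=1.2, min_size=5, max_size=1000):
    # Grow the region if CP-SAT proved optimality within the iteration's
    # time limit, shrink it otherwise; exponential steps converge quickly.
    if status == cp_model.OPTIMAL:
        return min(int(size * growth) + 1, max_size)
    return max(int(size / growth), min_size)
```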
For the Euclidean TSP, as opposed to a mesh, optimizing regions is not straightforward. Multiple effective strategies exist, such as employing a segment from the previous tour rather than a geometric region. By implementing various neighborhoods and evaluating their success rates, you can allocate a higher selection probability to the top-performing ones. This approach is demonstrated in an animation crafted by two of my students, Gabriel Gehrke and Laurenz Illner. They incorporated four distinct neighborhoods and utilized ALNS to dynamically select the most effective one.
Having multiple strategies for each iteration of your Large Neighborhood Search (LNS) is beneficial, but how do you decide which one to use? While you could select a strategy at random, this approach is inefficient because it is unlikely to choose the best one. Alternatively, you could stick to the strategy that performed well in the past, but there may be a better option you have not tried yet. This brings us to the classic exploration vs. exploitation dilemma. On one hand, you want to exploit strategies that have proven successful, but on the other, you need to explore new strategies to discover potentially better ones.
Fortunately, this issue has been widely studied as the Multi-Armed Bandit Problem, and there are numerous effective solutions available. One popular approach is the Upper Confidence Bound (UCB1) algorithm. I wanted to highlight this dilemma to make you aware that many experts have already tackled it and developed sophisticated strategies.
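For illustration, a compact UCB1 selector is sketched below. The reward definition is an assumption here, e.g., 1 if the iteration improved the incumbent and 0 otherwise; it is one reasonable choice, not the only one.

```python
import math


class UCB1Selector:
    """Pick the strategy maximizing mean reward plus an exploration bonus."""

    def __init__(self, n_strategies):
        self.calls = [0] * n_strategies
        self.total_reward = [0.0] * n_strategies

    def select(self):
        # Try every strategy once before trusting the statistics.
        for i, calls in enumerate(self.calls):
            if calls == 0:
                return i
        n = sum(self.calls)
        return max(
            range(len(self.calls)),
            key=lambda i: self.total_reward[i] / self.calls[i]
            + math.sqrt(2 * math.log(n) / self.calls[i]),
        )

    def update(self, strategy, reward):
        self.calls[strategy] += 1
        self.total_reward[strategy] += reward
```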
In practice, CP-SAT schedules its LNS strategies using a simple round-robin method, as reflected by their nearly equal number of calls in the logs. Whenever a worker thread finishes its work, it executes an iteration with the next strategy in the list, leading to a fair distribution of calls. In the log excerpt below, we can see that the 'routing_path_lns' strategy was the most successful, improving the incumbent solution 41 times out of 65 calls. There might be room for performance improvements by employing a more advanced strategy selection algorithm. However, it is important to recognize the relatively low number of calls to each strategy in combination with diminishing and noisy returns. If you only have a few iterations, a more advanced algorithm may not have enough data to make reliable decisions.
```
LNS stats                Improv/Calls  Closed  Difficulty  TimeLimit
     'graph_arc_lns':            5/65     49%        0.26       0.10
     'graph_cst_lns':            4/65     54%        0.47       0.10
     'graph_dec_lns':            3/65     49%        0.26       0.10
     'graph_var_lns':            4/66     55%        0.56       0.10
          'rins/rens':          23/66     39%        0.03       0.10
        'rnd_cst_lns':          12/66     50%        0.19       0.10
        'rnd_var_lns':           6/66     52%        0.36       0.10
   'routing_path_lns':          41/65     48%        0.10       0.10
 'routing_random_lns':          24/65     52%        0.26       0.10
```