Throwing the fattest people off of an overloaded airplane.

Tags: C++, Algorithm, Sorting, STL

C++ Problem Overview


Let's say you've got an airplane, and it is low on fuel. Unless the plane drops 3000 pounds of passenger weight, it will not be able to reach the next airport. To save the maximum number of lives, we would like to throw the heaviest people off of the plane first.

And oh yeah, there are millions of people on the airplane, and we would like an optimal algorithm to find the heaviest passengers, without necessarily sorting the entire list.

This is a proxy problem for something I'm trying to code in C++. I would like to do a "partial_sort" on the passenger manifest by weight, but I don't know how many elements I'm going to need. I could implement my own "partial_sort" algorithm ("partial_sort_accumulate_until"), but I'm wondering if there's any easier way to do this using standard STL.

C++ Solutions


Solution 1 - C++

This won't help for your proxy problem, however:

For 1,000,000 passengers to drop 3000 pounds of weight, each passenger must lose (3000 / 1,000,000) = 0.003 lbs. That could be achieved by jettisoning everyone's shirt, or shoes, or probably even fingernail clippings, saving everyone. This assumes efficient collection and jettison before the required weight loss increases as the plane burns more fuel.

Actually, they don't allow fingernail clippers on board anymore, so that's out.

Solution 2 - C++

One way would be to use a min heap (std::priority_queue in C++). Here's how you'd do it, assuming you had a MinHeap class. (Yes, my example is in C#. I think you get the idea.)

int targetTotal = 3000;
int totalWeight = 0;
// this creates an empty heap!
var myHeap = new MinHeap<Passenger>(/* need comparer here to order by weight */);
foreach (var pass in passengers)
{
    if (totalWeight < targetTotal)
    {
        // unconditionally add this passenger
        myHeap.Add(pass);
        totalWeight += pass.Weight;
    }
    else if (pass.Weight > myHeap.Peek().Weight)
    {
        // If this passenger is heavier than the lightest
        // passenger already on the heap,
        // then remove the lightest passenger and add this one
        var oldPass = myHeap.RemoveFirst();
        totalWeight -= oldPass.Weight;
        myHeap.Add(pass);
        totalWeight += pass.Weight;
    }
}

// At this point, the heaviest people are on the heap,
// but there might be too many of them.
// Remove the lighter people until we have the minimum necessary
while ((totalWeight - myHeap.Peek().Weight) > targetTotal)
{
    var oldPass = myHeap.RemoveFirst();
    totalWeight -= oldPass.Weight; 
}
// The heap now contains the passengers who will be thrown overboard.

According to the standard references, running time should be proportional to n log k, where n is the number of passengers and k is the maximum number of items on the heap. If we assume that passengers' weights will typically be 100 lbs or more, then it's unlikely that the heap will contain more than 30 items at any time.

The worst case would be if the passengers are presented in order from lowest weight to highest. That would require that every passenger be added to the heap, and every passenger be removed from the heap. Still, with a million passengers and assuming that the lightest weighs 100 lbs, the n log k works out to a reasonably small number.

If you get the passengers' weights randomly, performance is much better. I use something quite like this for a recommendation engine (I select the top 200 items from a list of several million). I typically end up with only 50,000 or 70,000 items actually added to the heap.

I suspect that you'll see something quite similar: the majority of your candidates will be rejected because they're lighter than the lightest person already on the heap. And Peek is an O(1) operation.

For more information about the performance of heap select and quick select, see When theory meets practice. Short version: if you're selecting fewer than 1% of the total number of items, heap select is a clear winner over quick select. More than 1%, use quick select or a variant like Introselect.
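For the C++ side of the question, here is a minimal sketch that mirrors the C# logic above using std::priority_queue as a min-heap; the Passenger struct and the heaviest() function are illustrative assumptions, not part of the original answer.

#include <queue>
#include <vector>

struct Passenger { double weight; /* name, seat, ... */ };

// Orders the priority_queue as a min-heap by weight.
struct HeavierThan {
    bool operator()(const Passenger& a, const Passenger& b) const {
        return a.weight > b.weight;
    }
};

std::vector<Passenger> heaviest(const std::vector<Passenger>& passengers,
                                double targetTotal)
{
    std::priority_queue<Passenger, std::vector<Passenger>, HeavierThan> heap;
    double totalWeight = 0.0;

    for (const Passenger& p : passengers) {
        if (totalWeight < targetTotal) {
            heap.push(p);                      // still under target: take everyone
            totalWeight += p.weight;
        } else if (p.weight > heap.top().weight) {
            totalWeight -= heap.top().weight;  // swap out the lightest candidate
            heap.pop();
            heap.push(p);
            totalWeight += p.weight;
        }
    }
    // Drop the lightest candidates while the rest still exceed the target.
    while (!heap.empty() && totalWeight - heap.top().weight > targetTotal) {
        totalWeight -= heap.top().weight;
        heap.pop();
    }

    std::vector<Passenger> victims;
    while (!heap.empty()) { victims.push_back(heap.top()); heap.pop(); }
    return victims;
}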

Solution 3 - C++

Below is a rather simple implementation of the straightforward solution. I don't think there is a faster way that is 100% correct.

// Assumes a passenger type with an integral weight, ordered by weight:
// struct passenger {
//     unsigned weight;
//     bool operator<(const passenger& o) const { return weight < o.weight; }
// };
// A multiset is used so passengers with equal weights aren't silently dropped.
size_t total = 0;
std::multiset<passenger> dead;
for (auto p : passengers) {
    if (dead.empty()) {
        dead.insert(p);
        total += p.weight;
        continue;
    }
    if (total < threshold || p.weight > dead.begin()->weight) {
        dead.insert(p);
        total += p.weight;
        // "Save" the lightest dead people while the rest still meet the threshold.
        while (total > threshold) {
            if (total - dead.begin()->weight < threshold)
                break;
            total -= dead.begin()->weight;
            dead.erase(dead.begin());
        }
    }
}

This works by filling up the set of "dead people" until it meets the threshold. Once the threshold is met, we keep going through the list of passengers, looking for any who are heavier than the lightest dead person. When we find one, we add them to the set and then start "saving" the lightest people off the set until we can't save any more.

In the worst case, this will perform about the same as a sort of the entire list. But in the best case (the "dead list" is filled up properly by the first X people) it will run in O(n).

Solution 4 - C++

Assuming all passengers will cooperate: Use a parallel sorting network. (see also this)

Here is a live demonstration

Update: Alternative video (jump to 1:00)

With pairs of people doing compare-exchanges in parallel, you can't get faster than this.
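As an illustration of what such a network looks like in code, here is a minimal sketch of odd-even transposition sort, one of the simplest sorting networks; this example is mine, not the answerer's.

#include <algorithm>
#include <cstddef>
#include <vector>

// Odd-even transposition network: n passes over n values sort them.
// Every compare-exchange within a single pass touches disjoint pairs,
// so each pass could be performed by all pairs of passengers at once.
void odd_even_transposition_sort(std::vector<double>& w)
{
    const std::size_t n = w.size();
    for (std::size_t pass = 0; pass < n; ++pass) {
        // pairs (0,1),(2,3),... on even passes; (1,2),(3,4),... on odd passes
        for (std::size_t i = pass % 2; i + 1 < n; i += 2) {
            if (w[i] > w[i + 1])
                std::swap(w[i], w[i + 1]);   // compare-exchange
        }
    }
}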

Solution 5 - C++

@Blastfurnace was on the right track. You use quickselect where the pivots are weight thresholds. Each partition splits one set of people into two sets and returns the total weight of each set. You keep splitting the appropriate bucket until the buckets corresponding to the heaviest people weigh over 3000 pounds combined and the lowest bucket in that set holds a single person (that is, it can't be split any further).

This algorithm runs in expected linear time, but is quadratic in the worst case. I think it is the only linear-time algorithm here.


Here's a Python solution that illustrates this algorithm:

#!/usr/bin/env python
import math
import numpy as np
import random

OVERWEIGHT = 3000.0
in_trouble = [math.floor(x * 10) / 10
              for x in np.random.standard_gamma(16.0, 100) * 8.0]
dead = []
spared = []

dead_weight = 0.0

while in_trouble:
    m = np.median(list(set(random.sample(in_trouble, min(len(in_trouble), 5)))))
    print("Partitioning with pivot:", m)
    lighter_partition = []
    heavier_partition = []
    heavier_partition_weight = 0.0
    in_trouble_is_indivisible = True
    for p in in_trouble:
        if p < m:
            lighter_partition.append(p)
        else:
            heavier_partition.append(p)
            heavier_partition_weight += p
        if p != m:
            in_trouble_is_indivisible = False
    if heavier_partition_weight + dead_weight >= OVERWEIGHT and not in_trouble_is_indivisible:
        spared += lighter_partition
        in_trouble = heavier_partition
    else:
        dead += heavier_partition
        dead_weight += heavier_partition_weight
        in_trouble = lighter_partition

print("weight of dead people: {}; spared people: {}".format(
    dead_weight, sum(spared)))
print("Dead: ", dead)
print("Spared: ", spared)

Output:

Partitioning with pivot: 121.2
Partitioning with pivot: 158.9
Partitioning with pivot: 168.8
Partitioning with pivot: 161.5
Partitioning with pivot: 159.7
Partitioning with pivot: 158.9
weight of dead people: 3051.7; spared people: 9551.7
Dead:  [179.1, 182.5, 179.2, 171.6, 169.9, 179.9, 168.8, 172.2, 169.9, 179.6, 164.4, 164.8, 161.5, 163.1, 165.7, 160.9, 159.7, 158.9]
Spared:  [82.2, 91.9, 94.7, 116.5, 108.2, 78.9, 83.1, 114.6, 87.7, 103.0, 106.0, 102.3, 104.9, 117.0, 96.7, 109.2, 98.0, 108.4, 99.0, 96.8, 90.7, 79.4, 101.7, 119.3, 87.2, 114.7, 90.0, 84.7, 83.5, 84.7, 111.0, 118.1, 112.1, 92.5, 100.9, 114.1, 114.7, 114.1, 113.7, 99.4, 79.3, 100.1, 82.6, 108.9, 103.5, 89.5, 121.8, 156.1, 121.4, 130.3, 157.4, 138.9, 143.0, 145.1, 125.1, 138.5, 143.8, 146.8, 140.1, 136.9, 123.1, 140.2, 153.6, 138.6, 146.5, 143.6, 130.8, 155.7, 128.9, 143.8, 124.0, 134.0, 145.0, 136.0, 121.2, 133.4, 144.0, 126.3, 127.0, 148.3, 144.9, 128.1]

Solution 6 - C++

Assuming that, as with people's weights, you have a good idea of the likely maximum and minimum values, use a radix sort to sort them in O(n). Then simply work from the heaviest end of the list towards the lightest. Total running time: O(n). Unfortunately, there isn't an implementation of a radix sort in the STL, but it's pretty straightforward to write; see the sketch below.
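A minimal sketch under assumptions of my own: if weights are integer pounds in a known range, a single counting-sort pass (radix sort with one big digit) is enough. MAX_WEIGHT is an illustrative bound, not something from the original answer.

#include <cstddef>
#include <vector>

constexpr int MAX_WEIGHT = 1000;   // assumed upper bound on a passenger's weight

std::vector<int> counting_sort_weights(const std::vector<int>& weights)
{
    std::vector<std::size_t> count(MAX_WEIGHT + 1, 0);
    for (int w : weights)
        ++count[w];                               // histogram of weights

    std::vector<int> sorted;
    sorted.reserve(weights.size());
    for (int w = 0; w <= MAX_WEIGHT; ++w)
        sorted.insert(sorted.end(), count[w], w); // emit in ascending order
    return sorted;                                // walk backwards for the heaviest
}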

Solution 7 - C++

Why not use a partial quicksort with an abort rule different from "fully sorted"? Run it, recurse only into the upper (heavier) half, and keep going until that upper half no longer contains at least the weight that has to be thrown out; then back up one step in the recursion and sort that sublist. After that you can start throwing people out from the heavy end of the sorted sublist.
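A rough sketch of how that abort rule might look; the pivot choice, the function name, and the fallback behavior are my own assumptions about the answerer's idea.

#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// On exit, the heaviest people whose combined weight covers `required`
// form a sorted suffix of w[lo, hi).
void select_heaviest(std::vector<double>& w, std::size_t lo, std::size_t hi,
                     double required)
{
    if (hi - lo < 2)
        return;
    const double pivot = w[lo + (hi - lo) / 2];
    auto mid = std::partition(w.begin() + lo, w.begin() + hi,
                              [pivot](double x) { return x < pivot; });
    const std::size_t m = static_cast<std::size_t>(mid - w.begin());
    if (m == lo) {                                  // degenerate partition: just sort
        std::sort(w.begin() + lo, w.begin() + hi);
        return;
    }
    const double heavySum = std::accumulate(w.begin() + m, w.begin() + hi, 0.0);
    if (heavySum >= required)
        select_heaviest(w, m, hi, required);        // heavy side alone suffices
    else
        std::sort(w.begin() + lo, w.begin() + hi);  // need part of the light side too
}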

Solution 8 - C++

Massively Parallel Tournament Sort:

Assuming a standard three seats on each side of the aisle:

  1. Ask the passengers in the window seats to swap with the person in the middle seat if they are heavier.

  2. Ask the passengers in the middle seats to swap with the passenger in the aisle seat if they are heavier.

  3. Ask the passengers in the left aisle seats to swap with the passenger in the right aisle seat if they are heavier.

  4. Bubble sort the passengers in the right aisle seats (takes n steps for n rows): ask the passengers in the right aisle seats to swap with the person in front, n - 1 times.

  5. Kick them out of the door until you reach 3000 pounds.

3 steps + n steps, plus around 30 more steps if you have a really skinny passenger load.

For a two-aisle plane, the instructions are more complex but the performance is about the same.

Solution 9 - C++

I would probably use std::nth_element to partition off the 20 heaviest people in linear time. Then use a more complex method to find and bump off the heaviest of the heavies.
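A minimal sketch of the first step; the Passenger struct and the parameter k are illustrative assumptions of mine:

#include <algorithm>
#include <cstddef>
#include <vector>

struct Passenger { double weight; /* name, seat, ... */ };

// Moves the k heaviest passengers to the front in O(n) average time,
// then sorts only that small prefix.
void partition_heaviest(std::vector<Passenger>& passengers, std::size_t k)
{
    k = std::min(k, passengers.size());
    auto byWeightDesc = [](const Passenger& a, const Passenger& b) {
        return a.weight > b.weight;
    };
    std::nth_element(passengers.begin(), passengers.begin() + k,
                     passengers.end(), byWeightDesc);   // linear on average
    std::sort(passengers.begin(), passengers.begin() + k, byWeightDesc);
    // Now eject from the front until the ejected weight reaches 3000 lbs.
}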

Solution 10 - C++

You could make one pass over the list to get the mean and the standard deviation, then use those to approximate the number of people that have to go. Use partial_sort to generate the list based on that number. If the guess was low, use partial_sort again on the remainder with a new guess.
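A rough sketch of this idea; the guess of target / (mean + 2*sigma) is my own assumption about what "approximate the number of people" might look like.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Returns the heaviest weights whose sum reaches `target`, growing the
// partial_sort window whenever the previous guess was too low.
std::vector<double> heaviest_by_estimate(std::vector<double> w, double target)
{
    const double n = static_cast<double>(w.size());
    const double mean = std::accumulate(w.begin(), w.end(), 0.0) / n;
    double var = 0.0;
    for (double x : w)
        var += (x - mean) * (x - mean);
    const double sigma = std::sqrt(var / n);

    // Heavy passengers weigh roughly mean + 2*sigma; guess how many we need.
    std::size_t k = static_cast<std::size_t>(std::ceil(target / (mean + 2.0 * sigma)));

    double dropped = 0.0;
    std::size_t taken = 0;
    while (dropped < target && taken < w.size()) {
        const std::size_t block = std::min(k, w.size() - taken);
        std::partial_sort(w.begin() + taken, w.begin() + taken + block,
                          w.end(), std::greater<double>());
        const std::size_t end = taken + block;
        while (taken < end && dropped < target)
            dropped += w[taken++];               // consume the sorted block
    }
    return std::vector<double>(w.begin(), w.begin() + taken);
}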

Solution 11 - C++

@James has the answer in the comments: a std::priority_queue if you can use any container, or a combination of std::make_heap and std::pop_heap (and std::push_heap) if you want to use something like a std::vector.
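A minimal sketch of the std::push_heap / std::pop_heap route over a std::vector, kept as a min-heap by weight; the sample weights and the driver loop are my own illustration (std::make_heap would be used instead if starting from an already-filled vector).

#include <algorithm>
#include <vector>

int main()
{
    const double target = 3000.0;
    const std::vector<double> weights =
        {200, 800, 400, 300, 100, 450, 250, 1200, 900, 350};

    std::vector<double> heap;   // weights of the current "to toss" set
    auto minFirst = [](double a, double b) { return a > b; };  // min-heap comparator

    double total = 0.0;
    for (double w : weights) {
        if (total < target || w > heap.front()) {
            heap.push_back(w);
            std::push_heap(heap.begin(), heap.end(), minFirst);
            total += w;
            // Evict the lightest while the rest still meet the target.
            while (total - heap.front() >= target) {
                std::pop_heap(heap.begin(), heap.end(), minFirst);  // lightest to back
                total -= heap.back();
                heap.pop_back();
            }
        }
    }
    // `heap` now holds the weights of the people to toss.
    return 0;
}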

Solution 12 - C++

Here's a heap-based solution using Python's built-in heapq module. It's in Python so doesn't answer the original question, but it's cleaner (IMHO) than the other posted Python solution.

import heapq

# Test data
from collections import namedtuple

Passenger = namedtuple("Passenger", "name seat weight")

passengers = [Passenger(*p) for p in (
    ("Alpha", "1A", 200),
    ("Bravo", "2B", 800),
    ("Charlie", "3C", 400),
    ("Delta", "4A", 300),
    ("Echo", "5B", 100),
    ("Foxtrot", "6F", 100),
    ("Golf", "7E", 200),
    ("Hotel", "8D", 250),
    ("India", "8D", 250),
    ("Juliet", "9D", 450),
    ("Kilo", "10D", 125),
    ("Lima", "11E", 110),
    )]

# Find the heaviest passengers, so long as their
# total weight does not exceed 3000

to_toss = []
total_weight = 0.0

for passenger in passengers:
    weight = passenger.weight
    total_weight += weight
    heapq.heappush(to_toss, (weight, passenger))
    
    while total_weight - to_toss[0][0] >= 3000:
        weight, reprieved_passenger = heapq.heappop(to_toss)
        total_weight -= weight


if total_weight < 3000:
    # Not enough people!
    raise Exception("We're all going to die!")

# List the ones to toss. (Order doesn't matter.)

print "We can get rid of", total_weight, "pounds"
for weight, passenger in to_toss:
    print "Toss {p.name!r} in seat {p.seat} (weighs {p.weight} pounds)".format(p=passenger)

If k = the number of passengers to toss and N = the number of passengers, then the best case for this algorithm is O(N) and the worst case is O(N log N). The worst case occurs if k stays near N for a long time. Here's an example of the worst case:

weights = [2500] + [1/(2**n+0.0) for n in range(100000)] + [3000]

However, in this case (throwing people off the plane, with a parachute, I presume), k must be less than 3000, which is far smaller than "millions of people". The average runtime should therefore be about N log(k), which is effectively linear in the number of people.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | IvyMike | View Question on Stackoverflow
Solution 1 - C++ | aportr | View Answer on Stackoverflow
Solution 2 - C++ | Jim Mischel | View Answer on Stackoverflow
Solution 3 - C++ | SoapBox | View Answer on Stackoverflow
Solution 4 - C++ | Lior Kogan | View Answer on Stackoverflow
Solution 5 - C++ | Neil G | View Answer on Stackoverflow
Solution 6 - C++ | Keith Irwin | View Answer on Stackoverflow
Solution 7 - C++ | Sim | View Answer on Stackoverflow
Solution 8 - C++ | James Anderson | View Answer on Stackoverflow
Solution 9 - C++ | Blastfurnace | View Answer on Stackoverflow
Solution 10 - C++ | Mark Ransom | View Answer on Stackoverflow
Solution 11 - C++ | Max Lybbert | View Answer on Stackoverflow
Solution 12 - C++ | Andrew Dalke | View Answer on Stackoverflow