Efficiently find an integer not in a set of size 40, 400, or 4000

Related to the classic problem find an integer not among four billion given ones but not exactly the same.

To clarify, by integers what I really mean is only a subset of its mathemtical definition. That is, assume there are only finite number of integers. Say in C++, they are int in the range of [INT_MIN, INT_MAX].

Now given a std::vector<int> (no duplicates) or std::unordered_set<int>, whose size can be 40, 400, 4000 or so, but not too large, how to efficiently generate a number that is guaranteed to be not among the given ones?

If there is no worry for overflow, then I could multiply all nonzero ones together and add the product by 1. But there is. The adversary test cases could delibrately contain INT_MAX.

I am more in favor of simple, non-random approaches. Is there any?

Thank you!

Update: to clear up ambiguity, let's say an unsorted std::vector<int> which is guaranteed to have no duplicates. So I am asking if there is anything better than O(n log(n)). Also please note that test cases may contain both INT_MIN and INT_MAX.

edited Dec 18 at 15:17

asked Dec 18 at 14:37

fleix

40610

2

sorting the vector can be done in O(n log(n)), not sure if you can get anything more efficient
– user463035818
Dec 18 at 14:41

2

Is the vector sorted? For the set it is trivial since it is sorted.
– NathanOliver
Dec 18 at 14:41

1

@user463035818 if you can sort a vector in O(log(n)), you should become very rich indeed.
– Walter
Dec 18 at 14:43

1

@Walter meh it was a typo, no millions for me unfortunately
– user463035818
Dec 18 at 14:44

1

Once sorted, you do: Last + 1?
– JVApen
Dec 18 at 14:47

|
show 13 more comments

Related to the classic problem find an integer not among four billion given ones but not exactly the same.

If there is no worry for overflow, then I could multiply all nonzero ones together and add the product by 1. But there is. The adversary test cases could delibrately contain INT_MAX.

I am more in favor of simple, non-random approaches. Is there any?

Thank you!

edited Dec 18 at 15:17

asked Dec 18 at 14:37

fleix

40610

2

sorting the vector can be done in O(n log(n)), not sure if you can get anything more efficient
– user463035818
Dec 18 at 14:41

2

Is the vector sorted? For the set it is trivial since it is sorted.
– NathanOliver
Dec 18 at 14:41

1

@user463035818 if you can sort a vector in O(log(n)), you should become very rich indeed.
– Walter
Dec 18 at 14:43

1

@Walter meh it was a typo, no millions for me unfortunately
– user463035818
Dec 18 at 14:44

1

Once sorted, you do: Last + 1?
– JVApen
Dec 18 at 14:47

|
show 13 more comments

Related to the classic problem find an integer not among four billion given ones but not exactly the same.

If there is no worry for overflow, then I could multiply all nonzero ones together and add the product by 1. But there is. The adversary test cases could delibrately contain INT_MAX.

I am more in favor of simple, non-random approaches. Is there any?

Thank you!

edited Dec 18 at 15:17

asked Dec 18 at 14:37

fleix

40610

Related to the classic problem find an integer not among four billion given ones but not exactly the same.

If there is no worry for overflow, then I could multiply all nonzero ones together and add the product by 1. But there is. The adversary test cases could delibrately contain INT_MAX.

I am more in favor of simple, non-random approaches. Is there any?

Thank you!

c++ algorithm

edited Dec 18 at 15:17

asked Dec 18 at 14:37

fleix

40610

edited Dec 18 at 15:17

asked Dec 18 at 14:37

fleix

40610

edited Dec 18 at 15:17

asked Dec 18 at 14:37

fleix

40610

asked Dec 18 at 14:37

fleix

40610

asked Dec 18 at 14:37

fleix

40610

2

sorting the vector can be done in O(n log(n)), not sure if you can get anything more efficient
– user463035818
Dec 18 at 14:41

2

Is the vector sorted? For the set it is trivial since it is sorted.
– NathanOliver
Dec 18 at 14:41

1

@user463035818 if you can sort a vector in O(log(n)), you should become very rich indeed.
– Walter
Dec 18 at 14:43

1

@Walter meh it was a typo, no millions for me unfortunately
– user463035818
Dec 18 at 14:44

1

Once sorted, you do: Last + 1?
– JVApen
Dec 18 at 14:47

|
show 13 more comments

2

sorting the vector can be done in O(n log(n)), not sure if you can get anything more efficient
– user463035818
Dec 18 at 14:41

2

Is the vector sorted? For the set it is trivial since it is sorted.
– NathanOliver
Dec 18 at 14:41

1

@user463035818 if you can sort a vector in O(log(n)), you should become very rich indeed.
– Walter
Dec 18 at 14:43

1

@Walter meh it was a typo, no millions for me unfortunately
– user463035818
Dec 18 at 14:44

1

Once sorted, you do: Last + 1?
– JVApen
Dec 18 at 14:47

sorting the vector can be done in O(n log(n)), not sure if you can get anything more efficient
– user463035818
Dec 18 at 14:41

Is the vector sorted? For the set it is trivial since it is sorted.
– NathanOliver
Dec 18 at 14:41

@user463035818 if you can sort a vector in O(log(n)), you should become very rich indeed.
– Walter
Dec 18 at 14:43

@Walter meh it was a typo, no millions for me unfortunately
– user463035818
Dec 18 at 14:44

Once sorted, you do: Last + 1?
– JVApen
Dec 18 at 14:47

|
show 13 more comments

6 Answers
6

active

oldest

votes

You could just return the first of N+1 candidate integers not contained in your input. The simplest candidates are the numbers 0 to N. This requires O(N) space and time.

 int find_not_contained(container<int> const&data)

 {

     const int N=data.size();

     std::vector<char> known(N+1, 0);   // one more candidates than data

     for(int i=0; i< N; ++i)

         if(data[i]>=0 && data[i]<=N)

             known[data[i]]=1;

     for(int i=0; i<=N; ++i)

         if(!known[i])

             return i;

     assert(false);                     // should never be reached.

 }

Random methods can be more space efficient, but may require more passes over the data in the worst case.

edited Dec 18 at 15:34

answered Dec 18 at 15:15

Walter

27k1375151

2

I like this. It avoids sorting and searching.
– fleix
Dec 18 at 15:50

3

This is the standard algorithm for finding the smallest integer not in a given set
– DreamConspiracy
Dec 18 at 22:20

@DreamConspiracy That makes sense. I didn't know that, though.
– Walter
Dec 19 at 0:39

1

Fun fact: on a machine with scatter stores, like x86 with AVX512F, this algorithm can be vectorized. (At least with x86-style scatters where a mask controls which elements are actually scattered). And unlike a histogram problem where you have to do conflict detection for multiple vector elements hitting the same index, you're only ever storing a 1 so it's fine. (x86 can only scatter with 32 or 64-bit granularity, so you'd have to promote known to int, which may not be worth it for large N. With large enough N, you'd want to use a bitmap instead of a char array.)
– Peter Cordes
Dec 19 at 3:59

The search at the end can be vectorized easily, using whatever tricks are applicable for a hand-optimized strlen. (e.g. modern x86 should be able to check 16 to 64 bytes per clock cycle, with a well-tuned loop using SSE2 or AVX2, depending on the hardware.)
– Peter Cordes
Dec 19 at 4:03

add a comment |

Random methods are indeed very efficient here.

If we want to use a deterministic method and by assuming the size n is not too large, 4000 for example, then we can create a vector x of size m = n + 1 (or a little bit larger, 4096 for example to facilitate calculation), initialised with 0.

For each i in the range, we just set x[array[i] modulo m] = 1.

Then a simple O(n) search in x will provide a value which is not in array

Note: the modulo operation is not exactly the "%" operation

Edit: I mentioned that calculations are made easier by selecting here a size of 4096. To be more concrete, this implies that the modulo operation is performed with a simple & operation

edited Dec 18 at 15:30

answered Dec 18 at 15:12

Damien

59415

add a comment |

You can find the smallest unused integer in O(N) time using O(1) auxiliary space if you are allowed to reorder the input vector, using the following algorithm. [Note 1] (The algorithm also works if the vector contains repeated data.)

size_t smallest_unused(std::vector<unsigned>& data) {

  size_t N = data.size(), scan = 0;

  while (scan < N) {

    auto other = data[scan];

    if (other < scan && data[other] != other) {

      data[scan] = data[other];

      data[other] = other;

    }

    else

      ++scan;

  }

  for (scan = 0; scan < N && data[scan] == scan; ++scan) { }

  return scan;

}

The first pass guarantees that if some k in the range [0, N) was found after position k, then it is now present at position k. This rearrangement is done by swapping in order to avoid losing data. Once that scan is complete, the first entry whose value is not the same as its index is not referenced anywhere in the array.

That assertion may not be 100% obvious, since a entry could be referenced from an earlier index. However, in that case the entry could not be the first entry unequal to its index, since the earlier entry would be meet that criterion.

To see that this algorithm is O(N), it should be observed that the swap at lines 6 and 7 can only happen if the target entry is not equal to its index, and that after the swap the target entry is equal to its index. So at most N swaps can be performed, and the if condition at line 5 will be true at most N times. On the other hand, if the if condition is false, scan will be incremented, which can also only happen N times. So the if statement is evaluated at most 2N times (which is O(N)).

Notes:

I used unsigned integers here because it makes the code clearer. The algorithm can easily be adjusted for signed integers, for example by mapping signed integers from [INT_MIN, 0) onto unsigned integers [INT_MAX, INT_MAX - INT_MIN) (The subtraction is mathematical, not according to C semantics which wouldn't allow the result to be represented.) In 2's-complement, that's the same bit pattern. That changes the order of the numbers, of course, which affects the semantics of "smallest unused integer"; an order-preserving mapping could also be used.

edited Dec 19 at 2:18

answered Dec 18 at 20:13

rici

151k19131196

To handle OP's requirement smallest positive unused is sufficient, so an even easier way to handle signed integers is to not swap if the number is negative.
– Taemyr
Dec 19 at 14:13

@taemyr: that's what happens if you use the "2's complement" mapping of negative integers, which doesn't require an extra test (assuming C in which the cast is a no-op). In general, if you want to find the smallest available number in some range, you can ignore any entries outside of that range and subtracting the beginning of the range from useful entries. I was going to add that observation but I was mostly interested in the elegance of the original algorithm.
– rici
Dec 19 at 14:18

add a comment |

Make random x (INT_MIN..INT_MAX) and test it against all. Test x++ on failure (very rare case for 40/400/4000).

answered Dec 18 at 14:52

Alexey Birukov

465213

2

He is looking for an efficient way to do it, this would be O(n^2). Yes, it will be way better most times, but still very inefficient worst-case (and OP specifically asked for O, that's for worst case)
– dquijada
Dec 18 at 14:54

1

@dquijada No, it isn't. Big O notation describes the asymptotic behavior of a function, but it does not mean it is the worst-case of anything. In other words, an algorithm may have different upper bounds depending on the input you are studying.
– Acorn
Dec 18 at 15:15

1

@Acorn Typically with big-O notation you describe the worst-case time complexity of an algorithm, i.e. the worst time it can have for all inputs. Sure, a subset of the inputs may run in with a smaller complexity but worst-case means considering all possibilities. Big-O can also be used to describe the average-case complexity. However, as far as I know, the "default meaning" for the sentence "This algorithm takes O(f(n))" time is that O(f(n)) is the worst-case scenario, not the average case.
– Bakuriu
Dec 18 at 23:31

1

@Bakuriu Big O notation has nothing to do with the type of complexity analyzed. It may not even be time complexity. While you can, of course, argue that one may assume we are talking about worst-case time complexity, dquijada said that "OP specifically asked for O, that's for worst case", which is simply not true. OP didn't specifically say anything about worst-case (quite the opposite: we can only assume), nor "O" means worst-case either (that is not true at all).
– Acorn
Dec 18 at 23:35

3

Further, in this specific problem, this solution is actually very efficient in practice, because you almost never hit the worst case (and/or you can simply fallback to another solution if you detect you may be hitting it, e.g. after a few tries; giving you back a worst-case O(n) algorithm easily). So dismissing this solution by saying it is O(n^2) and claiming that OP wanted a good worst-case algorithm, is simply wrong.
– Acorn
Dec 18 at 23:42

|
show 4 more comments

Step 1: Sort the vector.

That can be done in O(n log(n)), you can find a few different algorithms online, use the one you like the most.

Step 2: Find the first int not in the vector.

Easily iterate from INT_MIN to INT_MIN + 40/400/4000 checking if the vector has the current int:

Pseudocode:

SIZE = 40|400|4000 // The one you are using

for (int i = 0; i < SIZE; i++) {

    if (array[i] != INT_MIN + i)

        return INT_MIN + i;

The solution would be O(n log(n) + n) meaning: O(n log(n))

Edit: just read your edit asking for something better than O(n log(n)), sorry.

edited Dec 18 at 15:14

answered Dec 18 at 14:52

dquijada

865820

Thanks for the answer. A minor point: though it being pseudocode, it doesn't seem array[i] would make any sense when i starts from INT_MIN.
– fleix
Dec 18 at 15:10

You are right, @fleix, it's fixed now
– dquijada
Dec 18 at 15:14

1

You can do the search using binary search in O(log N).
– rici
Dec 18 at 20:25

1

@rici: how would you binary search for "a missing number"? If you knew a number that was missing, you could find the position it belongs in a sorted array in log(n) time. I don't think you can do better than linear, although probably with a low constant factor. With a uniform distribution, a linear search will hit a gap in O(n / max_n) time on average, but worst case n (no gaps until the end.)
– Peter Cordes
Dec 18 at 23:50

2

@peter: the same way you do a linear search, compare i + min with vec[i]. If they are unequal, there is a missing value before i, so you bisect down; otherwise you bisect up. Binary search works with any monotonic predicate.
– rici
Dec 19 at 2:05

|
show 4 more comments

For the case in which the integers are provided in an std::unordered_set<int> (as opposed to a std::vector<int>), you could simply traverse the range of integer values until you come up against one integer value that is not present in the unordered_set<int>. Searching for the presence of an integer in an std::unordered_set<int> is quite straightforward, since std::unodered_set does provide searching through its find() member function.

The space complexity of this approach would be O(1).

If you start traversing at the lowest possible value for an int (i.e., std::numeric_limits<int>::min()), you will obtain the lowest int not contained in the std::unordered_set<int>:

int find_lowest_not_contained(const std::unordered_set<int>& set) {

   for (auto i = std::numeric_limits<int>::min(); ; ++i) {

      auto it = set.find(i); // search in set

      if (it == set.end()) // integer not in set?

         return *it;

   }

}

Analogously, if you start traversing at the greatest possible value for an int (i.e., std::numeric_limits<int>::max()), you will obtain the lowest int not contained in the std::unordered_set<int>:

int find_greatest_not_contained(const std::unordered_set<int>& set) {

   for (auto i = std::numeric_limits<int>::max(); ; --i) {

      auto it = set.find(i); // search in set

      if (it == set.end()) // integer not in set?

         return *it;

   }

}

Assuming that the ints are uniformly mapped by the hash function into the unordered_set<int>'s buckets, a search operation on the unordered_set<int> can be achieved in constant time. The run-time complexity would then be O(M ), where M is the size of the integer range you are looking for a non-contained value. M is upper-bounded by the size of the unordered_set<int> (i.e., in your case M <= 4000).

Indeed, with this approach, selecting any integer range whose size is greater than the size of the unordered_set, is guaranteed to come up against an integer value which is not present in the unordered_set<int>.

edited Dec 19 at 9:50

answered Dec 19 at 9:18

ネロク

10.5k32140

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53835354%2fefficiently-find-an-integer-not-in-a-set-of-size-40-400-or-4000%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

6 Answers
6

active

oldest

votes

6 Answers
6

active

oldest

votes

You could just return the first of N+1 candidate integers not contained in your input. The simplest candidates are the numbers 0 to N. This requires O(N) space and time.

 int find_not_contained(container<int> const&data)

 {

     const int N=data.size();

     std::vector<char> known(N+1, 0);   // one more candidates than data

     for(int i=0; i< N; ++i)

         if(data[i]>=0 && data[i]<=N)

             known[data[i]]=1;

     for(int i=0; i<=N; ++i)

         if(!known[i])

             return i;

     assert(false);                     // should never be reached.

 }

Random methods can be more space efficient, but may require more passes over the data in the worst case.

edited Dec 18 at 15:34

answered Dec 18 at 15:15

Walter

27k1375151

2

I like this. It avoids sorting and searching.
– fleix
Dec 18 at 15:50

3

This is the standard algorithm for finding the smallest integer not in a given set
– DreamConspiracy
Dec 18 at 22:20

@DreamConspiracy That makes sense. I didn't know that, though.
– Walter
Dec 19 at 0:39

1

Fun fact: on a machine with scatter stores, like x86 with AVX512F, this algorithm can be vectorized. (At least with x86-style scatters where a mask controls which elements are actually scattered). And unlike a histogram problem where you have to do conflict detection for multiple vector elements hitting the same index, you're only ever storing a 1 so it's fine. (x86 can only scatter with 32 or 64-bit granularity, so you'd have to promote known to int, which may not be worth it for large N. With large enough N, you'd want to use a bitmap instead of a char array.)
– Peter Cordes
Dec 19 at 3:59

The search at the end can be vectorized easily, using whatever tricks are applicable for a hand-optimized strlen. (e.g. modern x86 should be able to check 16 to 64 bytes per clock cycle, with a well-tuned loop using SSE2 or AVX2, depending on the hardware.)
– Peter Cordes
Dec 19 at 4:03

add a comment |

You could just return the first of N+1 candidate integers not contained in your input. The simplest candidates are the numbers 0 to N. This requires O(N) space and time.

 int find_not_contained(container<int> const&data)

 {

     const int N=data.size();

     std::vector<char> known(N+1, 0);   // one more candidates than data

     for(int i=0; i< N; ++i)

         if(data[i]>=0 && data[i]<=N)

             known[data[i]]=1;

     for(int i=0; i<=N; ++i)

         if(!known[i])

             return i;

     assert(false);                     // should never be reached.

 }

Random methods can be more space efficient, but may require more passes over the data in the worst case.

edited Dec 18 at 15:34

answered Dec 18 at 15:15

Walter

27k1375151

2

I like this. It avoids sorting and searching.
– fleix
Dec 18 at 15:50

3

This is the standard algorithm for finding the smallest integer not in a given set
– DreamConspiracy
Dec 18 at 22:20

@DreamConspiracy That makes sense. I didn't know that, though.
– Walter
Dec 19 at 0:39

1

Fun fact: on a machine with scatter stores, like x86 with AVX512F, this algorithm can be vectorized. (At least with x86-style scatters where a mask controls which elements are actually scattered). And unlike a histogram problem where you have to do conflict detection for multiple vector elements hitting the same index, you're only ever storing a 1 so it's fine. (x86 can only scatter with 32 or 64-bit granularity, so you'd have to promote known to int, which may not be worth it for large N. With large enough N, you'd want to use a bitmap instead of a char array.)
– Peter Cordes
Dec 19 at 3:59

The search at the end can be vectorized easily, using whatever tricks are applicable for a hand-optimized strlen. (e.g. modern x86 should be able to check 16 to 64 bytes per clock cycle, with a well-tuned loop using SSE2 or AVX2, depending on the hardware.)
– Peter Cordes
Dec 19 at 4:03

add a comment |

You could just return the first of N+1 candidate integers not contained in your input. The simplest candidates are the numbers 0 to N. This requires O(N) space and time.

 int find_not_contained(container<int> const&data)

 {

     const int N=data.size();

     std::vector<char> known(N+1, 0);   // one more candidates than data

     for(int i=0; i< N; ++i)

         if(data[i]>=0 && data[i]<=N)

             known[data[i]]=1;

     for(int i=0; i<=N; ++i)

         if(!known[i])

             return i;

     assert(false);                     // should never be reached.

 }

Random methods can be more space efficient, but may require more passes over the data in the worst case.

edited Dec 18 at 15:34

answered Dec 18 at 15:15

Walter

27k1375151

You could just return the first of N+1 candidate integers not contained in your input. The simplest candidates are the numbers 0 to N. This requires O(N) space and time.

 int find_not_contained(container<int> const&data)

 {

     const int N=data.size();

     std::vector<char> known(N+1, 0);   // one more candidates than data

     for(int i=0; i< N; ++i)

         if(data[i]>=0 && data[i]<=N)

             known[data[i]]=1;

     for(int i=0; i<=N; ++i)

         if(!known[i])

             return i;

     assert(false);                     // should never be reached.

 }

Random methods can be more space efficient, but may require more passes over the data in the worst case.

edited Dec 18 at 15:34

answered Dec 18 at 15:15

Walter

27k1375151

edited Dec 18 at 15:34

answered Dec 18 at 15:15

Walter

27k1375151

answered Dec 18 at 15:15

Walter

27k1375151

answered Dec 18 at 15:15

Walter

27k1375151

2

I like this. It avoids sorting and searching.
– fleix
Dec 18 at 15:50

3

This is the standard algorithm for finding the smallest integer not in a given set
– DreamConspiracy
Dec 18 at 22:20

@DreamConspiracy That makes sense. I didn't know that, though.
– Walter
Dec 19 at 0:39

1

Fun fact: on a machine with scatter stores, like x86 with AVX512F, this algorithm can be vectorized. (At least with x86-style scatters where a mask controls which elements are actually scattered). And unlike a histogram problem where you have to do conflict detection for multiple vector elements hitting the same index, you're only ever storing a 1 so it's fine. (x86 can only scatter with 32 or 64-bit granularity, so you'd have to promote known to int, which may not be worth it for large N. With large enough N, you'd want to use a bitmap instead of a char array.)
– Peter Cordes
Dec 19 at 3:59

The search at the end can be vectorized easily, using whatever tricks are applicable for a hand-optimized strlen. (e.g. modern x86 should be able to check 16 to 64 bytes per clock cycle, with a well-tuned loop using SSE2 or AVX2, depending on the hardware.)
– Peter Cordes
Dec 19 at 4:03

add a comment |

2

I like this. It avoids sorting and searching.
– fleix
Dec 18 at 15:50

3

This is the standard algorithm for finding the smallest integer not in a given set
– DreamConspiracy
Dec 18 at 22:20

@DreamConspiracy That makes sense. I didn't know that, though.
– Walter
Dec 19 at 0:39

1

Fun fact: on a machine with scatter stores, like x86 with AVX512F, this algorithm can be vectorized. (At least with x86-style scatters where a mask controls which elements are actually scattered). And unlike a histogram problem where you have to do conflict detection for multiple vector elements hitting the same index, you're only ever storing a 1 so it's fine. (x86 can only scatter with 32 or 64-bit granularity, so you'd have to promote known to int, which may not be worth it for large N. With large enough N, you'd want to use a bitmap instead of a char array.)
– Peter Cordes
Dec 19 at 3:59

The search at the end can be vectorized easily, using whatever tricks are applicable for a hand-optimized strlen. (e.g. modern x86 should be able to check 16 to 64 bytes per clock cycle, with a well-tuned loop using SSE2 or AVX2, depending on the hardware.)
– Peter Cordes
Dec 19 at 4:03

I like this. It avoids sorting and searching.
– fleix
Dec 18 at 15:50

This is the standard algorithm for finding the smallest integer not in a given set
– DreamConspiracy
Dec 18 at 22:20

@DreamConspiracy That makes sense. I didn't know that, though.
– Walter
Dec 19 at 0:39

Fun fact: on a machine with scatter stores, like x86 with AVX512F, this algorithm can be vectorized. (At least with x86-style scatters where a mask controls which elements are actually scattered). And unlike a histogram problem where you have to do conflict detection for multiple vector elements hitting the same index, you're only ever storing a 1 so it's fine. (x86 can only scatter with 32 or 64-bit granularity, so you'd have to promote known to int, which may not be worth it for large N. With large enough N, you'd want to use a bitmap instead of a char array.)
– Peter Cordes
Dec 19 at 3:59

The search at the end can be vectorized easily, using whatever tricks are applicable for a hand-optimized strlen. (e.g. modern x86 should be able to check 16 to 64 bytes per clock cycle, with a well-tuned loop using SSE2 or AVX2, depending on the hardware.)
– Peter Cordes
Dec 19 at 4:03

add a comment |

Random methods are indeed very efficient here.

For each i in the range, we just set x[array[i] modulo m] = 1.

Then a simple O(n) search in x will provide a value which is not in array

Note: the modulo operation is not exactly the "%" operation

Edit: I mentioned that calculations are made easier by selecting here a size of 4096. To be more concrete, this implies that the modulo operation is performed with a simple & operation

edited Dec 18 at 15:30

answered Dec 18 at 15:12

Damien

59415

add a comment |

Random methods are indeed very efficient here.

For each i in the range, we just set x[array[i] modulo m] = 1.

Then a simple O(n) search in x will provide a value which is not in array

Note: the modulo operation is not exactly the "%" operation

Edit: I mentioned that calculations are made easier by selecting here a size of 4096. To be more concrete, this implies that the modulo operation is performed with a simple & operation

edited Dec 18 at 15:30

answered Dec 18 at 15:12

Damien

59415

add a comment |

Random methods are indeed very efficient here.

For each i in the range, we just set x[array[i] modulo m] = 1.

Then a simple O(n) search in x will provide a value which is not in array

Note: the modulo operation is not exactly the "%" operation

Edit: I mentioned that calculations are made easier by selecting here a size of 4096. To be more concrete, this implies that the modulo operation is performed with a simple & operation

edited Dec 18 at 15:30

answered Dec 18 at 15:12

Damien

59415

Random methods are indeed very efficient here.

For each i in the range, we just set x[array[i] modulo m] = 1.

Then a simple O(n) search in x will provide a value which is not in array

Note: the modulo operation is not exactly the "%" operation

Edit: I mentioned that calculations are made easier by selecting here a size of 4096. To be more concrete, this implies that the modulo operation is performed with a simple & operation

edited Dec 18 at 15:30

answered Dec 18 at 15:12

Damien

59415

edited Dec 18 at 15:30

answered Dec 18 at 15:12

Damien

59415

answered Dec 18 at 15:12

Damien

59415

answered Dec 18 at 15:12

Damien

59415

add a comment |

size_t smallest_unused(std::vector<unsigned>& data) {

  size_t N = data.size(), scan = 0;

  while (scan < N) {

    auto other = data[scan];

    if (other < scan && data[other] != other) {

      data[scan] = data[other];

      data[other] = other;

    }

    else

      ++scan;

  }

  for (scan = 0; scan < N && data[scan] == scan; ++scan) { }

  return scan;

}

Notes:

I used unsigned integers here because it makes the code clearer. The algorithm can easily be adjusted for signed integers, for example by mapping signed integers from [INT_MIN, 0) onto unsigned integers [INT_MAX, INT_MAX - INT_MIN) (The subtraction is mathematical, not according to C semantics which wouldn't allow the result to be represented.) In 2's-complement, that's the same bit pattern. That changes the order of the numbers, of course, which affects the semantics of "smallest unused integer"; an order-preserving mapping could also be used.

edited Dec 19 at 2:18

answered Dec 18 at 20:13

rici

151k19131196

To handle OP's requirement smallest positive unused is sufficient, so an even easier way to handle signed integers is to not swap if the number is negative.
– Taemyr
Dec 19 at 14:13

@taemyr: that's what happens if you use the "2's complement" mapping of negative integers, which doesn't require an extra test (assuming C in which the cast is a no-op). In general, if you want to find the smallest available number in some range, you can ignore any entries outside of that range and subtracting the beginning of the range from useful entries. I was going to add that observation but I was mostly interested in the elegance of the original algorithm.
– rici
Dec 19 at 14:18

add a comment |

size_t smallest_unused(std::vector<unsigned>& data) {

  size_t N = data.size(), scan = 0;

  while (scan < N) {

    auto other = data[scan];

    if (other < scan && data[other] != other) {

      data[scan] = data[other];

      data[other] = other;

    }

    else

      ++scan;

  }

  for (scan = 0; scan < N && data[scan] == scan; ++scan) { }

  return scan;

}

Notes:

I used unsigned integers here because it makes the code clearer. The algorithm can easily be adjusted for signed integers, for example by mapping signed integers from [INT_MIN, 0) onto unsigned integers [INT_MAX, INT_MAX - INT_MIN) (The subtraction is mathematical, not according to C semantics which wouldn't allow the result to be represented.) In 2's-complement, that's the same bit pattern. That changes the order of the numbers, of course, which affects the semantics of "smallest unused integer"; an order-preserving mapping could also be used.

edited Dec 19 at 2:18

answered Dec 18 at 20:13

rici

151k19131196

To handle OP's requirement smallest positive unused is sufficient, so an even easier way to handle signed integers is to not swap if the number is negative.
– Taemyr
Dec 19 at 14:13

@taemyr: that's what happens if you use the "2's complement" mapping of negative integers, which doesn't require an extra test (assuming C in which the cast is a no-op). In general, if you want to find the smallest available number in some range, you can ignore any entries outside of that range and subtracting the beginning of the range from useful entries. I was going to add that observation but I was mostly interested in the elegance of the original algorithm.
– rici
Dec 19 at 14:18

add a comment |

size_t smallest_unused(std::vector<unsigned>& data) {

  size_t N = data.size(), scan = 0;

  while (scan < N) {

    auto other = data[scan];

    if (other < scan && data[other] != other) {

      data[scan] = data[other];

      data[other] = other;

    }

    else

      ++scan;

  }

  for (scan = 0; scan < N && data[scan] == scan; ++scan) { }

  return scan;

}

Notes:

I used unsigned integers here because it makes the code clearer. The algorithm can easily be adjusted for signed integers, for example by mapping signed integers from [INT_MIN, 0) onto unsigned integers [INT_MAX, INT_MAX - INT_MIN) (The subtraction is mathematical, not according to C semantics which wouldn't allow the result to be represented.) In 2's-complement, that's the same bit pattern. That changes the order of the numbers, of course, which affects the semantics of "smallest unused integer"; an order-preserving mapping could also be used.

edited Dec 19 at 2:18

answered Dec 18 at 20:13

rici

151k19131196

size_t smallest_unused(std::vector<unsigned>& data) {

  size_t N = data.size(), scan = 0;

  while (scan < N) {

    auto other = data[scan];

    if (other < scan && data[other] != other) {

      data[scan] = data[other];

      data[other] = other;

    }

    else

      ++scan;

  }

  for (scan = 0; scan < N && data[scan] == scan; ++scan) { }

  return scan;

}

Notes:

I used unsigned integers here because it makes the code clearer. The algorithm can easily be adjusted for signed integers, for example by mapping signed integers from [INT_MIN, 0) onto unsigned integers [INT_MAX, INT_MAX - INT_MIN) (The subtraction is mathematical, not according to C semantics which wouldn't allow the result to be represented.) In 2's-complement, that's the same bit pattern. That changes the order of the numbers, of course, which affects the semantics of "smallest unused integer"; an order-preserving mapping could also be used.

edited Dec 19 at 2:18

answered Dec 18 at 20:13

rici

151k19131196

edited Dec 19 at 2:18

answered Dec 18 at 20:13

rici

151k19131196

answered Dec 18 at 20:13

rici

151k19131196

answered Dec 18 at 20:13

rici

151k19131196

To handle OP's requirement smallest positive unused is sufficient, so an even easier way to handle signed integers is to not swap if the number is negative.
– Taemyr
Dec 19 at 14:13

@taemyr: that's what happens if you use the "2's complement" mapping of negative integers, which doesn't require an extra test (assuming C in which the cast is a no-op). In general, if you want to find the smallest available number in some range, you can ignore any entries outside of that range and subtracting the beginning of the range from useful entries. I was going to add that observation but I was mostly interested in the elegance of the original algorithm.
– rici
Dec 19 at 14:18

add a comment |

To handle OP's requirement smallest positive unused is sufficient, so an even easier way to handle signed integers is to not swap if the number is negative.
– Taemyr
Dec 19 at 14:13

@taemyr: that's what happens if you use the "2's complement" mapping of negative integers, which doesn't require an extra test (assuming C in which the cast is a no-op). In general, if you want to find the smallest available number in some range, you can ignore any entries outside of that range and subtracting the beginning of the range from useful entries. I was going to add that observation but I was mostly interested in the elegance of the original algorithm.
– rici
Dec 19 at 14:18

To handle OP's requirement smallest positive unused is sufficient, so an even easier way to handle signed integers is to not swap if the number is negative.
– Taemyr
Dec 19 at 14:13

@taemyr: that's what happens if you use the "2's complement" mapping of negative integers, which doesn't require an extra test (assuming C in which the cast is a no-op). In general, if you want to find the smallest available number in some range, you can ignore any entries outside of that range and subtracting the beginning of the range from useful entries. I was going to add that observation but I was mostly interested in the elegance of the original algorithm.
– rici
Dec 19 at 14:18

add a comment |

Make random x (INT_MIN..INT_MAX) and test it against all. Test x++ on failure (very rare case for 40/400/4000).

answered Dec 18 at 14:52

Alexey Birukov

465213

2

He is looking for an efficient way to do it, this would be O(n^2). Yes, it will be way better most times, but still very inefficient worst-case (and OP specifically asked for O, that's for worst case)
– dquijada
Dec 18 at 14:54

1

@dquijada No, it isn't. Big O notation describes the asymptotic behavior of a function, but it does not mean it is the worst-case of anything. In other words, an algorithm may have different upper bounds depending on the input you are studying.
– Acorn
Dec 18 at 15:15

1

@Acorn Typically with big-O notation you describe the worst-case time complexity of an algorithm, i.e. the worst time it can have for all inputs. Sure, a subset of the inputs may run in with a smaller complexity but worst-case means considering all possibilities. Big-O can also be used to describe the average-case complexity. However, as far as I know, the "default meaning" for the sentence "This algorithm takes O(f(n))" time is that O(f(n)) is the worst-case scenario, not the average case.
– Bakuriu
Dec 18 at 23:31

1

@Bakuriu Big O notation has nothing to do with the type of complexity analyzed. It may not even be time complexity. While you can, of course, argue that one may assume we are talking about worst-case time complexity, dquijada said that "OP specifically asked for O, that's for worst case", which is simply not true. OP didn't specifically say anything about worst-case (quite the opposite: we can only assume), nor "O" means worst-case either (that is not true at all).
– Acorn
Dec 18 at 23:35

3

Further, in this specific problem, this solution is actually very efficient in practice, because you almost never hit the worst case (and/or you can simply fallback to another solution if you detect you may be hitting it, e.g. after a few tries; giving you back a worst-case O(n) algorithm easily). So dismissing this solution by saying it is O(n^2) and claiming that OP wanted a good worst-case algorithm, is simply wrong.
– Acorn
Dec 18 at 23:42

|
show 4 more comments

Make random x (INT_MIN..INT_MAX) and test it against all. Test x++ on failure (very rare case for 40/400/4000).

answered Dec 18 at 14:52

Alexey Birukov

465213

2

He is looking for an efficient way to do it, this would be O(n^2). Yes, it will be way better most times, but still very inefficient worst-case (and OP specifically asked for O, that's for worst case)
– dquijada
Dec 18 at 14:54

1

@dquijada No, it isn't. Big O notation describes the asymptotic behavior of a function, but it does not mean it is the worst-case of anything. In other words, an algorithm may have different upper bounds depending on the input you are studying.
– Acorn
Dec 18 at 15:15

1

@Acorn Typically with big-O notation you describe the worst-case time complexity of an algorithm, i.e. the worst time it can have for all inputs. Sure, a subset of the inputs may run in with a smaller complexity but worst-case means considering all possibilities. Big-O can also be used to describe the average-case complexity. However, as far as I know, the "default meaning" for the sentence "This algorithm takes O(f(n))" time is that O(f(n)) is the worst-case scenario, not the average case.
– Bakuriu
Dec 18 at 23:31

1

@Bakuriu Big O notation has nothing to do with the type of complexity analyzed. It may not even be time complexity. While you can, of course, argue that one may assume we are talking about worst-case time complexity, dquijada said that "OP specifically asked for O, that's for worst case", which is simply not true. OP didn't specifically say anything about worst-case (quite the opposite: we can only assume), nor "O" means worst-case either (that is not true at all).
– Acorn
Dec 18 at 23:35

3

Further, in this specific problem, this solution is actually very efficient in practice, because you almost never hit the worst case (and/or you can simply fallback to another solution if you detect you may be hitting it, e.g. after a few tries; giving you back a worst-case O(n) algorithm easily). So dismissing this solution by saying it is O(n^2) and claiming that OP wanted a good worst-case algorithm, is simply wrong.
– Acorn
Dec 18 at 23:42

|
show 4 more comments

Make random x (INT_MIN..INT_MAX) and test it against all. Test x++ on failure (very rare case for 40/400/4000).

answered Dec 18 at 14:52

Alexey Birukov

465213

Make random x (INT_MIN..INT_MAX) and test it against all. Test x++ on failure (very rare case for 40/400/4000).

answered Dec 18 at 14:52

Alexey Birukov

465213

answered Dec 18 at 14:52

Alexey Birukov

465213

answered Dec 18 at 14:52

Alexey Birukov

465213

answered Dec 18 at 14:52

Alexey Birukov

465213

2

He is looking for an efficient way to do it, this would be O(n^2). Yes, it will be way better most times, but still very inefficient worst-case (and OP specifically asked for O, that's for worst case)
– dquijada
Dec 18 at 14:54

1

@dquijada No, it isn't. Big O notation describes the asymptotic behavior of a function, but it does not mean it is the worst-case of anything. In other words, an algorithm may have different upper bounds depending on the input you are studying.
– Acorn
Dec 18 at 15:15

1

@Acorn Typically with big-O notation you describe the worst-case time complexity of an algorithm, i.e. the worst time it can have for all inputs. Sure, a subset of the inputs may run in with a smaller complexity but worst-case means considering all possibilities. Big-O can also be used to describe the average-case complexity. However, as far as I know, the "default meaning" for the sentence "This algorithm takes O(f(n))" time is that O(f(n)) is the worst-case scenario, not the average case.
– Bakuriu
Dec 18 at 23:31

1

@Bakuriu Big O notation has nothing to do with the type of complexity analyzed. It may not even be time complexity. While you can, of course, argue that one may assume we are talking about worst-case time complexity, dquijada said that "OP specifically asked for O, that's for worst case", which is simply not true. OP didn't specifically say anything about worst-case (quite the opposite: we can only assume), nor "O" means worst-case either (that is not true at all).
– Acorn
Dec 18 at 23:35

3

Further, in this specific problem, this solution is actually very efficient in practice, because you almost never hit the worst case (and/or you can simply fallback to another solution if you detect you may be hitting it, e.g. after a few tries; giving you back a worst-case O(n) algorithm easily). So dismissing this solution by saying it is O(n^2) and claiming that OP wanted a good worst-case algorithm, is simply wrong.
– Acorn
Dec 18 at 23:42

|
show 4 more comments

2

He is looking for an efficient way to do it, this would be O(n^2). Yes, it will be way better most times, but still very inefficient worst-case (and OP specifically asked for O, that's for worst case)
– dquijada
Dec 18 at 14:54

1

@dquijada No, it isn't. Big O notation describes the asymptotic behavior of a function, but it does not mean it is the worst-case of anything. In other words, an algorithm may have different upper bounds depending on the input you are studying.
– Acorn
Dec 18 at 15:15

1

@Acorn Typically with big-O notation you describe the worst-case time complexity of an algorithm, i.e. the worst time it can have for all inputs. Sure, a subset of the inputs may run in with a smaller complexity but worst-case means considering all possibilities. Big-O can also be used to describe the average-case complexity. However, as far as I know, the "default meaning" for the sentence "This algorithm takes O(f(n))" time is that O(f(n)) is the worst-case scenario, not the average case.
– Bakuriu
Dec 18 at 23:31

1

@Bakuriu Big O notation has nothing to do with the type of complexity analyzed. It may not even be time complexity. While you can, of course, argue that one may assume we are talking about worst-case time complexity, dquijada said that "OP specifically asked for O, that's for worst case", which is simply not true. OP didn't specifically say anything about worst-case (quite the opposite: we can only assume), nor "O" means worst-case either (that is not true at all).
– Acorn
Dec 18 at 23:35

3

Further, in this specific problem, this solution is actually very efficient in practice, because you almost never hit the worst case (and/or you can simply fallback to another solution if you detect you may be hitting it, e.g. after a few tries; giving you back a worst-case O(n) algorithm easily). So dismissing this solution by saying it is O(n^2) and claiming that OP wanted a good worst-case algorithm, is simply wrong.
– Acorn
Dec 18 at 23:42

He is looking for an efficient way to do it, this would be O(n^2). Yes, it will be way better most times, but still very inefficient worst-case (and OP specifically asked for O, that's for worst case)
– dquijada
Dec 18 at 14:54

@dquijada No, it isn't. Big O notation describes the asymptotic behavior of a function, but it does not mean it is the worst-case of anything. In other words, an algorithm may have different upper bounds depending on the input you are studying.
– Acorn
Dec 18 at 15:15

@Acorn Typically with big-O notation you describe the worst-case time complexity of an algorithm, i.e. the worst time it can have for all inputs. Sure, a subset of the inputs may run in with a smaller complexity but worst-case means considering all possibilities. Big-O can also be used to describe the average-case complexity. However, as far as I know, the "default meaning" for the sentence "This algorithm takes O(f(n))" time is that O(f(n)) is the worst-case scenario, not the average case.
– Bakuriu
Dec 18 at 23:31

@Bakuriu Big O notation has nothing to do with the type of complexity analyzed. It may not even be time complexity. While you can, of course, argue that one may assume we are talking about worst-case time complexity, dquijada said that "OP specifically asked for O, that's for worst case", which is simply not true. OP didn't specifically say anything about worst-case (quite the opposite: we can only assume), nor "O" means worst-case either (that is not true at all).
– Acorn
Dec 18 at 23:35

Further, in this specific problem, this solution is actually very efficient in practice, because you almost never hit the worst case (and/or you can simply fallback to another solution if you detect you may be hitting it, e.g. after a few tries; giving you back a worst-case O(n) algorithm easily). So dismissing this solution by saying it is O(n^2) and claiming that OP wanted a good worst-case algorithm, is simply wrong.
– Acorn
Dec 18 at 23:42

|
show 4 more comments

Step 1: Sort the vector.

That can be done in O(n log(n)), you can find a few different algorithms online, use the one you like the most.

Step 2: Find the first int not in the vector.

Easily iterate from INT_MIN to INT_MIN + 40/400/4000 checking if the vector has the current int:

Pseudocode:

SIZE = 40|400|4000 // The one you are using

for (int i = 0; i < SIZE; i++) {

    if (array[i] != INT_MIN + i)

        return INT_MIN + i;

The solution would be O(n log(n) + n) meaning: O(n log(n))

Edit: just read your edit asking for something better than O(n log(n)), sorry.

edited Dec 18 at 15:14

answered Dec 18 at 14:52

dquijada

865820

Thanks for the answer. A minor point: though it being pseudocode, it doesn't seem array[i] would make any sense when i starts from INT_MIN.
– fleix
Dec 18 at 15:10

You are right, @fleix, it's fixed now
– dquijada
Dec 18 at 15:14

1

You can do the search using binary search in O(log N).
– rici
Dec 18 at 20:25

1

@rici: how would you binary search for "a missing number"? If you knew a number that was missing, you could find the position it belongs in a sorted array in log(n) time. I don't think you can do better than linear, although probably with a low constant factor. With a uniform distribution, a linear search will hit a gap in O(n / max_n) time on average, but worst case n (no gaps until the end.)
– Peter Cordes
Dec 18 at 23:50

2

@peter: the same way you do a linear search, compare i + min with vec[i]. If they are unequal, there is a missing value before i, so you bisect down; otherwise you bisect up. Binary search works with any monotonic predicate.
– rici
Dec 19 at 2:05

|
show 4 more comments

Step 1: Sort the vector.

That can be done in O(n log(n)), you can find a few different algorithms online, use the one you like the most.

Step 2: Find the first int not in the vector.

Easily iterate from INT_MIN to INT_MIN + 40/400/4000 checking if the vector has the current int:

Pseudocode:

SIZE = 40|400|4000 // The one you are using

for (int i = 0; i < SIZE; i++) {

    if (array[i] != INT_MIN + i)

        return INT_MIN + i;

The solution would be O(n log(n) + n) meaning: O(n log(n))

Edit: just read your edit asking for something better than O(n log(n)), sorry.

edited Dec 18 at 15:14

answered Dec 18 at 14:52

dquijada

865820

Thanks for the answer. A minor point: though it being pseudocode, it doesn't seem array[i] would make any sense when i starts from INT_MIN.
– fleix
Dec 18 at 15:10

You are right, @fleix, it's fixed now
– dquijada
Dec 18 at 15:14

1

You can do the search using binary search in O(log N).
– rici
Dec 18 at 20:25

1

@rici: how would you binary search for "a missing number"? If you knew a number that was missing, you could find the position it belongs in a sorted array in log(n) time. I don't think you can do better than linear, although probably with a low constant factor. With a uniform distribution, a linear search will hit a gap in O(n / max_n) time on average, but worst case n (no gaps until the end.)
– Peter Cordes
Dec 18 at 23:50

2

@peter: the same way you do a linear search, compare i + min with vec[i]. If they are unequal, there is a missing value before i, so you bisect down; otherwise you bisect up. Binary search works with any monotonic predicate.
– rici
Dec 19 at 2:05

|
show 4 more comments

Step 1: Sort the vector.

That can be done in O(n log(n)), you can find a few different algorithms online, use the one you like the most.

Step 2: Find the first int not in the vector.

Easily iterate from INT_MIN to INT_MIN + 40/400/4000 checking if the vector has the current int:

Pseudocode:

SIZE = 40|400|4000 // The one you are using

for (int i = 0; i < SIZE; i++) {

    if (array[i] != INT_MIN + i)

        return INT_MIN + i;

The solution would be O(n log(n) + n) meaning: O(n log(n))

Edit: just read your edit asking for something better than O(n log(n)), sorry.

edited Dec 18 at 15:14

answered Dec 18 at 14:52

dquijada

865820

Step 1: Sort the vector.

That can be done in O(n log(n)), you can find a few different algorithms online, use the one you like the most.

Step 2: Find the first int not in the vector.

Easily iterate from INT_MIN to INT_MIN + 40/400/4000 checking if the vector has the current int:

Pseudocode:

SIZE = 40|400|4000 // The one you are using

for (int i = 0; i < SIZE; i++) {

    if (array[i] != INT_MIN + i)

        return INT_MIN + i;

The solution would be O(n log(n) + n) meaning: O(n log(n))

Edit: just read your edit asking for something better than O(n log(n)), sorry.

edited Dec 18 at 15:14

answered Dec 18 at 14:52

dquijada

865820

edited Dec 18 at 15:14

answered Dec 18 at 14:52

dquijada

865820

answered Dec 18 at 14:52

dquijada

865820

answered Dec 18 at 14:52

dquijada

865820

Thanks for the answer. A minor point: though it being pseudocode, it doesn't seem array[i] would make any sense when i starts from INT_MIN.
– fleix
Dec 18 at 15:10

You are right, @fleix, it's fixed now
– dquijada
Dec 18 at 15:14

1

You can do the search using binary search in O(log N).
– rici
Dec 18 at 20:25

1

@rici: how would you binary search for "a missing number"? If you knew a number that was missing, you could find the position it belongs in a sorted array in log(n) time. I don't think you can do better than linear, although probably with a low constant factor. With a uniform distribution, a linear search will hit a gap in O(n / max_n) time on average, but worst case n (no gaps until the end.)
– Peter Cordes
Dec 18 at 23:50

2

@peter: the same way you do a linear search, compare i + min with vec[i]. If they are unequal, there is a missing value before i, so you bisect down; otherwise you bisect up. Binary search works with any monotonic predicate.
– rici
Dec 19 at 2:05

|
show 4 more comments

Thanks for the answer. A minor point: though it being pseudocode, it doesn't seem array[i] would make any sense when i starts from INT_MIN.
– fleix
Dec 18 at 15:10

You are right, @fleix, it's fixed now
– dquijada
Dec 18 at 15:14

1

You can do the search using binary search in O(log N).
– rici
Dec 18 at 20:25

1

@rici: how would you binary search for "a missing number"? If you knew a number that was missing, you could find the position it belongs in a sorted array in log(n) time. I don't think you can do better than linear, although probably with a low constant factor. With a uniform distribution, a linear search will hit a gap in O(n / max_n) time on average, but worst case n (no gaps until the end.)
– Peter Cordes
Dec 18 at 23:50

2

@peter: the same way you do a linear search, compare i + min with vec[i]. If they are unequal, there is a missing value before i, so you bisect down; otherwise you bisect up. Binary search works with any monotonic predicate.
– rici
Dec 19 at 2:05

Thanks for the answer. A minor point: though it being pseudocode, it doesn't seem array[i] would make any sense when i starts from INT_MIN.
– fleix
Dec 18 at 15:10

You are right, @fleix, it's fixed now
– dquijada
Dec 18 at 15:14

You can do the search using binary search in O(log N).
– rici
Dec 18 at 20:25

@rici: how would you binary search for "a missing number"? If you knew a number that was missing, you could find the position it belongs in a sorted array in log(n) time. I don't think you can do better than linear, although probably with a low constant factor. With a uniform distribution, a linear search will hit a gap in O(n / max_n) time on average, but worst case n (no gaps until the end.)
– Peter Cordes
Dec 18 at 23:50

@peter: the same way you do a linear search, compare i + min with vec[i]. If they are unequal, there is a missing value before i, so you bisect down; otherwise you bisect up. Binary search works with any monotonic predicate.
– rici
Dec 19 at 2:05

|
show 4 more comments

The space complexity of this approach would be O(1).

If you start traversing at the lowest possible value for an int (i.e., std::numeric_limits<int>::min()), you will obtain the lowest int not contained in the std::unordered_set<int>:

int find_lowest_not_contained(const std::unordered_set<int>& set) {

   for (auto i = std::numeric_limits<int>::min(); ; ++i) {

      auto it = set.find(i); // search in set

      if (it == set.end()) // integer not in set?

         return *it;

   }

}

int find_greatest_not_contained(const std::unordered_set<int>& set) {

   for (auto i = std::numeric_limits<int>::max(); ; --i) {

      auto it = set.find(i); // search in set

      if (it == set.end()) // integer not in set?

         return *it;

   }

}

edited Dec 19 at 9:50

answered Dec 19 at 9:18

ネロク

10.5k32140

add a comment |

The space complexity of this approach would be O(1).

If you start traversing at the lowest possible value for an int (i.e., std::numeric_limits<int>::min()), you will obtain the lowest int not contained in the std::unordered_set<int>:

int find_lowest_not_contained(const std::unordered_set<int>& set) {

   for (auto i = std::numeric_limits<int>::min(); ; ++i) {

      auto it = set.find(i); // search in set

      if (it == set.end()) // integer not in set?

         return *it;

   }

}

int find_greatest_not_contained(const std::unordered_set<int>& set) {

   for (auto i = std::numeric_limits<int>::max(); ; --i) {

      auto it = set.find(i); // search in set

      if (it == set.end()) // integer not in set?

         return *it;

   }

}

edited Dec 19 at 9:50

answered Dec 19 at 9:18

ネロク

10.5k32140

add a comment |

The space complexity of this approach would be O(1).

If you start traversing at the lowest possible value for an int (i.e., std::numeric_limits<int>::min()), you will obtain the lowest int not contained in the std::unordered_set<int>:

int find_lowest_not_contained(const std::unordered_set<int>& set) {

   for (auto i = std::numeric_limits<int>::min(); ; ++i) {

      auto it = set.find(i); // search in set

      if (it == set.end()) // integer not in set?

         return *it;

   }

}

int find_greatest_not_contained(const std::unordered_set<int>& set) {

   for (auto i = std::numeric_limits<int>::max(); ; --i) {

      auto it = set.find(i); // search in set

      if (it == set.end()) // integer not in set?

         return *it;

   }

}

edited Dec 19 at 9:50

answered Dec 19 at 9:18

ネロク

10.5k32140

The space complexity of this approach would be O(1).

If you start traversing at the lowest possible value for an int (i.e., std::numeric_limits<int>::min()), you will obtain the lowest int not contained in the std::unordered_set<int>:

int find_lowest_not_contained(const std::unordered_set<int>& set) {

   for (auto i = std::numeric_limits<int>::min(); ; ++i) {

      auto it = set.find(i); // search in set

      if (it == set.end()) // integer not in set?

         return *it;

   }

}

int find_greatest_not_contained(const std::unordered_set<int>& set) {

   for (auto i = std::numeric_limits<int>::max(); ; --i) {

      auto it = set.find(i); // search in set

      if (it == set.end()) // integer not in set?

         return *it;

   }

}

edited Dec 19 at 9:50

answered Dec 19 at 9:18

ネロク

10.5k32140

edited Dec 19 at 9:50

answered Dec 19 at 9:18

ネロク

10.5k32140

answered Dec 19 at 9:18

ネロク

10.5k32140

answered Dec 19 at 9:18

ネロク

10.5k32140

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Argthtjtr