Why is <= slower than < in V8?
I am reading the slides from Breaking the JavaScript Speed Limit with V8, and there is an example like the code below. I cannot figure out why <= is slower than < in this case; can anybody explain that? Any comments are appreciated.
Slow:
this.isPrimeDivisible = function(candidate) {
  for (var i = 1; i <= this.prime_count; ++i) {
    if (candidate % this.primes[i] == 0) return true;
  }
  return false;
}
(Hint: primes is an array of length prime_count)
Faster:
this.isPrimeDivisible = function(candidate) {
  for (var i = 1; i < this.prime_count; ++i) {
    if (candidate % this.primes[i] == 0) return true;
  }
  return false;
}
[More info] The speed improvement is significant. In my local environment, the results are as follows:
V8 version 7.3.0 (candidate)
Slow:
time d8 prime.js
287107
12.71 user
0.05 system
0:12.84 elapsed
Faster:
time d8 prime.js
287107
1.82 user
0.01 system
0:01.84 elapsed
Tags: javascript, v8
asked Dec 6 at 2:59 by Leonardo Physh; edited 2 days ago by Mathias Bynens
@DacreDenny The computational difficulty of <= and < is identical, both in theory and in actual implementation in all modern processors (and interpreters). – TypeIA, Dec 6 at 3:11

I have read the document; there is a main function that calls that function in a loop that runs 25000 times, so with that change you do far fewer iterations overall. Also, if an array has a length of 5, trying to read array[5] goes outside its bounds and gives an undefined value, since arrays start indexing at 0. – Shidersz, Dec 6 at 3:32

It would be helpful if this question explained how much of a speed improvement is gained (e.g., 5 times faster) so people don't get thrown off by the extra iteration. I tried to find how fast in the slides, but there were a lot of them and I had trouble finding it, otherwise I would edit it myself. – Captain Man, Dec 6 at 16:19

@CaptainMan You're right, the exact speed improvement is hard to glean from the slides because they cover several different issues all at once. But in my conversation with the speaker after this talk, he confirmed that it is not just a tiny fraction of a percent as you might expect from the one extra iteration in this test case, but a big difference: several times faster, perhaps an order of magnitude or more. And the reason for it is that V8 falls back (or fell back in those days) to the de-optimized array format when you try to read outside the array bounds. – Michael Geary, Dec 6 at 23:24

It may be useful to compare a version that uses <= but otherwise acts identically to the < version by doing i <= this.prime_count - 1. This solves both the "extra iteration" issue and the "one past the end of the array" issue. – TheHansinator, Dec 6 at 23:41
4 Answers
Accepted answer (203 votes)
Other answers and comments mention that the difference between the two loops is that the first one executes one more iteration than the second one. This is true, but in an array that grows to 25,000 elements, one iteration more or less would only make a minuscule difference. As a ballpark guess, if we assume the average length as it grows is 12,500, then the difference we might expect should be around 1/12,500, or only 0.008%.
The performance difference here is much larger than would be explained by that one extra iteration, and the problem is explained near the end of the presentation.
this.primes is a contiguous array (every element holds a value) and the elements are all numbers.
A JavaScript engine may optimize such an array to be a simple array of actual numbers, instead of an array of objects which happen to contain numbers but could contain other values or no value. The first format is much faster to access: it takes less code, and the array is much smaller so it will fit better in cache. But there are some conditions that may prevent this optimized format from being used.
One condition would be if some of the array elements are missing. For example:
let a = [];
a[0] = 10;
a[2] = 20;
Now what is the value of a[1]? It has no value. (It isn't even correct to say it has the value undefined - an array element containing the undefined value is different from an array element that is missing entirely.)
There isn't a way to represent this with numbers only, so the JavaScript engine is forced to use the less optimized format. If a[1] contained a numeric value like the other two elements, the array could potentially be optimized into an array of numbers only.
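As a quick illustration of the hole-versus-undefined distinction (my own snippet, not from the talk; this is standard JavaScript semantics):

let sparse = [];
sparse[0] = 10;
sparse[2] = 20;
sparse.length;            // 3 - the array spans indexes 0..2
1 in sparse;              // false - index 1 is a hole: no element exists there
sparse[1];                // undefined - but reading the hole still yields undefined

let dense = [10, undefined, 20];
1 in dense;               // true - index 1 exists and holds the value undefined
dense.hasOwnProperty(1);  // true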
Another reason for an array to be forced into the deoptimized format is if you attempt to access an element outside the bounds of the array, as discussed in the presentation.
The first loop with <= attempts to read an element past the end of the array. The algorithm still works correctly, because in the last extra iteration:
- this.primes[i] evaluates to undefined because i is past the array end.
- candidate % undefined (for any value of candidate) evaluates to NaN.
- NaN == 0 evaluates to false.
- Therefore, the return true is not executed.
So it's as if the extra iteration never happened - it has no effect on the rest of the logic. The code produces the same result as it would without the extra iteration.
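A console-level sketch of that chain (my own illustration; plain JavaScript semantics):

var somePrimes = [2, 3, 5];   // somePrimes[3] does not exist
var candidate = 7;
somePrimes[3];                // undefined (read past the end)
candidate % undefined;        // NaN
NaN == 0;                     // false, so the return true branch is never taken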
But to get there, it tried to read a nonexistent element past the end of the array. This forces the array out of optimization - or at least did at the time of this talk.
The second loop with < reads only elements that exist within the array, so it allows an optimized array and code.
The problem is described in pages 90-91 of the talk, with related discussion in the pages before and after that.
I happened to attend this very Google I/O presentation and talked with the speaker (one of the V8 authors) afterward. I had been using a technique in my own code that involved reading past the end of an array as a misguided (in hindsight) attempt to optimize one particular situation. He confirmed that even trying to read past the end of an array would prevent the simple optimized format from being used.
If what the V8 author said is still true, then reading past the end of the array would prevent it from being optimized and it would have to fall back to the slower format.
Now it's possible that V8 has been improved in the meantime to efficiently handle this case, or that other JavaScript engines handle it differently. I don't know one way or the other on that, but this deoptimization is what the presentation was talking about.
I'm pretty sure that the array is still contiguous - there's no reason to change the memory layout. What matters, though, is that the index-out-of-bounds check in the property access cannot be optimised away, and the code is sometimes fed undefined instead of a number, leading to a different calculation. – Bergi, Dec 6 at 15:04

@Bergi I'm no JS/V8 expert, but objects in GC languages are almost always references to the actual objects. Those actual objects have independent allocation, even if the references are contiguous, because GC object lifetime isn't tied together. Optimizers can pack those independent allocations to be adjacent, but (a) memory use skyrockets and (b) you have two contiguous blocks being iterated over (the references and the data referred to) instead of one. I suppose an insane optimizer could interleave the references and the data referred to and have an array that owns memory stripes... – Yakk - Adam Nevraumont, Dec 6 at 16:21

@Bergi The array may still be contiguous in the non-optimized case, but the array elements are not the same type as in the optimized case. The optimized version is a simple array of numbers with no additional fluff. The non-optimized version is an array of objects (an internal object format, not JavaScript Object), because it has to support any mix of data types in the array. As I mentioned above, the code in the loop being fed undefined doesn't affect the correctness of the algorithm - it doesn't change the calculation at all (it's as if the extra iteration never happened). – Michael Geary, Dec 6 at 22:01

@Bergi The V8 author who gave this talk said that the attempted read outside the array bounds causes the array to be treated as if it had a mix of types: instead of the optimized number-only format, it de-optimizes the array back to the generic format. In the optimized case it is a simple array of numbers, as you might use in a C program. In the de-optimized case it is an array of Value objects which can hold references to values of any type. (I made up the name Value, but the point is that the array elements are not just simple numbers but are objects that wrap numbers or other types.) – Michael Geary, Dec 6 at 23:13

I work on V8. The array in question would be marked as HOLEY because it's created using new Array(n) (although this part of the code was not visible in the OP). HOLEY arrays remain HOLEY forever in V8, even when they're later filled. That said, the array being holey is not the reason for the perf issue in this case; it just means we have to do an extra Smi check on each iteration (to guard against holes), which is no big deal. – Mathias Bynens, 2 days ago
Answer (47 votes)
I work on V8 at Google, and wanted to provide some additional insight on top of the existing answers and comments.
For reference, here's the full code example from the slides:
var iterations = 25000;
function Primes() {
  this.prime_count = 0;
  this.primes = new Array(iterations);
  this.getPrimeCount = function() { return this.prime_count; }
  this.getPrime = function(i) { return this.primes[i]; }
  this.addPrime = function(i) {
    this.primes[this.prime_count++] = i;
  }
  this.isPrimeDivisible = function(candidate) {
    for (var i = 1; i <= this.prime_count; ++i) {
      if ((candidate % this.primes[i]) == 0) return true;
    }
    return false;
  }
};
function main() {
  var p = new Primes();
  var c = 1;
  while (p.getPrimeCount() < iterations) {
    if (!p.isPrimeDivisible(c)) {
      p.addPrime(c);
    }
    c++;
  }
  console.log(p.getPrime(p.getPrimeCount() - 1));
}
main();
First and foremost, the performance difference has nothing to do with the < and <= operators directly. So please don't jump through hoops just to avoid <= in your code because you read on Stack Overflow that it's slow. It isn't!
Second, folks pointed out that the array is "holey". This was not clear from the code snippet in the OP's post, but it is clear when you look at the code that initializes this.primes:
this.primes = new Array(iterations);
This results in an array with a HOLEY elements kind in V8, even if the array ends up completely filled/packed/contiguous. In general, operations on holey arrays are slower than operations on packed arrays, but in this case the difference is negligible: it amounts to 1 additional Smi (small integer) check (to guard against holes) each time we hit this.primes[i] in the loop within isPrimeDivisible. No big deal!
TL;DR The array being HOLEY is not the problem here.
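For illustration only (my own sketch, not from the original answer): the two common ways of building such an array end up with different elements kinds. If you have a d8 build, the non-standard %DebugPrint intrinsic, enabled via --allow-natives-syntax, is one way to inspect this; the calls are commented out below so the snippet also runs as plain JavaScript.

// Run with something like: d8 --allow-natives-syntax elements-kind.js (hypothetical file name)
var holey = new Array(3);   // pre-sized: starts out with a HOLEY elements kind
holey[0] = 2;
holey[1] = 3;
holey[2] = 5;               // fully filled, yet V8 keeps treating it as HOLEY

var packed = [];            // grows element by element: stays PACKED
packed.push(2, 3, 5);

// %DebugPrint(holey);      // expected to report a HOLEY_* elements kind
// %DebugPrint(packed);     // expected to report a PACKED_* elements kind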
Others pointed out that the code reads out of bounds. It's generally recommended to avoid reading beyond the length of arrays, and in this case it would indeed have avoided the massive drop in performance. But why though? V8 can handle some of these out-of-bound scenarios with only a minor performance impact. What's so special about this particular case, then?
The out-of-bounds read results in this.primes[i] being undefined on this line:
if ((candidate % this.primes[i]) == 0) return true;
And that brings us to the real issue: the % operator is now being used with non-integer operands!
- integer % someOtherInteger can be computed very efficiently; JavaScript engines can produce highly-optimized machine code for this case.
- integer % undefined, on the other hand, amounts to a far less efficient Float64Mod, since undefined is represented as a double.
The code snippet can indeed be improved by changing the <= into < on this line:
for (var i = 1; i <= this.prime_count; ++i) {
...not because < is somehow a superior operator to <=, but simply because this avoids the out-of-bounds read in this particular case.
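If you want to observe the effect locally, here is a minimal timing sketch of my own (not part of the original answer). It assumes d8 or Node, and how large the gap is will depend on the engine version:

// Compare an in-bounds loop with one that reads a single element past the end.
var n = 10000;
var a = [];
for (var i = 0; i < n; ++i) a[i] = i + 1;      // packed array of numbers

function run(limit) {
  // limit === n stays in bounds; limit === n + 1 also reads a[n], i.e. undefined.
  var acc = 0;
  for (var r = 0; r < 2000; ++r) {
    for (var i = 0; i < limit; ++i) {
      if ((123456789 % a[i]) == 0) acc++;
    }
  }
  return acc;
}

run(n);                                        // warm-up run
var t0 = Date.now(); run(n);     var inBounds  = Date.now() - t0;
var t1 = Date.now(); run(n + 1); var outBounds = Date.now() - t1;
console.log('in bounds: ' + inBounds + ' ms, one past the end: ' + outBounds + ' ms');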
To be 100% complete, the keyed load IC for this.primes[i] in isPrimeDivisible unexpectedly goes megamorphic in V8. That seems like a bug: bugs.chromium.org/p/v8/issues/detail?id=8561 – Mathias Bynens, 2 days ago

Thanks for your great explanation. Just one question: can you explain a little more about "Float64Mod, since undefined is represented as a double"? Do you mean undefined holds a real value in V8 internally? – Hardik Modha, 2 days ago

Indeed; any JavaScript value needs to be represented somehow behind the scenes. Things like NaN, Infinity, and undefined are represented as doubles. 0 would be a Smi (small integer), but -0 would be a double. Holes in arrays are represented using a special value called the_hole. Etc. – Mathias Bynens, 2 days ago

+1 Thanks Mathias! I now have so many questions. (1) Regarding "undefined is represented as a double": I assume you use tagged/flagged IEEE 754 NaN values as I suggested in my [comment](#comment94188748_53644250) (and another NaN class for the_hole when in floaty mode)? (2) What constitutes a small integer (I understand why -0 is a double, and I understand 'integer'), so what is the largest value that qualifies as an optimizable SmallInteger? (I've had this question for a long time, ever since an older V8 talk where the speaker at the end couldn't answer it after talking about it.) – GitaarLAB, 2 days ago

(3) Did calculating ONE Float64Mod (the last one at the end) take more than 6 times the time it took to crunch through all previous values, OR did passing the 'holey array' prevent the isPrimeDivisible function code from being optimized to begin with (so compiled using Float64Mod from the start), OR did we get RE-compilation at the end (as soon as it encountered the val % undefined calculation), and if so, did it then recalculate all the work using the newly compiled function, now with Float64Mod? – GitaarLAB, 2 days ago
Answer (10 votes)
TL;DR The slower loop is caused by accessing the array out of bounds, which either forces the engine to recompile the function with fewer or even no optimizations, or prevents it from compiling the function with those optimizations in the first place (if the JIT compiler detected/suspected this condition before the first compiled version). Read on below for why.
Someone just has to say this (I'm amazed nobody already did): there used to be a time when the OP's snippet would be a de facto example in a beginner's programming book, intended to emphasize that arrays in JavaScript are indexed starting at 0, not 1, and as such used as an example of a common beginner's mistake (don't you love how I avoided the phrase 'programming error' ;) ): out-of-bounds array access.
Example 1: a dense array (contiguous, meaning no gaps between indexes, and with an actual element at each index) of 5 elements, using 0-based indexing (always, in ES262).
var arr_five_char=['a', 'b', 'c', 'd', 'e']; // arr_five_char.length === 5
// indexes are: 0 , 1 , 2 , 3 , 4 // there is NO index number 5
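(A hedged one-liner of my own for completeness: reading the nonexistent index simply yields undefined rather than throwing.)

arr_five_char[5];   // undefined - out-of-bounds read, no exception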
Thus we are not really talking about the performance difference between < vs <= (or 'one extra iteration'), but we are asking: 'why does the correct snippet (b) run faster than the erroneous snippet (a)?'
The answer is 2-fold (although from an ES262 language implementer's perspective both are forms of optimization):
- Data-Representation: how to represent/store the Array internally in memory (object, hashmap, 'real' numerical array, etc.)
- Functional Machine-code: how to compile the code that accesses/handles (read/modify) these 'Arrays'
Item 1 is sufficiently (and correctly, IMHO) explained by the accepted answer, but that only spends two words ('the code') on Item 2: compilation.
More precisely: JIT-compilation and, even more importantly, JIT-RE-compilation!
The language specification is basically just a description of a set of algorithms ('steps to perform to achieve the defined end result'), which, as it turns out, is a very beautiful way to describe a language. It leaves the actual method that an engine uses to achieve the specified results open to the implementers, giving ample opportunity to come up with more efficient ways to produce the defined results.
A spec-conforming engine should give spec-conforming results for any defined input.
Now, with JavaScript code/libraries/usage increasing, and remembering how many resources (time/memory/etc.) a 'real' compiler uses, it's clear we can't make users visiting a web page wait that long (and require them to have that many resources available).
Imagine the following simple function:
function sum(arr) {
  var r = 0, i = 0;
  for (; i < arr.length;) r += arr[i++];
  return r;
}
Perfectly clear, right? Doesn't require ANY extra clarification, right? The return type is Number, right?
Well... no, no and no. It depends on what argument you pass to the named function parameter arr:
sum('abcde'); // String('0abcde')
sum([1,2,3]); // Number(6)
sum([1,,3]); // Number(NaN)
sum(['1',,3]); // String('01undefined3')
sum([1,,'3']); // String('NaN3')
sum([1,2,{valueOf:function(){return this.val}, val:6}]); // Number(9)
var val=5; sum([1,2,{valueOf:function(){return val}}]); // Number(8)
See the problem? Then consider that this is just barely scraping the surface of the possible permutations... We don't even know what TYPE the function RETURNS until we are done.
Now imagine this same function-code actually being used on different types, or even variations, of input: both arrays described completely literally in the source code and 'arrays' generated dynamically in-program.
Thus, if you were to compile the function sum JUST ONCE, the only way to always return the spec-defined result for any and all types of input is, obviously, to perform ALL spec-prescribed main AND sub-steps, which is what guarantees spec-conforming results (like an unnamed pre-Y2K browser).
No optimizations (because no assumptions) and a dead-slow interpreted scripting language remain.
JIT-compilation (JIT as in Just In Time) is the current popular solution.
You start by compiling the function using assumptions regarding what it does, returns, and accepts. You come up with checks that are as simple as possible to detect whether the function might start returning non-spec-conforming results (for example, because it receives unexpected input). Then you toss away the previously compiled result and recompile something more elaborate, decide what to do with the partial result you already have (is it valid and to be trusted, or should it be computed again to be sure?), tie the function back into the program, and try again, ultimately falling back to stepwise script interpretation as in the spec.
All of this takes time!
All browsers keep working on their engines; with each and every sub-version you will see things improve and regress. Strings were, at some point in history, really immutable strings (hence array.join was faster than string concatenation); now we use ropes (or similar), which alleviate the problem. Both return spec-conforming results, and that is what matters!
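A small aside of my own to make the join-versus-concatenation point concrete (plain JavaScript; which variant wins depends on the engine and era):

// Old advice: collect pieces, then join once.
var parts = [];
for (var i = 0; i < 1000; i++) parts.push('x');
var joined = parts.join('');

// Straight concatenation: once slow on immutable strings, fine on rope-based engines.
var concatenated = '';
for (var i = 0; i < 1000; i++) concatenated += 'x';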
Long story short: just because JavaScript's semantics often have our back (as with this silent bug in the OP's example) does not mean that 'stupid' mistakes increase our chances of the compiler spitting out fast machine code. The compiler assumes we wrote the 'usually correct' instructions: the current mantra we 'users' (of the programming language) must have is: help the compiler, describe what we want, favor common idioms (take hints from asm.js for a basic understanding of what browsers can try to optimize and why).
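As a hedged illustration of 'help the compiler' (my own sketch, not from the original answer): keeping the values a function sees to a single, predictable type gives the JIT a stable assumption to specialize on.

// A sum that always works on numbers, so the result type stays stable.
function sumNumbers(arr) {
  var r = 0;
  for (var i = 0; i < arr.length; i++) {
    r += +arr[i];          // unary + coerces each element to a Number
  }
  return r;
}

sumNumbers([1, 2, 3]);     // 6
sumNumbers(['1', '2']);    // 3   - inputs coerced, r never becomes a string
sumNumbers([1, , 3]);      // NaN - a hole coerces to NaN, but the type stays Number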
Because of this, talking about performance is important BUT ALSO a minefield (and because of said minefield I really want to end by pointing to, and quoting, some relevant material):
Access to nonexistent object properties and out of bounds array elements returns the undefined value instead of raising an exception. These dynamic features make programming in JavaScript convenient, but they also make it difficult to compile JavaScript into efficient machine code.
...
An important premise for effective JIT optimization is that programmers use dynamic features of JavaScript in a systematic way. For example, JIT compilers exploit the fact that object properties are often added to an object of a given type in a specific order or that out of bounds array accesses occur rarely. JIT compilers exploit these regularity assumptions to generate efficient machine code at runtime. If a code block satisfies the assumptions, the JavaScript engine executes efficient, generated machine code. Otherwise, the engine must fall back to slower code or to interpreting the program.
Source:
"JITProf: Pinpointing JIT-unfriendly JavaScript Code"
Berkeley publication, 2014, by Liang Gong, Michael Pradel, Koushik Sen.
http://software-lab.org/publications/jitprof_tr_aug3_2014.pdf
ASM.JS (which also doesn't like out-of-bounds array access):
Ahead-Of-Time Compilation
Because asm.js is a strict subset of JavaScript, this specification only defines the validation logic—the execution semantics is simply that of JavaScript. However, validated asm.js is amenable to ahead-of-time (AOT) compilation. Moreover, the code generated by an AOT compiler can be quite efficient, featuring:
- unboxed representations of integers and floating-point numbers;
- absence of runtime type checks;
- absence of garbage collection; and
- efficient heap loads and stores (with implementation strategies varying by platform).
Code that fails to validate must fall back to execution by traditional means, e.g., interpretation and/or just-in-time (JIT) compilation.
http://asmjs.org/spec/latest/
and finally https://blogs.windows.com/msedgedev/2015/05/07/bringing-asm-js-to-chakra-microsoft-edge/, where there is a small subsection about the engine's internal performance improvements when removing the bounds check (while merely hoisting the bounds check outside the loop already gave an improvement of 40%).
EDIT:
note that multiple sources talk about different levels of JIT-Recompilation down to interpretation.
Theoretical example based on above information, regarding the OP's snippet:
- Call to isPrimeDivisible
- Compile isPrimeDivisible using general assumptions (like no out of bounds access)
- Do work
- BAM, suddenly array accesses out of bounds (right at the end).
- Crap, says the engine, let's recompile that isPrimeDivisible using different (fewer) assumptions, and this example engine doesn't try to figure out if it can reuse the current partial result, so
- Recompute all work using slower function (hopefully it finishes, otherwise repeat and this time just interpret the code).
- Return result
Hence the time then was:
First run (which failed at the end) + doing all the work all over again using slower machine code for each iteration + the recompilation etc. clearly takes more than 2 times longer in this theoretical example!
EDIT 2: (disclaimer: conjecture based on the facts below)
The more I think of it, the more I think that this answer might actually explain the dominant reason for this 'penalty' on erroneous snippet a (or the performance bonus on snippet b, depending on how you think of it), which is precisely why I'm adamant in calling it (snippet a) a programming error:
It's pretty tempting to assume that this.primes is a purely numerical 'dense array' which was either
- a hard-coded literal in the source code (a known excellent candidate to become a 'real' array, as everything is already known to the compiler before compile time), OR
- most likely generated using a numerical function filling a pre-sized array (new Array(/*size value*/)) in ascending sequential order (another long-known candidate to become a 'real' array).
We also know that the primes array's length is cached as prime_count! (indicating its intent and fixed size).
We also know that most engines initially pass arrays as copy-on-modify (when needed), which makes handling them much faster (if you don't change them).
It is therefore reasonable to assume that the array primes is most likely already an optimized array internally which doesn't get changed after creation (simple for the compiler to know if there is no code modifying the array after creation), and is therefore already (if applicable to the engine) stored in an optimized way, pretty much as if it were a Typed Array.
As I have tried to make clear with my sum function example, the argument(s) that get passed highly influence what actually needs to happen and, as such, how that particular code is compiled to machine code. Passing a String to the sum function shouldn't change the string, but it changes how the function is JIT-compiled! Passing an Array to sum should compile a different (perhaps even an additional one for this type, or 'shape' as they call it, of object that got passed) version of machine code.
It seems slightly bonkers to convert the Typed_Array-like primes array on the fly to something_else while the compiler knows this function is not even going to modify it!
Under these assumptions, that leaves 2 options:
- Compile as a number-cruncher assuming no out-of-bounds access, run into the out-of-bounds problem at the end, recompile and redo the work (as outlined in the theoretical example in edit 1 above), OR
- The compiler had already detected (or suspected?) out-of-bounds access up-front and the function was JIT-compiled as if the argument passed was a sparse object, resulting in slower functional machine code (as it would have more checks/conversions/coercions etc.). In other words: the function was never eligible for certain optimisations; it was compiled as if it received a 'sparse array'(-like) argument.
I now really wonder which of these 2 it is!
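To make the 'already numeric, fixed size' intuition above concrete, here is a hedged sketch of my own (not what the original slides do): a typed array guarantees packed, numbers-only storage, and reading past its end simply yields undefined without changing that storage format.

// Assumption: the maximum number of primes is known up front.
var MAX_PRIMES = 25000;
var primes = new Float64Array(MAX_PRIMES);   // packed, numeric-only backing store
var prime_count = 0;

function addPrime(p) {
  primes[prime_count++] = p;
}

function isPrimeDivisible(candidate) {
  for (var i = 0; i < prime_count; i++) {    // 0-based and strictly less-than
    if (candidate % primes[i] == 0) return true;
  }
  return false;
}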
A good discussion of some of the underlying issues - however, you barely explain the answer at all (in the very last sentence). Maybe add a tl;dr to the very top? E.g. "The slower loop is due to exceeding the array bounds, which forces the engine to re-evaluate the loop without optimizations. Read on to learn why." – brichins, Dec 7 at 4:19

@brichins: thanks, and thanks for the suggestion, which I have reworded a bit in light of my second additional edit, because the more I now think of it, the more that statement at the top seems actually correct as well. – GitaarLAB, Dec 7 at 5:39
Answer (3 votes)
To add some science to this, here's a jsperf: https://jsperf.com/ints-values-in-out-of-array-bounds
It tests a control case of an array filled with ints, looping and doing modular arithmetic while staying within bounds. It has 5 test cases:
1. Looping out of bounds
2. Holey arrays
3. Modular arithmetic against NaNs
4. Completely undefined values
5. Using a new Array()
It shows that the first 4 cases are really bad for performance. Looping out of bounds is a bit better than the other 3, but all 4 are roughly 98% slower than the best case.
The new Array() case is almost as good as the raw array, just a few percent slower.
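For readers who can't open the jsperf link, here is a rough sketch of my own of what such test cases might look like (a hypothetical reconstruction; the actual benchmark code may differ):

// Shared setup for the hypothetical cases.
var len = 1000;

var ints = [];                                  // control: packed ints, in bounds
for (var i = 0; i < len; i++) ints[i] = i + 1;

var holey = [];                                 // 2. holey array (one real gap)
for (var i = 0; i < len; i++) if (i !== 500) holey[i] = i + 1;

var nans = [];                                  // 3. NaN operands
for (var i = 0; i < len; i++) nans[i] = NaN;

var undef = [];                                 // 4. completely undefined values
for (var i = 0; i < len; i++) undef[i] = undefined;

var preSized = new Array(len);                  // 5. new Array(), then filled
for (var i = 0; i < len; i++) preSized[i] = i + 1;

function modLoop(arr, bound) {
  var hits = 0;
  for (var i = 0; i < bound; i++) {
    if ((987654321 % arr[i]) == 0) hits++;
  }
  return hits;
}

modLoop(ints, len);        // control
modLoop(ints, len + 1);    // 1. looping out of bounds
modLoop(holey, len);       // 2.
modLoop(nans, len);        // 3.
modLoop(undef, len);       // 4.
modLoop(preSized, len);    // 5.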
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
203
down vote
accepted
Other answers and comments mention that the difference between the two loops is that the first one executes one more iteration than the second one. This is true, but in an array that grows to 25,000 elements, one iteration more or less would only make a miniscule difference. As a ballpark guess, if we assume the average length as it grows is 12,500, then the difference we might expect should be around 1/12,500, or only 0.008%.
The performance difference here is much larger than would be explained by that one extra iteration, and the problem is explained near the end of the presentation.
this.primes
is a contiguous array (every element holds a value) and the elements are all numbers.
A JavaScript engine may optimize such an array to be an simple array of actual numbers, instead of an array of objects which happen to contain numbers but could contain other values or no value. The first format is much faster to access: it takes less code, and the array is much smaller so it will fit better in cache. But there are some conditions that may prevent this optimized format from being used.
One condition would be if some of the array elements are missing. For example:
let array = ;
a[0] = 10;
a[2] = 20;
Now what is the value of a[1]
? It has no value. (It isn't even correct to say it has the value undefined
- an array element containing the undefined
value is different from an array element that is missing entirely.)
There isn't a way to represent this with numbers only, so the JavaScript engine is forced to use the less optimized format. If a[1]
contained a numeric value like the other two elements, the array could potentially be optimized into an array of numbers only.
Another reason for an array to be forced into the deoptimized format can be if you attempt to access an element outside the bounds of the array, as discussed in the presentation.
The first loop with <=
attempts to read an element past the end of the array. The algorithm still works correctly, because in the last extra iteration:
this.primes[i]
evaluates toundefined
becausei
is past the array end.
candidate % undefined
(for any value ofcandidate
) evaluates toNaN
.
NaN == 0
evaluates tofalse
.- Therefore, the
return true
is not executed.
So it's as if the extra iteration never happened - it has no effect on the rest of the logic. The code produces the same result as it would without the extra iteration.
But to get there, it tried to read a nonexistent element past the end of the array. This forces the array out of optimization - or at least did at the time of this talk.
The second loop with <
reads only elements that exist within the array, so it allows an optimized array and code.
The problem is described in pages 90-91 of the talk, with related discussion in the pages before and after that.
I happened to attend this very Google I/O presentation and talked with the speaker (one of the V8 authors) afterward. I had been using a technique in my own code that involved reading past the end of an array as a misguided (in hindsight) attempt to optimize one particular situation. He confirmed that if you tried to even read past the end of an array, it would prevent the simple optimized format from being used.
If what the V8 author said is still true, then reading past the end of the array would prevent it from being optimized and it would have to fall back to the slower format.
Now it's possible that V8 has been improved in the meantime to efficiently handle this case, or that other JavaScript engines handle it differently. I don't know one way or the other on that, but this deoptimization is what the presentation was talking about.
1
I'm pretty sure that the array is still contiguous - there's no reason to change the memory layout. What matters though is that the index-out-of-bounds check in the property access cannot be optimised away, and the code sometimes is fedundefined
instead of a number leading to a different calculation.
– Bergi
Dec 6 at 15:04
1
@Bergi I'm no JS/V8 expert, but objects in GC languages are almost always references to the actual objects. Those actual objects have independent allocation, even if the references are contiguous, because GC object lifetime isn't tied together. Optimizers can pack those independent allocations to being adjacent, but (a) memory use skyrockets and (b) you have two contiguous blocks being iterated over (the references and the data referred to) instead of one. I suppose an insane optimizer could interleave the references and the data referred to and have an array that owns memory stripes...
– Yakk - Adam Nevraumont
Dec 6 at 16:21
1
@Bergi The array may still be contiguous in the non-optimized case, but the array elements are not the same type as in the optimized case. The optimized version is a simple array of numbers with no additional fluff. The non-optimized version is an array of objects (an internal object format, not JavaScriptObject
), because it has to support any mix of data types in the array. As I mentioned above, the code in the loop being fedundefined
doesn't affect the correctness of the algorithm - it doesn't change the calculation at all (it's as if the extra iteration never happened).
– Michael Geary
Dec 6 at 22:01
3
@Bergi The V8 author who gave this talk said that the attempted read outside the array bounds causes the array to be treated as if it had a mix of types: instead of the optimized number-only format, it de-optimizes the array back to the generic format. In the optimized case it is a simple array of numbers as you might use in a C program. In the de-optimized case it is an array ofValue
objects which can hold references to values of any type. (I made up the nameValue
, but the point is that the array elements are not just simple numbers but are objects that wrap numbers or other types.)
– Michael Geary
Dec 6 at 23:13
2
I work on V8. The array in question would be marked asHOLEY
because it's created usingnew Array(n)
(although this part of the code was not visible in the OP).HOLEY
arrays remainHOLEY
forever in V8, even when they're later filled. That said, the array being holey is not the reason for the perf issue in this case; it just means we have to do an extra Smi check on each iteration (to guard against holes), which is no big deal.
– Mathias Bynens
2 days ago
|
show 7 more comments
up vote
203
down vote
accepted
Other answers and comments mention that the difference between the two loops is that the first one executes one more iteration than the second one. This is true, but in an array that grows to 25,000 elements, one iteration more or less would only make a miniscule difference. As a ballpark guess, if we assume the average length as it grows is 12,500, then the difference we might expect should be around 1/12,500, or only 0.008%.
The performance difference here is much larger than would be explained by that one extra iteration, and the problem is explained near the end of the presentation.
this.primes
is a contiguous array (every element holds a value) and the elements are all numbers.
A JavaScript engine may optimize such an array to be an simple array of actual numbers, instead of an array of objects which happen to contain numbers but could contain other values or no value. The first format is much faster to access: it takes less code, and the array is much smaller so it will fit better in cache. But there are some conditions that may prevent this optimized format from being used.
One condition would be if some of the array elements are missing. For example:
let array = ;
a[0] = 10;
a[2] = 20;
Now what is the value of a[1]
? It has no value. (It isn't even correct to say it has the value undefined
- an array element containing the undefined
value is different from an array element that is missing entirely.)
There isn't a way to represent this with numbers only, so the JavaScript engine is forced to use the less optimized format. If a[1]
contained a numeric value like the other two elements, the array could potentially be optimized into an array of numbers only.
Another reason for an array to be forced into the deoptimized format can be if you attempt to access an element outside the bounds of the array, as discussed in the presentation.
The first loop with <=
attempts to read an element past the end of the array. The algorithm still works correctly, because in the last extra iteration:
this.primes[i]
evaluates toundefined
becausei
is past the array end.
candidate % undefined
(for any value ofcandidate
) evaluates toNaN
.
NaN == 0
evaluates tofalse
.- Therefore, the
return true
is not executed.
So it's as if the extra iteration never happened - it has no effect on the rest of the logic. The code produces the same result as it would without the extra iteration.
But to get there, it tried to read a nonexistent element past the end of the array. This forces the array out of optimization - or at least did at the time of this talk.
The second loop with <
reads only elements that exist within the array, so it allows an optimized array and code.
The problem is described in pages 90-91 of the talk, with related discussion in the pages before and after that.
I happened to attend this very Google I/O presentation and talked with the speaker (one of the V8 authors) afterward. I had been using a technique in my own code that involved reading past the end of an array as a misguided (in hindsight) attempt to optimize one particular situation. He confirmed that if you tried to even read past the end of an array, it would prevent the simple optimized format from being used.
If what the V8 author said is still true, then reading past the end of the array would prevent it from being optimized and it would have to fall back to the slower format.
Now it's possible that V8 has been improved in the meantime to efficiently handle this case, or that other JavaScript engines handle it differently. I don't know one way or the other on that, but this deoptimization is what the presentation was talking about.
1
I'm pretty sure that the array is still contiguous - there's no reason to change the memory layout. What matters though is that the index-out-of-bounds check in the property access cannot be optimised away, and the code sometimes is fedundefined
instead of a number leading to a different calculation.
– Bergi
Dec 6 at 15:04
1
@Bergi I'm no JS/V8 expert, but objects in GC languages are almost always references to the actual objects. Those actual objects have independent allocation, even if the references are contiguous, because GC object lifetime isn't tied together. Optimizers can pack those independent allocations to being adjacent, but (a) memory use skyrockets and (b) you have two contiguous blocks being iterated over (the references and the data referred to) instead of one. I suppose an insane optimizer could interleave the references and the data referred to and have an array that owns memory stripes...
– Yakk - Adam Nevraumont
Dec 6 at 16:21
1
@Bergi The array may still be contiguous in the non-optimized case, but the array elements are not the same type as in the optimized case. The optimized version is a simple array of numbers with no additional fluff. The non-optimized version is an array of objects (an internal object format, not JavaScriptObject
), because it has to support any mix of data types in the array. As I mentioned above, the code in the loop being fedundefined
doesn't affect the correctness of the algorithm - it doesn't change the calculation at all (it's as if the extra iteration never happened).
– Michael Geary
Dec 6 at 22:01
3
@Bergi The V8 author who gave this talk said that the attempted read outside the array bounds causes the array to be treated as if it had a mix of types: instead of the optimized number-only format, it de-optimizes the array back to the generic format. In the optimized case it is a simple array of numbers as you might use in a C program. In the de-optimized case it is an array ofValue
objects which can hold references to values of any type. (I made up the nameValue
, but the point is that the array elements are not just simple numbers but are objects that wrap numbers or other types.)
– Michael Geary
Dec 6 at 23:13
2
I work on V8. The array in question would be marked asHOLEY
because it's created usingnew Array(n)
(although this part of the code was not visible in the OP).HOLEY
arrays remainHOLEY
forever in V8, even when they're later filled. That said, the array being holey is not the reason for the perf issue in this case; it just means we have to do an extra Smi check on each iteration (to guard against holes), which is no big deal.
– Mathias Bynens
2 days ago
|
show 7 more comments
up vote
203
down vote
accepted
up vote
203
down vote
accepted
Other answers and comments mention that the difference between the two loops is that the first one executes one more iteration than the second one. This is true, but in an array that grows to 25,000 elements, one iteration more or less would only make a miniscule difference. As a ballpark guess, if we assume the average length as it grows is 12,500, then the difference we might expect should be around 1/12,500, or only 0.008%.
The performance difference here is much larger than would be explained by that one extra iteration, and the problem is explained near the end of the presentation.
this.primes
is a contiguous array (every element holds a value) and the elements are all numbers.
A JavaScript engine may optimize such an array to be an simple array of actual numbers, instead of an array of objects which happen to contain numbers but could contain other values or no value. The first format is much faster to access: it takes less code, and the array is much smaller so it will fit better in cache. But there are some conditions that may prevent this optimized format from being used.
One condition would be if some of the array elements are missing. For example:
let array = ;
a[0] = 10;
a[2] = 20;
Now what is the value of a[1]
? It has no value. (It isn't even correct to say it has the value undefined
- an array element containing the undefined
value is different from an array element that is missing entirely.)
There isn't a way to represent this with numbers only, so the JavaScript engine is forced to use the less optimized format. If a[1]
contained a numeric value like the other two elements, the array could potentially be optimized into an array of numbers only.
Another reason for an array to be forced into the deoptimized format can be if you attempt to access an element outside the bounds of the array, as discussed in the presentation.
The first loop with <=
attempts to read an element past the end of the array. The algorithm still works correctly, because in the last extra iteration:
this.primes[i]
evaluates toundefined
becausei
is past the array end.
candidate % undefined
(for any value ofcandidate
) evaluates toNaN
.
NaN == 0
evaluates tofalse
.- Therefore, the
return true
is not executed.
So it's as if the extra iteration never happened - it has no effect on the rest of the logic. The code produces the same result as it would without the extra iteration.
But to get there, it tried to read a nonexistent element past the end of the array. This forces the array out of optimization - or at least did at the time of this talk.
The second loop with <
reads only elements that exist within the array, so it allows an optimized array and code.
The problem is described in pages 90-91 of the talk, with related discussion in the pages before and after that.
I happened to attend this very Google I/O presentation and talked with the speaker (one of the V8 authors) afterward. I had been using a technique in my own code that involved reading past the end of an array as a misguided (in hindsight) attempt to optimize one particular situation. He confirmed that if you tried to even read past the end of an array, it would prevent the simple optimized format from being used.
If what the V8 author said is still true, then reading past the end of the array would prevent it from being optimized and it would have to fall back to the slower format.
Now it's possible that V8 has been improved in the meantime to efficiently handle this case, or that other JavaScript engines handle it differently. I don't know one way or the other on that, but this deoptimization is what the presentation was talking about.
Other answers and comments mention that the difference between the two loops is that the first one executes one more iteration than the second one. This is true, but in an array that grows to 25,000 elements, one iteration more or less would only make a miniscule difference. As a ballpark guess, if we assume the average length as it grows is 12,500, then the difference we might expect should be around 1/12,500, or only 0.008%.
The performance difference here is much larger than would be explained by that one extra iteration, and the problem is explained near the end of the presentation.
this.primes
is a contiguous array (every element holds a value) and the elements are all numbers.
A JavaScript engine may optimize such an array to be an simple array of actual numbers, instead of an array of objects which happen to contain numbers but could contain other values or no value. The first format is much faster to access: it takes less code, and the array is much smaller so it will fit better in cache. But there are some conditions that may prevent this optimized format from being used.
One condition would be if some of the array elements are missing. For example:
let array = ;
a[0] = 10;
a[2] = 20;
Now what is the value of a[1]
? It has no value. (It isn't even correct to say it has the value undefined
- an array element containing the undefined
value is different from an array element that is missing entirely.)
There isn't a way to represent this with numbers only, so the JavaScript engine is forced to use the less optimized format. If a[1]
contained a numeric value like the other two elements, the array could potentially be optimized into an array of numbers only.
Another reason for an array to be forced into the deoptimized format can be if you attempt to access an element outside the bounds of the array, as discussed in the presentation.
The first loop with <=
attempts to read an element past the end of the array. The algorithm still works correctly, because in the last extra iteration:
this.primes[i]
evaluates toundefined
becausei
is past the array end.
candidate % undefined
(for any value ofcandidate
) evaluates toNaN
.
NaN == 0
evaluates tofalse
.- Therefore, the
return true
is not executed.
So it's as if the extra iteration never happened - it has no effect on the rest of the logic. The code produces the same result as it would without the extra iteration.
But to get there, it tried to read a nonexistent element past the end of the array. This forces the array out of optimization - or at least did at the time of this talk.
The second loop with <
reads only elements that exist within the array, so it allows an optimized array and code.
The problem is described in pages 90-91 of the talk, with related discussion in the pages before and after that.
I happened to attend this very Google I/O presentation and talked with the speaker (one of the V8 authors) afterward. I had been using a technique in my own code that involved reading past the end of an array as a misguided (in hindsight) attempt to optimize one particular situation. He confirmed that if you tried to even read past the end of an array, it would prevent the simple optimized format from being used.
If what the V8 author said is still true, then reading past the end of the array would prevent it from being optimized and it would have to fall back to the slower format.
Now it's possible that V8 has been improved in the meantime to efficiently handle this case, or that other JavaScript engines handle it differently. I don't know one way or the other on that, but this deoptimization is what the presentation was talking about.
edited Dec 6 at 22:16
answered Dec 6 at 3:34
Michael Geary
22.6k64358
22.6k64358
1
I'm pretty sure that the array is still contiguous - there's no reason to change the memory layout. What matters though is that the index-out-of-bounds check in the property access cannot be optimised away, and the code sometimes is fed undefined instead of a number, leading to a different calculation.
– Bergi
Dec 6 at 15:04
1
@Bergi I'm no JS/V8 expert, but objects in GC languages are almost always references to the actual objects. Those actual objects have independent allocation, even if the references are contiguous, because GC object lifetime isn't tied together. Optimizers can pack those independent allocations to being adjacent, but (a) memory use skyrockets and (b) you have two contiguous blocks being iterated over (the references and the data referred to) instead of one. I suppose an insane optimizer could interleave the references and the data referred to and have an array that owns memory stripes...
– Yakk - Adam Nevraumont
Dec 6 at 16:21
1
@Bergi The array may still be contiguous in the non-optimized case, but the array elements are not the same type as in the optimized case. The optimized version is a simple array of numbers with no additional fluff. The non-optimized version is an array of objects (an internal object format, not JavaScript Object), because it has to support any mix of data types in the array. As I mentioned above, the code in the loop being fed undefined doesn't affect the correctness of the algorithm - it doesn't change the calculation at all (it's as if the extra iteration never happened).
– Michael Geary
Dec 6 at 22:01
3
@Bergi The V8 author who gave this talk said that the attempted read outside the array bounds causes the array to be treated as if it had a mix of types: instead of the optimized number-only format, it de-optimizes the array back to the generic format. In the optimized case it is a simple array of numbers as you might use in a C program. In the de-optimized case it is an array of Value objects which can hold references to values of any type. (I made up the name Value, but the point is that the array elements are not just simple numbers but are objects that wrap numbers or other types.)
– Michael Geary
Dec 6 at 23:13
2
I work on V8. The array in question would be marked as HOLEY because it's created using new Array(n) (although this part of the code was not visible in the OP). HOLEY arrays remain HOLEY forever in V8, even when they're later filled. That said, the array being holey is not the reason for the perf issue in this case; it just means we have to do an extra Smi check on each iteration (to guard against holes), which is no big deal.
– Mathias Bynens
2 days ago
|
show 7 more comments
up vote
47
down vote
I work on V8 at Google, and wanted to provide some additional insight on top of the existing answers and comments.
For reference, here's the full code example from the slides:
var iterations = 25000;
function Primes() {
this.prime_count = 0;
this.primes = new Array(iterations);
this.getPrimeCount = function() { return this.prime_count; }
this.getPrime = function(i) { return this.primes[i]; }
this.addPrime = function(i) {
this.primes[this.prime_count++] = i;
}
this.isPrimeDivisible = function(candidate) {
for (var i = 1; i <= this.prime_count; ++i) {
if ((candidate % this.primes[i]) == 0) return true;
}
return false;
}
};
function main() {
var p = new Primes();
var c = 1;
while (p.getPrimeCount() < iterations) {
if (!p.isPrimeDivisible(c)) {
p.addPrime(c);
}
c++;
}
console.log(p.getPrime(p.getPrimeCount() - 1));
}
main();
First and foremost, the performance difference has nothing to do with the <
and <=
operators directly. So please don't jump through hoops just to avoid <=
in your code because you read on Stack Overflow that it's slow --- it isn't!
Second, folks pointed out that the array is "holey". This was not clear from the code snippet in OP's post, but it is clear when you look at the code that initializes this.primes
:
this.primes = new Array(iterations);
This results in an array with a HOLEY
elements kind in V8, even if the array ends up completely filled/packed/contiguous. In general, operations on holey arrays are slower than operations on packed arrays, but in this case the difference is negligible: it amounts to 1 additional Smi (small integer) check (to guard against holes) each time we hit this.primes[i]
in the loop within isPrimeDivisible
. No big deal!
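(For anyone who wants to poke at the elements kind themselves: the snippet below is my own illustration of the two creation patterns, not code from the talk. %DebugPrint is a V8-internal helper that only works when d8 or node is started with --allow-natives-syntax, and its output format is not guaranteed.)

// Pre-sizing creates a HOLEY array: length 25000, but no elements exist yet.
var holey = new Array(25000);
for (var i = 0; i < 25000; i++) holey[i] = i;   // stays HOLEY even after being filled

// Growing an empty array literal keeps the elements kind PACKED.
var packed = [];
for (var j = 0; j < 25000; j++) packed.push(j);

// With d8 --allow-natives-syntax (or node --allow-natives-syntax), calling
// %DebugPrint(holey) and %DebugPrint(packed) shows the elements kind of each array.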
TL;DR The array being HOLEY
is not the problem here.
Others pointed out that the code reads out of bounds. It's generally recommended to avoid reading beyond the length of arrays, and in this case it would indeed have avoided the massive drop in performance. But why though? V8 can handle some of these out-of-bound scenarios with only a minor performance impact. What's so special about this particular case, then?
The out-of-bounds read results in this.primes[i]
being undefined
on this line:
if ((candidate % this.primes[i]) == 0) return true;
And that brings us to the real issue: the %
operator is now being used with non-integer operands!
- integer % someOtherInteger can be computed very efficiently; JavaScript engines can produce highly-optimized machine code for this case.
- integer % undefined on the other hand amounts to a way less efficient Float64Mod, since undefined is represented as a double.
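(To get a feel for the gap between the two modulo paths, here is a rough Node.js micro-benchmark sketch of my own - not from the talk - where the exact numbers will vary a lot by engine and version:)

// Same loop body, once with an integer divisor and once with undefined.
function modLoop(divisor) {
  let hits = 0;
  for (let i = 1; i < 10000000; i++) {
    if (i % divisor === 0) hits++;
  }
  return hits;
}

console.time('integer divisor');    // Smi % Smi - the fast integer path
modLoop(3);
console.timeEnd('integer divisor');

console.time('undefined divisor');  // number % undefined -> NaN on every iteration
modLoop(undefined);
console.timeEnd('undefined divisor');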
The code snippet can indeed be improved by changing the <=
into <
on this line:
for (var i = 1; i <= this.prime_count; ++i) {
...not because < is somehow a superior operator to <=, but just because this avoids the out-of-bounds read in this particular case.
answered 2 days ago
Mathias Bynens
102k38173221
3
To be 100% complete, the keyed load IC for this.primes[i] in isPrimeDivisible unexpectedly goes megamorphic in V8. That seems like a bug: bugs.chromium.org/p/v8/issues/detail?id=8561
– Mathias Bynens
2 days ago
1
Thanks for your great explanation. Just one question: can you please explain a little more on Float64Mod since undefined is represented as a double? So you mean undefined holds a real value in V8 code?
– Hardik Modha
2 days ago
3
Indeed; any JavaScript value needs to be represented somehow behind the scenes. Things like NaN, Infinity, and undefined are represented as doubles. 0 would be a Smi (small integer), but -0 would be a double. Holes in arrays are represented using a special value called the_hole. Etc.
– Mathias Bynens
2 days ago
+1 Thanks Mathias!!! I now have so many questions. (1) Regarding "undefined are represented as doubles", I assume you use tagged/flagged IEEE 754 NaN-values as I suggested in my [comment](#comment94188748_53644250) (and another NaN-class for 'the_hole' when in floaty-mode)? (2) What constitutes a small integer (I understand why -0 is a double, and I understand 'integer'), so what is the largest value that constitutes an optimisable SmallInteger? (I've had this question for a long time ever since an older V8 talk where the lady at the end couldn't answer this question after talking about it.)
– GitaarLAB
2 days ago
(3) Did calculating ONE Float64Mod (the last one at the end) take more than 6 times the time it took to crunch through all previous values, OR did passing the 'holey array' prevent the isPrimeDivisible function-code from being optimized to begin with (so compiled using Float64Mod from the start), OR did we get RE-compilation at the end (as soon as it encountered the val%undefined calculation), and if so, did it then recalculate all work using the newly compiled function now with Float64Mod?
– GitaarLAB
2 days ago
|
show 4 more comments
up vote
10
down vote
TL;DR The slower loop is due to accessing the Array 'out-of-bounds', which either forces the engine to recompile the function with fewer or even no optimizations, OR to not compile the function with any of these optimizations to begin with (if the (JIT-)Compiler detected/suspected this condition before the first compiled 'version'). Read on below for why.
Someone just has to say this (utterly amazed nobody already did):
There used to be a time when the OP's snippet would be a de-facto example in a beginner's programming book intended to outline/emphasize that 'arrays' in javascript are indexed starting at 0, not 1, and as such be used as an example of a common 'beginner's mistake' (don't you love how I avoided the phrase 'programming error'
;)
): out-of-bounds Array access. Example 1:
a Dense Array
(being contiguous (meaning no gaps between indexes) AND actually having an element at each index) of 5 elements using 0-based indexing (always in ES262).
var arr_five_char=['a', 'b', 'c', 'd', 'e']; // arr_five_char.length === 5
// indexes are: 0 , 1 , 2 , 3 , 4 // there is NO index number 5
Thus we are not really talking about the performance difference between < vs <= (or 'one extra iteration'), but we are talking:
'why does the correct snippet (b) run faster than the erroneous snippet (a)'?
The answer is 2-fold (although from an ES262 language implementer's perspective both are forms of optimization):
- Data-Representation: how to represent/store the Array internally in memory (object, hashmap, 'real' numerical array, etc.)
- Functional Machine-code: how to compile the code that accesses/handles (read/modify) these 'Arrays'
Item 1 is sufficiently (and correctly IMHO) explained by the accepted answer, but that only spends 2 words ('the code') on Item 2: compilation.
More precisely: JIT-Compilation and, even more importantly, JIT-RE-Compilation!
The language specification is basically just a description of a set of algorithms ('steps to perform to achieve defined end-result'). Which, as it turns out, is a very beautiful way to describe a language.
And it leaves the actual method that an engine uses to achieve specified results open to the implementers, giving ample opportunity to come up with more efficient ways to produce defined results.
A spec conforming engine should give spec conforming results for any defined input.
Now, with javascript code/libraries/usage increasing, and remembering how many resources (time/memory/etc) a 'real' compiler uses, it's clear we can't make users visiting a web-page wait that long (and require them to have that many resources available).
Imagine the following simple function:
function sum(arr){
var r=0, i=0;
for(;i<arr.length;) r+=arr[i++];
return r;
}
Perfectly clear, right? Doesn't require ANY extra clarification, Right? The return-type is Number
, right?
Well.. no, no & no... It depends on what argument you pass to named function parameter arr
...
sum('abcde'); // String('0abcde')
sum([1,2,3]); // Number(6)
sum([1,,3]); // Number(NaN)
sum(['1',,3]); // String('01undefined3')
sum([1,,'3']); // String('NaN3')
sum([1,2,{valueOf:function(){return this.val}, val:6}]); // Number(9)
var val=5; sum([1,2,{valueOf:function(){return val}}]); // Number(8)
See the problem? Then consider this is just barely scraping the surface of the massive possible permutations...
We don't even know what kind of TYPE the function RETURNS until we are done...
Now imagine this same function-code actually being used on different types or even variations of input, both completely literally described (in source code) and dynamically in-program-generated 'arrays'...
Thus, if you were to compile the function sum JUST ONCE, then the only way to always return the spec-defined result for any and all types of input is, obviously, to perform ALL spec-prescribed main AND sub steps, which is the only thing that can guarantee spec-conforming results (like an unnamed pre-y2k browser).
No optimizations (because no assumptions) and a dead slow interpreted scripting language remains.
JIT-Compilation (JIT as in Just In Time) is the current popular solution.
So, you start to compile the function using assumptions regarding what it does, returns and accepts.
You come up with checks as simple as possible to detect if the function might start returning non-spec-conforming results (for example because it receives unexpected input).
Then, toss away the previously compiled result and recompile to something more elaborate, decide what to do with the partial result you already have (is it valid to be trusted, or compute it again to be sure), tie the function back into the program and try again. Ultimately this falls back to stepwise script interpretation as in the spec.
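Conceptually, the pattern looks something like this (a plain-JavaScript sketch of the guard-then-bail-out idea; the function names are made up, and this is NOT what an engine literally does internally):

// Slow, spec-faithful path: works for strings, holey arrays, objects, etc.
function genericSum(arr) {
  var r = 0, i = 0;
  for (; i < arr.length;) r += arr[i++];
  return r;
}

// 'Fast version' compiled under assumptions; the guards encode those assumptions.
function optimizedSum(arr) {
  if (!Array.isArray(arr)) return genericSum(arr);       // assumption: a real array
  var r = 0;
  for (var i = 0; i < arr.length; i++) {
    var v = arr[i];
    if (typeof v !== 'number') return genericSum(arr);   // assumption violated: bail out
    r += v;
  }
  return r;
}

console.log(optimizedSum([1, 2, 3]));   // 6, via the fast path
console.log(optimizedSum(['1', , 3]));  // '01undefined3', via the generic fallback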
All of this takes time!
All browsers keep working on their engines; for each and every sub-version you will see things improve and regress. Strings were at some point in history really immutable strings (hence array.join was faster than string concatenation); now we use ropes (or similar) which alleviate the problem. Both return spec-conforming results and that is what matters!
Long story short: just because JavaScript's language semantics often have our back (like with this silent bug in the OP's example) does not mean that 'stupid' mistakes increase our chances of the compiler spitting out fast machine-code. It assumes we wrote the 'usually' correct instructions: the current mantra we 'users' (of the programming language) must have is: help the compiler, describe what we want, favor common idioms (take hints from asm.js for a basic understanding of what browsers can try to optimize and why).
Because of this, talking about performance is both important BUT ALSO a mine-field (and because of said mine-field I really want to end by pointing to, and quoting, some relevant material):
Access to nonexistent object properties and out of bounds array elements returns the
undefined
value instead of raising an exception. These dynamic features make programming in JavaScript convenient, but they also make it difficult to compile JavaScript into efficient machine code.
...
An important premise for effective JIT optimization is that programmers use dynamic features of JavaScript in a systematic way. For example, JIT compilers exploit the fact that object properties are often added to an object of a given type in a specific order or that out of bounds array accesses occur rarely. JIT compilers exploit these regularity assumptions to generate efficient machine code at runtime. If a code block satisfies the assumptions, the JavaScript engine executes efficient, generated machine code. Otherwise, the engine must fall back to slower code or to interpreting the program.
Source:
"JITProf: Pinpointing JIT-unfriendly JavaScript Code"
Berkeley publication, 2014, by Liang Gong, Michael Pradel, Koushik Sen.
http://software-lab.org/publications/jitprof_tr_aug3_2014.pdf
ASM.JS (also doesn't like out-of-bounds array access):
Ahead-Of-Time Compilation
Because asm.js is a strict subset of JavaScript, this specification only defines the validation logic—the execution semantics is simply that of JavaScript. However, validated asm.js is amenable to ahead-of-time (AOT) compilation. Moreover, the code generated by an AOT compiler can be quite efficient, featuring:
- unboxed representations of integers and floating-point numbers;
- absence of runtime type checks;
- absence of garbage collection; and
- efficient heap loads and stores (with implementation strategies varying by platform).
Code that fails to validate must fall back to execution by traditional means, e.g., interpretation and/or just-in-time (JIT) compilation.
http://asmjs.org/spec/latest/
and finally https://blogs.windows.com/msedgedev/2015/05/07/bringing-asm-js-to-chakra-microsoft-edge/
where there is a small subsection about the engine's internal performance improvements when removing bounds-checks (whilst just lifting the bounds-check outside the loop already gave an improvement of 40%).
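(As a source-level analogy of what 'lifting the bounds-check outside the loop' means - the engine does this on its generated machine code, not on your JavaScript, so the functions below are purely illustrative:)

// Per-iteration check: roughly what a naive compilation has to do.
function sumCheckedEachIteration(arr, n) {
  let r = 0;
  for (let i = 0; i < n; i++) {
    if (i >= arr.length) throw new RangeError('index out of bounds');
    r += arr[i];
  }
  return r;
}

// Hoisted check: one test up front, then a check-free inner loop.
function sumCheckedOnce(arr, n) {
  if (n > arr.length) throw new RangeError('n exceeds array length');
  let r = 0;
  for (let i = 0; i < n; i++) r += arr[i];
  return r;
}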
EDIT:
note that multiple sources talk about different levels of JIT-Recompilation down to interpretation.
Theoretical example based on above information, regarding the OP's snippet:
- Call to isPrimeDivisible
- Compile isPrimeDivisible using general assumptions (like no out of bounds access)
- Do work
- BAM, suddenly array accesses out of bounds (right at the end).
- Crap, says the engine, let's recompile that isPrimeDivisible using different (fewer) assumptions, and this example engine doesn't try to figure out if it can reuse the current partial result, so
- Recompute all work using the slower function (hopefully it finishes, otherwise repeat and this time just interpret the code).
- Return result
Hence the time taken then was:
First run (failed at the end) + doing all the work over again using slower machine-code for each iteration + the recompilation etc. clearly takes more than 2 times longer in this theoretical example!
EDIT 2: (disclaimer: conjecture based on the facts below)
The more I think of it, the more I think that this answer might actually explain the more dominant reason for this 'penalty' on erroneous snippet a (or performance-bonus on snippet b, depending on how you think of it), which is precisely why I'm adamant in calling it (snippet a) a programming error:
It's pretty tempting to assume that this.primes
is a purely numerical 'dense array' which was either
- Hard-coded literal in source-code (a known excellent candidate to become a 'real' array as everything is already known to the compiler before compile-time) OR
- most likely generated using a numerical function filling a pre-sized array (
new Array(/*size value*/)
) in ascending sequential order (another long-time known candidate to become a 'real' array).
We also know that the primes
array's length is cached as prime_count
! (indicating its intent and fixed size).
We also know that most engines initially pass Arrays as copy-on-modify (when needed) which makes handling them much faster (if you don't change them).
It is therefore reasonable to assume that Array primes
is most likely already an optimized array internally which doesn't get changed after creation (simple to know for the compiler if there is no code modifying the array after creation) and therefore is already (if applicable to the engine) stored in an optimized way, pretty much as if it was a Typed Array
.
As I have tried to make clear with my sum
function example, the argument(s) that get passed highly influence what actually needs to happen and as such how that particular code is being compiled to machine-code. Passing a String
to the sum
function shouldn't change the string but change how the function is JIT-Compiled! Passing an Array to sum
should compile a different (perhaps even additional for this type, or 'shape' as they call it, of object that got passed) version of machine-code.
It seems slightly bonkers to convert the Typed_Array-like primes
Array on-the-fly to something_else while the compiler knows this function is not even going to modify it!
Under these assumptions that leaves 2 options:
- Compile as a number-cruncher assuming no out-of-bounds access, run into the out-of-bounds problem at the end, recompile and redo the work (as outlined in the theoretical example in edit 1 above)
- Compiler has already detected (or suspected?) out-of-bounds access up-front and the function was JIT-Compiled as if the argument passed was a sparse object, resulting in slower functional machine-code (as it would have more checks/conversions/coercions etc.). In other words: the function was never eligible for certain optimisations; it was compiled as if it received a 'sparse array'(-like) argument.
I now really wonder which of these 2 it is!
2
A good discussion on some of the underlying issues - however you barely explain the answer at all (in the very last sentence). Maybe add a tl;dr to the very top? e.g. "The slower loop is due to exceeding the array bounds, which forces the engine to re-evaluate the loop without optimizations. Read on to learn why."
– brichins
Dec 7 at 4:19
@brichins: thanks, and thanks for the suggestion, which I have reworded a bit in light of my second additional edit, because the more I now think of it, that statement on the top seems actually correct as well
– GitaarLAB
Dec 7 at 5:39
up vote
10
down vote
TL;DR The slower loop is due to accessing the Array 'out-of-bounds', which either forces the engine to recompile the function with less or even no optimizations OR to not compile the function with any of these optimizations to begin with (if the (JIT-)Compiler detected/suspected this condition before the first compilation 'version'), read on below why;
Someone just has to say this (utterly amazed nobody already did):
There used to be a time when the OP's snippet would be a de-facto example in a beginners programming book intended to outline/emphasize that 'arrays' in javascript are indexed starting at 0, not 1, and as such be used as an example of a common 'beginners mistake' (don't you love how I avoided the phrase 'programing error'
;)
): out-of-bounds Array access. Example 1:
a Dense Array
(being contiguous (means in no gaps between indexes) AND actually an element at each index) of 5 elements using 0-based indexing (always in ES262).
var arr_five_char=['a', 'b', 'c', 'd', 'e']; // arr_five_char.length === 5
// indexes are: 0 , 1 , 2 , 3 , 4 // there is NO index number 5
Thus we are not really talking about performance difference between <
vs <=
(or 'one extra iteration'), but we are talking:
'why does the correct snippet (b) run faster than erroneous snippet (a)'?
The answer is 2-fold (although from a ES262 language implementer's perspective both are forms of optimization):
- Data-Representation: how to represent/store the Array internally in memory (object, hashmap, 'real' numerical array, etc.)
- Functional Machine-code: how to compile the code that accesses/handles (read/modify) these 'Arrays'
Item 1 is sufficiently (and correctly IMHO) explained by the accepted answer, but that only spends 2 words ('the code') on Item 2: compilation.
More precisely: JIT-Compilation and even more importantly JIT-RE-Compilation !
The language specification is basically just a description of a set of algorithms ('steps to perform to achieve defined end-result'). Which, as it turns out is a very beautiful way to describe a language.
And it leaves the actual method that an engine uses to achieve specified results open to the implementers, giving ample opportunity to come up with more efficient ways to produce defined results.
A spec conforming engine should give spec conforming results for any defined input.
Now, with javascript code/libraries/usage increasing, and remembering how much resources (time/memory/etc) a 'real' compiler uses, it's clear we can't make users visiting a web-page wait that long (and require them to have that many resources available).
Imagine the following simple function:
function sum(arr){
var r=0, i=0;
for(;i<arr.length;) r+=arr[i++];
return r;
}
Perfectly clear, right? Doesn't require ANY extra clarification, Right? The return-type is Number
, right?
Well.. no, no & no... It depends on what argument you pass to named function parameter arr
...
sum('abcde'); // String('0abcde')
sum([1,2,3]); // Number(6)
sum([1,,3]); // Number(NaN)
sum(['1',,3]); // String('01undefined3')
sum([1,,'3']); // String('NaN3')
sum([1,2,{valueOf:function(){return this.val}, val:6}]); // Number(9)
var val=5; sum([1,2,{valueOf:function(){return val}}]); // Number(8)
See the problem ? Then consider this is just barely scraping the massive possible permutations...
We don't even know what kind of TYPE the function RETURN until we are done...
Now imagine this same function-code actually being used on different types or even variations of input, both completely literally (in source code) described and dynamically in-program generated 'arrays'..
Thus, if you were to compile function sum
JUST ONCE, then the only way that always returns the spec-defined result for any and all types of input then, obviously, only by performing ALL spec-prescribed main AND sub steps can guarantee spec conforming results (like an unnamed pre-y2k browser).
No optimizations (because no assumptions) and dead slow interpreted scripting language remains.
JIT-Compilation (JIT as in Just In Time) is the current popular solution.
So, you start to compile the function using assumptions regarding what it does, returns and accepts.
you come up with checks as simple as possible to detect if the function might start returning non-spec conformant results (like because it receives unexpected input).
Then, toss away the previous compiled result and recompile to something more elaborate, decide what to do with the partial result you already have (is it valid to be trusted or compute again to be sure), tie in the function back into the program and try again. Ultimately falling back to stepwise script-interpretation as in spec.
All of this takes time!
All browsers work on their engines, for each and every sub-version you will see things improve and regress. Strings were at some point in history really immutable strings (hence array.join was faster than string concatenation), now we use ropes (or similar) which alleviate the problem. Both return spec-conforming results and that is what matters!
Long story short: just because javascript's language's semantics often got our back (like with this silent bug in the OP's example) does not mean that 'stupid' mistakes increases our chances of the compiler spitting out fast machine-code. It assumes we wrote the 'usually' correct instructions: the current mantra we 'users' (of the programming language) must have is: help the compiler, describe what we want, favor common idioms (take hints from asm.js for basic understanding what browsers can try to optimize and why).
Because of this, talking about performance is both important BUT ALSO a mine-field (and because of said mine-field I really want to end with pointing to (and quoting) some relevant material:
Access to nonexistent object properties and out of bounds array elements returns the
undefined
value instead of raising an exception. These dynamic features make programming in JavaScript convenient, but they also make it difficult to compile JavaScript into efficient machine code.
...
An important premise for effective JIT optimization is that programmers use dynamic features of JavaScript in a systematic way. For example, JIT compilers exploit the fact that object properties are often added to an object of a given type in a specific order or that out of bounds array accesses occur rarely. JIT compilers exploit these regularity assumptions to generate efficient machine code at runtime. If a code block satisfies the assumptions, the JavaScript engine executes efficient, generated machine code. Otherwise, the engine must fall back to slower code or to interpreting the program.
Source:
"JITProf: Pinpointing JIT-unfriendly JavaScript Code"
Berkeley publication,2014, by Liang Gong, Michael Pradel, Koushik Sen.
http://software-lab.org/publications/jitprof_tr_aug3_2014.pdf
ASM.JS (also doesn't like out off bound array access):
Ahead-Of-Time Compilation
Because asm.js is a strict subset of JavaScript, this specification only defines the validation logic—the execution semantics is simply that of JavaScript. However, validated asm.js is amenable to ahead-of-time (AOT) compilation. Moreover, the code generated by an AOT compiler can be quite efficient, featuring:
- unboxed representations of integers and floating-point numbers;
- absence of runtime type checks;
- absence of garbage collection; and
- efficient heap loads and stores (with implementation strategies varying by platform).
Code that fails to validate must fall back to execution by traditional means, e.g., interpretation and/or just-in-time (JIT) compilation.
http://asmjs.org/spec/latest/
and finally https://blogs.windows.com/msedgedev/2015/05/07/bringing-asm-js-to-chakra-microsoft-edge/
were there is a small subsection about the engine's internal performance improvements when removing bounds-check (whilst just lifting the bounds-check outside the loop already had an improvement of 40%).
EDIT:
note that multiple sources talk about different levels of JIT-Recompilation down to interpretation.
Theoretical example based on above information, regarding the OP's snippet:
- Call to isPrimeDivisible
- Compile isPrimeDivisible using general assumptions (like no out of bounds access)
- Do work
- BAM, suddenly array accesses out of bounds (right at the end).
- Crap, says engine, let's recompile that isPrimeDivisible using different (less) assumptions, and this example engine doesn't try to figure out if it can reuse current partial result, so
- Recompute all work using slower function (hopefully it finishes, otherwise repeat and this time just interpret the code).
- Return result
Hence time then was:
First run (failed at end) + doing all work all over again using slower machine-code for each iteration + the recompilation etc.. clearly takes >2 times longer in this theoretical example!
EDIT 2: (disclaimer: conjecture based in facts below)
The more I think of it, the more I think that this answer might actually explain the more dominant reason for this 'penalty' on erroneous snippet a (or performance-bonus on snippet b, depending on how you think of it), precisely why I'm adament in calling it (snippet a) a programming error:
It's pretty tempting to assume that this.primes
is a 'dense array' pure numerical which was either
- Hard-coded literal in source-code (known excelent candidate to become a 'real' array as everything is already known to the compiler before compile-time) OR
- most likely generated using a numerical function filling a pre-sized (
new Array(/*size value*/)
) in ascending sequential order (another long-time known candidate to become a 'real' array).
We also know that the primes
array's length is cached as prime_count
! (indicating it's intent and fixed size).
We also know that most engines initially pass Arrays as copy-on-modify (when needed) which makes handeling them much more fast (if you don't change them).
It is therefore reasonable to assume that Array primes
is most likely already an optimized array internally which doesn't get changed after creation (simple to know for the compiler if there is no code modifiying the array after creation) and therefore is already (if applicable to the engine) stored in an optimized way, pretty much as if it was a Typed Array
.
As I have tried to make clear with my sum
function example, the argument(s) that get passed higly influence what actually needs to happen and as such how that particular code is being compiled to machine-code. Passing a String
to the sum
function shouldn't change the string but change how the function is JIT-Compiled! Passing an Array to sum
should compile a different (perhaps even additional for this type, or 'shape' as they call it, of object that got passed) version of machine-code.
As it seems slightly bonkus to convert the Typed_Array-like primes
Array on-the-fly to something_else while the compiler knows this function is not even going to modify it!
Under these assumptions that leaves 2 options:
- Compile as number-cruncher assuming no out-of-bounds, run into out-of-bounds problem at the end, recompile and redo work (as outlined in theoretical example in edit 1 above)
- Compiler has already detected (or suspected?) out of bound acces up-front and the function was JIT-Compiled as if the argument passed was a sparse object resulting in slower functional machine-code (as it would have more checks/conversions/coercions etc.). In other words: the function was never eligable for certain optimisations, it was compiled as if it received a 'sparse array'(-like) argument.
I now really wonder which of these 2 it is!
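One way to find out which of the two it actually is (my own suggestion, not something from the slides): d8 accepts V8's optimization-tracing flags, e.g.
d8 --trace-opt --trace-deopt prime.js
The flags themselves exist, but the exact log format differs per V8 version, so treat the expected output as an assumption: option 1 should show isPrimeDivisible being optimized and later deoptimized (with a bailout reason pointing at the array load), while option 2 should show it never reaching optimized code at all.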
2
A good discussion of some of the underlying issues - however you barely explain the answer at all (in the very last sentence). Maybe add a tl;dr to the very top? e.g. "The slower loop is due to exceeding the array bounds, which forces the engine to re-evaluate the loop without optimizations. Read on to learn why."
– brichins
Dec 7 at 4:19
@brichins: thanks, and thanks for the suggestion, which I have reworded a bit in light of my second additional edit, because the more I now think about it, that statement at the top actually seems correct as well.
– GitaarLAB
Dec 7 at 5:39
edited Dec 7 at 6:24
answered Dec 7 at 1:28
GitaarLAB
11.4k74368
up vote
3
down vote
To add some science to this, here's a jsperf:
https://jsperf.com/ints-values-in-out-of-array-bounds
Its control case is an array filled with ints, looped over doing modular arithmetic while staying within bounds. It has 5 test cases:
- 1. Looping out of bounds
- 2. Holey arrays
- 3. Modular arithmetic against NaNs
- 4. Completely undefined values
- 5. Using a new Array()
It shows that the first 4 cases are really bad for performance. Looping out of bounds is a bit better than the other 3, but all 4 are roughly 98% slower than the best case.
The new Array() case is almost as good as the raw array, just a few percent slower.
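For reference, the out-of-bounds case is roughly of this shape (a re-creation based on the test-case names above, not the actual jsperf source):
// Arbitrary example data: a plain array of ints.
var N = 10000;
var ints = new Array(N);
for (var i = 0; i < N; i++) ints[i] = i;
// Control case: modular arithmetic while staying within bounds.
function sumModInBounds(arr) {
  var acc = 0;
  for (var i = 0; i < arr.length; i++) acc += arr[i] % 7;
  return acc;
}
// Case 1: the <= loop reads arr[arr.length] on its last iteration, which is
// undefined, so the modulo produces NaN there (and the access is out of bounds).
function sumModOutOfBounds(arr) {
  var acc = 0;
  for (var i = 0; i <= arr.length; i++) acc += arr[i] % 7;
  return acc;
}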
edited 2 days ago
answered 2 days ago
Nathan Adams
1107