Generating and rendering extremely large graphs
up vote
14
down vote
favorite
I'm interested in generating and rendering extremely large graphs, i.e., ones having $10^6 to 10^8$ vertices. (These graphs will ultimately be rendered in PDF or PNG and printed on large, poster-sized paper.) Although the code may work for small graphs, there are inevitably limitations for large graphs due to computer memory (especially when embedding or laying-out the graph, as we'll see).
Consider the small graph generated by the Collatz conjecture with 400000 nodes (see code below, based on the cited problem):
where the GraphLayout -> {"PackingLayout" -> "ClosestPacking"}
. This can be computed directly on a Mac laptop. But for larger numbers of vertices, the graph cannot be rendered.
Note especially that this is just a representative graph... I need a solution for many different large graphs, ones without simple geometric structure.
The problem arises when one wants to generate and lay out (or "embed") $10^7$ vertices. The problem is not so much the computation of the vertices and edges (which can be done by writing to a large external file), but instead laying out the graph. The Layout
nominally needs all the vertexes in active memory in order to perform the layout, and that is impossible on most machines.
[I've closed all other applications to provide Mathematica with the maximum memory, but even this is not enough for the large graphs I wish to render. Perhaps this entire problem is solved by moving the enormous computation to the Wolfram Cloud... and if this is the case, I guess I'll merely purchase an account and run my code there. But I'm hoping there is a better way... one that can be used on laptops and desktop machines (with large hard memories).]
One might try to layout portions of the overall graph and then "glue" them together, but the local optimization of layout of each portion is likely not appropriate for the full graph: Different branches will ultimately overlap, breaking the rule of the global optimization routine.
My idea is to generate the enormous graph $G$ (but not lay it out). Then select some small random subset of the vertexes, $V_1$, say 1% of those in $G$, to form a smaller, representative graph $G_1$ where the edges in $G_1$ are estimates of the "distance" between each vertex based on their step separation in $G$. Then, layout this representative $G_1$ (which should be no problem because it is small). Then, anchor the vertexes of $G_1$ in place in the final layout and then render all the vertexes of $G$. In this way, the full, final graph $G$ is rendered appropriately, and the memory constraints are not exceeded.
Perhaps there is a way to extract VertexCoordinates and iteratively layout vertices in regions, hopping through the overall graph. (This seems extremely inefficient.)
Or use smart memory management... or another technique.
CollatzSequence[list_] :=
Module[{memory, tmp, chain, result = Internal`Bag},
memory[1] = False;
memory[n_] := (memory[n] = False; True);
Do[chain = Internal`Bag;
tmp = l;
While[memory[tmp], Internal`StuffBag[chain, tmp];
tmp = If[EvenQ[tmp], tmp/2, 3 tmp + 1];];
Internal`StuffBag[chain, tmp];
Internal`StuffBag[result, chain], {l, list}];
Internal`BagPart[#, All] & /@ Internal`BagPart[result, All]];
myCollatzGraph = Rasterize[
Graph[Flatten[(Rule @@@ Partition[#, 2, 1]) & /@
CollatzSequence[Range[400000]]],
GraphLayout -> {"PackingLayout" -> "ClosestPacking"},
VertexStyle -> Opacity[0.2, RGBColor[.9, .1, .1]],
EdgeStyle -> RGBColor[.1, .1, .9],
ImageSize -> 2000],
ImageResolution -> 72];
graphs-and-networks memory
add a comment |
up vote
14
down vote
favorite
I'm interested in generating and rendering extremely large graphs, i.e., ones having $10^6 to 10^8$ vertices. (These graphs will ultimately be rendered in PDF or PNG and printed on large, poster-sized paper.) Although the code may work for small graphs, there are inevitably limitations for large graphs due to computer memory (especially when embedding or laying-out the graph, as we'll see).
Consider the small graph generated by the Collatz conjecture with 400000 nodes (see code below, based on the cited problem):
where the GraphLayout -> {"PackingLayout" -> "ClosestPacking"}
. This can be computed directly on a Mac laptop. But for larger numbers of vertices, the graph cannot be rendered.
Note especially that this is just a representative graph... I need a solution for many different large graphs, ones without simple geometric structure.
The problem arises when one wants to generate and lay out (or "embed") $10^7$ vertices. The problem is not so much the computation of the vertices and edges (which can be done by writing to a large external file), but instead laying out the graph. The Layout
nominally needs all the vertexes in active memory in order to perform the layout, and that is impossible on most machines.
[I've closed all other applications to provide Mathematica with the maximum memory, but even this is not enough for the large graphs I wish to render. Perhaps this entire problem is solved by moving the enormous computation to the Wolfram Cloud... and if this is the case, I guess I'll merely purchase an account and run my code there. But I'm hoping there is a better way... one that can be used on laptops and desktop machines (with large hard memories).]
One might try to layout portions of the overall graph and then "glue" them together, but the local optimization of layout of each portion is likely not appropriate for the full graph: Different branches will ultimately overlap, breaking the rule of the global optimization routine.
My idea is to generate the enormous graph $G$ (but not lay it out). Then select some small random subset of the vertexes, $V_1$, say 1% of those in $G$, to form a smaller, representative graph $G_1$ where the edges in $G_1$ are estimates of the "distance" between each vertex based on their step separation in $G$. Then, layout this representative $G_1$ (which should be no problem because it is small). Then, anchor the vertexes of $G_1$ in place in the final layout and then render all the vertexes of $G$. In this way, the full, final graph $G$ is rendered appropriately, and the memory constraints are not exceeded.
Perhaps there is a way to extract VertexCoordinates and iteratively layout vertices in regions, hopping through the overall graph. (This seems extremely inefficient.)
Or use smart memory management... or another technique.
CollatzSequence[list_] :=
Module[{memory, tmp, chain, result = Internal`Bag},
memory[1] = False;
memory[n_] := (memory[n] = False; True);
Do[chain = Internal`Bag;
tmp = l;
While[memory[tmp], Internal`StuffBag[chain, tmp];
tmp = If[EvenQ[tmp], tmp/2, 3 tmp + 1];];
Internal`StuffBag[chain, tmp];
Internal`StuffBag[result, chain], {l, list}];
Internal`BagPart[#, All] & /@ Internal`BagPart[result, All]];
myCollatzGraph = Rasterize[
Graph[Flatten[(Rule @@@ Partition[#, 2, 1]) & /@
CollatzSequence[Range[400000]]],
GraphLayout -> {"PackingLayout" -> "ClosestPacking"},
VertexStyle -> Opacity[0.2, RGBColor[.9, .1, .1]],
EdgeStyle -> RGBColor[.1, .1, .9],
ImageSize -> 2000],
ImageResolution -> 72];
graphs-and-networks memory
1
Isn't this graph a tree? That is what the Collatz conjecture states, right? I mean, this is a structure that one can try to exploit for rendering. Towards rendering: I am afraid that the ultimate constraint will be the poor 32-bit frontend that can render only graphics that are not too complex (or will crash)... One may circumvent the frontend by directly generating code for a vector graphics, though.
– Henrik Schumacher
Dec 3 at 8:29
One additional potential trick would be to useIGSmoothen
from IGraph/M, which replaced chains of edges as in1->2->3->4->5
... with a single weighted edge1->5
. This would reduce the size of the graph by 5-6 fold, and the weights can be used to maintain the layout. However, I noticed thatIGSmoothen
is currently not fast enough. I will look into optimizing it. It should work much faster on graphs of this size.
– Szabolcs
Dec 3 at 9:18
Unbelievably, the IGSmoothen is slow for one single reason:Delete
is slow. I will fix this for the next release (due tonight).
– Szabolcs
Dec 3 at 9:33
add a comment |
up vote
14
down vote
favorite
up vote
14
down vote
favorite
I'm interested in generating and rendering extremely large graphs, i.e., ones having $10^6 to 10^8$ vertices. (These graphs will ultimately be rendered in PDF or PNG and printed on large, poster-sized paper.) Although the code may work for small graphs, there are inevitably limitations for large graphs due to computer memory (especially when embedding or laying-out the graph, as we'll see).
Consider the small graph generated by the Collatz conjecture with 400000 nodes (see code below, based on the cited problem):
where the GraphLayout -> {"PackingLayout" -> "ClosestPacking"}
. This can be computed directly on a Mac laptop. But for larger numbers of vertices, the graph cannot be rendered.
Note especially that this is just a representative graph... I need a solution for many different large graphs, ones without simple geometric structure.
The problem arises when one wants to generate and lay out (or "embed") $10^7$ vertices. The problem is not so much the computation of the vertices and edges (which can be done by writing to a large external file), but instead laying out the graph. The Layout
nominally needs all the vertexes in active memory in order to perform the layout, and that is impossible on most machines.
[I've closed all other applications to provide Mathematica with the maximum memory, but even this is not enough for the large graphs I wish to render. Perhaps this entire problem is solved by moving the enormous computation to the Wolfram Cloud... and if this is the case, I guess I'll merely purchase an account and run my code there. But I'm hoping there is a better way... one that can be used on laptops and desktop machines (with large hard memories).]
One might try to layout portions of the overall graph and then "glue" them together, but the local optimization of layout of each portion is likely not appropriate for the full graph: Different branches will ultimately overlap, breaking the rule of the global optimization routine.
My idea is to generate the enormous graph $G$ (but not lay it out). Then select some small random subset of the vertexes, $V_1$, say 1% of those in $G$, to form a smaller, representative graph $G_1$ where the edges in $G_1$ are estimates of the "distance" between each vertex based on their step separation in $G$. Then, layout this representative $G_1$ (which should be no problem because it is small). Then, anchor the vertexes of $G_1$ in place in the final layout and then render all the vertexes of $G$. In this way, the full, final graph $G$ is rendered appropriately, and the memory constraints are not exceeded.
Perhaps there is a way to extract VertexCoordinates and iteratively layout vertices in regions, hopping through the overall graph. (This seems extremely inefficient.)
Or use smart memory management... or another technique.
CollatzSequence[list_] :=
Module[{memory, tmp, chain, result = Internal`Bag},
memory[1] = False;
memory[n_] := (memory[n] = False; True);
Do[chain = Internal`Bag;
tmp = l;
While[memory[tmp], Internal`StuffBag[chain, tmp];
tmp = If[EvenQ[tmp], tmp/2, 3 tmp + 1];];
Internal`StuffBag[chain, tmp];
Internal`StuffBag[result, chain], {l, list}];
Internal`BagPart[#, All] & /@ Internal`BagPart[result, All]];
myCollatzGraph = Rasterize[
Graph[Flatten[(Rule @@@ Partition[#, 2, 1]) & /@
CollatzSequence[Range[400000]]],
GraphLayout -> {"PackingLayout" -> "ClosestPacking"},
VertexStyle -> Opacity[0.2, RGBColor[.9, .1, .1]],
EdgeStyle -> RGBColor[.1, .1, .9],
ImageSize -> 2000],
ImageResolution -> 72];
graphs-and-networks memory
I'm interested in generating and rendering extremely large graphs, i.e., ones having $10^6 to 10^8$ vertices. (These graphs will ultimately be rendered in PDF or PNG and printed on large, poster-sized paper.) Although the code may work for small graphs, there are inevitably limitations for large graphs due to computer memory (especially when embedding or laying-out the graph, as we'll see).
Consider the small graph generated by the Collatz conjecture with 400000 nodes (see code below, based on the cited problem):
where the GraphLayout -> {"PackingLayout" -> "ClosestPacking"}
. This can be computed directly on a Mac laptop. But for larger numbers of vertices, the graph cannot be rendered.
Note especially that this is just a representative graph... I need a solution for many different large graphs, ones without simple geometric structure.
The problem arises when one wants to generate and lay out (or "embed") $10^7$ vertices. The problem is not so much the computation of the vertices and edges (which can be done by writing to a large external file), but instead laying out the graph. The Layout
nominally needs all the vertexes in active memory in order to perform the layout, and that is impossible on most machines.
[I've closed all other applications to provide Mathematica with the maximum memory, but even this is not enough for the large graphs I wish to render. Perhaps this entire problem is solved by moving the enormous computation to the Wolfram Cloud... and if this is the case, I guess I'll merely purchase an account and run my code there. But I'm hoping there is a better way... one that can be used on laptops and desktop machines (with large hard memories).]
One might try to layout portions of the overall graph and then "glue" them together, but the local optimization of layout of each portion is likely not appropriate for the full graph: Different branches will ultimately overlap, breaking the rule of the global optimization routine.
My idea is to generate the enormous graph $G$ (but not lay it out). Then select some small random subset of the vertexes, $V_1$, say 1% of those in $G$, to form a smaller, representative graph $G_1$ where the edges in $G_1$ are estimates of the "distance" between each vertex based on their step separation in $G$. Then, layout this representative $G_1$ (which should be no problem because it is small). Then, anchor the vertexes of $G_1$ in place in the final layout and then render all the vertexes of $G$. In this way, the full, final graph $G$ is rendered appropriately, and the memory constraints are not exceeded.
Perhaps there is a way to extract VertexCoordinates and iteratively layout vertices in regions, hopping through the overall graph. (This seems extremely inefficient.)
Or use smart memory management... or another technique.
CollatzSequence[list_] :=
Module[{memory, tmp, chain, result = Internal`Bag},
memory[1] = False;
memory[n_] := (memory[n] = False; True);
Do[chain = Internal`Bag;
tmp = l;
While[memory[tmp], Internal`StuffBag[chain, tmp];
tmp = If[EvenQ[tmp], tmp/2, 3 tmp + 1];];
Internal`StuffBag[chain, tmp];
Internal`StuffBag[result, chain], {l, list}];
Internal`BagPart[#, All] & /@ Internal`BagPart[result, All]];
myCollatzGraph = Rasterize[
Graph[Flatten[(Rule @@@ Partition[#, 2, 1]) & /@
CollatzSequence[Range[400000]]],
GraphLayout -> {"PackingLayout" -> "ClosestPacking"},
VertexStyle -> Opacity[0.2, RGBColor[.9, .1, .1]],
EdgeStyle -> RGBColor[.1, .1, .9],
ImageSize -> 2000],
ImageResolution -> 72];
graphs-and-networks memory
graphs-and-networks memory
edited Dec 3 at 6:26
asked Dec 3 at 3:59
David G. Stork
22.7k22050
22.7k22050
1
Isn't this graph a tree? That is what the Collatz conjecture states, right? I mean, this is a structure that one can try to exploit for rendering. Towards rendering: I am afraid that the ultimate constraint will be the poor 32-bit frontend that can render only graphics that are not too complex (or will crash)... One may circumvent the frontend by directly generating code for a vector graphics, though.
– Henrik Schumacher
Dec 3 at 8:29
One additional potential trick would be to useIGSmoothen
from IGraph/M, which replaced chains of edges as in1->2->3->4->5
... with a single weighted edge1->5
. This would reduce the size of the graph by 5-6 fold, and the weights can be used to maintain the layout. However, I noticed thatIGSmoothen
is currently not fast enough. I will look into optimizing it. It should work much faster on graphs of this size.
– Szabolcs
Dec 3 at 9:18
Unbelievably, the IGSmoothen is slow for one single reason:Delete
is slow. I will fix this for the next release (due tonight).
– Szabolcs
Dec 3 at 9:33
add a comment |
1
Isn't this graph a tree? That is what the Collatz conjecture states, right? I mean, this is a structure that one can try to exploit for rendering. Towards rendering: I am afraid that the ultimate constraint will be the poor 32-bit frontend that can render only graphics that are not too complex (or will crash)... One may circumvent the frontend by directly generating code for a vector graphics, though.
– Henrik Schumacher
Dec 3 at 8:29
One additional potential trick would be to useIGSmoothen
from IGraph/M, which replaced chains of edges as in1->2->3->4->5
... with a single weighted edge1->5
. This would reduce the size of the graph by 5-6 fold, and the weights can be used to maintain the layout. However, I noticed thatIGSmoothen
is currently not fast enough. I will look into optimizing it. It should work much faster on graphs of this size.
– Szabolcs
Dec 3 at 9:18
Unbelievably, the IGSmoothen is slow for one single reason:Delete
is slow. I will fix this for the next release (due tonight).
– Szabolcs
Dec 3 at 9:33
1
1
Isn't this graph a tree? That is what the Collatz conjecture states, right? I mean, this is a structure that one can try to exploit for rendering. Towards rendering: I am afraid that the ultimate constraint will be the poor 32-bit frontend that can render only graphics that are not too complex (or will crash)... One may circumvent the frontend by directly generating code for a vector graphics, though.
– Henrik Schumacher
Dec 3 at 8:29
Isn't this graph a tree? That is what the Collatz conjecture states, right? I mean, this is a structure that one can try to exploit for rendering. Towards rendering: I am afraid that the ultimate constraint will be the poor 32-bit frontend that can render only graphics that are not too complex (or will crash)... One may circumvent the frontend by directly generating code for a vector graphics, though.
– Henrik Schumacher
Dec 3 at 8:29
One additional potential trick would be to use
IGSmoothen
from IGraph/M, which replaced chains of edges as in 1->2->3->4->5
... with a single weighted edge 1->5
. This would reduce the size of the graph by 5-6 fold, and the weights can be used to maintain the layout. However, I noticed that IGSmoothen
is currently not fast enough. I will look into optimizing it. It should work much faster on graphs of this size.– Szabolcs
Dec 3 at 9:18
One additional potential trick would be to use
IGSmoothen
from IGraph/M, which replaced chains of edges as in 1->2->3->4->5
... with a single weighted edge 1->5
. This would reduce the size of the graph by 5-6 fold, and the weights can be used to maintain the layout. However, I noticed that IGSmoothen
is currently not fast enough. I will look into optimizing it. It should work much faster on graphs of this size.– Szabolcs
Dec 3 at 9:18
Unbelievably, the IGSmoothen is slow for one single reason:
Delete
is slow. I will fix this for the next release (due tonight).– Szabolcs
Dec 3 at 9:33
Unbelievably, the IGSmoothen is slow for one single reason:
Delete
is slow. I will fix this for the next release (due tonight).– Szabolcs
Dec 3 at 9:33
add a comment |
1 Answer
1
active
oldest
votes
up vote
12
down vote
I suggest not to use Graph
to render, as it is slow. Instead, compute the vertex coordinates, then render manually.
The rough steps are as follows:
Create an "index graph", i.e. a graph where the vertex names coincide with the vertex indices:
n = 10000;
g = IndexGraph@Graph[
Flatten[(Rule @@@ Partition[#, 2, 1]) & /@
CollatzSequence@Range[n]]
];
Compute the vertex coordinates:
coord = GraphEmbedding[g, "SpringElectricalEmbedding"];
This is going to be the most expensive step (computationally), but it will likely not take too much memory. Here, you will need to experiment to find a layout that produces a good enough result and finishes in time. "SpringElectricalLayout"
will be slow, but it has many sub-options which can be used to find a good compromise between computation time and layout quality. I would start experimenting with Tolerance
and MaxIteration
.
Finally, render the graphics:
Image@Graphics@GraphicsComplex[coord, Line[List @@@ EdgeList[g]]]
Be sure to use GraphicsComplex and merge all lines and all points into a single Line and Point expression. Finally, we rasterize the result so the front end does not need to keep the vector graphics representation in memory.
Example:
n = 400000;
g = IndexGraph@Graph[
Flatten[(Rule @@@ Partition[#, 2, 1]) & /@
CollatzSequence@Range[n]]
];
VertexCount[g]
(* 869322 *)
coord = GraphEmbedding[
g, {"SpringElectricalEmbedding", "Tolerance" -> 1,
"MaxIteration" -> 5}]; // AbsoluteTiming
(* {48.8368, Null} *)
Image@Graphics@GraphicsComplex[coord, Line[List @@@ EdgeList[g]]]
Image@Graphics[{GraphicsComplex[
coord, {Line[List @@@ EdgeList[g]], Red, AbsolutePointSize[1],
Point[coord]}]}, ImageSize -> 1000]
The 32-bit macOS front end with absolutely not run out of memory while rendering graphics of this size, if specified efficiently, as here.
The bottleneck is the computation of coordinates, where you need to find a good compromise between computation time and output quality. This example took just under a minute on my machine. The time would likely stay under 5 minutes if MaxIteration is increased 5-fold (along with a reduction of Tolerance, which also put a limit on how many steps are computed).
Note:
I notice you used GraphLayout -> {"PackingLayout" -> "ClosestPacking"}
. This makes no sense for a connected graph like this. The PackingLayout controls how connected components are assembled after each one has been laid out. This is a connected graph. It has only one component.
Thanks so very much (+1, at the very least). Let me play around with your solution to see just how large a final graph I can create... whether I can reach the needed $10^8$ vertices. I'm really not worried about time... just size. I'd be happy indeed if this approach can render $10^8$ vertices, even if it took several hours on my hardware. (Alas, I'm off to China in a few hours and won't get back to this for a few days...)
– David G. Stork
Dec 3 at 9:50
@DavidG.Stork I just fixed IGSmoothen and will try to expand on the answer later. It will be useful if you are fine with not explicitly rendering vertices (only edges). For such a large graph it often makes sense to omit vertices.
– Szabolcs
Dec 3 at 9:53
Your code didn't complete for $n=10^6$ after five hours. Let me know if IGSmoothen works. (I may be out of contact for a few days... but am very interested in a working solution to this problem.
– David G. Stork
Dec 3 at 14:31
@DavidG.Stork You need to adjust the MaxIteration to get a good tradeoff between timing and quality. I suggest you also experiment withIGLayoutFruchtermanReingold
from IGraph/M. It has an option to continue the layout from the existing coordinates. So you can run it a bit, take a look, and run more if needed. I did not have time to do that, but here's one with IGSmoothen, today's IGraph/M release needed! dropbox.com/s/lwclsetrm89lkib/collatz-graph.nb?dl=0
– Szabolcs
Dec 3 at 14:45
add a comment |
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
12
down vote
I suggest not to use Graph
to render, as it is slow. Instead, compute the vertex coordinates, then render manually.
The rough steps are as follows:
Create an "index graph", i.e. a graph where the vertex names coincide with the vertex indices:
n = 10000;
g = IndexGraph@Graph[
Flatten[(Rule @@@ Partition[#, 2, 1]) & /@
CollatzSequence@Range[n]]
];
Compute the vertex coordinates:
coord = GraphEmbedding[g, "SpringElectricalEmbedding"];
This is going to be the most expensive step (computationally), but it will likely not take too much memory. Here, you will need to experiment to find a layout that produces a good enough result and finishes in time. "SpringElectricalLayout"
will be slow, but it has many sub-options which can be used to find a good compromise between computation time and layout quality. I would start experimenting with Tolerance
and MaxIteration
.
Finally, render the graphics:
Image@Graphics@GraphicsComplex[coord, Line[List @@@ EdgeList[g]]]
Be sure to use GraphicsComplex and merge all lines and all points into a single Line and Point expression. Finally, we rasterize the result so the front end does not need to keep the vector graphics representation in memory.
Example:
n = 400000;
g = IndexGraph@Graph[
Flatten[(Rule @@@ Partition[#, 2, 1]) & /@
CollatzSequence@Range[n]]
];
VertexCount[g]
(* 869322 *)
coord = GraphEmbedding[
g, {"SpringElectricalEmbedding", "Tolerance" -> 1,
"MaxIteration" -> 5}]; // AbsoluteTiming
(* {48.8368, Null} *)
Image@Graphics@GraphicsComplex[coord, Line[List @@@ EdgeList[g]]]
Image@Graphics[{GraphicsComplex[
coord, {Line[List @@@ EdgeList[g]], Red, AbsolutePointSize[1],
Point[coord]}]}, ImageSize -> 1000]
The 32-bit macOS front end with absolutely not run out of memory while rendering graphics of this size, if specified efficiently, as here.
The bottleneck is the computation of coordinates, where you need to find a good compromise between computation time and output quality. This example took just under a minute on my machine. The time would likely stay under 5 minutes if MaxIteration is increased 5-fold (along with a reduction of Tolerance, which also put a limit on how many steps are computed).
Note:
I notice you used GraphLayout -> {"PackingLayout" -> "ClosestPacking"}
. This makes no sense for a connected graph like this. The PackingLayout controls how connected components are assembled after each one has been laid out. This is a connected graph. It has only one component.
Thanks so very much (+1, at the very least). Let me play around with your solution to see just how large a final graph I can create... whether I can reach the needed $10^8$ vertices. I'm really not worried about time... just size. I'd be happy indeed if this approach can render $10^8$ vertices, even if it took several hours on my hardware. (Alas, I'm off to China in a few hours and won't get back to this for a few days...)
– David G. Stork
Dec 3 at 9:50
@DavidG.Stork I just fixed IGSmoothen and will try to expand on the answer later. It will be useful if you are fine with not explicitly rendering vertices (only edges). For such a large graph it often makes sense to omit vertices.
– Szabolcs
Dec 3 at 9:53
Your code didn't complete for $n=10^6$ after five hours. Let me know if IGSmoothen works. (I may be out of contact for a few days... but am very interested in a working solution to this problem.
– David G. Stork
Dec 3 at 14:31
@DavidG.Stork You need to adjust the MaxIteration to get a good tradeoff between timing and quality. I suggest you also experiment withIGLayoutFruchtermanReingold
from IGraph/M. It has an option to continue the layout from the existing coordinates. So you can run it a bit, take a look, and run more if needed. I did not have time to do that, but here's one with IGSmoothen, today's IGraph/M release needed! dropbox.com/s/lwclsetrm89lkib/collatz-graph.nb?dl=0
– Szabolcs
Dec 3 at 14:45
add a comment |
up vote
12
down vote
I suggest not to use Graph
to render, as it is slow. Instead, compute the vertex coordinates, then render manually.
The rough steps are as follows:
Create an "index graph", i.e. a graph where the vertex names coincide with the vertex indices:
n = 10000;
g = IndexGraph@Graph[
Flatten[(Rule @@@ Partition[#, 2, 1]) & /@
CollatzSequence@Range[n]]
];
Compute the vertex coordinates:
coord = GraphEmbedding[g, "SpringElectricalEmbedding"];
This is going to be the most expensive step (computationally), but it will likely not take too much memory. Here, you will need to experiment to find a layout that produces a good enough result and finishes in time. "SpringElectricalLayout"
will be slow, but it has many sub-options which can be used to find a good compromise between computation time and layout quality. I would start experimenting with Tolerance
and MaxIteration
.
Finally, render the graphics:
Image@Graphics@GraphicsComplex[coord, Line[List @@@ EdgeList[g]]]
Be sure to use GraphicsComplex and merge all lines and all points into a single Line and Point expression. Finally, we rasterize the result so the front end does not need to keep the vector graphics representation in memory.
Example:
n = 400000;
g = IndexGraph@Graph[
Flatten[(Rule @@@ Partition[#, 2, 1]) & /@
CollatzSequence@Range[n]]
];
VertexCount[g]
(* 869322 *)
coord = GraphEmbedding[
g, {"SpringElectricalEmbedding", "Tolerance" -> 1,
"MaxIteration" -> 5}]; // AbsoluteTiming
(* {48.8368, Null} *)
Image@Graphics@GraphicsComplex[coord, Line[List @@@ EdgeList[g]]]
Image@Graphics[{GraphicsComplex[
coord, {Line[List @@@ EdgeList[g]], Red, AbsolutePointSize[1],
Point[coord]}]}, ImageSize -> 1000]
The 32-bit macOS front end with absolutely not run out of memory while rendering graphics of this size, if specified efficiently, as here.
The bottleneck is the computation of coordinates, where you need to find a good compromise between computation time and output quality. This example took just under a minute on my machine. The time would likely stay under 5 minutes if MaxIteration is increased 5-fold (along with a reduction of Tolerance, which also put a limit on how many steps are computed).
Note:
I notice you used GraphLayout -> {"PackingLayout" -> "ClosestPacking"}
. This makes no sense for a connected graph like this. The PackingLayout controls how connected components are assembled after each one has been laid out. This is a connected graph. It has only one component.
Thanks so very much (+1, at the very least). Let me play around with your solution to see just how large a final graph I can create... whether I can reach the needed $10^8$ vertices. I'm really not worried about time... just size. I'd be happy indeed if this approach can render $10^8$ vertices, even if it took several hours on my hardware. (Alas, I'm off to China in a few hours and won't get back to this for a few days...)
– David G. Stork
Dec 3 at 9:50
@DavidG.Stork I just fixed IGSmoothen and will try to expand on the answer later. It will be useful if you are fine with not explicitly rendering vertices (only edges). For such a large graph it often makes sense to omit vertices.
– Szabolcs
Dec 3 at 9:53
Your code didn't complete for $n=10^6$ after five hours. Let me know if IGSmoothen works. (I may be out of contact for a few days... but am very interested in a working solution to this problem.
– David G. Stork
Dec 3 at 14:31
@DavidG.Stork You need to adjust the MaxIteration to get a good tradeoff between timing and quality. I suggest you also experiment withIGLayoutFruchtermanReingold
from IGraph/M. It has an option to continue the layout from the existing coordinates. So you can run it a bit, take a look, and run more if needed. I did not have time to do that, but here's one with IGSmoothen, today's IGraph/M release needed! dropbox.com/s/lwclsetrm89lkib/collatz-graph.nb?dl=0
– Szabolcs
Dec 3 at 14:45
add a comment |
up vote
12
down vote
up vote
12
down vote
I suggest not to use Graph
to render, as it is slow. Instead, compute the vertex coordinates, then render manually.
The rough steps are as follows:
Create an "index graph", i.e. a graph where the vertex names coincide with the vertex indices:
n = 10000;
g = IndexGraph@Graph[
Flatten[(Rule @@@ Partition[#, 2, 1]) & /@
CollatzSequence@Range[n]]
];
Compute the vertex coordinates:
coord = GraphEmbedding[g, "SpringElectricalEmbedding"];
This is going to be the most expensive step (computationally), but it will likely not take too much memory. Here, you will need to experiment to find a layout that produces a good enough result and finishes in time. "SpringElectricalLayout"
will be slow, but it has many sub-options which can be used to find a good compromise between computation time and layout quality. I would start experimenting with Tolerance
and MaxIteration
.
Finally, render the graphics:
Image@Graphics@GraphicsComplex[coord, Line[List @@@ EdgeList[g]]]
Be sure to use GraphicsComplex and merge all lines and all points into a single Line and Point expression. Finally, we rasterize the result so the front end does not need to keep the vector graphics representation in memory.
Example:
n = 400000;
g = IndexGraph@Graph[
Flatten[(Rule @@@ Partition[#, 2, 1]) & /@
CollatzSequence@Range[n]]
];
VertexCount[g]
(* 869322 *)
coord = GraphEmbedding[
g, {"SpringElectricalEmbedding", "Tolerance" -> 1,
"MaxIteration" -> 5}]; // AbsoluteTiming
(* {48.8368, Null} *)
Image@Graphics@GraphicsComplex[coord, Line[List @@@ EdgeList[g]]]
Image@Graphics[{GraphicsComplex[
coord, {Line[List @@@ EdgeList[g]], Red, AbsolutePointSize[1],
Point[coord]}]}, ImageSize -> 1000]
The 32-bit macOS front end with absolutely not run out of memory while rendering graphics of this size, if specified efficiently, as here.
The bottleneck is the computation of coordinates, where you need to find a good compromise between computation time and output quality. This example took just under a minute on my machine. The time would likely stay under 5 minutes if MaxIteration is increased 5-fold (along with a reduction of Tolerance, which also put a limit on how many steps are computed).
Note:
I notice you used GraphLayout -> {"PackingLayout" -> "ClosestPacking"}
. This makes no sense for a connected graph like this. The PackingLayout controls how connected components are assembled after each one has been laid out. This is a connected graph. It has only one component.
I suggest not to use Graph
to render, as it is slow. Instead, compute the vertex coordinates, then render manually.
The rough steps are as follows:
Create an "index graph", i.e. a graph where the vertex names coincide with the vertex indices:
n = 10000;
g = IndexGraph@Graph[
Flatten[(Rule @@@ Partition[#, 2, 1]) & /@
CollatzSequence@Range[n]]
];
Compute the vertex coordinates:
coord = GraphEmbedding[g, "SpringElectricalEmbedding"];
This is going to be the most expensive step (computationally), but it will likely not take too much memory. Here, you will need to experiment to find a layout that produces a good enough result and finishes in time. "SpringElectricalLayout"
will be slow, but it has many sub-options which can be used to find a good compromise between computation time and layout quality. I would start experimenting with Tolerance
and MaxIteration
.
Finally, render the graphics:
Image@Graphics@GraphicsComplex[coord, Line[List @@@ EdgeList[g]]]
Be sure to use GraphicsComplex and merge all lines and all points into a single Line and Point expression. Finally, we rasterize the result so the front end does not need to keep the vector graphics representation in memory.
Example:
n = 400000;
g = IndexGraph@Graph[
Flatten[(Rule @@@ Partition[#, 2, 1]) & /@
CollatzSequence@Range[n]]
];
VertexCount[g]
(* 869322 *)
coord = GraphEmbedding[
g, {"SpringElectricalEmbedding", "Tolerance" -> 1,
"MaxIteration" -> 5}]; // AbsoluteTiming
(* {48.8368, Null} *)
Image@Graphics@GraphicsComplex[coord, Line[List @@@ EdgeList[g]]]
Image@Graphics[{GraphicsComplex[
coord, {Line[List @@@ EdgeList[g]], Red, AbsolutePointSize[1],
Point[coord]}]}, ImageSize -> 1000]
The 32-bit macOS front end with absolutely not run out of memory while rendering graphics of this size, if specified efficiently, as here.
The bottleneck is the computation of coordinates, where you need to find a good compromise between computation time and output quality. This example took just under a minute on my machine. The time would likely stay under 5 minutes if MaxIteration is increased 5-fold (along with a reduction of Tolerance, which also put a limit on how many steps are computed).
Note:
I notice you used GraphLayout -> {"PackingLayout" -> "ClosestPacking"}
. This makes no sense for a connected graph like this. The PackingLayout controls how connected components are assembled after each one has been laid out. This is a connected graph. It has only one component.
edited Dec 3 at 9:27
b3m2a1
25.8k256151
25.8k256151
answered Dec 3 at 9:04
Szabolcs
158k13430924
158k13430924
Thanks so very much (+1, at the very least). Let me play around with your solution to see just how large a final graph I can create... whether I can reach the needed $10^8$ vertices. I'm really not worried about time... just size. I'd be happy indeed if this approach can render $10^8$ vertices, even if it took several hours on my hardware. (Alas, I'm off to China in a few hours and won't get back to this for a few days...)
– David G. Stork
Dec 3 at 9:50
@DavidG.Stork I just fixed IGSmoothen and will try to expand on the answer later. It will be useful if you are fine with not explicitly rendering vertices (only edges). For such a large graph it often makes sense to omit vertices.
– Szabolcs
Dec 3 at 9:53
Your code didn't complete for $n=10^6$ after five hours. Let me know if IGSmoothen works. (I may be out of contact for a few days... but am very interested in a working solution to this problem.
– David G. Stork
Dec 3 at 14:31
@DavidG.Stork You need to adjust the MaxIteration to get a good tradeoff between timing and quality. I suggest you also experiment withIGLayoutFruchtermanReingold
from IGraph/M. It has an option to continue the layout from the existing coordinates. So you can run it a bit, take a look, and run more if needed. I did not have time to do that, but here's one with IGSmoothen, today's IGraph/M release needed! dropbox.com/s/lwclsetrm89lkib/collatz-graph.nb?dl=0
– Szabolcs
Dec 3 at 14:45
add a comment |
Thanks so very much (+1, at the very least). Let me play around with your solution to see just how large a final graph I can create... whether I can reach the needed $10^8$ vertices. I'm really not worried about time... just size. I'd be happy indeed if this approach can render $10^8$ vertices, even if it took several hours on my hardware. (Alas, I'm off to China in a few hours and won't get back to this for a few days...)
– David G. Stork
Dec 3 at 9:50
@DavidG.Stork I just fixed IGSmoothen and will try to expand on the answer later. It will be useful if you are fine with not explicitly rendering vertices (only edges). For such a large graph it often makes sense to omit vertices.
– Szabolcs
Dec 3 at 9:53
Your code didn't complete for $n=10^6$ after five hours. Let me know if IGSmoothen works. (I may be out of contact for a few days... but am very interested in a working solution to this problem.
– David G. Stork
Dec 3 at 14:31
@DavidG.Stork You need to adjust the MaxIteration to get a good tradeoff between timing and quality. I suggest you also experiment withIGLayoutFruchtermanReingold
from IGraph/M. It has an option to continue the layout from the existing coordinates. So you can run it a bit, take a look, and run more if needed. I did not have time to do that, but here's one with IGSmoothen, today's IGraph/M release needed! dropbox.com/s/lwclsetrm89lkib/collatz-graph.nb?dl=0
– Szabolcs
Dec 3 at 14:45
Thanks so very much (+1, at the very least). Let me play around with your solution to see just how large a final graph I can create... whether I can reach the needed $10^8$ vertices. I'm really not worried about time... just size. I'd be happy indeed if this approach can render $10^8$ vertices, even if it took several hours on my hardware. (Alas, I'm off to China in a few hours and won't get back to this for a few days...)
– David G. Stork
Dec 3 at 9:50
Thanks so very much (+1, at the very least). Let me play around with your solution to see just how large a final graph I can create... whether I can reach the needed $10^8$ vertices. I'm really not worried about time... just size. I'd be happy indeed if this approach can render $10^8$ vertices, even if it took several hours on my hardware. (Alas, I'm off to China in a few hours and won't get back to this for a few days...)
– David G. Stork
Dec 3 at 9:50
@DavidG.Stork I just fixed IGSmoothen and will try to expand on the answer later. It will be useful if you are fine with not explicitly rendering vertices (only edges). For such a large graph it often makes sense to omit vertices.
– Szabolcs
Dec 3 at 9:53
@DavidG.Stork I just fixed IGSmoothen and will try to expand on the answer later. It will be useful if you are fine with not explicitly rendering vertices (only edges). For such a large graph it often makes sense to omit vertices.
– Szabolcs
Dec 3 at 9:53
Your code didn't complete for $n=10^6$ after five hours. Let me know if IGSmoothen works. (I may be out of contact for a few days... but am very interested in a working solution to this problem.
– David G. Stork
Dec 3 at 14:31
Your code didn't complete for $n=10^6$ after five hours. Let me know if IGSmoothen works. (I may be out of contact for a few days... but am very interested in a working solution to this problem.
– David G. Stork
Dec 3 at 14:31
@DavidG.Stork You need to adjust the MaxIteration to get a good tradeoff between timing and quality. I suggest you also experiment with
IGLayoutFruchtermanReingold
from IGraph/M. It has an option to continue the layout from the existing coordinates. So you can run it a bit, take a look, and run more if needed. I did not have time to do that, but here's one with IGSmoothen, today's IGraph/M release needed! dropbox.com/s/lwclsetrm89lkib/collatz-graph.nb?dl=0– Szabolcs
Dec 3 at 14:45
@DavidG.Stork You need to adjust the MaxIteration to get a good tradeoff between timing and quality. I suggest you also experiment with
IGLayoutFruchtermanReingold
from IGraph/M. It has an option to continue the layout from the existing coordinates. So you can run it a bit, take a look, and run more if needed. I did not have time to do that, but here's one with IGSmoothen, today's IGraph/M release needed! dropbox.com/s/lwclsetrm89lkib/collatz-graph.nb?dl=0– Szabolcs
Dec 3 at 14:45
add a comment |
Thanks for contributing an answer to Mathematica Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathematica.stackexchange.com%2fquestions%2f187195%2fgenerating-and-rendering-extremely-large-graphs%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
Isn't this graph a tree? That is what the Collatz conjecture states, right? I mean, this is a structure that one can try to exploit for rendering. Towards rendering: I am afraid that the ultimate constraint will be the poor 32-bit frontend that can render only graphics that are not too complex (or will crash)... One may circumvent the frontend by directly generating code for a vector graphics, though.
– Henrik Schumacher
Dec 3 at 8:29
One additional potential trick would be to use
IGSmoothen
from IGraph/M, which replaced chains of edges as in1->2->3->4->5
... with a single weighted edge1->5
. This would reduce the size of the graph by 5-6 fold, and the weights can be used to maintain the layout. However, I noticed thatIGSmoothen
is currently not fast enough. I will look into optimizing it. It should work much faster on graphs of this size.– Szabolcs
Dec 3 at 9:18
Unbelievably, the IGSmoothen is slow for one single reason:
Delete
is slow. I will fix this for the next release (due tonight).– Szabolcs
Dec 3 at 9:33