Testing a Collatz Conjecture Conjecture (Python)












3












$begingroup$


I had an intuition that the maximal trajectory lengths for collatz conjecture numbers would be produced by numbers of the form (2**n)-1 so I wrote a program to test it (I was wrong). I thought I would put it up for review to catch any coding sins I committed, and see how others would improve my code.



Orginal Code



'''
* I had a conjecture that maximal trajectory lengths for the collatz conjecture were yielded by numbers of the form (2**n)-1. This program attempts to test it.
* "conjecture()" finds the number with the highest trajectory lengths in a given range. The ranges are of the form [2**n, 2**(n+1)).
'''

import math, pprint

f = open("CollatzConjecture.txt")
text = f.read()

def key(value, dct):
for key in dct.keys():
if dct[key] == value:
return key
return None


def conjecture(text):
traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("n")} #Processes the text file into a dict mapping numbers to their trajectory length.
result = {}
bound = math.floor(math.log(len(traj),2))
traj2 = {num: traj[num] for num in list(traj.keys())[:2**bound+1]} #Slices the dictionary to receive the section of interest.
for st in (2**x for x in range(1, bound+1)): #Generator for powers of 2.
slce = list(traj2.items())[int((math.log(st,2)-1)**2):st] #Slices "traj2" into powers of 2.
result[st] = key(max([n[1] for n in slce]), traj2)
return pprint.pformat(result)

print(conjecture(text))


"CollatzConjecture.txt" is a text file containing number trajectoryLength on separate lines. I wanted to find the number corresponding to the maximum trajectory length between successive powers of 2 in order to test my hypothesis.










share|improve this question











$endgroup$












  • $begingroup$
    Can you add an example input textfile?
    $endgroup$
    – Ludisposed
    yesterday








  • 1




    $begingroup$
    Where is the key function actually used?
    $endgroup$
    – Graipher
    yesterday










  • $begingroup$
    @Ludisposed I'll link the actual text file I used: oeis.org/A006577/b006577.txt
    $endgroup$
    – Tobi Alafin
    yesterday








  • 1




    $begingroup$
    Now that you have improved code it might make sense to ask a new question with the updated code, linking back to this one, as stated in the link provided by Mast.
    $endgroup$
    – Graipher
    yesterday








  • 3




    $begingroup$
    @TobiAlafin: Hm, what a nice edit mess :) I'll let you figure it out, but if I were you, I would just roll it back to your first first revision (again) and then forget about this question. Instead spend the effort on making the follow-up question look good.
    $endgroup$
    – Graipher
    yesterday


















3












$begingroup$


I had an intuition that the maximal trajectory lengths for collatz conjecture numbers would be produced by numbers of the form (2**n)-1 so I wrote a program to test it (I was wrong). I thought I would put it up for review to catch any coding sins I committed, and see how others would improve my code.



Orginal Code



'''
* I had a conjecture that maximal trajectory lengths for the collatz conjecture were yielded by numbers of the form (2**n)-1. This program attempts to test it.
* "conjecture()" finds the number with the highest trajectory lengths in a given range. The ranges are of the form [2**n, 2**(n+1)).
'''

import math, pprint

f = open("CollatzConjecture.txt")
text = f.read()

def key(value, dct):
for key in dct.keys():
if dct[key] == value:
return key
return None


def conjecture(text):
traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("n")} #Processes the text file into a dict mapping numbers to their trajectory length.
result = {}
bound = math.floor(math.log(len(traj),2))
traj2 = {num: traj[num] for num in list(traj.keys())[:2**bound+1]} #Slices the dictionary to receive the section of interest.
for st in (2**x for x in range(1, bound+1)): #Generator for powers of 2.
slce = list(traj2.items())[int((math.log(st,2)-1)**2):st] #Slices "traj2" into powers of 2.
result[st] = key(max([n[1] for n in slce]), traj2)
return pprint.pformat(result)

print(conjecture(text))


"CollatzConjecture.txt" is a text file containing number trajectoryLength on separate lines. I wanted to find the number corresponding to the maximum trajectory length between successive powers of 2 in order to test my hypothesis.










share|improve this question











$endgroup$












  • $begingroup$
    Can you add an example input textfile?
    $endgroup$
    – Ludisposed
    yesterday








  • 1




    $begingroup$
    Where is the key function actually used?
    $endgroup$
    – Graipher
    yesterday










  • $begingroup$
    @Ludisposed I'll link the actual text file I used: oeis.org/A006577/b006577.txt
    $endgroup$
    – Tobi Alafin
    yesterday








  • 1




    $begingroup$
    Now that you have improved code it might make sense to ask a new question with the updated code, linking back to this one, as stated in the link provided by Mast.
    $endgroup$
    – Graipher
    yesterday








  • 3




    $begingroup$
    @TobiAlafin: Hm, what a nice edit mess :) I'll let you figure it out, but if I were you, I would just roll it back to your first first revision (again) and then forget about this question. Instead spend the effort on making the follow-up question look good.
    $endgroup$
    – Graipher
    yesterday
















3












3








3





$begingroup$


I had an intuition that the maximal trajectory lengths for collatz conjecture numbers would be produced by numbers of the form (2**n)-1 so I wrote a program to test it (I was wrong). I thought I would put it up for review to catch any coding sins I committed, and see how others would improve my code.



Orginal Code



'''
* I had a conjecture that maximal trajectory lengths for the collatz conjecture were yielded by numbers of the form (2**n)-1. This program attempts to test it.
* "conjecture()" finds the number with the highest trajectory lengths in a given range. The ranges are of the form [2**n, 2**(n+1)).
'''

import math, pprint

f = open("CollatzConjecture.txt")
text = f.read()

def key(value, dct):
for key in dct.keys():
if dct[key] == value:
return key
return None


def conjecture(text):
traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("n")} #Processes the text file into a dict mapping numbers to their trajectory length.
result = {}
bound = math.floor(math.log(len(traj),2))
traj2 = {num: traj[num] for num in list(traj.keys())[:2**bound+1]} #Slices the dictionary to receive the section of interest.
for st in (2**x for x in range(1, bound+1)): #Generator for powers of 2.
slce = list(traj2.items())[int((math.log(st,2)-1)**2):st] #Slices "traj2" into powers of 2.
result[st] = key(max([n[1] for n in slce]), traj2)
return pprint.pformat(result)

print(conjecture(text))


"CollatzConjecture.txt" is a text file containing number trajectoryLength on separate lines. I wanted to find the number corresponding to the maximum trajectory length between successive powers of 2 in order to test my hypothesis.










share|improve this question











$endgroup$




I had an intuition that the maximal trajectory lengths for collatz conjecture numbers would be produced by numbers of the form (2**n)-1 so I wrote a program to test it (I was wrong). I thought I would put it up for review to catch any coding sins I committed, and see how others would improve my code.



Orginal Code



'''
* I had a conjecture that maximal trajectory lengths for the collatz conjecture were yielded by numbers of the form (2**n)-1. This program attempts to test it.
* "conjecture()" finds the number with the highest trajectory lengths in a given range. The ranges are of the form [2**n, 2**(n+1)).
'''

import math, pprint

f = open("CollatzConjecture.txt")
text = f.read()

def key(value, dct):
for key in dct.keys():
if dct[key] == value:
return key
return None


def conjecture(text):
traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("n")} #Processes the text file into a dict mapping numbers to their trajectory length.
result = {}
bound = math.floor(math.log(len(traj),2))
traj2 = {num: traj[num] for num in list(traj.keys())[:2**bound+1]} #Slices the dictionary to receive the section of interest.
for st in (2**x for x in range(1, bound+1)): #Generator for powers of 2.
slce = list(traj2.items())[int((math.log(st,2)-1)**2):st] #Slices "traj2" into powers of 2.
result[st] = key(max([n[1] for n in slce]), traj2)
return pprint.pformat(result)

print(conjecture(text))


"CollatzConjecture.txt" is a text file containing number trajectoryLength on separate lines. I wanted to find the number corresponding to the maximum trajectory length between successive powers of 2 in order to test my hypothesis.







python beginner collatz-sequence






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited yesterday







Tobi Alafin

















asked yesterday









Tobi AlafinTobi Alafin

23017




23017












  • $begingroup$
    Can you add an example input textfile?
    $endgroup$
    – Ludisposed
    yesterday








  • 1




    $begingroup$
    Where is the key function actually used?
    $endgroup$
    – Graipher
    yesterday










  • $begingroup$
    @Ludisposed I'll link the actual text file I used: oeis.org/A006577/b006577.txt
    $endgroup$
    – Tobi Alafin
    yesterday








  • 1




    $begingroup$
    Now that you have improved code it might make sense to ask a new question with the updated code, linking back to this one, as stated in the link provided by Mast.
    $endgroup$
    – Graipher
    yesterday








  • 3




    $begingroup$
    @TobiAlafin: Hm, what a nice edit mess :) I'll let you figure it out, but if I were you, I would just roll it back to your first first revision (again) and then forget about this question. Instead spend the effort on making the follow-up question look good.
    $endgroup$
    – Graipher
    yesterday




















  • $begingroup$
    Can you add an example input textfile?
    $endgroup$
    – Ludisposed
    yesterday








  • 1




    $begingroup$
    Where is the key function actually used?
    $endgroup$
    – Graipher
    yesterday










  • $begingroup$
    @Ludisposed I'll link the actual text file I used: oeis.org/A006577/b006577.txt
    $endgroup$
    – Tobi Alafin
    yesterday








  • 1




    $begingroup$
    Now that you have improved code it might make sense to ask a new question with the updated code, linking back to this one, as stated in the link provided by Mast.
    $endgroup$
    – Graipher
    yesterday








  • 3




    $begingroup$
    @TobiAlafin: Hm, what a nice edit mess :) I'll let you figure it out, but if I were you, I would just roll it back to your first first revision (again) and then forget about this question. Instead spend the effort on making the follow-up question look good.
    $endgroup$
    – Graipher
    yesterday


















$begingroup$
Can you add an example input textfile?
$endgroup$
– Ludisposed
yesterday






$begingroup$
Can you add an example input textfile?
$endgroup$
– Ludisposed
yesterday






1




1




$begingroup$
Where is the key function actually used?
$endgroup$
– Graipher
yesterday




$begingroup$
Where is the key function actually used?
$endgroup$
– Graipher
yesterday












$begingroup$
@Ludisposed I'll link the actual text file I used: oeis.org/A006577/b006577.txt
$endgroup$
– Tobi Alafin
yesterday






$begingroup$
@Ludisposed I'll link the actual text file I used: oeis.org/A006577/b006577.txt
$endgroup$
– Tobi Alafin
yesterday






1




1




$begingroup$
Now that you have improved code it might make sense to ask a new question with the updated code, linking back to this one, as stated in the link provided by Mast.
$endgroup$
– Graipher
yesterday






$begingroup$
Now that you have improved code it might make sense to ask a new question with the updated code, linking back to this one, as stated in the link provided by Mast.
$endgroup$
– Graipher
yesterday






3




3




$begingroup$
@TobiAlafin: Hm, what a nice edit mess :) I'll let you figure it out, but if I were you, I would just roll it back to your first first revision (again) and then forget about this question. Instead spend the effort on making the follow-up question look good.
$endgroup$
– Graipher
yesterday






$begingroup$
@TobiAlafin: Hm, what a nice edit mess :) I'll let you figure it out, but if I were you, I would just roll it back to your first first revision (again) and then forget about this question. Instead spend the effort on making the follow-up question look good.
$endgroup$
– Graipher
yesterday












4 Answers
4






active

oldest

votes


















9












$begingroup$


def key(value, dct):
for key in dct.keys():
if dct[key] == value:
return key
return None



This is really inefficient and not how dictionaries should be used.




  • The point of a dict is O(1) lookup, this is essentially O(n) and it doesn't make sense

  • You should remove your blockcomments, and create a nice docstring. That will make the purpose of the code clearer

  • You don't close the file, a nice addition is using the with statement to automatically open/close files PEP343






share|improve this answer











$endgroup$













  • $begingroup$
    key(value, dct) gets the key corresponding to a value. dict.get() gets the value corresponding to a key. I'll edit my code to make it clearer.
    $endgroup$
    – Tobi Alafin
    yesterday






  • 4




    $begingroup$
    @TobiAlafin: If you need to map both ways (A -> B and B -> A), use two dictionaries to get O(1) lookups in each direction. A linear search O(n) in one direction vs. an O(1) in the other, doesn't make sense, unless that direction is almost never used, and you can't afford to double the storage cost by keeping a reverse dictionary. Or there are probably custom data structures that can index on both sets of data without duplicating the storage... like what a relational database would use for a table with 2 columns, and an index on each column. For short keys two dicts is totally fine.
    $endgroup$
    – Peter Cordes
    yesterday












  • $begingroup$
    Peter Cordes thank you. I had not thought of that, I'll apply that in the future.
    $endgroup$
    – Tobi Alafin
    yesterday



















4












$begingroup$

First of all, thumbs up on including a documentation block at the beginning.



When trying to run the segment, I get



Traceback (most recent call last):
File "conjecture.py", line 28, in <module>
print(conjecture(text))
File "conjecture.py", line 19, in conjecture
traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("n")} #Processes the text file into a dict mapping numbers to their trajectory length.
File "conjecture.py", line 19, in <dictcomp>
traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("n")} #Processes the text file into a dict mapping numbers to their trajectory length.
IndexError: list index out of range


Turns out that when I saved the output of the link, there were empty trailing lines. After fixing that, I could get it to run.



Furthermore, I see several issues with this implementation:




  1. The conjecture function is doing a lot of the work, with several lines that are quite filled with operations, meaning it's hard to find out what it's doing.

  2. A dictionary can contain a value multiple times, and in fact does. For instance, both 5 and 32 have a length of 5 (5 -> 16 -> ...) and (32 -> 16 -> ...). For longer lengths, there will be more ways in which it can be obtained.

  3. Floating point arithmetic in a fully integral problem seems like an issue.

  4. The file is opened, and then not immediately closed. For this it's not really an issue, but in general one should use a contextmanager to clean up used resources as quickly as possible.

  5. The semantics of the output isn't really clear to me, but I'll probably figure out a bit during picking it apart.

  6. It doesn't use a __main__ block, which would be beneficial for re-usable code.

  7. The conjecture function returns a string, instead of a dictionary.


So let's start with your implementation:



'''
* I had a conjecture that maximal trajectory lengths for the collatz conjecture were yielded by numbers of the form (2**n)-1. This program attempts to test it.
* "conjecture()" finds the number with the highest trajectory lengths in a given range. The ranges are of the form [2**n, 2**(n+1)).
'''

import math, pprint

f = open("CollatzConjecture.txt")
text = f.read()

def key(value, dct):
for key in dct.keys():
if dct[key] == value:
return key
return None


def conjecture(text):
traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("n")} #Processes the text file into a dict mapping numbers to their trajectory length.
result = {}
bound = math.floor(math.log(len(traj),2))
traj2 = {num: traj[num] for num in list(traj.keys())[:2**bound+1]} #Slices the dictionary to receive the section of interest.
for st in (2**x for x in range(1, bound+1)): #Generator for powers of 2.
slce = list(traj2.items())[int((math.log(st,2)-1)**2):st] #Slices "traj2" into powers of 2.
result[st] = key(max([n[1] for n in slce]), traj2)
return pprint.pformat(result)

print(conjecture(text))


Use a if __name__ == '__main__' segment.



One of the best places to start with such a script (especially if you want to make it reusable!) is to place the "main" code (file-mangling) inside a main block:



... function definitions ...

if __name__ == '__main__':
f = open("CollatzConjecture.txt")
text = f.read()
print(conjecture(text))


Move parsing the trajectories out of conjecture.



Your conjecture has nothing to do with the piece of text, but about the length of the trajectories. We shouldn't care how these lengths are generated, calculated or parsed inside the conjecture function. But the current implementation does.



By moving out the text to traj transformation, we make conjecture more general.



...

def conjecture(traj):
result = {}
bound = math.floor(math.log(len(traj),2))
...

if __name__ == '__main__':
f = open("CollatzConjecture.txt")
text = f.read()
traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("n")} #Processes the text file into a dict mapping numbers to their trajectory length.
print(conjecture(traj))


And we can then split the parsing into a separate function, where we use a context manager.



...

def parse_lengths(filename):
with open(filename) as fp:
#Processes the text file into a dict mapping numbers to their trajectory length.
return {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in fp}

...

if __name__ == '__main__':
traj = parse_lengths("CollatzConjecture.txt")
print(conjecture(traj))


And similarly, move the pformat out of conjecture:



def conjecture(text):
result = {}
...
return result

if __name__ == '__main__':
traj = parse_lengths("CollatzConjecture.txt")
pprint.pprint(conjecture(traj))


Simplify conjecture.



Now that we're left with a conjecture method that does only things related to the conjecture, we can look at it a bit more:



def conjecture(traj):
result = {}
bound = math.floor(math.log(len(traj),2))
traj2 = {num: traj[num] for num in list(traj.keys())[:2**bound+1]} #Slices the dictionary to receive the section of interest.
for st in (2**x for x in range(1, bound+1)): #Generator for powers of 2.
slce = list(traj2.items())[int((math.log(st,2)-1)**2):st] #Slices "traj2" into powers of 2.
result[st] = key(max([n[1] for n in slce]), traj2)
return result


Starting with this expression: int((math.log(st,2)-1)**2), what does it even do?! First it takes the 2-logarithm, substracts 1, and then squares it. At first I suspected that you wanted the previous power of two (and tested it). But, what actually happens is that 64 (26) becomes 25 (52), and also 2048 (211) becomes 100 (102). My basic guess is that this is incorrect, and you mean 64 => 32, 128 => 64, and so forth. Assuming that's what you want, you can just divide by two.



def conjecture(traj):
result = {}
bound = math.floor(math.log(len(traj),2))
traj2 = {num: traj[num] for num in list(traj.keys())[:2**bound+1]} #Slices the dictionary to receive the section of interest.
for st in (2**x for x in range(1, bound+1)): #Generator for powers of 2.
slce = list(traj2.items())[st // 2:st] #Slices "traj2" into powers of 2.
result[st] = key(max([n[1] for n in slce]), traj2)
return result


Surprisingly, though, this does not alter the output.



Next up is the fact that I see slices being taken of dictionaries (after casting to a list). In older Python versions (before 3.6) that might cause surprising results, as dictionaries are not necessarily ordered by value, though in this case it does turn out to be that way it seems. It's better to explicitly filter out the values that you want.



Final:



In the end, I found myself settling on this.



'''
* I had a conjecture that maximal trajectory lengths for the collatz conjecture were yielded by numbers of the form (2**n)-1. This program attempts to test it.
* "conjecture()" finds the number with the highest trajectory lengths in a given range. The ranges are of the form [2**n, 2**(n+1)).
'''

import pprint

def key(value, dct):
for key in dct.keys():
if dct[key] == value:
return key
return None


def parse_lengths(filename):
with open(filename) as fp:
#Processes the text file into a dict mapping numbers to their trajectory length.
return {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in fp}


def conjecture(traj):
result = {}
traj_size = len(traj)
lower, higher = 1, 2
while True:
if higher > traj_size:
# We don't have all the data for this range available.
break

max_in_slice = max(length for num, length in traj.items() if lower <= num <= higher)

result[higher] = key(max_in_slice, traj)

# Prepare data for next iteration.
lower, higher = lower * 2, higher * 2

return result

if __name__ == '__main__':
traj = parse_lengths('CollatzConjecture.txt')
pprint.pprint(conjecture(traj))


I'm also not quite happy with this yet, as the "key" function is still bothering me as well (we could iterate over .items() instead of .keys() to simplify).



I hope this at least helps a bit.






share|improve this answer











$endgroup$





















    3












    $begingroup$

    It's desirable to try to break one's code into small, more reusable segments. For example, it would be preferable (in my opinion) to create a small function to open the text file, extract its contents, and return the dict of trajectories you desire:



    def load_trajectories(file_name):
    traj = {}
    with open(file_name, 'r') as f:
    for line in f.readlines():
    S = line.split(' ')
    num, traj_len = int(S[0]), int(S[1])
    traj[num] = traj_len
    return traj


    Note here that I used the context manager with open(file_name, 'r') as f:, since in Python we need to first open a file, do things with it, and then close it. The context manager handles the opening and closing of the file for us, and within it we can access the file.



    From my searching there doesn't seem to be a universally accepted way to find the "inverse" of a value in a dictionary, but I like yours for the most part. As one change, I would iterate over the key:value pairs to simplify things a bit:



    def find_key(target_value, dct):
    for key, val in dct.items():
    if val == target_value:
    return key
    raise Exception('Value not found!')


    With this, it really only remains to clean up the conjecture function. Here, we can pass the traj dictionary as an argument, as well as the (un-exponentiated) bound.



    In the parlance of your code, note that st is always of the form 2**x, so when we compute math.log(st, 2), it always evaluates to x, so one line of your code reads (equivalently)



    slce = list(traj2.items())[(x-1)**2:st]


    which isn't the 'slicing into power of 2' that you desire. Moreover, it's not (necessarily) guaranteed that traj2.items() will be turned into a list in the way you desire, so it'll be better to be more explicit:



    slce = [key, val for key, val in traj.items() if key in range(2**(x-1), 2**x)]


    Together, along with splitting our imports onto different lines, adding a if __name__ == '__main__': guard, and some other minor reorganization yields the following code:



    import math
    import pprint

    def find_key(target_value, dct):
    for key, val in dct.items():
    if val == target_value:
    return key
    raise Exception('Value not found!')

    def load_trajectories(file_name):
    traj = {}
    with open(file_name, 'r') as f:
    for line in f.readlines():
    S = line.split(' ')
    num, traj_len = int(S[0]), int(S[1])
    traj[num] = traj_len
    return traj

    def conjecture(traj):
    """Checks the conjecture that the maximum 'Collatz length' among the integers in [2**n, 2**(n+1)) is 2**n - 1.

    The conjecture is false.
    """
    bound = math.floor(math.log(len(traj),2))
    exp_bound = 2**bound

    traj = {key:val for key, val in traj.items() if key < exp_bound} # Only include numbers of interest.
    result = {}

    for n in range(1,bound+1):
    exp_n = 2**n
    slce = [key, val for key, val in traj.items() if key in range(2**(n-1), exp_n)]
    result[exp_n] = find_key(max([val for key, val in slce]), traj)

    return pprint(pformat(result))

    if __name__ == "__main__":

    file_name = "CollatzConjecture.txt"
    print(conjecture(file_name))





    share|improve this answer









    $endgroup$













    • $begingroup$
      "I would iterate over the key:value pairs to simplify things a bit": it's not just a question of simplifying. It's also more efficient, because advancing the iterator doesn't involve any search so is as cheap as possible, whereas the lookup potentially has to resolve hash collisions.
      $endgroup$
      – Peter Taylor
      21 hours ago



















    1












    $begingroup$

    The existing answers cover most of the issues that I spotted. There's just one thing that I think it's useful to add:




        for st in (2**x for x in range(1, bound+1)):        #Generator for powers of 2.
    slce = list(traj2.items())[int((math.log(st,2)-1)**2):st] #Slices "traj2" into powers of 2.
    result[st] = key(max([n[1] for n in slce]), traj2)



    This converts the dict into a list every time through the loop. The suggestions, which don't rely on undocumented behaviour, instead filter the dict each time through the loop. But you can instead invert the loop:



        best = [0] * (bound + 1)
    for key, value in traj2.items():
    x = int(math.log(key, 2)) + 1
    if value > best[x] or (value == best[x] and key < result[2**x]):
    best[x] = value
    result[2**x] = key


    This gives straightforward linear running time and allows you explicit control over tie-breaking. (I'm not sure whether I've got the tie-breaking correct for your conjecture, but you can easily fix that).






    share|improve this answer









    $endgroup$













      Your Answer





      StackExchange.ifUsing("editor", function () {
      return StackExchange.using("mathjaxEditing", function () {
      StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
      StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
      });
      });
      }, "mathjax-editing");

      StackExchange.ifUsing("editor", function () {
      StackExchange.using("externalEditor", function () {
      StackExchange.using("snippets", function () {
      StackExchange.snippets.init();
      });
      });
      }, "code-snippets");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "196"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: false,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: null,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f212377%2ftesting-a-collatz-conjecture-conjecture-python%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      4 Answers
      4






      active

      oldest

      votes








      4 Answers
      4






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      9












      $begingroup$


      def key(value, dct):
      for key in dct.keys():
      if dct[key] == value:
      return key
      return None



      This is really inefficient and not how dictionaries should be used.




      • The point of a dict is O(1) lookup, this is essentially O(n) and it doesn't make sense

      • You should remove your blockcomments, and create a nice docstring. That will make the purpose of the code clearer

      • You don't close the file, a nice addition is using the with statement to automatically open/close files PEP343






      share|improve this answer











      $endgroup$













      • $begingroup$
        key(value, dct) gets the key corresponding to a value. dict.get() gets the value corresponding to a key. I'll edit my code to make it clearer.
        $endgroup$
        – Tobi Alafin
        yesterday






      • 4




        $begingroup$
        @TobiAlafin: If you need to map both ways (A -> B and B -> A), use two dictionaries to get O(1) lookups in each direction. A linear search O(n) in one direction vs. an O(1) in the other, doesn't make sense, unless that direction is almost never used, and you can't afford to double the storage cost by keeping a reverse dictionary. Or there are probably custom data structures that can index on both sets of data without duplicating the storage... like what a relational database would use for a table with 2 columns, and an index on each column. For short keys two dicts is totally fine.
        $endgroup$
        – Peter Cordes
        yesterday












      • $begingroup$
        Peter Cordes thank you. I had not thought of that, I'll apply that in the future.
        $endgroup$
        – Tobi Alafin
        yesterday
















      9












      $begingroup$


      def key(value, dct):
      for key in dct.keys():
      if dct[key] == value:
      return key
      return None



      This is really inefficient and not how dictionaries should be used.




      • The point of a dict is O(1) lookup, this is essentially O(n) and it doesn't make sense

      • You should remove your blockcomments, and create a nice docstring. That will make the purpose of the code clearer

      • You don't close the file, a nice addition is using the with statement to automatically open/close files PEP343






      share|improve this answer











      $endgroup$













      • $begingroup$
        key(value, dct) gets the key corresponding to a value. dict.get() gets the value corresponding to a key. I'll edit my code to make it clearer.
        $endgroup$
        – Tobi Alafin
        yesterday






      • 4




        $begingroup$
        @TobiAlafin: If you need to map both ways (A -> B and B -> A), use two dictionaries to get O(1) lookups in each direction. A linear search O(n) in one direction vs. an O(1) in the other, doesn't make sense, unless that direction is almost never used, and you can't afford to double the storage cost by keeping a reverse dictionary. Or there are probably custom data structures that can index on both sets of data without duplicating the storage... like what a relational database would use for a table with 2 columns, and an index on each column. For short keys two dicts is totally fine.
        $endgroup$
        – Peter Cordes
        yesterday












      • $begingroup$
        Peter Cordes thank you. I had not thought of that, I'll apply that in the future.
        $endgroup$
        – Tobi Alafin
        yesterday














      9












      9








      9





      $begingroup$


      def key(value, dct):
      for key in dct.keys():
      if dct[key] == value:
      return key
      return None



      This is really inefficient and not how dictionaries should be used.




      • The point of a dict is O(1) lookup, this is essentially O(n) and it doesn't make sense

      • You should remove your blockcomments, and create a nice docstring. That will make the purpose of the code clearer

      • You don't close the file, a nice addition is using the with statement to automatically open/close files PEP343






      share|improve this answer











      $endgroup$




      def key(value, dct):
      for key in dct.keys():
      if dct[key] == value:
      return key
      return None



      This is really inefficient and not how dictionaries should be used.




      • The point of a dict is O(1) lookup, this is essentially O(n) and it doesn't make sense

      • You should remove your blockcomments, and create a nice docstring. That will make the purpose of the code clearer

      • You don't close the file, a nice addition is using the with statement to automatically open/close files PEP343







      share|improve this answer














      share|improve this answer



      share|improve this answer








      edited yesterday

























      answered yesterday









      LudisposedLudisposed

      7,74221959




      7,74221959












      • $begingroup$
        key(value, dct) gets the key corresponding to a value. dict.get() gets the value corresponding to a key. I'll edit my code to make it clearer.
        $endgroup$
        – Tobi Alafin
        yesterday






      • 4




        $begingroup$
        @TobiAlafin: If you need to map both ways (A -> B and B -> A), use two dictionaries to get O(1) lookups in each direction. A linear search O(n) in one direction vs. an O(1) in the other, doesn't make sense, unless that direction is almost never used, and you can't afford to double the storage cost by keeping a reverse dictionary. Or there are probably custom data structures that can index on both sets of data without duplicating the storage... like what a relational database would use for a table with 2 columns, and an index on each column. For short keys two dicts is totally fine.
        $endgroup$
        – Peter Cordes
        yesterday












      • $begingroup$
        Peter Cordes thank you. I had not thought of that, I'll apply that in the future.
        $endgroup$
        – Tobi Alafin
        yesterday


















      • $begingroup$
        key(value, dct) gets the key corresponding to a value. dict.get() gets the value corresponding to a key. I'll edit my code to make it clearer.
        $endgroup$
        – Tobi Alafin
        yesterday






      • 4




        $begingroup$
        @TobiAlafin: If you need to map both ways (A -> B and B -> A), use two dictionaries to get O(1) lookups in each direction. A linear search O(n) in one direction vs. an O(1) in the other, doesn't make sense, unless that direction is almost never used, and you can't afford to double the storage cost by keeping a reverse dictionary. Or there are probably custom data structures that can index on both sets of data without duplicating the storage... like what a relational database would use for a table with 2 columns, and an index on each column. For short keys two dicts is totally fine.
        $endgroup$
        – Peter Cordes
        yesterday












      • $begingroup$
        Peter Cordes thank you. I had not thought of that, I'll apply that in the future.
        $endgroup$
        – Tobi Alafin
        yesterday
















      $begingroup$
      key(value, dct) gets the key corresponding to a value. dict.get() gets the value corresponding to a key. I'll edit my code to make it clearer.
      $endgroup$
      – Tobi Alafin
      yesterday




      $begingroup$
      key(value, dct) gets the key corresponding to a value. dict.get() gets the value corresponding to a key. I'll edit my code to make it clearer.
      $endgroup$
      – Tobi Alafin
      yesterday




      4




      4




      $begingroup$
      @TobiAlafin: If you need to map both ways (A -> B and B -> A), use two dictionaries to get O(1) lookups in each direction. A linear search O(n) in one direction vs. an O(1) in the other, doesn't make sense, unless that direction is almost never used, and you can't afford to double the storage cost by keeping a reverse dictionary. Or there are probably custom data structures that can index on both sets of data without duplicating the storage... like what a relational database would use for a table with 2 columns, and an index on each column. For short keys two dicts is totally fine.
      $endgroup$
      – Peter Cordes
      yesterday






      $begingroup$
      @TobiAlafin: If you need to map both ways (A -> B and B -> A), use two dictionaries to get O(1) lookups in each direction. A linear search O(n) in one direction vs. an O(1) in the other, doesn't make sense, unless that direction is almost never used, and you can't afford to double the storage cost by keeping a reverse dictionary. Or there are probably custom data structures that can index on both sets of data without duplicating the storage... like what a relational database would use for a table with 2 columns, and an index on each column. For short keys two dicts is totally fine.
      $endgroup$
      – Peter Cordes
      yesterday














      $begingroup$
      Peter Cordes thank you. I had not thought of that, I'll apply that in the future.
      $endgroup$
      – Tobi Alafin
      yesterday




      $begingroup$
      Peter Cordes thank you. I had not thought of that, I'll apply that in the future.
      $endgroup$
      – Tobi Alafin
      yesterday













      4












      $begingroup$

      First of all, thumbs up on including a documentation block at the beginning.



      When trying to run the segment, I get



      Traceback (most recent call last):
      File "conjecture.py", line 28, in <module>
      print(conjecture(text))
      File "conjecture.py", line 19, in conjecture
      traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("n")} #Processes the text file into a dict mapping numbers to their trajectory length.
      File "conjecture.py", line 19, in <dictcomp>
      traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("n")} #Processes the text file into a dict mapping numbers to their trajectory length.
      IndexError: list index out of range


      Turns out that when I saved the output of the link, there were empty trailing lines. After fixing that, I could get it to run.



      Furthermore, I see several issues with this implementation:




      1. The conjecture function is doing a lot of the work, with several lines that are quite filled with operations, meaning it's hard to find out what it's doing.

      2. A dictionary can contain a value multiple times, and in fact does. For instance, both 5 and 32 have a length of 5 (5 -> 16 -> ...) and (32 -> 16 -> ...). For longer lengths, there will be more ways in which it can be obtained.

      3. Floating point arithmetic in a fully integral problem seems like an issue.

      4. The file is opened, and then not immediately closed. For this it's not really an issue, but in general one should use a contextmanager to clean up used resources as quickly as possible.

      5. The semantics of the output isn't really clear to me, but I'll probably figure out a bit during picking it apart.

      6. It doesn't use a __main__ block, which would be beneficial for re-usable code.

      7. The conjecture function returns a string, instead of a dictionary.


      So let's start with your implementation:



      '''
      * I had a conjecture that maximal trajectory lengths for the collatz conjecture were yielded by numbers of the form (2**n)-1. This program attempts to test it.
      * "conjecture()" finds the number with the highest trajectory lengths in a given range. The ranges are of the form [2**n, 2**(n+1)).
      '''

      import math, pprint

      f = open("CollatzConjecture.txt")
      text = f.read()

      def key(value, dct):
      for key in dct.keys():
      if dct[key] == value:
      return key
      return None


      def conjecture(text):
      traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("n")} #Processes the text file into a dict mapping numbers to their trajectory length.
      result = {}
      bound = math.floor(math.log(len(traj),2))
      traj2 = {num: traj[num] for num in list(traj.keys())[:2**bound+1]} #Slices the dictionary to receive the section of interest.
      for st in (2**x for x in range(1, bound+1)): #Generator for powers of 2.
      slce = list(traj2.items())[int((math.log(st,2)-1)**2):st] #Slices "traj2" into powers of 2.
      result[st] = key(max([n[1] for n in slce]), traj2)
      return pprint.pformat(result)

      print(conjecture(text))


      Use a if __name__ == '__main__' segment.



      One of the best places to start with such a script (especially if you want to make it reusable!) is to place the "main" code (file-mangling) inside a main block:



      ... function definitions ...

      if __name__ == '__main__':
      f = open("CollatzConjecture.txt")
      text = f.read()
      print(conjecture(text))


      Move parsing the trajectories out of conjecture.



      Your conjecture has nothing to do with the piece of text, but about the length of the trajectories. We shouldn't care how these lengths are generated, calculated or parsed inside the conjecture function. But the current implementation does.



      By moving out the text to traj transformation, we make conjecture more general.



      ...

      def conjecture(traj):
      result = {}
      bound = math.floor(math.log(len(traj),2))
      ...

      if __name__ == '__main__':
      f = open("CollatzConjecture.txt")
      text = f.read()
      traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("n")} #Processes the text file into a dict mapping numbers to their trajectory length.
      print(conjecture(traj))


      And we can then split the parsing into a separate function, where we use a context manager.



      ...

      def parse_lengths(filename):
      with open(filename) as fp:
      #Processes the text file into a dict mapping numbers to their trajectory length.
      return {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in fp}

      ...

      if __name__ == '__main__':
      traj = parse_lengths("CollatzConjecture.txt")
      print(conjecture(traj))


      And similarly, move the pformat out of conjecture:



      def conjecture(text):
      result = {}
      ...
      return result

      if __name__ == '__main__':
      traj = parse_lengths("CollatzConjecture.txt")
      pprint.pprint(conjecture(traj))


      Simplify conjecture.



      Now that we're left with a conjecture method that does only things related to the conjecture, we can look at it a bit more:



      def conjecture(traj):
      result = {}
      bound = math.floor(math.log(len(traj),2))
      traj2 = {num: traj[num] for num in list(traj.keys())[:2**bound+1]} #Slices the dictionary to receive the section of interest.
      for st in (2**x for x in range(1, bound+1)): #Generator for powers of 2.
      slce = list(traj2.items())[int((math.log(st,2)-1)**2):st] #Slices "traj2" into powers of 2.
      result[st] = key(max([n[1] for n in slce]), traj2)
      return result


      Starting with this expression: int((math.log(st,2)-1)**2), what does it even do?! First it takes the 2-logarithm, substracts 1, and then squares it. At first I suspected that you wanted the previous power of two (and tested it). But, what actually happens is that 64 (26) becomes 25 (52), and also 2048 (211) becomes 100 (102). My basic guess is that this is incorrect, and you mean 64 => 32, 128 => 64, and so forth. Assuming that's what you want, you can just divide by two.



      def conjecture(traj):
      result = {}
      bound = math.floor(math.log(len(traj),2))
      traj2 = {num: traj[num] for num in list(traj.keys())[:2**bound+1]} #Slices the dictionary to receive the section of interest.
      for st in (2**x for x in range(1, bound+1)): #Generator for powers of 2.
      slce = list(traj2.items())[st // 2:st] #Slices "traj2" into powers of 2.
      result[st] = key(max([n[1] for n in slce]), traj2)
      return result


      Surprisingly, though, this does not alter the output.



      Next up is the fact that I see slices being taken of dictionaries (after casting to a list). In older Python versions (before 3.6) that might cause surprising results, as dictionaries are not necessarily ordered by value, though in this case it does turn out to be that way it seems. It's better to explicitly filter out the values that you want.



      Final:



      In the end, I found myself settling on this.



      '''
      * I had a conjecture that maximal trajectory lengths for the collatz conjecture were yielded by numbers of the form (2**n)-1. This program attempts to test it.
      * "conjecture()" finds the number with the highest trajectory lengths in a given range. The ranges are of the form [2**n, 2**(n+1)).
      '''

      import pprint

      def key(value, dct):
      for key in dct.keys():
      if dct[key] == value:
      return key
      return None


      def parse_lengths(filename):
      with open(filename) as fp:
      #Processes the text file into a dict mapping numbers to their trajectory length.
      return {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in fp}


      def conjecture(traj):
      result = {}
      traj_size = len(traj)
      lower, higher = 1, 2
      while True:
      if higher > traj_size:
      # We don't have all the data for this range available.
      break

      max_in_slice = max(length for num, length in traj.items() if lower <= num <= higher)

      result[higher] = key(max_in_slice, traj)

      # Prepare data for next iteration.
      lower, higher = lower * 2, higher * 2

      return result

      if __name__ == '__main__':
      traj = parse_lengths('CollatzConjecture.txt')
      pprint.pprint(conjecture(traj))


      I'm also not quite happy with this yet, as the "key" function is still bothering me as well (we could iterate over .items() instead of .keys() to simplify).



      I hope this at least helps a bit.






      share|improve this answer











      $endgroup$


















        4












        $begingroup$

        First of all, thumbs up on including a documentation block at the beginning.



        When trying to run the segment, I get



        Traceback (most recent call last):
        File "conjecture.py", line 28, in <module>
        print(conjecture(text))
        File "conjecture.py", line 19, in conjecture
        traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("n")} #Processes the text file into a dict mapping numbers to their trajectory length.
        File "conjecture.py", line 19, in <dictcomp>
        traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("n")} #Processes the text file into a dict mapping numbers to their trajectory length.
        IndexError: list index out of range


        Turns out that when I saved the output of the link, there were empty trailing lines. After fixing that, I could get it to run.



        Furthermore, I see several issues with this implementation:




        1. The conjecture function is doing a lot of the work, with several lines that are quite filled with operations, meaning it's hard to find out what it's doing.

        2. A dictionary can contain a value multiple times, and in fact does. For instance, both 5 and 32 have a length of 5 (5 -> 16 -> ...) and (32 -> 16 -> ...). For longer lengths, there will be more ways in which it can be obtained.

        3. Floating point arithmetic in a fully integral problem seems like an issue.

        4. The file is opened, and then not immediately closed. For this it's not really an issue, but in general one should use a contextmanager to clean up used resources as quickly as possible.

        5. The semantics of the output isn't really clear to me, but I'll probably figure out a bit during picking it apart.

        6. It doesn't use a __main__ block, which would be beneficial for re-usable code.

        7. The conjecture function returns a string, instead of a dictionary.


        So let's start with your implementation:



        '''
        * I had a conjecture that maximal trajectory lengths for the collatz conjecture were yielded by numbers of the form (2**n)-1. This program attempts to test it.
        * "conjecture()" finds the number with the highest trajectory lengths in a given range. The ranges are of the form [2**n, 2**(n+1)).
        '''

        import math, pprint

        f = open("CollatzConjecture.txt")
        text = f.read()

        def key(value, dct):
        for key in dct.keys():
        if dct[key] == value:
        return key
        return None


        def conjecture(text):
        traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("n")} #Processes the text file into a dict mapping numbers to their trajectory length.
        result = {}
        bound = math.floor(math.log(len(traj),2))
        traj2 = {num: traj[num] for num in list(traj.keys())[:2**bound+1]} #Slices the dictionary to receive the section of interest.
        for st in (2**x for x in range(1, bound+1)): #Generator for powers of 2.
        slce = list(traj2.items())[int((math.log(st,2)-1)**2):st] #Slices "traj2" into powers of 2.
        result[st] = key(max([n[1] for n in slce]), traj2)
        return pprint.pformat(result)

        print(conjecture(text))


        Use a if __name__ == '__main__' segment.



        One of the best places to start with such a script (especially if you want to make it reusable!) is to place the "main" code (file-mangling) inside a main block:



        ... function definitions ...

        if __name__ == '__main__':
        f = open("CollatzConjecture.txt")
        text = f.read()
        print(conjecture(text))


        Move parsing the trajectories out of conjecture.



        Your conjecture has nothing to do with the piece of text, but about the length of the trajectories. We shouldn't care how these lengths are generated, calculated or parsed inside the conjecture function. But the current implementation does.



        By moving out the text to traj transformation, we make conjecture more general.



        ...

        def conjecture(traj):
        result = {}
        bound = math.floor(math.log(len(traj),2))
        ...

        if __name__ == '__main__':
        f = open("CollatzConjecture.txt")
        text = f.read()
        traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("n")} #Processes the text file into a dict mapping numbers to their trajectory length.
        print(conjecture(traj))


        And we can then split the parsing into a separate function, where we use a context manager.



        ...

        def parse_lengths(filename):
        with open(filename) as fp:
        #Processes the text file into a dict mapping numbers to their trajectory length.
        return {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in fp}

        ...

        if __name__ == '__main__':
        traj = parse_lengths("CollatzConjecture.txt")
        print(conjecture(traj))


        And similarly, move the pformat out of conjecture:



        def conjecture(text):
        result = {}
        ...
        return result

        if __name__ == '__main__':
        traj = parse_lengths("CollatzConjecture.txt")
        pprint.pprint(conjecture(traj))


        Simplify conjecture.



        Now that we're left with a conjecture method that does only things related to the conjecture, we can look at it a bit more:



        def conjecture(traj):
        result = {}
        bound = math.floor(math.log(len(traj),2))
        traj2 = {num: traj[num] for num in list(traj.keys())[:2**bound+1]} #Slices the dictionary to receive the section of interest.
        for st in (2**x for x in range(1, bound+1)): #Generator for powers of 2.
        slce = list(traj2.items())[int((math.log(st,2)-1)**2):st] #Slices "traj2" into powers of 2.
        result[st] = key(max([n[1] for n in slce]), traj2)
        return result


        Starting with this expression: int((math.log(st,2)-1)**2), what does it even do?! First it takes the 2-logarithm, substracts 1, and then squares it. At first I suspected that you wanted the previous power of two (and tested it). But, what actually happens is that 64 (26) becomes 25 (52), and also 2048 (211) becomes 100 (102). My basic guess is that this is incorrect, and you mean 64 => 32, 128 => 64, and so forth. Assuming that's what you want, you can just divide by two.



        def conjecture(traj):
        result = {}
        bound = math.floor(math.log(len(traj),2))
        traj2 = {num: traj[num] for num in list(traj.keys())[:2**bound+1]} #Slices the dictionary to receive the section of interest.
        for st in (2**x for x in range(1, bound+1)): #Generator for powers of 2.
        slce = list(traj2.items())[st // 2:st] #Slices "traj2" into powers of 2.
        result[st] = key(max([n[1] for n in slce]), traj2)
        return result


        Surprisingly, though, this does not alter the output.



        Next up is the fact that I see slices being taken of dictionaries (after casting to a list). In older Python versions (before 3.6) that might cause surprising results, as dictionaries are not necessarily ordered by value, though in this case it does turn out to be that way it seems. It's better to explicitly filter out the values that you want.



        Final:



        In the end, I found myself settling on this.



        '''
        * I had a conjecture that maximal trajectory lengths for the collatz conjecture were yielded by numbers of the form (2**n)-1. This program attempts to test it.
        * "conjecture()" finds the number with the highest trajectory lengths in a given range. The ranges are of the form [2**n, 2**(n+1)).
        '''

        import pprint

        def key(value, dct):
        for key in dct.keys():
        if dct[key] == value:
        return key
        return None


        def parse_lengths(filename):
        with open(filename) as fp:
        #Processes the text file into a dict mapping numbers to their trajectory length.
        return {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in fp}


        def conjecture(traj):
        result = {}
        traj_size = len(traj)
        lower, higher = 1, 2
        while True:
        if higher > traj_size:
        # We don't have all the data for this range available.
        break

        max_in_slice = max(length for num, length in traj.items() if lower <= num <= higher)

        result[higher] = key(max_in_slice, traj)

        # Prepare data for next iteration.
        lower, higher = lower * 2, higher * 2

        return result

        if __name__ == '__main__':
        traj = parse_lengths('CollatzConjecture.txt')
        pprint.pprint(conjecture(traj))


        I'm also not quite happy with this yet, as the "key" function is still bothering me as well (we could iterate over .items() instead of .keys() to simplify).



        I hope this at least helps a bit.






        share|improve this answer











        $endgroup$
















          4












          4








          4





          $begingroup$

          First of all, thumbs up on including a documentation block at the beginning.



          When trying to run the segment, I get



          Traceback (most recent call last):
          File "conjecture.py", line 28, in <module>
          print(conjecture(text))
          File "conjecture.py", line 19, in conjecture
          traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("n")} #Processes the text file into a dict mapping numbers to their trajectory length.
          File "conjecture.py", line 19, in <dictcomp>
          traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("n")} #Processes the text file into a dict mapping numbers to their trajectory length.
          IndexError: list index out of range


          Turns out that when I saved the output of the link, there were empty trailing lines. After fixing that, I could get it to run.



          Furthermore, I see several issues with this implementation:




          1. The conjecture function is doing a lot of the work, with several lines that are quite filled with operations, meaning it's hard to find out what it's doing.

          2. A dictionary can contain a value multiple times, and in fact does. For instance, both 5 and 32 have a length of 5 (5 -> 16 -> ...) and (32 -> 16 -> ...). For longer lengths, there will be more ways in which it can be obtained.

          3. Floating point arithmetic in a fully integral problem seems like an issue.

          4. The file is opened, and then not immediately closed. For this it's not really an issue, but in general one should use a contextmanager to clean up used resources as quickly as possible.

          5. The semantics of the output isn't really clear to me, but I'll probably figure out a bit during picking it apart.

          6. It doesn't use a __main__ block, which would be beneficial for re-usable code.

          7. The conjecture function returns a string, instead of a dictionary.


          So let's start with your implementation:



          '''
          * I had a conjecture that maximal trajectory lengths for the collatz conjecture were yielded by numbers of the form (2**n)-1. This program attempts to test it.
          * "conjecture()" finds the number with the highest trajectory lengths in a given range. The ranges are of the form [2**n, 2**(n+1)).
          '''

          import math, pprint

          f = open("CollatzConjecture.txt")
          text = f.read()

          def key(value, dct):
          for key in dct.keys():
          if dct[key] == value:
          return key
          return None


          def conjecture(text):
          traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("n")} #Processes the text file into a dict mapping numbers to their trajectory length.
          result = {}
          bound = math.floor(math.log(len(traj),2))
          traj2 = {num: traj[num] for num in list(traj.keys())[:2**bound+1]} #Slices the dictionary to receive the section of interest.
          for st in (2**x for x in range(1, bound+1)): #Generator for powers of 2.
          slce = list(traj2.items())[int((math.log(st,2)-1)**2):st] #Slices "traj2" into powers of 2.
          result[st] = key(max([n[1] for n in slce]), traj2)
          return pprint.pformat(result)

          print(conjecture(text))


          Use a if __name__ == '__main__' segment.



          One of the best places to start with such a script (especially if you want to make it reusable!) is to place the "main" code (file-mangling) inside a main block:



          ... function definitions ...

          if __name__ == '__main__':
          f = open("CollatzConjecture.txt")
          text = f.read()
          print(conjecture(text))


          Move parsing the trajectories out of conjecture.



          Your conjecture has nothing to do with the piece of text, but about the length of the trajectories. We shouldn't care how these lengths are generated, calculated or parsed inside the conjecture function. But the current implementation does.



          By moving out the text to traj transformation, we make conjecture more general.



          ...

          def conjecture(traj):
          result = {}
          bound = math.floor(math.log(len(traj),2))
          ...

          if __name__ == '__main__':
          f = open("CollatzConjecture.txt")
          text = f.read()
          traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("n")} #Processes the text file into a dict mapping numbers to their trajectory length.
          print(conjecture(traj))


          And we can then split the parsing into a separate function, where we use a context manager.



          ...

          def parse_lengths(filename):
          with open(filename) as fp:
          #Processes the text file into a dict mapping numbers to their trajectory length.
          return {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in fp}

          ...

          if __name__ == '__main__':
          traj = parse_lengths("CollatzConjecture.txt")
          print(conjecture(traj))


          And similarly, move the pformat out of conjecture:



          def conjecture(text):
          result = {}
          ...
          return result

          if __name__ == '__main__':
          traj = parse_lengths("CollatzConjecture.txt")
          pprint.pprint(conjecture(traj))


          Simplify conjecture.



          Now that we're left with a conjecture method that does only things related to the conjecture, we can look at it a bit more:



          def conjecture(traj):
          result = {}
          bound = math.floor(math.log(len(traj),2))
          traj2 = {num: traj[num] for num in list(traj.keys())[:2**bound+1]} #Slices the dictionary to receive the section of interest.
          for st in (2**x for x in range(1, bound+1)): #Generator for powers of 2.
          slce = list(traj2.items())[int((math.log(st,2)-1)**2):st] #Slices "traj2" into powers of 2.
          result[st] = key(max([n[1] for n in slce]), traj2)
          return result


          Starting with this expression: int((math.log(st,2)-1)**2), what does it even do?! First it takes the 2-logarithm, substracts 1, and then squares it. At first I suspected that you wanted the previous power of two (and tested it). But, what actually happens is that 64 (26) becomes 25 (52), and also 2048 (211) becomes 100 (102). My basic guess is that this is incorrect, and you mean 64 => 32, 128 => 64, and so forth. Assuming that's what you want, you can just divide by two.



          def conjecture(traj):
          result = {}
          bound = math.floor(math.log(len(traj),2))
          traj2 = {num: traj[num] for num in list(traj.keys())[:2**bound+1]} #Slices the dictionary to receive the section of interest.
          for st in (2**x for x in range(1, bound+1)): #Generator for powers of 2.
          slce = list(traj2.items())[st // 2:st] #Slices "traj2" into powers of 2.
          result[st] = key(max([n[1] for n in slce]), traj2)
          return result


          Surprisingly, though, this does not alter the output.



          Next up is the fact that I see slices being taken of dictionaries (after casting to a list). In older Python versions (before 3.6) that might cause surprising results, as dictionaries are not necessarily ordered by value, though in this case it does turn out to be that way it seems. It's better to explicitly filter out the values that you want.



          Final:



          In the end, I found myself settling on this.



          '''
          * I had a conjecture that maximal trajectory lengths for the collatz conjecture were yielded by numbers of the form (2**n)-1. This program attempts to test it.
          * "conjecture()" finds the number with the highest trajectory lengths in a given range. The ranges are of the form [2**n, 2**(n+1)).
          '''

          import pprint

          def key(value, dct):
          for key in dct.keys():
          if dct[key] == value:
          return key
          return None


          def parse_lengths(filename):
          with open(filename) as fp:
          #Processes the text file into a dict mapping numbers to their trajectory length.
          return {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in fp}


          def conjecture(traj):
          result = {}
          traj_size = len(traj)
          lower, higher = 1, 2
          while True:
          if higher > traj_size:
          # We don't have all the data for this range available.
          break

          max_in_slice = max(length for num, length in traj.items() if lower <= num <= higher)

          result[higher] = key(max_in_slice, traj)

          # Prepare data for next iteration.
          lower, higher = lower * 2, higher * 2

          return result

          if __name__ == '__main__':
          traj = parse_lengths('CollatzConjecture.txt')
          pprint.pprint(conjecture(traj))


          I'm also not quite happy with this yet, as the "key" function is still bothering me as well (we could iterate over .items() instead of .keys() to simplify).



          I hope this at least helps a bit.






          share|improve this answer











          $endgroup$



          First of all, thumbs up on including a documentation block at the beginning.



          When trying to run the segment, I get



          Traceback (most recent call last):
          File "conjecture.py", line 28, in <module>
          print(conjecture(text))
          File "conjecture.py", line 19, in conjecture
          traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("n")} #Processes the text file into a dict mapping numbers to their trajectory length.
          File "conjecture.py", line 19, in <dictcomp>
          traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("n")} #Processes the text file into a dict mapping numbers to their trajectory length.
          IndexError: list index out of range


          Turns out that when I saved the output of the link, there were empty trailing lines. After fixing that, I could get it to run.



          Furthermore, I see several issues with this implementation:




          1. The conjecture function is doing a lot of the work, with several lines that are quite filled with operations, meaning it's hard to find out what it's doing.

          2. A dictionary can contain a value multiple times, and in fact does. For instance, both 5 and 32 have a length of 5 (5 -> 16 -> ...) and (32 -> 16 -> ...). For longer lengths, there will be more ways in which it can be obtained.

          3. Floating point arithmetic in a fully integral problem seems like an issue.

          4. The file is opened, and then not immediately closed. For this it's not really an issue, but in general one should use a contextmanager to clean up used resources as quickly as possible.

          5. The semantics of the output isn't really clear to me, but I'll probably figure out a bit during picking it apart.

          6. It doesn't use a __main__ block, which would be beneficial for re-usable code.

          7. The conjecture function returns a string, instead of a dictionary.


          So let's start with your implementation:



          '''
          * I had a conjecture that maximal trajectory lengths for the collatz conjecture were yielded by numbers of the form (2**n)-1. This program attempts to test it.
          * "conjecture()" finds the number with the highest trajectory lengths in a given range. The ranges are of the form [2**n, 2**(n+1)).
          '''

          import math, pprint

          f = open("CollatzConjecture.txt")
          text = f.read()

          def key(value, dct):
          for key in dct.keys():
          if dct[key] == value:
          return key
          return None


          def conjecture(text):
          traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("n")} #Processes the text file into a dict mapping numbers to their trajectory length.
          result = {}
          bound = math.floor(math.log(len(traj),2))
          traj2 = {num: traj[num] for num in list(traj.keys())[:2**bound+1]} #Slices the dictionary to receive the section of interest.
          for st in (2**x for x in range(1, bound+1)): #Generator for powers of 2.
          slce = list(traj2.items())[int((math.log(st,2)-1)**2):st] #Slices "traj2" into powers of 2.
          result[st] = key(max([n[1] for n in slce]), traj2)
          return pprint.pformat(result)

          print(conjecture(text))


          Use a if __name__ == '__main__' segment.



          One of the best places to start with such a script (especially if you want to make it reusable!) is to place the "main" code (file-mangling) inside a main block:



          ... function definitions ...

          if __name__ == '__main__':
          f = open("CollatzConjecture.txt")
          text = f.read()
          print(conjecture(text))


          Move parsing the trajectories out of conjecture.



          Your conjecture has nothing to do with the piece of text, but about the length of the trajectories. We shouldn't care how these lengths are generated, calculated or parsed inside the conjecture function. But the current implementation does.



          By moving out the text to traj transformation, we make conjecture more general.



          ...

          def conjecture(traj):
          result = {}
          bound = math.floor(math.log(len(traj),2))
          ...

          if __name__ == '__main__':
          f = open("CollatzConjecture.txt")
          text = f.read()
          traj = {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in text.split("n")} #Processes the text file into a dict mapping numbers to their trajectory length.
          print(conjecture(traj))


          And we can then split the parsing into a separate function, where we use a context manager.



          ...

          def parse_lengths(filename):
          with open(filename) as fp:
          #Processes the text file into a dict mapping numbers to their trajectory length.
          return {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in fp}

          ...

          if __name__ == '__main__':
          traj = parse_lengths("CollatzConjecture.txt")
          print(conjecture(traj))


          And similarly, move the pformat out of conjecture:



          def conjecture(text):
          result = {}
          ...
          return result

          if __name__ == '__main__':
          traj = parse_lengths("CollatzConjecture.txt")
          pprint.pprint(conjecture(traj))


          Simplify conjecture.



          Now that we're left with a conjecture method that does only things related to the conjecture, we can look at it a bit more:



          def conjecture(traj):
          result = {}
          bound = math.floor(math.log(len(traj),2))
          traj2 = {num: traj[num] for num in list(traj.keys())[:2**bound+1]} #Slices the dictionary to receive the section of interest.
          for st in (2**x for x in range(1, bound+1)): #Generator for powers of 2.
          slce = list(traj2.items())[int((math.log(st,2)-1)**2):st] #Slices "traj2" into powers of 2.
          result[st] = key(max([n[1] for n in slce]), traj2)
          return result


          Starting with this expression: int((math.log(st,2)-1)**2), what does it even do?! First it takes the 2-logarithm, substracts 1, and then squares it. At first I suspected that you wanted the previous power of two (and tested it). But, what actually happens is that 64 (26) becomes 25 (52), and also 2048 (211) becomes 100 (102). My basic guess is that this is incorrect, and you mean 64 => 32, 128 => 64, and so forth. Assuming that's what you want, you can just divide by two.



          def conjecture(traj):
          result = {}
          bound = math.floor(math.log(len(traj),2))
          traj2 = {num: traj[num] for num in list(traj.keys())[:2**bound+1]} #Slices the dictionary to receive the section of interest.
          for st in (2**x for x in range(1, bound+1)): #Generator for powers of 2.
          slce = list(traj2.items())[st // 2:st] #Slices "traj2" into powers of 2.
          result[st] = key(max([n[1] for n in slce]), traj2)
          return result


          Surprisingly, though, this does not alter the output.



          Next up is the fact that I see slices being taken of dictionaries (after casting to a list). In older Python versions (before 3.6) that might cause surprising results, as dictionaries are not necessarily ordered by value, though in this case it does turn out to be that way it seems. It's better to explicitly filter out the values that you want.



          Final:



          In the end, I found myself settling on this.



          '''
          * I had a conjecture that maximal trajectory lengths for the collatz conjecture were yielded by numbers of the form (2**n)-1. This program attempts to test it.
          * "conjecture()" finds the number with the highest trajectory lengths in a given range. The ranges are of the form [2**n, 2**(n+1)).
          '''

          import pprint

          def key(value, dct):
          for key in dct.keys():
          if dct[key] == value:
          return key
          return None


          def parse_lengths(filename):
          with open(filename) as fp:
          #Processes the text file into a dict mapping numbers to their trajectory length.
          return {int(j.split(" ")[0]): int(j.split(" ")[1]) for j in fp}


          def conjecture(traj):
          result = {}
          traj_size = len(traj)
          lower, higher = 1, 2
          while True:
          if higher > traj_size:
          # We don't have all the data for this range available.
          break

          max_in_slice = max(length for num, length in traj.items() if lower <= num <= higher)

          result[higher] = key(max_in_slice, traj)

          # Prepare data for next iteration.
          lower, higher = lower * 2, higher * 2

          return result

          if __name__ == '__main__':
          traj = parse_lengths('CollatzConjecture.txt')
          pprint.pprint(conjecture(traj))


          I'm also not quite happy with this yet, as the "key" function is still bothering me as well (we could iterate over .items() instead of .keys() to simplify).



          I hope this at least helps a bit.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited 22 hours ago









          Peter Taylor

          16.1k2759




          16.1k2759










          answered yesterday









          Sjoerd Job PostmusSjoerd Job Postmus

          3,2941016




          3,2941016























              3












              $begingroup$

              It's desirable to try to break one's code into small, more reusable segments. For example, it would be preferable (in my opinion) to create a small function to open the text file, extract its contents, and return the dict of trajectories you desire:



              def load_trajectories(file_name):
              traj = {}
              with open(file_name, 'r') as f:
              for line in f.readlines():
              S = line.split(' ')
              num, traj_len = int(S[0]), int(S[1])
              traj[num] = traj_len
              return traj


              Note here that I used the context manager with open(file_name, 'r') as f:, since in Python we need to first open a file, do things with it, and then close it. The context manager handles the opening and closing of the file for us, and within it we can access the file.



              From my searching there doesn't seem to be a universally accepted way to find the "inverse" of a value in a dictionary, but I like yours for the most part. As one change, I would iterate over the key:value pairs to simplify things a bit:



              def find_key(target_value, dct):
              for key, val in dct.items():
              if val == target_value:
              return key
              raise Exception('Value not found!')


              With this, it really only remains to clean up the conjecture function. Here, we can pass the traj dictionary as an argument, as well as the (un-exponentiated) bound.



              In the parlance of your code, note that st is always of the form 2**x, so when we compute math.log(st, 2), it always evaluates to x, so one line of your code reads (equivalently)



              slce = list(traj2.items())[(x-1)**2:st]


              which isn't the 'slicing into power of 2' that you desire. Moreover, it's not (necessarily) guaranteed that traj2.items() will be turned into a list in the way you desire, so it'll be better to be more explicit:



              slce = [key, val for key, val in traj.items() if key in range(2**(x-1), 2**x)]


              Together, along with splitting our imports onto different lines, adding a if __name__ == '__main__': guard, and some other minor reorganization yields the following code:



              import math
              import pprint

              def find_key(target_value, dct):
              for key, val in dct.items():
              if val == target_value:
              return key
              raise Exception('Value not found!')

              def load_trajectories(file_name):
              traj = {}
              with open(file_name, 'r') as f:
              for line in f.readlines():
              S = line.split(' ')
              num, traj_len = int(S[0]), int(S[1])
              traj[num] = traj_len
              return traj

              def conjecture(traj):
              """Checks the conjecture that the maximum 'Collatz length' among the integers in [2**n, 2**(n+1)) is 2**n - 1.

              The conjecture is false.
              """
              bound = math.floor(math.log(len(traj),2))
              exp_bound = 2**bound

              traj = {key:val for key, val in traj.items() if key < exp_bound} # Only include numbers of interest.
              result = {}

              for n in range(1,bound+1):
              exp_n = 2**n
              slce = [key, val for key, val in traj.items() if key in range(2**(n-1), exp_n)]
              result[exp_n] = find_key(max([val for key, val in slce]), traj)

              return pprint(pformat(result))

              if __name__ == "__main__":

              file_name = "CollatzConjecture.txt"
              print(conjecture(file_name))





              share|improve this answer









              $endgroup$













              • $begingroup$
                "I would iterate over the key:value pairs to simplify things a bit": it's not just a question of simplifying. It's also more efficient, because advancing the iterator doesn't involve any search so is as cheap as possible, whereas the lookup potentially has to resolve hash collisions.
                $endgroup$
                – Peter Taylor
                21 hours ago
















              3












              $begingroup$

              It's desirable to try to break one's code into small, more reusable segments. For example, it would be preferable (in my opinion) to create a small function to open the text file, extract its contents, and return the dict of trajectories you desire:



              def load_trajectories(file_name):
              traj = {}
              with open(file_name, 'r') as f:
              for line in f.readlines():
              S = line.split(' ')
              num, traj_len = int(S[0]), int(S[1])
              traj[num] = traj_len
              return traj


              Note here that I used the context manager with open(file_name, 'r') as f:, since in Python we need to first open a file, do things with it, and then close it. The context manager handles the opening and closing of the file for us, and within it we can access the file.



              From my searching there doesn't seem to be a universally accepted way to find the "inverse" of a value in a dictionary, but I like yours for the most part. As one change, I would iterate over the key:value pairs to simplify things a bit:



              def find_key(target_value, dct):
              for key, val in dct.items():
              if val == target_value:
              return key
              raise Exception('Value not found!')


              With this, it really only remains to clean up the conjecture function. Here, we can pass the traj dictionary as an argument, as well as the (un-exponentiated) bound.



              In the parlance of your code, note that st is always of the form 2**x, so when we compute math.log(st, 2), it always evaluates to x, so one line of your code reads (equivalently)



              slce = list(traj2.items())[(x-1)**2:st]


              which isn't the 'slicing into power of 2' that you desire. Moreover, it's not (necessarily) guaranteed that traj2.items() will be turned into a list in the way you desire, so it'll be better to be more explicit:



              slce = [key, val for key, val in traj.items() if key in range(2**(x-1), 2**x)]


              Together, along with splitting our imports onto different lines, adding a if __name__ == '__main__': guard, and some other minor reorganization yields the following code:



              import math
              import pprint

              def find_key(target_value, dct):
              for key, val in dct.items():
              if val == target_value:
              return key
              raise Exception('Value not found!')

              def load_trajectories(file_name):
              traj = {}
              with open(file_name, 'r') as f:
              for line in f.readlines():
              S = line.split(' ')
              num, traj_len = int(S[0]), int(S[1])
              traj[num] = traj_len
              return traj

              def conjecture(traj):
              """Checks the conjecture that the maximum 'Collatz length' among the integers in [2**n, 2**(n+1)) is 2**n - 1.

              The conjecture is false.
              """
              bound = math.floor(math.log(len(traj),2))
              exp_bound = 2**bound

              traj = {key:val for key, val in traj.items() if key < exp_bound} # Only include numbers of interest.
              result = {}

              for n in range(1,bound+1):
              exp_n = 2**n
              slce = [key, val for key, val in traj.items() if key in range(2**(n-1), exp_n)]
              result[exp_n] = find_key(max([val for key, val in slce]), traj)

              return pprint(pformat(result))

              if __name__ == "__main__":

              file_name = "CollatzConjecture.txt"
              print(conjecture(file_name))





              share|improve this answer









              $endgroup$













              • $begingroup$
                "I would iterate over the key:value pairs to simplify things a bit": it's not just a question of simplifying. It's also more efficient, because advancing the iterator doesn't involve any search so is as cheap as possible, whereas the lookup potentially has to resolve hash collisions.
                $endgroup$
                – Peter Taylor
                21 hours ago














              3












              3








              3





              $begingroup$

              It's desirable to try to break one's code into small, more reusable segments. For example, it would be preferable (in my opinion) to create a small function to open the text file, extract its contents, and return the dict of trajectories you desire:



              def load_trajectories(file_name):
              traj = {}
              with open(file_name, 'r') as f:
              for line in f.readlines():
              S = line.split(' ')
              num, traj_len = int(S[0]), int(S[1])
              traj[num] = traj_len
              return traj


              Note here that I used the context manager with open(file_name, 'r') as f:, since in Python we need to first open a file, do things with it, and then close it. The context manager handles the opening and closing of the file for us, and within it we can access the file.



              From my searching there doesn't seem to be a universally accepted way to find the "inverse" of a value in a dictionary, but I like yours for the most part. As one change, I would iterate over the key:value pairs to simplify things a bit:



              def find_key(target_value, dct):
              for key, val in dct.items():
              if val == target_value:
              return key
              raise Exception('Value not found!')


              With this, it really only remains to clean up the conjecture function. Here, we can pass the traj dictionary as an argument, as well as the (un-exponentiated) bound.



              In the parlance of your code, note that st is always of the form 2**x, so when we compute math.log(st, 2), it always evaluates to x, so one line of your code reads (equivalently)



              slce = list(traj2.items())[(x-1)**2:st]


              which isn't the 'slicing into power of 2' that you desire. Moreover, it's not (necessarily) guaranteed that traj2.items() will be turned into a list in the way you desire, so it'll be better to be more explicit:



              slce = [key, val for key, val in traj.items() if key in range(2**(x-1), 2**x)]


              Together, along with splitting our imports onto different lines, adding a if __name__ == '__main__': guard, and some other minor reorganization yields the following code:



              import math
              import pprint

              def find_key(target_value, dct):
              for key, val in dct.items():
              if val == target_value:
              return key
              raise Exception('Value not found!')

              def load_trajectories(file_name):
              traj = {}
              with open(file_name, 'r') as f:
              for line in f.readlines():
              S = line.split(' ')
              num, traj_len = int(S[0]), int(S[1])
              traj[num] = traj_len
              return traj

              def conjecture(traj):
              """Checks the conjecture that the maximum 'Collatz length' among the integers in [2**n, 2**(n+1)) is 2**n - 1.

              The conjecture is false.
              """
              bound = math.floor(math.log(len(traj),2))
              exp_bound = 2**bound

              traj = {key:val for key, val in traj.items() if key < exp_bound} # Only include numbers of interest.
              result = {}

              for n in range(1,bound+1):
              exp_n = 2**n
              slce = [key, val for key, val in traj.items() if key in range(2**(n-1), exp_n)]
              result[exp_n] = find_key(max([val for key, val in slce]), traj)

              return pprint(pformat(result))

              if __name__ == "__main__":

              file_name = "CollatzConjecture.txt"
              print(conjecture(file_name))





              share|improve this answer









              $endgroup$



              It's desirable to try to break one's code into small, more reusable segments. For example, it would be preferable (in my opinion) to create a small function to open the text file, extract its contents, and return the dict of trajectories you desire:



              def load_trajectories(file_name):
              traj = {}
              with open(file_name, 'r') as f:
              for line in f.readlines():
              S = line.split(' ')
              num, traj_len = int(S[0]), int(S[1])
              traj[num] = traj_len
              return traj


              Note here that I used the context manager with open(file_name, 'r') as f:, since in Python we need to first open a file, do things with it, and then close it. The context manager handles the opening and closing of the file for us, and within it we can access the file.



              From my searching there doesn't seem to be a universally accepted way to find the "inverse" of a value in a dictionary, but I like yours for the most part. As one change, I would iterate over the key:value pairs to simplify things a bit:



              def find_key(target_value, dct):
              for key, val in dct.items():
              if val == target_value:
              return key
              raise Exception('Value not found!')


              With this, it really only remains to clean up the conjecture function. Here, we can pass the traj dictionary as an argument, as well as the (un-exponentiated) bound.



              In the parlance of your code, note that st is always of the form 2**x, so when we compute math.log(st, 2), it always evaluates to x, so one line of your code reads (equivalently)



              slce = list(traj2.items())[(x-1)**2:st]


              which isn't the 'slicing into power of 2' that you desire. Moreover, it's not (necessarily) guaranteed that traj2.items() will be turned into a list in the way you desire, so it'll be better to be more explicit:



              slce = [key, val for key, val in traj.items() if key in range(2**(x-1), 2**x)]


              Together, along with splitting our imports onto different lines, adding a if __name__ == '__main__': guard, and some other minor reorganization yields the following code:



              import math
              import pprint

              def find_key(target_value, dct):
              for key, val in dct.items():
              if val == target_value:
              return key
              raise Exception('Value not found!')

              def load_trajectories(file_name):
              traj = {}
              with open(file_name, 'r') as f:
              for line in f.readlines():
              S = line.split(' ')
              num, traj_len = int(S[0]), int(S[1])
              traj[num] = traj_len
              return traj

              def conjecture(traj):
              """Checks the conjecture that the maximum 'Collatz length' among the integers in [2**n, 2**(n+1)) is 2**n - 1.

              The conjecture is false.
              """
              bound = math.floor(math.log(len(traj),2))
              exp_bound = 2**bound

              traj = {key:val for key, val in traj.items() if key < exp_bound} # Only include numbers of interest.
              result = {}

              for n in range(1,bound+1):
              exp_n = 2**n
              slce = [key, val for key, val in traj.items() if key in range(2**(n-1), exp_n)]
              result[exp_n] = find_key(max([val for key, val in slce]), traj)

              return pprint(pformat(result))

              if __name__ == "__main__":

              file_name = "CollatzConjecture.txt"
              print(conjecture(file_name))






              share|improve this answer












              share|improve this answer



              share|improve this answer










              answered yesterday









              Michael EngenMichael Engen

              1286




              1286












              • $begingroup$
                "I would iterate over the key:value pairs to simplify things a bit": it's not just a question of simplifying. It's also more efficient, because advancing the iterator doesn't involve any search so is as cheap as possible, whereas the lookup potentially has to resolve hash collisions.
                $endgroup$
                – Peter Taylor
                21 hours ago


















              • $begingroup$
                "I would iterate over the key:value pairs to simplify things a bit": it's not just a question of simplifying. It's also more efficient, because advancing the iterator doesn't involve any search so is as cheap as possible, whereas the lookup potentially has to resolve hash collisions.
                $endgroup$
                – Peter Taylor
                21 hours ago
















              $begingroup$
              "I would iterate over the key:value pairs to simplify things a bit": it's not just a question of simplifying. It's also more efficient, because advancing the iterator doesn't involve any search so is as cheap as possible, whereas the lookup potentially has to resolve hash collisions.
              $endgroup$
              – Peter Taylor
              21 hours ago




              $begingroup$
              "I would iterate over the key:value pairs to simplify things a bit": it's not just a question of simplifying. It's also more efficient, because advancing the iterator doesn't involve any search so is as cheap as possible, whereas the lookup potentially has to resolve hash collisions.
              $endgroup$
              – Peter Taylor
              21 hours ago











              1












              $begingroup$

              The existing answers cover most of the issues that I spotted. There's just one thing that I think it's useful to add:




                  for st in (2**x for x in range(1, bound+1)):        #Generator for powers of 2.
              slce = list(traj2.items())[int((math.log(st,2)-1)**2):st] #Slices "traj2" into powers of 2.
              result[st] = key(max([n[1] for n in slce]), traj2)



              This converts the dict into a list every time through the loop. The suggestions, which don't rely on undocumented behaviour, instead filter the dict each time through the loop. But you can instead invert the loop:



                  best = [0] * (bound + 1)
              for key, value in traj2.items():
              x = int(math.log(key, 2)) + 1
              if value > best[x] or (value == best[x] and key < result[2**x]):
              best[x] = value
              result[2**x] = key


              This gives straightforward linear running time and allows you explicit control over tie-breaking. (I'm not sure whether I've got the tie-breaking correct for your conjecture, but you can easily fix that).






              share|improve this answer









              $endgroup$


















                1












                $begingroup$

                The existing answers cover most of the issues that I spotted. There's just one thing that I think it's useful to add:




                    for st in (2**x for x in range(1, bound+1)):        #Generator for powers of 2.
                slce = list(traj2.items())[int((math.log(st,2)-1)**2):st] #Slices "traj2" into powers of 2.
                result[st] = key(max([n[1] for n in slce]), traj2)



                This converts the dict into a list every time through the loop. The suggestions, which don't rely on undocumented behaviour, instead filter the dict each time through the loop. But you can instead invert the loop:



                    best = [0] * (bound + 1)
                for key, value in traj2.items():
                x = int(math.log(key, 2)) + 1
                if value > best[x] or (value == best[x] and key < result[2**x]):
                best[x] = value
                result[2**x] = key


                This gives straightforward linear running time and allows you explicit control over tie-breaking. (I'm not sure whether I've got the tie-breaking correct for your conjecture, but you can easily fix that).






                share|improve this answer









                $endgroup$
















                  1












                  1








                  1





                  $begingroup$

                  The existing answers cover most of the issues that I spotted. There's just one thing that I think it's useful to add:




                      for st in (2**x for x in range(1, bound+1)):        #Generator for powers of 2.
                  slce = list(traj2.items())[int((math.log(st,2)-1)**2):st] #Slices "traj2" into powers of 2.
                  result[st] = key(max([n[1] for n in slce]), traj2)



                  This converts the dict into a list every time through the loop. The suggestions, which don't rely on undocumented behaviour, instead filter the dict each time through the loop. But you can instead invert the loop:



                      best = [0] * (bound + 1)
                  for key, value in traj2.items():
                  x = int(math.log(key, 2)) + 1
                  if value > best[x] or (value == best[x] and key < result[2**x]):
                  best[x] = value
                  result[2**x] = key


                  This gives straightforward linear running time and allows you explicit control over tie-breaking. (I'm not sure whether I've got the tie-breaking correct for your conjecture, but you can easily fix that).






                  share|improve this answer









                  $endgroup$



                  The existing answers cover most of the issues that I spotted. There's just one thing that I think it's useful to add:




                      for st in (2**x for x in range(1, bound+1)):        #Generator for powers of 2.
                  slce = list(traj2.items())[int((math.log(st,2)-1)**2):st] #Slices "traj2" into powers of 2.
                  result[st] = key(max([n[1] for n in slce]), traj2)



                  This converts the dict into a list every time through the loop. The suggestions, which don't rely on undocumented behaviour, instead filter the dict each time through the loop. But you can instead invert the loop:



                      best = [0] * (bound + 1)
                  for key, value in traj2.items():
                  x = int(math.log(key, 2)) + 1
                  if value > best[x] or (value == best[x] and key < result[2**x]):
                  best[x] = value
                  result[2**x] = key


                  This gives straightforward linear running time and allows you explicit control over tie-breaking. (I'm not sure whether I've got the tie-breaking correct for your conjecture, but you can easily fix that).







                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered 21 hours ago









                  Peter TaylorPeter Taylor

                  16.1k2759




                  16.1k2759






























                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Code Review Stack Exchange!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      Use MathJax to format equations. MathJax reference.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f212377%2ftesting-a-collatz-conjecture-conjecture-python%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      "Incorrect syntax near the keyword 'ON'. (on update cascade, on delete cascade,)

                      Alcedinidae

                      RAC Tourist Trophy