My Solution to Google Code Jam Round 1B 2010: ‘File Fix-it’

April 12, 2012

The problem is stated here.

I will first attempt explain the challenge simply, since from my recent experience, it seems that a lot of software engineering exercises have as much to do with reading comprehension as with actual programming skills.

Then I will describe my approach and also present an alternative quick solution.

Finally I will paste the annotated code and make the script available for download.

You get a 2 lists of directories – one is a list of “existing” directories, the second is a list of directories we want to “create”. The file system in question is imaginary, as we are dealing simply with the logical aspect of creating directories as per the unix directory structure standard (“dir/subdir/subdir”).
Once the program understands the existing tree, it will then calculate how many “mkdir”‘s it will take to satisfy the list of “to create” dirs.

Coming from a SysAdmin background, my initial solution was very straightforward, and used basic GNU tools to get the job done, however it had very little to do with programming. It was more like mkdir -pv /test/dir/subdir/ | wc -l. With a bit of wrangling that basic piece of logic should actually be able to handle the whole thing.

However, as the problem also includes the input parsing, and in fact, what was at question here are meaningful engineering skills, rather than creative workarounds (those have their place as well don’t get me wrong), I went on to write the following script:

#!/usr/bin/env python

### Google Code Challenge "File Fix-it" ###
### Solution by Tom Goren ###

# Input: T - No. of test cases
# N - No. of lines that pertain to dirs already existing
# M - No. of lines that are dirs we want to create
# Output: Case No. and no. of moves it takes to create

##############
# Example:   #
# Input      #
#            #
# 2          #   T = 2
# 0 2        #   N=0, M=2
# /fiz/boo   #   No dirs 'already existing', need to create 2 (/fiz and /fiz/boo)
# /fiz/bot   #   /fiz already created in last move, now only need to create /fiz/bot
# 1 1        #   N=1, M=1
# /ox/shuk   #   /ox/shuk 'already exists'            
# /ox/shmi   #   only need to create /ox/shmi - 1 move
##############
# Output     #
#            #
# Case #1: 3 #
# Case #2: 1 #
#            #
#            #
##############

import sys

# define helpful function, for nice output
printf = sys.stdout.write

# Read input file

#input_file = "A-large-practice.in" # Google's large set
input_file = "A-small-practice.in" # Google's small set
infile = open(input_file)

def tree_build(base_tree,dirs):
    """ helper function for building the dir tree """
    move_count = 0
    tree = base_tree # python dictionaries are special in assignment (this is the trick)
    for subdir in dirs:
        if subdir not in tree:
            tree[subdir] = {} # dir is top level
            move_count += 1
        tree = tree[subdir] # next traverse will be inside element
    return move_count
   
def mkdir_moves(existing, to_create):
    """ Build multi-dimensional array with dictionaries for the 'existing' dirs,
        and then check what it will take to create 'non-existent' ones.
        Return no. of moves """

   
    base_tree = {}
    for exist in existing:
        exist = exist.strip().split("/") # cleanup
        exist.pop(0) # more cleanup
        tree_build(base_tree,exist) # pass empty dict and dir in form of list
       
    move_count = 0
    for create in to_create:
        create = create.strip().split("/")
        create.pop(0)
        move_count += tree_build(base_tree,create)

    return move_count

# start parsing
N = 0
M = 0
case_index = 0
line_counter = 0
case_counter = 1
existing_lines = []
create_lines = []
for line in infile:
    if line_counter == 0 and int(line) > 0:
        no_of_cases = line
    line_spl = line.split()
    if len(line_spl) > 1: # splitting by spaces - only case definitions should have a space in them ('N M')
        N, M = int(line_spl[0]), int(line_spl[1])
        if N > 100 or N < 0 or M > 100 or M < 0:
            #printf("bad input for line: %s (bad case definitions)" % line_counter)
            #sys.exit(1)
            continue
        case_index = line_counter # for lines within each case
        # printf("DEBUG N: %s, M: %s\n" % (N, M))
        printf("\nCase #%s: "% case_counter) # Output
        existing_lines = [] # reset list
        create_lines = [] # reset list
        case_counter += 1 # advance to next case counter
    elif len(line) > 100:
        #printf("bad input for line: %s (longer than 100 chars)" % line_counter)
        #sys.exit(1)
        continue
    else:
        case_length = N + M
        # printf("DEBUG case length: %s\n" % case_length)
        delta = line_counter - case_index
        if delta <= N:
            """ lines that exist """
            existing_lines += [line]
        elif delta > N and delta < case_length:
            """ lines to create """
            create_lines += [line]
        elif delta == case_length:
            create_lines += [line]
            get_moves = mkdir_moves(existing_lines, create_lines)
            # printf("DEBUG existing_lines: %s\n create_lines: %s\n" % (existing_lines, create_lines) )
            printf("%s" % get_moves)
    line_counter += 1

You may notice that there are two meaningless clauses checking the validity of the parsed lines (see “bad input”). This is due to some seeming incongruity between the challenge definition and the provided test cases.
If these clauses are ignored, the script solves the problem to their checking mechanism’s satisfaction – feel free to comment them out and test it.
Also let me know if there is something I misunderstood in the definition that “No path will have more than 100 characters in it.“.

Anyhow, here is the script for download, along with the provided test cases.

4 Comments to "My Solution to Google Code Jam Round 1B 2010: ‘File Fix-it’"

  1. Hi wrote:

    Thanks for your post, this was my attempt:

    class Test:
        def __init__(self):
            self.existingdirs = []
            self.newdirs = []
            self.mkdirs = 0
           
    class Dir:
        def __init__(self, fullpath):
            self.fullpath = fullpath
           
        def parent(self):
            return self.fullpath[:self.fullpath.rfind('/')]


    #inputfile = 'A-small-practice.in'
    inputfile = 'A-large-practice.in'


    with open(inputfile) as f:
        theinput = f.readlines()

    numtests = int(theinput.pop(0))


    for i in range(0,numtests):
        t = Test()
       
        numexistingdirs, numnewdirs = theinput.pop(0).split()
       
        for i1 in range(0,int(numexistingdirs)):
            t.existingdirs.append(theinput.pop(0).strip())
       
        for i1 in range(0,int(numnewdirs)):
            d = Dir(theinput.pop(0).strip())
            t.newdirs.append(d)
       
        for currentdir in t.newdirs:
            direxists = False
           
            if not currentdir.fullpath in t.existingdirs:
               
                while currentdir.parent() and not direxists:
                    if currentdir.parent() in t.existingdirs:
                        direxists = True
                    else:
                        t.mkdirs += 1
                        t.existingdirs.append(currentdir.fullpath)
                        currentdir = Dir(currentdir.parent())
                       
                if currentdir.fullpath not in t.existingdirs:
                    t.mkdirs += 1
                    t.existingdirs.append(currentdir.fullpath)
                   
               
        print('Case #%s: %s' % (i + 1, t.mkdirs))
  2. Hi wrote:

    Ok tab spaces are ignored…oh well

  3. tom wrote:

    Fixed that for you and added formatting [didn't do anything else :) ]

  4. tom wrote:

    I like it! Concise, readable and object oriented. Awesome job, thanks for sharing!

Leave Your Comment

 
Powered by Wordpress. Theme by Shlomi Noach, openark.org
Hosted by Evolution On-line