
war story: caching


There was that one time I used strace and a Ruby script to bypass a really long step in a build pipeline. The trick was figuring out the step's inputs and outputs: run the process under strace, collect every file that was opened, read, or written, then feed those file lists to a Ruby script that computed an aggregate hash and used it as the key for caching the outputs. The core of the script was a small utility class with some convenience methods for computing hashes:

class TaggedInput
  @@allowed_tags = [:tar, :symlink, :file]
  attr_reader :tag, :path

  def initialize(tag, path)
    unless @@allowed_tags.include?(tag)
      raise StandardError, "Unknown input tag: #{tag}, #{path}"
    end
    @tag, @path = tag, path
  end

  def shasum
    sha = case @tag
          when :tar
            # Pin mtimes and sort the file list so the archive, and therefore
            # the hash, is stable across runs
            debug_file = @path.gsub('/', '_') + '.debug.tar'
            path = File.join(@path, '/')
            `find #{path} -type f -print0 | sort -z > tar_files`
            `tar -P --mtime='1970-01-01' --null --format=ustar --files-from=tar_files -cf - | tee #{debug_file} | shasum -`
          when :symlink
            `readlink #@path | shasum -`
          when :file
            `shasum #@path`
          end.split(' ')[0]
    if sha.nil? || sha.length < 10
      raise StandardError, "Could not properly compute sha: path = #@path, tag = #@tag"
    end
    sha
  end
end

def tagger(t, p); TaggedInput.new(t, p); end
def tar(p); tagger(:tar, p); end
def symlink(p); tagger(:symlink, p); end
def file(p); tagger(:file, p); end
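The strace discovery step described above can be sketched roughly like this: scan the trace for open calls and split the touched files into reads and writes by their open flags. The trace excerpt and the parsing heuristic here are illustrative; the original discovery script isn't shown in the post.

```ruby
require 'set'

# A made-up strace excerpt of the kind `strace -f` would produce for a build
trace = <<~TRACE
  openat(AT_FDCWD, "Gemfile.lock", O_RDONLY) = 3
  openat(AT_FDCWD, "vendor/cache/rack.gem", O_RDONLY) = 4
  openat(AT_FDCWD, "build/output.tar", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 5
TRACE

reads, writes = Set.new, Set.new
trace.each_line do |line|
  # Pull out the path and the open flags from each open/openat call
  next unless (m = line.match(/open(?:at)?\(.*?"([^"]+)",\s*([A-Z_|]+)/))
  path, flags = m[1], m[2]
  if flags.include?('O_WRONLY') || flags.include?('O_RDWR')
    writes << path  # candidates for caching as outputs
  else
    reads << path   # candidates for hashing as inputs
  end
end
```

The read set becomes the candidate inputs to hash; the write set becomes the outputs worth caching.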

The above was used in conjunction with input and output description files in various folders that were evaled at runtime:

require 'digest'
# ...
# The inputs that will be hashed and then combined to serve as the key. Evaling
# the file should result in an array that makes use of the `tar`, `symlink`, and `file` methods
inputs = eval(File.read(File.join(Dir.pwd, opts[:'input-file'])), binding)
# Now generate the aggregate shasum to use as the key for the output cache
input_hashes = inputs.map(&:shasum)
aggregate_hash = Digest::SHA1.hexdigest(input_hashes.join(':'))
# ...
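An inputs file evaled this way might look like the following. The paths here are hypothetical, made up for illustration; the helpers are the `tar`, `symlink`, and `file` methods from the utility script.

```ruby
# Hypothetical contents of the file named by opts[:'input-file'].
# Evaling it yields an array of TaggedInput objects.
[
  tar('vendor/bundle'),        # hash a whole directory tree
  file('Gemfile.lock'),        # hash a single file's contents
  symlink('releases/current')  # hash the symlink target, not its contents
]
```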

After generating the aggregate hash, it was used to name a tar file of the required outputs:

# ...
# Similar to evaling the inputs, this file should result in an array that we can
# join with spaces and pass to the tar command
outputs = eval(File.read(File.join(Dir.pwd, opts[:'output-file'])))
cache_file = "#{aggregate_hash}.txz"
`env XZ_OPT=-1 tar -P -cJf #{cache_file} #{outputs.join(' ')} 2>&1`
# ...

The resulting cache file was then shared across however many hosts required it, so anyone who needed the outputs could compute the aggregate hash, download the matching file, and unpack it. The end result: build times for that step dropped from 5-30 minutes to less than 10 seconds on average, and given how many times a day we ran that step, the savings added up quickly.
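The consumer side of that flow can be sketched as a hit-or-miss lookup. This is a sketch under assumptions: it presumes a shared cache directory reachable from every host (the post doesn't specify the transport), and `fetch_or_build` and the cache path are hypothetical names, not from the original scripts.

```ruby
require 'digest'

# Given the per-input hashes, either unpack the cached outputs or signal
# that the slow build step (and a subsequent pack-and-upload) is needed.
def fetch_or_build(input_hashes, cache_dir: '/shared/cache')
  aggregate_hash = Digest::SHA1.hexdigest(input_hashes.join(':'))
  cache_file = File.join(cache_dir, "#{aggregate_hash}.txz")
  if File.exist?(cache_file)
    `tar -P -xJf #{cache_file}`  # cache hit: unpack outputs in seconds
    :hit
  else
    :miss                        # cache miss: run the 5-30 minute step
  end
end
```

Because the key is derived purely from the inputs, every host computes the same file name independently, with no coordination needed beyond the shared storage.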

