YAML Parser Tuning
YAML files are good. They are [arguably] human-readable, the syntax is more or less minimalistic, and they are nearly a standard for configuration files in the Ruby world. One day, though, everybody faces the necessity to read YAML in some unusual, often weird manner.
Yesterday I took part in answering a question on StackOverflow. The YAML file was to be parsed as usual, but with a tiny twist: every leaf value should be replaced with a hash like:
{ value: value, line: line }
where line is the line of the original YAML file on which this leaf was met. The technique below is not tied to this particular case; it demonstrates a common approach to parsing YAML in a non-standard way.
The default parser in Ruby is Psych. It is a good old AST builder. To improve (read: change) its behaviour, one needs to bring three things to the table.
Node
Patching the node is pretty straightforward. We want to store a line number, so here we go:
class Psych::Nodes::Node
  attr_accessor :line
end
TreeBuilder
TreeBuilder receives the parser events and builds a syntax tree out of them. In general, it has only one method of interest, TreeBuilder#scalar, which is invoked for every scalar node. Let's deal with it a bit.
class EnchancedBuilder < Psych::TreeBuilder
  # Line numbers are available to parser, not to builder; we need a backreference
  attr_accessor :parser

  # Main handler in TreeBuilder
  # @param value [String] the value met
  # @param style [Integer] the type of entity met (scalar/int/array/etc)
  def scalar value, anchor, tag, plain, quoted, style
    s = super
    # using the mark from a previous hit to handle multilined values
    s.line = @line || 1
    @line = parser.mark.line + 1 # marks are zero-based
    s
  end
end
Here we set the Node#line attribute prepared above and remember the line the parser has reached, so that the next scalar can pick it up.
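The builder needs a backreference to its parser, and the text above does not show how the two get introduced to each other. Below is a minimal sketch of that wiring. LineTrackingBuilder is a hypothetical stand-in for the builder above, repeated only so that the snippet runs on its own; the exact numbers printed depend on where the parser's mark sits when each scalar is reported.

```ruby
require 'psych'

# Same patch as in the Node section: every AST node can carry a line number.
class Psych::Nodes::Node
  attr_accessor :line
end

# Hypothetical stand-in for the builder above (the name is an assumption).
class LineTrackingBuilder < Psych::TreeBuilder
  attr_accessor :parser

  def scalar(value, anchor, tag, plain, quoted, style)
    node = super
    node.line = @line || 1
    @line = parser.mark.line + 1 # marks are zero-based
    node
  end
end

yaml = "key1: value1\nkey2:\n  - value21\n  - value22\n"

builder = LineTrackingBuilder.new
parser  = Psych::Parser.new(builder) # the parser reports events to the builder
builder.parser = parser              # the backreference the builder relies on
parser.parse(yaml)

# builder.root is a Psych::Nodes::Stream; nodes are Enumerable over their subtree.
scalars = builder.root.grep(Psych::Nodes::Scalar)
scalars.each { |node| puts "#{node.value.inspect} -> line #{node.line}" }
```

The tree is kept by the builder, not by the parser, which is why the result is read from builder.root after parsing.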
ToRuby
The only thing left is to pass the newly introduced line attribute on to the generated Ruby objects properly.
class Psych::Visitors::ToRuby
  # There may be problems with Yaml mappings that have tags.
  # @author @matt
  def revive_hash hash, o
    o.children.each_slice(2) { |k,v|
      key = accept(k)
      val = accept(v)
      # This is the important bit. If the value is a scalar,
      # we replace it with the desired hash.
      if v.is_a? ::Psych::Nodes::Scalar
        val = { "value" => val, "line" => v.line }
      end
      # Code dealing with << (for merging hashes) omitted.
      # If you need this you will probably need to copy it
      # in here. See the method:
      # https://github.com/tenderlove/psych/blob/v2.0.13/lib/psych/visitors/to_ruby.rb#L333-L365
      hash[key] = val
    }
    hash
  end
end
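Reopening Psych::Visitors::ToRuby changes every YAML load in the process. If that is too invasive, the same idea might be scoped to a subclass, as in the sketch below. LineTrackingBuilder, LineAwareToRuby and load_with_lines are hypothetical names. The sketch assumes a Psych version that provides ToRuby.create, and it accepts and ignores any extra arguments that other Psych versions may pass to revive_hash. Merge keys (<<) are still not handled.

```ruby
require 'psych'

class Psych::Nodes::Node
  attr_accessor :line
end

# Hypothetical stand-in for the builder from the previous section.
class LineTrackingBuilder < Psych::TreeBuilder
  attr_accessor :parser

  def scalar(value, anchor, tag, plain, quoted, style)
    node = super
    node.line = @line || 1
    @line = parser.mark.line + 1
    node
  end
end

# Hypothetical visitor: only trees converted through it get the wrapped leaves.
class LineAwareToRuby < Psych::Visitors::ToRuby
  private

  # Extra arguments (if the installed Psych passes any) are ignored: an assumption.
  def revive_hash(hash, o, *)
    o.children.each_slice(2) do |k, v|
      key = accept(k)
      val = accept(v)
      val = { 'value' => val, 'line' => v.line } if v.is_a?(Psych::Nodes::Scalar)
      hash[key] = val
    end
    hash
  end
end

# Parses a single-document YAML string and converts it with the visitor above.
def load_with_lines(yaml)
  builder = LineTrackingBuilder.new
  parser  = Psych::Parser.new(builder)
  builder.parser = parser
  parser.parse(yaml)
  document = builder.root.children.first # first document of the stream
  LineAwareToRuby.create.accept(document)
end

result = load_with_lines("key1: value1\nkey2:\n  nested: value2\n")
p result
```

Regular calls such as Psych.load keep their stock behaviour here, because the stock visitor is left untouched.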
That’s it. Now we are able to produce hashes as shown below from YAML.
key1: value1
key2:
  - value21
  - value22
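would become (sequence items are shown wrapped as well; revive_hash above only touches scalars that are direct values of a mapping, so this assumes the same check is applied where sequences are revived)
hash = {
  'key1' => { 'value' => 'value1', 'line' => 1 },
  'key2' => [
    { 'value' => 'value21', 'line' => 3 },
    { 'value' => 'value22', 'line' => 4 }
  ]
}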
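Since revive_hash only sees the values of mappings, list items such as value21 would otherwise come back as plain strings. One possible way to wrap them too is to post-process the array that the stock sequence visitor returns. This is a sketch under assumptions, not part of the original answer: LineTrackingBuilder, LineAwareToRuby and load_with_lines are hypothetical names, and only untagged sequences are touched.

```ruby
require 'psych'

class Psych::Nodes::Node
  attr_accessor :line
end

# Hypothetical stand-in for the builder from the TreeBuilder section.
class LineTrackingBuilder < Psych::TreeBuilder
  attr_accessor :parser

  def scalar(value, anchor, tag, plain, quoted, style)
    node = super
    node.line = @line || 1
    @line = parser.mark.line + 1
    node
  end
end

class LineAwareToRuby < Psych::Visitors::ToRuby
  # Let the stock visitor build the array, then wrap the scalar items in place.
  def visit_Psych_Nodes_Sequence(o)
    list = super
    return list unless o.tag.nil? && list.is_a?(Array)

    o.children.each_with_index do |child, i|
      next unless child.is_a?(Psych::Nodes::Scalar)
      list[i] = { 'value' => list[i], 'line' => child.line }
    end
    list
  end

  private

  # Same wrapping for mapping values; extra arguments are ignored (assumption).
  def revive_hash(hash, o, *)
    o.children.each_slice(2) do |k, v|
      val = accept(v)
      val = { 'value' => val, 'line' => v.line } if v.is_a?(Psych::Nodes::Scalar)
      hash[accept(k)] = val
    end
    hash
  end
end

def load_with_lines(yaml)
  builder = LineTrackingBuilder.new
  parser  = Psych::Parser.new(builder)
  builder.parser = parser
  parser.parse(yaml)
  LineAwareToRuby.create.accept(builder.root.children.first)
end

result = load_with_lines("key1: value1\nkey2:\n  - value21\n  - value22\n")
p result
```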