This describes the internals of the uweb program. It also serves as an example of how to write files using it. See the uweb manual page for a description of the command line flags, input format, etc.
The program requires very few gems, most notably erb. It also supports being compiled into a standalone binary using rubyscript2exe.
require 'set'
require 'getoptlong'
require 'erb'
begin
require 'rubyscript2exe'
rescue LoadError
# Not using rubyscript2exe.
end
The main program is pretty boring: parse command line, load templates, process input, done.
def main
parse_options
load_templates
$did_embed_chunks = false
process_file($input)
Chunk.verify_all_used if $warnings
end
def die(message)
abort("#{$in_path}(#{$at_line_number}): #{message}")
end
def parse_options
$template_dir = defined?(RUBYSCRIPT2EXE) ? RUBYSCRIPT2EXE.exedir \
: File.dirname($0)
$warnings = true
GetoptLong.new([ '--help', '-h', GetoptLong::NO_ARGUMENT ],
[ '--template-dir', '-t', GetoptLong::REQUIRED_ARGUMENT ],
[ '--output-file', '-o', GetoptLong::REQUIRED_ARGUMENT ],
[ '--no-warnings', '-n', GetoptLong::NO_ARGUMENT ]) \
.each do |option, argument|
case option
when '--help'
usage
when '--template-dir'
$template_dir = argument
when '--output-file'
$stdout = File.open(argument, 'w')
when '--no-warnings'
$warnings = false
else
abort("Unknown option \"#{option}\"")
end
end
case ARGV.length
when 0
$input = '-'
when 1
$input = ARGV[0]
else
usage
end
end
Since RDoc::usage is no longer with us, we provide our own version. The output is in valid AsciiDoc format and can be passed to asciidoctor to generate a manual page or an HTML file.
# Since RDoc::usage is no more.
def usage
print $usage
exit(0)
end
We let erb do the heavy lifting for us here.
def load_templates
$chunk_template = load_template('uweb.chunk')
$nest_template = load_template('uweb.nest')
end
def load_template(path)
path = $template_dir + '/' + path
abort("Missing template file \"#{path}\"") unless File.exist?(path)
return ERB.new(IO.read(path))
end
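As a rough illustration of what erb does with a loaded template, here is a minimal sketch. The template text and the render_chunk helper are hypothetical stand-ins, not the contents of the real uweb.chunk file; the only point is that local variables reach the template through the binding.

```ruby
require 'erb'

# Hypothetical stand-in for the text of a template file such as uweb.chunk.
TEMPLATE_TEXT = "<pre id='<%= name %>'>\n" \
                "<% lines.each do |line| %><%= line %>\n<% end %>" \
                "</pre>\n"

# Renders the template; the method's local variables (name, lines) are
# exposed to the template through the binding.
def render_chunk(name, lines)
  ERB.new(TEMPLATE_TEXT).result(binding)
end

html = render_chunk('main', ['def main', 'end'])
puts html
```

ERB#result returns the rendered string, while ERB#run prints it directly; both accept a binding.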
We simply scan the input one line at a time, comparing each line against the patterns defined below.
def process_file(file)
each_file_line(file) do |line|
case line
when $source_pattern
scan_source($source_pattern.extract(line))
when $chunk_pattern
embed_chunk($chunk_pattern.extract(line))
when $include_pattern
process_file($include_pattern.extract(line))
else
puts line
end
end
end
def each_file_line(path)
$in_path ||= nil
save_path = $in_path
$in_path = path
$at_line_number ||= nil
save_line_number = $at_line_number
$at_line_number = 0
in_file = path == '-' ? $stdin : File.open($in_path, 'r')
in_file.each_line do |line|
line.chomp!
$at_line_number += 1
yield line
end
$in_path = save_path
$at_line_number = save_line_number
end
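The save and restore of the current position is what lets included files and scanned sources nest inside one another. The following sketch isolates that idea with a single hypothetical global ($position) and a hypothetical helper (with_position). Unlike the code above, it restores the saved value in an ensure clause, which only matters if errors are raised rather than aborting the program.

```ruby
$position = nil # hypothetical stand-in for $in_path and $at_line_number

# Sets the global for the duration of the block, then restores the old value.
def with_position(label)
  saved = $position
  $position = label
  yield
ensure
  $position = saved
end

trace = []
with_position('outer') do
  trace << $position
  with_position('inner') { trace << $position }
  trace << $position
end
trace << $position
p trace
```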
Scanning sources
The patterns we look for in the input X/HTML files are:
$source_pattern =
Pattern.new("<link rel='source'> tag",
/ <
\s* link
\s .* rel
\s* =
\s* ['"]? source ['"]?
/ix,
/ ^
\s* <
\s* link
\s+ rel
\s* =
\s* ['"] source ['"]
\s+ href
\s* =
\s* ['"] (.+) ['"]
\s* \/?
\s* >
(?: \s* <
\s* \/
\s* link
\s* > )?
\s* $
/ix)
$include_pattern =
Pattern.new("<a rel='include'> tag",
/ <
\s* a
\s .* rel
\s* =
\s* ['"] include ['"]
/ix,
/ ^
\s* <
\s* a
\s+ rel
\s* =
\s* ['"] include ['"]
\s+ name
\s* =
\s* ['"] (.+) ['"]
\s* \/?
\s* >
(?: \s* <
\s* \/
\s* a
\s* > )?
\s* $
/ix)
$chunk_pattern =
Pattern.new("<a rel='chunk'> tag",
/ <
\s* a
\s .* rel
\s* =
\s* ['"] chunk ['"]
/ix,
/ ^
\s* <
\s* a
\s+ rel \s* =
\s* ['"] chunk ['"]
\s+ name
\s* =
\s* ['"] (.+) ['"]
\s* \/?
\s* >
(?: \s* <
\s* \/
\s* a
\s* > )?
\s* $
/ix)
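To make the shape of these tags concrete, here is a small sketch using a simplified version of the chunk extraction regexp (it omits the optional closing tag). CHUNK_TAG and chunk_name are names made up for this example.

```ruby
# Simplified, hypothetical version of the extraction regexp for chunk tags.
CHUNK_TAG = /
  ^ \s* < \s* a
  \s+ rel  \s* = \s* ['"] chunk ['"]
  \s+ name \s* = \s* ['"] (.+) ['"]
  \s* \/? \s* >
  \s* $
/ix

# Returns the chunk name, or nil if the line is not a well-formed chunk tag.
def chunk_name(line)
  match = CHUNK_TAG.match(line)
  match && match[1]
end

good = chunk_name(%q{<a rel="chunk" name="parse options"/>})
swapped = chunk_name(%q{<a name="parse options" rel="chunk"/>})
plain = chunk_name('<p>plain text</p>')
p [good, swapped, plain]
```

Note that the attribute order is fixed: rel must come first, so the second line is not extracted even though it is valid X/HTML.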
We also scan any sources the X/HTML input links to:
$is_in_unscanned_chunk = false
def scan_source(path)
die("<link rel='source'> after <a rel='chunk'>") if $did_embed_chunks
instances = []
each_file_line(path) do |line|
if $end_pattern === line
name = $end_pattern.extract(line)
if not $is_in_unscanned_chunk
die('End chunk outside any chunk') unless instances.size > 0
die('End chunk "' + name \
+ '" does not match start chunk "' + instances.last.chunk.name + '"') \
if name and name != instances.last.chunk.name
instances.last.end_scan
instances.pop
elsif name == '!uweb'
$is_in_unscanned_chunk = false
end
next
end
if !$is_in_unscanned_chunk and $begin_pattern === line
name = $begin_pattern.extract(line)
$is_in_unscanned_chunk = name == '!uweb'
unless $is_in_unscanned_chunk
die("Chunk #{name} contains itself") \
if instances.any? { |instance| instance.chunk.name == name }
parent = instances.last
instance = Chunk.begin_scan(name)
instances.push(instance)
parent.add(instance) if parent
end
next
end
instances.last.add(line) if instances.last
end
die("Missing end of chunk \"#{instances.last.chunk.name}\"") \
if instances.size > 0
end
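The heart of scan_source is a stack of open chunk instances. The sketch below shows the same bookkeeping on a pre-classified list of events instead of raw lines. Node, build_tree and the event format are assumptions made for this example, and errors are raised rather than reported through die.

```ruby
Node = Struct.new(:name, :content)

# Builds a tree of nested chunks from [kind, value] events.
def build_tree(events)
  top = []        # chunks that are not nested in any other chunk
  open_nodes = [] # stack of the chunks currently being scanned
  events.each do |kind, value|
    case kind
    when :begin
      if open_nodes.any? { |node| node.name == value }
        raise ArgumentError, "Chunk #{value} contains itself"
      end
      node = Node.new(value, [])
      (open_nodes.last ? open_nodes.last.content : top) << node
      open_nodes.push(node)
    when :end
      raise ArgumentError, 'End chunk outside any chunk' if open_nodes.empty?
      open_nodes.pop
    else
      # Text outside any chunk is ignored, as in scan_source.
      open_nodes.last.content << value if open_nodes.last
    end
  end
  unless open_nodes.empty?
    raise ArgumentError, "Missing end of chunk \"#{open_nodes.last.name}\""
  end
  top
end

events = [[:begin, 'outer'], [:text, 'a'], [:begin, 'inner'], [:text, 'b'],
          [:end, nil], [:text, 'c'], [:end, 'outer']]
tree = build_tree(events)
summary = tree[0].content.map { |item| item.is_a?(Node) ? item.name : item }
p summary
```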
The patterns we look for in the source files are:
$begin_pattern =
Pattern.new('Begin chunk',
/ (?: \{\{\{
| \# \s* region )
/ix,
/ ^
(?: \W* \{\{\{
| \s* \# \s* region )
\s+ ( .+? )?
(?: \s* \W+ )?
\s* $
/ix)
$end_pattern =
Pattern.new('End chunk',
/ (?: \}\}\}
| \# \s* endregion )
/ix,
/ ^
(?: \W* \}\}\}
| \s* \# \s* endregion )
(?: \s+ ( .+? ) )?
(?: \s* \W+ )?
\s* $
/ix)
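As an illustration, the sketch below applies regexps modeled on the extraction patterns above to a few sample source lines. It assumes the end pattern is meant to accept #endregion; BEGIN_CHUNK, END_CHUNK and the sample source are made up for this example.

```ruby
BEGIN_CHUNK = /
  ^ (?: \W* \{\{\{ | \s* \# \s* region )
  \s+ ( .+? )?
  (?: \s* \W+ )?
  \s* $
/ix

END_CHUNK = /
  ^ (?: \W* \}\}\} | \s* \# \s* endregion )
  (?: \s+ ( .+? ) )?
  (?: \s* \W+ )?
  \s* $
/ix

SOURCE = <<~'EOS'
  // {{{ greet
  puts 'hello'
  // }}}
  #region farewell
  puts 'bye'
  #endregion farewell
EOS

# Classify each line as the beginning of a chunk, the end of one, or text.
events = SOURCE.each_line.map(&:chomp).map do |line|
  if (match = BEGIN_CHUNK.match(line))
    [:begin, match[1]]
  elsif (match = END_CHUNK.match(line))
    [:end, match[1]]
  else
    [:text, line]
  end
end
events.each { |event| p event }
```

The name on an end marker is optional, which is why the first chunk ends with a nil name.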
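A pattern is just a glorified regular expression: a loose one to detect candidate lines and a strict one to extract the name from them. The one subtlety is that extract consults the global $is_in_unscanned_chunk flag to decide whether to complain about or silently ignore "bad" lines; this is how the special !uweb section is implemented.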
class Pattern
def initialize(name, detect_regexp, extract_regexp)
@name = name
@detect_regexp = detect_regexp
@extract_regexp = extract_regexp
end
def ===(line)
return line =~ @detect_regexp
end
def extract(line)
match = @extract_regexp.match(line)
die('Invalid ' + @name) unless $is_in_unscanned_chunk or match
return match && match[1]
end
end
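The reason for having two regexps is that a line which looks like a tag but is malformed gets reported instead of being silently copied to the output. The following sketch shows the idea with a stripped-down class; MiniPattern, classify and the condensed regexps are assumptions for this example, and it raises an error where the real class calls die.

```ruby
class MiniPattern
  def initialize(name, detect_regexp, extract_regexp)
    @name = name
    @detect_regexp = detect_regexp
    @extract_regexp = extract_regexp
  end

  # Loose test, used by case/when to pick candidate lines.
  def ===(line)
    @detect_regexp.match?(line)
  end

  # Strict test, which fails loudly on malformed candidates.
  def extract(line)
    match = @extract_regexp.match(line)
    raise ArgumentError, "Invalid #{@name}" unless match
    match[1]
  end
end

include_tag = MiniPattern.new(
  "<a rel='include'> tag",
  /<\s*a\s.*rel\s*=\s*['"]include['"]/i,
  /^\s*<\s*a\s+rel\s*=\s*['"]include['"]\s+name\s*=\s*['"](.+)['"]\s*\/?\s*>\s*$/i
)

def classify(pattern, line)
  case line
  when pattern then pattern.extract(line)
  else :ordinary
  end
rescue ArgumentError => error
  error.message
end

well_formed = classify(include_tag, %q{<a rel="include" name="intro.html"/>})
ordinary = classify(include_tag, '<p>text</p>')
malformed = classify(include_tag, %q{<a class="x" rel="include">})
p [well_formed, ordinary, malformed]
```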
The whole point of the program is to embed chunks of sources into the generated X/HTML:
def embed_chunk(name)
$did_embed_chunks = true
chunk = Chunk.by_name(name)
chunk.is_used = true
# TRICKY: Captured by the binding.
name = name = chunk.name
refers_to = refers_to = chunk.refers_to.size == 0 ? nil : chunk.refers_to.entries.sort
nested_in = nested_in = chunk.nested_in.size == 0 ? nil : chunk.nested_in.entries.sort
locations = locations =
chunk.instances.entries \
.map { |instance| instance.location } \
.sort
lines = lines = \
chunk.instances.entries[0].content \
.map { |content| content.class == Instance ? nest_chunk(chunk, content) \
: content.escape_xhtml }
$chunk_template.run(binding)
end
def nest_chunk(from_chunk, to_instance)
# TRICKY: Captured by the binding.
indent = indent = to_instance.indentation || ''
from = from = from_chunk.name
to = to = to_instance.chunk.name
return $nest_template.result(binding).chomp
end
To support the above, we need to track all the chunks we extracted in the source files:
class Chunk
def Chunk.begin_scan(name)
@@chunk_by_name ||= {}
return Instance.new(@@chunk_by_name[name] ||= Chunk.new(name))
end
def Chunk.by_name(name)
@@chunk_by_name ||= {}
chunk = @@chunk_by_name[name]
die("Unknown chunk \"#{name}\"") unless chunk
return chunk
end
def Chunk.verify_all_used
@@chunk_by_name ||= {}
exist_unused = false
@@chunk_by_name.keys.sort.each do |name|
next if @@chunk_by_name[name].is_used
$stderr.puts("Chunk \"#{name}\" was not used")
exist_unused = true
end
exit(1) if exist_unused
end
attr_reader :name, :instances, :refers_to, :nested_in
attr_accessor :is_used
def initialize(name)
@name = name
@instances = Set.new
@refers_to = Set.new
@nested_in = Set.new
@is_used = false
end
def add(new_instance)
old_instance = @instances.entries.last
abort('Chunk "' + @name \
+ '" instance at file "' + new_instance.location.path \
+ '" line ' + new_instance.location.first_line.to_s \
+ ' has different content than instance at ' \
+ 'file "' + old_instance.location.path \
+ '" line ' + old_instance.location.first_line.to_s) \
if old_instance and not old_instance.has_same_content?(new_instance)
@instances.add(new_instance)
end
end
To generate nice output and better error messages, we also need to track, for each chunk, the location(s) where it appeared in the source:
class Location
attr_reader :path, :first_line, :last_line
def initialize
@path = $in_path
@first_line = $at_line_number
end
def done
@last_line = $at_line_number
end
def <=>(other)
return @path == other.path ? @first_line <=> other.first_line \
: @path <=> other.path
end
end
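Defining <=> is what allows embed_chunk to call sort on the list of locations. A minimal sketch of the same ordering, using a hypothetical Spot struct in place of Location and comparing arrays to get the "path first, then line" rule:

```ruby
Spot = Struct.new(:path, :first_line) do
  include Comparable

  # Order by path, then by first line within the same path.
  def <=>(other)
    [path, first_line] <=> [other.path, other.first_line]
  end
end

spots = [Spot.new('b.rb', 3), Spot.new('a.rb', 40), Spot.new('a.rb', 7)]
sorted = spots.sort.map { |spot| "#{spot.path}:#{spot.first_line}" }
p sorted
```

Line numbers are compared as integers, so line 7 sorts before line 40 within the same file.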
Since a chunk can appear in multiple locations, we have a distinct notion of a chunk instance:
class Instance
attr_reader :chunk, :location, :content, :indentation
def initialize(chunk)
@chunk = chunk
@location = Location.new
@content = []
@indentation = nil
end
def add(content)
@content.push(content)
return unless content.class == Instance
@chunk.refers_to.add(content.chunk.name)
content.chunk.nested_in.add(@chunk.name)
end
def end_scan
@indentation = (@content.map { |c| c.indentation }.compact.min || '').clone
@location.done
@content.each do |content|
if content.class == Instance
content.indentation.sub!(@indentation.clone, '') \
if @indentation.size > 0
else
content.chomp!
content.sub!(@indentation, '') if @indentation.size > 0
end
end
@chunk.add(self)
end
def has_same_content?(other)
return false unless @chunk == other.chunk
return false unless @location.first_line - @location.last_line \
== other.location.first_line - other.location.last_line
return false unless @content.size == other.content.size
@content.each_index do |i|
content = @content[i]
other_content = other.content[i]
return false unless content.class == other_content.class
if content.class == Instance
return false unless content.has_same_content?(other_content)
return false unless (content.indentation == '') \
== (other_content.indentation == '')
else
return false unless content == other_content
end
end
return true
end
end
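The effect of end_scan on plain text lines can be sketched in isolation: find the smallest indentation among the non-empty lines and strip it from every line. strip_common_indentation is a hypothetical helper written for this example; like the code above, it takes the minimum of the indentation strings, which gives the shortest one when the source is indented consistently.

```ruby
# Removes the indentation shared by all non-empty lines.
def strip_common_indentation(lines)
  indents = lines.reject(&:empty?).map { |line| line[/^\s*/] }
  common = indents.min || ''
  lines.map { |line| line.sub(common, '') }
end

stripped = strip_common_indentation(['    if x', '      y', '', '    end'])
p stripped
```

Empty lines are skipped when computing the minimum, so a blank line inside a chunk does not force the common indentation down to nothing.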
To support all the above, we monkey-patch the Array and String classes, adding useful methods to them:
class Array
def map_with_index!
each_with_index do |entry, index|
self[index] = yield(entry, index)
end
end
def map_with_index(&block)
dup.map_with_index!(&block)
end
end
class String
def indentation
return nil if self == ''
return /^\s*/.match(self)[0]
end
def escape_xhtml
return self.gsub(/&/, '&amp;').gsub(/</, '&lt;').gsub(/>/, '&gt;')
end
def idify
return self.gsub(/\W+/, '-')
end
end
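For reference, here is how X/HTML escaping is expected to behave, written as a standalone function rather than a String method so the sketch does not touch any core class. The ampersand must be replaced first; otherwise the ampersands introduced by the other two replacements would be escaped a second time.

```ruby
# Escapes the three characters that are special in X/HTML text.
def escape_xhtml(text)
  text.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
end

escaped = escape_xhtml('if a < b && c > d')
puts escaped
```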