Architecture of the Existing (110.97) Match Compiler

================================================================================
These are early notes made during the "discovery" phase while trying
to understand the workings of the Aitken match compiler
[amc1992]. They are largely concerned with working out the types and
the roles played by different types or components.
Most of these notes are not relevant to the new match compiler [mc2020].
================================================================================

This version of the match compiler was written by Bill Aitken in the
summer of 1992 and appeared somewhere between version 0.74
(1991.10.10) and 0.93 (1993.02.15).

1. Types in MCCommon

The result of OR-expanding a rule is an _OR-family_, a set of patterns with
a common rhs lexp and a common set of bound pattern variables.
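
To make the idea of OR expansion concrete, here is a minimal sketch over a toy
pattern type. The datatype "pat" below and the names "product" and "orExpand"
are assumptions made for illustration; they are not the Absyn patterns or the
functions used by the actual match compiler.

```sml
(* Toy pattern language (hypothetical, not Absyn.pat). *)
datatype pat
  = WILDpat
  | VARpat of string
  | CONpat of string * pat list   (* constructor applied to argument patterns *)
  | ORpat of pat * pat

(* All ways of choosing one alternative from each list of alternatives. *)
fun product nil = [nil]
  | product (alts :: rest) =
      let val tails = product rest
      in List.concat (map (fn a => map (fn t => a :: t) tails) alts)
      end

(* orExpand p: the OR-free patterns that p stands for, in left-to-right order. *)
fun orExpand (ORpat (p1, p2)) = orExpand p1 @ orExpand p2
  | orExpand (CONpat (c, args)) =
      map (fn args' => CONpat (c, args')) (product (map orExpand args))
  | orExpand p = [p]

(* pair (A | B, x) expands to the family [pair (A, x), pair (B, x)] *)
val family =
  orExpand (CONpat ("pair", [ORpat (CONpat ("A", nil), CONpat ("B", nil)),
                             VARpat "x"]))
```

Every member of the resulting family binds the same variables (here just x),
which is what allows the members to share a single rhs.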


type ruleno = int
   index, zero-based, of a rule in a match (pre- or post- ?) OR expansion
   E.g. "allRules" defined in matchComp, which is the list after OR expansion

type ruleset = ruleno list = int list
   a list (ordered set) of rule indices, maintained in strictly ascending order
   should be an abstract type?
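
A possible sketch of the ordered-set operations that such a representation
supports. The function names (member, union, intersect) are chosen here for
illustration and are not claimed to match the names used in MCCommon.

```sml
(* Rule sets as strictly ascending int lists (sketch). *)
type ruleno = int
type ruleset = ruleno list

(* member: linear scan that can stop early thanks to the ordering *)
fun member (r: ruleno, nil: ruleset) = false
  | member (r, x :: rest) =
      if r = x then true
      else if r < x then false
      else member (r, rest)

(* union: merge two ascending lists, dropping duplicates *)
fun union (nil: ruleset, ys: ruleset) : ruleset = ys
  | union (xs, nil) = xs
  | union (xs as x :: xs', ys as y :: ys') =
      if x < y then x :: union (xs', ys)
      else if y < x then y :: union (xs, ys')
      else x :: union (xs', ys')

(* intersect: keep only rule numbers present in both lists *)
fun intersect (nil: ruleset, _: ruleset) : ruleset = nil
  | intersect (_, nil) = nil
  | intersect (xs as x :: xs', ys as y :: ys') =
      if x < y then intersect (xs', ys)
      else if y < x then intersect (xs, ys')
      else x :: intersect (xs', ys')
```

Both merges preserve the strictly ascending invariant, so results can be fed
back into the same functions without re-sorting.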
   
type dconinfo = datacon * tyvar list
   a dataconstructor (occurring in a pattern) with an "associated"? list of tyvars
   [Q: what is the relation between the datacon and the tyvars?]

* datacons and constants appearing in patterns  

  datatype pcon
    DATApcon of dconinfo --  a dataconstructor annotated with tyvars
    INTpcon, WORDpcon -- integer and word constants, with value (int IntConst.t)
    STRINGpcon of string -- a string constant
    VLENpcon of int * ty -- vector construction, with length and element type

* destructuring paths
    paths for accessing the position corresponding to a pattern-bound variable
    not "linear" because of the RECORDPATH constructor

  datatype path
    RECORDPATH of path list    -- ???
    PIPATH of int * path       -- a path to the ith record component?
    VPIPATH of int * ty * path -- a path to the ith vector component, of type ty
    DELTAPATH of pcon * path   -- a path after stripping off the given pcon,
                                 where the pcon must be DATApcon, or possibly VLENpcon [Q:]
    ROOTPATH                   -- path corresponding to top node
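
The pcon and path types can be written out as SML datatypes roughly as
follows. This is a sketch under stated assumptions: datacon, tyvar and ty are
replaced by string stand-ins, the integer and word constants are simplified to
int and word, and the printer pathToString is hypothetical. The example path
also assumes that paths are built outward from the root, so that the innermost
component is ROOTPATH.

```sml
(* Stand-ins for compiler types; all three are assumptions. *)
type datacon = string
type tyvar = string
type ty = string
type dconinfo = datacon * tyvar list

datatype pcon
  = DATApcon of dconinfo
  | INTpcon of int        (* simplified: the notes say int IntConst.t *)
  | WORDpcon of word      (* simplified likewise *)
  | STRINGpcon of string
  | VLENpcon of int * ty

datatype path
  = RECORDPATH of path list
  | PIPATH of int * path
  | VPIPATH of int * ty * path
  | DELTAPATH of pcon * path
  | ROOTPATH

fun pconToString (DATApcon (dc, _)) = dc
  | pconToString (INTpcon n) = Int.toString n
  | pconToString (WORDpcon w) = "0w" ^ Word.fmt StringCvt.DEC w
  | pconToString (STRINGpcon s) = "\"" ^ s ^ "\""
  | pconToString (VLENpcon (n, _)) = "vlen" ^ Int.toString n

(* Render a path with the most recent step first and the root last. *)
fun pathToString ROOTPATH = "ROOT"
  | pathToString (PIPATH (i, p)) = "PI" ^ Int.toString i ^ "." ^ pathToString p
  | pathToString (VPIPATH (i, _, p)) = "VPI" ^ Int.toString i ^ "." ^ pathToString p
  | pathToString (DELTAPATH (pc, p)) = "D[" ^ pconToString pc ^ "]." ^ pathToString p
  | pathToString (RECORDPATH ps) =
      "REC(" ^ String.concatWith "," (map pathToString ps) ^ ")"

(* Path to y in the pattern (x, y :: _): second record field, strip ::,
 * then first field of the cons argument. *)
val cons = DATApcon ("::", ["'a"])
val yPath = PIPATH (0, DELTAPATH (cons, PIPATH (1, ROOTPATH)))
```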

* andor trees
    AND-OR trees where all children (subtrees) of an AND node have to be matched, and
    at least one of the children (cases) of an OR node must be matched.  LEAF nodes
    correspond to atomic matches of either constants or variables [Q:]
    Each node has a list of _bindings_.
    Each node has a list of _constraints_.
    [Q: explain the nature and role of bindings and constraints.]
      [Q: do constraints arise from layered patterns?]
    OR nodes involve matching one of the dataconstructors of a datatype, whose "shape"
      is characterized by sign: DA.consig
    Produced by the call of makeAndor in matchComp.

  datatype andor
    AND   -- all subtrees must match
      subtrees : andor list 
      bindings : (ruleno * var) list
   [obs -- constraints : constraint list ]
    
    OR    -- one of the andor-cases must be matched, determined by pcon
      cases : andor_case list
      sign : DA.consig
      bindings : (ruleno * var) list  -- variables bound at this node
                                      -- and the rule in which they are bound
         [obs -- constraints : constraint list ]

    LEAF  -- analysis complete (except what to do with constraints)
      bindings : (ruleno * var) list
   [obs -- constraints : constraint list ]

   [obs --
    constraint
      dcon : dconinfo
      rules : ruleset
      andorOp : andor option ]
      
    what are bindings?
       binding = ruleno * var
         a variable bound at a node and the rule it belongs to

    what are constraints? where do they come from? and what do they mean?
       constraints used to arise when processing general layered
       patterns where the left pattern was structured (not a var)
    should the common bindings, [constraints] fields be factored out of the datatype?
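
A sketch of the andor datatype without the obsolete constraints fields, plus a
traversal that collects every binding in a tree. The shape of andor_case is
not given in these notes; the sketch assumes it is a pcon paired with a single
subtree. The types var, pcon and consig are string stand-ins, and the names
bindingsOf and allBindings are hypothetical.

```sml
type ruleno = int
type var = string       (* stand-in for the compiler's variable type *)
type pcon = string      (* stand-in for the pcon datatype *)
type consig = string    (* stand-in for DA.consig *)

datatype andor
  = AND of {subtrees : andor list, bindings : (ruleno * var) list}
  | OR of {cases : (pcon * andor) list,   (* assumed shape of andor_case *)
           sign : consig,
           bindings : (ruleno * var) list}
  | LEAF of {bindings : (ruleno * var) list}

(* Bindings attached directly to a node. *)
fun bindingsOf (AND {bindings, ...}) = bindings
  | bindingsOf (OR {bindings, ...}) = bindings
  | bindingsOf (LEAF {bindings}) = bindings

(* All bindings in a tree: the node's own first, then its children's. *)
fun allBindings node =
  let val below =
        case node
          of AND {subtrees, ...} => List.concat (map allBindings subtrees)
           | OR {cases, ...} => List.concat (map (fn (_, t) => allBindings t) cases)
           | LEAF _ => nil
  in bindingsOf node @ below
  end

(* Tree for the rules  0: (x, nil)  and  1: (y, z :: _)  *)
val tree =
  AND {bindings = nil,
       subtrees =
         [LEAF {bindings = [(0, "x"), (1, "y")]},
          OR {sign = "list", bindings = nil,
              cases = [("nil", LEAF {bindings = nil}),
                       ("::", AND {bindings = nil,
                                   subtrees = [LEAF {bindings = [(1, "z")]},
                                               LEAF {bindings = nil}]})]}]}
```

Since all three constructors carry a bindings field, bindingsOf is the kind of
accessor that factoring the field out of the datatype would make unnecessary.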

* decision

  datatype decision
    CASEDEC
    BINDDEC

* dectree (decision tree)
    decision trees are the penultimate product, finally translated into an lexp
    produced by genDecisionTree,
    consumed by translation to lexp (codeTy) by generate

  datatype dectree
    CASETEST  -- switch on a datacon
    BIND      -- bind a variable and continue matching (layering)
    RHS       -- dispatch to corresponding RHS
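
To illustrate how a dectree selects a right-hand side, here is a toy
interpreter. The constructor arguments are not recorded in these notes, so the
ones below are assumptions: CASETEST carries a position (a list of child
indices from the root), one branch per constructor name and an optional
default; BIND carries a variable name; RHS carries a rule number. The value
type and the functions subterm and run are hypothetical.

```sml
type ruleno = int

(* Toy runtime value: a constructor name applied to arguments. *)
datatype value = CON of string * value list

datatype dectree
  = CASETEST of int list * (string * dectree) list * dectree option
  | BIND of string * dectree
  | RHS of ruleno

(* Follow a list of child indices down from the root value. *)
fun subterm (v, nil) = v
  | subterm (CON (_, args), i :: rest) = subterm (List.nth (args, i), rest)

(* run (tree, v): the rule selected for v, or NONE if no branch applies. *)
fun run (RHS r, _) = SOME r
  | run (BIND (_, t), v) = run (t, v)   (* binding itself is not modelled *)
  | run (CASETEST (pos, branches, default), v) =
      let val CON (c, _) = subterm (v, pos)
      in case List.find (fn (c', _) => c' = c) branches
           of SOME (_, t) => run (t, v)
            | NONE => (case default of SOME t => run (t, v) | NONE => NONE)
      end

(* Tree for the match  nil => (rule 0) | x :: _ => (rule 1)  *)
val listTree = CASETEST (nil, [("nil", RHS 0), ("::", BIND ("x", RHS 1))], NONE)
val someList = CON ("::", [CON ("1", nil), CON ("nil", nil)])
```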


* preProcessRule: preprocessing the match rules
  input: (type translation), rule (pat * lexp)
  output: multiRule

  type multiRule =
     {pats: (pat * path list) list,
        -- one for each orFamily pattern, thus just one if no OR;
        -- for each pat, path list gives access paths for the pattern
           bound vars
        -- INVARIANT: all path list components have same length = # of
           bound vars	      
      fname: lvar,
        -- a newly generated lvar used as an "identifier" for the rule
           family, maintains connection between patterns and the shared rhs lexp
      rhsFun: Plambda.lexp}
        -- the previously translated shared RHS expression (lexp),
           abstracted over pat-bound variables (or over unit if none)
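
The INVARIANT on pats can be stated as a small check. This is only a sketch:
patterns and paths are replaced by strings, the lvar by an int, and the
function name uniformPaths is hypothetical.

```sml
(* Simplified multiRule: string stand-ins for pat and path, int for lvar. *)
type 'lexp multiRule =
  {pats : (string * string list) list, fname : int, rhsFun : 'lexp}

(* True when every pattern in the family carries the same number of paths,
 * i.e. one path per bound variable. *)
fun uniformPaths ({pats, ...} : 'lexp multiRule) =
  case pats
    of nil => true
     | (_, ps) :: rest => List.all (fn (_, qs) => length qs = length ps) rest

val good = {pats = [("(A, x)", ["PI1"]), ("(B, x)", ["PI1"])],
            fname = 17, rhsFun = "fn x => ..."}
val bad  = {pats = [("(A, x)", ["PI1"]), ("(B, _)", nil)],
            fname = 18, rhsFun = "fn x => ..."}
```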
     

  val preProcessRule : toLtyTy    (* type to FLINT type translators *)
                       -> Absyn.pat * Plambda.lexp    (* a rule *)
                       -> multiRule

  type matchRep = multiRule list
    -- type of multiRules in matchComp, defined by mapping
       preProcessRule over original match rules

  type matchRepTy = ((path * pat) list * path list * lvar) list


* makeAndor: building AND-OR tree from expanded set of lhs patterns
  input: list of LHS patterns after OR-expansion
  output: andor

* flattenAndor: andor * path * ruleset -> decision list
  input: andor (produced by makeAndor), ruleset
  output: (path? * decision list) list
  -- called once in matchComp, result bound to "flattened"

* flattenAndors : (path * andor) list * ruleset -> (path list * decision list) list

* fireConstraint: path                                -- initially ROOTPATH
                  * (path list * decision list) list  -- from flattenAndors (needPaths, decisions)
                  * decision list                     -- ready: initially nil
                  * (path * decision list) list       -- delayed: initially nil
                  -> decision list                    -- new ready
                  * (path * decision list) list       -- new delayed
  input: path?, (path? * decision) list (from flattenAndors),
         (ready: decision list; accum), (delayed: (path * decision list) list; accum)
  output: ready, delayed
  called in matchComp, result bound to "ready_delayed"

* genDecisionTree: decision list                      -- "ready"
                * (path * decision list) list         -- "delayed"
                * ruleset                             -- initially allRules
                -> dectree
  input: (decision list (ready) * (path * decision list) list (delayed)),
         ruleSet (allRules)
  output: dectree (decision tree)
  construct decision tree from output of fireConstraint

ready : decision list
delayed : (path * decision list) list

makeAndor:      pat list  ---------------->  andor  (new version)

makeAndor:      matchRep               ---------------->  [(path,andor)]
flattenAndors : [(path,andor)],ruleset ---------------->  (path list * decision list) list
fireConstraint: path,(paths,decisions),ready,delayed -->  ready, delayed
genDecisionTree: ready,delayed,ruleset ---------------->  dectree
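
The stages listed above can be tied together as in the following sketch. The
signature MATCH_PIPELINE, the functor Compose, and the value rootPath are
hypothetical names introduced here; the stage types are transcribed from the
signatures in these notes (taking genDecisionTree as a flat triple), with all
component types left abstract.

```sml
signature MATCH_PIPELINE =
sig
  type path
  type andor
  type decision
  type dectree
  type matchRep
  type ruleset = int list
  val rootPath : path     (* stands for ROOTPATH *)
  val makeAndor : matchRep -> (path * andor) list
  val flattenAndors : (path * andor) list * ruleset
                      -> (path list * decision list) list
  val fireConstraint : path
                       * (path list * decision list) list
                       * decision list
                       * (path * decision list) list
                       -> decision list * (path * decision list) list
  val genDecisionTree : decision list * (path * decision list) list * ruleset
                        -> dectree
end

functor Compose (P : MATCH_PIPELINE) =
struct
  (* Run the four stages in order; ready and delayed both start out empty. *)
  fun compile (rep : P.matchRep, allRules : P.ruleset) : P.dectree =
    let val andors = P.makeAndor rep
        val flattened = P.flattenAndors (andors, allRules)
        val (ready, delayed) =
              P.fireConstraint (P.rootPath, flattened, nil, nil)
    in P.genDecisionTree (ready, delayed, allRules)
    end
end
```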

-------------------------------------------------------------------------


Old version of transVB that produces exp -> exp function
(excerpt from matchcomp/transmatch.sml)

(* transVB : AS.vb * AS.exp -> AS.exp
    -- version that takes a body, replaced by dec -> dec version below
(* -- can we get away without a d (DB depth) parameter? Leaving it to Translate?
 * -- looks like we can get away with never dealing with an internal svar in the match.
 * -- we need to access or (re)construct the type of the pat.
 *      Could store this as a field of VB.
 * -- do we need an absyn equivalent to mkPE, say transPolyExp? We don't have an equivalent
 *      to TFN in absyn -- yet! *)
and transVB ((VB{pat, exp, typ, boundtvs, tyvars}), body) =
    (* match compile [(pat,exp)] if pat is nontrivial (not a var);
     * -- check what the match compiler does with (single) irrefutable patterns
     *    DONE -- it does the right thing.
     * -- body is the "scope" of the VB binding *)
    (case AU.stripPatMarks pat
       of (VARpat var | CONSTRAINTpat(VARpat var, _)) =>
	  simpleLet(var, exp, body)
	| pat =>
	  let val patvars = AU.patternVars pat
	      val patvarstuple = EU.TUPLEexp (map (fn var => VARexp(ref var, nil)) patvars)
	      val patvarstuplety = Tuples.mkTUPLEtype (map V.varType patvars)
	      val (matchExp, matchVar) =  (* matchVar will be bound to trans of exp *)
		  MC.matchComp([RULE(pat,patvarstuple)], typ, EU.getBindExn())
	      val topVar = V.VALvar{path = SP.SPATH [S.varSymbol "topVar"],  (* old "newvar" *)
				    typ = ref(patvarstuplety),
				    btvs = ref(boundtvs),
				    access = A.LVAR(LambdaVar.mkLvar()),
				    prim = PrimopId.NonPrim}
	      fun wrapLets([], _, body) = body
		| wrapLets (pvar::pvars, n, body) =
		    simpleLet(pvar, SELECTexp(topVar,n,true), wrapLets(pvars, n+1, body))
		    (* defining a pattern var by (record) selection from a "related" var.
                     * related in the sense that the btvs of the first is a subset of
		     * the btvs of the second. *)
	  in simpleLet(topVar,
		       simpleLet(matchVar, exp, matchExp),
		          (* binds topvar to tuple of values matching the pattern vars *)
		       wrapLets(patvars, 0, body)) (* rebinding orig pattern variables *)
	  end)
 *)
		  


(* OBS	  (case dec
	     of VALdec [VB{pat = VARpat topVar, exp = bvarsExp, boundtvs = boundtvs, ...}] =>
	        (if null boundtvs
		 then mkDec (dec,d) (mkExp0 body)  (* nonpolymorphic case *)
		 else  (* boundtvs <> nil => polymorphism, derived from binding for an irrefutable pattern *)
		     let fun transSelections exp =
			     (case exp
			       of LETexp (VALdec [VB{pat = VARpat bvar,
						     exp = SELECTexp(topVar, i, true),...}], body') =>
				  (case bvar
				    of V.VALvar{access = DA.LVAR bvarLvar, btvs, ...} =>
				       let val btvs = !btvs
					   val defnExp =
					       if null btvs (* nonpolymorphic pattern variable *)
					       then SELECT(i,TAPP(VAR(V.varLvar topVar),
								  map (fn _ => LT.tcc_void) boundtvs))
					       else let val btvsArity = length btvs
							val indices = List.tabulate(btvsArity, (fn x => x))
							val tvToIndex = ListPair.zip(btvs,indices)
							fun lookup (tv: Types.tyvar, nil) = NONE
							  | lookup (tv, (tv',k)::r) =
							    if tv = tv' then SOME k else lookup (tv, r)
							val targs = map (fn tv => case lookup (tv, tvToIndex)
										   of NONE => LT.tcc_void
										    | SOME k => LT.tcc_var(1,k))
									boundtvs
						    in TFN(LT.tkc_arg (btvsArity),  (* tuple of "mono" kinds *)
							   SELECT(i, TAPP(VAR(V.varLvar topVar),targs)))
						    end
				       in LET(bvarLvar, defnExp, transSelections body')
				       end)
				| _ => exp)
			 val finalBody = transSelections body
		      in LET(topVar, transPolyExp (bvarsExp, d, boundtvs), mkExp0 finalBody)
		     end)
	      | _ =>  mkDec (dec, d) (mkExp0 body)
	   (* end case *))
*)
