Lazy initialization for recursive bindings #4283
JordanMartinez merged 1 commit into purescript:master
Conversation
Force-pushed from 4bce1d2 to d9324a9
Thanks! (and wow... this looks complicated.) I'm going to take a look at this tomorrow.
Ha, hopefully it isn't as bad as it seems! (The main file is 60% comments, which I suppose is its own sort of red flag....)

I forgot to mention: re-running the benchmark and analysis I ran in #4179 (comment) produced few meaningful differences, with the one exception that the number of bindings made lazy has dropped from 2% then (142/6547) to just under 1% (65/6611). (The denominators are different because the package set libraries have changed!) Minified bundle sizes are still almost 4% larger with the laziness patch though, so those per-module runtime functions seem to be contributing the bulk of the bloat. If 4% is something we can live with for this release, we can design and implement an actual runtime module after 0.15 without changing any semantics, I think.
Given that
This looks great, and the comments are very helpful. I don't think the volume of them is pointing to a bad thing, the concept is simple enough, there's just a bunch of details involved in making it a reality 😉 (the infinite graph being the main culprit for that, understandably).
Here's my summary of this doc comment: https://github.com/purescript/purescript/pull/4283/files#diff-052739b45547126f69f413fa5df905734d356e26eee8bf58e9cabd4eadd6799fR67-R77

Delay = # of lambdas enclosing a var (always known)
Force = # of args being applied to expression (sometimes known)
My own syntax reference
I'm trying to understand how the rules work. Below is my current understanding, but I'm not sure I'm getting it right:
As an aside, I have to assume that somewhere someone has come up with this sort of delay/force analysis before now (probably with different names); if any of you know of a standard paper I could reference or even set of terminology I could use instead of developing the concepts from scratch, I figure that would be a big help. I like your syntax annotations. I'll propose a slight extension:
Force is a measure of ‘how much’ an expression like
Similarly,
I would write this as
See my
Ah! That clears up probably my biggest misunderstanding. Let me try redefining the rules in light of this clarification.

-- D-Init - "a Var with delay 0 needs to be initialized
-- before the enclosing expression is evaluated"
let
binding = (x#0 + 1)!0
in ...
-- Translation:
-- `x` needs to be initialized before the enclosing expression, `x + 1`, is evaluated
-- D-Delay - "a Var with more delay can wait"
let
binding = (\_ -> x#1 + 1)!1
in ...
-- Translation:
--`x` does not need to be initialized
-- F-Init0 - "a Var with force 0 only needs to have its
-- delay-0 references initialized in order to be ready for use"
let
foo = (1)!0
bar = (foo!0#0)!0
in ...
-- Translation:
-- `bar` cannot be _used_ until `foo` has been initialized
-- However, `bar` _can_ be initialized before `foo` has been initialized
-- F-Init1 - "a Var with force 1 needs to have all of its
-- delay-1 references initialized as well"
let
foo = (1)!0
lazyFoo = (\_ -> foo#1)!1
bar = (lazyFoo!1#0 true!0#0)!0
in ...
-- Translation:
-- For `lazyFoo`, `foo` does not need to be initialized due to the rule, D-Delay.
-- However, `bar` cannot be _initialized_ until after `foo` has been initialized
-- because the force needed to activate `foo` in `lazyFoo` is being applied.
-- Inferred rule based on `f!*` comment.
-- F-InitStar - "a Var with force * needs to have all of its
-- references (regardless of delay) initialized as well"
foreign import addFFI :: (Unit -> Int) -> Int -> Int
let
foo = (1)!0
lazyFoo = (\_ -> foo#1)!1
bar = (addFFI!2 lazyFoo!*#0 2!0#0)!0
in ...
-- Translation:
-- Normally, `foo` in `lazyFoo` does not need to be initialized via rule, D-Delay.
-- However, `bar` cannot be _initialized_ until after all entities referenced
-- by `lazyFoo` (i.e. `foo`) have been initialized because we don't know
-- what `addFFI` will do with `lazyFoo`.
-- Since `lazyFoo` _might_ be used, we have to assume the force
-- needed to activate all of `lazyFoo`'s references is being applied.
-- Without knowing how much force is needed to fully activate all entities
-- referenced by `lazyFoo`, we use infinity as the amount of force.
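(An aside to make D-Delay concrete in the backend's target language: in JavaScript, a variable referenced under a lambda does not need to exist when the binding is created, only when the lambda is later forced. A minimal hand-written sketch, not compiler output:)

```javascript
// `delayed` references `x` at delay 1 (under one lambda), so it can be
// created before `x` is initialized; only forcing it too early would fail.
var delayed = function () { return x + 1; };
var x = 41;             // initialize `x` before any force is applied
console.log(delayed()); // forcing after initialization succeeds: 42
```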
This should be

Now, to head off a different potential misunderstanding, I also said that in the bindings of a

let
f = (...)!0
g = let
x = (...)!*
y = (...)!*
in (...)!0
in ...

That is, assuming that
F-Init0 doesn't really apply to

But D-Init also applies to this situation. The
I'll revise those annotations:

let
foo = (1)!0
lazyFoo = (\_ -> foo#1)!0
bar = (lazyFoo!1#0 true!*#0)!0
in ...

But your translation is correct!
Again, fully correct translation, but the annotations should be:

let
foo = (1)!0
lazyFoo = (\_ -> foo#1)!0
bar = (addFFI!2 lazyFoo!*#0 2!*#0)!0
in ...

This is all great feedback for me, by the way—in particular I'm learning that I should consolidate the language that describes the judgments about initialization order that can be made given delay and force information. In addition to the rules you've named so far, there's one more big one that I reveal a little later in the file:
This generalizes to:
The example from my comments, with relevant references newly annotated, is:

f = g#0!2 1 2
g x = h#1!2 3 x
h x y z = f#3!0

To initialize
Perhaps we should instead use the rules you've defined on the golden tests themselves? Since those are the "pretty tricky" examples that are otherwise hard to come up with?
Yeah, right on. I just pushed a fixup with a rewording of the delay/force explanation. The new explanation uses three explicit rules:

purescript/src/Language/PureScript/CoreFn/Laziness.hs Lines 160 to 164 in 9d0fdcb
purescript/src/Language/PureScript/CoreFn/Laziness.hs Lines 197 to 201 in 9d0fdcb

And here's one of the more interesting golden tests, with top-level references annotated:

alpha :: Int -> Int -> Number
alpha = case _ of
0 -> bravo#1!0
1 -> charlie#1!0
2 -> \y -> if y > 0 then bravo#2!1 y else charlie#2!1 y
x -> \y -> delta#2!2 y x
bravo :: Int -> Number
bravo = (\_ -> alpha#0!1) {} 3
charlie :: Int -> Number
charlie = (\_ -> alpha#0!1) {} 4
delta :: Int -> Int -> Number
delta =
let b = (\_ -> bravo#0!*) {}
in \x y -> if x == y then b 0 else 1.0
USE-USE says that neither

Finally, initializing

So the expected result is:

var alpha = ...;
var $lazy_bravo = ...;
var $lazy_charlie = ...;
var bravo = $lazy_bravo(...);
var charlie = $lazy_charlie(...);
var $lazy_delta = ...;
var delta = $lazy_delta(...);
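For readers skimming this thread: the `$lazy_*` functions above wrap initializers in a run-time check. The helper below is a hedged sketch of the idea (hypothetical names; not the exact code the JS backend emits):

```javascript
// Returns a thunk that runs `init` once, caches the result, and throws if
// the binding is dereferenced while its own initialization is in progress.
function runtimeLazy(name, init) {
  var state = 0; // 0 = uninitialized, 1 = initializing, 2 = initialized
  var value;
  return function () {
    if (state === 2) { return value; }
    if (state === 1) {
      throw new ReferenceError(name + " was needed before it finished initializing");
    }
    state = 1;
    value = init();
    state = 2;
    return value;
  };
}

// Mutually recursive bindings, made lazy so the initialization order is safe:
var isEven = runtimeLazy("isEven", function () {
  return function (n) { return n === 0 ? true : isOdd()(n - 1); };
});
var isOdd = runtimeLazy("isOdd", function () {
  return function (n) { return n === 0 ? false : isEven()(n - 1); };
});

console.log(isEven()(10)); // true
```

The analysis in this PR exists precisely to prove, where possible, that a binding can skip this wrapper and be initialized strictly.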
I'll be looking over more of your comment and the new changes after posting this comment, but I wanted to at least get this out. I noticed that I was using

In the doc comments you pushed, perhaps it would be beneficial to use this

The initial reading of
Using your example above, here are all the possible control flows:

alpha 0 3
alpha 1 4
alpha 2 3
alpha 2 0
alpha 0 0
alpha 8 0
I found it easier to understand these rules when I made the following changes:
In other words,

Since a binding can ultimately reference itself (e.g. per your example,

When

To better understand the relationship between

Calculation Table (Sorted by `f`, `d`, and `g`)
Calculation Table (sorted by `h`)
This rule works with

Initializing
Does code still compile if an invalid recursive binding is defined but never used? For example...

foo = 1
where
f = g 1 2
g x = h 3 x
h x y z = f
Your rewrite of USE-USE is missing an important conditional which these tables should also incorporate: ‘provided
In general, there's nothing in this PR that changes which programs compile. Think of the cycle detection done here as an optimization—with this PR, the default semantics of a recursive binding group is for all the bindings to be lazy with run-time initialization checking, and the analysis just removes the overhead of laziness where it can be proven superfluous. PureScript currently considers that example invalid anyway, because the reference to
Ah... Gotcha. Let me revise that.
Sorry for the delay. I forgot about this part, but you previously stated that Force is a non-negative integer, which makes sense since it represents the number of args passed to a function and we can't have a negative number of args. Moreover, "provided

In other words, I believe the
Is that rendering correct?
If so, here are updated tables (that also include an example with an explanation):

Calculation Table (Sorted by `f`, `d`, and `g`)
Calculation Table (sorted by `h`)
I'm pretty sure I got the
Thanks. That makes sense!
cutLoops :: (Int, force) -> MaxRoseTree m force -> MaxRoseTree m force
cutLoops (i, force) = go
  where
  go = (=<<) . IM.traverseWithKey $ \i' (MaxRoseNode force' inner) ->
    MaxRoseNode force' <$> if i == i' then guard (force >= force') $> pure IM.empty else pure $ go inner
Just explaining this to myself as it took me a while to understand how and why this worked.
This function can be rewritten to a more pointful version:
-- Since `cutLoops` is used in an applicative context (`cutLoops <*> ...`),
-- we need to return back a function in some context (i.e. `f (a -> b)`).
-- Hence, the final `(\aToMB ma -> aToMB =<< ma)` part.
-- The result of the `IM.traverseWithKey` is passed to the function,
-- so we end up with: `\ma -> ma >>= IM.traverseWithKey ...`
go = (\aToMB ma -> aToMB =<< ma) . IM.traverseWithKey $ \i' (MaxRoseNode force' inner) ->
-- Identifiers are mapped to an integer. So, this searches
-- for the specified identifier. If found, runs the next part
-- or keeps searching within the nested maps.
if i == i' then do
-- We found the identifier in the current map.
-- This is a Maybe monad context.
-- If the force found here (i.e. `force'`) is
-- greater than the current force (i.e. `force < force'`),
-- this short-circuits the computation
-- by passing `Nothing` to `bind`.
-- Conceptually, this aborts the search
guard (force >= force')
-- If we get past the guard check,
-- then we rewrite the map to an empty one.
pure $ MaxRoseNode force' (pure IM.empty)
else do
-- keep searching for the identifier in the graph
-- by traversing to the next map
pure $ MaxRoseNode force' (go inner)

In short, the infinite graph is defined via mem and each of its nodes is memoized via memoizedNodes. Then, for each identifier, cutLoops is called, which rewrites references to that identifier in mem to either abort (i.e. Nothing) or be an empty map (i.e. Just IM.empty). Because all keys in the maps comprising the graph correspond to one of the identifiers, and because cutLoops is called for each identifier, eventually all identifiers will be rewritten. In other words, what appears to be an infinite loop in IM.traverseWithKey \_ -> if match then rewrite else loop eventually terminates because IM.traverseWithKey will be called with an empty map. Or at least, that's my current understanding.
You got the internals correct; not sure about your first comment block. Remember that MaxRoseTree m force is a type alias for m (MonoidalIntMap (MaxRoseNode m force)); the outer (=<<) manages that m, since traverseWithKey gives us a Kleisli arrow (simplifying the types) Map -> m Map, and we want go to be a function m Map -> m Map (= Tree -> Tree).
-- Now do the first delay/force traversal of all the bindings to find
-- references to other names in this binding group.
--
-- The parts of this type mean:
-- D is the maximum force (or Nothing if unknown) with which the identifier C
-- is referenced in any delay-B position inside the expression A.
--
-- where A, B, C, and D are as below:
--                    A                B (keys)           C (keys)  D
findReferences :: Expr Ann -> IM.MonoidalIntMap (IM.MonoidalIntMap (Ap Maybe (Max Int)))
findReferences = (getConst .) . onVarsWithDelayAndForce $ \delay force _ -> \case
  Qualified mn' ident | all (== mn) mn', Just i <- ident `S.lookupIndex` names
    -> Const . IM.singleton delay . IM.singleton i $ coerceForce force
  _ -> Const IM.empty
To clarify this with our current syntax of # (delay) and ! (force), A, B, C, and D refer to this, right?
C = ...
A = C#B!D
and produces a map of B -> C -> D for each C-B-D combination.
Oh wait... that's not quite right. A is an expression and part of it contains C#B!D.
Yeah, I'm not sure how helpful the syntactical annotations are for this one, but an example would be
exprA = ... refC1#B1!D1 ... refC1#B1!D2 ... refC1#B2!D3 ... refC2#B1!D4 ... refC2#B1!* ...
findReferences <initializer of exprA> =
IM.fromList [(B1, IM.fromList [(C1, Just (D1 `max` D2))
,(C2, Nothing)
])
,(B2, IM.fromList [(C1, Just D3)])
]
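In other words, `findReferences` max-merges duplicate references and lets an unknown force absorb known ones. A small illustrative sketch of that merging discipline (hypothetical helper; `null` plays the role of `Nothing`):

```javascript
// Accumulate a reference into a nested map: delay -> name -> max force.
// `force === null` means "unknown" and absorbs any known force under merge,
// mirroring the `Ap Maybe (Max Int)` monoid in the Haskell code.
function addReference(map, delay, name, force) {
  var byName = map[delay] || (map[delay] = {});
  if (!(name in byName)) {
    byName[name] = force;
  } else if (byName[name] === null || force === null) {
    byName[name] = null; // unknown wins
  } else {
    byName[name] = Math.max(byName[name], force);
  }
  return map;
}

// Mirrors the example above: C1#B1!D1 and C1#B1!D2 merge to max(D1, D2);
// C2#B1!* makes C2's entry at delay B1 unknown.
var m = {};
addReference(m, 0, "C1", 1);
addReference(m, 0, "C1", 2);
addReference(m, 0, "C2", 4);
addReference(m, 0, "C2", null);
console.log(m[0].C1, m[0].C2); // 2 null
```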
-- Using the approach explained above, traverse the reference graph generated
-- by `refsByIndex` and find all reachable names.
--
-- The parts of this type mean:
-- D is the maximum force with which the identifier C is referenced,
-- directly or indirectly, during the initialization of identifier A. B is
-- Nothing if the analysis of A was inconclusive and A might need the entire
-- binding group.
--
-- where A, B, C, and D are as below:
--                             A        B                   C (keys)  D
reachablesByIndex :: A.Array Int (Maybe (IM.MonoidalIntMap (Max Int)))
reachablesByIndex = searchReachable maxIdx $ \(i, force) ->
  getAp . flip IM.foldMapWithKey (dropKeysAbove force $ refsByIndex A.! i) $ \delay ->
    IM.foldMapWithKey $ \i' force' ->
      Ap $ IM.singleton i' . Max . (force - delay +) <$> uncoerceForce force'
This function calculates h = f - d + g, but also accounts for expressions where an identifier occurs multiple times with different force. For example:
-- given `C` is some identifier
C = ...
-- and `C` is used multiple times in the initialization
-- of `A` with forces, `x`, `y`, and `z`...
A = C#_!x ... C#_!y ... C#_!z
-- Then `D` is the maximum of those force values
-- (e.g. `max [x, y, z]`)
Or put differently, who cares if C is called with force 1 or force 2 because if C is called with force 5, that's the case we need to care about.
Correct, but note that this covers not just the places that C appears in the initializer of A as you illustrated, but also all the other places that C occurs in the binding group that could be indirectly used during the evaluation of the initializer of A.
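One step of that calculation can be written out directly. A hedged sketch (hypothetical names; `Infinity` stands in for an unknown force): applying force `f` to a binding reaches each reference at delay `d` and force `g` with force `h = f - d + g`, provided `f >= d`:

```javascript
// One step of force propagation: from `name` under applied force `f`,
// compute the maximum force with which each directly referenced
// identifier is reached.
function propagateOnce(refs, name, f) {
  var out = {};
  for (var ref of (refs[name] || [])) {
    if (f < ref.delay) continue;       // not enough force to get past the lambdas
    var h = f - ref.delay + ref.force; // h = f - d + g
    if (!(ref.target in out) || h > out[ref.target]) out[ref.target] = h;
  }
  return out;
}

// The f/g/h example from earlier in the thread:
//   f = g#0!2 1 2;  g x = h#1!2 3 x;  h x y z = f#3!0
var refs = {
  f: [{ target: "g", delay: 0, force: 2 }],
  g: [{ target: "h", delay: 1, force: 2 }],
  h: [{ target: "f", delay: 3, force: 0 }]
};
propagateOnce(refs, "f", 0); // { g: 2 }
propagateOnce(refs, "g", 2); // { h: 3 }
propagateOnce(refs, "h", 3); // { f: 0 } -- f reaches itself at force 0, so it must be lazy
```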
-- If `reachablesByIndex` is a sort of labeled relation, this function
-- produces part of the reverse relation, but only for the edges from the
-- given vertex.
--
-- The parts of this type mean:
-- The identifier A is reachable from the identifier B with maximum force C
-- (B is also the index provided to the function).
--
-- where A, B, and C are as below:
--                      (B)        A                  B (singleton key)  C
reverseReachablesFor :: Int -> IM.MonoidalIntMap (IM.MonoidalIntMap (Ap Maybe (Max Int)))
reverseReachablesFor i = case reachablesByIndex A.! i of
  Nothing -> IM.fromAscList $ (, IM.singleton i $ Ap Nothing) <$> [0..maxIdx]
  Just im -> IM.singleton i . Ap . Just <$> im
Using the delay/force syntax, this is...
A = ...
B = A#_!x ... A#_!y ... A#_!z
C = max [x, y, z]
Yup, again with the comment about indirect use, which this inherits from reachablesByIndex.
One concern I have with this analysis is whether this is something that really needs to be spec'ed as a language guarantee, and inserted as corefn. #4179 shows that our "escape hatch" for instance cycles is unsound under strictness. This cycle analysis happens well before corefn. The analysis in this PR happens as part of JS code generation in order to restore that soundness for the JS backend. Does that not mean that all backends which have strict initialization (which is likely most of them) will suffer the same issue and need to implement the same analysis? Is the analysis for ordering an absolute requirement, or do we only need to spec that mutual binding groups must be lazily initialized to be sound?
What I want to avoid, as a reference backend, is backend "optimizations" that core code will end up relying on for soundness, wherein if a backend does not implement the same optimization, code will flat out not work. I'm thinking of previous issues with the function composition inliner. Would it be reasonable to add this to corefn, such as an extra
Yes, but shouldn't they be able to reuse this module since it operates on CoreFn? The backend just needs to provide an appropriate implementation of
No, I think with a natively-lazy backend, or a backend that unconditionally makes recursive initializers lazy, we don't have soundness problems. The language spec would say that it must be at least a run-time error to dereference a variable before it has been initialized, and that variables must be initialized in a run-time order that prevents such errors if one exists, but that an implementation may raise such errors at compile time if it can prove that no such order exists, and/or elide the run-time code that does the initialization check if it can statically determine such an order. A backend could theoretically do more advanced analysis than what I've implemented here, in which case even fewer bindings would need to be lazy. Also, with external CoreFn transformations, we would want the JS backend not to emit unsound code if a transformation changes a dependency. That's why I lean toward making this a post-CoreFn analysis.
This assumes backends are implemented in Haskell.
This is what concerns me, so I just need clarification that this is correct (which I think is what you're saying):
If that's the case, I still think this sort of thing would be extremely helpful to have in corefn as an annotation (for more casual backend implementors), but I don't think that should block this PR.
I also think that even though this is fixing a bug, it does have downstream effects on backends if we are going to exploit this in passes before codegen (such as in corefn optimizations that rely on lazy recursive semantics). So it may be worth noting as a breaking semantic change for backend implementors.

I also think it's worth pointing out the alternative, which is to take out the instance escape hatch. Is there a reason we aren't considering that as a way to solve #4179? That would mean that laziness is then not necessarily required for soundness (for anything the compiler produces), so it wouldn't be a breaking semantic change. That's not to say this PR isn't still useful (I think it is), I just don't want to mask the underlying issue.
Ah... yes, so it does.
That's my claim.
Agreed that we should probably communicate something to backend implementors. I don't know if we should call that something a breaking semantic change? This feels more like a clarification of a previously-implicit semantic assumption—recursive bindings have always been able to be referenced before initialization, as long as you can get past the cycle checking in the binding group desugarer and the type checker. It's never been explicitly stated what code has the responsibility of preventing the consequent soundness issues.
#4179 (comment) shows that the problem is more fundamental than the instance escape hatch. There are basically two pre-CoreFn checks that minimized the impact of our failing to deal with recursive initialization: the binding group desugaring pass and an assertion in the type checker. Both can be circumvented with techniques that include but aren't limited to the instance escape hatch; enclosing a reference in a lambda, even if that lambda is then immediately used, causes both checks to ignore that reference. I think reconsidering that escape hatch, as well as revisiting the type checker assertion, would be good things to discuss going forward (both of which would more dramatically change which programs are accepted by the compiler). But I think the underlying issue is actually the problem corrected here: that recursive bindings are assumed to be available to each other during initialization, and a strict initialization order doesn't always fulfill that assumption.
Right, I think most backends follow the reference implementation's lead in cases like this (since we are a standard defined by its implementation), is all I'm really referring to. I think it's likely, too, that by clearing up this hole, the compiler will take advantage of it in the future. It's not really "breaking" since it was always a mess, but backend implementors may find more and more things broken in the future 😆. Thanks for the clarifications! I agree with your assessment. Do you have an objection to potentially adding it as a CoreFn annotation in the future? It wasn't clear to me if there was something painful about that.
The two reasons I mentioned in #4283 (comment): it'd be an advisory annotation, since backends could choose to do more or less than what the frontend recommends and the frontend shouldn't offer a contract on what gets marked lazy; and for the purposes of #3339/#4092, we'd have to decide whether to trust the annotations on the CoreFn provided to the compiler (effectively making another invariant that CoreFn transformations would need to preserve) or ignore them and run the analysis a second time. Neither problem is a dealbreaker but it suggests to me that the fit with the CoreFn protocol is awkward. I don't have a better idea for how to make doing this analysis less miserable for non-Haskell backends, though.
One thing that may be useful (again not for this), is to analyze the number of runtime_lazy calls needed for more conservative approaches, such as:
If it's the case that either of those gets you to a significant reduction (which I think is likely), I think that's likely a good enough "less miserable" option 😆.

Could you elaborate on what you mean by those approaches?
Sorry, I'm just saying I think it would be useful in general to know how far various naive approaches would get you in removing calls to a lazy runtime initialization call. For example, lots of mutual bindings are just functions that call each other recursively. If all the bindings are just functions, there's no need for a runtime call. Maybe that gets you to 50% of bindings. That's pretty good for something that's a single case on the RHS. You could extend that to if it's a call to
I know you said this could wait but I was curious too, so here's a comparison of a few naive approaches for the preliminary 0.15 package set.
This PR doesn't do anything special with either
Nice! That's great information. Thanks for doing that. Yeah, I think as long as @JordanMartinez feels good with his review, I have no problem with this moving forward 👍.
I think I'll spend tomorrow morning going through this again one last time. AFAICT, this PR does as Ryan explains in his above comments. The only thing that stood out to me was the usage of
Yup! As long as that inliner remains downstream of this transformation, we don't pick up any dependency on
Since there's been a lot of back and forth between Ryan and myself as I try to understand this, I'm writing this post to summarize things. This post may still be wrong in some areas (I'm guessing Ryan will correct me in various places in a follow-up comment), but at least we can direct people here rather than forcing them to read the entire thread above.

## Summary

### General Goal

This PR's primary concern is with the correct initialization order of recursive binding groups (e.g. bindings that reference one another). The easiest way to implement this is to initialize all such bindings lazily and defer to the runtime for correct initialization checking (e.g. if

### Key Concept: Delay and Force

Two concepts, delay and force, are central to the analysis of recursive bindings. Delay and force are properties of expressions.

Delay = a non-negative number of lambdas enclosing a var (always known)
Force = a non-negative number of args being applied to expression (sometimes known, sometimes unknown)

To refer to these concepts, we use the following syntax:

Taken together, we get:

Revising the above table, we get the following
### Key Concept: Argument Propagation

#### Argument Propagation in non-Recursive Binding Groups

Let's consider the following non-recursive binding group to highlight an important point: "argument propagation".

let
  a = b 3 4
  b = \_ -> c 1 2
  c one two four five = one + two + four + five

If we were to annotate the above binding group expressions with delay and force annotations, it might look like this:

let
  a = b#0!2 3 4
  b = \_ -> c#1!2 1 2
  c one two four five = one + two + four + five

To initialize

Looking at just

In order for

In other words, in order for

Put differently, we start with the number of args passed to

Thus,

Fortunately, since the above binding group is not recursive, this level of analysis is not needed. We can determine initialization order just by looking at what each thing references. Since

#### Argument Propagation in Recursive Binding Groups

However, this "argument propagation" analysis is needed when dealing with recursive binding groups. Let's use the below annotated example for illustration purposes:

If we call

This highlights a problem that can occur in recursive binding groups which is impossible in non-recursive binding groups: because bindings in a recursive binding group reference one another, it is possible to define a recursive binding group where an entry point into the group will cause arguments to propagate to references in the group infinitely. However, using the delay-force analysis, we can 1) detect whether and when this occurs and 2) provide a better error message if/when it does.

Unfortunately, to account for this "infinite argument propagation" possibility, the analysis must model the problem as though every reference in a recursive binding group can be called with 0 to infinity args.

### Key Concept: Unknown Force

Why is a force only sometimes known? Because of

As a (probably non-compiling) example of

{-
bravo :: Int -> Number -}
bravo = (\_ -> alpha#0!1) {} 3
{-
alpha :: Int -> Int -> Number -}
alpha = case _ of
0 -> \_ -> 1.0
x -> \y -> do
let
b = (\_ -> bravo#0!*) {}
in ... {-
possible usages of `b`:
no args - b
1 arg - b 0
2 args - b y x
3 args - b y y 3
etc... -}

Since we cannot know based on syntax alone, we must make the code work in the worst-case scenario (arguments propagate infinitely). To simplify this, we assume that

### Delay-Force Analysis of Recursive Binding Groups

#### Lazy or Strict Initialization

An identifier is initialized
#### Annotated Example

It's pretty tricky to write examples that are recursive, type-check, aren't rejected by the cycle-checking logic in the type checker, and demonstrate some of the weirder edge cases of these rules. However, the following delay-and-force-annotated example suffices for understanding this analysis.

alpha :: Int -> Int -> Number
alpha = case _ of
0 -> bravo#1!0
1 -> charlie#1!0
2 -> \y -> if y > 0 then bravo#2!1 y else charlie#2!1 y
x -> \y -> delta#2!2 y x
bravo :: Int -> Number
bravo = (\_ -> alpha#0!1) {} 3
charlie :: Int -> Number
charlie = (\_ -> alpha#0!1) {} 4
gamma :: Int -> Int -> Number
gamma = (\_ -> alpha#0!1) {} 4
delta :: Int -> Int -> Number
delta =
let b = (\_ -> bravo#0!*) {}
in \x y -> if x == y then b 0 else 1.0

#### Rules

Read the below three rules with the following in mind for the first two:

When is a binding,
Calculation Table (Sorted by `f`, `d`, and `g`)
Calculation Table (sorted by `h`)
#### Implementation of the Analysis

There are two parts to modeling the relationships of these bindings as a graph. First, we model the relationships between bindings using a finite graph.
When force is unknown, we assume it's infinite (e.g.

However, to account for the possibility of arguments propagating infinitely via the
The blue lines are the original relationships that the 'binding group desugaring' pass identified, and which caused it to place these identifiers into the same recursive binding group. Conceptually, we can reproduce the blue lines by taking the infinite graph and collapsing it back down to a finite graph by dropping the 'force' part of the node and deduplicating the resulting nodes and edges. In other words, this infinite graph stores the same general structural information as the binding group desugaring pass' finite graph.

This resulting infinite graph, produced by copying the base graph upwards ad infinitum, is what we analyze. Our goal is to determine what vertices are irreflexively reachable from each identifier with force 0 (e.g.

The analysis exploits the fact that some edges in higher force levels (e.g.
Each identifier's search either aborts (i.e. force is unknown) or terminates with a finite set of vertices. Once we know which vertices are reachable from a given identifier, we use that data to produce the below finite graph:

Using the above finite graph, we conclude the following:
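The reachability search described in the summary above can be approximated with a finite worklist. This is a deliberately simplified sketch with hypothetical names (the real analysis works on the memoized infinite graph with cutLoops; the bounded-rounds guard here only crudely stands in for that):

```javascript
// Find everything reachable from `start` at force 0, keeping the maximum
// force per identifier; returns null ("inconclusive") when an unknown
// force (Infinity) is encountered, matching the conservative fallback.
function reachableFrom(refs, start, rounds) {
  var best = {};
  var work = [[start, 0]];
  rounds = rounds || 1000; // crude guard against unbounded force growth
  while (work.length > 0 && rounds-- > 0) {
    var next = work.shift();
    var name = next[0], force = next[1];
    for (var ref of (refs[name] || [])) {
      if (force < ref.delay) continue;       // can't get past the lambdas
      var h = force - ref.delay + ref.force; // h = f - d + g
      if (h === Infinity) return null;       // unknown force: give up, stay lazy
      if (!(ref.target in best) || h > best[ref.target]) {
        best[ref.target] = h;
        work.push([ref.target, h]);
      }
    }
  }
  return best;
}

// The alpha/bravo/charlie/delta golden test, as (target, delay, force) triples:
var refs = {
  alpha:   [{ target: "bravo", delay: 1, force: 0 }, { target: "charlie", delay: 1, force: 0 },
            { target: "bravo", delay: 2, force: 1 }, { target: "charlie", delay: 2, force: 1 },
            { target: "delta", delay: 2, force: 2 }],
  bravo:   [{ target: "alpha", delay: 0, force: 1 }],
  charlie: [{ target: "alpha", delay: 0, force: 1 }],
  delta:   [{ target: "bravo", delay: 0, force: Infinity }]
};
reachableFrom(refs, "bravo"); // { alpha: 1, bravo: 0, charlie: 0 } -- reaches itself: lazy
reachableFrom(refs, "delta"); // null -- unknown force: conservatively lazy
```

Note that `reachableFrom(refs, "alpha")` yields an empty set (every reference in alpha's initializer sits under at least one lambda), which is why alpha can be initialized strictly in the expected output above.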
JordanMartinez
left a comment
AFAICT, this is good to go! I'm not yet approving due to the comments I have, but an approval will come shortly after that.
I think there are still parts of this code that are just hard for me to understand. I feel like I nearly grasp it but it's still somewhat out of reach. However, given that this code only triggers for recursive bindings (which tend to be infrequent), that it fixes an actual bug, that the tests feel comprehensive and actually pass, and that my general understanding is that the code correctly implements the solution you have proposed, I think this is ready to be merged.
I think the only question I have is whether this may introduce space leaks due to the heavy usage of laziness in this implementation.
-- A clumsy hack to preserve TCO in a particular idiom of unsafePartial seen in Data.Map.Internal.
App a1 e1@(Var _ (Qualified (Just C.PartialUnsafe) (Ident up))) (Abs a2 i e2) | up == C.unsafePartial
  -> App a1 e1 . Abs a2 i <$> handleExpr' e2
This can be removed now that purescript/purescript-ordered-collections#57 has been merged.
I'm not sure if the ordered-collection dependency in the test directory's bower.json file only needs to be updated to master or whether all of them do.
Since this is just a does-it-optimize issue and not a does-it-work issue, I don't think we need to worry about updating the test dependencies for this if we take it out.
But, continuing the conversation from purescript/purescript-ordered-collections#57 (comment):
However... the real question here is whether
unsafePartial case _ of should trigger TCO or not, which is difficult to answer, because there is no real spec for when TCO should specifically not trigger. From a syntactic/elaboration standpoint, it's clearly not what we consider a tail call. However, the optimizer treats unsafePartial as irrelevant, and removes it, which coincidentally means that it triggers TCO. Same thing happens if you do something like unsafeCoerce $ go .... This issue will come up with any sort of inlining optimizations. Should inlining open up opportunities for TCO, or should TCO only apply to source syntax?
That is the question here, I agree. If unsafePartial, unsafeCoerce, etc. are meant to be transparent to TCO, then they also have to be transparent to this analysis (via hacks like the above or something more principled that I haven't come up with yet).
Do we know how we want to make that decision?
I think that, generally, the expectation around optimizations is that inlining opens up more opportunities for it. The advantage is that with a decent inliner, more things can be considered tail recursive. The disadvantage is that relying on optimizations for something like stack safety can be quite brittle.
Take for example, existentials. We can CPS encode existentials (or use purescript-exists), but that means it's impossible to write safe, fast, tail-recursive code with them. I've run up against this so, so many times and it's incredibly frustrating. Language support would be nice, but another alternative is if the eliminator were inlined (since it's the equivalent of aliasing unsafeCoerce), it would just work. Another option is tailRec, but it has significant overhead. If we could totally eliminate the overhead of tailRec, I might be OK with removing TCO altogether.
That doesn't really answer the question, except to say:
- Given the status quo, I think we should grant the expectation that `unsafeCoerce`/`unsafePartial` are removed.
- If we only had optimized `tailRec` (with multiple parameters), the issue would be moot because then it's no longer a correctness problem, but truly an optimization problem.
I like the sound of that, but to be clear, when talking about optimizing `tailRec`, do you mean the case of `tailRec (something that inlines to a literal lambda where there is a Loop or Done lexically present in every tail position) initValue`, or the case of `tailRec (arbitrary expression) initValue`? Because the first sounds straightforward and lovely, and the second sounds impossible unless you have other special cases in mind that you would carve out.
Assuming we agree on what the above means, the logical plan would then be keep the hacks in this analysis for 0.15, implement tailRec optimization for 0.16 and warn everyone to use it if they care about stack safety, remove the hacks in 0.17?
Yes, I mean the former. `tailRec` is nice because if the optimization bails it's still stack safe, but it may allocate more, so it's not an issue of something working vs not working, just how efficient it is. I don't know if we can realistically get rid of TCO in general (`tailRec` is implemented in terms of TCO, after all), but we can dissuade its usage, and basically say that it's necessary if you want to guarantee stack safety.
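For context, the stack-safety-even-when-the-optimization-bails property comes from the `Loop`/`Done` trampoline that `tailRec` drives. A minimal JavaScript sketch in the spirit of that machinery (the real library's representation differs):

```javascript
// Step constructors: Loop carries the next accumulator, Done the result.
const Loop = (a) => ({ done: false, value: a });
const Done = (b) => ({ done: true, value: b });

// Drive the step function in a plain while loop: constant stack depth,
// at the cost of one Step allocation per iteration.
const tailRec = (step) => (init) => {
  let r = step(init);
  while (!r.done) r = step(r.value);
  return r.value;
};

// Stack-safe regardless of whether any TCO fires on the caller's code.
const countDown = tailRec((n) => (n <= 0 ? Done("done") : Loop(n - 1)));
```

If the compiler could inline `step` and fuse the `Loop`/`Done` allocations into parameter updates, the per-step allocation disappears, which is the "optimized `tailRec`" being discussed above.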
JordanMartinez left a comment
Fantastic work! Thanks for your patience as I've tried to wrap my head around this.
Just FYI. I haven't merged this because I wasn't sure whether more needed to be done in this thread: #4283 (comment)
This commit adds src/Language/PureScript/CoreFn/Laziness.hs, a transformation on CoreFn that introduces call-by-need semantics to bindings in recursive binding groups. This is the only way to restore soundness to those groups without forbidding many programs we've previously allowed, as historically any reference to another identifier within the same binding group has been permitted as long as it occurs within a lambda, regardless of whether that lambda is reduced in the course of initialization.

The rest of PureScript remains call-by-value. Also, the imperfect prohibitions on cyclic references enforced during desugaring and type checking remain in place; consequently, this change does not alter the set of programs accepted by the compiler. For details, see the extensive commentary in Laziness.hs.

This is unlikely to break a working program, but the upshot for users is that it's now possible to get a run-time error when dereferencing an identifier in a recursive binding group before it has been initialized, instead of silently getting an `undefined` value and having that maybe or maybe not lead to an error somewhere else.
9fe742c to 814157b
I think we've settled on leaving the hack in place until we can give users a performant way to explicitly use `tailRec`. (The commit I just force-pushed is purely a rebase, squash, and commit message rewrite.)
Just as a sanity check, @natefaubion do you agree? If so, I'll merge. |
Yes, I agree with that statement.







I wrote a book in src/Language/PureScript/CoreFn/Laziness.hs; read that
for details.
This is unlikely to break a working program, but the upshot for users is
that it's now possible to get a run-time error when dereferencing an
identifier in a recursive binding group before it has been initialized,
instead of silently getting an `undefined` value and having that maybe or maybe not lead to an error somewhere else.
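Concretely, the old failure mode looks like this in spirit. This is a minimal hand-written JavaScript sketch (not actual compiler output) of a recursive binding group where one initializer dereferences its sibling outside any lambda:

```javascript
// With plain declarations in source order, `var` hoisting means the
// sibling reference is read while still undefined.
var xs = { head: 1, tail: ys }; // `ys` is hoisted but not yet initialized
var ys = { head: 2, tail: xs };

// xs.tail is silently `undefined`, and the error (if any) surfaces
// arbitrarily far from this line. With lazy initialization, forcing `ys`
// at this point would instead throw a run-time error at the mistake itself.
```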
Description of the change
Fixes #4179.
Checklist: