WIP: dce on CoreFn#3039

Closed
coot wants to merge 16 commits into purescript:master from coot:dce

Conversation

@coot
Contributor

@coot coot commented Aug 15, 2017

Attempt to solve #2970

Marcin Szamotulski added 7 commits August 10, 2017 00:13
Polymorphic ModuleT type: when parsing foreign declarations we don't have
access to their type, and thus we need a `ModuleT ()` type.  The
original `Module` type becomes an alias for `ModuleT Type`.
This is required when running `moduleToJs`.
* export list might contain foreign imports
@coot
Contributor Author

coot commented Aug 15, 2017

This adds purs dce. It reads corefn dumped to output and generates JavaScript modules in the dce-output directory. (There is also a switch to just generate the corefn representation.)

To generate JavaScript modules I use moduleToJs, but that required adding annotations to the json representation of CoreFn (source spans and meta).

I ran into a problem when using Language.PureScript.CodeGen.JS.moduleToJs. The CoreFn representation does not distinguish between a non-recursive bind and a recursive one (NonRec vs Rec), and it seems that moduleToJs shuffles the recursive ones to put them in the right order, am I right? This does not work for modules parsed from a CoreFn dump, since all binds are read as non-recursive. Would it be fine to distinguish NonRec and Rec in the json representation?

@coot
Contributor Author

coot commented Aug 15, 2017

The issue is that declarations are encoded using an object, and aeson uses hash maps, which do not guarantee order. Changing the json encoding of module declarations to a list of binds, where each bind is a list of (identifier, annotation, expression) triples (this way NonRec and Rec binds can be encoded using the same structure), solved the problem.

-- before: binds encoded as a JSON object (key order not guaranteed)
bindToJSON (NonRec ann n e) = object [ runIdent n .= ( annToJSON ann, exprToJSON e ) ]
bindToJSON (Rec bs) = object $ map (\((ann, n), e) -> runIdent n .= ( annToJSON ann, exprToJSON e ) ) bs

-- after: binds encoded as a JSON array of (identifier, annotation, expression) triples
bindToJSON (NonRec ann n e) = toJSON [(runIdent n, annToJSON ann, exprToJSON e)]
bindToJSON (Rec bs) = toJSON $ map (\((ann, n), e) -> (runIdent n, annToJSON ann, exprToJSON e)) bs
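As a small sketch of why the array encoding fixes this (the annotation and expression fields are elided here as `null` — this is not the real CoreFn payload), JSON arrays guarantee the order a consumer sees the binds in, while a generic object/hash-map decoder need not:

```javascript
// Hypothetical fragment of the list-based encoding: each bind is an
// (identifier, annotation, expression) triple.
const json = '[["applicativeAff", null, null], ["nonCanceler", null, null]]';

// JSON arrays are ordered, so the bind order survives a round trip.
const binds = JSON.parse(json);
const order = binds.map(function (b) { return b[0]; });
console.log(order); // [ 'applicativeAff', 'nonCanceler' ]
```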
Contributor


I don't understand why this needs to change. It's a recursive group, so the order should be unimportant.

Contributor Author


I just couldn't compile the Aff module. It contains a group of recursive binds with applicativeAff, nonCanceler and alwaysCanceler. When reading it using a hash map the order was changed to nonCanceler, alwaysCanceler followed by applicativeAff. Looking at the compiled code one can see that order is important here:

var applicativeAff = new Control_Applicative.Applicative(function () {
    return applyAff;
}, function (v) {
    return $foreign._pure(nonCanceler, v);
});
var nonCanceler = Data_Function["const"](Control_Applicative.pure(applicativeAff)(false));
var alwaysCanceler = Data_Function["const"](Control_Applicative.pure(applicativeAff)(true));

When applicativeAff is at the end, running this in node failed with an error indicating that the applicativeAff passed to Control_Applicative.pure(applicativeAff) in nonCanceler is undefined.
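A minimal standalone sketch of the failure mode (not the real Aff code; `pure` here is a stand-in for Control_Applicative.pure): with `var` hoisting, a binding emitted too early sees its dependency as `undefined`:

```javascript
// Stand-in for Control_Applicative.pure: reads the `pure` method off a
// type-class dictionary, so it throws a TypeError if the dictionary is
// still undefined when it is called.
function pure(dict) {
  return dict.pure;
}

function rightOrder() {
  var applicativeAff = { pure: function (x) { return x; } };
  var nonCanceler = pure(applicativeAff)(false);
  return nonCanceler; // false
}

function wrongOrder() {
  // `var` hoisting: applicativeAff is declared but still undefined here,
  // so pure(applicativeAff) throws a TypeError at runtime.
  var nonCanceler = pure(applicativeAff)(false);
  var applicativeAff = { pure: function (x) { return x; } };
  return nonCanceler;
}
```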

Contributor Author


So far I tested this on a purescript react isomorphic app. At the end I ran the dce-ed purescript code on node and in the browser.

purs dce with uglifyify produced 705kB, while pulp build -O with uglifyify produced 685kB.
There is still room for improvement though. Right now I am only removing top level declarations and not touching foreign modules at all.

Marcin Szamotulski added 2 commits August 16, 2017 18:18
It will contain code which is unrelated to CoreFn.
This is done in an unsafe way that may produce a foreign module that is
missing some declarations (since it blindly removes export statements).
@coot
Contributor Author

coot commented Aug 17, 2017

With webpack I could reduce the bundle down to 440kB, and naive removal of exports from foreign modules saved an additional 20kB - so I guess it makes sense to do that in a safer way.

Marcin Szamotulski added 3 commits August 17, 2017 08:35
Perform dce in the foreign module: only on the export statements.  This
makes some assumptions and might fail if one uses dynamic features of
JavaScript, eval, etc.  The `purs dce -O1` switch will enable this
optimization.  I am thinking about including `-O2`, which would do dce on
all declarations of the foreign module rather than just on exports.
Right now any non-export declaration that is referenced (as
`exports.ident` or `exports["ident"]`) in non-export statements is included.
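A rough sketch of the kind of scan this implies (a hypothetical helper; the actual implementation in the PR may differ): collect identifiers referenced through the exports object in either dot or bracket notation.

```javascript
// Find identifiers referenced as exports.ident or exports["ident"]
// in a foreign module's source, in order of appearance.
function referencedExports(src) {
  var re = /exports\.([A-Za-z_$][\w$]*)|exports\[["']([^"']+)["']\]/g;
  var found = [];
  var m;
  while ((m = re.exec(src)) !== null) {
    found.push(m[1] || m[2]);
  }
  return found;
}

console.log(referencedExports('exports.foo = 1;\nexports["bar"] = f(exports.foo);'));
// [ 'foo', 'bar', 'foo' ]
```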
@paf31
Contributor

paf31 commented Aug 18, 2017

Thanks for taking the time to work on this!

This is obviously a very big change, and could eventually replace psc-bundle, which is pretty stable and well-tested at this point. I'd like to suggest this:

  • merge the corefn json parser into the compiler
  • make a separate repo to test this approach, and give it some real time to iron out any bugs.

Once it's thoroughly tested on lots of code (maybe after 1.0 even), we could deprecate purs bundle, with a plan to remove it in a future breaking release, and a plan to add this tool in a non-breaking release (could be 0.12.x or 1.x for example).

What do you think?

@paf31
Contributor

paf31 commented Aug 18, 2017

Also, now would be the time to break the corefn format, if we're going to do it, since we can do it in 0.12.0.

@coot
Contributor Author

coot commented Aug 19, 2017

I agree; this indeed needs some time to mature, grow its test suite and be tested on real projects.

I will prepare a PR for the json representation.
Is its format documented somewhere?

@Cmdv
Contributor

Cmdv commented Aug 19, 2017

I'll be glad to test it out, if any help is needed in that department 😃

@coot
Contributor Author

coot commented Aug 19, 2017

@paf31 maybe when you announce the next version of purescript you could include a line about the new experimental (external) tool for tree shaking?

@coot
Contributor Author

coot commented Aug 19, 2017

Here it is: zephyr.

@paf31
Contributor

paf31 commented Aug 19, 2017

maybe when you announce the next version of purescript you could include a line about the new experimental (external) tool for tree shaking?

Yes, I think we should definitely do that. Thanks!

@coot coot closed this Aug 19, 2017
@joneshf
Member

joneshf commented Aug 20, 2017

Also, now would be the time to break the corefn format, if we're going to do it, since we can do it in 0.12.0.

Can we do this? Corefn is kind of a pain to work with, enough so that I've given up on writing a JSONSchema for it. That it's a pain to work with is a sentiment shared by others who aren't known/active in the PS community.

@paf31
Contributor

paf31 commented Aug 20, 2017

Yes. If you have experience working with it, it would be good to get suggestions for a better format.

@joneshf
Member

joneshf commented Aug 20, 2017

Here are some things we found hard, in no particular order:

Very few things are named.

It can be kind of hard to know which parts are which when you're deep in the nesting. It required me to explain what each piece of the JSON was in order for others to understand what they were looking at.
For instance, case expressions look very convoluted unless you understand all the possibilities that PS source has for case expressions:

[
    "Case",
    [
        [
            "Var",
            "v"
        ]
    ],
    [
        [
            [
                [
                    "LiteralBinder",
                    [
                        "IntLiteral",
                        1
                    ]
                ]
            ],
            [
                "Literal",
                [
                    "IntLiteral",
                    1
                ]
            ]
        ],
        [
            [
                "NullBinder"
            ],
            [
                "Literal",
                [
                    "IntLiteral",
                    0
                ]
            ]
        ]
    ]
]

The S-expression encoding feels harder to work with than an object encoding would.

This ties in with the naming concern. I see the appeal for the S-expression encoding, but I think it would be better if we just had a way to dump an actual S-expression and left the JSON dump as an object.

[
    "Abs",
    "dict",
    [
        "Accessor",
        "foo",
        [
            "Var",
            "dict"
        ]
    ]
]

vs

{
    "type": "Abs",
    "argument": "dict",
    "body": {
        "type": "Accessor",
        "key": "foo",
        "object": {
            "type": "Var",
            "name": "dict"
        }
    }
}

I have no strong feelings on what the names should be, but they should exist.
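To illustrate the payoff, a consumer could dispatch directly on the `type` field. A small sketch using the field names from the example above (which are only a suggestion, not a settled format):

```javascript
// Pretty-print a tiny subset of expressions from the proposed object
// encoding by switching on the "type" tag.
function showExpr(e) {
  switch (e.type) {
    case "Abs":
      return "\\" + e.argument + " -> " + showExpr(e.body);
    case "Accessor":
      return showExpr(e.object) + "." + e.key;
    case "Var":
      return e.name;
    default:
      throw new Error("unhandled expression type: " + e.type);
  }
}

var example = {
  type: "Abs",
  argument: "dict",
  body: {
    type: "Accessor",
    key: "foo",
    object: { type: "Var", name: "dict" }
  }
};
console.log(showExpr(example)); // \dict -> dict.foo
```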

The value of each decl has to be parsed before you know how to generate it.

In JS this is not much of a problem, because a function can be used for both a lambda abstraction and an object constructor. But in languages that don't conflate the two ideas, it requires more work.

"decls": [
    {
        "Foo": []
    },
    {
        "Bar": []
    }
]

You can't know what Foo or Bar are supposed to be until you parse their arrays. One could be a type class, the other a value constructor, maybe other things.

Qualified names are serialized as a single string, with . interspersed between the module name parts.

This is another thing that is fine in JS because you can dot your way wherever you need to go. But it's harder in other languages that don't use . for everything. If module resolution is done with :: you then have to parse the string into its module pieces and value pieces (or whatever you're trying to do).
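For illustration, the parsing step every non-JS consumer ends up writing (a hypothetical helper, not from the PR; it assumes identifiers themselves contain no dots):

```javascript
// Split a dotted qualified name into its module path and identifier.
function splitQualified(qualified) {
  var parts = qualified.split(".");
  return {
    moduleName: parts.slice(0, parts.length - 1),
    ident: parts[parts.length - 1]
  };
}

var q = splitQualified("Data.Function.const");
console.log(q.moduleName); // [ 'Data', 'Function' ]
console.log(q.ident);      // const
```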

@coot
Contributor Author

coot commented Aug 21, 2017

I agree with @joneshf, having more verbose output would make it much easier to work with the json files.

@coot
Contributor Author

coot commented Aug 22, 2017

@paf31 if you agree with @joneshf proposal I can add the changes to #3049.

@paf31
Contributor

paf31 commented Aug 22, 2017

@joneshf Thank you for the detailed notes! I can't disagree with anything you've said. The S-expression syntax was something of an experiment to begin with, and maybe it hasn't worked out. I would be fine with breaking the format to use the suggested type fields for 0.12.0.

@rightfold Any thoughts?

@no-longer-on-githu-b
Contributor

I agree. Especially the problems with ambiguous names.

@coot
Contributor Author

coot commented Aug 24, 2017

I think we should also change recordToJson to always encode records as lists of pairs, rather than trying to encode them as an object. This will simplify parsers, but also avoid bugs in them due to an unexpected format that appears only when encoding of a label fails.
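A sketch of the uniform shape this would give (the field contents are hypothetical): every record becomes an ordered list of [label, value] pairs, so consumers have one code path regardless of what the labels look like.

```javascript
// Record { "foo": 1, "0": 2 } encoded as an ordered list of pairs.
const record = '[["foo", 1], ["0", 2]]';
const fields = JSON.parse(record);

// One code path for every record: iterate the pairs in order.
const labels = fields.map(function (p) { return p[0]; });
console.log(labels); // [ 'foo', '0' ]
```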
