WIP: dce on CoreFn#3039

Closed
coot wants to merge 16 commits into purescript:master from coot:dce

Conversation

@coot
Contributor

@coot coot commented Aug 15, 2017

Attempt to solve #2970

Marcin Szamotulski added 7 commits August 10, 2017 00:13
Polymorphic ModuleT type: when parsing foreign declarations we don't have
access to their type, and thus we need a `ModuleT ()` type.  The
original `Module` type becomes an alias for `ModuleT Type`.
This is required when running `moduleToJs`.
* export list might contain foreign imports
@coot
Contributor Author

coot commented Aug 15, 2017

This adds purs dce. It reads corefn dumped to output and generates JavaScript modules in the dce-output directory. (There is also a switch to just generate the corefn representation.)

To generate JavaScript modules I use moduleToJs, but that required adding annotations to the json representation of CoreFn (source spans and meta).

I ran into a problem when using Language.PureScript.CodeGen.JS.moduleToJs. The CoreFn representation does not distinguish between a non-recursive bind and a recursive one (NonRec vs Rec), and it seems that moduleToJs shuffles the recursive ones to put them in the right order, am I right? This does not work for modules parsed from a CoreFn dump, since all binds are read as non-recursive. Would it be fine to distinguish NonRec and Rec in the json representation?

@coot
Contributor Author

coot commented Aug 15, 2017

The issue is that declarations are encoded using an object, and aeson uses hash maps, which do not guarantee order. Changing the json encoding of module declarations to a list of binds, where each bind is a list of (identifier, annotation, expression) triples (this way NonRec and Rec binds can be encoded using the same structure), solved the problem.

-- before: binds encoded as a JSON object (key order not guaranteed)
bindToJSON (NonRec ann n e) = object [ runIdent n .= ( annToJSON ann, exprToJSON e ) ]
bindToJSON (Rec bs) = object $ map (\((ann, n), e) -> runIdent n .= ( annToJSON ann, exprToJSON e ) ) bs

-- after: binds encoded as a JSON array of (identifier, annotation, expression) triples
bindToJSON (NonRec ann n e) = toJSON [(runIdent n, annToJSON ann, exprToJSON e)]
bindToJSON (Rec bs) = toJSON $ map (\((ann, n), e) -> (runIdent n, annToJSON ann, exprToJSON e)) bs
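As a small sketch of why the array encoding fixes this (the annotation and expression fields are elided here as `null` — this is not the real CoreFn payload), JSON arrays guarantee the order a consumer sees the binds in, while a generic object/hash-map decoder need not:

```javascript
// Hypothetical fragment of the list-based encoding: each bind is an
// (identifier, annotation, expression) triple.
const json = '[["applicativeAff", null, null], ["nonCanceler", null, null]]';

// JSON arrays are ordered, so the bind order survives a round trip.
const binds = JSON.parse(json);
const order = binds.map(function (b) { return b[0]; });
console.log(order); // [ 'applicativeAff', 'nonCanceler' ]
```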
Contributor


I don't understand why this needs to change. It's a recursive group, so the order should be unimportant.

Contributor Author


I just couldn't compile the Aff module. It contains a group of recursive binds with applicativeAff, nonCanceler and alwaysCanceler. When reading it using a hash map the order was changed to nonCanceler, alwaysCanceler followed by applicativeAff. Looking at the compiled code one can see that order is important here:

var applicativeAff = new Control_Applicative.Applicative(function () {
    return applyAff;
}, function (v) {
    return $foreign._pure(nonCanceler, v);
});
var nonCanceler = Data_Function["const"](Control_Applicative.pure(applicativeAff)(false));
var alwaysCanceler = Data_Function["const"](Control_Applicative.pure(applicativeAff)(true));

When applicativeAff is at the end, running this in node failed with an error indicating that the applicativeAff passed to Control_Applicative.pure(applicativeAff) in nonCanceler is undefined.
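A minimal standalone sketch of the failure mode (not the real Aff code; `pure` here is a stand-in for Control_Applicative.pure): with `var` hoisting, a binding emitted too early sees its dependency as `undefined`:

```javascript
// Stand-in for Control_Applicative.pure: reads the `pure` method off a
// type-class dictionary, so it throws a TypeError if the dictionary is
// still undefined when it is called.
function pure(dict) {
  return dict.pure;
}

function rightOrder() {
  var applicativeAff = { pure: function (x) { return x; } };
  var nonCanceler = pure(applicativeAff)(false);
  return nonCanceler; // false
}

function wrongOrder() {
  // `var` hoisting: applicativeAff is declared but still undefined here,
  // so pure(applicativeAff) throws a TypeError at runtime.
  var nonCanceler = pure(applicativeAff)(false);
  var applicativeAff = { pure: function (x) { return x; } };
  return nonCanceler;
}
```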

Contributor Author


So far I tested this on a purescript react isomorphic app. At the end I ran the dce-ed purescript code on node and in the browser.

purs dce with uglifyify produced 705kB, while pulp build -O with uglifyify produced 685kB.
There is still room for improvement though. Right now I am only removing top level declarations and not touching foreign modules at all.

Marcin Szamotulski added 2 commits August 16, 2017 18:18
It will contain code which is unrelated to CoreFn.
This is done in an unsafe way that may produce a foreign module that is
missing some declarations (since it blindly removes export statements).
@coot
Contributor Author

coot commented Aug 17, 2017

With webpack I could reduce the bundle down to 440kB, and naive removal of exports from foreign modules saved an additional 20kB - so I guess it makes sense to do that in a safer way.

Marcin Szamotulski added 3 commits August 17, 2017 08:35
Perform dce in the foreign module: only on the export statements.  This
makes some assumptions and might fail if one uses dynamic features of
JavaScript, eval, etc.  The `purs dce -O1` switch will enable this
optimization.  I am thinking about including `-O2`, which would do dce on
all declarations of the foreign module rather than just on exports.
Right now any non-export declaration that is referenced (as
`exports.ident` or `exports["ident"]`) in non-export statements is included.
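A rough sketch of the kind of scan this implies (a hypothetical helper; the actual implementation in the PR may differ): collect identifiers referenced through the exports object in either dot or bracket notation.

```javascript
// Find identifiers referenced as exports.ident or exports["ident"]
// in a foreign module's source, in order of appearance.
function referencedExports(src) {
  var re = /exports\.([A-Za-z_$][\w$]*)|exports\[["']([^"']+)["']\]/g;
  var found = [];
  var m;
  while ((m = re.exec(src)) !== null) {
    found.push(m[1] || m[2]);
  }
  return found;
}

console.log(referencedExports('exports.foo = 1;\nexports["bar"] = f(exports.foo);'));
// [ 'foo', 'bar', 'foo' ]
```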
@paf31
Contributor

paf31 commented Aug 18, 2017

Thanks for taking the time to work on this!

This is obviously a very big change, and could eventually replace psc-bundle, which is pretty stable and well-tested at this point. I'd like to suggest this:

  • merge the corefn json parser into the compiler
  • make a separate repo to test this approach, and give it some real time to iron out any bugs.

Once it's thoroughly tested on lots of code (maybe after 1.0 even), we could deprecate purs bundle, with a plan to remove it in a future breaking release, and a plan to add this tool in a non-breaking release (could be 0.12.x or 1.x for example).

What do you think?

@paf31
Contributor

paf31 commented Aug 18, 2017

Also, now would be the time to break the corefn format, if we're going to do it, since we can do it in 0.12.0.

@coot
Contributor Author

coot commented Aug 19, 2017

I agree; this indeed needs some time to mature, grow its test suite and be tested on real projects.

I will prepare a PR for the json representation.
Is its format documented somewhere?

@Cmdv
Contributor

Cmdv commented Aug 19, 2017

I'll be glad to test it out, if any help is needed in that department 😃

@coot
Contributor Author

coot commented Aug 19, 2017

@paf31 maybe when you announce the next version of purescript you could include a line about the new experimental (external) tool for tree shaking?

@coot
Contributor Author

coot commented Aug 19, 2017

Here it is: zephyr.

@paf31
Contributor

paf31 commented Aug 19, 2017

maybe when you announce the next version of purescript you could include a line about the new experimental (external) tool for tree shaking?

Yes, I think we should definitely do that. Thanks!

@coot coot closed this Aug 19, 2017
@joneshf
Member

joneshf commented Aug 20, 2017

Also, now would be the time to break the corefn format, if we're going to do it, since we can do it in 0.12.0.

Can we do this? Corefn is kind of a pain to work with, enough so that I've given up on writing a JSONSchema for it. That it's a pain to work with is a sentiment shared by others who aren't known/active in the PS community.

@paf31
Contributor

paf31 commented Aug 20, 2017

Yes. If you have experience working with it, it would be good to get suggestions for a better format.

@joneshf
Member

joneshf commented Aug 20, 2017

Here are some things we found hard, in no particular order:

Very few things are named.

It can be kind of hard to know which parts are which when you're deep in the nesting. It required me to explain what each piece of the JSON was in order for others to understand what they were looking at.
For instance, case expressions look very convoluted unless you understand all the possibilities that PS source has for case expressions:

[
    "Case",
    [
        [
            "Var",
            "v"
        ]
    ],
    [
        [
            [
                [
                    "LiteralBinder",
                    [
                        "IntLiteral",
                        1
                    ]
                ]
            ],
            [
                "Literal",
                [
                    "IntLiteral",
                    1
                ]
            ]
        ],
        [
            [
                "NullBinder"
            ],
            [
                "Literal",
                [
                    "IntLiteral",
                    0
                ]
            ]
        ]
    ]
]

The S-expression encoding feels harder to work with than an object encoding would.

This ties in with the naming concern. I see the appeal for the S-expression encoding, but I think it would be better if we just had a way to dump an actual S-expression and left the JSON dump as an object.

[
    "Abs",
    "dict",
    [
        "Accessor",
        "foo",
        [
            "Var",
            "dict"
        ]
    ]
]

vs

{
    "type": "Abs",
    "argument": "dict",
    "body": {
        "type": "Accessor",
        "key": "foo",
        "object": {
            "type": "Var",
            "name": "dict"
        }
    }
}

I have no strong feelings on what the names should be, but they should exist.
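To illustrate the payoff, a consumer could dispatch directly on the `type` field. A small sketch using the field names from the example above (which are only a suggestion, not a settled format):

```javascript
// Pretty-print a tiny subset of expressions from the proposed object
// encoding by switching on the "type" tag.
function showExpr(e) {
  switch (e.type) {
    case "Abs":
      return "\\" + e.argument + " -> " + showExpr(e.body);
    case "Accessor":
      return showExpr(e.object) + "." + e.key;
    case "Var":
      return e.name;
    default:
      throw new Error("unhandled expression type: " + e.type);
  }
}

var example = {
  type: "Abs",
  argument: "dict",
  body: {
    type: "Accessor",
    key: "foo",
    object: { type: "Var", name: "dict" }
  }
};
console.log(showExpr(example)); // \dict -> dict.foo
```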

The value of each decl has to be parsed before you know how to generate it.

In JS this is not much of a problem, because a function can be used for both a lambda abstraction and an object constructor. But in languages that don't conflate the two ideas, it requires more work.

"decls": [
    {
        "Foo": []
    },
    {
        "Bar": []
    }
]

You can't know what Foo or Bar are supposed to be until you parse their arrays. One could be a type class, the other a value constructor, maybe other things.

Qualified names are serialized as a single string, with . interspersed between the module name parts.

This is another thing that is fine in JS because you can dot your way wherever you need to go. But it's harder in other languages that don't use . for everything. If module resolution is done with :: you then have to parse the string into its module pieces and value pieces (or whatever you're trying to do).
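For illustration, the parsing step every non-JS consumer ends up writing (a hypothetical helper, not from the PR; it assumes identifiers themselves contain no dots):

```javascript
// Split a dotted qualified name into its module path and identifier.
function splitQualified(qualified) {
  var parts = qualified.split(".");
  return {
    moduleName: parts.slice(0, parts.length - 1),
    ident: parts[parts.length - 1]
  };
}

var q = splitQualified("Data.Function.const");
console.log(q.moduleName); // [ 'Data', 'Function' ]
console.log(q.ident);      // const
```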

@coot
Contributor Author

coot commented Aug 21, 2017

I agree with @joneshf, having more verbose output would make it much easier to work with the json files.

@coot
Contributor Author

coot commented Aug 22, 2017

@paf31 if you agree with @joneshf proposal I can add the changes to #3049.

@paf31
Contributor

paf31 commented Aug 22, 2017

@joneshf Thank you for the detailed notes! I can't disagree with anything you've said. The S-expression syntax was something of an experiment to begin with, and maybe it hasn't worked out. I would be fine with breaking the format to use the suggested type fields for 0.12.0.

@rightfold Any thoughts?

@no-longer-on-githu-b
Contributor

I agree. Especially the problems with ambiguous names.

@coot
Contributor Author

coot commented Aug 24, 2017

I think we should also change recordToJson to always encode records as lists of pairs, rather than trying to encode them as an object. This will simplify parsers, but also avoid bugs in them due to an unexpected format that appears only when encoding of a label fails.
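A sketch of the uniform shape this would give (the field contents are hypothetical): every record becomes an ordered list of [label, value] pairs, so consumers have one code path regardless of what the labels look like.

```javascript
// Record { "foo": 1, "0": 2 } encoded as an ordered list of pairs.
const record = '[["foo", 1], ["0", 2]]';
const fields = JSON.parse(record);

// One code path for every record: iterate the pairs in order.
const labels = fields.map(function (p) { return p[0]; });
console.log(labels); // [ 'foo', '0' ]
```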
