fix(ocaml): use LANGUAGE_OCAML_INTERFACE grammar for .mli files#730
fix(ocaml): use LANGUAGE_OCAML_INTERFACE grammar for .mli files#730carlos-alm merged 3 commits intomainfrom
Conversation
OCaml interface files (.mli) use a distinct grammar that lacks expression syntax. Parsing them with LANGUAGE_OCAML produces error-polluted ASTs and misses interface declarations (val, external, module type, exception). Rust native engine: - Add OcamlInterface variant to LanguageKind - Map .mli to OcamlInterface, use LANGUAGE_OCAML_INTERFACE - Route to shared OCaml extractor with interface-specific handlers WASM engine: - Build tree-sitter-ocaml_interface.wasm from grammars/interface - Add ocaml-interface registry entry for .mli extension - Extend TS extractor for value_specification, external, module_type_definition, exception_definition nodes Closes #720
|
Claude finished @carlos-alm's task —— View job PR Review: OCaml Interface Grammar SupportTodo List:
Analysis Summary:Problem Being Solved: Phase 1: Root Cause Analysis ✅
Phase 2: Configuration Changes ✅
Phase 3: Test Integrity ✅
Phase 4: Code Quality ✅
Root Cause AnalysisProblem Being Solved: OCaml interface files ( Why This Approach: Using dedicated Risk Assessment: Low risk. This is an additive change that doesn't affect existing Backlog Compliance
Critical ConcernsNone. This is a well-implemented language accuracy improvement. Minor Notes
Final Recommendation
Conditions for approval: Verify CI build passes, confirming Rust compilation succeeds and WASM grammar builds correctly. |
Greptile SummaryThis PR correctly fixes OCaml
Confidence Score: 5/5Safe to merge — all changes are additive, well-tested, and the only remaining finding is a P2 edge-case misclassification in the function-type heuristic. All P0/P1 concerns from prior rounds are resolved. The single remaining note is a P2 edge case where hasDescendantType may over-classify a value like Both src/extractors/ocaml.ts and crates/codegraph-core/src/extractors/ocaml.rs share the hasDescendantType/has_descendant_kind heuristic worth revisiting for higher classification precision. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[File input] --> B{Extension?}
B -- ".ml" --> C[LanguageKind::Ocaml\nocaml registry entry\ntree-sitter-ocaml.wasm]
B -- ".mli" --> D[LanguageKind::OcamlInterface\nocaml-interface registry entry\ntree-sitter-ocaml_interface.wasm]
C --> E[OcamlExtractor / extractOCamlSymbols]
D --> E
E --> F{Node type?}
F -- value_definition/let_binding --> G[function / variable]
F -- type_definition --> H[type]
F -- module_definition --> I[module]
F -- open_module --> K[import]
F -- value_specification --> L{Outer type is\nfunction_type?}
L -- yes --> M[function]
L -- no --> N[variable]
F -- external --> O[function]
F -- module_type_definition --> P[interface]
F -- exception_definition --> Q[type]
Reviews (2): Last reviewed commit: "fix: add ocaml-interface to LanguageId t..." | Re-trigger Greptile |
src/extractors/ocaml.ts
Outdated
| // Interface-specific (.mli) node types | ||
| case 'value_specification': | ||
| handleOCamlValueSpec(node, ctx); | ||
| break; | ||
| case 'external': | ||
| handleOCamlExternal(node, ctx); | ||
| break; | ||
| case 'module_type_definition': | ||
| handleOCamlModuleTypeDef(node, ctx); | ||
| break; | ||
| case 'exception_definition': | ||
| handleOCamlExceptionDef(node, ctx); | ||
| break; |
There was a problem hiding this comment.
Misleading "interface-specific" comment — handlers apply to
.ml too
The comment // Interface-specific (.mli) node types is inaccurate. All four node types (external, exception_definition, module_type_definition, value_specification) are also valid in regular .ml files:
external length : string -> int = "caml_ml_string_length"is extremely common in.mlfiles for C FFIexception My_error of stringappears in.mlimplementationsmodule type S = sig ... endcan be defined inside.mlfiles
Because both LanguageKind::Ocaml and LanguageKind::OcamlInterface route to the same extractor (see mod.rs), these handlers will fire for .ml files too — which is actually a beneficial side-effect, but the comment implies they are unreachable for .ml inputs. The same misleading comment appears in the Rust extractor at crates/codegraph-core/src/extractors/ocaml.rs:27.
Consider updating the comment to reflect that these node types are now handled for both grammars:
| // Interface-specific (.mli) node types | |
| case 'value_specification': | |
| handleOCamlValueSpec(node, ctx); | |
| break; | |
| case 'external': | |
| handleOCamlExternal(node, ctx); | |
| break; | |
| case 'module_type_definition': | |
| handleOCamlModuleTypeDef(node, ctx); | |
| break; | |
| case 'exception_definition': | |
| handleOCamlExceptionDef(node, ctx); | |
| break; | |
| // Shared node types present in both .ml and .mli files | |
| case 'value_specification': | |
| handleOCamlValueSpec(node, ctx); | |
| break; | |
| case 'external': | |
| handleOCamlExternal(node, ctx); | |
| break; | |
| case 'module_type_definition': | |
| handleOCamlModuleTypeDef(node, ctx); | |
| break; | |
| case 'exception_definition': | |
| handleOCamlExceptionDef(node, ctx); | |
| break; |
There was a problem hiding this comment.
Fixed in fd2a1fe. Updated the comment to "Shared node types present in both .ml and .mli files" in both the TS extractor (src/extractors/ocaml.ts) and the Rust extractor (crates/codegraph-core/src/extractors/ocaml.rs). Also added .ml test coverage for external, exception, and module type definitions to document this beneficial side-effect.
| function handleOCamlExceptionDef(node: TreeSitterNode, ctx: ExtractorOutput): void { | ||
| const ctorDecl = findChild(node, 'constructor_declaration'); | ||
| if (!ctorDecl) return; | ||
| const nameNode = findChild(ctorDecl, 'constructor_name'); | ||
| if (!nameNode) return; | ||
|
|
||
| ctx.definitions.push({ | ||
| name: nameNode.text, | ||
| kind: 'type', | ||
| line: node.startPosition.row + 1, | ||
| endLine: nodeEndLine(node), | ||
| }); | ||
| } |
There was a problem hiding this comment.
Exception aliases silently skipped
For the exception aliasing syntax exception Foo = Bar, the tree-sitter-ocaml grammar produces an exception_definition node whose first payload is a constructor_path (pointing to Bar), not a constructor_declaration. The handler returns early if findChild(node, 'constructor_declaration') fails, so aliased exceptions are silently dropped.
This is an acceptable limitation for a first pass, but worth noting. A more robust approach would also fall back to extracting the name directly from a constructor_name child on the exception_definition node itself:
function handleOCamlExceptionDef(node: TreeSitterNode, ctx: ExtractorOutput): void {
// Standard: `exception Foo of bar`
const ctorDecl = findChild(node, 'constructor_declaration');
const nameNode = ctorDecl
? findChild(ctorDecl, 'constructor_name')
: findChild(node, 'constructor_name'); // fallback for `exception Foo = Bar`
if (!nameNode) return;
ctx.definitions.push({
name: nameNode.text,
kind: 'type',
line: node.startPosition.row + 1,
endLine: nodeEndLine(node),
});
}The same pattern applies to the Rust counterpart at crates/codegraph-core/src/extractors/ocaml.rs:278-295.
There was a problem hiding this comment.
Fixed in fd2a1fe. Updated both the TS and Rust extractors to fall back to constructor_name directly on the exception_definition node when constructor_declaration is not found. This handles the exception Foo = Bar alias syntax. Added a test case for exception aliases in the .ml test suite as well.
- Add 'ocaml-interface' to LanguageId union in types.ts (fixes TS2322) - Update misleading "Interface-specific" comments to "Shared node types present in both .ml and .mli files" in both TS and Rust extractors - Handle exception alias syntax (exception Foo = Bar) by falling back to constructor_name on the exception_definition node directly - Add .ml test coverage for external, exception, exception alias, and module type definitions that now fire for .ml files too
|
Addressed all Greptile review feedback in fd2a1fe:
|
Summary
.mlifrom.mlparsing:.mlifiles now use the dedicatedLANGUAGE_OCAML_INTERFACEgrammar instead ofLANGUAGE_OCAML, which was producing error-polluted ASTs and missing interface-specific declarationsOcamlInterfacevariant; WASM engine getsocaml-interfaceregistry entry withtree-sitter-ocaml_interface.wasmvalue_specification(val),external,module_type_definition, andexception_definitionnodes — shared with the existing OCaml extractor since most node types overlapvalclassification:val name : typedeclarations are classified asfunctionwhen the type contains->(function_type), otherwisevariableTest plan
.mlparser tests pass unchanged.mliparser tests added and passing:valspecs as functions (arrow type) and variables (non-arrow)externaldeclarations as functionsmodule typedefinitions as interfacesexceptiondefinitions as typestypedefinitions in interfacesopenstatements in interfacesbuild-wasm.tsCloses #720