Commit 7a2536d

author
Sergey Pariev
committed
Merge commit 'http-parser/master' into catchup
Conflicts: .gitignore LICENSE-MIT README.md test.c
2 parents df2c191 + 36808f4 commit 7a2536d

10 files changed

Lines changed: 2582 additions & 947 deletions

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -1,3 +1,4 @@
1+
core
12
tags
23
*.o
34
test_g
@@ -7,3 +8,6 @@ test_g
78
TAGS
89
a.out
910
.DS_Store
11+
test_fast
12+
*.mk
13+
*.Makefile

.mailmap

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
# update AUTHORS with:
2+
# git log --all --reverse --format='%aN <%aE>' | perl -ne 'BEGIN{print "# Authors ordered by first contribution.\n"} print unless $h{$_}; $h{$_} = 1' > AUTHORS
3+
Ryan Dahl <[email protected]>
4+
Salman Haq <[email protected]>

AUTHORS

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
1+
# Authors ordered by first contribution.
2+
Ryan Dahl <[email protected]>
3+
Jeremy Hinegardner <[email protected]>
4+
Sergey Shepelev <[email protected]>
5+
Joe Damato <[email protected]>
6+
7+
Phoenix Sol <[email protected]>
8+
Cliff Frey <[email protected]>
9+
Ewen Cheslack-Postava <[email protected]>
10+
Santiago Gala <[email protected]>
11+
Tim Becker <[email protected]>
12+
Jeff Terrace <[email protected]>
13+
Ben Noordhuis <[email protected]>
14+
Nathan Rajlich <[email protected]>
15+
Mark Nottingham <[email protected]>
16+
Aman Gupta <[email protected]>
17+
Tim Becker <[email protected]>
18+
Sean Cunningham <[email protected]>
19+
Peter Griess <[email protected]>
20+
Salman Haq <[email protected]>
21+
Cliff Frey <[email protected]>
22+
23+
Fouad Mardini <[email protected]>
24+
Paul Querna <[email protected]>
25+
Felix Geisendörfer <[email protected]>
26+
27+
Andre Caron <[email protected]>
28+
Ivo Raisr <[email protected]>
29+
James McLaughlin <[email protected]>
30+
David Gwynne <[email protected]>
31+
LE ROUX Thomas <[email protected]>
32+
Randy Rizun <[email protected]>

LICENSE-MIT

Lines changed: 5 additions & 1 deletion
@@ -23,7 +23,11 @@ IN THE SOFTWARE.
2323
This code mainly based on code with the following license:
2424

2525

26-
Copyright Joyent, Inc. and other Node contributors. All rights reserved.
26+
http_parser.c is based on src/http/ngx_http_parse.c from NGINX copyright
27+
Igor Sysoev.
28+
29+
Additional changes are licensed under the same terms as NGINX and
30+
copyright Joyent, Inc. and other Node contributors. All rights reserved.
2731

2832
Permission is hereby granted, free of charge, to any person obtaining a copy
2933
of this software and associated documentation files (the "Software"), to

Makefile

Lines changed: 34 additions & 17 deletions
@@ -1,41 +1,58 @@
1-
OPT_DEBUG=-O0 -g -Wall -Wextra -Werror -I.
2-
OPT_FAST=-O3 -DHTTP_PARSER_STRICT=0 -I.
3-
41
CC?=gcc
2+
AR?=ar
3+
4+
CPPFLAGS += -I.
5+
CPPFLAGS_DEBUG = $(CPPFLAGS) -DHTTP_PARSER_STRICT=1 -DHTTP_PARSER_DEBUG=1
6+
CPPFLAGS_DEBUG += $(CPPFLAGS_DEBUG_EXTRA)
7+
CPPFLAGS_FAST = $(CPPFLAGS) -DHTTP_PARSER_STRICT=0 -DHTTP_PARSER_DEBUG=0
8+
CPPFLAGS_FAST += $(CPPFLAGS_FAST_EXTRA)
59

10+
CFLAGS += -Wall -Wextra -Werror
11+
CFLAGS_DEBUG = $(CFLAGS) -O0 -g $(CFLAGS_DEBUG_EXTRA)
12+
CFLAGS_FAST = $(CFLAGS) -O3 $(CFLAGS_FAST_EXTRA)
13+
CFLAGS_LIB = $(CFLAGS_FAST) -fPIC
614

7-
test: test_g
15+
test: test_g test_fast
816
./test_g
17+
./test_fast
918

1019
test_g: http_parser_g.o test_g.o
11-
$(CC) $(OPT_DEBUG) http_parser_g.o test_g.o -o $@
20+
$(CC) $(CFLAGS_DEBUG) $(LDFLAGS) http_parser_g.o test_g.o -o $@
1221

1322
test_g.o: test.c http_parser.h Makefile
14-
$(CC) $(OPT_DEBUG) -c test.c -o $@
15-
16-
test.o: test.c http_parser.h Makefile
17-
$(CC) $(OPT_FAST) -c test.c -o $@
23+
$(CC) $(CPPFLAGS_DEBUG) $(CFLAGS_DEBUG) -c test.c -o $@
1824

1925
http_parser_g.o: http_parser.c http_parser.h Makefile
20-
$(CC) $(OPT_DEBUG) -c http_parser.c -o $@
26+
$(CC) $(CPPFLAGS_DEBUG) $(CFLAGS_DEBUG) -c http_parser.c -o $@
2127

22-
test-valgrind: test_g
23-
valgrind ./test_g
28+
test_fast: http_parser.o test.o http_parser.h
29+
$(CC) $(CFLAGS_FAST) $(LDFLAGS) http_parser.o test.o -o $@
2430

25-
http_parser.o: http_parser.c http_parser.h Makefile
26-
$(CC) $(OPT_FAST) -c http_parser.c
31+
test.o: test.c http_parser.h Makefile
32+
$(CC) $(CPPFLAGS_FAST) $(CFLAGS_FAST) -c test.c -o $@
2733

28-
test_fast: http_parser.o test.c http_parser.h
29-
$(CC) $(OPT_FAST) http_parser.o test.c -o $@
34+
http_parser.o: http_parser.c http_parser.h Makefile
35+
$(CC) $(CPPFLAGS_FAST) $(CFLAGS_FAST) -c http_parser.c
3036

3137
test-run-timed: test_fast
3238
while(true) do time ./test_fast > /dev/null; done
3339

40+
test-valgrind: test_g
41+
valgrind ./test_g
42+
43+
libhttp_parser.o: http_parser.c http_parser.h Makefile
44+
$(CC) $(CPPFLAGS_FAST) $(CFLAGS_LIB) -c http_parser.c -o libhttp_parser.o
45+
46+
library: libhttp_parser.o
47+
$(CC) -shared -o libhttp_parser.so libhttp_parser.o
48+
49+
package: http_parser.o
50+
$(AR) rcs libhttp_parser.a http_parser.o
3451

3552
tags: http_parser.c http_parser.h test.c
3653
ctags $^
3754

3855
clean:
39-
rm -f *.o test test_fast test_g http_parser.tar tags
56+
rm -f *.o *.a test test_fast test_g http_parser.tar tags libhttp_parser.so libhttp_parser.o
4057

4158
.PHONY: clean package test-run test-run-timed test-valgrind

README.md

Lines changed: 133 additions & 1 deletion
@@ -24,7 +24,7 @@ The parser extracts the following information from HTTP messages:
2424
* Response status code
2525
* Transfer-Encoding
2626
* HTTP version
27-
* Request path, query string, fragment
27+
* Request URL
2828
* Message body
2929

3030
Building
@@ -49,3 +49,135 @@ Usage
4949
help or have suggestions, feel free to contact me at
5050
5151

52+
53+
One `http_parser` object is used per TCP connection. Initialize the struct
54+
using `http_parser_init()` and set the callbacks. That might look something
55+
like this for a request parser:
56+
57+
http_parser_settings settings;
58+
settings.on_path = my_path_callback;
59+
settings.on_header_field = my_header_field_callback;
60+
/* ... */
61+
62+
http_parser *parser = malloc(sizeof(http_parser));
63+
http_parser_init(parser, HTTP_REQUEST);
64+
parser->data = my_socket;
65+
66+
When data is received on the socket execute the parser and check for errors.
67+
68+
size_t len = 80*1024, nparsed;
69+
char buf[len];
70+
ssize_t recved;
71+
72+
recved = recv(fd, buf, len, 0);
73+
74+
if (recved < 0) {
75+
/* Handle error. */
76+
}
77+
78+
/* Start up / continue the parser.
79+
* Note we pass recved==0 to signal that EOF has been received.
80+
*/
81+
nparsed = http_parser_execute(parser, &settings, buf, recved);
82+
83+
if (parser->upgrade) {
84+
/* handle new protocol */
85+
} else if (nparsed != recved) {
86+
/* Handle error. Usually just close the connection. */
87+
}
88+
89+
HTTP needs to know where the end of the stream is. For example, sometimes
90+
servers send responses without Content-Length and expect the client to
91+
consume input (for the body) until EOF. To tell http_parser about EOF, give
92+
`0` as the fourth parameter to `http_parser_execute()`. Callbacks and errors
93+
can still be encountered during an EOF, so one must still be prepared
94+
to receive them.
95+
96+
Scalar valued message information such as `status_code`, `method`, and the
97+
HTTP version are stored in the parser structure. This data is only
98+
temporarily stored in `http_parser` and gets reset on each new message. If
99+
this information is needed later, copy it out of the structure during the
100+
`headers_complete` callback.
101+
102+
The parser decodes the transfer-encoding for both requests and responses
103+
transparently. That is, a chunked encoding is decoded before being sent to
104+
the on_body callback.
105+
106+
107+
The Special Problem of Upgrade
108+
------------------------------
109+
110+
HTTP supports upgrading the connection to a different protocol. An
111+
increasingly common example of this is the Web Socket protocol which sends
112+
a request like
113+
114+
GET /demo HTTP/1.1
115+
Upgrade: WebSocket
116+
Connection: Upgrade
117+
Host: example.com
118+
Origin: http://example.com
119+
WebSocket-Protocol: sample
120+
121+
followed by non-HTTP data.
122+
123+
(See http://tools.ietf.org/html/draft-hixie-thewebsocketprotocol-75 for more
124+
information on the Web Socket protocol.)
125+
126+
To support this, the parser will treat this as a normal HTTP message without a
127+
body, issuing both on_headers_complete and on_message_complete callbacks. However,
128+
http_parser_execute() will stop parsing at the end of the headers and return.
129+
130+
The user is expected to check if `parser->upgrade` has been set to 1 after
131+
`http_parser_execute()` returns. Non-HTTP data begins at the buffer supplied
132+
offset by the return value of `http_parser_execute()`.
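The offset arithmetic described in this paragraph can be sketched as follows. This is only an illustration: `struct tail` and `upgrade_tail()` are hypothetical names that do not exist in http_parser, and the sketch assumes `nparsed` is the value returned by `http_parser_execute()` for a buffer of `recved` bytes after `parser->upgrade` was found to be set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: a view of the bytes the HTTP parser did not consume. */
struct tail {
  const char *data; /* first non-HTTP byte, or NULL if there is none */
  size_t len;       /* number of non-HTTP bytes left in the buffer */
};

/* Hypothetical helper, not part of http_parser.
 * buf/recved describe the buffer that was handed to the parser,
 * nparsed is the number of bytes the parser reported as consumed. */
struct tail upgrade_tail(const char *buf, size_t recved, size_t nparsed) {
  struct tail t = { NULL, 0 };

  /* The parser can never consume more than it was given. */
  assert(nparsed <= recved);

  if (nparsed < recved) {
    t.data = buf + nparsed;   /* non-HTTP data starts right after the headers */
    t.len = recved - nparsed; /* everything up to the end of the buffer */
  }
  return t;
}
```

The returned bytes would then be handed to whatever handles the upgraded protocol, together with any data read from the socket afterwards.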
133+
134+
135+
Callbacks
136+
---------
137+
138+
During the `http_parser_execute()` call, the callbacks set in
139+
`http_parser_settings` will be executed. The parser maintains state and
140+
never looks behind, so buffering the data is not necessary. If you need to
141+
save certain data for later usage, you can do that from the callbacks.
142+
143+
There are two types of callbacks:
144+
145+
* notification `typedef int (*http_cb) (http_parser*);`
146+
Callbacks: on_message_begin, on_headers_complete, on_message_complete.
147+
* data `typedef int (*http_data_cb) (http_parser*, const char *at, size_t length);`
148+
Callbacks: (requests only) on_uri,
149+
(common) on_header_field, on_header_value, on_body;
150+
151+
Callbacks must return 0 on success. Returning a non-zero value indicates
152+
error to the parser, making it exit immediately.
153+
154+
If you parse an HTTP message in chunks (e.g. `read()` the request line
155+
from socket, parse, read half headers, parse, etc) your data callbacks
156+
may be called more than once. http-parser guarantees that the data pointer is only
157+
valid for the lifetime of the callback. You can also `read()` into a heap allocated
158+
buffer to avoid copying memory around if this fits your application.
159+
160+
Reading headers may be a tricky task if you read/parse headers partially.
161+
Basically, you need to remember whether the last header callback was field or value
162+
and apply the following logic:
163+
164+
(on_header_field and on_header_value shortened to on_h_*)
165+
------------------------ ------------ --------------------------------------------
166+
| State (prev. callback) | Callback | Description/action |
167+
------------------------ ------------ --------------------------------------------
168+
| nothing (first call) | on_h_field | Allocate new buffer and copy callback data |
169+
| | | into it |
170+
------------------------ ------------ --------------------------------------------
171+
| value | on_h_field | New header started. |
172+
| | | Copy current name,value buffers to headers |
173+
| | | list and allocate new buffer for new name |
174+
------------------------ ------------ --------------------------------------------
175+
| field | on_h_field | Previous name continues. Reallocate name |
176+
| | | buffer and append callback data to it |
177+
------------------------ ------------ --------------------------------------------
178+
| field | on_h_value | Value for current header started. Allocate |
179+
| | | new buffer and copy callback data to it |
180+
------------------------ ------------ --------------------------------------------
181+
| value | on_h_value | Value continues. Reallocate value buffer |
182+
| | | and append callback data to it |
183+
------------------------ ------------ --------------------------------------------
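The table above could be implemented roughly as in the sketch below. It is a simplified illustration, not code from http_parser: every name in it (`struct header_state`, `on_h_field`, `on_h_value`, `on_h_complete`, `MAX_HEADERS`) is hypothetical, and the functions take the state struct directly so the block compiles on its own. In a real program they would presumably be thin wrappers with the `http_data_cb` / `http_cb` signatures that fetch the state from `parser->data`. The sketch also assumes every header receives at least one value callback.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_HEADERS 32 /* arbitrary limit chosen for this sketch */

enum last_cb { LAST_NONE, LAST_FIELD, LAST_VALUE };

struct header { char *name; char *value; };

struct header_state {
  enum last_cb last;                  /* which header callback ran last */
  struct header headers[MAX_HEADERS]; /* completed headers */
  size_t count;
  char *name;  size_t name_len;       /* header currently being assembled */
  char *value; size_t value_len;
};

/* Grow a NUL-terminated heap string by len bytes; returns 0 on success. */
static int append(char **s, size_t *slen, const char *at, size_t len) {
  char *p = realloc(*s, *slen + len + 1); /* realloc(NULL, n) allocates */
  if (p == NULL) return -1;
  memcpy(p + *slen, at, len);
  *slen += len;
  p[*slen] = '\0';
  *s = p;
  return 0;
}

/* Move the current name/value buffers into the headers list. */
static int flush_header(struct header_state *hs) {
  if (hs->count == MAX_HEADERS) return -1;
  hs->headers[hs->count].name = hs->name;
  hs->headers[hs->count].value = hs->value;
  hs->count++;
  hs->name = hs->value = NULL;
  hs->name_len = hs->value_len = 0;
  return 0;
}

int on_h_field(struct header_state *hs, const char *at, size_t len) {
  assert(hs != NULL);
  /* prev == value: a new header started, so store the finished one.
   * prev == field or nothing: the name starts or continues. */
  if (hs->last == LAST_VALUE && flush_header(hs) != 0) return -1;
  hs->last = LAST_FIELD;
  return append(&hs->name, &hs->name_len, at, len);
}

int on_h_value(struct header_state *hs, const char *at, size_t len) {
  assert(hs != NULL);
  /* prev == field starts the value, prev == value continues it;
   * append() covers both because the buffer starts out as NULL. */
  hs->last = LAST_VALUE;
  return append(&hs->value, &hs->value_len, at, len);
}

/* Call when the headers end, so the last header is stored too. */
int on_h_complete(struct header_state *hs) {
  assert(hs != NULL);
  return hs->last == LAST_VALUE ? flush_header(hs) : 0;
}
```

Returning a non-zero value on allocation failure matches the rule that a callback signals an error to the parser with a non-zero return. Freeing the stored strings is left out for brevity.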
