RudderStack is a customer data pipeline tool for collecting, routing and processing data from your websites, apps, cloud tools, and data warehouse.
More information on RudderStack can be found here.
This section assumes you have Terraform installed on your machine and can configure AWS CLI credentials. Refer to the following documents if needed:
- https://docs.aws.amazon.com/en_pv/cli/latest/userguide/cli-chap-install.html
- https://www.terraform.io/downloads.html
- Go to the RudderStack dashboard and set up your account. Copy your workspace token from the top of the home page.
- Replace <your_workspace_token> in dataplane.env with the above token.
- Configure a new source and get the <source_write_key>. The source_write_key will be used later for basic authentication when sending events.
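The token replacement above can also be scripted. The following is a minimal sketch, assuming the placeholder appears literally as <your_workspace_token> in dataplane.env and that the token contains no characters that are special to sed (such as |, & or \). The helper name set_workspace_token is hypothetical and not part of the repo.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repo): replace the literal
# <your_workspace_token> placeholder in an env file with a real token.
# Assumes the token contains no '|', '&' or '\', which are special to sed here.
set_workspace_token() {
  env_file="$1"
  token="$2"
  # -i.bak edits the file in place and keeps a backup copy;
  # this form is accepted by both GNU and BSD sed.
  sed -i.bak "s|<your_workspace_token>|${token}|g" "$env_file"
}

# Example usage (run from the repo root):
#   set_workspace_token dataplane.env "token-copied-from-dashboard"
```

The original file is kept as dataplane.env.bak, so the change can be reverted if the wrong token was pasted.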
If you are launching the machine in a default VPC, please skip this step and move on to the next section.
If you don't have a default VPC or want to launch RudderStack in a non-default VPC, check out the branch custom-vpc. Fill in the variables custom_vpc.vpc_id and custom_vpc.subnet_id in variables.tf depending on where you want to launch.
- Create an AWS user with Administrator access and save its credentials in ~/.aws/credentials. These credentials are used only by Terraform. Administrator access is not strictly required, but it is the easiest to set up.
- Terraform creates the following AWS resources: 1 EC2 key pair, 1 EC2 instance, 1 S3 bucket, 2 security groups (to open ports 22 and 8080), and 1 IAM role with its corresponding policy.
- Create an SSH key pair and store it in any location (preferably ~/.ssh/id_rsa_tf; otherwise, update the location in variables.tf). If you want to use an existing key pair, skip generating a new one and provide the path to that key pair instead.
    ssh-keygen -t rsa -b 4096 -C "<your_email>"
- Clone this repo.
- Change the S3 bucket name in variables.tf. Bucket names are globally scoped, so if you get a conflict, use a different bucket name. Change the prefix in variables.tf if needed. You can also update the EC2 type, volume type (default is gp2), volume size (default is 100 GB), etc. in variables.tf.
- Run terraform init.
- Run terraform apply and enter yes when prompted.
- Note the instance_ip from the output.
- You can now send events to the following endpoints using basic auth. The username is the source write key you obtained earlier, and the password is empty.
http://<instance_ip>:8080/v1/track
http://<instance_ip>:8080/v1/identify
http://<instance_ip>:8080/v1/page
http://<instance_ip>:8080/v1/screen
http://<instance_ip>:8080/v1/batch
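To make the basic auth scheme concrete, here is a small sketch of how the Authorization header is derived. HTTP basic auth Base64-encodes username:password, so with an empty password the encoded value is the write key followed by a colon. The function names, the sample key abc123, and the IP address are illustrative assumptions only; curl -u <source_write_key>: builds the same header for you.

```shell
#!/bin/sh
# Illustrative helpers; names and sample values are assumptions, not part of the repo.

# Print the Authorization header for a source write key and an empty password.
basic_auth_header() {
  # Basic auth encodes "username:password"; the password is empty,
  # hence the trailing colon. tr removes line wraps that base64 may add.
  encoded=$(printf '%s:' "$1" | base64 | tr -d '\n')
  printf 'Authorization: Basic %s\n' "$encoded"
}

# Print the endpoint URL for an instance IP and an event type
# (track, identify, page, screen, or batch).
endpoint_url() {
  printf 'http://%s:8080/v1/%s\n' "$1" "$2"
}

basic_auth_header "abc123"            # sample key, not a real one
endpoint_url "203.0.113.10" "track"   # 203.0.113.10 is a documentation-only address
```

For the sample key, the first call prints Authorization: Basic YWJjMTIzOg== and the second prints http://203.0.113.10:8080/v1/track.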
- Configure a new S3 destination and provide the name of the bucket that was created during the Terraform setup.
- Save your event data in a JSON file, say event.json. (A sample event.json is included in the repo.)
- Make the following curl request to send an event:
    curl -u <source_write_key>: -X POST http://<instance_ip>:8080/v1/track -d @event.json --header "Content-Type: application/json"
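If you want to build event.json yourself rather than use the sample from the repo, a minimal sketch follows. The payload fields (userId, event, properties) are an assumption modeled on a typical track call, and all values are made up; the sample event.json in the repo remains the authoritative reference. Both helper names are hypothetical, and send_track_event simply wraps the curl request shown above.

```shell
#!/bin/sh
# Hypothetical helpers; the payload shape and all values are illustrative assumptions.

# Write a minimal track event to the given file (default: event.json).
write_sample_event() {
  cat > "${1:-event.json}" <<'EOF'
{
  "userId": "user-123",
  "event": "Test Event",
  "properties": {
    "plan": "free"
  }
}
EOF
}

# Send a JSON file to the track endpoint.
# Usage: send_track_event <source_write_key> <instance_ip> [file]
send_track_event() {
  curl -u "$1:" -X POST "http://$2:8080/v1/track" \
    -d @"${3:-event.json}" --header "Content-Type: application/json"
}

# Example usage (requires a running instance):
#   write_sample_event event.json
#   send_track_event "<source_write_key>" "<instance_ip>" event.json
```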
If you come across any issues while following the steps in this guide, please feel free to contact us or start a conversation on our Slack channel. We will be happy to help you.