Mark Edmondson https://code.markedmondson.me/ Recent content on Mark Edmondson Hugo -- gohugo.io en-GB Fri, 07 Jul 2023 00:00:00 +0000 Running Large Language Models on Google Cloud Platform via Cloud Run, VertexAI and PubSub - LLMOps on GCP https://code.markedmondson.me/running-llms-on-gcp/ Fri, 07 Jul 2023 00:00:00 +0000 https://code.markedmondson.me/running-llms-on-gcp/ <p>Hello blog, it&rsquo;s been a long time. Since I finished the GA4 book I have had a good break, and lots of life events have happened, such as a new job, philosophies and family arrangements, but I have always intended to pick this thread up again once I had an idea of where it would best lead.</p> <p>As old readers may remember, I&rsquo;ve always tried to work on the meta-horizons of where I am, restlessly looking for the next exciting lesson, and that impulse has led me to Large Language Models (LLMs), sparked off by the ChatGPT revolution but foreshadowed by image generation models such as Stable Diffusion a few months before.</p> <p>A key facilitator has been Harrison Chase&rsquo;s <a href="https://python.langchain.com">Langchain</a>, an active hive of open-source goodness. It has allowed me to learn, imagine and digest this new, active field of LLMOps (Large Language Model Operations), that is, the data engineering needed to make LLMs actually useful on a day-to-day basis. I took it upon myself to see how I could apply my Google Cloud Platform (GCP) data engineering background to these new toys Langchain has helped provide.</p> <p>This means I now have this new brain, Edmonbrain, that I converse with daily in Google Chat, Slack and Discord. I have fed it interesting URLs, Git repos and whitepapers so I can build up a unique bot of my very own. 
I fed it my own book, and can ask it questions about it, for example:</p> Activating GA4 events with GTM Server-Side and Pub/Sub for Fun and Profit https://code.markedmondson.me/sending-ga4-events-pubsub/ Tue, 04 Jan 2022 00:00:00 +0000 https://code.markedmondson.me/sending-ga4-events-pubsub/ <p><em>Image from <a href="https://solarsystem.nasa.gov/resources/758/brief-outburst/?category=solar-system_sun">https://solarsystem.nasa.gov/resources/758/brief-outburst/?category=solar-system_sun</a></em></p> <p>With Google Tag Manager Server-side (GTM-SS), the scope of what you can do with your GA4 events is much enhanced, since GTM-SS lets you interact easily with other GCP services and, in particular, makes Google authentication easier. This integration can allow you to enrich your data streams or send your GA4 data to locations other than the Google Marketing Platform. The first example of this has been using the BigQuery API in your GTM-SS templates to export your event data, but what if you need your event data on a more real-time basis? For that, there is <a href="https://cloud.google.com/pubsub/docs/overview">Google Pub/Sub</a>.</p> Google Tag Manager Server Side on Cloud Run - Pros and Cons https://code.markedmondson.me/gtm-serverside-cloudrun/ Fri, 21 Aug 2020 00:00:00 +0000 https://code.markedmondson.me/gtm-serverside-cloudrun/ One of the most exciting developments in 2020 for me is the launch of Google Tag Manager Server Side, which lies at the intersection of cloud and digital analytics that I&rsquo;ve gravitated towards in recent years. There are many good resources out there on GTM Serverside; Simo in particular has got me up to speed with his excellent range of posts. This post will assume you&rsquo;ve read those, and be more about the viability of deploying GTM server side within Cloud Run. 
Shiny on Google Cloud Run - Scale-to-Zero R Web Apps https://code.markedmondson.me/shiny-cloudrun/ Sat, 01 Aug 2020 00:00:00 +0000 https://code.markedmondson.me/shiny-cloudrun/ There are some references on how to deploy Shiny apps to Cloud Run around the web and in various bits of my package documentation, but it&rsquo;s a cool service so I thought it worth pulling out and having a blog post to refer to. Why Shiny on Cloud Run? As mentioned in my R at scale on Google Cloud Platform post, Cloud Run is a container-as-a-service which lets you deploy Docker containers to the web without needing to worry about the infrastructure. Online payments for data science apps (DSaaS) using R, Shiny, Firebase, Paddle and Google Cloud Functions https://code.markedmondson.me/datascience-aas/ Sun, 28 Jun 2020 00:00:00 +0000 https://code.markedmondson.me/datascience-aas/ This post has been delayed due to the events of 2020, including a global pandemic and social justice demonstrations; if you are touched by those I wish you all the best. My family and I have been very lucky and spent most of the last months tending our garden in the Copenhagen suburbs, but others have been a lot more directly affected. One consequence of the pandemic may be that from now on more interaction will be online, which presents both opportunities and challenges. Introducing googleCloudRunner - serverless R on Google Cloud Platform https://code.markedmondson.me/googleCloudRunner-intro/ Sat, 18 Jan 2020 00:00:00 +0000 https://code.markedmondson.me/googleCloudRunner-intro/ I&rsquo;ve been working on googleCloudRunner over the last couple of months, and it will soon be available on CRAN - https://code.markedmondson.me/googleCloudRunner/ googleCloudRunner feels like the culmination of my last couple of years&rsquo; interest in this R/Google Cloud Platform niche I find myself in. 
The package seems to fulfill every use case I have wanted for working with R in the cloud, available now in a simple UI syntax that hopefully any R user can pick up quickly. gago: Blazingly fast Google Analytics API downloads with Go https://code.markedmondson.me/gago/ Wed, 09 Oct 2019 00:00:00 +0000 https://code.markedmondson.me/gago/ gago is a new Go library for working with the Google Analytics Reporting API v4. I used it as a way to learn Go, transferring across some of the lessons I learned from working with the Google Analytics API in googleAnalyticsR. In particular, how to get fast downloads and add an anti-sampling option, whilst taking advantage of Go&rsquo;s natural multi-threaded nature. The imagined use case is for when you need to download Google Analytics data but you don&rsquo;t want to install an interpreted language such as Python or R to do so. R at scale on the Google Cloud Platform https://code.markedmondson.me/r-at-scale-on-google-cloud-platform/ Sat, 23 Feb 2019 00:00:00 +0000 https://code.markedmondson.me/r-at-scale-on-google-cloud-platform/ This post covers my current thinking on what I consider the optimal way to work with R on the Google Cloud Platform (GCP). It seems this has developed into my niche, and I get questions about it, so I would like to be able to point to a URL. Both R and the GCP evolve rapidly, so this will have to be updated at some point in the future, but even as things stand now you can do some wonderful things with R, and can multiply those out to potentially billions of users with GCP. 
Auto Google Analytics Data Imports from Cloud Storage https://code.markedmondson.me/automatic-google-analytics-data-imports-cloud-storage/ Thu, 29 Nov 2018 00:00:00 +0000 https://code.markedmondson.me/automatic-google-analytics-data-imports-cloud-storage/ Continuing my infatuation with cloud functions (see last post on using cloud functions to manipulate BigQuery exports) this is a post showing how to bring together various code examples out there so that you can easily upload custom data imports from a Google cloud storage bucket. The code is available in this GitHub repo for useful cloud functions with Google Analytics Extended data imports Google Analytics offers various versions of uploads. R on Kubernetes - serverless Shiny, R APIs and scheduled scripts https://code.markedmondson.me/r-on-kubernetes-serverless-shiny-r-apis-and-scheduled-scripts/ Wed, 02 May 2018 00:00:00 +0000 https://code.markedmondson.me/r-on-kubernetes-serverless-shiny-r-apis-and-scheduled-scripts/ <h2 id="why-run-r-on-kubernetes">Why run R on Kubernetes?</h2> <p><a href="https://kubernetes.io/">Kubernetes</a> is a free and open-source utility to run jobs within a computer cluster. It abstracts away the servers the jobs are running on so you need only worry about the code to run. 
It has features such as <a href="https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/">scheduling</a>, <a href="https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler">auto-scaling</a>, and auto-healing to replace nodes if they break down.</p> <p>If you only need to run R on a single machine, then it&rsquo;s probably a bit OTT to use Kubernetes, but if you are starting to work with multiple Docker containers and/or VMs it gets more and more attractive to have a way to easily orchestrate them.</p> <p>Kubernetes works via <a href="https://www.docker.com/">Docker</a> containers, so if you are already familiar with using Docker for abstracting away code environments, it should be a short step up to abstracting away the computers those Docker containers run upon.</p> Run RStudio Server on a Chromebook as a Cloud Native https://code.markedmondson.me/rstudio-server-chromebook/ Tue, 05 Sep 2017 13:55:57 +0100 https://code.markedmondson.me/rstudio-server-chromebook/ I recently got an Asus Chromebook Flip with which I&rsquo;m very happy, but it did make me realise that if a Chromebook was to replace my normal desktop as my primary workstation, my RStudio Server setup would need to be more cloud native than was available up until now. TL;DR - A how-to on making RStudio Server run on a Chromebook that automatically backs up data and configuration settings to Google Cloud Storage is on the googleComputeEngineR website here. About https://code.markedmondson.me/about/ Sun, 22 Jan 2017 22:15:51 -0700 https://code.markedmondson.me/about/ Mark Edmondson is a Google Developer Expert. He currently lives in Copenhagen where he moved from Cornwall, UK in 2010. 
Real-time forecasting dashboard with Google Tag Manager, Google Cloud and R Shiny - Part two https://code.markedmondson.me/real-time-GTM-google-cloud-r-shiny-2/ Sun, 22 Jan 2017 14:20:57 +0100 https://code.markedmondson.me/real-time-GTM-google-cloud-r-shiny-2/ In part one of this two-part series we walk through the steps to stream data from a Google Tag Manager (GTM) implementation into a Google App Engine (GAE) web app, which then adds data to a BigQuery table via BigQuery&rsquo;s data streaming capability. In part two, we go into how to query that table in real time from R, make a forecast using R, then visualise it in Shiny and the JavaScript visualisation library Highcharts. Real-time forecasting dashboard with Google Tag Manager, Google Cloud and R Shiny - Part one https://code.markedmondson.me/real-time-GTM-google-cloud-r-shiny-1/ Thu, 12 Jan 2017 23:03:57 +0100 https://code.markedmondson.me/real-time-GTM-google-cloud-r-shiny-1/ In part one of this two-part series we walk through the steps to stream data from a Google Tag Manager (GTM) implementation into a Google App Engine (GAE) web app, which then adds data to a BigQuery table via BigQuery&rsquo;s data streaming capability. In part two, we go into how to query that table in real time from R, make a forecast using R, then visualise it in Shiny and the JavaScript visualisation library Highcharts. Insights sorting by delta metrics in the Google Analytics API v4 https://code.markedmondson.me/quicker-insight-sort-metric-delta/ Thu, 01 Dec 2016 23:03:57 +0100 https://code.markedmondson.me/quicker-insight-sort-metric-delta/ As analysts, we are often called upon to see how website metrics have improved or declined over time. This is easy enough when looking at trends, but if you are looking to break down over other dimensions, it can involve a lot of ETL to get to what you need. 
For instance, if you are looking at landing page performance of SEO traffic you can sort by the top performers, but not by the most improved performers. Launch RStudio Server in the Google Cloud with two lines of R https://code.markedmondson.me/launch-rstudio-server-google-cloud-in-two-lines-r/ Thu, 20 Oct 2016 23:03:57 +0100 https://code.markedmondson.me/launch-rstudio-server-google-cloud-in-two-lines-r/ I&rsquo;ve written previously about how to get RStudio Server running on Google Compute Engine: the first post in July 2014 gave you a snapshot to download then customise, the second in April 2016 launched via a Docker container. Things move on, and I now recommend the process below, which uses the RStudio template in the googleComputeEngineR package, new on CRAN. Not only does it abstract away a lot of the dev-ops set up, but it also gives you more flexibility by taking advantage of Dockerfiles. A digital analytics workflow through the Google Cloud using R https://code.markedmondson.me/digital-analytics-workflow-through-google-cloud/ Mon, 10 Oct 2016 23:03:57 +0100 https://code.markedmondson.me/digital-analytics-workflow-through-google-cloud/ There are now several packages built upon the googleAuthR framework which are helpful to a digital analyst who uses R, so this post looks to demonstrate how they all work together. If you&rsquo;re new to R, and would like to know how it helps with your digital analytics, Tim Wilson and I ran a workshop last month aimed at getting a digital analyst up and running. The course material is online at www. Efficient anti-sampling with the Google Analytics Reporting API https://code.markedmondson.me/anti-sampling-google-analytics-api/ Fri, 05 Aug 2016 23:03:57 +0100 https://code.markedmondson.me/anti-sampling-google-analytics-api/ Avoiding sampling is one of the most common reasons people start using the Google Analytics API. 
This blog lays out some pseudo-code to do so in an efficient manner, avoiding too many unnecessary API calls. The approach is used in the v4 calls for the R package googleAnalyticsR. Avoiding the daily walk The most common approach to mitigate sampling is to break down the API calls into one call per day. SEO keyword research using searchConsoleR and googleAnalyticsR https://code.markedmondson.me/search-console-google-analytics-r-keyword-research/ Tue, 21 Jun 2016 23:03:57 +0100 https://code.markedmondson.me/search-console-google-analytics-r-keyword-research/ In this blog we look at a method to estimate where to prioritise your SEO resources, estimating which keywords will give the greatest increase in revenue if you could improve their Google rank. Overview Thanks to Vincent at data-seo.com who proofread and corrected some errors in the first draft. Data comes from Google Search Console and Google Analytics. Search Console is used to provide the keywords in these post-(not provided) days. Scheduling R scripts for a team using RStudio Server, Docker, Github and Google Compute Engine https://code.markedmondson.me/setting-up-scheduled-R-scripts-for-an-analytics-team/ Thu, 21 Apr 2016 23:03:57 +0100 https://code.markedmondson.me/setting-up-scheduled-R-scripts-for-an-analytics-team/ Edit 20th November, 2016 - now everything in this post is abstracted away and available in the googleComputeEngineR package - I would say it&rsquo;s a lot easier to use that. Here is a post on getting started with it. http://code.markedmondson.me/launch-rstudio-server-google-cloud-in-two-lines-r/ This blog will give you steps that allow you to run on Google Compute Engine a server that has these features: RStudio Server instance with multiple logins. Apache to host a welcome webpage. googleAuthR 0.2.0 https://code.markedmondson.me/googleAuthR-0.2.0/ Fri, 05 Feb 2016 23:03:57 +0100 https://code.markedmondson.me/googleAuthR-0.2.0/ googleAuthR is now on CRAN version 0.2.0. 
This release is the result of using the library myself to create three working Google API libraries, and tweaking the googleAuthR code to better support the process. As a result all of these libraries are now able to be authorised with one Google OAuth2 login flow: googleAnalyticsR, searchConsoleR, bigQueryR. Batching This means the libraries above and any other created with googleAuthR can take advantage of batching: this uses a Google API feature that means you can send multiple API calls at once. Jekyll + Github + Markdown = Blog https://code.markedmondson.me/hello-world/ Thu, 14 Jan 2016 23:03:57 +0100 https://code.markedmondson.me/hello-world/ Hello World! Welcome to my new home for code blogging. I&rsquo;ll keep the regular blog for all the other stuff, running on Posthaven. Why? It&rsquo;s educational, and free, and writing code posts using Posthaven&rsquo;s rich text editor was sometimes painful. Using this setup means it&rsquo;s all written in Markdown so I get control over everything just the way I want it, and since it combines Github and Markdown, two things I know well already, Jekyll seemed worth knowing.