What's the BIG deal about end user development?

End User Developed Applications (EUDA), also called User Developed Applications (UDA) and End User Development (EUD), is the practice of non-technologists, or at least employees who are not in the technology department, developing applications themselves and using those applications in a production setting [1][2][3].

What's the little deal?

Many of you will be familiar with the issues that arise when non-experts develop applications and/ or when applications are developed in non-typical ways. This is not particular to end user application development but rather reflects immature technology practice wherever it occurs. Here are a few of the issues that can arise:

  1. (probably) inexpert use of the language and tooling leads to wasted time
  2. no development standards or poor adherence to standards leads to poor quality work
  3. no independent testing leads to errors in production
  4. no version control/ proliferation of old copies leads to wasted time and errors in production
  5. no quality control leads to errors
  6. no documentation leads to wasted time
  7. no resiliency planning leads to wasted time when unexpected circumstances arise
  8. opaque dependencies means applications depend on data/ connectivity/ user input/ accounts that are not obvious
  9. 'super glue' effect means applications get into a corporate production environment without a rigorous tollgate process and are then very hard to remove
  10. 'tail wagging the dog' effect means processes form around the application and pretty soon you are 'opening the confirmation spreadsheet' rather than 'confirming the transaction'.

All of that basically boils down to errors, poor process and inefficiency. That is all bad, I agree. But it's not the really BIG deal.

What's the BIG deal?

At one point in my professional past I worked for an organisation that did payment processing. They processed millions of transactions daily, all across the globe, and had very robust and suitable systems for this workload (and I dare say larger ones).

They also had over 4000 user developed applications in current or recent use. Presuming every one of those applications had a single core use case, the end users in this business thought there were an additional 4000 ways to do this business.

That is the BIG deal!

This is interesting for so many reasons and I'll discuss a few in the next couple of paragraphs. Let me first state, however, that I didn't dig in to this 4000 so I cannot make any empirical statements about the complexity of these applications or the duplication in this population (although I know from anecdotal evidence that it wasn't particularly high). That certainly would have been nice to have, but I'll continue without it and speak in the more general sense of a mature business with a non-trivial number of user developed applications.


Let's go!

Firstly, looking at this from the point of view of wasted effort, there simply aren't 4000 extra use cases in a mature business operating in a well understood market. The payment processing business is very competitive and is only profitable at scale therefore any organisation that is in it must have the systems to do it. They've got the core use cases covered. They've got the non-core use cases covered. They are working on new use cases, at scale. If there were 4000 use cases they hadn't covered they'd be out of business. So those applications represent wasted effort.

Secondly, looking at this from the point of view of organisational dysfunction, end users are the ones in the trenches, doing the job and using the systems day in and day out, and they have requirements that aren't being met. They have 4000 of them! I know from first-hand experience that end users are not knocking up spreadsheets just for the fun of it. Many end user developers find it a frustrating process, but that frustration must have been outweighed by the frustration of an unmet requirement. 4000 times. I think it's reasonable to suggest this evidences a degree of the technology department being out of touch with its user base.

Thirdly, looking at this from the point of view of individual responsibility, how can an end user who understands the payment processing business think there are an additional 4000 valid things the organisation should be doing? There aren't! Now, granted, there's typically no global view, in other words the end user doesn't know that they are making the 4000th UDA, but they must know they are not expert developers and consequently that they are doing something potentially dangerous for the business. It's certainly true that the negative aspects of end user development are more widely known now than they once were. But very often it is easier to just 'bang out a spreadsheet' than to raise a requirement against a strategic system, present the case for escalation and argue successfully for prompt fulfilment. So users have been doing the easy thing and not the right thing.

Finally, looking at this from the point of view of business agility, these applications are business quicksand. For all the reasons stated in the little deal section earlier, these applications tend to be very brittle. They very often cannot be run: by a different user; on a different machine; on a different date; in a different context or with different data. They are almost never scale invariant (in other words they will break with more volume) and they are almost never defect free. They are invisible at a global level but indispensable at a local level meaning they don't feature in planning and strategy but they are an impediment to execution of that strategy.

That's the big deal about end user development.


What should be done about it?

Maybe nothing. I've focussed only on the negatives, but end user development delivers technology solutions in short order which capture business that would otherwise be lost and save time that would otherwise be wasted. Quick and dirty is quick after all.

Maybe something. Most operations in a mature business have to be performed in a consistent manner. Application development is no different, whether done by a user or an expert. End user development is chaotic, standards-lite or standards-free and generally badly done. It shouldn't continue unsupervised in the organisation. There are technology solutions to the little deal problems provided by the likes of Cimcon and ClusterSeven among others. It's also reasonably straightforward to put surveillance in place to inventory end user applications and guard against proliferation.

Regarding the big deal problems, well, UDAs can actually be part of the solution. UDAs are like bandaids. The very existence of a UDA is a pointer to management saying 'things may not be right here, come and have a look'. Reactionary policies like blanket bans on UDAs and hyper-aggressive retirement schedules will hide the underlying problems again. UDAs are the hitherto acceptable expression of end user frustration and they should be considered a valuable indicator for technology, operations and management.

That said, the proliferation cannot continue unabated. Local operations management must better manage the circumstances that lead to the creation of UDAs. For example: they shouldn't agree to take on processes with technology gaps. Technology product owners must provide transparent issue management and product roadmaps and actively engage with their user communities. Look at the evangelism programs that technology vendors commit to for their products. That's practically unheard of inside a corporate but it should exist. Technology strategy and business strategy must be hand-in-glove and better communication must explain why users may need to suffer short term pain in order for the organisation to gain in the long term. Quick and dirty is dirty after all.

As always, I'd be very interested in your comments.

Have fun!

  1. End User Computing (EUC), while sometimes mentioned in the same context, is something different. I define this as the practice of end users consuming technology rather than creating it. In my mind there is also an implication that the use of that technology is broadly in line with its expected use.

  2. I'm specifically discussing the creation of technology in this post as opposed to the use of an existing technology, for example: using a browser plugin that is non-standard for your organisation. I include the latter practice under the broader umbrella of Non-standard Technology Use (NTU) to which many of the same arguments in this post apply but it's not under discussion here.

  3. I'm mostly discussing spreadsheet applications as that seems to be the predominant technology for end user development in the corporate space. However, batch files, shell files, script files (given an execution context), MS Access databases, and MS Office and OpenOffice macros are all available to be 'developed' by the end user on most corporate desktops, and this discussion applies equally to them.

What are all the rows for in Excel?

A look at the FUSE corpus using F#, nodejs, CouchDB and R. Part 1 of n.

Most spreadsheet researchers are familiar with the EUSES Corpus [1] (4,498 spreadsheets). More recently (thanks to Felienne Hermans and Emerson Murphy-Hill), the Enron Corpus [2] (15,770 spreadsheets) came to the attention of the wider community. More recently still, the FUSE corpus [3] has made some 249,376 spreadsheets available to everyone. So, I decided to have a look at the use of the grid across all these workbooks.


Summary.

The process I followed succeeded in analysing nearly 95% of the spreadsheets in the corpus. The scatter plots below show the last row and last column produced by the analysis for each of the 236,226 workbooks I was able to analyse completely.

[Scatter plots: last row vs last column on log scales, for pre-2007 workbooks and for 2007-2013 workbooks]

The analysis shows that use of the grid is overwhelmingly concentrated in the upper left corner (in other words, around R1C1). This is borne out by the summary statistics as well.

The workbooks break down as follows:

Once extreme workbooks [6] are removed, distinguishing between pre-2007 sheets and the rest shows that Excel users are generally not taking advantage of the extra 983,040 rows nor the extra 16,128 columns [4]. In fact, while there has been a more than thousand-fold increase in the number of available cells, there has only been an approximate doubling in the total number of cells used.
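
For reference, the arithmetic behind that thousand-fold claim: the pre-2007 grid is 65,536 rows x 256 columns = 16,777,216 cells, while the 2007 onwards grid is 1,048,576 rows x 16,384 columns = 17,179,869,184 cells, exactly 1,024 times as many.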

Even including the extreme workbooks, the average Excel 2007-2013 workbook in the analysis uses just less than 0.01% of the grid. Looking at it another way, based on a 0.01% occupancy rate, if Excel were Las Vegas there'd only be 15 people staying tonight [5].


As practitioners, we know this is true from our own experience. We even advise and prefer parsimonious and organised use of the grid because it generally aids understanding of the workbook. Using the grid for persisting data is contrary to best practice, and no higher level function in Excel (non-trivial calculation, pivot, chart, print) involving that many (heterogeneous) cells could be sufficiently performant (on a desktop machine) or even be guaranteed to work. So, from the point of view of best practice it's a good thing that, on the whole, users aren't filling the grid.

So why are all the rows there?

They are there for the edge case. So that regardless of the size of your data, the grid will not hinder you. In my view, however, it hinders the beginner. Any piece of data can go in an effectively limitless number of locations. When teaching I very commonly come across the question: 'where should I put X?'. The extent of the grid demands organisation. The extent of the grid permits effectively unbounded expansion of a model. How many novice users arrange their spreadsheets in such a way that they can cope with the flexibility the grid permits? Of course, the answer is: very few.

This highlights how important it is to have a methodology when developing/ using spreadsheets. If you are in the financial sector, traditionally heavy users of Excel, then there are a number of organisations like Operis, Corality and F1F9 who offer training and even valuable free advice on best practice use of Excel. But there is nothing to stop you or your organisation developing your own methodology, especially if a 'balance sheet style' doesn't suit your domain. It can start simply with something like: every cell with a number must be accompanied by a row header or column header explicitly stating the units of measure, and evolve from there. Getting a consistent approach to spreadsheet development/ use across your organisation will be worth the effort.

Separately, I've written other posts about a novice mode in Excel and I'd like to add to that in this post. I think the novice mode should include the following:

  • grey out anything below and to the right of the viewport (anchored at cell A1), to discourage (but not prevent) putting data there
  • 'garbage collect' (recursively) the furthest unused cells below and to the right of the viewport; see the sketch after this list. Garbage collection would mean that if a cell contains no data and no formulas, is not part of a named range and is not in a print area, then it should be greyed out and excluded from the Worksheet.UsedRange. The used range must always remain rectangular, however; a jagged range is not permitted.
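
To make the second idea concrete, here is a minimal F# sketch of the trimming rule, assuming (purely for illustration) that a sheet has already been reduced to the set of (row, column) coordinates that still have a reason to exist - data, a formula, membership of a named range or a print area. Nothing here is an Excel API; it just shows what 'rectangular and anchored at A1' would mean:

// Hypothetical sketch, not an Excel feature: trim the used range to the
// bounding box of every cell that still has a reason to exist.
let trimmedUsedRange (occupied: Set<int * int>) =
    if Set.isEmpty occupied then
        None                                   // a completely empty sheet has no used range
    else
        let lastRow = occupied |> Seq.map fst |> Seq.max
        let lastCol = occupied |> Seq.map snd |> Seq.max
        Some (1, 1, lastRow, lastCol)          // anchored at A1, kept rectangular

// e.g. cells A1, C2 and B5 in use -> Some (1, 1, 5, 3), i.e. A1:C5
trimmedUsedRange (set [ (1, 1); (2, 3); (5, 2) ])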

I plan to continue analysis of the FUSE corpus with a focus on identifying aspects of spreadsheet use that could be improved to make life easier for Excel beginners. So expect more posts on the idea of a novice mode and ways in which constraining the product can actually make it easier to use for most people. I'd really appreciate any comments you'd like to share.

The remainder of this post outlines the steps I took to perform the analysis.


Method

If you'd like to follow along with the method, I am referencing the information in this link: static.barik.net/fuse/.

This analysis was performed on the current FUSE set, which at the time of writing was 249,376 binaries extracted from the Common Crawl files from winter 2013 through December 2014. FUSE is a dynamic archive, hence I'm recording this fact.

236,226 workbooks were successfully analysed. In order to perform this analysis I took the following steps:

  • Downloaded the JSON metadata file (fuse-bin.analysis.dedup.poi-dec2014.json.gz)
  • Decompressed the file with 7-zip and used nodejs to read it into a CouchDB instance. MongoDB is suggested and commands are provided to load it in to a MongoDB instance but it's not my cup of tea. CouchDB worked fine in any case, although because I was doing it on my laptop I had to split the file and batch the inserts to get it to work.
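
The loader itself was nodejs, so what follows is not the code I ran; it is just a minimal F# sketch of the batching idea, assuming one JSON document per line in the decompressed metadata file and a local CouchDB database called 'fuse' (both assumptions). CouchDB's _bulk_docs endpoint accepts a {"docs": [...]} payload, which is what makes the batching straightforward:

// Hypothetical F# sketch (the actual loader was nodejs): push the metadata
// into CouchDB in fixed-size batches via the _bulk_docs endpoint so a laptop
// doesn't need to hold the whole file in memory.
open System.IO
open System.Net.Http
open System.Text

let couchUrl = "http://localhost:5984/fuse/_bulk_docs"   // assumed database name
let batchSize = 1000                                      // assumed batch size
let client = new HttpClient()

let postBatch (docs: string[]) =
    // _bulk_docs expects a JSON body of the form {"docs":[ ... ]}
    let body = sprintf "{\"docs\":[%s]}" (String.concat "," docs)
    use content = new StringContent(body, Encoding.UTF8, "application/json")
    let response = client.PostAsync(couchUrl, content).Result
    printfn "posted %d docs -> %O" docs.Length response.StatusCode

// assuming one JSON document per line in the decompressed file
File.ReadLines "fuse-bin.analysis.dedup.poi-dec2014.json"
|> Seq.chunkBySize batchSize
|> Seq.iter postBatch
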
  • FUSE does provide POI stats in the JSON, but a MapReduce on doc.POI['countCOLUMNS'], doc.POI['countCOLUMN'], doc.POI['countROW'] and doc.POI['countROWS'] seemed to show the majority of these counts at zero, so I decided to roll my own.
  • Conveniently FUSE provides all 249,376 binaries and an index on their site so I downloaded the index file (fuse-all.sha1.sorted-dec2014.txt) and the archive (fuse-binaries-dec2014.tar.gz).
  • Decompressed the archive using 7-zip. It is 21.3GB fully expanded.
  • I then iterated through the collection using F# and COM Interop to discover the last non-empty row and last non-empty column on each worksheet in each workbook. Then per workbook I took the maximum last used row from all sheets in the book and separately the maximum last used column from all sheets to produce a blended maximum used range for the workbook. In other words, using the example below, if the maximum used range in Sheet1 was A1:T10 and the maximum used range in Sheet2 was A1:E15 the workbook maximum used range was A1:T15.

[Image: How to choose the maximum used range]
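
In code, that blending amounts to nothing more than taking two maxima. Here's a small F# sketch using the same A1:T10 and A1:E15 example (the helper is illustrative, not the exact code from the analysis):

// Blend per-sheet extents into a single workbook extent: the largest last
// row across all sheets and, independently, the largest last column.
let blendedUsedRange (sheetExtents: (int * int) list) =   // (lastRow, lastCol) per sheet
    let lastRow = sheetExtents |> List.map fst |> List.max
    let lastCol = sheetExtents |> List.map snd |> List.max
    (lastRow, lastCol)

// Sheet1 ends at A1:T10 (row 10, column 20); Sheet2 ends at A1:E15 (row 15, column 5)
blendedUsedRange [ (10, 20); (15, 5) ]    // evaluates to (15, 20), i.e. A1:T15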

An aside here. There are plenty of other technologies available to read spreadsheets and I looked at and dismissed the following: Gembox (because the free version only lets you read 150 rows per sheet and 5 sheets per workbook), ExcelProvider (because it looks at the file extension to determine whether to open a binary stream or an xml stream and the FUSE file names don't have file extensions), NPOI (this is the .NET port of POI so it was appealing to use similar technology to the FUSE guys, but I just didn't grok it immediately, so maybe I'll come back to it) and ExcelDataReader (I did a partial implementation using this but there ended up being a lot of files it couldn't open, the crux of the issue being not having a good way to determine whether to open a binary stream - for xls - or an xml stream - for xlsx). So I settled on Office.Interop, which is not without its own issues - especially around tightly managing your RCW references (here's a good stackoverflow post on the subject) - but FUSE notes some 28,616 workbooks unreadable by their analysis tools as opposed to my 13,150. Many of the COM errors came from password protected workbooks and the sudden death of the RPC server, which I couldn't get to the bottom of. One further point worth noting is that COM introduces a dependency on Windows, which restricts your VM options if you are pushing this analysis up into the cloud. That's a definite next step for me, so whether or not I continue with COM remains to be seen.

  • The technique I used for finding the last used column was:
let cellsColl = sheet.Cells
let firstCell = cellsColl.[1, 1] :?> Range
let mutable lastCell = cellsColl.[1, 1] :?> Range
let mutable lastCol = -1
try
    try
        // search backwards from A1, column by column, so the first hit is the last used column
        lastCell <- cellsColl.Find("*", firstCell, Excel.XlFindLookIn.xlFormulas, Excel.XlLookAt.xlPart,
                                   Excel.XlSearchOrder.xlByColumns, Excel.XlSearchDirection.xlPrevious,
                                   Type.Missing, Type.Missing, Type.Missing)
        // Find returns null on an empty sheet, so test the returned Range (not the int) before reading it
        if not (obj.ReferenceEquals(lastCell, null)) then lastCol <- lastCell.Column
    with
    | ex -> printfn "%s has thrown an error: %s" sheet.Name ex.Message
finally
    // clean up COM references (lastCell, firstCell, cellsColl) with Marshal.ReleaseComObject
    ()
Then change the search order argument to Excel.XlSearchOrder.xlByRows (and read lastCell.Row instead of lastCell.Column) to find the last used row. I have assumed content in all sheets starts in cell A1 rather than attempt to resolve the upper left corner of the used range as well as the bottom right. I think this is a reasonable assumption.

  • Having got the results I then used nodejs to update the JSON records in CouchDB. Once again due to the resource constraints on my machine this operation needed to be batched.
  • Finally I wrote another MapReduce to get the results back out in csv and loaded this content in to R.
  • I split the data in to pre and post Excel 2007 groups then produced the scatter plots for the last row vs the last column. I used the same log scales for both pre and post Excel 2007 to highlight the similarity in the distributions.

In terms of improving the method, it's worth noting that COM is not fast. It also introduces a dependency on Windows and Excel. Also, the programming is just plain ugly. To remove the need for COM, a reliable method for determining whether the file is a binary file or an xml file would need to be developed and then ExcelDataReader could be used. NPOI is also worth further investigation. I will look at these two technologies next before pushing the analysis up in to the cloud.

Please comment if you've got any questions I can help with.

Have fun!


Footnotes:
  1. Marc Fisher II and Gregg Rothermel. The EUSES Spreadsheet Corpus: A shared resource for supporting experimentation with spreadsheet dependability mechanisms. In Proceedings of the 1st Workshop on End-User Software Engineering, pages 47-51, St. Louis, MO, USA, May 2005

  2. Felienne Hermans and Emerson Murphy-Hill. Enron's Spreadsheets and Related Emails: A Dataset and Analysis. 37th International Conference on Software Engineering, ICSE 2015

  3. Titus Barik, Kevin Lubick, Justin Smith, John Slankas, and Emerson Murphy-Hill. "Fuse: A Reproducible, Extendable, Internet-scale Corpus of Spreadsheets." In: Proceedings of the 12th Working Conference on Mining Software Repositories (Data Showcase), Florence, Italy, 2015.

  4. Section V of the FUSE paper highlights a limitation of the Common Crawl which means it can only store binary files no greater than 1MB, so I'm aware that this analysis excludes really big Excel files and that they could conceivably skew the grid usage down and to the right. But the paper goes on to state that such files only make up low single digit percentages of the other available corpora, so my contention is that it wouldn't make a very big difference to this analysis.

  5. 150,544 rooms as at 03/2015 according to the Las Vegas Convention and Visitor Authority

  6. An extreme workbook is defined as one where data is reported either in row 1,048,576 or column 16,384. These are removed from the workbook summary only for the purpose of calculating the blended average cell usage. They remain in the scatter plots and statistical analysis.

Greedy formulas and layout errors, the case for a novice mode in Excel.

What you are looking at is an Excel 2013 worksheet containing a VLOOKUP formula written slightly incorrectly, giving completely the wrong result. What you can also see is Excel giving two indications that something might be wrong. However, if you put "background error check excel" in to Google you will find as many links telling you how to turn it off as how to use it properly [1]. As I'll demonstrate, using it to trap this particular error scenario is not straightforward. If you'd like to follow along, the workbooks are in the 2013 folder in my Github repo. Please star it if it helps you.

I've spent a lot of time over the last several years in big corporates teaching people how to use Excel. In that time I've seen plenty of VLOOKUPs that have been entered like this. What's more, I've heard the use of this particular formula syntax (with the second argument specified as whole columns - this is the 'greedy' in the blog title - and the deliberate absence of the final argument) advocated at a peer-to-peer level, in other words, colleagues learning from their colleagues.

In the rest of this post I'll examine what's going wrong, how Excel is trying to tell us and where that explanation requires some intuitive leaps which are most likely beyond the understanding of novice spreadsheet users.


The following image is the function wizard dialogue for the particular formula in question and I draw your attention to the elements I've labelled 1 and 2.

[Image: Function Wizard]

  1. The obliquely named Table_array is the range in the leftmost column of which you will attempt to match your Lookup_value. Now this is a legitimate range (it's all of columns B, C, D and E), but in the case of this workbook there is data further down in those columns, out of sight and impacting the formula result (this is the layout error in the blog title; there are 1,048,576 rows after all). The justification I've heard for using the range B:E rather than $B$3:$E$6 is that it is quicker to select and that, using that syntax, you won't miss any data or have to update your formula should more rows be added to the Table_array. Well, given a preference for use of the mouse over the keyboard, not uncommon in novice users, I agree it probably is quicker to select. And you certainly won't miss any data, but the danger is you will include data that you didn't intend to include. Finally, there are better ways to have a formula that will automatically update when new rows are added; Tables come to mind immediately.

  2. The even more obliquely named Range_lookup is displayed in standard font (as opposed to bold), indicating that it is optional. It hasn't been entered and the behaviour is as expected [2]: an approximate match. Now the justification I've heard for this is, circularly: 'it's optional, you don't need to enter it'. I don't want to examine approximate matching on text values; there are valid cases but they are edge cases. I use and teach the heuristic that when matching text values this final argument should always be specified as FALSE, indicating an exact match is required. That's the case in the example under discussion here.

So, in summary, the final argument is wrong for matching on text values and the second argument should be more parsimoniously stated. But what does the error checking say?

Quite honestly, that doesn't seem like a big deal to me. I've also run the INQUIRE workbook analysis on this book and in the report the following information is presented. I've drawn out the salient points for you because frankly, it isn't obvious.
Once again we're referencing blank cells. Once again, no big deal. We also have a formula referencing text cells. Well, that's what we wanted - we're matching on the widget name after all - so, no big deal. Granted, 384 blank referenced cells jumps off the page a bit, but that's just stating the first point the other way round, so, no big deal. But the error is egregious! We're reporting the revenue for the full year 2014 sales of Widget A instead of the part year 2015 sales. Where is the warning:

Danger, Will Robinson!

Instead it's a meeker:

Danger? Will Robinson.

Taking the text 'error' first and drilling in to the appropriate sheet [Text Cell Refs] on the analysis workbook, we should interpret it like this: the formula is a lookup and it is working on cells containing text. We have to link that in our mental Excel heuristics dictionary to the rule of thumb: matching on text values must be exact. Having made that association, we know we must make a correction to the formula at least as follows, from =VLOOKUP(I3,B:E,4) to =VLOOKUP(I3,B:E,4,FALSE). Now, this will be enough to fix the error but let's continue.

Taking the two blank 'errors' next and drilling in to the appropriate sheets [Blank Cell Refs] and [Blank Referenced Cells] on the analysis workbook, we should interpret it like this: the formula is a lookup and it is looking up in a range that contains blank cells. Now, in and of itself that is not a problem for a lookup (performance notwithstanding). In fact this analysis is a very indirect way of zeroing in on the real issue. But at this point we should consult our mental Excel heuristics dictionary for the rule of thumb: parsimony over greed when consuming arguments in a function. If we're not swayed by this we should, at a minimum, traverse the entirety of the left-most column (B) [3] of the Table_array (that's the column where the Lookup_value is being matched) and in so doing we would discover the data for 2014. We could then move the 2014 data somewhere else, which would fix the error, in which case we could leave the formula unchanged and ignore or dismiss the error. That's why I stated earlier that this analysis is only an indirect way of identifying the issue. Instead, however, we will make a correction to the formula at least as follows, from =VLOOKUP(I3,B:E,4) to =VLOOKUP(I3,B3:E6,4). That will also be enough to fix the error.

Finally we should make both corrections so the formula becomes =VLOOKUP(I3,B3:E6,4,FALSE) or better yet =VLOOKUP(I3,$B$3:$E$6,4,FALSE). We should also move the 2014 data, probably on to its own tab.


I think you'll agree that none of that was particularly obvious. That's because that use of the formula is not wrong per se; it's only ill-advised. Therefore it's difficult to catch. Were there a syntax error, Excel might catch it as a #REF! or it might pop up a warning and prevent the formula from being calculated. But it can't tell you this is outright wrong because it's not. But it is!

In order to address this, in my opinion, Excel should have a novice mode. I will expand on this in future blog posts but for the time being, that novice mode should have the following functionality:

  • forbid optional arguments in VLOOKUP (I'll look in to whether this could be extended to all built-in functions or a popular subset)
  • forbid all column-only range references in all functions (in other words references like B:E); a rough sketch of both checks follows this list
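
To make those two rules a little more concrete, here is a hypothetical F# sketch of the kind of checks a novice mode might run over a formula string before accepting it. Nothing here is an Excel API - the function and the regular expressions are mine, and the VLOOKUP check is deliberately naive (it ignores nested functions) - it simply illustrates the two rules above:

open System.Text.RegularExpressions

// Hypothetical, naive checks - not an Excel feature.
let noviceModeWarnings (formula: string) =
    [ // rule: flag whole-column references such as B:E or $B:$E
      if Regex.IsMatch(formula, @"\$?[A-Z]{1,3}:\$?[A-Z]{1,3}") then
          yield "whole-column reference: state the Table_array parsimoniously, e.g. $B$3:$E$6"
      // rule: flag a VLOOKUP called with fewer than four arguments (nested functions not handled)
      for m in Regex.Matches(formula, @"VLOOKUP\(([^()]*)\)") |> Seq.cast<Match> do
          if m.Groups.[1].Value.Split(',').Length < 4 then
              yield "VLOOKUP without Range_lookup: add FALSE to force an exact match" ]

noviceModeWarnings "=VLOOKUP(I3,B:E,4)"             // returns both warnings
noviceModeWarnings "=VLOOKUP(I3,$B$3:$E$6,4,FALSE)" // returns an empty list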

I know I'm not the only one with a view on the Excel flexibility vs control debate. There are parties trying to address this by bringing software development best practice to spreadsheet development with unit/ integration testing and peer review among other things. There are companies selling rules based automated monitoring. There is more training available than you can possibly imagine. And there's also error checking in the box. But when a novice opens Book1.xlsx and enters some data and a formula, that's spreadsheet use, not development. And there's currently nothing to stop them doing it wrong. I'd very much appreciate any comments you would like to share on the idea of a novice mode. In the near future I'll write some more posts further examining this.

Have fun!


Footnotes:

  1. I'm aware how unscientific this statement is given the tailored search results from Google; nevertheless the point that there are a lot of links advising how to disable error checking is valid. Try it for yourself and see what you get.

  2. As an aside I think there is a case to be made for an apparent inconsistency in not entering the final argument. Spot the subtle difference in the formulas below but the big difference in the formula result. I'll follow up on this in a future post.

  3. Use the key combination CTRL + DOWN ARROW to quickly skip down the column from data region to data region.

Hosting a Ghost blog on Github for free!

What you are looking at is a blog post written on the Ghost blogging application then turned into static html and pushed in to a Github repo.

Sounds like a lot of hassle, why bother? Well, because it will save you money.

In order to edit the blog I run Ghost v0.6.4 locally in node v0.10.9 on my Windows machine. On changing content or theme, or adding or editing a post, I use Buster to capture the dynamic content from the running Ghost instance as interim static html pages. I then run some post-processing steps with Gulp to produce the final version of the static html and assets (css, js, images etc.). Finally I push those contents up to my GitHub repo.

I've taken the additional step of having my own custom url so I paid for the registration (USD14) and I pay monthly for the DNS using DNSimple which is USD0.80 on a maxed out (10 domains for USD8 per month) legacy Silver plan. So it's not quite free but 80 cents a month running cost for a blog takes some beating!

If you are going to take this route to host your blog then the following two links are invaluable and Alexander Kahoun is a legend.

  1. Using GitHub Pages with Ghost and Buster on Windows (part 1)
  2. Using GitHub Pages with Ghost and Buster on Windows (part 2)

My own modest contribution to the process can be found in the README on my ghost-builder repo which you are welcome to have. Please star it if it helps you.

It's further worth understanding the limitations of Github Pages for hosting and why it's necessary to take all these steps. Fundamentally, Github Pages will only serve static content. That means only exactly the files hosted on the server. It will not serve content from an app like Ghost. That's why you aren't hosting Ghost on Github but rather the output from the Ghost application running on your local machine. You are using Ghost because it is a beautiful way to blog and it is fully featured running locally. But you could conceivably use another blogging platform and do the same sort of process to get it to emit static content which you then push in to your Github repo to display using Github Pages. Phew! Get it? If not, put a comment below and I'll try to help you out.

Have fun!
