Stephen A. Fuqua (saf)

a Bahá'í, software engineer, and nature lover in Austin, Texas, USA

It looks like a beautiful morning in Austin, Texas, from the comfort of my feeder-facing position on the couch. Later I will get out and enjoy it on my afternoon walk with All Things Considered. As I write these lines a bully has been at work: a Yellow-rumped Warbler (Myrtle) has been chasing the other birds away. Thankfully this greedy marauder was absent for most of the morning, as I read portions of Dr. J. Drew Lanham’s The Home Place: Memoirs of a Colored Man’s Love Affair with Nature.

Lanham, who also penned the harrowing-yet-humorous 9 Rules for the Black Birdwatcher, shares a compelling and beautifully-written story of family and place — at least, those are the key themes of the first third of the book that I’ve read thus far. Appropriate to this day of reflection and remembrance for one of our great American heroes, Dr. Martin Luther King, Jr., it is a story of the forces and people who shaped this scientist, a Black man from the South who learned to love nature from the first-hand experiences of playing, watching, listening, chopping, and hoeing on the edge of the South Carolina piedmont.

Understanding that one man’s experience, views, and insights can never encapsulate those of an entire amorphous people, it is nevertheless critical that we all spend time getting to know and understand the forces that shape our myriad cultures and the people who emerge from them. As we become more familiar with “others,” “they” become “we” and “we” become self-aware. Becoming self-aware, we recognize the truth of Dr. King’s famous saying:

“We are caught in an inescapable network of mutuality, tied in a single garment of destiny. Whatever affects one directly, affects all indirectly.”

Being aware of our mutuality, believing in it deeply, we can make better choices about how to live well with and for everyone on this planet, both those alive today and those yet to be born.


A passage of beautiful prose from pages 1-2 of The Home Place, to give you a taste of what is in store. After describing his ethno-racial heritage — primarily African American with an admixture of European, American Indian, Asian, “and Neanderthal tossed in” — he remarks,

“But that’s only a part of the whole: There is also the red of miry clay, plowed up and planted to pass a legacy forward. There is the brown of spring floods rushing over a Savannah River shoal. There is the gold of ripening tobacco drying in the heat of summer’s last breath. There are endless rows of cotton’s cloudy white. My plumage is a kaleidoscopic rainbow of an eternal hope and the deepest blue of despair and darkness. All of these hues are me; I am, in the deepest sense, colored.”


Birds seen at the “backyard” feeder this morning while reading. Photos are a few weeks old, but all of these species were observed today. © Tania Homayoun, some rights reserved under a Creative Commons BY-NC-ND license:

Black-crested Titmouse


Carolina Wren


Hermit Thrush


Orange-crowned Warbler


Ruby-crowned Kinglet


Yellow-rumped Warbler


Also seen: Northern Cardinal, American Robin, Bewick’s Wren.

Are algorithms doomed to be racist and harmful, or is there a legitimate role for them in a just and equitable society?

Algorithms have been causing disproportionate harm to low- and middle-income individuals, especially people of color, since long before this current age of machine learning and artificial intelligence. Two cases in point: neighborhood redlining and credit scores. While residential redlining was a deliberately racist anti-black practice [1], FICO-based credit scoring does not appear to have been created from a racist motive. By amplifying and codifying existing inequities, however, the credit score can easily become another tool for racial oppression [2].

Still, with appropriate measures in place, and a bit of pragmatic optimism, perhaps we can find ways to achieve the scalability/impartiality goals of algorithms while upholding true equity and justice.

equality, equity, justice graphic
Justice: changing conditions, removing the barriers. Could not find the original source to credit, so I drew my own version of this thought-provoking graphic. I leave the sport being played behind the fence up to your imagination.


Fresh out of college I served as an AmeriCorps*VISTA at a non-profit dedicated to supporting small business development for women and minorities. There I began learning about the detrimental effects, deliberate and insidious, of so many modern policies around finance and housing. Later, when I became a full-time employee, I was given a mission: come up with a rubric, an algorithm, for pre-qualifying loan applicants. The organization only had so much money to lend, and to remain solvent it would need to ensure that most loans would be repaid in full. Could we come up with a scoring mechanism that would help focus our attention on the most promising opportunities, bring a measure of objectivity and accountability, and yet remain true to our mission?

The organization was founded and, at that time, run by Jeannette Peten, an African American woman with a background in business finance and a passion for helping small businesses to succeed. Where credit scores attempt to judge credit worthiness through a complex calculation based on repayment histories, she asked me to take a broader approach that was dubbed the Four C’s of lending: Cash Flow, Character, Credit, and Collateral. Thus: what manner of calculation, utilizing these four concepts, would yield a useful prediction of a potential borrower’s capacity and capability to thrive and repay the loan?

Roughly following a knowledge engineering [3] approach, we brainstormed simple scoring systems for each of the C’s, with Character counting disproportionately relative to the others. To avoid snap judgment and bias, Character especially had to be treated through careful inference rather than subjective opinion, and thus was drawn from multiple sources including public records, references, site visits, business training and experience, and more.

Then I applied the scores to existing borrowers for validation: would the successful borrowers have made the grade? No? Tweak the system and try again. And again. And when a handful of great businesses in the target demographic were still on the borderline, my mentor identified additional “bonus points” that could be given for high community impact. I do not recall any formal measurement of model fitness / goodness beyond the simple question: does this model include more of our pool of successful loan applicants than all other models? Admittedly this was an eyeball test, not a rigorous and statistically valid experiment.
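The kind of rubric described above can be sketched as a weighted sum with bonus points. Everything here (the weights, the 0-10 sub-scores, the 62-point threshold) is an illustrative assumption, not the organization's actual formula:

```typescript
// Hypothetical weighted rubric for the Four C's. All weights, scales,
// and thresholds are invented for illustration only.
interface Applicant {
  cashFlow: number;   // each sub-score on an assumed 0-10 scale
  character: number;
  credit: number;
  collateral: number;
  communityImpact: boolean;
}

// Character counts disproportionately relative to the other C's.
const weights = { cashFlow: 2, character: 4, credit: 2, collateral: 2 };
const bonus = 3;       // "bonus points" for high community impact
const threshold = 62;  // minimum score to receive further review

function score(a: Applicant): number {
  let total =
    a.cashFlow * weights.cashFlow +
    a.character * weights.character +
    a.credit * weights.credit +
    a.collateral * weights.collateral;
  if (a.communityImpact) {
    total += bonus;
  }
  return total;
}

function prequalify(a: Applicant): boolean {
  return score(a) >= threshold;
}
```

"Tweaking the system" then amounts to adjusting the weights and threshold and re-checking whether past successful borrowers still pass; the bonus points are what pull a borderline, high-impact business over the line.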

create model, test, tweak, validate, evaluate

Machine learning is the automated process of creating models, testing them against a sample, and seeing which yields the best predictions. Then (if you are doing it right) cross-validating [4] the result against a held-out sample to make sure the model did not over-fit the training data. In a simplistic fashion, I was following the historical antecedent of machine learning: perhaps we can call it Human Learning (HL). As a Human Learning exercise, I was able to twiddle the knobs on the formula, adjusting in a manner easily explained and easily defended to another human being. Additionally, as an engineer whose goal was justice, rather than blind equality, it was a simple matter to ensure that the training set represented a broad array of borrowers who fell into the target demographic.
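The create-test-tweak loop with a held-out sample can be sketched in a few lines. The data types, candidate models, and accuracy metric below are toy assumptions, meant only to show the shape of the process:

```typescript
// Sketch of the create-model / test / validate loop described above.
type Example = { features: number[]; label: boolean };
type Model = (features: number[]) => boolean;

// Fraction of examples the model classifies correctly.
function accuracy(model: Model, data: Example[]): number {
  const correct = data.filter(d => model(d.features) === d.label).length;
  return correct / data.length;
}

// Split once into training data and a held-out validation set.
function split(data: Example[], holdOutFraction: number): [Example[], Example[]] {
  const cut = Math.floor(data.length * (1 - holdOutFraction));
  return [data.slice(0, cut), data.slice(cut)];
}

function selectModel(candidates: Model[], data: Example[]): Model {
  const [train, holdOut] = split(data, 0.25);
  // "Create model, test, tweak": keep the candidate that best fits
  // the training data...
  const best = candidates.reduce((a, b) =>
    accuracy(a, train) >= accuracy(b, train) ? a : b);
  // ...then check it against the held-out sample to detect over-fitting.
  console.log(`held-out accuracy: ${accuracy(best, holdOut)}`);
  return best;
}
```

In the Human Learning version, the "candidates" were hand-tweaked variations of one formula and the validation set was the eyeball test against known successful borrowers, but the loop is the same.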

In the end, the resulting algorithm did not make the lending decisions for the organization, and it required human assessment to pull together the various criteria and assign individual scores. What it did accomplish was to help us winnow through the large number of applicants, identifying those who would receive greater scrutiny and human attention.

Nearly twenty years ago, we had neither the foresight nor the resources to perform a long-range study evaluating the true effectiveness. Nevertheless, it taught this software engineer to work harder: don’t use the easy formula, make sure the baseline data are meaningful and valid for the problem, listen to domain experts, and most of all treat equity and justice as key features to be baked in rather than bolted on.

blurred image of the scoring spreadsheet


Algorithms are increasingly machine-automated and increasingly impacting our lives, all too often for the worse. The MIT Technology Review summarizes the current state of affairs thus:

“Algorithms now decide which children enter foster care, which patients receive medical care, which families get access to stable housing. Those of us with means can pass our lives unaware of any of this. But for low-income individuals, the rapid growth and adoption of automated decision-making systems has created a hidden web of interlocking traps.”[5]

On our current path, “color-blind” machine learning will continue tightening these nets that entrap those “without means.” But it does not have to be that way. With forethought, care, and a bit of the human touch, I believe we can work our way out of this mess to the benefit of all people. But it’s gonna take a lot of work.


Resources

  1. The History of Redlining (ThoughtCo), Redlining in America: How a history of housing discrimination endures (Thomson Reuters Foundation). Potential modern version: Redlined by Algorithm (Dissent Magazine), Modern-day redlining: How banks block people of color from homeownership (Chicago Tribune).
  2. Credit scores in America perpetuate racial injustice. Here’s how (The Guardian). Counterpoint from FICO: Do credit scores have a disparate impact on racial minorities?. Insurance is another arena where “color-blind” algorithms can cause real harm: Supposedly ‘Fair’ Algorithms Can Perpetuate Discrimination (Wired).
  3. Knowledge Engineering (ScienceDirect).
  4. What is Cross Validation in Machine learning? Types of Cross Validation (Great Learning Blog).
  5. The coming war on the hidden algorithms that trap people in poverty (MIT Technology Review).

Advances in the availability and breadth of data over the past few decades have enabled the rapid and unregulated deployment of statistical algorithms that aim to predict and thereby influence the course of human behavior. Most are designed to promote the corporate bottom line, not the welfare of the people. Those that aim to promote the common good run the danger of straying into authoritarian suppression of freedoms. Regardless of intention, these algorithms often reinforce existing social inequities or present a double-edged sword, with potential for positive use weighed against potential for misuse.

Coded Bias film poster

The films Coded Bias (now in virtual theaters) and The Social Dilemma (Netflix) probe these issues in detail through powerful documentary filmmaking and storytelling. Where The Social Dilemma focuses on the dangers of corporate and extremist manipulation through social media, Coded Bias reveals the biases inherent more broadly in “artificial intelligence” (AI) / machine learning (ML) systems. If you must choose just one, I would watch Coded Bias, both for its incisive reveal of injustices large and small and for its inspiring depiction of those working to bring these injustices to light.

Several well-regarded books explore these topics; indeed, some of the authors are among those featured in these films. While I have yet to read the first three, they seem worth mentioning:

In Race After Technology (2019), Ruha Benjamin pulls the strands of algorithmic injustice together in a broader critique of technology’s impact on race, describing what she calls the New Jim Code: “The employment of new technologies that reflect and reproduce existing inequities but that are promoted and perceived as more objective or progressive than the discriminatory systems of a previous era.” (p10)

The New Jim Code thesis is a powerful critique of technology that simultaneously fails to see people of color (facial recognition, motion detection) and pins them in a spotlight of law enforcement surveillance and tracking. By explicit extension it is also a critique of the societies that tolerate, sponsor, and exploit such technologies, even while it acknowledges that many of the problems emerge from negligence rather than intention. From the outset, Benjamin gives us a useful prescription for working our way out of this mess, exhorting us to “move slower and empower people.” (p16).

After detailing many manifestations of technological inequality and injustice, she urges technologists like me to “optimize for justice and equity” as we “come to terms with the fact that all data are necessarily partial and potentially biased” (p126). The book concludes with further explorations on how justice might (and might not) be achieved while re-imagining technology.

Benjamin’s book was also my introduction to the Algorithmic Justice League, an advocacy organization that “combine(s) art and research to illuminate the implications and harms of AI.” The AJL is featured prominently in Coded Bias, and their website provides many resources for exploring this topic in more detail.

These works send a clear message that data and the algorithms that exploit them are causing and will continue to cause harm unless reined in and reconceptualized. This is the techno-bureaucratic side of #BlackLivesMatter, calling for #InclusiveTech and #ResponsibleAI. As with so many other issues of equity and justice, we all need to stand up, take notice, and act.

  1. Think twice before choosing to use AI/ML. Then get an outside review.
  2. Vet your data sets carefully and clearly describe their origin and use for posterity and review.
  3. Ask yourself: what are the potential impacts of the technology I am developing on historically oppressed groups?
  4. Cross-validate your assumptions, data, and results with a broad and representative audience.
  5. Keep listening. Keep learning.

Slack - choosing skin tone
Something positive: choosing an emoji skin tone

Last month my manager asked me about changing our naming convention for the primary “source of truth” in source code management: from “master” to… well, anything but “master.” I admit to initial hesitancy. I needed to think about it. After all, it seems like the name derives from the multimedia concept of a “master copy.” It’s not like the terribly-named “master-slave” software and hardware patterns. Or is it?


From 1996 to 2001 I spent nearly countless hours in two buildings on the University of Texas campus: RLM Hall and an un-named annex to the Engineering Science Building. Soon neither will exist: the one renamed, the other demolished. Reflecting on this I feel a small sense of empathy, but no sympathy, for others whose cherished institutions are being renamed. It was well past time to change the one’s name, and the other had outlived its usefulness.

This business of identifying that which needs to change, and then quickly acting on it, has gathered incredible momentum at last in 2020, as the people of the United States grapple with the double pandemic of a ruthless virus and endemic racism. Collectively we have barely moved the needle on either front: but there is movement.

Symbols must be re-evaluated and removed when they are found wanting, whether they are statues or names. Robert Lee Moore Hall honored a man who operated at the pinnacle of his profession and yet was, apparently, an outright segregationist. Not that any of us knew that. As an undergraduate and graduate student in Physics at UT, 52% of my classes were held in that building. I studied in its library and in the undergraduate physics lounge. I split time working part and full time between RLM and that unnamed building, in the High Energy Physics Lab. I remember more misery than joy there, and mostly extreme stress. There is no love lost for that frankly odious brick hulk or its even more odious name, yet there is a feeling of losing something personal with the change of name that was finally accepted by the University a month ago.

And that just goes to show the power of a name, of a symbol. All the more reason to change it. Time for an attitude adjustment.

The name has been found wanting and it must go. Just like that other little building, whose utility in housing twin three-story particle accelerators had long run out. It made way for a new building, better serving the needs of the students. And the Physics, Math, and Astronomy Building now takes its place on campus as, I hope, a more welcoming place for diverse groups of students, faculty, and staff to continue advancing the boundaries of science.


And that’s exactly what we need in software development: a welcoming place. Detaching from the name “RLM” was quite easy. But I had to think through the source code problem for a minute or two, rather than just relying on GitHub’s judgment. My conclusion: if it bothers someone, then do something about it. And then I found one person who acknowledged: yes, it is disturbing being a Black programmer and confronting this loaded word on a regular basis (sadly I didn’t hang onto the URL and can’t find the blog post right now). OK, time to change.

I started with the code repository backing this blog. Took me all of… perhaps a minute to create a main branch from the old master, change the default branch to main, and delete master. If I had been working with a code base associated with a continuous integration environment it might have been a few more minutes, but even then it is so easy, as I have already found with the first few conversions at work. So much easier than having to print new business cards and letterhead for all the faculty in the Physics, Math, and Astronomy Building, assuming they still use such things.

A simple attitude adjustment is all it took: no sympathy for that which is lost, for the way we’ve always done things. Instead, a quick and painless removal of a useless reminder of a cruel past.


Steps taken to change this blog’s source code repository:

  1. Create the main branch
    Screenshot showing creation of main branch
  2. Switch the default from master to main
    Screenshot showing change of default branch
  3. Change the branch used by GitHub Pages
    Screenshot showing change to the GitHub Pages branch
  4. Finally, delete the old branch
    Screenshot showing deletion of old branch

This summer, one of the development teams at the Ed-Fi Alliance has been hard at work building Project Buzz: “When school shutdowns went into effect across the country as a result of COVID-19, much of the information teachers need to support students in the new online-school model had become fragmented across multiple surveys and the Student Information System.” (Fix-It-Fridays Delivers Project Buzz, A Mobile App to Help Teachers Prepare for Back-to-School).

As project architect, my role has been one of support for the development team, guiding technology toolkit choices and supporting downstream build and deployment operations. The team agreed to develop the applications in TypeScript on both the front- and back-ends. My next challenge: rapidly create TeamCity build configurations for all components using Kotlin code.

Components

At this time, there are four components to the software stack: database, API, GUI, and ETL. The project is available under the Apache License, version 2, on GitHub. The build configurations for these four are generally very similar, although there are some key differences. This gave me a great opportunity to explore the power of creating base classes in TeamCity for sharing baseline settings among several templates and build configurations.

Requirements

  1. Minimize duplication
  2. Drive configurations through scripts that also operate at the command line, so that developers can easily execute the same steps as TeamCity.
  3. The above item implies use of script tasks. When those scripts emit an error message, that message should trigger the entire build to fail.
  4. All build configurations should check for sufficient disk space before running.
  5. All build configurations should use the same Swabra settings.
  6. All build configurations will need access to the VCS root, and the Kotlin files will be in the same repository as the rest of the source code.
  7. All projects will need build steps for pull requests and for the default branch.
    • Pull requests should run build and test activities
    • Default branch should run build, test, and package activities, and then trigger deployment.
  8. Both branch and pull request triggers should operate only when the given component is modified. For example, a pull request for the database project should not trigger the build configurations for the API, GUI, or ETL components.
  9. Pull requests should publish information back to GitHub so that the reviewer will know the status of the build operation.

Classes

Class diagram

BuildBaseClass

The most general settings are applied in class BuildBaseClass, covering requirements 3, 4, 5, 6, and the commonalities in the two branches of requirement 7.

Structure of BuildBaseClass

Note that only the required imports are present. Kotlin classes are final by default; the open keyword in the signature is what allows other classes to inherit from this one.

package _self.templates

import jetbrains.buildServer.configs.kotlin.v2019_2.*
import jetbrains.buildServer.configs.kotlin.v2019_2.buildFeatures.freeDiskSpace
import jetbrains.buildServer.configs.kotlin.v2019_2.buildFeatures.swabra
import jetbrains.buildServer.configs.kotlin.v2019_2.buildSteps.powerShell

open class BuildBaseClass : Template({
    // contents are split up and discussed below
})

Requirement 3: Fail on Error Message

It took me a surprisingly long time to discover this. PowerShell build steps in TeamCity behave a little differently than one might expect. You can set them to format StdErr as an error message, and it is natural to assume an error message will cause the build to fail. Not true. This setting helps, but as will be seen below, is not actually sufficient.

open class BuildBaseClass : Template({
    // ...

    option("shouldFailBuildOnAnyErrorMessage", "true")

    // ...
})

Requirements 4 and 5: Free Disk Space and Swabra

Apply two build features: check for minimum available disk space, and use the Swabra build cleaner.

open class BuildBaseClass : Template({
    // ...

    features {
        freeDiskSpace {
            id = "jetbrains.agent.free.space"
            requiredSpace = "%build.feature.freeDiskSpace%"
            failBuild = true
        }
        // Default setting is to clean before next build
        swabra {
        }
    }

    // ...
})

Requirement 6: VCS Root

Use the special VCS root object, DslContext.settingsRoot. Checkout rules are applied via parameter so that each component’s build type will be able to specify a rule for checking out only that component’s directory, thus preventing triggering on updates to other components.

open class BuildBaseClass : Template({
    // ...

    vcs {
        root(DslContext.settingsRoot, "%vcs.checkout.rules%")
    }

    // ...
})

Requirement 7: Shared Build Steps

The database project, which deploys tables into a PostgreSQL database, does not have any tests. Therefore this base class contains only the following build steps, without a testing step:

  1. Install and Use Correct Version of Node.js
  2. Install Packages
  3. Build

That first step supports TeamCity agents that need to use different versions of Node.js for different projects, using nvm for Windows. The second executes yarn install and the third executes yarn build. Because the TeamCity build agents are on Windows, all steps are executed using PowerShell.

open class BuildBaseClass : Template({
    // ...

    steps {
        powerShell {
            name = "Install and Use Correct Version of Node.js"
            formatStderrAsError = true
            scriptMode = script {
                content = """
                    nvm install %node.version%
                    nvm use %node.version%
                    Start-Sleep -Seconds 1
                """.trimIndent()
            }
        }
        powerShell {
            name = "Install Packages"
            workingDir = "%project.directory%"
            formatStderrAsError = true
            scriptMode = script {
                content = """
                    yarn install
                """.trimIndent()
            }
        }
        powerShell {
            name = "Build"
            workingDir = "%project.directory%"
            formatStderrAsError = true
            scriptMode = script {
                content = """
                    yarn build
                """.trimIndent()
            }
        }
    }

    // ...
})

BuildOnlyPullRequestTemplate

Structure of BuildOnlyPullRequestTemplate

Once again, the structure below contains only the required imports for this class. Carefully note the brace style: in the base class, the “contents” were all inside braces, as a lambda passed to the Template constructor. In this concrete object, the “contents” sit inside an init block in the object body, after the call to the BuildBaseClass constructor. You can learn more about this in the Kotlin Classes and Inheritance documentation.

This class inherits directly from BuildBaseClass and does not need to apply any additional build steps.

package _self.templates

import jetbrains.buildServer.configs.kotlin.v2019_2.*
import jetbrains.buildServer.configs.kotlin.v2019_2.buildFeatures.commitStatusPublisher
import jetbrains.buildServer.configs.kotlin.v2019_2.buildFeatures.PullRequests
import jetbrains.buildServer.configs.kotlin.v2019_2.buildFeatures.pullRequests
import jetbrains.buildServer.configs.kotlin.v2019_2.triggers.VcsTrigger

object BuildOnlyPullRequestTemplate : BuildBaseClass() {
    init {
        name = "Build Only Pull Request Node.js Template"
        id = RelativeId("BuildOnlyPullRequestTemplate")

        // Remainder of the contents are split up and discussed below
    }
}

Requirement 8: Pull Request Triggering

Here I am attempting to use the Pull Request build feature. I have had trouble getting it to work as advertised. This configuration needs further tweaking to ensure that only repository members’ pull requests automatically trigger a build (I do not want random people submitting random code in a pull request, which might execute malicious statements on my TeamCity agent). I need to try changing that branch filter to +:pull/*.

object BuildOnlyPullRequestTemplate : BuildBaseClass() {
    init {

        // ...

        triggers {
            vcs {
                id ="vcsTrigger"
                quietPeriodMode = VcsTrigger.QuietPeriodMode.USE_CUSTOM
                quietPeriod = 120
                // This allows triggering on "anything" and then removes
                // triggering on the default branch and in feature branches,
                // thus leaving only the pull requests.
                branchFilter = """
                    +:*
                    -:<default>
                    -:refs/heads/*
                """.trimIndent()
            }
        }
        features {
            pullRequests {
                vcsRootExtId = "${DslContext.settingsRoot.id}"
                provider = github {
                    authType = token {
                        token = "%github.accessToken%"
                    }
                    filterTargetBranch = "+:<default>"
                    filterAuthorRole = PullRequests.GitHubRoleFilter.MEMBER_OR_COLLABORATOR
                }
            }
        }

        // ...

    }
}

Requirement 9: Publishing Build Status

This uses the Commit Status Publisher. Note that the authType is personalToken here, whereas it was just token above. I have no idea why this is different ¯\_(ツ)_/¯.

object BuildOnlyPullRequestTemplate : BuildBaseClass() {
    init {

        // ...

        features {
            commitStatusPublisher {
                publisher = github {
                    githubUrl = "https://api.github.com"
                    authType = personalToken {
                        token = "%github.accessToken%"
                    }
                }
            }
        }

        // ...

    }
}

PullRequestTemplate

Unlike the class described above, this one needs to run automated tests. Unfortunately, it demonstrates my (current) inability to avoid some degree of duplication. Perhaps in a future iteration I’ll rethink the inheritance tree and find a solution. For now, it duplicates features shown above, with the only difference being the base class: it inherits from BuildAndTestBaseClass, shown next, instead of BuildBaseClass.

BuildAndTestBaseClass

This simple class inherits from BuildBaseClass and adds two steps: run tests using the yarn test:ci command and run quality inspections using command yarn lint:ci.

package _self.templates

import jetbrains.buildServer.configs.kotlin.v2019_2.*
import jetbrains.buildServer.configs.kotlin.v2019_2.buildFeatures.freeDiskSpace
import jetbrains.buildServer.configs.kotlin.v2019_2.buildFeatures.swabra
import jetbrains.buildServer.configs.kotlin.v2019_2.buildSteps.powerShell

open class BuildAndTestBaseClass : BuildBaseClass() {
    init {
        steps {
            powerShell {
                name = "Test"
                workingDir = "%project.directory%"
                formatStderrAsError = true
                scriptMode = script {
                    content = """
                        yarn test:ci
                    """.trimIndent()
                }
            }
            powerShell {
                name = "Style Check"
                workingDir = "%project.directory%"
                formatStderrAsError = true
                scriptMode = script {
                    content = """
                        yarn lint:ci
                    """.trimIndent()
                }
            }
        }
    }
}

BuildAndTestTemplate

Based on BuildAndTestBaseClass, this class adds a build step for packaging, an artifact rule, and a trigger. Although these are TypeScript packages, the build process uses NuGet packaging in order to take advantage of other tools (NuGet package feed, Octopus Deploy). The packaging step is orchestrated with a PowerShell script. The configuration can be used for any branch, but it is only triggered by the default branch.

package _self.templates

import jetbrains.buildServer.configs.kotlin.v2019_2.*
import jetbrains.buildServer.configs.kotlin.v2019_2.buildFeatures.freeDiskSpace
import jetbrains.buildServer.configs.kotlin.v2019_2.buildSteps.powerShell
import jetbrains.buildServer.configs.kotlin.v2019_2.triggers.VcsTrigger
import jetbrains.buildServer.configs.kotlin.v2019_2.triggers.vcs

object BuildAndTestTemplate : BuildAndTestBaseClass() {
    init {
        name = "Build and Test Node.js Template"
        id = RelativeId("BuildAndTestTemplate")

        artifactRules = "+:%project.directory%/eng/*.nupkg"

        steps {
            // Additional packaging step to augment the template build
            powerShell {
                name = "Package"
                workingDir = "%project.directory%/eng"
                formatStderrAsError = true
                scriptMode = script {
                    content = """
                        .\build-package.ps1 -BuildCounter %build.counter%
                    """.trimIndent()
                }
            }
        }

        triggers {
            vcs {
                id ="vcsTrigger"
                quietPeriodMode = VcsTrigger.QuietPeriodMode.USE_CUSTOM
                quietPeriod = 120
                branchFilter = "+:<default>"
            }
        }
    }
}

Component-Specific Projects

Bringing this all together, each component is a stand-alone project containing at least two build types: Branch and Pull Request. These respectively utilize the appropriate template. The parameters are defined on the sub-project, making the build types extremely small:

BranchAPIBuild

package api.buildTypes

import jetbrains.buildServer.configs.kotlin.v2019_2.*

object BranchAPIBuild : BuildType ({
    name = "Branch Build and Test"
    templates(_self.templates.BuildAndTestTemplate)

})

PullRequestAPIBuild

package api.buildTypes

import jetbrains.buildServer.configs.kotlin.v2019_2.*

object PullRequestAPIBuild : BuildType ({
    name = "Pull Request Build and Test"
    templates(_self.templates.PullRequestTemplate)
})

API Project

Of the parameters shown below, only project.directory and vcs.checkout.rules will be familiar from the text above. The Octopus parameters are used in an additional Octopus Deploy build configuration, which is not material to the current demonstration.

package api

import jetbrains.buildServer.configs.kotlin.v2019_2.*

object APIProject : Project({
    id("Buzz_API")
    name = "API"
    description = "Buzz API"

    buildType(api.buildTypes.PullRequestAPIBuild)
    buildType(api.buildTypes.BranchAPIBuild)
    buildType(api.buildTypes.DeployAPIBuild)

    params {
        param("project.directory", "./EdFi.Buzz.Api")
        param("octopus.release.version","<placeholder value>")
        param("octopus.release.project", "Buzz API")
        param("octopus.project.id", "Projects-111")
        param("vcs.checkout.rules","""
            +:.teamcity => .teamcity
            +:%project.directory% => %project.directory%
        """.trimIndent())
    }
})

Summary

TeamCity templates developed in Kotlin greatly reduce code duplication and ensure that certain important features are used by all build configurations. Unfortunately, they did not eliminate duplication entirely. Through class inheritance, merged-branch and pull request build configurations are able to share common settings; however, parallel templates with some duplication were still required.

In the future, perhaps I’ll explore handling this through an alternative approach using feature wrappers instead of, or in addition to, templates. My initial impression of these wrapper functions is that they obscure a build type’s behavior: in the examples above, a Template class reveals its base class, signaling immediately that there is more to the Template. In the feature wrapper approach, one only discovers the additional functionality when reading the project file. It will be interesting one day to see if the two approaches can be combined, moving the wrapper inside the template or base class, instead of being applied to it externally.
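To make the contrast concrete, here is a self-contained sketch of the two styles. The classes below are simplified stand-ins, not the real jetbrains.buildServer.configs.kotlin types, and the "cleanCheckout" feature name is purely illustrative:

```kotlin
// Stand-in for the DSL's BuildType (illustration only).
open class BuildType(val name: String) {
    val features = mutableListOf<String>()
}

// Feature-wrapper style: an extension function applied externally.
// The extra behavior is only visible at the call site (e.g. in the
// project file), not in the build type's own definition.
fun <T : BuildType> T.withCleanCheckout(): T {
    features.add("cleanCheckout")
    return this
}

// Inheritance style: the base class advertises the shared behavior,
// so the subclass itself signals that there is more to it.
open class BaseBuildType(name: String) : BuildType(name) {
    init { features.add("cleanCheckout") }
}

fun main() {
    val wrapped = BuildType("Branch Build").withCleanCheckout()
    val inherited = BaseBuildType("Pull Request Build")
    println(wrapped.features)    // [cleanCheckout]
    println(inherited.features)  // [cleanCheckout]
}
```

Both build types end up with the same feature; the difference is where a reader has to look to find out.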

License

All code samples above are Copyright © 2020, Ed-Fi Alliance, LLC and contributors. These samples are re-used under the terms of the Apache License, Version 2.0.

Previous Articles on TeamCity and Kotlin

While the Ed-Fi Alliance has made investments to improve the installation processes for its tools, installation is still a time-consuming task: it is easy to get wrong, you must have the right runtime libraries, and it is problematic to run multiple versions on the same server.

What if end-users could quickly startup and switch between ODS/API versions, testing out vendor integrations and new APIs with little development cost and with no need to manage runtime dependencies? Docker containers can do that for us.

Continue reading on ed-fi.org

Potential Docker Architecture

Letter to the City Council of Austin, Texas, in appreciation for action taken this week in response to both the killings of George Floyd and others at the hands of police, and the heavy-handed tactics employed against peaceful protestors.

13 June 2020

Dear City Council Members,

Thank you for passage this week of measures to limit police use of force and begin re-prioritizing the city budget. I strongly support these actions as meaningful steps toward a future where the intrinsic oneness of humanity is fully reflected in our words, our ordinances, and all of our actions.

Systemic inequities require systemic, systematic, and continuous attention through careful study of patterns, consultation on remedies, thoughtful action, and humble evaluation. Without doubt, these steps move us forward on a path. Naturally, questions arise about what next steps may be taken. Further demilitarization of policing, adoption of national standards for use of force, and appropriate funding for social services that reduce the risk of police encounters and escalation should be in the conversation. And, lest one crisis drive us to forget another, continued review of police handling of domestic abuse and rape cases must remain a priority.

Sincerely, Stephen A. Fuqua

Should bugs and spikes receive story points to aid in sprint capacity planning? Some teams estimate all work items by time during sprint planning in order to find the right commitment. Many teams hate this and/or spend an inordinate amount of time arguing about time. Those that abandon time may be tempted to put points on these unplanned, non-productive items, but there is a cost: the inflated velocity will make the release projection for the remainder of the backlog look more optimistic than it really is.

Possible solution: track the ratio of story to non-story points and use that to pad out the release projection estimate.


Story point estimation has proven to be an effective tool for providing a general sense of the time and complexity of discrete tasks. Over the course of several sprints, a team with a thoughtful product backlog and a consistent work environment should begin to understand roughly how many points can be completed in upcoming sprints: the velocity. With sufficient statistics, a ScrumMaster can project not only the average velocity per sprint, but also a confidence interval.

A team engaged in releasing software every few months might reasonably estimate the entire known backlog for the next release. But what about technical spikes and bug fixes? Spikes are timeboxed explorations that ask questions, the answers to which inform future story estimates and solutions. Bugs of course are corrections to previously-built behavior.

Many people choose not to put points on spikes and bugs because they are not stories — they are not directly providing productive value to the end-users. Others do put points on spikes and bugs for the purpose of sprint capacity planning. The two goals of planning a sprint’s activities and projecting a range of potential release dates are at odds. To illustrate the dilemma, let’s consider a team with the following data:

Sprint       Points Completed
One          22
Two          19
Three        23
Four         25
Five         23
Six          24
Statistics   μ = 23, σ = 2
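The summary statistics can be reproduced with a few lines of Kotlin. This is just a sketch; I am assuming the article's μ and σ are the rounded mean and sample standard deviation:

```kotlin
import kotlin.math.roundToInt
import kotlin.math.sqrt

// Points completed in each of the six sprints from the table above.
val completed = listOf(22, 19, 23, 25, 23, 24)

val mean = completed.average()  // ~22.67
val variance = completed.sumOf { (it - mean) * (it - mean) } / (completed.size - 1)
val stdDev = sqrt(variance)     // ~2.07

fun main() {
    // Rounds to the article's figures: mu = 23, sigma = 2
    println("mu = ${mean.roundToInt()}, sigma = ${stdDev.roundToInt()}")
}
```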

The backlog for the next release has been estimated at 83 points. How many sprints is that?

  • 83.0/(23-2) = 3.95
  • 83.0/(23+2) = 3.32

Thus the team estimates that it needs four sprints to complete the release based on the current scope of the release backlog.

The team has received two bug reports from the user community and wishes to work on them in the next sprint, and they have identified a one day spike that is expected to resolve lingering doubt about one of the story estimates. The team prefers to push themselves rather than rest on a baseline, so during planning they decide to aim for twenty-four points worth of work. But how should they account for the bugs and the spike?

The team decides to put points on them: one of the bugs looks simple and gets one point; the other bug and the spike are assigned two points each. Then the team picks out nineteen points of story work to round out the sprint commitment.

And now the release backlog has 88 points instead of 83, resulting in a range of:

  • 88.0/21.0 = 4.1905
  • 88.0/25.0 = 3.52

Probably done in four sprints, but possibly in five with the worst-case projection.

In solving their sprint capacity-planning problem, did they discover that the roadmap was potentially off by an entire sprint due to the two bugs and the design uncertainty? If so, it was good to discover that now. And it would have been nice to predict it even sooner.

On the other hand, what if these aren’t their first bugs or spikes? What if twenty percent of their completed points over the past six sprints were for bugs and spikes? Then the release projection — which only contained known upcoming user stories — was based on an inflated velocity. In that case, the five points added during this sprint will likely re-occur.

For the remainder of the time until release, scrutiny of the software may continue to turn up small bugs and usability rework, while the number of spikes will go down. Lacking any data to say otherwise, it might be reasonable to assume that there will continue to be four or five points of non-story work in each remaining sprint as well.

Thus the 83 estimated story points account for only eighty percent of the remaining effort. And this can only be seen by separating the story points from the non-story points. Now the release projection looks more like:

  • (83.0*1.2)/21.0 = 4.7429
  • (83.0*1.2)/25.0 = 3.984

That is, it would be more accurate to estimate that the release will be ready in four or five sprints. And if the product owner really wants to hit four sprints, then they’ll need to cut around 13 points from the release to be on the safe side (bringing the padded total under the worst-case capacity of four sprints, 4 × 21 = 84 points).
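The padded projection can be sketched as a small calculation. The 1.2 pad factor and the 21-to-25 velocity band come from the worked example above, not from any real team:

```kotlin
import kotlin.math.ceil

// Figures from the worked example above.
val storyPoints = 83.0  // estimated story backlog for the release
val padFactor = 1.2     // ~20% of historical points were non-story work
val velocityLow = 21.0  // mu - sigma
val velocityHigh = 25.0 // mu + sigma

// Inflate the story-only backlog to the total expected effort.
val totalEffort = storyPoints * padFactor  // 99.6 points

// A partial sprint still costs a whole sprint, so round up.
val bestCase = ceil(totalEffort / velocityHigh).toInt()  // 4 sprints
val worstCase = ceil(totalEffort / velocityLow).toInt()  // 5 sprints

fun main() {
    println("Projected release: $bestCase to $worstCase sprints")
}
```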


What about taking the opposite tack? That is, subtract the bug and spike points from the point total before calculating velocity. That is certainly a viable approach and perhaps worth experimenting with. What I like better about the first approach:

  1. If there are going to be spikes, bugs, and rework, then it is nice to have the extra data to apply to future projections proactively. Sure, a reduced velocity would also have padded out the sprint length. But with the first approach it is easier to justify lowering the padding when there is less non-story work than it is to “arbitrarily” raise the expected velocity.
  2. It better respects the team’s desire to track their progress and throughput.

Motivation

I don’t like having a single large file for a TeamCity project, which is the default when exporting a project. It violates the Single Responsibility Principle (SRP). For maintenance, I would rather find each element of interest — whether a sub-project, template, build step, or vcs root — in its own small file, so that I don’t have to hunt inside a large file. And I would rather add new files than modify existing ones.

Is This a Good Idea?

This note about the non-portable DSL explains the basic structure to use when you want multiple files. And yet I never noticed it while hunting in detail for help on this topic a week ago; I only stumbled on it while writing this blog post. It seems to imply that using multiple files is “non-portable,” but apparently I have been using the portable DSL all along: “The portable format does not require specifying the uuid,” which I have not been doing.

There is a small risk that, without a uuid, I could do something drastic and lose my build history. Since I also have server backups, I’m not too worried. And in all of my experiments so far I have not found any problems with this approach.

Starting Point

The official help page has the following sample settings.kts file:

import jetbrains.buildServer.configs.kotlin.v2019_2.*
import jetbrains.buildServer.configs.kotlin.v2019_2.buildSteps.script

version = "2020.1"

project {
  buildType {
    id("HelloWorld")
    name = "Hello world"
    steps {
        script {
            scriptContent = "echo 'Hello world!'"
        }
    }
  }
}

File Structure

An approach to splitting this could result in the following structure:

.teamcity directory
|– _self
   |– buildTypes
      |– EchoHelloWorld.kt
   |– Project.kt
|– pom.xml
|– settings.kts

Some conventions to note here:

  • Per the Kotlin Coding Conventions, the directory names correspond to packages; the packages are named in camelCase rather than PascalCase, while the file / class names are in PascalCase.
  • Whereas the single file has the Kotlin script extension .kts, the individual files have plain .kt, except for settings.kts.
  • Root-level project files are in the _self directory. The TeamCity help pages mention this as _Self, but I prefer _self as it reinforces the Kotlin coding convention.
  • When converting from a single portable script to multiple scripts, be sure to set the package name correctly at the top of each file. Otherwise you will likely trip yourself up with compilation errors, unless you explicitly reference the package name in an import.

The individual files are shown below, not including pom.xml; there is no reason to modify it. Note the package imports section, containing both local packages and the jetbrains packages.

EchoHelloWorld.kt

package _self.buildTypes

import jetbrains.buildServer.configs.kotlin.v2019_2.*
import jetbrains.buildServer.configs.kotlin.v2019_2.buildSteps.script

object EchoHelloWorld : BuildType ({
    id("HelloWorld")
    name = "Hello world"

    steps {
        script {
            scriptContent = "echo 'Hello world!'"
        }
    }
})

Project.kt

I could have named this HelloWorldProject.kt, but Project.kt is short, simple, and unambiguous in the root of the _self directory.

package _self

import jetbrains.buildServer.configs.kotlin.v2019_2.*
import jetbrains.buildServer.configs.kotlin.v2019_2.Project

object HelloWorldProject : Project({
    buildType(_self.buildTypes.EchoHelloWorld)
})

settings.kts

import jetbrains.buildServer.configs.kotlin.v2019_2.*

version = "2020.1"
project(_self.HelloWorldProject)

Enriching with a VCS Root

To further demonstrate, let’s add a new file defining a Git VCS root.

.teamcity directory
|– _self
   |– buildTypes
      |– EchoHelloWorld.kt
   |– vcsRoots
      |– HelloWorldRepo.kt
   |– Project.kt
|– pom.xml
|– settings.kts

HelloWorldRepo.kt

See the previous post’s Managing Secure Data section for important information on the accessToken variable. Note that the GitHub organization name is specified as a variable — allowing a developer to test in a fork (substitute user’s username for organization) before submitting a pull request.

package _self.vcsRoots

import jetbrains.buildServer.configs.kotlin.v2019_2.*
import jetbrains.buildServer.configs.kotlin.v2019_2.vcs.GitVcsRoot

object HelloWorldRepo : GitVcsRoot({
    name = "Hello-World"
    url = "https://github.com/%github.organization%/Hello-World.git"
    branch = "%git.branch.default%"
    userNameStyle = GitVcsRoot.UserNameStyle.NAME
    checkoutSubmodules = GitVcsRoot.CheckoutSubmodules.IGNORE
    serverSideAutoCRLF = true
    useMirrors = false
    authMethod = password {
        userName = "%github.username%"
        password = "%github.accessToken%"
    }
})

Next Steps

Hoping to cover in a future post…

  • Templates are just specialized BuildTypes.
  • Build Features
  • Generate XML for further validation

Infrastructure-as-code (IaC) is the principle of configuring systems through code instead of mouse clicks (cf. Packer Tips and Lessons Learned for another example). TeamCity, the popular continuous-integration (CI) server from JetBrains, enables IaC through writing scripts that interact with its REST API, or by storing project settings in version control. This article will share some lessons learned in using the Kotlin DSL for project settings. These will include:

  1. What is Kotlin?
  2. Benefits of using Kotlin
  3. Learning Kotlin from TeamCity
  4. Debugging before committing
  5. Managing secure data
  6. Connecting to forks

What is Kotlin?

Kotlin is a language developed by JetBrains, maker of TeamCity. Originally developed for the JVM, it is statically typed and compiled. JetBrains created a Domain-Specific Language (DSL) for describing TeamCity builds: the TeamCity Kotlin DSL. With it, all of the elements of a project - build steps, features, parameters, VCS roots, etc. - are defined in a relatively easy-to-learn language, stored in a source control system (e.g. Git), and shareable across multiple installations.

Benefits of Using Kotlin

Some years ago, I had an architect who (quite rightly!) wanted the development teams to treat TeamCity configuration like code. The only problem was that we were still clicking around in the user interface. Want to make a change to a build configuration? Copy it, increment a version number, modify it, and have a reviewer look at the audit history and confirm the output. This actually works reasonably well, but it involves a lot of mouse clicking for reviewer and programmer alike. And it is not transportable.

Build configurations in Kotlin can follow the standard software development life cycle:

  1. Develop in a text editor / IDE.
  2. Compile and debug locally.
  3. Test locally or on a test server.
  4. Share a code branch for review by another programmer (e.g. through a GitHub pull request).
  5. Deploy approved code to the production TeamCity server.

Each of these steps carries benefits on its own. Add them together and you have a powerful system for efficient management of TeamCity configurations. No longer is it “treating TeamCity like code” - it is code.

Learning Kotlin from TeamCity

The references at the bottom of this article can do much to help with Kotlin.

View Snippets in the UI

Many of the operations you can perform in the TeamCity web site (“the UI”) will let you view the Kotlin version of your change before committing that change. This is a great way to begin learning how to work with the Kotlin DSL, especially things like build features. The API documentation is of course correct, but hard to translate into reality without seeing these working examples.

Export an Entire Project

Likewise, you can start your first projects in the UI and learn from them, instead of having to figure everything out from scratch. Take a project - preferably a relatively simple one without dependencies in other projects - and export it to Kotlin. Now you have a detailed example to study.

Internal Setting for Creating Smaller Files

If the project is large, you may want to split it into multiple files. Learning how to do this from documentation or blog posts is no easy thing. Thankfully someone asked on Stack Overflow (https://stackoverflow.com/questions/57763826/how-do-i-split-up-the-settings-kts-file-for-teamcitys-kotlin-configuration), and someone answered. The answer isn’t entirely instructive, hence the section below. In particular, to learn how to split up projects, see the answering author’s comment about setting the “internal settings” property teamcity.configsDsl.singleSettingsKts.maxNumberOfEntities to something less than 20 in TeamCity.
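For reference, that internal property is a one-line entry. It can be added via Administration > Diagnostics > Internal Properties (or the server’s internal.properties file); the threshold of 10 here is just an illustration:

```properties
teamcity.configsDsl.singleSettingsKts.maxNumberOfEntities=10
```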

Debugging Before Committing

In the Text Editor / IDE

I’ve been doing all of my work in Visual Studio Code using the Kotlin extension. This extension gives you real-time analysis of basic syntax and semantics, which goes a long way toward detecting errors before you try to load your Kotlin scripts into the UI. Other IDEs with built-in or extended support for Kotlin include IntelliJ IDEA, Android Studio, and Eclipse. I have not experimented with them, so I cannot remark on comparable functionality (although I expect IntelliJ IDEA, at the least, has excellent support for the language, since it too is made by JetBrains).

Compiling with Maven

One problem with VS Code debugging is that it is not always obvious why something is flagged as an error, and it does not catch all compilation errors. For this, the Maven build tool is quite handy. If you’re not a Java developer you might not be familiar with Maven. Thanks to a few random encounters with it over the years, I recognized the pom.xml file that was included when I exported a project. This file is similar to a package.json or csproj file. To compile, install Maven* and then run mvn compile in the directory containing the pom file. Read the debug output carefully and you’ll be on your way to fixing most problems before you ever get to the UI.

* Windows user like me? choco install maven.

Testing

Now that you know it compiles, it would be nice to test your project or modifications before updating your production server. JetBrains has made the free TeamCity Professional Server quite powerful; it is not a crippled demo. You can install it on localhost or on a test/QA server. Push your DSL scripts to a branch or a fork (not the ones used by your production server), sync your test instance of TeamCity, and verify that the project really does what you think it does. Then create that pull request.

Managing Secure Data

TeamCity has features for managing tokens that secure private data (e.g. API keys, passwords) in your Kotlin scripts. Personally, I prefer the other recommended approach mentioned in the above article:

“Alternatively, you can add a password parameter with the secure value and use a reference to the parameter in the nested projects.”

Since you want these values stored outside of source control, the twin parameters can be set up at a higher level (perhaps in the root project). Each installation of TeamCity will need to re-establish these twin parameters manually. This is a good thing: you can have different credentials for a QA instance of TeamCity - which may point to different source control forks and different deployment settings, for example - than for production.

Example:

  • github.accessToken.secured = {your real access token}
  • github.accessToken = %github.accessToken.secured%

All subsequent references would use the shorter of the two. For example, you may have a Git VCS root that needs to be accessed with secure credentials. If using GitHub, you can use your access token instead of your password when connecting to the API. In your TeamCity Kotlin file, set up the VCS root like this:

    authMethod = password {
        userName = "%github.username%"
        password = "%github.accessToken%"
    }

The github.username would thus also be stored one level above the source-controlled project, so that it too is not stored in source control.

Connecting to Forks

In GitHub terminology, a “fork” is just a Git clone that is stored under another organization’s or user’s account. As described above, with Kotlin files stored in version control you can create a robust lifecycle that includes testing a configuration before pushing it to your production instance. One simple way to manage this is with a personal fork. The following VCS root example uses the access token approach and combines it with a GitHub organization or username parameter that is likewise stored at a higher level in the project hierarchy. The branch and branchSpec parameters would be set in project, template, or buildType files.

package _Self.vcsRoots

import jetbrains.buildServer.configs.kotlin.v2019_2.*
import jetbrains.buildServer.configs.kotlin.v2019_2.vcs.GitVcsRoot

object FlightNodeApi : GitVcsRoot({
    name = "FlightNode.Api"
    url = "https://github.com/%github.organization%/FlightNode.Api.git"
    branch = "%git.branch.default%"
    branchSpec = "%git.branch.specification%"
    userNameStyle = GitVcsRoot.UserNameStyle.FULL
    checkoutSubmodules = GitVcsRoot.CheckoutSubmodules.IGNORE
    serverSideAutoCRLF = true
    useMirrors = false
    authMethod = password {
        userName = "%github.username%"
        password = "%github.accessToken%"
    }
})

References