Exploring the Human Side of Data — A Creator Interview with Jer Thorp

Key Links

Some of Jer’s Glitch Projects

LOC Names (discussed in the interview)
Wordplay
Timeline of Categories
Library of Names

Transcript

Jenn Schiffer: I’m Jenn Schiffer. I’m the Community Engineer for Glitch.com at Fog Creek. Glitch is the community where you build the app of your dreams, and today, we are going to be talking to one of our community users, Jer Thorp. Hey Jer.

Jer Thorp: Hey.

Jenn Schiffer: So you are Innovator in Residence at the Library of Congress.

Jer Thorp: Yes.

Jenn Schiffer: What else do you do?

Jer Thorp: Well, I like have a lot of rotating hats. I mean … they’re not actually rotating hats though, that would be good if they were, but I teach at a programme called ITP, which people may be familiar with at NYU. It’s a new media arts masters programme.

I am a National Geographic Fellow, so that involves all kinds of things, sometimes with vigorous animals in vigorous places. I also have an art practise that I’ve had for a long time, which involves making art that thinks about the boundaries between data and culture.

Jenn Schiffer: So Innovator-in-Residence is a very interesting title, especially with an organisation like the Library of Congress. What are you doing there?

Jer Thorp: That’s a great question and I’m not actually sure yet. So I guess the residency is a research residency, so it is literally the point of the residency is kind of trying to figure out what I’m gonna be doing there, but that time is coming pretty close. My residency ends in March, so now I’m really trying to think about what the end result of the project will be.

But in general, the library asked me to come and work there and kind of think about what is the changing role of the library in the face of our new digital culture. The question that I’m most interested in is: what is the data that the library holds? How can that information be useful, not only to rarefied digital humanities scholars, but also to artists and to the general public? Can we activate that huge, huge amount of data that sits underneath the library, and can we make that a public resource?

Jenn Schiffer: Well so it seems like the Library of Congress has a lot of data available-

Jer Thorp: Yep.

Jenn Schiffer: You’ve built some Glitch apps using it, what data is there?

Jer Thorp: Well there’s a lot. So the library holds about 165 million items. Not all those items are catalogued, usually because they’re kind of strange things, so the largest amount of objects in the library is held by the manuscripts department. So a manuscript … if you or I died, we could donate our kind of papers to the library and that might include things like journals, but it could also include like the contents of your pocket, and just like weird things. So the manuscripts division is kind of filled with that.

Then we have books and maps, and so on and so on, which are relatively well catalogued. About 10% of those are digitised. So if we think about the library’s data holdings, we have all the information about the catalogue, so you or I or a member of the public could download … there’s 25 million records, and then there’s all that digital information, so there’s lots of digitised photographs, tonnes of digitised maps. There’s lots of digitised sound and video, so there’s all kinds of things that are available to pretty much anybody, more or less through the website.

Jenn Schiffer: The large part of content that I personally consume is digital. Is the Library of Congress adapting to archiving digital content?

Jer Thorp: Sure. Yeah. I mean I’ll put a big caveat on all this, which is that I am by no means an expert at all these things. I’ve had a chance to kind of purposely spread myself as widely as I can across the library, so I know a little bit about a lot of little things in the library.

The library has an extensive web archive project, which archives kind of a curated set of websites. This works very similarly to how the library might collect books, so somebody who’s the curator of Serbian studies might recommend that certain books, which are important to that particular field, be accepted into the collection. That process is called accession. So with the web archive, it’s websites, so you know, similar to the library’s main function, first of all, it archives all of the Congresspeople’s websites, but then it has also been archiving, for quite a long time now, websites that it believes have cultural value.

A good example of that is a fairly new project to archive web comics, so there’s … I forget the number, somewhere around 50 web comics that are now being archived by the Library of Congress, and that’s kind of a pilot project to understand how this type of material could be collected. Very famously, the Twitter archive has been collected by the Library of Congress up to about now, when they’re going to stop collecting it as a whole. So there is this sort of thinking about how do you archive something like that? But there’s also a little bit of tension between what the library does and maybe what, you know, a company like Google does, when they’re trying to archive the entirety of the web.

That was never the idea of the library though. I’m pretty certain the idea was never to just archive everything. The library does not collect every book. It collects books that are supposed to be important to researchers and to culture in general.

Jenn Schiffer: I noticed that you’ve made a couple of projects using Library of Congress data on Glitch. How did you discover Glitch?

Jer Thorp: That’s certainly a good question. I don’t really know. I mean I follow Anil on Twitter, so that could’ve been where it came from, but I also have a real interest in coding platforms that are more accessible, particularly to non-coders or beginner coders. I’ve been involved in the Processing community for a really long time, and the Processing tool is a tool that I really love and it’s certainly a liberation, and p5.js is something I’ve really been a supporter of. So I checked Glitch out when it first came out and I was really intrigued by it, and then when I started at the library, and I mentioned this before, I wanted to be as public as I could with the work that I was doing, so I thought that Glitch offered a really nice platform to, A: share the work that I was doing, but also, B: make it remixable.

I was always going to release the thing and also have a GitHub repository, but that’s only useful to people who know how to use GitHub, who know how to-

Jenn Schiffer: Deploy stuff?

Jer Thorp: Yeah, who know how to deploy Node or whatever it is, so this was a really nice middle ground, and so now, I do all three, so it’s like I’m sharing the thing, I have a Glitch page for the projects, and then I have a GitHub repository as well, so that sort of serves anybody along the way.

The work that I’ve published thus far is purposely really sketchy in all kinds of ways and I’ve been trying not to be precious about it, and I think that’s also a nice thing that Glitch does, being able to remix things and branch them off and try something that doesn’t work or does work.

Jenn Schiffer: I think the interesting thing about not having to worry about devops stuff is being okay with like ephemerality and like being okay with putting something out there that you might take down because you realise it doesn’t work exactly the way you want it to, but also I guess, from the standpoint of somebody working as an Innovator-in-Residence, it’s like you want more people to know what is coming out of the Library of Congress.

Jer Thorp: Yeah.

Jenn Schiffer: So one of the apps that you’ve made, I think it’s called Library of Names, is a project. Do you want to talk about what that app does or what you’re-

Jer Thorp: Yeah sure. So names have always been something in data that I’ve been attracted to. Partly because they read as really human things, so it’s hard for us to read a name without imagining who that person is. We kind of … our brains just do it. If I read you another piece, so I’m looking at these 25 million records, which are largely about the books that the library holds, so I’m looking at a record about a book. I can read you the title of the book and it might do something for you, like you might be kind of intrigued by it. I could tell you the year it was published. I could tell you the format, like how many pages it is, things like who the publisher was. All those things are interesting, but they don’t do that cognitive spark that a name does. As soon as I say a name, you’re like, “Oh, I can sort of imagine who that person is.”

So I went through those records and extracted all the names out of them and looked a little bit at how names change over time, so the library holds things that go back, you know, into the 16th century, and further back than that, and then they have things that were published last year. And so the first project was: if I were to randomly pick a group of authors from the 1800s, what would their names be? And if I were to randomly pick a group of authors from now, what would their names be? And what might speaking those names or reading those names teach us about what type of people were writing books back then? That’s the first thing, but also, what was the library collecting?

Because of course, the library is making political decisions about what type of material they are collecting, and those decisions are being made by curators who have very specific academic backgrounds and so you see some of that reflected in the data.

It’s a small-scale project that was never meant to be kind of a deep analysis of name frequency, but more of an evocative thing.

Jenn Schiffer: Well so, I have your Library of Congress MARC names project that we talked about up here. I’m gonna remix it so I can edit it. So what’s going on here?

Jer Thorp: Yes, so this is the list of names that we talked about before, so there’s a little bit of text to sort of walk you through the imaginary idea of thinking about what authors, in this case from 1775, might look like, and then we see 20 of them listed on the screen. So what’s happening in the background is that it’s going through all of the names and randomly picking a set of them, so this is like a random sampling.
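The random sampling described here can be sketched in a few lines of JavaScript. This is a minimal illustration under stated assumptions, not the project's actual code: the `namesByYear` structure, the made-up names in it, the `sampleNames` helper, and the camel-case spelling `displayNum` are all hypothetical. It samples with replacement from a list that holds one entry per record, so common names can show up more than once.

```javascript
// Hypothetical data shape: year -> one first name per catalogue record.
// The names below are invented for the example.
const namesByYear = {
  1775: ["John", "William", "Thomas", "Mary", "Samuel", "John", "James"],
  1990: ["Peter", "Carlos", "Victor", "Mary", "David", "Mary", "David"],
};

// Pick `count` names at random, with replacement.
// `random` is injectable so the behaviour can be checked deterministically.
function sampleNames(names, count, random = Math.random) {
  const picked = [];
  for (let i = 0; i < count; i++) {
    const index = Math.floor(random() * names.length);
    picked.push(names[index]);
  }
  return picked;
}

// Assumed spelling of the "display num" setting mentioned in the interview.
const displayNum = 20;
console.log(sampleNames(namesByYear[1775], displayNum).join(", "));
```

Raising `displayNum` or switching the year key changes what is drawn, which mirrors the kind of tweak a remix of the sketch might make.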

The sketch is kind of built in a few parts. There’s the HTML page, which shows the actual thing. There’s a p5.js sketch, which does some of the logic about fading in the names and showing you the particular names, and then there’s the data itself. If you look in the assets, you’ll see there’s a couple of files here. There’s the names file and that’s the main one, where you can see all the first names.

Then there’s the totals file, which actually gives you a year-by-year total of all the names, so that you can normalise against it. It’s not really important for this particular example, but if you’re thinking about ways to use this data, that could be cool too.
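To illustrate what normalising against a year-by-year total could look like, here is a small sketch. The numbers and the object layout are invented for the example and do not come from the project's files; `normalise` is a hypothetical helper.

```javascript
// Hypothetical counts of one name per year (made-up numbers).
const maryCounts = { 1800: 12, 1900: 240, 1990: 900 };
// Hypothetical totals of all names recorded per year (made-up numbers).
const totals = { 1800: 400, 1900: 12000, 1990: 90000 };

// Convert raw counts into each year's share of all names.
function normalise(counts, totalsByYear) {
  const shares = {};
  for (const year of Object.keys(counts)) {
    const total = totalsByYear[year];
    // Skip years with a missing or zero total to avoid dividing by zero.
    if (!total) continue;
    shares[year] = counts[year] / total;
  }
  return shares;
}

const shares = normalise(maryCounts, totals);
for (const [year, share] of Object.entries(shares)) {
  console.log(`${year}: ${(share * 100).toFixed(1)}% of names`);
}
```

In this invented data the raw count rises every year while the share falls, which is the kind of distortion that dividing by the yearly total is meant to remove.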

Then I also have a last names one in here. The last names one, I actually never tried, but I’m sure it works, so if someone’s looking for a fun thing to do, switching this thing to the last names should be pretty easy to do.

Let’s go to the sketch.js and I’ll show you something really simple. So I told you before that there are 20 names being shown, but if you want to see many more, we can change this first number at the top, the display num from 20 to 50, and then once that updates, we should be able to go and see, yep, there’s 50 names. Switch to maybe 1990 or something around there, and so this is the same list from 1990, and we can really see the much broader diversity of names that come at 1990, but still, a lot of names repeating themselves; Peter, and Carlos, and Victor, and-

Jenn Schiffer: A fourth Mary.

Jer Thorp: A fourth Mary.

Jenn Schiffer: Oh, it’s so interesting. Then like names from people who have Asian names in there.

Jer Thorp: Yeah, and you’ll see that more and more as you go further up with that.

Jenn Schiffer: Still a lot of Davids.

Jer Thorp: Yeah, yeah, Undefined David.

Jenn Schiffer: Awesome. So you’re using p5 for this, which I see. Let’s take a look at package.json, you have a dependency of p5.

Jer Thorp: Yep.

Jenn Schiffer: You’ve worked a lot with p5.

Jer Thorp: Yeah, it’s great and you know I think p5, the thing that I like the most about it is actually not even the thing itself, it’s the community they’ve built around it. They have this really amazing community, which, you know, I think Lauren McCarthy, who has run a lot of the development on p5, has really described as a community that’s kind of built on people being nice to each other first, and then code afterwards, so it’s a nice sort of change, if people have maybe tried in the past to get into an open-source community and then been turned off by it. This is a really good place to get into it.

Jenn Schiffer: So it’s sort of a small set of data that matches names, but like it can tell a huge story about history and those people’s choices, which is really cool.

Jer Thorp: Yeah, this was inspired really in a big way by a project that I and two collaborators, Mark Hansen and Ben Rubin, did at the Museum of Modern Art, five years ago, I guess. We did a residency there and as part of that residency, we did a performance with a theatre group called The Elevator Repair Service. One part of our performance was we actually did a reading of the most popular names for artists in the collection, starting with the most popular and then going all the way down. The first 29 of them are men, and then you get Mary, and then … And we sort of staged that as a performance.

One of the things that I realised then, which I was really carrying into this piece, is that what I’m trying to encourage the user to do is to read those names. You know maybe not read them out loud, but at least hear the voice in your head. It does transform it into a crowd and you can see faces, and so, that’s what I wanted to get people to think about. It’s a good critical skill too. It’s like if you look at a list of authors on a paper or speakers on a panel, to be able to sort of close your eyes and imagine what those people look like gives you a chance to be critical about what they may be talking about or where their biases lie, so on and so on.

Jenn Schiffer: What API was used for that? Is there supposed to be like an end-point?

Jer Thorp: Yeah, so that’s not an API. The library, two years ago, a year and a half ago, I don’t know exactly, they’ve done a public release of what are called their MARC records. MARC is one of the acronyms that’s not really exact, it’s Machine Readable Cataloguing Records, so MARC records. MARC records are a really interesting story actually.

They were developed by one of the first computer programmers, her name was Henriette Avram. She went to the library and was tasked with solving this new problem that they knew they wanted to solve, which was to put the records of the library in the computer. Before then, they were in card catalogues. You can still visit the card catalogues in the library. They’re huge, like millions and millions of cards. So she developed this system, MARC, this format MARC, which allowed for a mechanism and a series of algorithms to digitise all their records. I think I’ll get the year wrong … I think it was 1959, and it’s still being used today, MARC version 21.

So the MARC records are available for public download.

Jenn Schiffer: What’s the file format?

Jer Thorp: Well, the native file format is this original MARC format and so it’s kind of, more or less, weird, free-text code, but they also publish it in XML, so the XML is what I’m parsing. In the background of those Glitch experiments, I’m doing the heavy lifting, using a series of Node apps to parse those into data sets that are sort of digestible. So if you go to any of the Glitch experiments that I’ve done, in the Assets folder, you’ll find those processed files. If you want to see how they were processed, you go to my GitHub repository, that’s github.com/blprnt/loc, and you can find all of the code to do the heavy lifting. To process those 25 million records, you know, to do some analysis on them, could take hours, so it’s not a great thing for Glitch to just hang out and wait for a few hours and then come up with some data.
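As a rough illustration of pulling names out of MARC XML, the sketch below reads the personal-name main entry (field 100, subfield a in MARC 21) from a tiny hand-written sample. It is not the code from the repository mentioned above: the sample records, the `extractAuthorNames` and `firstName` helpers, and the use of a regular expression instead of a real XML parser are all assumptions made to keep the example self-contained. Real files may also use namespace prefixes that this pattern does not handle.

```javascript
// Hand-written sample in the MARCXML style; real records carry many more fields.
const sampleXml = `
<collection>
  <record>
    <datafield tag="100" ind1="1" ind2=" ">
      <subfield code="a">Austen, Jane,</subfield>
      <subfield code="d">1775-1817.</subfield>
    </datafield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Emma.</subfield>
    </datafield>
  </record>
  <record>
    <datafield tag="100" ind1="1" ind2=" ">
      <subfield code="a">Melville, Herman,</subfield>
    </datafield>
  </record>
</collection>`;

// Collect subfield "a" of every field 100, trimming trailing punctuation.
// A regular expression is enough for this sample; a streaming XML parser
// would be a safer choice for millions of records.
function extractAuthorNames(xml) {
  const names = [];
  const fieldPattern = /<datafield tag="100"[^>]*>([\s\S]*?)<\/datafield>/g;
  for (const field of xml.matchAll(fieldPattern)) {
    const sub = field[1].match(/<subfield code="a">([^<]*)<\/subfield>/);
    if (sub) names.push(sub[1].replace(/[,.\s]+$/, ""));
  }
  return names;
}

// "Surname, Given Middle" -> "Given"; returns null when there is no comma.
function firstName(entry) {
  const parts = entry.split(",");
  return parts.length > 1 ? parts[1].trim().split(/\s+/)[0] : null;
}

const authors = extractAuthorNames(sampleXml);
console.log(authors);
console.log(authors.map(firstName));
```

Running a step like this once, offline, and publishing only the resulting small files is one way to end up with assets that a browser sketch can load quickly.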

Jenn Schiffer: Does the Library of Congress have plans to maybe create infrastructure where maybe they do that processing, so that developers can access an API at some point?

Jer Thorp: Yeah, so, there is sort of API-like access to the library search, which, you know, is relatively similar, but when you want to process that many records, an API is never going to be the answer. You know, no one is going to make 25 million API hits, unless you want to be hitting their API like crazy. So zip downloads are always going to be the solution for that, and I mean that’s fine. I think sometimes us programmers … I’m guilty of this, you know, “I’ll build an API,” when it’s like, “No, I’ll just publish zip files,” and somebody can download those zip files.
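A quick back-of-the-envelope calculation shows why record-by-record API access scales poorly at this size. The request rates below are purely illustrative assumptions, not documented limits of any Library of Congress service, and the sketch assumes one record per request.

```javascript
// Record count quoted in the interview.
const RECORD_COUNT = 25000000;

// Days needed to fetch every record at a steady request rate,
// assuming one record per request and no retries or pauses.
function daysToFetch(records, requestsPerSecond) {
  const seconds = records / requestsPerSecond;
  return seconds / (60 * 60 * 24);
}

// Illustrative rates only.
for (const rate of [1, 10, 100]) {
  const days = daysToFetch(RECORD_COUNT, rate).toFixed(1);
  console.log(`${rate} requests/second: about ${days} days`);
}
```

At one request per second the full set would take most of a year, which is why a single bulk download is the more practical route for whole-catalogue work.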

But there’s a lot of thinking about how the library can be more open to certain digital humanities work through things like APIs. There’s a really good API that you can use, for example, for historical newspaper records. They have a massive, massive collection of historical newspaper records that’s available. The project name is slipping my mind, but that has an API that you can use, or well again, like I said, there’s kind of an API-like way to access the search.

If you wanna find out all of this stuff, there’s a group that I worked with, Library of Congress Labs. If you go to labs.loc.gov, they have a really cool site called LC for Robots, and it’ll give you lots of access to this type of information.

Jenn Schiffer: LOC now is Library of Congress, not Lines of Code.

Jer Thorp: That’s right. LOC for LOC.

Jenn Schiffer: I think a lot about people learning more about artists and coders out in the world, and you’re working at the Library of Congress, and you started this podcast, it’s called Artist in the Archive.

Jer Thorp: That’s right.

Jenn Schiffer: What do you go over in the podcast?

Jer Thorp: So the podcast, it’s kind of about two things, and one is the people that I’m meeting in the library and the other is the work that I’m doing to try to understand what I’m gonna do, so each episode involves a long interview with somebody from the library. In the first episode, I interviewed Kate Zwaard, who is the chief of the national digital initiatives there, so she does a lot of these digital projects that we were talking about before.

Then in the second episode, I interviewed John Hessler, who is a curator in the maps department and works on Meso-American maps, but he’s also a mathematician and topologist, and there’s lots of interesting conversations.

Then I’ve been asking librarians and people who work at the library what their favourite objects are, and I do short interviews with people about those objects. So we’ve had a census from I think the late 1700s, although I may be wrong, from Connecticut, which is an early census of a small community. It’s kind of things like how many sheep they owned and whether they were known to own a clock, and so it’s a really interesting data-centric look into life in early America, but then we saw a daily journal from a guy named Izzy Young, who ran a club and bookstore in Greenwich Village in the late 1950s, early 1960s, so it’s his notes about Bob Dylan, and Joni Mitchell, and Leonard Cohen, and so it’s kind of little insights.

It’s a way for me to balance the work that I’m doing, which really tends to focus on the large data dumps, with these other little pieces of texture that are in the database.

Jenn Schiffer: So your residency, you said, is coming to a close soon. What’s next for you?

Jer Thorp: The plan?

Jenn Schiffer: Besides more Glitch projects, I hope.

Jer Thorp: Yeah, I know, but I’m going to try to get something magical done at the library, and that’s my main focus. And then, the big thing that I’m thinking about right now is, along with all that other stuff, is a conference I helped organise in Minneapolis called The Eyeo Festival, and Eyeo, we’re releasing tickets on February 1st, and it happens in June, and that’s a lot of fun. I’m just finishing up putting together the speaker list for that, so if you are somebody who’s interested in creative coding, it’s like the best place in the world to come.

Jenn Schiffer: So you’ve got a lot going on, where can people find your work online?

Jer Thorp: Oh my. I have a really dusty website that you’re not allowed to go to. Yeah, I think the best thing is probably to follow me on Twitter, blprnt, and I post most of the stuff there. I ran a studio for a number of years where we did a lot of really cool work and you can find that work at ocr.nyc, and that encompasses most of the last six years of work that I’ve done, but I promise I am working on a new website. I’ve heard the kids say you have to have one these days.

Jenn Schiffer: Yep, got to have one, a web site.

Also thank you so much. It’s been great talking to you and I’m looking forward to future work, again hopefully, a lot of it on Glitch.

Jer Thorp: Yeah, I’ve got a few things I’m working on so …

Jenn Schiffer: Awesome, thanks.