Week 8 – Bouman and Cahill Collections

This week I finished up the spreadsheet for the Bouman Collection and started in on scanning the Cahill Collection.

This collection is a small packet of photographs plus one letter written by Edward Cahill, a WWI soldier. Most of the photos are of Cahill and his wife after the war, but three show a young Franklin D. Roosevelt with soldiers at the beginning of the war. I’m not sure whether Cahill is in these photos; it’s hard to tell. The letter, written while Cahill was at Walter Reed Hospital, is addressed to his sister and describes his experiences in France. Reading it, I learned that he was one of the first American soldiers wounded in the war.

Week 7 – Ochs and Bouman Collections

This week I wrapped up the Ochs Collection spreadsheet with help from Mark, who wanted me to get ahead so I could gain experience in other areas.

I made good progress in one of those other areas by working on the Bouman Collection, one of my first digitization projects from my volunteer days at WWPL. Although I scanned all the letters from the collection to multipage PDFs a few years ago so the donor could have a digital copy, they had not yet been uploaded to the Omeka site. I asked Mark this week if I could contribute to digitizing this collection, both because my familiarity with it would help get the work done quickly and because the bulk of the collection will provide great material for the library to use during the upcoming centennial of the 1919 Paris Peace Conference.

Jon Anthony Bouman was a British AP correspondent working in Europe during WWI. The bulk of the collection is letters from Bouman to his wife and children, sent from Paris during the 1919 peace conference and from Germany during the 1920s. While the subject is a little off course from my project topic, the letters offer great insights into both the peace conference and the cultural atmosphere of postwar Europe.

In addition, I got to experience a stage of the digitization process that I had not worked with before. Mark showed me how to upload the PDFs to a host site that provides Omeka with URLs. I completed this stage of the process and also got through about half of the .csv spreadsheet of metadata for the collection.
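For anyone curious what this stage looks like in practice, here is a minimal sketch of building such a spreadsheet with Python’s csv module. The column names, sample record, and URL are hypothetical stand-ins of my own, not the library’s actual schema; Omeka’s CSV Import tool maps whatever columns you supply onto its Dublin Core fields.

```python
# A minimal sketch of building a metadata spreadsheet like the one above.
# Column names, the sample record, and the URL are hypothetical examples,
# not the WWPL's actual schema.
import csv

rows = [
    ("Letter from Jon Bouman to his wife", "Bouman, Jon Anthony",
     "1919-01-18", "Written from Paris during the peace conference.",
     "https://example.org/files/bouman_001.pdf"),
]

with open("bouman_metadata.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Title", "Creator", "Date", "Description", "File URL"])
    writer.writerows(rows)
```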

Week nine: 03/19-03/23

I ended up having to work from home this week because of some pretty heavy snows, which ironically hit us right after the first official day of spring. I continued working on my finding aid for the Race and Segregation collection, formatting the citations and information on each piece. However, I’ll have to put that on hold until I can get back to the collection and continue cataloguing it.

I spent the rest of the time transcribing documents as usual. I transcribed one cablegram laying out the terms of the naval armistice during the First World War. The document was poorly scanned (or photographed? I honestly can’t tell), so it was really hard to make out in places. Many of the black, block-printed words had bled together into solid black lines, and one side of the document was shrouded in darkness. But considering the poor quality of the image, I think I still managed to make out a good deal of it. It doesn’t seem like I’ll be needing glasses anytime soon!


Week eight: 03/12-03/16

After a very restful spring break, I’m back at the library. We finally decided to retire the cloud drop and go back to transcribing things by hand. But now I’m starting a new project that is a bit more interesting.

Now I’m working on a Library Guide for the Race and Segregation collection. This is essentially a catalogue of the items in the collection, with a section at the beginning to help historicize and contextualize the collection.

I enjoy getting to actually do a bit of research for the first section. I’ve always liked looking for things and trying to piece together bits of information. I like the challenge and the satisfaction of actually learning something of value, making arguments, and finding the evidence. It’s too bad my biographical notes section can only be a couple of paragraphs.

The rest of the library guide isn’t all that exciting. Cataloguing is slow work; important, but very slow. I’m now going back through the Excel sheets I made earlier in the semester to get the information I need for the catalogue.

I’ve been listening to podcasts to help keep the mind fog at bay; it creeps up more easily than you’d think. I’ve come across some very humorous ones that have left me giggling at my desk. I’m sure the others think I’m crazy, just sitting at my desk giggling while going over a collection about race and segregation. Little do they know that I’m actually listening to a story about a woman’s pet bird that is hated by the rest of her family, complete with a recording of it screeching. Who knew that macaws could sound like the velociraptors from Jurassic Park? The idea of this little bird making such a chilling sound is extremely amusing to me.


Week seven: 02/26-03/02

I kept up with the transcribing this week. The documents are coming along, but the cloud drop isn’t proving to be as helpful as we had hoped. One problem is that it won’t accept PDF files, only JPEGs. That means I have to take every PDF and convert it into a JPEG using Photoshop. While the conversion itself is easy enough, it’s still really time consuming and not all that efficient.

Photoshop can only convert one page of a document at a time, and keeping all the pages together takes some extra steps. I managed it by creating a custom panorama, which lets you stitch panels together in any order you like. However, I hit another snag: the cloud drop can only handle images of about 25 KB, and the panoramas come out about five times that size.
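In hindsight, this kind of conversion can also be scripted. Here is a rough sketch using the pdf2image library (which needs the poppler utilities installed) together with Pillow; the file names are made up, and the 25 KB target just mirrors the cloud drop’s limit described above.

```python
# A rough sketch of batch PDF-to-JPEG conversion, as an alternative to
# converting one page at a time in Photoshop. Assumes pdf2image (which
# requires the poppler utilities) and Pillow; file names are examples.
import io
from pdf2image import convert_from_path

MAX_BYTES = 25 * 1024  # the ~25 KB ceiling the cloud drop can handle

pages = convert_from_path("document.pdf", dpi=150)  # one PIL image per page
for i, page in enumerate(pages, start=1):
    # Step the JPEG quality down until the page fits under the size limit.
    for quality in range(85, 10, -5):
        buf = io.BytesIO()
        page.save(buf, "JPEG", quality=quality)
        if buf.tell() <= MAX_BYTES:
            break
    with open(f"document_page{i}.jpg", "wb") as f:
        f.write(buf.getvalue())
```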

So now I’m back to transcribing one page at a time. For a seventeen-page document, this gets tedious very fast. The computer still doesn’t do that great a job at transcribing, and if the image is bigger than 25 KB it will only transcribe about half the page. So I’m still spending copious amounts of time editing and transcribing myself.

I had high hopes for the cloud drop, and for the smaller documents it did do pretty well. But I think I’ll stick with transcribing things myself, just so I can save some time.

Week six: 02/19-02/23

This week I began transcribing documents using Google cloud drop. This Google app has text recognition software that’s supposed to be more advanced than Adobe Acrobat’s, and will hopefully make our transcriptions go faster. While it can’t recognize handwriting (at least not cursive), it does very well at recognizing the text of the cablegrams and typewritten documents we have.

It may seem pointless to transcribe documents that are already printed. The majority are clearly legible, and often rather short, so why transcribe them? While we may be able to read them with ease, the vast majority of computers and search engines can’t, because the text is not in a format they can understand.

Let’s say you’re trying to find all documents that contain the phrase “safe for democracy”. If you type this into a search engine you will get varied results, and probably not many documents containing that phrase, unless they’ve been transcribed. This is because computers use the language of 1s and 0s. Everything you see on a computer screen is actually some combination of those two numbers, which tells the computer what to put up on your screen. Every letter in this post is also a unique combination that allows the computer to present you with a language you understand.

But most computers can’t understand words written on a scanned page, because that image does not have the underlying code that translates it for the computer. This is where transcription comes in. Transcribing is as much for the computer’s readability as for yours: by typing out the contents of a letter into a computer, the letter becomes text searchable.
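To make both points concrete, here is a tiny Python illustration: the bit pattern underneath a single letter, and why a phrase search works on transcribed text but has nothing to grab onto in raw image data.

```python
# Every character is stored as a pattern of 1s and 0s, and a phrase
# search only works once actual text exists.
text = "make the world safe for democracy"

# The letter "W" is really the number 87, i.e. the bits 01010111.
print(ord("W"))                 # 87
print(format(ord("W"), "08b"))  # 01010111

# A phrase search is trivial on transcribed text...
print("safe for democracy" in text)  # True

# ...but a scanned page is just pixel values, with no text to search.
pixels = [0, 0, 255, 128, 0]  # a made-up sliver of image data
```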

So if computers can’t read words the same way we do, then how can Google cloud drop do it? The answer is an API, or Application Programming Interface, which lets one program use the services of another. Here the service is optical character recognition (OCR): learning software that can be taught to recognize things not written in computer code. Programmers explicitly teach it what certain letters look like, how to recognize letters grouped together into words, and even some specific words. The computer is then put to work.

However, the computer only has the basics and can’t always recognize every word. Sometimes it will still mistake certain letters for others, especially if they’re smudged or slightly too close together, causing some pretty wacky transcriptions. For example, if I see the typo “Ihope to see you againsoon”, I know that “I” and “hope”, and “again” and “soon”, are meant to be separated, but a computer will treat each as one word, or blend a few letters together like this: “agaiÑoon”.

But because it’s learning software, you can actually correct its errors and teach it the right reading, and the computer will remember that for next time. If you repeat this process enough times, the computer will eventually be able to do it without help. This learning curve is what keeps many website builders from using it, except as a separate application that won’t affect their own platform.
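I can’t show the cloud drop’s internals, but the basic OCR step looks something like the sketch below, which uses the open-source pytesseract library as an assumed stand-in for the Google service; in our workflow, the “teaching” is the human editing pass that follows.

```python
# A minimal OCR sketch using the open-source pytesseract library, an
# assumed stand-in for illustration, not the service the WWPL actually
# uses. The file name is a made-up example.
from PIL import Image
import pytesseract

image = Image.open("cablegram_page1.jpg")
raw_text = pytesseract.image_to_string(image)

# The raw output often has errors like "Ihope" or "agaiÑoon", so a
# human editing pass still follows.
print(raw_text)
```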

At the WWPL, our cloud drop is still in need of lots of editing, and can only handle documents under 25 KB. But it has still saved me a bit of time on transcribing. Instead of spending 30 minutes on a document, I’m spending about 15 just editing. So that’s an improvement.


Week 5: 02/14-02/16

This week I continued going through the Race and Segregation Collection trying to find the documents that hadn’t been uploaded. I’m still finding a lot that haven’t been put up, and some that have may not be filed under the identifiers on the folders. It’s been tedious work, but I’m glad to be getting it more organized.

I’m a little more than a quarter of the way through the box now, and hopefully I’ve found all the documents that still needed to be uploaded. I have noticed that all of them concern a Department of Agriculture survey of African-American employees. Most of these documents are just bureau and office heads sending letters listing maybe a few names of their black employees, or simply one line saying that they have none. I can see why these never got uploaded; they’re pretty dull, and there are so many letters like this on this one subject.

Hopefully once I get through the entire Department of Agriculture survey I’ll be done uploading new documents and will be able to focus on the metadata.

Week 6 – Ochs Collection

Continuing my work on the Ochs Collection, I began an important step in the digitization process: scanning the documents!

I am scanning each document as an archival PDF, rather than as JPEGs like I used for the Ambuehl Collection, which was mostly photographs. Since many of the documents are multipage and/or double-sided, I’ve brushed up on my PDF editing skills, creating multipage PDFs so that every page of a document ends up in the same file.
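As a sketch of what that step amounts to: if each scan comes off the scanner as its own single-page PDF, a few lines of Python with the pypdf library can combine them. The file names here are hypothetical; I actually do this in PDF editing software rather than in code.

```python
# A sketch of combining single-page scans into one multipage PDF with the
# pypdf library. File names are hypothetical examples.
from pypdf import PdfWriter

writer = PdfWriter()
for name in ["ochs_letter_p1.pdf", "ochs_letter_p2.pdf", "ochs_letter_p3.pdf"]:
    writer.append(name)  # append all pages of each scan, in order

with open("ochs_letter.pdf", "wb") as out:
    writer.write(out)
```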

At the end of the week I took a break from scanning to catch up on filling in the metadata spreadsheet for this collection, which I couldn’t complete earlier because it needed file names for each item. I’m also adding tags and LCSH (Library of Congress Subject Headings) subjects to the items. This part is going a little slower than I had hoped, since the subjects of the documents are more varied than those in the Ambuehl Collection, which means less copy-pasting and more time spent getting familiar with LCSH.

Week four: 02/07-02/09

After poring through the Race and Segregation files and comparing them with what was on the Excel sheet, I thought I was close to finishing the project. I had found a few files that had no website addresses on record and that I could not locate on the Omeka site. After bringing this up with my supervisor, I was told to start uploading them to Omeka.

I enjoyed getting to put new material up on the site. However, the more files I went through, the fewer I found on Omeka. Again I asked my supervisor about this, and was told that many of the files had been uploaded to a cloud server until they could be put on the website. That was when I realized that the site addresses on file pointed to this cloud server, and that only a small portion of the collection had actually been uploaded online. I then spent the next few days uploading files and creating the metadata and Excel file for them.

I worked on transcriptions from home on Fridays as usual. I finally finished transcribing the letter to Wilson from his niece, and was able to decipher one stretch of text that I had been puzzling over for more than a week. I had read it as “Atty Ben Arbuckle” and thought it was a very odd name. After looking at it again, though, I realized it said “Atty. Gen.”, which I looked up and learned is an abbreviation for Attorney General. It was satisfying to finally figure the whole letter out.


Week 5 – Ochs Collection

This week I completed the metadata spreadsheet for the Ochs Collection. Drawing from my prior knowledge of the collection, along with the donor file for Shelby Ochs Owen, I also created a finding aid and box list for the collection. This will help researchers learn more about the collection and identify what folders they may want to look at. The finding aid will also become part of the collection description in the Omeka elibrary.

Next week, I should begin work on scanning the documents so they can be uploaded online. As I’m now almost halfway through my internship, I would also like to begin fleshing out my plan for the online exhibit I will create for my final project.
