Looking inside your G+ archive

February 11, 2019

This post is about undocumented features in the Google+ takeout downloads. But let's start with a couple of basics. Download your data (choosing the Google+ Stream) in zipped format, unzip the files and add them all to the same storage folder (you want to end up with a single Takeout folder in your storage folder). Inside the Takeout folder you will find index.html: open it with your browser and examine the index menu closely. Most of it is self-explanatory: in Google+ Stream>Posts you will find a date-ordered list of all your posts. Some other useful things are easy to overlook: in Google+ Stream>ActivityLog you will find a massive list of all your comments on other people's posts, and another list of your +1's on other people's posts.

Now for the hidden stuff.

Inside the archive of your G+ posts, you will find a little more than you expected.

Each of your images will have an associated 'csv' file which, for example, includes a field showing how many times your image was viewed (the view counter).

To use it, just find the image you are interested in and look for the csv file that best matches that name (for the moment, ignore any files named just metadata.csv).

I was asked, in comments, why this .csv data might be useful :)

The data patterns in the csv files, being contemporary electronic records of the publisher, have evidentiary status in many world courts. With the destruction of the primary published record, the csv file might be used in a wide range of regulatory, criminal or copyright situations.

There has been a vibrant debate about the value or non-value of the view counter. Some say that the basis for any count is unclear (a view may be counted when an image is called to populate a screen that is never viewed, or is placed on a Google product that is never viewed, e.g. a screensaver). They argue that the counter has no intrinsic value and/or is not a reliable indicator of visibility. These arguments are regularly raised in relation to counters, and have some limited respectability. Professional photographers and account managers have argued that counters remain an important (perhaps the only) indication of visibility, and some charge clients on the basis of view counts. Recently, other authorities have expressed interest in the counter (I guess we will see what comes of that :) ). Similar counters are commonly encountered on many other products (e.g. Google Maps), and are seen as a value indicator.

Whether either position has merit is for others to ponder, but the counter might be used in civil cases (say breach of copyright) as an indicator of unlawful use and therefore of damages. Conversely, a client charged for a large number of views might find, in the counter, arguments that their internet manager has misled them.

There is other data in the CSV file, but let us focus on the view counter for now.

Example:

Assume you are looking for an image posted on 28 January 2019.

Go to Takeout\Google+ Stream\Photos\Photos from posts\1-28-19 (note that your earlier folders may have a different format structure - you may have to dig for the right folder)

Find the image file you are looking for (in this case a tricky image of the Narregarang, named 1ej9hvbcna14y.jpg).

You will see the similarly named file: 1ej9hvbcna14y.jpg.metadata.csv

When you open 1ej9hvbcna14y.jpg.metadata.csv in Excel it will show you formatted data about the image.

When opened in something basic like Notepad, the data will be there, but you have to format it yourself.

In either case, the first fields reported were:

title: IMG_9002.jpg (this is the file name I uploaded, i.e. a direct link back to my home file system).

description: "The Narregarang (Shaky Place): Mermaid Pool Falls..." (etc)

url: https://lh3.googleusercontent.com/... (etc, the location in the Google system)

license: (empty field)

image_views: 1,784,941 (a week later, as shown in the image above, the number had grown to 2,044,754)

creation_time.formatted: Jan 28, 2019, 12:01:43 PM UTC

There are additional fields... (many, in this case, were blank)

modification_time.formatted
geo_data.latitude
geo_data.longitude
geo_data_exif.latitude
geo_data_exif.longitude
tags
people.name
people.email
comments.comment
comments.user_id
comments.email
media_key
hex_photo_id
upload_ip status
liking_user_ids
album_id
exif.camera_make
exif.camera_model
exif.cell_width
exif.cell_length
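
If you would rather read one of these files with a script than with Excel or Notepad, a few lines of Python will do it. This is only a sketch: it assumes each file has one header row followed by one data row, and that the view count may or may not contain thousands separators. The function names read_metadata and parse_views are made up for illustration.

```python
import csv
from pathlib import Path


def read_metadata(csv_path):
    """Return the first data row of a metadata CSV as a dict keyed by column name."""
    # Assumption: one header row, then one data row per image.
    # utf-8-sig drops a byte-order mark if the file happens to start with one.
    with Path(csv_path).open(newline="", encoding="utf-8-sig") as handle:
        return next(csv.DictReader(handle), {})


def parse_views(value):
    """Turn a view count such as '1,784,941' or '1784941' into an int (0 if blank)."""
    digits = (value or "").replace(",", "").strip()
    return int(digits) if digits.isdigit() else 0
```

For the example above you might call read_metadata("1ej9hvbcna14y.jpg.metadata.csv") from inside the 1-28-19 folder, then pass the row's image_views value to parse_views. Blank fields simply come back as empty strings.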

There are lots of different ways you might use the data in the csv files. What follows assumes an archive created after 3 Feb (archives created earlier were stuffed with rubbish, and you will need to clean that out before attempting this process).

To examine the data as a whole you will need a program like Excel. I appended all the csv files and then sorted the data using a couple of basic commands in the Windows 10 Command Prompt and Excel.

1) Unzip your archive, open your Command prompt and navigate to Takeout\Google+ Stream\Photos\Photos from posts
[since 3 Feb, this is where the relevant *.csv files have been kept]

2) mkdir targetDir
[create a new folder called Takeout\Google+ Stream\Photos\Photos from posts\targetDir]

3) for /r %x in (*metadata.csv) do copy "%x" targetDir\ /Y
[we put a copy of the relevant *.csv files in Takeout\Google+ Stream\Photos\Photos from posts\targetDir, so that the original data stays intact]

4) cd targetDir
[jump into that directory]

5) del metadata.csv
[for our purposes, this is an unnecessary 1 KB file]

6) copy *.csv all.csv
[append all your csv files into the one file called "all.csv"]

7) Open the file "all.csv" in Excel and save it as an Excel file. (I deleted the fields I was not going to use and the repeated header rows; you can end up with some big files, which can get pretty slow.)

8) Mine away. At this point I sorted on views.
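
The same steps can be sketched in Python, which avoids the repeated header rows and does the sort on views in one go. This is a rough equivalent rather than the method I used: collect_rows, views, write_sorted and the extra source_file column are names made up for illustration, and it assumes every per-image file ends in ".metadata.csv" and has an image_views column.

```python
import csv
from pathlib import Path


def collect_rows(root):
    """Read every '*.metadata.csv' under root into a list of dicts."""
    rows = []
    # The pattern needs something before '.metadata.csv', so the plain
    # 'metadata.csv' files are skipped (the same effect as step 5).
    for path in sorted(Path(root).rglob("*.metadata.csv")):
        with path.open(newline="", encoding="utf-8-sig") as handle:
            for row in csv.DictReader(handle):
                row["source_file"] = path.name  # hypothetical extra column
                rows.append(row)
    return rows


def views(row):
    """Numeric view count for sorting; blank or odd values count as 0."""
    text = (row.get("image_views") or "").replace(",", "").strip()
    return int(text) if text.isdigit() else 0


def write_sorted(rows, out_path):
    """Write rows to one CSV with a single header, most-viewed first."""
    fields = []
    for row in rows:
        for name in row:
            if name is not None and name not in fields:
                fields.append(name)
    ordered = sorted(rows, key=views, reverse=True)
    with Path(out_path).open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields, restval="",
                                extrasaction="ignore")
        writer.writeheader()
        writer.writerows(ordered)
    return ordered
```

Run from the Takeout folder, something like write_sorted(collect_rows("Google+ Stream/Photos/Photos from posts"), "all.csv") would produce a file Excel can open directly. Unlike the copy command in step 3, this reads the files where they are, so two images with the same file name in different date folders do not overwrite each other.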

This process is quick and dirty - to create an interactive list which links to the posts and images, we would need more complex code.
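
As a small first step in that direction, here is a sketch that turns a merged csv file into a plain HTML table, with each image title linking to its url field. It is static rather than interactive, it links to images only (the csv fields listed above do not obviously include a link back to the post), and it assumes the merged file has a single header row. The function name build_page is made up for illustration.

```python
import csv
import html
from pathlib import Path


def build_page(csv_path, html_path):
    """Write an HTML table of images; each title links to the row's url field."""
    with Path(csv_path).open(newline="", encoding="utf-8-sig") as handle:
        rows = list(csv.DictReader(handle))
    parts = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8"><title>Image views</title></head><body>',
        "<table>",
        "<tr><th>Image</th><th>Views</th><th>Created</th></tr>",
    ]
    for row in rows:
        # Escape everything that came from the csv before placing it in HTML.
        title = html.escape(row.get("title") or "(untitled)")
        url = html.escape(row.get("url") or "", quote=True)
        count = html.escape(row.get("image_views") or "")
        created = html.escape(row.get("creation_time.formatted") or "")
        link = f'<a href="{url}">{title}</a>' if url else title
        parts.append(f"<tr><td>{link}</td><td>{count}</td><td>{created}</td></tr>")
    parts += ["</table>", "</body></html>"]
    Path(html_path).write_text("\n".join(parts), encoding="utf-8")
    return len(rows)
```

Calling build_page("all.csv", "views.html") and opening views.html in a browser gives a clickable list in whatever order the csv rows are in.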

3 comments:

Anonymous said...: Just out of curiosity: what would that data be useful for?; Monday, February 04, 2019
Peter Quinton said...: The data patterns in the csv files, being contemporary electronic records of the publisher, have evidentiary status in many world courts. With the destruction of the primary published record, the csv file might be used in a wide range of regulatory, criminal or copyright situations.

There has been a vibrant debate about the value or non-value of the view counter. Some professional photographers and account managers have argued that it remains an important indication of visibility. Recently, other authorities have expressed interest in the counter - i guess we will see what comes of that. Similar counters are commonly encountered on many other products (Google Maps), and are seen as a value indicator.

Whether either position has merit is for others to ponder, but it might be used in civil cases (say breach of copyright) as an indicator of unlawful use and therefore of damages.

I think of it as an indication (however flawed) of visibility. To test this, I am using it as a way of differentiating 50 'hi-visibility' images (>1.7m - 2.6m views). By rerunning variations of the images in the final months of G+, subtle differences may provide insights into a range of matters :); Monday, February 04, 2019
Anonymous said...: Oh thanks! Now it makes a lot more sense to me :); Monday, February 04, 2019
