Export all media with GEDCOM

DallanQ · May 29, 2020, 4:28am

I will work on it this Summer.

MJPitman · May 29, 2020, 8:22am

Yay.

cdhorn · May 31, 2020, 1:17pm

Just curious Dallan, is that the Tamura Jones led effort which I hope gains traction? I read through the 5.5.5 specification the other week and think something like it was long overdue.

If you are involved in an effort for a 6.0 specification I think the overriding priority should be a refactor that supports zero data loss. My understanding is the Gramps team came up with Gramps XML because Gramps exports to Gedcom are lossy which is really unacceptable.

No matter the source of the next standard, getting the big players to adopt it is what would actually make it a standard.

DallanQ · June 5, 2020, 4:15am

It’s an effort led by FamilySearch, but I really wish Tamura Jones was involved. I agree that getting the big players to adopt it is the big challenge. And I agree that zero data loss would be ideal. It’s not easy though: A Robust Open-source GEDCOM Parser

cdhorn · June 7, 2020, 1:17am

I know he seems to have tried to take over the standard to help straighten things out as they really have let it languish for years. I guess it is good it has spurred some action there at FamilySearch, and glad you are involved. I think it would be good if they tried to include him and someone to represent the GEDCOM-L guys. Regardless will be interesting to see what comes of it.

Thanks for sharing that deck, I need to take a closer look. I was playing with the python-gedcom parser a couple weeks ago and learned a good bit doing so.

Have you read any of Tony Proctor’s blog posts or looked at his STEMMA data model? I’ve spent a bit of time with it the past week, and I really like how he recognizes and treats Events as a top-level entity. If you have not read any of his blog posts, at least try to make time to read three; this is the last of the set, and at the start are links to the first two:

And here was the specification of the data model he put together:
http://www.familyhistorydata.parallaxview.co/
I also thought his approach of using multiple XML files and not storing things in a database was interesting. It made me think of your comment about Matrix really being sort of a distributed json database. It would be a very different model to do something like that.

I wonder if you did if you could structure the data in such a way that you could manage it with Git.

DallanQ · June 11, 2020, 5:04am

I personally like what Tamura is doing. But I think FamilySearch has more of a chance at getting broad support. I wish FamilySearch and Tamura would work together as well.

I looked into STEMMA a long time ago; I’ll have to check it out again.

I’ve thought a lot about git. It has some interesting ideas, but I haven’t been able to reconcile the idea of periodic git commits and pushes with the idea of being able to save a history instantly and share it with others. I think that a closer model is google docs.

Another interesting problem is how to share only a portion of your tree with someone else, so part of your tree might be shared with a cousin on your mom’s side, and another part with a cousin on your dad’s side. And how you could do this without making the permission system too complex.

cdhorn · June 12, 2020, 2:59am

What I take away from studying STEMMA is that the data model most Genealogy applications are built around was largely influenced by the one implicit in the GEDCOM standard. It works but it was influenced by and designed to fit the needs of the Mormon church.

Regarding managing permissions to share branches of a tree, why does a tree need to represent the contents of a full database? It could be a logical construct, a set of pointers to a subset of people in the database.

Then the same person could be a member of multiple trees at the same time. The mechanism for adding and removing people to the trees would operate on individuals and optionally could recursively include or exclude ancestor, descendant, spousal, and sibling lines among other things.
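A minimal sketch of that idea, assuming people are stored once in a shared database and a tree is nothing more than a set of person ids. The names `Person` and `add_with_ancestors` are hypothetical and only illustrate the recursive "include the ancestor line" operation; descendant, spousal, and sibling lines would follow the same pattern.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    """Hypothetical record in the shared database."""
    id: str
    parents: list = field(default_factory=list)  # ids of known parents


def add_with_ancestors(tree, people, start):
    """Add `start` and every reachable ancestor to `tree` (a set of ids)."""
    stack = [start]
    while stack:
        pid = stack.pop()
        # Skip ids already in the tree or missing from the database.
        if pid in tree or pid not in people:
            continue
        tree.add(pid)
        stack.extend(people[pid].parents)


people = {
    "me": Person("me", ["mom", "dad"]),
    "mom": Person("mom", ["grandma"]),
    "dad": Person("dad"),
    "grandma": Person("grandma"),
}

# Two trees over the same database; "me" belongs to both.
maternal, paternal = {"me"}, {"me"}
add_with_ancestors(maternal, people, "mom")
add_with_ancestors(paternal, people, "dad")
```

Because a tree is only a set of pointers, merging a research tree into a primary tree is a set union and does not copy any person records.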

This accomplishes what I was thinking about in my Git comment earlier as you could keep research trees and if things work out then you could easily add them into your primary tree or trees as desired.

DallanQ · July 4, 2020, 9:24pm

I like the idea of sharing subsets of your tree with different people, so my tree is really a combination of multiple trees. This way I could even have part of my tree at WikiTree and part of it private. So a branch is simply a collection of objects, users can be given access rights (read or update) to that collection, and my “tree” is the set of branches that I have been given access to. Thinking about it like a graph, a branch is a set of nodes, and my “tree” is a collection of branches.

The tricky part is what to do about family relationships that cross branches. If I link from my father in my branch to my grandfather in another branch, does the other branch automatically link back to my father in my branch? What if I don’t have rights to edit that other branch? And what if members of the other branch don’t have rights to view my branch?

It seems like once we go down this road, instead of having bidirectional links (links from a child to a parent also link from the parent to the child), which is how all genealogy databases that I know of work, we have to go to unidirectional links: when I link from a child to a parent, a separate link from the parent to the child is created if I have rights to update the parent object. And it’s possible that users can see links in their pedigrees to people they don’t have rights to view. But I really like how easy this makes it to share branches of your tree.
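A rough sketch of the unidirectional-link rule, under the assumption that access rights are granted per branch. `Node`, `link_child_to_parent`, and `editable_branches` are hypothetical names used for illustration only, not part of any existing system.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical person object that lives in exactly one branch."""
    id: str
    branch: str
    parents: set = field(default_factory=set)   # ids this node links up to
    children: set = field(default_factory=set)  # ids this node links down to


def link_child_to_parent(child, parent, editable_branches):
    """Create the child-to-parent link; add the reverse link only if allowed.

    Returns True when the reverse (parent-to-child) link was also created.
    """
    if child.branch not in editable_branches:
        raise PermissionError(f"no update rights on branch {child.branch!r}")
    child.parents.add(parent.id)
    # The reverse link is a separate write that needs rights on the parent.
    if parent.branch in editable_branches:
        parent.children.add(child.id)
        return True
    return False


father = Node("father", branch="mine")
grandfather = Node("grandfather", branch="cousins")

# I can only edit my own branch, so only the upward link is stored.
reverse_created = link_child_to_parent(father, grandfather, {"mine"})
```

In this sketch the link from my father to my grandfather exists even though the other branch knows nothing about it, which is the asymmetry described above.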

DallanQ · November 15, 2020, 6:25am

I have some good news! There is a new “Export into a zip file with media” button next to the traditional “Export” button on the “Tree Settings” menu. It creates a Zip file containing a gedcom file and all of the media files in your tree. I can’t guarantee you will be able to import the media into any program automatically, but getting the media onto your own laptop gives us a great starting point.
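For anyone who wants to build a similar bundle by hand, here is a small sketch using Python's standard `zipfile` module. It is not the RootsFinder implementation; the function name `export_zip` and the archive member name `tree.ged` are assumptions. Media files are stored flat, next to the GEDCOM, so that relative `FILE` values can resolve.

```python
import zipfile
from pathlib import Path


def export_zip(zip_path, gedcom_text, media_files):
    """Write a GEDCOM file plus its media files into one flat zip archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # "tree.ged" is an assumed name for the GEDCOM inside the archive.
        zf.writestr("tree.ged", gedcom_text)
        for path in media_files:
            path = Path(path)
            # Store each media file under its bare file name, no directories.
            zf.write(path, arcname=path.name)
```

Calling `export_zip("export.zip", gedcom_text, ["photos/370803443014220.jpg"])` would produce an archive with `tree.ged` and `370803443014220.jpg` side by side.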

MJPitman · November 16, 2020, 4:02am

Many many thanks for delivering on this. Much appreciated.

Marlene

DallanQ · November 16, 2020, 4:28am

Glad to hear it! Let me know how things go. You might need to copy the images into a special folder before importing the GEDCOM in order for the import to work. Figuring out what that folder should be named depending upon the software that you want to import the media into is the next step.

MJPitman · November 16, 2020, 10:29am

Hi Dallan

The export with media works well. All files are in the same folder, and in Reunion just requires me to use Preferences/Multimedia … repair multimedia links and it’s fine. Just one quirk I’ve picked up. The way you’ve formatted the GEDCOM file the media is defined once only and referenced for each person who is linked to it. This doesn’t allow the option to identify which is the preferred photo for that person.

I notice that on an export from Reunion, the Object is defined each time it is referenced, rather than defined once at the beginning of the file. The format is
2 FORM jpg
2 FILE ~/Desktop/_CliftonSpry-16-November-2020/180783425417934.jpg
2 TITL Dorothy Clifton and Harold Carbins Wedding
2 _TYPE PHOTO
2 _PRIM Y
2 _SIZE 2551.000000 1936.000000

Note that _PRIM identifies the primary photo.

I would assume that the photo used as the profile photo would be marked as preferred. That said, if there isn’t a profile photo Reunion will randomly (first one in the list I think) assign the preferred photo.

Any thoughts?

Marlene

DallanQ · November 17, 2020, 4:11am

So just to be clear, the export from Reunion has multiple OBJE components, each pointing to the same file: 180783425417934.jpg?

MJPitman · November 17, 2020, 6:52am

Yes, that seems to be how they do it. It isn’t optimal I guess, but it does achieve the ability to specify the media item that is the preferred one (and hence the default picture on the screen). I noticed when I repaired all the links I was getting rather odd default/preferred media for the people. Just something to think about.

DallanQ · November 19, 2020, 5:36am

I don’t think that will be too difficult to implement. I’ll look into it on Saturday and let you know.

DallanQ · November 27, 2020, 9:18pm

I didn’t get a chance to fix it on Saturday, but it should be working correctly now. Would you please double-check it sometime?

Thank you!

MJPitman · October 13, 2021, 5:20am

I’m sorry - it’s taken me a long time to check this, mainly because when I first tested it everything that had previously been working well in regard to media was no longer working. Feeling overwhelmed, I put off trying to work out what was going on. Finally, I’m going to give it a go. Details to follow.

MJPitman · October 13, 2021, 5:40am

Referring back to my comments last year.

The export with media works well. All files are in the same folder

this is still true

and in Reunion just requires me to use Preferences/Multimedia … repair multimedia links and it’s fine.

unfortunately no longer true. All of the photos appear in Reunion as ‘no media file’ so it is hard to even link them. I had to find a person that had just one photo, search through the photos until I found the right one and link to that. Choosing the option to ‘repair multimedia links’ didn’t resolve the others.

Just one quirk I’ve picked up. The way you’ve formatted the GEDCOM file the media is defined once only and referenced for each person who is linked to it. This doesn’t allow the option to identify which is the preferred photo for that person.

so trying to fix the ‘preferred photo’ issue seems to have broken the rest of the logic. Delving further to try to find out why (but not a GEDCOM expert by any means).

MJPitman · October 13, 2021, 6:28am

Ok, so I’ve found one of my small trees that has just 3 media items attached. One person has 2 items attached to them. One item is attached to 2 people. So in total there are 4 references in the file.

Your export has the following for the person with 2 items attached:

1 OBJE @370803443014220_5398311990229985@
1 OBJE @4804808481125282_5398311990229985@

…

0 @370803443014220_2595646115984705@ OBJE
1 FORM image/jpeg
1 TITL Marriage Certificate - Joseph Pitman and Catherine Lang
1 FILE 370803443014220.jpg
1 _PRIM N
0 @370803443014220_5398311990229985@ OBJE
1 FORM image/jpeg
1 TITL Marriage Certificate - Joseph Pitman and Catherine Lang
1 FILE 370803443014220.jpg
1 _PRIM N

When I import this into Reunion there are links to ‘no media found’, i.e. Reunion knows there are 2 files linked (from the first two records) but has not read any of the details (i.e. does not know the file name, the title, etc.).

When I amend the exported GEDCOM file and embed the media details into the file as follows:
1 OBJE
2 FORM image/jpeg
2 TITL Marriage Certificate - Joseph Pitman and Catherine Lang
2 FILE 370803443014220.jpg
2 _PRIM N
1 OBJE
2 FORM application/pdf
2 TITL Joseph Pitman death certificate
2 FILE 4804808481125282.pdf
2 _PRIM Y

then import into Reunion, the links are shown, I can in one place point the file to the correct directory to find the media, all links are resolved and work as expected. In particular, the primary photo is displayed as expected. (Note that if there is no primary photo identified, Reunion will make its own decision, but that’s OK.)
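The manual edit described here (replacing each `OBJE` pointer with an embedded copy of the record it points to) could be automated roughly as follows. This is only a sketch: `inline_media` is a hypothetical helper, it assumes one GEDCOM line per list item, and it leaves any pointer whose id has no matching level 0 record untouched rather than guessing.

```python
import re

RECORD = re.compile(r"^0 (@[^@]+@) OBJE\s*$")        # 0 @ID@ OBJE
POINTER = re.compile(r"^(\d+) OBJE (@[^@]+@)\s*$")   # n OBJE @ID@


def inline_media(lines):
    """Replace `n OBJE @ID@` pointers with embedded copies of the record."""
    lines = [ln.strip() for ln in lines if ln.strip()]

    # Pass 1: collect the sub-lines of every level 0 OBJE record.
    records, current = {}, None
    for line in lines:
        m = RECORD.match(line)
        if m:
            current = records[m.group(1)] = []
        elif line.startswith("0 "):
            current = None
        elif current is not None:
            current.append(line)

    # Only records that are actually referenced get dropped from the output.
    referenced = {m.group(2) for m in map(POINTER.match, lines) if m}

    # Pass 2: rewrite pointers and skip the now-redundant records.
    out, skipping = [], False
    for line in lines:
        if line.startswith("0 "):
            m = RECORD.match(line)
            skipping = bool(m) and m.group(1) in referenced
        if skipping:
            continue
        m = POINTER.match(line)
        if m and m.group(2) in records:
            level = int(m.group(1))
            out.append(f"{level} OBJE")
            for sub in records[m.group(2)]:
                sub_level, _, rest = sub.partition(" ")
                # Shift sub-lines down so they nest under the embedded OBJE.
                out.append(f"{int(sub_level) + level} {rest}")
        else:
            out.append(line)
    return out
```

Running this over the exported file and checking which `1 OBJE @...@` lines survive unchanged would also show whether any pointer refers to an id that is never defined.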

So in short, I don’t know why the use of the 0 level OBJE records isn’t working. I can’t recall whether this was the format before or not… but something is wrong.

I can analyse it this far… I’m hoping you might be able to work out the last bit.

With thanks

Marlene

DallanQ · October 17, 2021, 5:07am

Just to make sure I understand, would you mind sending both files to me? I don’t need the photos, just the gedcoms. I want to make certain that I make the changes correctly. My email is dallan at rootsfinder.com. Thanks.