Dallan, a couple more thoughts I wanted to share and expand upon…
First, data classification…
Gramps lets you flag any item in the tree as private: a person, place, source, citation, relationship, you name it. I think a new program should permit that with any object as well, not just people.
Someone on here asked at one point about being able to flag uncertain information in their tree as being a research item and having an easy way to find it, and I too have wanted the same thing for some time.
While you could argue a tagging system covers that, I think it should be part of the core functionality of any genealogy product, just like the privacy flag. A tree holds public and private data, and it holds supported (cited, proven) and speculative (research, unproven) data. You need a way to classify both.
In addition to classifying individual items, a user should be able to apply a classification to a person recursively, across all of that person's descendant or ancestor lines.
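To make the recursive idea concrete, here is a rough sketch in Python. Everything in it is my own illustration, not how Gramps or any other product does it: the `Person` class, the `private` and `research` fields, and the `flag_line` helper are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    # Hypothetical record; a real program would carry far more than this.
    name: str
    private: bool = False
    research: bool = False
    parents: List["Person"] = field(default_factory=list)
    children: List["Person"] = field(default_factory=list)


def flag_line(person, flag, value=True, direction="descendants"):
    """Set `flag` on a person and on everyone in their descendant or ancestor lines.

    Returns the number of people touched.
    """
    link = "children" if direction == "descendants" else "parents"
    seen = set()
    stack = [person]
    while stack:
        current = stack.pop()
        if id(current) in seen:
            # Pedigree collapse: the same person can be reached by two paths.
            continue
        seen.add(id(current))
        setattr(current, flag, value)
        stack.extend(getattr(current, link))
    return len(seen)
```

Calling `flag_line(someone, "research")` would mark that person and every descendant as research, and `direction="ancestors"` walks up the tree instead.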
On GEDCOM exports, or when sharing the tree online with others, the owner should be able to control whether research data is included or visible, just like with private data.
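The export control could be as simple as a filter that runs before anything is written out or shared. A minimal sketch, assuming every object carries the two flags; the `Record` class and the `exportable` function are made-up names for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical stand-in for any object in the tree (person, source, citation...).
    label: str
    private: bool = False
    research: bool = False


def exportable(records, include_private=False, include_research=False):
    """Return only the records the owner has agreed to export or share."""
    return [
        r for r in records
        if (include_private or not r.private)
        and (include_research or not r.research)
    ]
```

By default both kinds of flagged data are left out, so the owner has to opt in to sharing them.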
Having done that, it also makes sense to color code some elements in the tree view to denote a person's status. Maybe a thin yellow border around people classified as research, or something like that.
Second, source citations…
Not all citations are created equal, as we know, and as evidence they should not be treated as if they were. I think at one point I suggested here a way to rank them and color code the stars.
I think all citations should have at least two properties for ranking. Let us call them knowledge and origin, although maybe there are better labels.
The knowledge attribute captures whether the citation or piece of information comes from a primary, secondary, or other source. You rank by assigning values 2, 1, and 0 respectively.
The origin attribute captures whether it is an original, transcribed, derived, or other source. Here you rank by assigning values 3, 2, 1, and 0 respectively.
Perhaps you add a third attribute, say confidence, a more subjective measure of high, medium, low, or none, based on what the user knows of the overall family profile. That would be 3, 2, 1, and 0 as well. I’m not sure whether this should be included, though. Maybe for some purposes but not for others, as it is subjective.
Rank is computed by adding the values.
A date obtained from viewing an original source, or a copy of the original, like a marriage certificate, would rank higher than a date obtained from an index or transcription of the marriage record, where an error could have been introduced. Likewise, a marriage or birth year derived from a census ranks lower than either of those. A date from a published family history is even further removed: even if the author reviewed the originals, either the author or the publisher could have made a mistake. And a date from a family tree published by someone else, here I’m thinking of Ancestry.com trees, should not be given much weight, given how many of them have problems.
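Here is the scoring written out as a small Python sketch. The numeric values are the ones given above; the class and function names are just mine, and confidence is left optional since I am not sure it belongs in every calculation.

```python
from enum import IntEnum


class Knowledge(IntEnum):
    OTHER = 0
    SECONDARY = 1
    PRIMARY = 2


class Origin(IntEnum):
    OTHER = 0
    DERIVED = 1
    TRANSCRIBED = 2
    ORIGINAL = 3


class Confidence(IntEnum):
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def citation_rank(knowledge, origin, confidence=None):
    """Add the attribute values; confidence only counts when it is supplied."""
    rank = int(knowledge) + int(origin)
    if confidence is not None:
        rank += int(confidence)
    return rank


# How I would score two of the examples (the attribute choices are my own reading):
certificate = citation_rank(Knowledge.PRIMARY, Origin.ORIGINAL)       # 2 + 3
index_entry = citation_rank(Knowledge.PRIMARY, Origin.TRANSCRIBED)    # 2 + 2
```

Without confidence the rank runs from 0 to 5, and with it from 0 to 8, so the two scales should not be mixed when comparing citations.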
I think a ranking like this is just as important in an evidence-oriented system as data classification.
And think about what it enables. In a publish/subscribe model, where people can publish to a pando tree or subscribe to one, it helps rank which information to consider more reliable when they publish.
Someone publishes a birth date whose citation is from a primary source. Someone else tries to publish one from a less accurate source. Now you have a way of managing that better. You also have a metric for ranking the users who publish to the tree by accuracy. Maybe those who seem to know what they are doing are allowed to perform direct edits.
The other thing a ranking like this does is give you a way to visually indicate the quality or reliability of the data to the user.
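One possible way to manage competing publications, sketched in Python. This is only one policy I can imagine, not a worked-out design: the `Publication` class and `resolve` function are hypothetical, and keeping the existing value on a tie is my assumption.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    # Hypothetical: one user's published value for a fact, with its citation rank.
    user: str
    value: str
    rank: int


def resolve(current, incoming):
    """Keep whichever publication has the higher citation rank.

    On a tie the current value stays (an assumed policy, nothing more).
    """
    return incoming if incoming.rank > current.rank else current
```

A real system would probably queue the losing value for review rather than discard it, but the comparison itself is this simple once every citation has a rank.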
For example, I have deep New England lines in my ancestry. Some ancestors have ten or so records in that Connecticut Deaths and Burials index. That index was transcribed from the Hale newspaper transcriptions, I think. Having ten records with the same date does not tell me anything new. What is worse with that collection is that some of the death dates actually differ, because they were not the real death dates. They were the dates the newspaper article mentioning the death was published, and the data was transcribed wrong. But the point is that if I saw ten stars next to a person, I might think they are well researched, sourced, and documented, which is misleading, and worse yet all those sources are of poor quality.
With ranking you could look at some of the key facts about the person: birth, baptism, marriage, death, burial, or whatever. Each fact with a citation is now a star, colored by rank, with green the highest.
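A sketch of how the stars might be worked out, in Python. Only "green is highest" comes from what I wrote above. The yellow and red colors, the cutoffs, and the idea of using the best citation for each fact are assumptions for the sake of the example, and the ranks assume the 0 to 5 knowledge plus origin scale.

```python
KEY_FACTS = ("birth", "baptism", "marriage", "death", "burial")


def star_color(rank):
    """Map a 0-5 rank to a color; the cutoffs here are illustrative guesses."""
    if rank >= 4:
        return "green"
    if rank >= 2:
        return "yellow"
    return "red"


def fact_stars(citation_ranks):
    """Return one star per key fact that has at least one citation.

    `citation_ranks` maps a fact name to the ranks of all its citations.
    The best citation sets the color, so ten copies of the same weak
    index entry still produce a single star.
    """
    stars = {}
    for fact in KEY_FACTS:
        ranks = citation_ranks.get(fact, [])
        if ranks:
            stars[fact] = star_color(max(ranks))
    return stars
```

So the ancestor with ten index entries for a death date gets one star for death instead of ten, and its color reflects how weak those entries are.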