
Challenge of the week: Clean up GEDCOM-generated data

+13 votes
659 views

Hi WikiTreers,

Will you join our "Data Doctor" Challenge of the week?

This week we're improving profiles whose biographies were automatically generated through GEDCOM imports. Our ancestors deserve better bios.

Here is the list of profiles that could use some TLC.

Will you join us?

If you're participating, please post here to let us know. It's nice to cheer each other on. Or post if you have any questions about how to participate.

Thanks for helping!

P.S. For more of an introduction, see "Would you like to help improve GEDCOM-created profiles?"

in The Tree House by Eowyn Langholf G2G Astronaut (2.6m points)
reshown by Aleš Trtnik
Challenge is active.

PGM has a number of profiles in our Needs Gedcom Cleanup maintenance category, but we have no Gedcom Cleanup suggestions. I was just wondering whether there's a reason for that.

Here are some examples of the gedcom text I see in the profiles we've put in the category:

  • Old source tags such as #S22 or #S-123478
  • "Found multiple versions of NAME. Using Abigail /Benjamin/"
  • "User ID: 35BC6629D08B8C43BA40018429600D3EFDCF"
  • "Could not parse date out of ABT 1639."
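
As an illustration only, the four kinds of leftover text above could be spotted with a few regular expressions. This is a minimal Python sketch. The pattern set and the `find_gedcom_text` helper are hypothetical and are not how WikiTree's suggestion checker works.

```python
import re

# Hypothetical patterns for the four kinds of leftover GEDCOM text listed above.
# Illustrative only; these are not WikiTree's actual rules.
GEDCOM_PATTERNS = {
    "old source tag": re.compile(r"#S-?\d+\b"),
    "multiple names": re.compile(r"Found multiple versions of NAME\."),
    "user id": re.compile(r"User ID:\s*[0-9A-F]{16,}"),
    "unparsed date": re.compile(r"Could not parse date out of [^.]+\."),
}


def find_gedcom_text(biography: str) -> list[str]:
    """Return the label of every pattern that occurs in the biography."""
    return [label for label, pattern in GEDCOM_PATTERNS.items()
            if pattern.search(biography)]


bio = """Abigail was born ABT 1639.
Found multiple versions of NAME. Using Abigail /Benjamin/
Could not parse date out of ABT 1639.
Source: #S22"""

print(find_gedcom_text(bio))  # ['old source tag', 'multiple names', 'unparsed date']
```

Matching the `#S` tags on their own would over-report, because some members use such tags deliberately.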

Thanks!

The GEDCOMJunk suggestion is shown if there are 2 or more junk headings in the biography.
  • Old source tags such as #S22 or #S-123478
These are also used by some members and are not considered GEDCOM junk, although in many cases they are.
  • "Found multiple versions of NAME. Using Abigail /Benjamin/"
I can add new suggestions for this text.
  • "User ID: 35BC6629D08B8C43BA40018429600D3EFDCF"
I am checking only the headings and this entry is preceded by 
=== User ID ===
so it should be found.
  • "Could not parse date out of ABT 1639."
Suggestion Warning 851 (GEDCOM uncleaned: Interpret date) covers the text "Could not interpret date in". I will add your string and update the error during the day. It will take some time to recheck all the biographies.
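
The checks described in this reply can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions. Only "User ID" is a heading named in this thread, so the other entries in `JUNK_HEADINGS` are hypothetical placeholders, and all function names are made up for illustration. The threshold of 2 and the heading-only check come from the reply. The date phrases are the Warning 851 text plus the string from the question.

```python
import re

# Hypothetical junk-heading list: only "User ID" is named in this thread;
# the other two entries are placeholders for illustration.
JUNK_HEADINGS = {"User ID", "Data Changed", "Record File Number"}

# Threshold from the thread: GEDCOMJunk is shown for 2 or more junk headings.
THRESHOLD = 2

# A wiki heading such as "=== User ID ===": same run of "=" on both sides.
HEADING = re.compile(r"^(=+)[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)

# Phrases for an uninterpreted date: the Warning 851 text plus the new string.
DATE_PHRASES = ("Could not interpret date in", "Could not parse date out of")


def junk_headings(biography: str) -> list[str]:
    """Return the junk headings found, looking at heading lines only."""
    return [m.group(2) for m in HEADING.finditer(biography)
            if m.group(2) in JUNK_HEADINGS]


def shows_gedcom_junk(biography: str) -> bool:
    """True when the biography has at least THRESHOLD junk headings."""
    return len(junk_headings(biography)) >= THRESHOLD


def has_uninterpreted_date(biography: str) -> bool:
    """True when any uninterpreted-date phrase appears anywhere in the text."""
    return any(phrase in biography for phrase in DATE_PHRASES)


bio = """=== User ID ===
35BC6629D08B8C43BA40018429600D3EFDCF
=== Data Changed ===
== Biography ==
Could not parse date out of ABT 1639."""

print(junk_headings(bio))           # ['User ID', 'Data Changed']
print(shows_gedcom_junk(bio))       # True
print(has_uninterpreted_date(bio))  # True
```

Under this rule a biography with a single junk heading, such as a lone "=== User ID ===" section, would not trigger the suggestion. A "User ID:" line that is not under a heading would not be counted at all.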
Thanks, Aleš.  No hurry, we do have other suggestions to work on, but it will be good to have this picked up in the DD list.

9 Answers

+13 votes
I will do some...it's Family History Month!
by Sally Kimbel G2G6 Pilot (108k points)
+12 votes
I am helping.
by Laura Ward G2G6 Mach 4 (49.8k points)
+12 votes
I'll do a few. There are certainly a lot that need attention.

Ran into a few in last week's URL cleanup challenge. An entire series had blocks of <p> text interspersed with the occasional phrase or URL, and for some reason had split up the URLs into bits. Can't imagine what kind of tool would actually be able to reconstruct something sensible from it.
by Jim Patterson G2G6 Mach 1 (14.4k points)
+12 votes
I will try. Cleaning GEDCOMs must be my least favourite activity.
by Bruce Simons G2G6 Mach 1 (13.5k points)
+10 votes
I don't see a tracker for this. Should we report here what was done, or is this a silent challenge?
by Laura Ward G2G6 Mach 4 (49.8k points)
The weekly Data Doctor Challenges are not tracked; no report here is needed.  This changed over a year ago when a winner and winner badge were eliminated.  The "reports" will show on the next weekly update for the suggestions involved in this week's challenge, hopefully with a good decrease.
+11 votes
I'll do a few.
by Walter Horowitz G2G6 Mach 1 (17.9k points)
+10 votes
I will help!
by Gina Jarvi G2G6 Pilot (153k points)
+8 votes
I'm gleefully helping!
by Stanley Baraboo G2G Astronaut (1.5m points)
+5 votes
I'll do some!
by Vicki Blanco Borchers G2G6 Mach 8 (80.3k points)

