Some people eat, sleep and chew gum, I do genealogy and write...

Wednesday, December 13, 2017

How important is high resolution for scanning and photography?

Are you tempted to join the megapixel race? Are you concerned about the resolution of your digitization efforts for photos, paper records, and other genealogically important documents? Do you use the megapixel count of a camera or smartphone as a factor in your purchase decisions? These issues and more concern anyone trying to digitize records or take photographs. Genealogists and photographers share some of the same concerns.

I have written on this topic several times in the past. Here is a list of some past posts that deal with aspects of this topic:
This list could go on and on. In a recent post, I expressed my views on the challenges of genealogy, and I included an issue about the unrealistic digital resolution and file format requirements imposed by those engineers and administrators of online collections, thereby increasing the inability of the larger collections to ingest smaller collections of records. On reflection, that topic needs more explanation and discussion.

In response to my post on the challenges to genealogy, I got the following comment:
I have always been a believer that preservation should be performed at the highest possible resolution. As time has passed, as you mention, this could be 50 Megapixels today, and who know how much tomorrow? But the biggest advantage of 50 vs 12 Megapixels is the ability to zoom in and examine details closely. I have found this very helpful with things like scans of old vital records where correct interpretation of handwriting, for example, requires great magnification. It is useless if zooming in only results in a highly pixelated image. This applies likewise to photographs where the only image of GG Grandpa is a tiny section of a larger image. If I want to recognize his features clearly, I am grateful for a 50 Meg scan. Obviously, as you mention, file size (storage capacity) is an issue, but less so as time passes. Therefore, I support the ". . . unrealistic digital resolution and file format requirements imposed by those engineers and administrators of online collections . . .". Tomorrow's researchers will thank us for adhering to those high standards.
Is there a direct relationship between a high megapixel count, say 50 megapixels or more, and the ability to recognize small features in either a photograph or another type of document?

We need to start any discussion of this type with some observations about physical reality.

I will start with photographs. Analog photographs using photographic film are considered to be continuous tone images. However, the resolution of a photograph depends on the type of film used. The sensitivity of film to light is measured by a number assigned under standards from the International Organization for Standardization (ISO) or from the American Standards Association, now known as the American National Standards Institute (ANSI), whose standard is usually designated by the older acronym as the ASA number. There is a direct relationship between a film's ISO/ASA number and its ability to resolve fine detail, i.e. resolution. The higher the ISO/ASA number, the larger the grains of light-sensitive material, usually some compound of silver, used to capture the image. These numbers are usually used to represent the "speed" of the film or the time it takes to form an image. Higher numbers, say around 1000 or 2000, mean that the film is very "fast." The tradeoff is always a loss in detail, i.e. graininess of the image.

There is no free lunch: greater resolution means smaller discrete light-sensitive elements. Photographers know that high ISO/ASA numbers (or fast film) mean a decline in detail in direct proportion to the additional speed. For those wishing to digitally reproduce film photographs, the resolution of the copy cannot exceed the original. Any document or photograph has a certain limit of resolution. Once a duplication method reaches that point of resolution, there is no more information in the original that will be lost because of the copy. It may seem counterintuitive, but higher resolution scanning or photography past a certain threshold will simply result in larger file sizes and not any more detail. Once that limit has been reached, there is no more information to obtain.

I am not here talking about photographs of real-life objects, I am talking about copying historical records and photographs, essentially digital reproductions of actual analog documents.

Here is an example of what I mean. This is a copy of a record from the website that was previously microfilmed and has now been made available in a digitized copy:

Now, how did this image come to be on the website? In a simplified explanation, someone had access to the original record and then made a photographic copy of the original using some type of microfilm. Here, the resolution was determined by the type of film, probably with a very low ISO/ASA number below 100, i.e. with the highest amount of detail available. Now, to move this image into the digital world, FamilySearch made a digital image at some extremely high resolution (for a digital image) and then processed that image for display on its website. What about the resolution of this image? Well, first of all, it is a JPEG image and we will have to view the image on our computer's monitor. Let's see what happens to this image at magnification. Here is a screenshot of the image at 300%.

Hmm. There appear to be some problems with the original. There is a great deal of bleed-through from the back of the page. What about higher magnification? Here it is again at 600%.

Is there an upper limit? Yes, here is the image again at 800%:

At this point, further magnification will simply produce more pixelation and not provide any more detail. Could this be extended indefinitely by making the original with a higher digital pixel count? In reality, the file size would increase dramatically but you would still be limited by the resolution of the original image. Here is the same image at 1200% magnification.

Any higher and the image will start to become unrecognizable. Where can you see the most detail? Guess what? That depends on how closely you look at the image. If you stand some distance back, the high magnification images look just like the ones with lower magnification.

There is a reason why the Library of Congress established standards as set forth in its "Guidelines: Technical Guidelines for Digitizing Cultural Heritage Materials." There is a balance between increased resolution and the preservation of the detail in a document or photograph. Higher resolutions give you larger file sizes but, at some point, no more information from the original.
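The tradeoff between resolution and file size can be put into rough numbers. The following is only a minimal sketch in Python: the 8.5 x 11 inch page, the 24-bit color depth, and especially the 400 ppi "detail limit" of the original are assumptions chosen for illustration, and treating captured detail as a simple cap is a simplification, not a measurement of any real document.

```python
def scan_stats(width_in, height_in, ppi, bytes_per_pixel=3):
    """Return (megapixels, uncompressed megabytes) for a scan of a page.

    bytes_per_pixel=3 assumes 24-bit color with no compression.
    """
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    pixels = width_px * height_px
    return pixels / 1e6, pixels * bytes_per_pixel / 1e6


def captured_detail_ppi(scan_ppi, original_limit_ppi):
    """Simplified model: the copy cannot hold more detail than the original."""
    return min(scan_ppi, original_limit_ppi)


# Hypothetical limit of detail present in the original document.
ORIGINAL_LIMIT_PPI = 400

for ppi in (200, 400, 800, 1600):
    megapixels, megabytes = scan_stats(8.5, 11, ppi)
    detail = captured_detail_ppi(ppi, ORIGINAL_LIMIT_PPI)
    print(f"{ppi:>5} ppi: {megapixels:7.2f} MP, {megabytes:8.2f} MB "
          f"uncompressed, detail captured about {detail} ppi")
```

In this model, every doubling of the pixels per inch quadruples the uncompressed file size, while the detail captured stops growing once the scan passes the assumed limit of the original.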

There is no free lunch. You cannot beat the system and the system is physics.

Tuesday, December 12, 2017

The Ultimate Challenges of Genealogical Access to Digitized Records

Online genealogically important historical records are rapidly transforming the way genealogists find their ancestors and extended ancestral families. Billions of new records are being added every year by the large online genealogy companies. It would seem that this flood of new records could go on indefinitely. But there are strong indications that the flood may soon diminish to a trickle unless the genealogical community can overcome some looming obstacles.

These obstacles to the continued increase in the number of online genealogical records fall into a number of categories that include the following:
  • Political restrictions on the access to records
  • The monetization of records by governments and other organizations
  • The reverse side of the principle of economies of scale, i.e. the cost of digitizing smaller collections of records
  • Unrealistically restrictive copyright and other similar restrictions on historical records
  • The unrealistic digital resolution and file format requirements imposed by those engineers and administrators of online collections, thereby increasing the inability of the larger collections to ingest smaller collections of records
  • The costs of maintaining ever larger databases including the costs associated with migrating file formats over time
  • The lack of community standards for record formats and the inability of users to move records from one online family tree program to another
  • Ignorance of the members of the genealogical community as to the identity and availability of online digital record collections
Here is my viewpoint on each of these obstacles:

Political restrictions on the access to records

The most difficult and pervasive obstacles to continued digitization are the politically imposed restrictions on record access around the world. In some areas, record access, much less digitization of those records, is virtually impossible. It is clear that the ability of individuals to access records is a major threat to oligarchies and repressive governments no matter what their origin or motivation. This is not an issue that is limited to national governments but can operate on a local level when politicians believe their control and power are threatened by access. In the United States, for example, we would not have national and local freedom of information statutes were politicians and bureaucrats cooperative in providing access to "public" records. In addition, the ongoing destruction of genealogically important records and the attacks on state archives and libraries continue to threaten the availability of records around the country. Absent major changes in some countries of the world and even in parts of less repressive countries, many records will remain unavailable. Ultimately, the reasonably accessible records around the world will all be "cherry picked," leaving huge numbers of records locked up by repressive governments.

The monetization of records by governments and other organizations

It is a fact of life for genealogists that more and more records around the world are being used as local revenue streams by those who maintain or archive them. This occurs wholesale, even in the United States, for many types of records. For example, in almost every state of the United States of America, if you are born, get married or die and you or your family want a copy of an official government certificate of any of those events, you will have to pay a fee to obtain a copy. In England, it is a common practice for local ecclesiastical parishes to charge a fee for access to historical parish registers. I am not of the opinion that all records must be free, but the monetization of the records makes their acquisition by free websites very unlikely. It also makes digitizing the records and making them available much more expensive.

The reverse side of the principle of economies of scale, i.e. the cost of digitizing smaller collections of records

Record acquisition and digitization are labor intensive and the equipment needed for high-quality images is still quite expensive. For these reasons, extensive record digitization efforts can achieve economies of scale. On the other hand, smaller projects require those same assets but spread them over far fewer records, so the cost per record becomes a major concern. In other words, smaller collections have some of the same overhead considerations as larger collections, making the cost per record much higher. Also, the logistics of obtaining smaller collections are usually about the same as for larger collections. The result is that there are distinct disincentives to acquiring smaller collections of valuable records.
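The arithmetic behind this can be sketched in a few lines of Python. The dollar figures here are entirely hypothetical and are only meant to show the shape of the problem: a fixed overhead (travel, equipment setup, negotiations) that does not shrink with the size of the collection, plus a small cost for each image.

```python
def cost_per_record(fixed_overhead, cost_per_image, n_records):
    """Average cost per record when a fixed overhead is spread over n records."""
    if n_records <= 0:
        raise ValueError("n_records must be positive")
    return fixed_overhead / n_records + cost_per_image


# Hypothetical figures for illustration only.
FIXED_OVERHEAD = 5000.00   # assumed setup cost per project, in dollars
COST_PER_IMAGE = 0.10      # assumed marginal cost per image, in dollars

for n in (500, 5_000, 500_000):
    cost = cost_per_record(FIXED_OVERHEAD, COST_PER_IMAGE, n)
    print(f"{n:>7} records: ${cost:.2f} per record")
```

With these assumed numbers, the small collection costs many times more per record than the large one, even though the per-image work is identical.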

Unrealistically restrictive copyright and other similar restrictions on historical records

Unfortunately, US copyright law is vague and overly restrictive. Current copyright claims will likely be in effect longer than the lifetime of any person now living. Even old copyright claims dating back to the 1920s and 30s will likely remain arguably enforceable longer than anyone now living. This could be called the "Mickey Mouse" effect. In both 1976 and 1998, the existing copyright interests were extended for up to 120 years from the year of creation. See the post, "How Mickey Mouse Keeps Changing Copyright Law." Because the provisions of these laws are vague, all sorts of claims to copyright now cloud the ability of genealogists to access records online.

In other cases, record repositories claim a "contractual" ownership right to documents that are clearly in the public domain. These claims prevent the free use of all sorts of records, photographs, and other documents. Until there is a realistic overhaul of the copyright laws and a clarification of the unfounded claims by repositories, many valuable records will be subject to restricted access.

The unrealistic digital resolution and file format requirements imposed by those engineers and administrators of online collections, thereby increasing the inability of the larger collections to ingest smaller collections of records

This particular issue is less obvious than any of the other challenges facing genealogical access to digitized records. Essentially, those who are charged with developing the standards for online digital preservation impose unrealistic restrictions on the process of digitization. For example, we have long known that the highest resolution the human eye can distinguish is approximately the equivalent of 170 dpi or PPI (pixels per inch) when viewed at 20 inches. In contrast, the average laser printer can print at 300 dpi or roughly double the eye's resolution. See "What is the highest resolution humans can distinguish." Presently, some of the digitization efforts going on around the world are using cameras that have up to 50 Megapixel sensors. Most of the documents being digitized could be adequately preserved with a camera of about 12 Megapixels, the resolution of a present smartphone. The U.S. Library of Congress has issued a publication called "Guidelines: Technical Guidelines for Digitizing Cultural Heritage Materials." Quoting from that publication concerning documents:
Image capture resolutions above 400 ppi may be appropriate for some materials, but imaging at higher resolutions is not required to achieve 4* compliance.
The practical effect of an artificially imposed higher standard is that many smaller collections are going to be lost because the large online genealogy companies refuse to ingest even images at the Library of Congress standard or make the process of obtaining images so complicated as to make smaller collections unfeasible.
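How many pixels per inch a given camera actually delivers depends on the size of the document being photographed, not just on the megapixel count. The following is a minimal sketch in Python of that calculation. The sensor dimensions and document sizes are assumed typical values chosen for illustration, not figures from any particular digitization project, and the sketch assumes the document fills the frame as far as its shape allows.

```python
def effective_ppi(sensor_px, page_in):
    """Pixels per inch delivered when a page fills the frame as far as possible.

    sensor_px: (width, height) of the sensor in pixels, in either order.
    page_in:   (width, height) of the document in inches, in either order.
    The long side of the page is matched to the long side of the sensor,
    and the tighter of the two fits sets the resolution.
    """
    long_px, short_px = max(sensor_px), min(sensor_px)
    long_in, short_in = max(page_in), min(page_in)
    return min(long_px / long_in, short_px / short_in)


# Assumed sensor dimensions (hypothetical examples of 12 MP and 50 MP cameras).
SENSORS = {
    "12 MP (4000 x 3000)": (4000, 3000),
    "50 MP (8688 x 5792)": (8688, 5792),
}

# Assumed document sizes in inches.
PAGES = {
    "5 x 7 photo": (5, 7),
    "8.5 x 11 page": (8.5, 11),
    "11 x 17 ledger": (11, 17),
}

for sensor_name, sensor in SENSORS.items():
    for page_name, page in PAGES.items():
        ppi = effective_ppi(sensor, page)
        print(f"{sensor_name} on {page_name}: about {ppi:.0f} ppi")
```

Under these assumptions, the same 12 Megapixel frame gives well over 400 ppi on a small photograph and roughly 350 ppi on a letter-size page, so the size of the original matters as much as the sensor.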

The costs of maintaining ever larger databases including the costs of migrating the file formats over time

Even with the dramatic decreases in the cost of memory storage, huge online genealogical collections, especially those with photos, videos and audio files, can eat up huge amounts of memory into the hundreds of Terabytes. Adding in the cost of acquisition and maintenance makes this an extraordinary effort. Adding new records can have an incrementally higher cost. It is only a matter of time until these huge collections run into an economic and practical limit. However, there is a long way to go before this will happen. Right now, there is a major concern with the need to migrate existing collections as new file formats and operating systems evolve. Apple recently introduced a new file format for its smartphones, HEIC, and this will eventually affect the large online genealogy companies.

The lack of community standards for record formats and the inability of users to move records from one online family tree program to another

This is a major issue and I have written about this recently. Without community standards, each of the large online database companies is essentially an island with its own file formats. Without a standard way to exchange data, if one or more of these companies fail, much of their data could be lost.

Ignorance of the members of the genealogical community as to the identity and availability of online digital record collections

Let's face it. There is a constant loss of genealogical data due to genealogists who ignorantly or even intentionally fail to share their data and adequately prepare for its preservation upon their deaths. This attrition of records will always be a drag on preservation efforts.

There is always hope in the future and it is always possible that some or all of these issues will be resolved, but right now they stand as genealogy's greatest challenges. 

Sunday, December 10, 2017

Can your public library help you with your genealogy?
It may not occur to you, but your local public library may be an excellent source of information for genealogical research. For example, the Hedberg Public Library in Janesville, Wisconsin has a long list of databases available both for use in the library and online with a library card. Some local public libraries, such as the Allen County Public Library headquartered in Fort Wayne, Indiana, have some of the most extensive genealogical collections in the United States.

Here is a screenshot of the Allen County Public Library Genealogy Center website.

Your local library may be sponsored by your town, city, or county or all three. In Mesa, Arizona where I lived for many years, we had an excellent local Mesa Public Library. We also had an excellent county library system, the Maricopa County Public Library System, and a State Library in Phoenix. We also had an extensive system of Family History Centers around the Salt River Valley including the one where I was a volunteer, the Mesa FamilySearch Library.

It was interesting to me that many of the people I met in the Phoenix area who professed to be interested in genealogical research had never visited the Mesa FamilySearch Library and some had not even heard of its existence. There are over 5000 Family History Centers around the world and it is likely that there is one near you. See the Get Help menu for a location near you.

Sometimes we tend to judge a library by whether or not it has a particular book or other items we are searching for. But libraries can be surprising in the resources they have in their collections. If you are going to travel to an area where your family lived to do research, take the time to contact a local library in the area and ask about their resources.

Saturday, December 9, 2017

Artificial Intelligence, Chess, Voice Recognition and Genealogy
It may be one of the more obscure "news" events of the year, but I found a reference to it in a blog post by my friend, Louis Kessler, up in Canada. The post that piqued my interest was entitled, "Chess and Artificial Intelligence: The Future Changed Today." This post talks about the Alphabet (Google) owned company, DeepMind.

You can get the details and watch the videos on Louis's blog post. If you have any appreciation at all for the advancements in technology, you will realize that this particular development is probably the most important change in our collective future to come along for quite a while. To understand the perspective here you need to focus on these paragraphs from Louis's blog post:
Long, long ago, when I was a student at the University of Manitoba, I had a hobby I had dabbled in: programming a computer to play chess. I had reached a point where my program, Brute Force, was then one of the best in the world. I went to Seattle, Washington in 1977 for the 8th North American Computer Chess Championship, and followed that up in 1978 in Washington, D.C. for the 9th NACCC. (If you’re interested, see my writeup on my chess program, Brute Force).

The program was called Brute Force because I concentrated on doing the minimum possible to evaluate positions, and simply let the program iterate as many moves as possible to determine the best move. I had the full use of the University of Manitoba’s IBM 370/168 mainframe, which likely was as powerful then as your smartphone is today. Smartphones today can play better chess than the big computers did back then in the Computer Chess Championships of the ‘70s.
Here is a description of the DeepMind company from their website:
DeepMind was founded in London in 2010 and backed by some of the most successful technology entrepreneurs in the world. Having been acquired by Google in 2014, we are now part of the Alphabet group. We continue to be based in our hometown of London, with additional research centres in Edmonton and Montreal, Canada, and a DeepMind Applied team in Mountain View, California.
As usual, I must ask the question: how does this affect genealogy? I can think of a number of areas to begin with, for example, the following:

  • Handwriting recognition
  • Intelligent indexing
  • Document recognition and cataloging
  • Correction of existing family tree entries i.e. standardization
  • Increasingly accurate record hints
  • Increasingly accurate duplicate record resolutions

From my own standpoint, the main area I would be concerned with is voice recognition software. As I have written recently, the commercial market for individual voice recognition software is currently dominated by one company. The current product, referred to as "Dragon," is based on research done by IBM. Unfortunately, because of the lack of competition, the products have been upgraded slowly and still contain numerous bugs. The biggest problem with all of the voice recognition software over the years has been its inability to improve performance by learning from its mistakes. The programs require individualized human intervention in order to learn new terms. User corrections to the text are not incorporated into the program. In the case of Dragon, new words must be individually entered. With the new developments in artificial intelligence outlined in Louis's blog, someone or some company may be able to create a voice recognition program that actually works.

Friday, December 8, 2017

How Accurate is DNA Testing? Really?
DNA testing is not without controversy and unforeseeable consequences. The article shown above highlights some of the serious issues facing the forensic use of DNA evidence. With the popularity of genealogical DNA testing, it is important to understand the differences and similarities between forensic DNA testing and that done by the large genealogically motivated testing companies.

To begin to understand those differences and the inherent limitations of DNA testing, here is a quote from the above JSTOR Daily article:
DNA (deoxyribonucleic acid) is a code that programs how we will develop, grow, and function. Humans are thought to have DNA that is 99.9% identical, but the remaining 0.1% makes us individuals, marking us out as unique. The fact that humans and chimpanzees have just a 1% difference in their DNA further highlights how meaningful a small difference can be. Generally, the more closely related we are to someone, the more similar our DNA will be to theirs.
Continuing with a discussion of the limits of DNA testing in a criminal investigation, the article states:
Realistically, then, DNA profiles should only be thought of as being likely to have come from a specific individual. Statistical approaches such as “match probability,” which is based on comparisons between crime scene DNA and a hypothetical “random” person, often are misunderstood. A more rigorous statistical approach is likelihood ratio, which directly compares two hypotheses: the likelihood of the DNA coming from the suspect vs. the likelihood of the DNA coming from someone else. If the likelihood ratio is less than one, the defense position (the DNA is not the suspect’s) is better supported; if it is greater than one, there is more support for the prosecution case. Still, the ratio at most provides scientific support for a theory, not a yes-or-no answer.
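The likelihood ratio described in the quote is simple arithmetic, and a minimal sketch in Python may help. The function names and the two probabilities used in the example are hypothetical and only illustrate the calculation; they do not come from the article or from any real case.

```python
import math


def likelihood_ratio(p_evidence_if_suspect, p_evidence_if_other):
    """Ratio of how likely the DNA evidence is under each hypothesis.

    p_evidence_if_suspect: probability of the evidence if the DNA came
                           from the suspect.
    p_evidence_if_other:   probability of the evidence if the DNA came
                           from someone else.
    """
    if p_evidence_if_other <= 0:
        raise ValueError("p_evidence_if_other must be greater than zero")
    return p_evidence_if_suspect / p_evidence_if_other


def better_supported_position(ratio):
    """Say which side the ratio supports; it is support, not a verdict."""
    if math.isclose(ratio, 1.0):
        return "neither"
    return "prosecution" if ratio > 1 else "defense"


# Hypothetical probabilities for illustration only.
ratio = likelihood_ratio(0.99, 0.001)
print(f"likelihood ratio: about {ratio:.0f}")
print(f"better supported position: {better_supported_position(ratio)}")
```

Even a very large ratio in this sketch only says that one hypothesis is better supported than the other, which is the point the article makes about DNA evidence not giving a yes-or-no answer.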
One issue that has been ignorantly raised in several online articles is that the genealogical DNA testing being done could ultimately be used in a court of law for a criminal prosecution. I wrote about this issue previously in two posts entitled, "Is genealogically submitted DNA discoverable in a criminal investigation?" and "A Little More About DNA and Criminal Investigations."

In reality, the standards for criminal justice in the United States would be extremely unlikely to admit genealogical data as evidence in a criminal trial. Here is one of the most limiting of the standards for DNA Evidence from the American Bar Association:
Standard 2.4 Collecting DNA samples from Persons in a group by consent 
A law enforcement officer should be permitted to obtain a DNA sample from a person by consent, except that: 
(a) consent should not be sought from persons based primarily upon their membership in a constitutionally protected class;
(b) consent should not be sought from a large number of persons based on grounds other than individualized suspicion that each committed the crime under investigation unless seeking such consent has been authorized by the head of a law enforcement agency or the chief prosecutor in that jurisdiction; and
(c) when consent is sought as provided in subdivision (b) of this standard, each person should be informed of the reason for the request and of the right to refuse it, and the consent should be obtained in writing.
Note that when DNA testing is directed at a group, those tested have a right to refuse testing based on the proposed forensic use. So, if you have your DNA test done by one of the genealogy companies, the results are not useable in court unless your consent was obtained in writing prior to the test. Standard 3.1 goes on to state the standards applied to forensic DNA testing laboratories.
Standard 3.1 Testing laboratories 
(a) A laboratory testing DNA evidence should:
(i) be accredited every two years under rigorous accreditation standards by a nonprofit professional association actively involved in forensic science and nationally recognized;
(ii) be governed by written policies and procedures, including protocols for testing and interpreting test results, and permit deviation from protocols only by a technical leader or other appropriate supervisor;
(iii) use quality assurance and quality control procedures, including audits, proficiency testing, and corrective action protocols, that are consistent with generally accepted practices and in writing;
(iv) use protocols for testing and interpreting DNA evidence that are scientifically validated through studies that are described in writing;
(v) follow procedures designed to minimize bias when interpreting test results;
(vi) timely report credible evidence of laboratory misconduct or serious negligence to the accrediting body; and
(vii) make available to the public the written material required by this standard.
(b) A laboratory testing DNA evidence should make available to the prosecution the information and material that the prosecutor must disclose to the defense pursuant to Standard 4.1, and to defense counsel the information and material that the defense must disclose to the prosecutor pursuant to that standard.
(c) When an accrediting body receives notice of credible evidence of laboratory misconduct or serious negligence concerning DNA evidence at the testing laboratory, either as provided in subdivision (a) (vi) of this standard or through other means, it should audit laboratory procedures and cases that may have been affected by the misconduct or serious negligence and issue a written report.
I could probably give many more examples of the limitations imposed on forensic DNA testing but here are a few links to get you started if you are interested:
For genealogists, the issue is the accuracy of the DNA data supplied by any testing company. Coupled with the use of online family trees, DNA testing can be quite accurate for near relatives of no more than three or four levels of separation, i.e. 2nd or perhaps 3rd cousins. Every level of "removal" or separation decreases the accuracy and thereby the reliability of the results. However, when reliable, traditional genealogical research is coupled with reliable DNA testing, the results may be extended further.

Presently, the results of testing done by different genealogically oriented testing companies will differ because their testing procedures and databases are, in some cases, significantly different. These differences are presently unresolvable.

From a legal standpoint, as a retired trial attorney, I would not feel it possible to use genealogically obtained DNA testing for any type of court proceeding.  

Thursday, December 7, 2017

BYU Family History Technology Lab Interview on BYURadio

Julie Rose's regularly broadcast interview program aired a segment with guests Bill Barrett, Ph.D., Professor of Computer Science at Brigham Young University, and Curtis Wigington, a master's student at Brigham Young University, both of the Brigham Young University Family History Technology Lab. Julie Rose is a winner of multiple Edward R. Murrow Awards, and a seasoned broadcast journalist and interviewer. Prior to joining BYU Radio, Rose worked as a reporter and produced spots and feature news stories for NPR's Morning Edition and All Things Considered.

The segment, called "Making Family History Fun (and Addicting?!)," highlights several of the programs developed by the Family History Technology Lab, including the popular RelativeFinder App. Professor Barrett also mentions the great strides made by the Lab in developing handwriting recognition software. Click here to listen to the segment of the interview.

Wednesday, December 6, 2017

Will identical twins have the same DNA test results?

The answer to the question in the title of this post is more complex than it might seem. Here is a glimpse into the problems associated with DNA testing of identical twins from Identigene at
Until recently, the consensus has been that identical twins share completely identical DNA, but recent studies show that isn’t necessarily true. Rather than looking at the standard 15 markers analyzed in today’s paternity tests, highly-advanced and impossibly-expensive DNA tests that analyze the entire genome sequence-as many as six billion markers-are able to identify at least a single mutation in one of the identical twins’ genetics that has been passed on to the child (Sapiro). However, DNA tests that are presently accessible to the public do not analyze enough markers to distinguish the two, presenting a serious problem in court cases to establish paternity for child support. 
Hopefully, next-generation technology will be able to identify the differences between identical DNA in a way that’s affordable as well as accessible to the general public. Until then, paternity involving identical twins remains unsolvable.
One major consideration about genealogical DNA testing is that the databases are rapidly increasing in size and technology is also advancing. The difference between forensic and genealogical DNA testing is largely a matter of degree of specificity. Here is a comment on ancestry testing (genealogical DNA testing) from the Government of New South Wales, Australia.
Ancestry testing is commonly offered as an online test through private companies. As different companies compare test results to different databases, ethnicity may not be consistent.

In addition, findings about ethnicity may be different from an individual’s expectations as humans have mixed with different populations throughout history and consequently individuals may have many different variations in their DNA. It is helpful to know what type of testing is being undertaken for ancestry to ensure testing will provide some clues to the questions being asked.
Using identical triplets or twins for a comparison of genealogical DNA testing will only reflect the differences in the databases used by the different companies. These sorts of publicized tests say little about the utility or accuracy of genealogical testing.