Out of the Box Thinking for Artificial Intelligence (AI)

By Korkut VATA

Artificial Intelligence (AI) is the apogee of the human intellectual legacy. By combining algebra,
trigonometry, calculus, statistics, computer science, neuroscience and text-based literature of
choice with the ever-increasing computational power of today's hardware systems, AI has the
potential to solve many of today's problems and to raise many novel problems to be answered
in the future.

In this short review we aim to push the boundaries of AI methodology from conventional methods
towards novel approaches in murky waters by extending the classical understanding of the
p-value, control group, loss-of-function and double-negative concepts (1).

Towards that end, we suggest well-defined datasets for machine learning algorithms working
in the healthcare, image recognition and financial industries for better outcomes.

No other example in history describes the importance of evaluating statistical results better
than the story of the WWII planes. During WWII, planes were examined for bullet damage after
every sortie.

Bullet distribution statistics on fighter planes were drawn up in order to put extra armour on the
most vulnerable parts of the planes. It took the extraordinary attention of the Hungarian-Jewish
statistician Abraham Wald to realize that the planes studied were the surviving planes, and that
the bullet hits recorded on the map were therefore not lethal. The parts left blank on that map,
such as the engines and cockpits, were the ones to strengthen, as planes with a lethal hit on these
parts never returned home. Abraham Wald went further and said that in a war there are more
patient-soldiers in the hospitals with bullet hits on arms and legs than on the heart and brain.
Eventually, this intuitive foresight of Abraham Wald rescued countless fighter planes.

This was the prime example of intuitive data analysis.
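
The survivor effect in this story can be reproduced with a small simulation. The sketch below is only an illustration: the plane sections, the per-hit loss probabilities in LETHALITY and the number of hits per plane are assumed values chosen for the example, not historical data.

```python
import random

# Assumed probability that a single hit on a section brings the plane down.
# Illustrative values only, not historical data.
LETHALITY = {"engine": 0.6, "cockpit": 0.5, "fuselage": 0.05, "wings": 0.05}


def simulate(n_planes=10000, hits_per_plane=3, seed=0):
    """Count hits per section for all planes and for returning planes only."""
    rng = random.Random(seed)
    sections = list(LETHALITY)
    all_hits = {s: 0 for s in sections}
    survivor_hits = {s: 0 for s in sections}
    for _ in range(n_planes):
        # Hits land on sections uniformly at random.
        hits = [rng.choice(sections) for _ in range(hits_per_plane)]
        # A plane returns only if none of its hits is lethal.
        survived = all(rng.random() >= LETHALITY[s] for s in hits)
        for s in hits:
            all_hits[s] += 1
            if survived:
                survivor_hits[s] += 1
    return all_hits, survivor_hits


def shares(counts):
    """Convert hit counts into fractions of the total."""
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


all_hits, survivor_hits = simulate()
all_share = shares(all_hits)
surv_share = shares(survivor_hits)

for section in LETHALITY:
    print(f"{section:9s} all planes: {all_share[section]:.2f}  "
          f"survivors: {surv_share[section]:.2f}")
```

Under these assumptions every section is hit equally often, yet the returning planes show few engine and cockpit hits. A map drawn from the survivors alone therefore points away from the parts that most need armour.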


Another AI field, the prediction of future elections from past election results, also requires
intuitive reasoning. A breakdown such as "according to the votes of the poor / the votes of the
rich / the votes of educated people, etc." is tough to dissect. Furthermore, the statistical cutoff
values defining these groups are vaguely defined.

Another example, from the field of DNA data mining, is the classical concept of hotspot analysis,
where the hotspot often lies within a gene or within a regulatory region with vaguely defined
biology. The concept of a cold-spot, i.e. the absence of a particular DNA base (one of A, T, G, C),
is often not considered as the culprit of a disease. Further, there is a lack of a comprehensive
"One Linkage Score per DNA Letter" approach for the whole genome, which is well within the
limits of today's computational power (2,3).
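
As a minimal sketch of the cold-spot idea, the following function scans a sequence for windows in which a given base is absent. This is one possible reading of "cold-spot", assumed here for illustration; the function name cold_spots, the window length and the example sequence are hypothetical, and the linkage scoring described in (2,3) is not reproduced.

```python
def cold_spots(sequence, base, window):
    """Return start indices of windows of length `window` that contain no `base`."""
    sequence = sequence.upper()
    return [
        i
        for i in range(len(sequence) - window + 1)
        if base not in sequence[i:i + window]
    ]


# Hypothetical toy sequence, not real genomic data.
seq = "ATGCGCGCATTTAGC"
print(cold_spots(seq, "A", 4))
```

On a whole genome the same scan would be run once per base and per window length, so that absences are recorded alongside the usual hotspot counts.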

As exemplified above, today's AI algorithms are apparently beyond the "one thing at a time"
understanding of the control / experimental setup of recent data mining approaches. Their
autonomous learning structure makes the basic control / experimental setup murkier every day.
Indeed, big-data-based autonomous learning has come to a point where the algorithm designers
can no longer trace the final decision of the algorithm back to the initial settings.

The field of image recognition is particularly murky, as algorithms trained on studio image
datasets with unnatural, artificial illumination patterns often misinterpret images of the real world.

A suggested dataset for a machine learning algorithm working on image recognition would be the
colorimetric negatives of the same image dataset. By training AI on the colorimetrically negative
dataset while keeping the contours of the images the same, we may compensate for errors due to
the illumination of the images and better understand the illumination versus contour problem of
image recognition. The human brain has seen every recognizable object both at night and by day,
with the cones and rods of the retina.
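
A minimal sketch of the colorimetric negative follows, assuming 8-bit RGB images stored as nested lists of (r, g, b) tuples. The helper names negative and edge_strength and the tiny example image are hypothetical. The sketch shows that inverting the colors leaves the contours, measured here as differences between adjacent pixels, unchanged.

```python
def negative(image):
    """Return the colorimetric negative of an 8-bit RGB image."""
    return [[tuple(255 - c for c in px) for px in row] for row in image]


def edge_strength(image):
    """Absolute per-channel difference between horizontally adjacent pixels."""
    return [
        [
            tuple(abs(a - b) for a, b in zip(row[x], row[x + 1]))
            for x in range(len(row) - 1)
        ]
        for row in image
    ]


# Hypothetical 2 x 3 image: a dark background with a bright object.
image = [
    [(10, 10, 10), (10, 10, 10), (200, 180, 160)],
    [(10, 10, 10), (200, 180, 160), (200, 180, 160)],
]
neg = negative(image)

# The colors are inverted, but the contour information is identical.
print(neg[0][0])
print(edge_strength(neg) == edge_strength(image))
```

Because |(255 - a) - (255 - b)| = |a - b|, a model trained on both versions sees the same contours under opposite color values.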

Alternatively, the image dataset should be further tagged with illumination keywords such as
"in the dark", "at night" or "dim light".

Analogously, the massive datasets of CERN can be analysed for positively charged and negatively
charged particles separately, or else the charges of the particles can be reversed and the
predicted outcomes reassessed.

Further suggestions on the database structure are the following:

Suggested Datasets for Machine Learning Algorithms of Image Recognition:*

1. Book figures: photographs from Google Books with whole figure descriptions (legends) as the
tag-phrase. (This sample set with whole figure legends is particularly important, as it best
describes the pictures in every dimension.)

2. Book figures: photographs from Google Books with whole figure descriptions (legends) as the
tag-phrase. (Colorimetric negatives of the photographs in 1, for color-contour resolution.)

3. Book figures: drawings from Google Books with whole figure descriptions (legends) as the
tag-phrase.

4. Book figures: drawings from Google Books with whole figure descriptions (legends) as the
tag-phrase. (Colorimetric negatives of the images in 3.)

5. Graphic figures from the scientific literature with whole figure descriptions (legends) as the
tag-phrase. (Machines may learn to read graphics and form a visual bridge to data analysis with
all the other graphics software. They may read and understand the graphics they produce.)

Suggested Datasets for Machine Learning Algorithms of Text Mining:

1. A compendium of Science Fiction Novels.

2. Newspapers.

3. Romantic Novels.

4. Paperbacks.

5. A compendium of the works of a single author, e.g. H. G. Wells.

6. A compendium of Marvel comic books with captions as texts and drawings as images.

- Any other book category reflecting the nature of a particular literary field.

Suggested Database Structure and Methodology for Whole-Genome Studies:

1. Whole-genome datasets along with healthcare records.

One linkage score per DNA position approach.

A solution for DNA redundancy and probe specificity is described in (2,3).

Suggested Datasets for Financial Analysis:

1. Transaction database of a well-performing bank.

2. Transaction database of a poorly performing bank.

(in the same financial atmosphere, in the same country, under the same law)

*The methodology and database suggestions made in this brief review can be extended intuitively.
Every dataset analysed and every approach to better data resolution will improve AI algorithms
and our understanding of AI for better outcomes, globally. No financial interest is intended.

References:

1. Three pitfalls to avoid in machine learning. Nature, 30 July 2019.
https://www.nature.com/articles/d41586-019-02307-y

2. https://www.youtube.com/watch?v=l0tOlZRC6CY

3. https://www.scribd.com/document/409051107/Genome-Data-Mining-One-Linkage-Score-per-DNA-Letter
