
General: Some fun graphs and statistics I made using the data collected from mindat.

28th Apr 2024 20:42 UTC Taylor Schneider

I made a tool to copy all approved species off mindat and format them into clean, usable JSON objects. Then I filtered that data and fed it into a graphing tool. Here are a few of the graphs I made with it. ( https://github.com/MisterSirCode/Mindat-Data-Collector )

(These graphs do not contain all 6040+ approved species, only the ones that had the appropriate information for each graph. Most species are missing one or more of the properties below)

These aren't meant to be scientifically accurate or perfectly scaled or anything... more or less just me having fun with access to a large dataset.

Also some fun stats I obtained from all approved species:

With 4,788 species and 18,638.5 total, the average Mohs hardness was 3.893.
With 5,242 species and 22,029.3 total, the average Density was 4.202.
With 800 species and 1,394.1 total, the average Index of Refraction was 1.743.
With 1,025 species and 431,143.45 total, the average Vickers Indentation Hardness was 420.6.
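
Each average above is simply the total divided by the number of species that had a value for that property. A minimal Python sketch that reproduces the four figures from the totals quoted in this post (the dictionary layout and names are illustrative only, not taken from the collector tool):

```python
# Totals and species counts as quoted in the post above.
# Layout: property name -> (total, species count, decimals to round to).
stats = {
    "Mohs hardness": (18_638.5, 4_788, 3),
    "Density": (22_029.3, 5_242, 3),
    "Index of Refraction": (1_394.1, 800, 3),
    "Vickers Indentation Hardness": (431_143.45, 1_025, 1),
}


def average(total: float, count: int) -> float:
    """Arithmetic mean from a running total and an item count."""
    return total / count


averages = {
    name: round(average(total, count), digits)
    for name, (total, count, digits) in stats.items()
}

for name, value in averages.items():
    print(f"{name}: {value}")
```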

The hardest on the Mohs scale was diamond at 10 (obviously)
The densest mineral was Iridium at 22.65 - 22.84 g/cm³
The most refractive mineral was Arsenudinaite at a maximum RI of 10
The hardest on the Vickers scale was Oxy-Chromium-Dravite, which was recorded at 14540 (though I know there are much harder ones... just missing from mindat)

Here are all the graphs I made:

https://github.com/MisterSirCode/Mindat-Data-Collector/tree/main/img/graphs

29th Apr 2024 08:05 UTC Dave Griffiths

Interesting work - I've tried a few similar things recently. Do you know if it's possible to access photo (meta)data with the API? It would be great to do more plots based on uploaded specimens.

29th Apr 2024 11:02 UTC Taylor Schneider

Probably. However, I imagine that data would be far denser than the whole of the approved species on mindat.

I guess it would depend on what exactly you're trying to record (just the dimensions, or the number from certain locales, or something), but if you are planning on using the API for images, that'll be quite heavy on it.

2nd May 2024 08:54 UTC Dave Griffiths

I think simply having the ability to access the metadata for the photos (species, locality and description) would allow for a lot of interesting work without access to the actual image. This could be heavily rate limited/paginated if needed, but yes it might be asking too much. 

I wonder at any one time how many people are scraping data like this though!

2nd May 2024 10:42 UTC David Von Bargen (Manager)

Not many. They would need to do a lot of scraping for 1.3M photos and will be noticed and have their IPs banned.

3rd May 2024 12:50 UTC Taylor Schneider

So long as they're not scraping the photos, and they download each page of JSON data one at a time, there shouldn't be any issues.

That's what I did to make sure I was not overloading the bandwidth / API. Each API call that returns more than 10 items is split into pages and comes with a page count to help automatically download each one... So when I was grabbing all 6040+ approved species, I just downloaded them 10 species at a time, with a wait call between each request.
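
The approach described above (small pages, fetched one at a time, with a pause between requests) could look roughly like the sketch below. It is not the code from the collector tool. The response shape (a `results` list plus a `next` field that is empty on the last page) and the `fetch_page` callable are assumptions made for illustration, and the demo uses an in-memory stand-in rather than the real API.

```python
import time
from typing import Any, Callable, Dict, Iterator, List

Page = Dict[str, Any]


def iter_paginated(
    fetch_page: Callable[[int], Page],
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Any]:
    """Yield items page by page, pausing between requests.

    Assumes (hypothetically) that each page looks like
    {"results": [...], "next": <something falsy on the last page>}.
    """
    page_number = 1
    while True:
        page = fetch_page(page_number)
        yield from page["results"]
        if not page.get("next"):
            break
        sleep(delay_seconds)  # be polite: wait before asking for the next page
        page_number += 1


def make_fake_fetcher(items: List[Any], page_size: int = 10) -> Callable[[int], Page]:
    """In-memory stand-in for a real HTTP call, so the sketch runs offline."""
    def fetch_page(page_number: int) -> Page:
        start = (page_number - 1) * page_size
        chunk = items[start:start + page_size]
        has_more = start + page_size < len(items)
        return {"results": chunk, "next": page_number + 1 if has_more else None}
    return fetch_page


fake_species = [f"species-{i}" for i in range(25)]
pauses: List[float] = []  # record the waits instead of actually sleeping
collected = list(
    iter_paginated(make_fake_fetcher(fake_species), delay_seconds=1.0, sleep=pauses.append)
)
print(len(collected), "items fetched with", len(pauses), "pauses")
```

Passing the fetcher and the sleep function in as arguments keeps the pacing logic separate from the HTTP details, so the same loop can be tried offline before pointing it at a real endpoint.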

3rd May 2024 14:43 UTC Dave Griffiths

Yes I guess responsible API usage is pretty much the same as responsible scraping - which if done correctly isn't really a problem for anyone.

29th Apr 2024 11:54 UTC Harold Moritz (🌟 Expert)

Interesting, fun stuff that can now be relatively easily accessed. I'd be more interested in medians than averages because the curves are not linear, so averages really don't mean much (oh, a pun there!). The medians would tell us what values are more typical and are probably lower than the averages because a few high values skew the averages upward but have little effect on medians.
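
The point about skew can be seen with a toy example. The values below are invented purely for illustration and are not Mindat data; the sketch only shows that one large value moves the mean much more than the median.

```python
from statistics import mean, median

# Made-up values for illustration only (not real mineral data).
typical_values = [2.0, 2.5, 3.0, 3.0, 3.5, 4.0]
with_outlier = typical_values + [10.0]

mean_before = mean(typical_values)
median_before = median(typical_values)
mean_after = mean(with_outlier)
median_after = median(with_outlier)

print(f"without outlier: mean={mean_before}, median={median_before}")
print(f"with outlier:    mean={mean_after}, median={median_after}")
```

In this toy case the single high value lifts the mean from 3.0 to 4.0, while the median stays at 3.0.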

30th Apr 2024 12:56 UTC Taylor Schneider

Yeah. I gotta play around with it some more some time.

29th Apr 2024 12:12 UTC Uwe Kolitsch (Manager)

"The most refractive mineral was Arsenudinaite at a maximum RI of 10"
A wrong value was entered - will fix this.

30th Apr 2024 12:53 UTC Taylor Schneider

Thanks. I did come across quite a few errors in the API data thanks to this project. Lots of stuff with formatting errors; I even found a species with a specific gravity of 76.3 (though that was fixed).

29th Apr 2024 12:23 UTC Uwe Kolitsch (Manager)

"The hardest on the vickers scale was Oxy-Chromium-Dravite which was recorded at 14540 (Though I know there are much harder ones... just missing from mindat)"
There is also something wrong here.
The descriptive paper states
"The VHN microhardness is 14 540 MPa (load 50 g), equivalent to a Mohs hardness of approximately 7½ (Reznitsky et al. 2001)."

However, the value 14540 was added to the mineral page, where the unit is automatically given as kg/mm². This contradicts
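
For scale, and assuming the mineral page's unit is meant as kilogram-force per square millimetre (the usual Vickers unit), the MPa figure from the paper can be converted using 1 kgf = 9.80665 N, so 1 kgf/mm² = 9.80665 MPa. A small sketch of that arithmetic (the function name is just illustrative):

```python
# 1 kgf = 9.80665 N (standard gravity), and 1 N/mm^2 = 1 MPa,
# so 1 kgf/mm^2 = 9.80665 MPa.
MPA_PER_KGF_PER_MM2 = 9.80665


def mpa_to_kgf_per_mm2(pressure_mpa: float) -> float:
    """Convert a hardness value given in MPa to kgf/mm^2."""
    return pressure_mpa / MPA_PER_KGF_PER_MM2


vhn_mpa = 14_540  # value quoted from the description paper, in MPa
vhn_kgf = mpa_to_kgf_per_mm2(vhn_mpa)
print(f"{vhn_mpa} MPa is about {vhn_kgf:.0f} kgf/mm^2")
```

Under that assumption, entering 14540 directly into a kg/mm² field would overstate the hardness by roughly a factor of ten.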


30th Apr 2024 12:55 UTC Taylor Schneider

That one was definitely a bit odd to me. Are there a lot of species with incorrect units? It definitely makes it a lot harder to work with since I can't see the units on my end (Or maybe I can but I'm just not seeing something in the API)

2nd May 2024 09:05 UTC Jolyon Ralph (Founder)

This sort of analysis is excellent for tracking down potential data entry errors on mindat, thank you very much Taylor!

3rd May 2024 12:52 UTC Taylor Schneider

No problem. Pretty soon I want to make a tool to detect which entries (and specific values) have formatting errors, so I could potentially create a list of all the ones that need fixing.

But for now I'm glad we poked a few very bad ones out (the 76.3 g/cm³ mineral, and the one with an RI of 10 lol)
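
A checker along those lines might start as a simple range test per property. The sketch below is only one possible shape for it: the field names, the record layout and the thresholds are all assumptions (the density ceiling is set just above the 22.84 g/cm³ quoted earlier in the thread, the RI ceiling is a guess), and the sample records are placeholders rather than real species.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical field names and plausibility limits; tune these to the real data.
PLAUSIBLE_RANGES: Dict[str, Tuple[float, float]] = {
    "mohs_hardness": (0.0, 10.0),
    "density": (0.0, 23.0),          # g/cm^3, just above the densest value quoted
    "refractive_index": (1.0, 5.0),  # upper bound is a guess
}


def find_suspect_values(entries: List[Dict[str, Any]]) -> List[Tuple[str, str, Any, str]]:
    """Return (name, field, value, reason) for values that look wrong."""
    suspects = []
    for entry in entries:
        for field, (low, high) in PLAUSIBLE_RANGES.items():
            value = entry.get(field)
            if value is None:
                continue  # a missing property is not an error here
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                suspects.append((entry["name"], field, value, "not a number"))
            elif not low <= value <= high:
                suspects.append((entry["name"], field, value, "out of range"))
    return suspects


# Placeholder records, loosely modelled on the errors mentioned in this thread.
sample_entries = [
    {"name": "placeholder-a", "density": 76.3},
    {"name": "placeholder-b", "refractive_index": 10},
    {"name": "placeholder-c", "mohs_hardness": "7½"},
    {"name": "placeholder-d", "mohs_hardness": 7.5, "density": 3.1},
]

suspects = find_suspect_values(sample_entries)
for name, field, value, reason in suspects:
    print(f"{name}: {field}={value!r} ({reason})")
```

A range test like this only catches gross outliers and non-numeric values; a unit mix-up that still lands inside the plausible range would need a different check.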

3rd May 2024 13:01 UTC Uwe Kolitsch (Manager)

"Pretty soon I want to make a tool to detect which entries (and specific values) have formatting errors, so I could potentially create a list of all the ones that need fixing"
That would be outstanding!
 
Mindat.org is an outreach project of the Hudson Institute of Mineralogy, a 501(c)(3) not-for-profit organization.
Copyright © mindat.org and the Hudson Institute of Mineralogy 1993-2024, except where stated. Most political location boundaries are © OpenStreetMap contributors. Mindat.org relies on the contributions of thousands of members and supporters. Founded in 2000 by Jolyon Ralph.