Identify problem rows on import
System dynamics program with additional features for economics
Brought to you by: hpcoder, profstevekeen
Here's the movie of trying to import this file.
IdentifyProblemRowsOnImport20230130.mp4
https://drive.google.com/file/d/1IED7l0TMLOffNZ9IZ0IWgHfdQSmXEniW/view?usp=drive_web
That was the idea of the "error report" functionality.
If I look at the "error report", I don't think it is as simple as saying choose one entry over the other. How would you choose, anyway?
HP is showing two entries for 2017-2020. I happen to know that HP split into two companies in late 2015 - HP Enterprise and HP Inc. I suspect these two different companies received the same name in this database. But that doesn't explain the other entries. Raytheon is still just the one company, as is ITT in the 50s and 60s, as far as I know (it was split in 1995, and then again in 2011). And what happened to Fairchild in 1989? That last one might be a mistake.
I loaded the files by taking the average of duplicate values (maybe sum is better, if these truly correspond to corporate breakups), and plotted revenue vs rank.
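A minimal sketch of that choice between averaging and summing duplicates, in Python rather than anything from the Minsky/Ravel code. The function name `collapse_duplicates`, the `(company, year)` key, and the toy values are all hypothetical and only there for illustration:

```python
from collections import defaultdict
from statistics import mean


def collapse_duplicates(rows, reducer=mean):
    """Group (key, value) pairs by key and reduce each group to one value.

    `reducer` is any function taking a list of numbers, e.g. mean, sum or max.
    """
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}


# Made-up rows: company "A" appears twice for 2017, company "B" once.
rows = [
    (("A", 2017), 10.0),
    (("A", 2017), 30.0),
    (("B", 2017), 5.0),
]

by_mean = collapse_duplicates(rows, mean)  # treats duplicates as repeated estimates
by_sum = collapse_duplicates(rows, sum)    # treats duplicates as parts of a whole
by_max = collapse_duplicates(rows, max)    # keeps only the largest entry
```

Sum only makes sense if the duplicate rows really are separate pieces of one entity (a corporate breakup, say); for sloppy double entry it would inflate the value, and mean or max is the safer guess.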
But only one year works :(
Need to find out why.
Yeah, a lot of the time it is just going to be sloppy data entry that causes hassles like these. Since so many such files are created in spreadsheets rather than databases, errors like this aren't picked up on entry, and it's obvious that no one has attempted to clean this data for decades!
So an automated process might work, such as taking the Max here, which I would do in this instance on the assumption that the biggest market size is the correct one. But it would be better still to show the offending lines to the user in an edit window, where changes could be made (delete rows, edit names, etc.) and then written back to the source file. Alternatively, the user could pick a rule like my assumption: just load the Max value in those cases, or the average, etc.
This is more elegant than giving the user another CSV file to scan manually in Excel, especially in cases where Excel can't load the whole file anyway.
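To make "show the offending lines" concrete, here is a rough Python sketch of finding them, assuming a CSV with a header row and no multi-line quoted fields. It is not how Minsky's importer works; `find_duplicate_rows` and the column names `company`, `year`, `revenue` are invented for the example:

```python
import csv
import io
from collections import defaultdict


def find_duplicate_rows(csv_text, key_columns):
    """Return {key: [(line_number, row), ...]} for keys that occur more than once.

    Line numbers are 1-based positions in the source text (the header is line 1),
    so they can be used to point the user at the exact rows to edit.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = defaultdict(list)
    for row in reader:
        key = tuple(row[column].strip() for column in key_columns)
        seen[key].append((reader.line_num, row))
    return {key: hits for key, hits in seen.items() if len(hits) > 1}


# Made-up sample: company "A" has two rows for 2017.
sample = "company,year,revenue\nA,2017,10\nB,2017,5\nA,2017,30\n"

duplicates = find_duplicate_rows(sample, ["company", "year"])
for key, hits in duplicates.items():
    for line_number, row in hits:
        print(f"line {line_number}: {key} revenue={row['revenue']}")
```

An edit window could be populated from exactly this kind of mapping: one group per clashing key, each row tagged with its source line so that edits or deletions can be written back to the file.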
Ticket moved from /p/minsky/ravel/315/
Ticket moved from /p/minsky/tickets/1826/