History for tools/cvs2fossil/lib/c2f_pinitcsets.tcl

[14fdb6d3] part of check-in [6559f323] New command 'state foreachrow' for incremental result processing, using less memory. Converted a number of places in pass InitCSet to this command, and marked a number of other places for possible future use. (check-in: [6559f323] user: aku branch: trunk, size: 11534)
[37791a27] part of check-in [b3d61d78] Fixed bug made in [f46458d5bd] which prevented the saving of the changesets generated by the breaking of the internal dependencies. (check-in: [b3d61d78] user: aku branch: trunk, size: 11388)
[8351f6b6] part of check-in [f46458d5] Reworked the basic structure of pass InitCSets to keep memory consumption down. It now incrementally creates, breaks, saves, and releases changesets, instead of piling them up before saving all at the end. Memory tracking confirms that this changes the accumulating mountain into a near-constant usage, with the expected spikes from the breaking. (check-in: [f46458d5] user: aku branch: trunk, size: 11237)
[9999ef81] part of check-in [27ed4f7d] Extended pass InitCsets and underlying code with more log output geared towards memory introspection, and added markers for special locations. Extended my notes with general observations from the first test runs over my example CVS repositories. (check-in: [27ed4f7d] user: aku branch: trunk, size: 11364)
[eda30d7e] part of check-in [66235f24] Updated the copyright information of all files touched in the new year. (check-in: [66235f24] user: aku branch: trunk, size: 10806)
[97c1e143] part of check-in [9e1b461b] Broke package dependency cycle introduced when moving the cset load code from the InitCsets pass to the cset class. (check-in: [9e1b461b] user: aku branch: trunk, size: 10801)
[0ab46f96] part of check-in [49dd66f6] Moved the code loading changesets from state to its proper class. (check-in: [49dd66f6] user: aku branch: trunk, size: 10763)
[87cc4829] part of check-in [e288af39] Fluff: Renamed state methods use/reading/writing to usedb/use/extend for clarity. Updated all callers. Extended state module with code to dump the SQL statements it receives to a file for analysis. Extended the 'use' declarations of several passes. (check-in: [e288af39] user: aku branch: trunk, size: 11213)
[ced4d957] part of check-in [00bf8c19] The performance was still not satisfying, even with faster recomputing of successors. Doing it multiple times (Building the graph in each breaker and sort passes) eats time. Caching in memory blows the memory. Chosen solution: Cache this information in the database.

Created a new pass 'CsetDeps' which is run between 'InitCsets' and 'BreakRevCsetCycles' (i.e. changeset creation and first breaker pass). It computes the changeset dependencies from the file-level dependencies once and saves the result in the state, in the new table 'cssuccessor'. Now the breaker and sort passes can get the information quickly, with virtually no effort. The dependencies are recomputed incrementally when a changeset is split by one of the breaker passes, for its fragments and its predecessors.

The loop check is now trivial, and integrated into the successor computation, with the heavy lifting for the detailed analysis and reporting moved down into the type-dependent SQL queries. The relevant new method is 'loops'. Now that the loop check is incremental, the pass-based checks have been removed from the integrity module, and the option '--loopcheck' has been eliminated. For paranoia, the graph setup and modification code got its loop check reinstated as an assert, reusing the changeset report code.

Renumbered the breaker and sort passes. A number of places, like graph setup and traversal, loading of changesets, etc. got feedback indicators to show their progress.

The selection of revision and symbol changesets for the associated breaker passes was a bit on the slow side. We now keep changeset lists sorted by type (during loading or general construction) and access them directly. (check-in: [00bf8c19] user: aku branch: trunk, size: 11217)

[f56c12b5] part of check-in [9c570550] Performance bugfix. nextmap/premap can still be performance killers and memory hogs. Moved the computation of successor changesets down to the type-dependent code (new methods) and the SQL database, i.e. the C level. In the current setup it was possible that the DB would deliver us millions of file-level dependency pairs which the Tcl level would then reduce to tens of actual changeset dependencies. Tcl did not cope well with that amount of data. Now the reduction happens in the query itself. A concrete example was a branch in the Tcl CVS generating nearly 9 million pairs, which reduced to roughly 200 changeset dependencies. This blew the memory out of the water and the converter ground to a halt, busily swapping. With the causes behind us, also added another index on 'csitem(iid)' to speed the search for changesets from the revisions, tags, and branches. (check-in: [9c570550] user: aku branch: trunk, size: 11181)
[39a67db2] part of check-in [b42cff97] Replaced the checks for self-referential changesets in the cycle breaker with a scheme in the changeset class doing checks when splitting a changeset, which is also called by the general changeset integrity code, after each pass. Extended log output at high verbosity levels. Thorough checking of the fragments a changeset is to be split into. (check-in: [b42cff97] user: aku branch: trunk, size: 11122)
[b1c3a78e] part of check-in [80b1e893] Renamed state table 'csrevision' to 'csitem' to reflect the new internals of changesets. Updated all places where it is used. (check-in: [80b1e893] user: aku branch: trunk, size: 11103)
[37a8f5de] part of check-in [27f093d2] More realignment of variable names with their content, in pass 5. (check-in: [27f093d2] user: aku branch: trunk, size: 11120)
[cec097c0] part of check-in [215d2f1a] Brought knowledge of the new types to the state definition, changed the creation of the initial changesets to use tags and branches. (check-in: [215d2f1a] user: aku branch: trunk, size: 11156)
[30453369] part of check-in [2e07cd71] Bugfix in the generation of the initial symbol changesets. Keep entries apart per line-of-development. (check-in: [2e07cd71] user: aku branch: trunk, size: 10948)
[3b39c5f0] part of check-in [8c6488de] Continued work on the integrity checks for changesets. Moved callers out of transactions. Two checks are already tripping on bad changesets made by InitCSets (pass 5). (check-in: [8c6488de] user: aku branch: trunk, size: 10779)
[47d97866] part of check-in [bf83201c] Outline for more integrity checks, focusing on the changesets. (check-in: [bf83201c] user: aku branch: trunk, size: 10787)
[217d875a] part of check-in [b679ca33] Code cleanup. Removed trailing whitespace across the board. (check-in: [b679ca33] user: aku branch: trunk, size: 10624)
[940095b7] part of check-in [65be27aa] Modified the API for the construction of changesets a bit, now allowing their construction with the correct id, instead of correcting it later. Updated pass 5 to use this, and fixed bug where the id counter for changesets was left uninitialized, allowing the improper generation of duplicate ids. (check-in: [65be27aa] user: aku branch: trunk, size: 10626)
[10710518] part of check-in [96b7bfb8] Added convenience command to the state package when the sql returns a single row. Added more statistics about revisions, tags, branches, symbols, changesets to various passes. (check-in: [96b7bfb8] user: aku branch: trunk, size: 10614)
[bb434d64] part of check-in [341d96be] Bugfix. In pass 5, loading the changesets used the type codes instead of the type names. Modified the SQL selecting the data to return the proper names. (check-in: [341d96be] user: aku branch: trunk, size: 10577)
[960f3c67] part of check-in [24c0b662] Reworked the in-memory storage of changesets in pass 5 and supporting classes, and added loading of changesets from the persistent state for when the pass is skipped. (check-in: [24c0b662] user: aku branch: trunk, size: 10537)
[046ff5e2] part of check-in [08ebab80] Rewrote the algorithm for breaking internal dependencies to my liking. The complex part handling multiple splits has moved from the pass code to the changeset class itself, reusing the state computed for the first split. The state is a bit more complex to allow for its incremental update after a break has been done. Factored major pieces into separate procedures to keep the high-level code readable. Added lots of official log output to help debugging in case of trouble. (check-in: [08ebab80] user: aku branch: trunk, size: 10315)
[8f42ee8f] part of check-in [95af789e] Oops. pass 5 is not complete. Missed the breaking of internal dependencies, this is done in this pass already. Extended pass _2_ and file revisions with code to save the branchchildren (possible dependencies), and pass 5 and changesets with the proper algorithm. From cvs2svn, works, do not truly like it, as it throws away and recomputes a lot of state after each split of a cset. Could update and reuse the state to perform all splits in one go. Will try that next, for now we have a working form in the code base. (check-in: [95af789e] user: aku branch: trunk, size: 11223)
[aae0715d] part of check-in [5f7acef8] Completed pass 5, computing the initial set of changesets. Defined persistent structure and filled out the long-existing placeholder class (project::rev). (check-in: [5f7acef8] user: aku branch: trunk, size: 9397)
[60ccdc28] part of check-in [54d1e353] Started on pass 5, computing the initial approximate set of project level revisions, aka 'ChangeSets'. Skeleton of the pass added. (check-in: [54d1e353] user: aku branch: trunk, size: 2863) Added
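
Check-in [9c570550] above describes moving the reduction of file-level dependency pairs to changeset dependencies into the SQL query itself. The following is a minimal sketch of that idea, not the converter's actual code. It uses Python's sqlite3 module rather than Tcl. The table 'csitem' and its index on 'iid' are named in the history above; the column names, the 'dep' table, and the helper functions are assumptions made for illustration only.

```python
import sqlite3


def changeset_successors(conn, cid):
    """Return the ids of changesets that depend on changeset `cid`.

    The join and DISTINCT run inside SQLite, so the caller only ever
    sees one row per successor changeset, never the file-level pairs.
    """
    rows = conn.execute(
        """
        SELECT DISTINCT b.cid
        FROM csitem a
        JOIN dep d    ON d.iid = a.iid    -- hypothetical file-level dependency table
        JOIN csitem b ON b.iid = d.succ   -- map each successor item back to its changeset
        WHERE a.cid = ?
        ORDER BY b.cid
        """,
        (cid,),
    )
    return [row[0] for row in rows]


def build_demo():
    """Build a tiny in-memory state: two changesets, 100 file-level pairs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE csitem (cid INTEGER, iid INTEGER);  -- assumed columns
        CREATE INDEX csitem_iid ON csitem (iid);         -- index named in the history
        CREATE TABLE dep (iid INTEGER, succ INTEGER);    -- assumed table
        """
    )
    # Changeset 1 holds items 1..100, changeset 2 holds items 101..200.
    conn.executemany(
        "INSERT INTO csitem VALUES (?, ?)",
        [(1, i) for i in range(1, 101)] + [(2, i) for i in range(101, 201)],
    )
    # Every item of changeset 1 has one successor item in changeset 2.
    conn.executemany(
        "INSERT INTO dep VALUES (?, ?)",
        [(i, i + 100) for i in range(1, 101)],
    )
    return conn
```

In this toy state, the 100 file-level pairs collapse into a single changeset dependency before any row reaches the scripting level, which is the effect the check-in comment describes at a much larger scale.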