Fossil

Check-in [27ed4f7d]

Overview
Comment:Extended pass InitCsets and underlying code with more log output geared towards memory introspection, and added markers for special locations. Extended my notes with general observations from the first test runs over my example CVS repositories.
SHA1: 27ed4f7dc3a0c032f255bc0d6a3734c00f68a65d
User & Date: aku 2008-02-16 06:46:41
Context
2008-02-17
02:06
Reworked the basic structure of pass InitCSets to keep memory consumption down. Now incrementally creates, breaks, saves, and releases changesets, instead of piling them up before saving all at the end. Memory tracking confirms that this changes the accumulating mountain into a near-constant usage, with the expected spikes from the breaking. check-in: f46458d5 user: aku tags: trunk
2008-02-16
06:46
Extended pass InitCsets and underlying code with more log output geared towards memory introspection, and added markers for special locations. Extended my notes with general observations from the first test runs over my example CVS repositories. check-in: 27ed4f7d user: aku tags: trunk
06:45
Integrated memory tracking into the option processor for activation and configuration, and into the log system for use. The latter means that each actual output to the log is an introspection point. check-in: 7b71f647 user: aku tags: trunk
Changes

Changes to cvs2fossil.txt.

Old version (lines 5-18, 35-42):
*	Not yet able to handle the specification of multiple projects
	for one CVS repository. I.e. I can, for example, import all of
	tcllib, or a single subproject of tcllib, like tklib, but not
	multiple sub-projects in one go.

*	We have to look into the pass 'InitCsets' and hunt for the
	cause of the large amount of memory it is gobbling up.

*	Look at the dependencies on external packages and consider
	which of them can be moved into the importer, either as a
	simple utility command, or wholesale.

	struct::list
		assign, map, reverse, filter
................................................................................
	struct::graph
		In toto

	snit
		In toto

	sqlite3
		In tota


New version (lines 5-64, 81-88):
*	Not yet able to handle the specification of multiple projects
	for one CVS repository. I.e. I can, for example, import all of
	tcllib, or a single subproject of tcllib, like tklib, but not
	multiple sub-projects in one go.

*	We have to look into the pass 'InitCsets' and hunt for the
	cause of the large amount of memory it is gobbling up.

	Results from the first look using the new memory tracking
	subsystem:

	(1) The general architecture, workflow, is a bit wasteful. All
	    changesets are generated and kept in memory before getting
	    persisted. This means that allocated memory piles up over
	    time, with later changesets pushing the boundaries. This
	    is made worse by the fact that some of the preliminary
	    changesets seem to require a lot of temporary memory as
	    part of getting broken down into the actual ones.
	    InitializeBreakState seems to be the culprit here. Its
	    memory usage is possibly quadratic in the number of items
	    in the changeset.

	(2) A number of small inefficiencies, like 'state eval' always
	    pulling the whole result into memory before processing it
	    with 'foreach'. These results are potentially large lists.

	(3) We maintain an in-memory map from tagged items to their
	    changesets. While this is needed later in the sorting
	    passes, during the creation it is wasted space. It is also
	    wasted time to maintain it during the creation and
	    breaking.

	Changes:

	(a) Re-architect to create, break, and persist changesets one
	    by one, completely releasing all associated in-memory data
	    before going to the next. Should be low-hanging fruit with
	    high impact, as we have all the necessary operations
	    already, just not in that order, and that alone should
	    already keep the pile from forming, making the spikes of
	    (2) more manageable.

	(b) Look into the smaller problems described in (2), and
	    especially (3). These should still be low-hanging fruit,
	    although of lesser effect than (a). For (3) disable the
	    map and its maintenance during construction, and put it
	    into a separate command, to be used when loading the
	    created changesets at the end.

	(c) With larger effect, but more difficult to achieve, go into
	    command 'InitializeBreakState' and the preceding
	    'internalsuccessors', and rearchitect it. Definitely not a
	    low-hanging fruit. Possibly also something we can skip if
	    doing (a) had a large enough effect.

*	Look at the dependencies on external packages and consider
	which of them can be moved into the importer, either as a
	simple utility command, or wholesale.

	struct::list
		assign, map, reverse, filter
................................................................................
	struct::graph
		In toto

	snit
		In toto

	sqlite3
		In toto
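
The re-architecture proposed as change (a) in the notes above can be illustrated with a small, self-contained Tcl sketch. It is not the importer's code: every proc here (`cset_create`, `cset_break`, `cset_persist`, `cset_release`, `run_batch`, `run_incremental`, `measure`) is a hypothetical stand-in, and the "breaking" simply cuts a changeset into fragments of at most two items. The only thing it tries to show is how the order of operations affects the number of changesets alive at the same time.

```tcl
# Hypothetical stand-ins for the importer's changeset operations. None of
# these procs exist in cvs2fossil; they only model the order of the work.
set live  0   ;# number of changesets currently held in memory
set peak  0   ;# highest value 'live' ever reached
set saved {}  ;# ids of changesets written to the (simulated) state

proc cset_create {id items} {
    global live peak
    incr live
    if {$live > $peak} {set peak $live}
    return [list $id $items]
}

# Placeholder for the dependency-driven splitting: cut the changeset
# into fragments of at most two items each.
proc cset_break {cset} {
    global live peak
    foreach {id items} $cset break
    set fragments {}
    set n 0
    for {set i 0} {$i < [llength $items]} {incr i 2} {
        lappend fragments [list $id.$n [lrange $items $i [expr {$i + 1}]]]
        incr n
    }
    # The original changeset is replaced by its fragments.
    incr live [expr {[llength $fragments] - 1}]
    if {$live > $peak} {set peak $live}
    return $fragments
}

proc cset_persist {cset} {
    global saved
    lappend saved [lindex $cset 0]
}

proc cset_release {cset} {
    global live
    incr live -1
}

# Old order: create everything, break everything, then save everything.
proc run_batch {groups} {
    set all {}
    foreach {id items} $groups {lappend all [cset_create $id $items]}
    set fragments {}
    foreach cset $all {
        foreach f [cset_break $cset] {lappend fragments $f}
    }
    foreach f $fragments {cset_persist $f}
    foreach f $fragments {cset_release $f}
}

# Proposed order: create, break, persist, and release one at a time.
proc run_incremental {groups} {
    foreach {id items} $groups {
        set cset [cset_create $id $items]
        foreach f [cset_break $cset] {
            cset_persist $f
            cset_release $f
        }
    }
}

# Reset the counters, run one strategy, and report its peak.
proc measure {runner groups} {
    global live peak saved
    set live 0; set peak 0; set saved {}
    $runner $groups
    return $peak
}

set groups {A {1 2 3 4 5} B {6 7} C {8 9 10}}
puts "batch peak:       [measure run_batch $groups]"
puts "incremental peak: [measure run_incremental $groups]"
```

With this made-up input the batch order peaks at 6 live changesets and the incremental order at 3. The incremental peak is the size of the largest single break, which corresponds to the "spikes" the notes expect to remain.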

Changes to tools/cvs2fossil/lib/c2f_pinitcsets.tcl.

Old version (lines 17-30, 177-221, 277-312, 332-349):
# # ## ### ##### ######## ############# #####################
## Requirements

package require Tcl 8.4                               ; # Required runtime.
package require snit                                  ; # OO system.
package require vc::tools::misc                       ; # Text formatting.
package require vc::tools::log                        ; # User feedback.

package require vc::fossil::import::cvs::repository   ; # Repository management.
package require vc::fossil::import::cvs::state        ; # State storage.
package require vc::fossil::import::cvs::integrity    ; # State integrity checks.
package require vc::fossil::import::cvs::project::rev ; # Project level changesets

# # ## ### ##### ######## ############# #####################
## Register the pass with the management
................................................................................

	# Note: We could have written this loop to create the csets
	#       early, extending them with all their revisions. This
	#       however would mean lots of (slow) method invokations
	#       on the csets. Doing it like this, late creation, means
	#       less such calls. None, but the creation itself.



	foreach {mid rid pid} [state run {
	    SELECT M.mid, R.rid, M.pid
	    FROM   revision R, meta M   -- R ==> M, using PK index of M.
	    WHERE  R.mid = M.mid
	    ORDER  BY M.mid, R.date
	}] {


	    if {$lastmeta != $mid} {
		if {[llength $revisions]} {
		    incr n
		    set  p [repository projectof $lastproject]


		    project::rev %AUTO% $p rev $lastmeta $revisions


		    set revisions {}
		}
		set lastmeta    $mid
		set lastproject $pid
	    }
	    lappend revisions $rid
	}

	if {[llength $revisions]} {
	    incr n
	    set  p [repository projectof $lastproject]


	    project::rev %AUTO% $p rev $lastmeta $revisions


	}




	log write 4 initcsets "Created [nsp $n {revision changeset}]"
	return
    }

    proc CreateSymbolChangesets {} {
	log write 3 initcsets {Create changesets based on symbols}


	# Tags and branches induce changesets as well, containing the
	# revisions they are attached to (tags), or spawned from
	# (branches).

	set n 0

................................................................................
	if {[llength $branches]} {
	    incr n
	    set  p [repository projectof $lastproject]
	    project::rev %AUTO% $p sym::branch $lastsymbol $branches
	}

	log write 4 initcsets "Created [nsp $n {symbol changeset}]"

	return
    }

    proc BreakInternalDependencies {} {
	# This code operates on the revision changesets created by
	# 'CreateRevisionChangesets'. As such it has to follow after
	# it, before the symbol changesets are made. The changesets
	# are inspected for internal conflicts and any such are broken
	# by splitting the problematic changeset into multiple
	# fragments. The results are changesets which have no internal
	# dependencies, only external ones.

	log write 3 initcsets {Break internal dependencies}

	set old [llength [project::rev all]]

	foreach cset [project::rev all] {
	    $cset breakinternaldependencies
	}

	set n [expr {[llength [project::rev all]] - $old}]
	log write 4 initcsets "Created [nsp $n {additional revision changeset}]"
	log write 4 initcsets Ok.

	return
    }

    proc PersistTheChangesets {} {
	log write 3 initcsets "Saving [nsp [llength [project::rev all]] {initial changeset}] to the persistent state"

	foreach cset [project::rev all] {
................................................................................
    namespace eval initcsets {
	namespace import ::vc::fossil::import::cvs::repository
	namespace import ::vc::fossil::import::cvs::state
	namespace import ::vc::fossil::import::cvs::integrity
	namespace eval project {
	    namespace import ::vc::fossil::import::cvs::project::rev
	}



	namespace import ::vc::tools::misc::*
	namespace import ::vc::tools::log
	log register initcsets
    }
}

# # ## ### ##### ######## ############# #####################
## Ready

package provide vc::fossil::import::cvs::pass::initcsets 1.0
return

New version (lines 17-31, 178-238, 294-332, 352-372):
# # ## ### ##### ######## ############# #####################
## Requirements

package require Tcl 8.4                               ; # Required runtime.
package require snit                                  ; # OO system.
package require vc::tools::misc                       ; # Text formatting.
package require vc::tools::log                        ; # User feedback.
package require vc::tools::mem                        ; # Memory tracking.
package require vc::fossil::import::cvs::repository   ; # Repository management.
package require vc::fossil::import::cvs::state        ; # State storage.
package require vc::fossil::import::cvs::integrity    ; # State integrity checks.
package require vc::fossil::import::cvs::project::rev ; # Project level changesets

# # ## ### ##### ######## ############# #####################
## Register the pass with the management
................................................................................

	# Note: We could have written this loop to create the csets
	#       early, extending them with all their revisions. This
	#       however would mean lots of (slow) method invokations
	#       on the csets. Doing it like this, late creation, means
	#       less such calls. None, but the creation itself.

	log write 14 initcsets meta_begin
	mem::mark
	foreach {mid rid pid} [state run {
	    SELECT M.mid, R.rid, M.pid
	    FROM   revision R, meta M   -- R ==> M, using PK index of M.
	    WHERE  R.mid = M.mid
	    ORDER  BY M.mid, R.date
	}] {
	    log write 14 initcsets meta_next

	    if {$lastmeta != $mid} {
		if {[llength $revisions]} {
		    incr n
		    set  p [repository projectof $lastproject]
		    log write 14 initcsets meta_cset_begin
		    mem::mark
		    project::rev %AUTO% $p rev $lastmeta $revisions
		    log write 14 initcsets meta_cset_done
		    mem::mark
		    set revisions {}
		}
		set lastmeta    $mid
		set lastproject $pid
	    }
	    lappend revisions $rid
	}

	if {[llength $revisions]} {
	    incr n
	    set  p [repository projectof $lastproject]
	    log write 14 initcsets meta_cset_begin
	    mem::mark
	    project::rev %AUTO% $p rev $lastmeta $revisions
	    log write 14 initcsets meta_cset_done
	    mem::mark
	}

	log write 14 initcsets meta_done
	mem::mark

	log write 4 initcsets "Created [nsp $n {revision changeset}]"
	return
    }

    proc CreateSymbolChangesets {} {
	log write 3 initcsets {Create changesets based on symbols}
	mem::mark

	# Tags and branches induce changesets as well, containing the
	# revisions they are attached to (tags), or spawned from
	# (branches).

	set n 0

................................................................................
	if {[llength $branches]} {
	    incr n
	    set  p [repository projectof $lastproject]
	    project::rev %AUTO% $p sym::branch $lastsymbol $branches
	}

	log write 4 initcsets "Created [nsp $n {symbol changeset}]"
	mem::mark
	return
    }

    proc BreakInternalDependencies {} {
	# This code operates on the revision changesets created by
	# 'CreateRevisionChangesets'. As such it has to follow after
	# it, before the symbol changesets are made. The changesets
	# are inspected for internal conflicts and any such are broken
	# by splitting the problematic changeset into multiple
	# fragments. The results are changesets which have no internal
	# dependencies, only external ones.

	log write 3 initcsets {Break internal dependencies}
	mem::mark
	set old [llength [project::rev all]]

	foreach cset [project::rev all] {
	    $cset breakinternaldependencies
	}

	set n [expr {[llength [project::rev all]] - $old}]
	log write 4 initcsets "Created [nsp $n {additional revision changeset}]"
	log write 4 initcsets Ok.
	mem::mark
	return
    }

    proc PersistTheChangesets {} {
	log write 3 initcsets "Saving [nsp [llength [project::rev all]] {initial changeset}] to the persistent state"

	foreach cset [project::rev all] {
................................................................................
    namespace eval initcsets {
	namespace import ::vc::fossil::import::cvs::repository
	namespace import ::vc::fossil::import::cvs::state
	namespace import ::vc::fossil::import::cvs::integrity
	namespace eval project {
	    namespace import ::vc::fossil::import::cvs::project::rev
	}
	namespace eval mem {
	    namespace import ::vc::tools::mem::mark
	}
	namespace import ::vc::tools::misc::*
	namespace import ::vc::tools::log
	log register initcsets
    }
}

# # ## ### ##### ######## ############# #####################
## Ready

package provide vc::fossil::import::cvs::pass::initcsets 1.0
return
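
The "late creation" loop in `CreateRevisionChangesets` above groups consecutive query rows by their `mid` and creates one changeset per finished group. The following is a minimal standalone sketch of that grouping pattern, with the memory markers left out. It assumes made-up row data, and `make_cset` and `group_rows` are hypothetical names standing in for the `project::rev %AUTO% ...` call and the surrounding proc.

```tcl
# Rows arrive as a flat list of {mid rid pid} triples ordered by mid,
# mirroring the shape of the query result in the pass. The data is made up.
set rows {
    10 1 100
    10 2 100
    11 3 100
    12 4 200
    12 5 200
}

# Hypothetical stand-in for 'project::rev %AUTO% $p rev $lastmeta $revisions'.
set created {}
proc make_cset {pid mid revisions} {
    global created
    lappend created [list $pid $mid $revisions]
}

proc group_rows {rows} {
    set n 0
    set lastmeta    {}
    set lastproject {}
    set revisions   {}
    foreach {mid rid pid} $rows {
        if {$lastmeta ne $mid} {
            # The key changed: flush the finished group, if there is one.
            if {[llength $revisions]} {
                incr n
                make_cset $lastproject $lastmeta $revisions
                set revisions {}
            }
            set lastmeta    $mid
            set lastproject $pid
        }
        lappend revisions $rid
    }
    # The last group has no following row to trigger its flush.
    if {[llength $revisions]} {
        incr n
        make_cset $lastproject $lastmeta $revisions
    }
    return $n
}

puts "groups: [group_rows $rows]"
foreach c $created {puts $c}
```

Each changeset is constructed exactly once, with its complete revision list. This is the property the comment in the pass describes as avoiding repeated method invocations on the changesets.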

Changes to tools/cvs2fossil/lib/c2f_prev.tcl.

Old version (lines 131-145, 164-180, 1057-1070, 1119-1132, 1139-1152, 1609-1632):
    # item -> list (item)
    method nextmap {} {
	$mytypeobj successors tmp $myitems
	return [array get tmp]
    }

    method breakinternaldependencies {} {


	##
	## NOTE: This method, maybe in conjunction with its caller
	##       seems to be a memory hog, especially for large
	##       changesets, with 'large' meaning to have a 'long list
	##       of items, several thousand'. Investigate where the
	##       memory is spent and then look for ways of rectifying
	##       the problem.
................................................................................
	# b -> a).

	# Array of dependencies (parent -> child). This is pulled from
	# the state, and limited to successors within the changeset.

	array set dependencies {}
	$mytypeobj internalsuccessors dependencies $myitems
	if {![array size dependencies]} {return 0} ; # Nothing to break.



	log write 5 csets ...[$self str].......................................................


	# We have internal dependencies to break. We now iterate over
	# all positions in the list (which is chronological, at least
	# as far as the timestamps are correct and unique) and
	# determine the best position for the break, by trying to
	# break as many dependencies as possible in one go. When a
	# break was found this is redone for the fragments coming and
................................................................................
	}]]
    }

    # var(dv) = dict (revision -> list (revision))
    typemethod internalsuccessors {dv revisions} {
	upvar 1 $dv dependencies
	set theset ('[join $revisions {','}]')



	# See 'successors' below for the main explanation of
	# the various cases. This piece is special in that it
	# restricts the successors we look for to the same set of
	# revisions we start from. Sensible as we are looking for
	# changeset internal dependencies.

................................................................................
	# will greatly reduces the risk of getting far separated
	# revisions of the same file into one changeset.

	# We allow revisions to be far apart in time in the same
	# changeset, but in turn need the pseudo-dependencies to
	# handle this.



	array set fids {}
	foreach {rid fid} [state run [subst -nocommands -nobackslashes {
	    SELECT R.rid, R.fid
            FROM   revision R
            WHERE  R.rid IN $theset
	}]] { lappend fids($fid) $rid }

................................................................................
		    if {[info exists dep($b,$a)]} continue
		    lappend dependencies($a) $b
		    set dep($a,$b) .
		    set dep($b,$a) .
		}
	    }
	}


	return
    }

    # result = 4-list (itemtype itemid nextitemtype nextitemid ...)
    typemethod loops {revisions} {
	# Note: Tags and branches cannot cause the loop. Their id's,
	# being of a fundamentally different type than the revisions
................................................................................
	namespace import ::vc::tools::log
	log register csets

	# Set up the helper singletons
	namespace eval rev {
	    namespace import ::vc::fossil::import::cvs::state
	    namespace import ::vc::fossil::import::cvs::integrity

	}
	namespace eval sym::tag {
	    namespace import ::vc::fossil::import::cvs::state
	    namespace import ::vc::fossil::import::cvs::integrity

	}
	namespace eval sym::branch {
	    namespace import ::vc::fossil::import::cvs::state
	    namespace import ::vc::fossil::import::cvs::integrity

	}
    }
}

# # ## ### ##### ######## ############# #####################
## Ready

package provide vc::fossil::import::cvs::project::rev 1.0
return

New version (lines 131-146, 165-184, 1061-1076, 1125-1140, 1147-1162, 1619-1645):
    # item -> list (item)
    method nextmap {} {
	$mytypeobj successors tmp $myitems
	return [array get tmp]
    }

    method breakinternaldependencies {} {
	log write 14 csets "[$self str] BID"
	vc::tools::mem::mark
	##
	## NOTE: This method, maybe in conjunction with its caller
	##       seems to be a memory hog, especially for large
	##       changesets, with 'large' meaning to have a 'long list
	##       of items, several thousand'. Investigate where the
	##       memory is spent and then look for ways of rectifying
	##       the problem.
................................................................................
	# b -> a).

	# Array of dependencies (parent -> child). This is pulled from
	# the state, and limited to successors within the changeset.

	array set dependencies {}
	$mytypeobj internalsuccessors dependencies $myitems
	if {![array size dependencies]} {
	    return 0
	} ; # Nothing to break.

	log write 5 csets ...[$self str].......................................................
	vc::tools::mem::mark

	# We have internal dependencies to break. We now iterate over
	# all positions in the list (which is chronological, at least
	# as far as the timestamps are correct and unique) and
	# determine the best position for the break, by trying to
	# break as many dependencies as possible in one go. When a
	# break was found this is redone for the fragments coming and
................................................................................
	}]]
    }

    # var(dv) = dict (revision -> list (revision))
    typemethod internalsuccessors {dv revisions} {
	upvar 1 $dv dependencies
	set theset ('[join $revisions {','}]')

	log write 14 cset internalsuccessors

	# See 'successors' below for the main explanation of
	# the various cases. This piece is special in that it
	# restricts the successors we look for to the same set of
	# revisions we start from. Sensible as we are looking for
	# changeset internal dependencies.

................................................................................
	# will greatly reduces the risk of getting far separated
	# revisions of the same file into one changeset.

	# We allow revisions to be far apart in time in the same
	# changeset, but in turn need the pseudo-dependencies to
	# handle this.

	log write 14 cset pseudo-internalsuccessors

	array set fids {}
	foreach {rid fid} [state run [subst -nocommands -nobackslashes {
	    SELECT R.rid, R.fid
            FROM   revision R
            WHERE  R.rid IN $theset
	}]] { lappend fids($fid) $rid }

................................................................................
		    if {[info exists dep($b,$a)]} continue
		    lappend dependencies($a) $b
		    set dep($a,$b) .
		    set dep($b,$a) .
		}
	    }
	}

	log write 14 cset complete
	return
    }

    # result = 4-list (itemtype itemid nextitemtype nextitemid ...)
    typemethod loops {revisions} {
	# Note: Tags and branches cannot cause the loop. Their id's,
	# being of a fundamentally different type than the revisions
................................................................................
	namespace import ::vc::tools::log
	log register csets

	# Set up the helper singletons
	namespace eval rev {
	    namespace import ::vc::fossil::import::cvs::state
	    namespace import ::vc::fossil::import::cvs::integrity
	    namespace import ::vc::tools::log
	}
	namespace eval sym::tag {
	    namespace import ::vc::fossil::import::cvs::state
	    namespace import ::vc::fossil::import::cvs::integrity
	    namespace import ::vc::tools::log
	}
	namespace eval sym::branch {
	    namespace import ::vc::fossil::import::cvs::state
	    namespace import ::vc::fossil::import::cvs::integrity
	    namespace import ::vc::tools::log
	}
    }
}

# # ## ### ##### ######## ############# #####################
## Ready

package provide vc::fossil::import::cvs::project::rev 1.0
return
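
The pseudo-dependency part of `internalsuccessors` shown above groups the revisions of a changeset by file, then records a dependency between revisions of the same file, using the `dep` array to avoid storing both `a -> b` and `b -> a`. The loops around the visible `lappend dependencies($a) $b` are elided in the diff, so the sketch below assumes they visit every pair of revisions of a file. The proc name `pseudo_dependencies` and the input data are made up for illustration.

```tcl
# Made-up {rid fid} pairs: revisions 1, 2, 3 belong to file 7,
# revision 4 to file 8, and revisions 5 and 6 to file 9.
set pairs {1 7  2 7  3 7  4 8  5 9  6 9}

proc pseudo_dependencies {pairs} {
    # Group the revisions by file, as the visible 'fids' loop does.
    array set fids {}
    foreach {rid fid} $pairs {lappend fids($fid) $rid}

    array set dependencies {}
    array set dep {}
    foreach fid [lsort -integer [array names fids]] {
        set rids $fids($fid)
        if {[llength $rids] < 2} continue
        # ASSUMPTION: the elided loops visit every pair of revisions.
        foreach a $rids {
            foreach b $rids {
                if {$a == $b} continue
                # Skip the pair if either direction is already recorded.
                if {[info exists dep($a,$b)]} continue
                if {[info exists dep($b,$a)]} continue
                lappend dependencies($a) $b
                set dep($a,$b) .
                set dep($b,$a) .
            }
        }
    }

    # Return the result in a deterministic order for printing.
    set result {}
    foreach a [lsort -integer [array names dependencies]] {
        lappend result $a $dependencies($a)
    }
    return $result
}

puts [pseudo_dependencies $pairs]
```

Under that assumption, a file contributing k revisions to one changeset yields k(k-1)/2 recorded pairs: the three revisions of file 7 produce three dependencies, and the two revisions of file 9 produce one.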