by
Krerkchai Kusolchu
Student Visitor
Innovative Computing Laboratory
Department of Electrical Engineering and Computer Science
University of Tennessee
2010
Table of Contents
Acknowledgement
Abstract
Objective
Introduction
About Organization
Background
Hardware Locality
What is Hardware Locality?
Design and interface of Hardware Locality
A. Abstracting the Hardware Topology
B. Abstracting the Hardware Topology
Application and Performance Example
Affinity-aware Thread Scheduling
Implementation, Problem, Solving & Result
Conclusion
References
Appendix
Acknowledgement
Abstract
Objective
Background
ScaLAPACK Top500
in the tree.
4. Implementation
int dplasma_hwlock_nb_levels();
dplasma_hwlock_nb_cores(0, 0) = 8
dplasma_hwlock_nb_cores(0, 4) = 8
dplasma_hwlock_cache_size(42, 1) = 5118 KB
dplasma_hwlock_cache_size(44, 2) = 512 KB
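The calls listed above query the hwloc tree level by level. As a minimal sketch of what a query such as `dplasma_hwlock_nb_cores(level, master_id)` computes, the C code below replaces hwloc with a hypothetical hard-coded topology of four PUs, where two caches are each shared by a pair of PUs. The arrays and the `toy_` functions are illustrative assumptions, not hwloc or DPLASMA API.

```c
#include <assert.h>

/* Hypothetical toy topology (not the hwloc API): each level is an array of
   cpusets, one per object, where bit i set means PU i lies below it.
   Level 0: the machine; level 1: two shared caches; level 2: one per PU. */
#define NLEVELS 3
static const unsigned level_sets[NLEVELS][4] = {
    {0xFu},                   /* machine: PUs 0-3                      */
    {0x3u, 0xCu},             /* cache A: PUs 0-1, cache B: PUs 2-3    */
    {0x1u, 0x2u, 0x4u, 0x8u}, /* one object per PU                     */
};
static const int level_count[NLEVELS] = {1, 2, 4};

/* Number of set bits in a cpuset (its "weight"). */
static int cpuset_weight(unsigned set) {
    int w = 0;
    while (set) {
        w += (int)(set & 1u);
        set >>= 1;
    }
    return w;
}

/* Find the object at `level` that contains `master_id` and count its PUs;
   0 if no object at that level contains it. */
static int toy_nb_cores(int level, int master_id) {
    assert(level >= 0 && level < NLEVELS);
    assert(master_id >= 0 && master_id < 32);
    for (int i = 0; i < level_count[level]; i++) {
        unsigned set = level_sets[level][i];
        if (set & (1u << master_id))
            return cpuset_weight(set);
    }
    return 0;
}
```

At level 0 every PU belongs to the machine object, so the answer is the total PU count; at the cache level it is the number of PUs sharing that cache.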
5. Conclusion
By applying and integrating knowledge of the hardware architecture into the
Distributed PLASMA project through hwloc, we can exploit the structure of the
machine. Related threads can be scheduled in a portable way by consulting the
topology to determine which processing units share cache memory, so that they
benefit from cache reuse, and idle threads can steal work from the most local
cores first, again benefiting from shared cache memory. In this way, the
performance of Distributed PLASMA can be improved.
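To make the locality-aware work stealing concrete, here is a minimal sketch assuming a hypothetical four-core machine in which pairs of cores share a cache. It stores the tree as parent indices (not hwloc's data structures) and orders a thief's candidate victims by tree distance, so that cores sharing its cache are tried first.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy tree, not hwloc: parent index per node, -1 for the root.
   Node 0 = machine, nodes 1-2 = shared caches, nodes 3-6 = cores 0-3. */
#define NNODES 7
#define NCORES 4
static const int parent_of[NNODES] = {-1, 0, 0, 1, 1, 2, 2};
static const int depth_of[NNODES]  = { 0, 1, 1, 2, 2, 2, 2};
#define CORE_NODE(c) (3 + (c))

/* Edges on the tree path between two nodes of equal depth: climb both in
   step until they meet, then count the way up plus the way down. */
static int tree_distance(int a, int b) {
    assert(depth_of[a] == depth_of[b]);
    int steps = 0;
    while (a != b) {
        a = parent_of[a];
        b = parent_of[b];
        steps++;
    }
    return 2 * steps;
}

/* qsort has no context argument, so the thief is passed through a global. */
static int thief;

static int by_distance(const void *x, const void *y) {
    int cx = *(const int *)x, cy = *(const int *)y;
    int dx = tree_distance(CORE_NODE(thief), CORE_NODE(cx));
    int dy = tree_distance(CORE_NODE(thief), CORE_NODE(cy));
    if (dx != dy)
        return dx - dy;
    return cx - cy; /* tie-break by core id for a deterministic order */
}

/* Fill victims[] with every core except `self`, nearest first. */
static void steal_order(int self, int victims[NCORES - 1]) {
    int n = 0;
    for (int c = 0; c < NCORES; c++)
        if (c != self)
            victims[n++] = c;
    thief = self;
    qsort(victims, (size_t)n, sizeof victims[0], by_distance);
}
```

With this tree, core 2 tries core 3 (same cache) before cores 0 and 1.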
Appendix
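{
    /* set == 1 builds the topology, set == 0 releases it. */
    if (set == 1) {
        hwloc_topology_init(&topology);
        /* Ignore NUMA-node and socket objects wherever they add no
           structure to the tree (single child or only child). */
        hwloc_topology_ignore_type_keep_structure(topology, HWLOC_OBJ_NODE);
        hwloc_topology_ignore_type_keep_structure(topology, HWLOC_OBJ_SOCKET);
        hwloc_topology_load(topology);
    }
    else if (set == 0) { /* condition missing in the listing; set == 0 assumed */
        hwloc_topology_destroy(topology);
    }
    else {
        return 0;
    }
}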
        /* If this object contains master_id, the answer is the number of
           PUs in its cpuset. */
        if (hwloc_cpuset_isset(obj->cpuset, master_id)) {
            return hwloc_cpuset_weight(obj->cpuset);
        }
    }
    return 0;
}
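    real_cores = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_CORE);
    cores = real_cores;
    div = cores;
    /* Fold a processor_id at or beyond the core count back into
       [0, real_cores): div scans upward from the core count until it divides
       processor_id, count advances in step (wrapping at real_cores), and the
       final count becomes the new processor_id. */
    if (processor_id / cores > 0) {
        while (processor_id) {
            if (processor_id % div == 0) {
                processor_id = count;
                break;
            }
            count++;
            div++;
            if (real_cores == count) count = 0;
        }
    }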
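    /* Find the object at depth `level` whose cpuset contains processor_id
       and return the lowest-numbered PU in that cpuset. */
    for (i = 0; i < hwloc_get_nbobjs_by_depth(topology, level); i++) {
        hwloc_obj_t obj = hwloc_get_obj_by_depth(topology, level, i);
        if (hwloc_cpuset_isset(obj->cpuset, processor_id)) {
            return hwloc_cpuset_first(obj->cpuset);
        }
    }
    return -1;
}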
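The two fragments above first fold an out-of-range `processor_id` onto a core and then return the lowest-numbered PU of the object at `level` that contains it. The sketch below performs the same lookup on a hypothetical hard-coded set of cache groups. As an explicitly swapped-in simplification, it folds ids with a plain modulo instead of the listing's divisor search. The two rules do not always agree: assuming `count` starts at 0, the loop above maps id 18 on 8 cores to 1, whereas modulo gives 2.

```c
#include <assert.h>

/* Hypothetical toy data, not hwloc: the cpusets of the objects at one
   level, here two caches each shared by two of four cores. */
static const unsigned groups[] = {0x3u, 0xCu};
#define NGROUPS ((int)(sizeof groups / sizeof groups[0]))
#define NCORES 4

/* Index of the lowest set bit, or -1 for an empty set. */
static int cpuset_first(unsigned set) {
    for (int i = 0; i < (int)(8 * sizeof set); i++)
        if (set & (1u << i))
            return i;
    return -1;
}

/* Lowest-numbered core of the group holding processor_id; ids past the
   core count are folded back with a plain modulo. */
static int toy_master_id(int processor_id) {
    assert(processor_id >= 0);
    int core = processor_id % NCORES;
    for (int g = 0; g < NGROUPS; g++)
        if (groups[g] & (1u << core))
            return cpuset_first(groups[g]);
    return -1;
}
```

Modulo spreads oversubscribed ids evenly over the cores, which keeps the mapping easy to predict.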
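    /* Climb from obj towards the root; at depth `level`, return the cache
       size in KB, or 0 if the object at that depth is not a cache. */
    while (obj) {
        if (obj->depth == level) {
            if (obj->type == HWLOC_OBJ_CACHE) {
                return obj->attr->cache.memory_kB;
            }
            else {
                return 0;
            }
        }
        obj = obj->father;
    }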
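The cache-size fragment climbs from a PU towards the root until it reaches the requested depth. Below is a self-contained sketch of that walk. It uses a hypothetical `toy_obj` type in place of `hwloc_obj_t` and made-up cache sizes. Returning -1 when the depth is never reached is an added assumption, since the tail of the original function is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an hwloc object (not the hwloc API). */
typedef enum { T_MACHINE, T_CACHE, T_PU } toy_type;
typedef struct toy_obj {
    toy_type type;
    int depth;
    unsigned long cache_kb;        /* only meaningful for T_CACHE */
    const struct toy_obj *parent;  /* NULL at the root */
} toy_obj;

/* Climb from obj towards the root; at the requested depth return the cache
   size in KB, or 0 if that object is not a cache; -1 if never reached. */
static long toy_cache_size(const toy_obj *obj, int level) {
    while (obj) {
        if (obj->depth == level)
            return obj->type == T_CACHE ? (long)obj->cache_kb : 0;
        obj = obj->parent;
    }
    return -1;
}

/* Example chain with made-up sizes:
   machine -> 4096 KB shared cache -> 512 KB cache -> PU. */
static const toy_obj machine = {T_MACHINE, 0, 0, NULL};
static const toy_obj l3 = {T_CACHE, 1, 4096, &machine};
static const toy_obj l2 = {T_CACHE, 2, 512, &l3};
static const toy_obj pu0 = {T_PU, 3, 0, &l2};
```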
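    /* obj and obj2 start at the same depth (two PUs). Climb both in lockstep
       until they meet at their closest common ancestor; the distance is the
       number of edges up from one plus down to the other, 2 * count. */
    while (obj) {
        if (obj == obj2) {
            jump = count + count;
            return jump;
        }
        obj = obj->father;
        obj2 = obj2->father;
        count++;
    }
    return -1; /* not reached when both objects start at the same depth */
}
}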
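The last fragment measures how far apart two PUs are in the tree. Here is a minimal sketch of the same lockstep climb, assuming a hypothetical `node` type rather than hwloc's objects. In the example tree, PUs under the same cache are 2 edges apart and PUs under different caches are 4 edges apart.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal tree node (not hwloc's hwloc_obj). */
typedef struct node {
    int depth;
    const struct node *parent; /* NULL at the root */
} node;

/* Tree edges between two nodes at the same depth: climb both in lockstep
   until they reach their closest common ancestor, then count the path up
   from one side plus the path down to the other. */
static int toy_distance(const node *a, const node *b) {
    assert(a && b && a->depth == b->depth);
    int steps = 0;
    while (a != b) {
        a = a->parent;
        b = b->parent;
        steps++;
    }
    return 2 * steps;
}

/* Example: root -> two caches -> two PUs under each cache. */
static const node root = {0, NULL};
static const node cache0 = {1, &root}, cache1 = {1, &root};
static const node pu0 = {2, &cache0}, pu1 = {2, &cache0};
static const node pu2 = {2, &cache1}, pu3 = {2, &cache1};
```

Because both nodes start at the same depth, they reach the root on the same step at the latest, so the loop always terminates.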